Lin Clark

Exploring Drupal module interactions with pidgin UML

Submitted by Lin on

The essence of Drupal is complex interactions between modules. These interactions currently take the form of hooks. The most difficult (and most useful) thing for a Drupal developer to understand is how different sets of modules interact using these hooks and in which order.
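To make the point about order concrete, here is a toy model of hook dispatch. It is written in Python rather than Drupal's PHP, and every name in it is hypothetical. `implements` and `invoke_all` loosely imitate Drupal's `<module>_<hook>()` naming convention and `module_invoke_all()`, and the two `example_*` modules are invented for illustration.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical registry: module name -> {hook name -> implementation}.
# Insertion order stands in for the order in which a Drupal-like hook
# system would visit modules (in Drupal, the module list decides this).
MODULES: dict[str, dict[str, Callable[..., Any]]] = {}


def implements(module: str, hook: str) -> Callable:
    """Register a function as `module`'s implementation of `hook`."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        MODULES.setdefault(module, {})[hook] = fn
        return fn
    return register


def invoke_all(hook: str, *args: Any) -> list[str]:
    """Call every implementation of `hook` in module order and return
    the names of the modules that were called, in call order."""
    called = []
    for module, hooks in MODULES.items():
        if hook in hooks:
            hooks[hook](*args)
            called.append(module)
    return called


@implements("example_fields", "entity_load")
def example_fields_entity_load(entities: dict[int, dict]) -> None:
    """Hypothetical module: attaches field data to each loaded entity."""
    for entity in entities.values():
        entity["fields"] = {"title": entity["title"]}


@implements("example_microdata", "entity_load")
def example_microdata_entity_load(entities: dict[int, dict]) -> None:
    """Hypothetical module: relies on example_fields having run first."""
    for entity in entities.values():
        entity["microdata"] = {name: {"itemprop": name} for name in entity["fields"]}


entities = {1: {"title": "Hello"}}
print(invoke_all("entity_load", entities))  # ['example_fields', 'example_microdata']
print(entities[1]["microdata"])             # {'title': {'itemprop': 'title'}}
```

If `example_microdata` happened to be visited first, it would fail with a `KeyError` because the fields it reads are not attached yet. That kind of ordering dependency is hard to see from any single docblock.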

Even though they are the heart of our system, we don't have a good way of documenting these interactions. While the API docs do contain this information for some hooks (e.g., hook_node_load), it's hard to grasp the overall picture from reading docblocks.

This causes a bigger problem than just the learning curve. It also makes it harder for us to discuss changes in APIs and architecture. We have been seeing this in the Drupal 8 development cycle: we are considering major overhauls of systems, and it's unclear to many what the impacts of these decisions are. The only people who can intelligently participate are those who are intensely involved in the architecting process or those who take a considerable amount of time to step through the code. As sun points out in Drupal 8: The Path Forward, this communication challenge remains unresolved.

We need to find a way to communicate about these interactions which are so central to what we do. I have recently started working with a visualization technique that helps me understand these interactions better, and I would be interested to hear whether others find it helpful as well.

It is based on a method that a lot of people outside of our community use to design and document systems, called UML. While most UML diagram types are tailored for object-oriented design, I think that (with a few tweaks) the diagrams could help us construct visual models of hook interactions.

I call this tweaked version "pidgin UML" because, like a pidgin language, the important thing isn't whether it follows grammatical rules but instead that the language helps people understand each other.

I'll be talking about a particular kind of UML diagram, the sequence diagram, which shows how different parts of a system use functions to talk to each other and the order of those function calls and responses.

For example, a sequence diagram of a meal at a restaurant might look like this:

Restaurant interaction as a sequence diagram.

Sequence diagramming with pidgin UML

While some people think that for UML to be useful, you have to be using the OO paradigm with a language like Java, I couldn't disagree more.

I recently started working on such diagrams for the microdata module. For example, this diagram shows how an array of microdata attributes is attached to a node when the node is loaded.

Microdata is attached to the node object when node calls entity_load().

I also disagree that UML diagrams are too hard for non-CS majors to understand. There are just a few conventions you need to know before you can read them.

Conventions

These are the conventions I used for the diagram.

Participant

This is a distinct part of a system which interacts with other parts. In many diagrams, this would be an object. In my diagrams, I've chosen to go with the module that contains the relevant function. In addition, if there is a class which is involved in a significant way, such as the NodeController, I break it out as its own participant.

Function call

The participant calls a function. This function can belong to another participant (above) or be one of the participant's own functions (below). The arrow points to a white bar which represents the function and extends downward until the function either returns or terminates.

Return (optional)

Return values can optionally be noted as dashed arrows at the end of a function's bar.

Loop

In general, you don't want to get too detailed about the procedural logic in your diagram. However, sometimes including some detail is necessary to clearly communicate how an interaction happens. In the example diagram, I include a for loop to demonstrate that this process happens for each entity, not just once for the bundle.

Reference

A reference allows you to just give a quick summary of what happens in a portion of the code flow and provide more detail in another diagram. This can make the diagram clearer and also means that you can refer to the detail diagram from multiple places.

Hook invocation (pidgin)

Not surprisingly, there is no standard notation for hook invocations. In my diagrams, I chose to modify the notation for a function call. The hook name is shown in italics and then a grey bar is used to show the duration of the hook invocation. Any hook implementations that are important for the diagram are shown as function calls coming off of the hook's bar.
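The conventions above map naturally onto a small data structure. The following is a minimal sketch, assuming PlantUML as the output format (PlantUML is our choice for illustration, not the tool used for the diagrams in this post). It stores a call tree and emits PlantUML sequence-diagram text. Following the pidgin convention, a hook invocation is drawn as a call on the invoking participant, with its name italicized using PlantUML's `//name//` markup and a grey activation bar, and the implementations hang off that bar as ordinary calls. The `Call` class, `to_plantuml`, and the participant and function names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Call:
    """One function-call arrow: `caller` calls `name`, which runs on `callee`."""
    caller: str
    callee: str
    name: str
    returns: Optional[str] = None  # optional dashed return arrow
    is_hook: bool = False          # pidgin convention: italic name, grey bar
    children: list[Call] = field(default_factory=list)  # calls made while active


def to_plantuml(call: Call) -> list[str]:
    """Render a call tree as PlantUML sequence-diagram lines, depth first."""
    label = f"//{call.name}//" if call.is_hook else call.name
    lines = [f"{call.caller} -> {call.callee} : {label}"]
    # The activation bar lasts until the call returns; hooks get a grey bar.
    lines.append(f"activate {call.callee}" + (" #DDDDDD" if call.is_hook else ""))
    for child in call.children:
        lines.extend(to_plantuml(child))
    if call.returns is not None:
        lines.append(f"{call.callee} --> {call.caller} : {call.returns}")
    lines.append(f"deactivate {call.callee}")
    return lines


# Illustrative only: names loosely follow the node-load example above and
# are not a verified trace of Drupal's code.
hook = Call("NodeController", "NodeController", "hook_entity_load", is_hook=True,
            children=[Call("NodeController", "microdata", "microdata_entity_load()")])
load = Call("node", "NodeController", "load()", returns="entities", children=[hook])
print("\n".join(["@startuml", *to_plantuml(load), "@enduml"]))
```

Keeping the diagram as data rather than as a drawing means a reference (a sub-diagram) can be reused simply by reusing the same `Call` subtree.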

Benefits

I'm not the type to advocate for something because some priesthood has declared it The Right Way or because it has the right pedigree. Proposed changes in Drupal (both code and process) should be judged on their usefulness.

I think that this technique would introduce a number of benefits, particularly for core development.

Grokability

Before I diagrammed this, I couldn't tell you off the top of my head which hooks were involved in the interaction. This set of diagrams gives a quick reminder.

It also makes it possible to give other module developers who want to introduce microdata integration a better idea of how their module hooks in. For example, for the node load interaction, the two hooks that modules integrating with microdata need to be aware of are shown below.

A diagram showing where an implementing module's hooks are invoked.

Scalability

The Rethinking WSCCI thread is one example of the communication problem we're talking about here. There were a lot of people (myself included) who simply didn't have the background knowledge necessary to intelligently engage in the discussion. If it takes 6 hours to step through the code, there will be a limited number of people who have the time to prepare themselves in order to usefully add to the discussion.

If diagrams could be used and if those diagrams made appropriate use of reference diagrams, then it would be easier to point to the changes in API interactions and solicit quality feedback.

Sprintability (!)

Creating (and occasionally maintaining) these diagrams is quite a bit of work. Our initiative leads are already giving 5,000,000%. We can't put another task on their shoulders.

The nice thing about sequence diagrams for existing systems is that the architect doesn't have to do it him/herself. It's something that can just as easily be done by a new contributor... and it is actually a good way to on-board that contributor because he/she will have a vastly improved understanding of the system afterwards.

I believe that in many ways, it's actually a better sprint activity than a lot of the things we sprint on because it can be a more interactive process.

Conclusion

Module interactions are the most important and most difficult part of Drupal to understand. Using a lightweight, slightly tweaked version of the UML sequence diagram has helped me understand these interactions more. I think that they could also help us collectively understand and communicate about these interactions better.

I'm interested to hear whether others agree that this could be a useful tool for us as we develop and refactor. Please also chime in if you have any other tools or tips for how to make sense of and communicate about these interactions.

Comments

UML tools?

What tool(s) did you use to create the UML diagrams?  Could you provide the source files for the diagrams in the post?

Omnigraffle

I used Omnigraffle, which is a pretty primitive way to create the diagram. If anyone has any suggestions for better tools, I'm very interested to hear them.

I've posted the .graffle files for the first and the second

Better tool for sequence diagrams

Nice explanation, sequence diagrams are indeed a very productive way to get up to speed with (parts of) a system. Unfortunately they take a lot of time to draw if you're not using the right tool.

I'd be happy to sponsor your efforts with a copy of Trace Modeler, if you're interested. Should enable you to crank out those diagrams at full speed :)

Tools—manual or power? . . .

Lin--

Do you have any updates on your progress in this effort?

What FGM says below piques my interest as D8 creeps closer . . . namely, an automatic means to "render" Drupal's code into legible diagrams.

Then again, I keep seeing free manual tools show up: Diagramly and Graphity both have prefab UML symbols and even BPMN.

I guess I'm interested in seeing a de facto standard laid down for Drupal diagramming, either manual or automatic. Both for the coders and a general overview of core and module functions for site builders (an often overlooked audience). Do you have any insight or goals on any of this too?

Thanks.
I don't know enough about the

I don't know enough about the state of automated UML diagramming tools to say conclusively, but I don't believe that it would be possible to automatically render Drupal into sequence diagrams using existing tools at this point.

As far as manual diagramming, I've started using more UML diagrams to communicate with other developers, though usually activity diagrams more than sequence diagrams. I construct these manually.

I am not championing this as a Drupal standard practice, though... my focus with Drupal core development and education is generally around a different set of technologies, and I don't have the bandwidth to take on such a large cultural change and educational project. I'd be happy to hear if someone else takes it on, though.

Good timing

Great idea. I was just digging through the theme/render system yesterday trying to figure out the lifecycle of a node as it's being rendered. More specifically how the render array is created/modified and relates to template variables.  Needless to say, I didn't get very far.

Are you using a php debugger to step through the function calls?

Excellent idea

This is one of the best ideas I've seen in a long time. I'm so fed up with reading tutorials that give simple code snippets you're supposed to understand as is, with no conceptual vision, just recipe examples.

I'm sure modeling some of the most used Core API processes could make the number of contributors skyrocket. A diagram describing the Views API leaves me dreaming ...

I'm new to drupal and could

I'm new to Drupal and could not believe that there is not a single UML or sequence diagram describing the way it works. There is no other way to understand how things really work than by looking at a nice diagram. Reading the code and documentation is a waste of time if you don't draw diagrams at the same time anyway, because you won't be able to remember the architecture for more than a few hours (or minutes).

So, where is the diagram database?

Diagrams

There are already a few diagrams on drupal.org, some of which I made myself back in 2005 (hint: search for images matching "grokking drupal"), but fortunately a good number are more recent than that.

What they all share, however, is a quick obsolescence, because Drupal is always changing. So any effort in that direction has to be in automatic generation, just like api.drupal.org for code-level documentation.

Various graph generation solutions exist:

- DamZ' entitygraph Drush plugin provides a high level UML diagram of entity relationships. It's entity only, but rather rich:

  http://drupal.org/sandbox/damz/1438582

- my Class Grapher Drush plugin provides plotting of classes, interfaces and their inheritance and implementation relationships:

  http://drupal.org/sandbox/fgm/1553284 or, better, http://git.osinet.eu/?p=php_lib.git

- Doxygen provides inheritance, implementation, and caller/callee graphs if you have GraphViz enabled.

None of these is currently deployed on Drupal.org, but this is likely only a matter of time, since the problem only grows more acute as Drupal increases in complexity.

Note that all these tools focus on class/interface relationships, not on sequence as Lin mentions, but at some point, having formal declarations for hooks (if these remain in D8 and later) probably means another possibility for auto-generation too.

Great Idea

This is a great idea.

When you know you know, when you don't you don't.

I find that in Drupal there can be a gap (a chasm) between the two which is hard to cross, often due to difficulty in conceptualizing what the hell is going on.

After a couple of years with Drupal, I am at the point where I get the basics of the architecture and of writing basic, isolated modules and glue code, but I find the next step, leveraging core and major contrib modules via their APIs, hampered by a difficulty in grasping the big picture.

Anything that can help the penny drop would be fantastic, and IMO these kinds of initiatives would really help me and others in my position.

I suppose in practice the issue could be that most modules have very basic documentation due to limited resources, so getting the diagrams produced would be tricky. This, of course, does not mean it shouldn't be done.

Maybe a bit ambitious, but are there any tools that can auto-generate these diagrams? Now that would be good.

Rich

Excellent

Thank you very much for this introduction. I'm gonna try it right now!