Graph-includes creates a graph of dependencies between source files
and/or groups of source files, with an emphasis on getting readable
and usable graphs even for large projects.

Usability of the dependency graphs is currently improved by:
- customizable grouping of several source files into a single node
- transitive reduction of the graph

It currently supports graphing the C/C++ #include relationship, using

This tool has evolved from a 50-line script written for a particular
project (Battle for Wesnoth). Although it has been generalized considerably,
there are still somewhat ad-hoc heuristics hardcoded here and there,
especially in the default project class (see class descriptions below).

Although work is under way to make this tool as generic as possible,
work still has to be done at all levels. It is still under
development, and may not suit your needs (at least, not yet).

INSTALLATION INSTRUCTIONS
-------------------------

Like standard Perl packages. E.g.:

  $ perl Makefile.PL prefix=/usr/local

New versions can be found at http://ydirson.free.fr/soft/graph-includes/.

A darcs repository is available at
http://ydirson.free.fr/soft/graph-includes/darcs/. Note, however, that
I "push" to it using plain FTP mirroring, which is not an official way
of pushing a darcs repository, so only time will tell whether that
completely works, but it seems to be at least somewhat functional.

HOW TO TAKE ADVANTAGE OF THIS TOOL TO IMPROVE YOUR CODE
-------------------------------------------------------

Graph-includes is only a supporting tool for a refactoring effort. It
can help developers see where to focus their efforts in order to get
cleaner and saner dependencies in a project.

In this respect, it is quite similar to a microscope: if you don't
look at the right place, you won't see anything interesting. But if
you start with a small magnifying factor, you can locate regions of
interest, and then zoom in on those to get to the interesting stuff.

1. on the spirit of dependency cleanup

1.1. first look at a dependency graph

When developing a project of medium size (we'll talk mostly C/C++
here, but that will apply to most languages), especially with many
people writing code, it is quite easy to get to a point where each
file (out of several tens or hundreds of files) depends on too many

The most obvious relation is the #include one. The more #includes a
file has, the more time it takes to build, especially when those
included files themselves #include a bunch of other files. For a
project of about 100 files, just producing a graph of all those files,
with arrows representing the #include dependencies, will usually give
an unreadable graph, and will show very little about possible
improvements. This is why this tool has been written: to make it
possible to get to the useful information hidden in this unusable

A less obvious relation appears more clearly when you consider not
files by themselves, but the set of files made of an interface and the
matching implementation. Let's consider two such sets, made of the
files a.h, a.c, b.h, b.c. a.c includes b.h, and b.c includes a.h, and
each implementation, following good practice, includes its own
interface. A simple dependency graph as described above would show

If, on the other hand, we represent those sets of files instead of the
files themselves, we now have something like:

This shows much more clearly that those two modules are intrinsically
related. In many cases, this will express that whenever you use the
a.o file resulting from the build of a.c, you'll need to link b.o as
well, and vice versa. This will be the case when each file uses the
headers to get function prototypes. Hunting for abusive dependencies
then makes it possible, for example, to select at a finer grain which
of those modules of code need to go into which executable, thus
producing lighter executables.
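
The a/b example above can be sketched in a few lines of Python. This is
only an illustration of the idea, not the tool's actual (Perl)
implementation; the helper names module_of() and module_edges() are
made up for this sketch, and a "module" is assumed to be simply the
file path without its suffix.

```python
import posixpath

def module_of(path):
    """Drop the suffix, so that 'a.h' and 'a.c' both map to module 'a'."""
    return posixpath.splitext(path)[0]

def module_edges(file_edges):
    """Collapse file-level #include edges into module-level edges."""
    edges = set()
    for src, dst in file_edges:
        a, b = module_of(src), module_of(dst)
        if a != b:  # an implementation including its own interface is internal
            edges.add((a, b))
    return edges

# a.c includes a.h and b.h; b.c includes b.h and a.h
file_edges = [("a.c", "a.h"), ("a.c", "b.h"), ("b.c", "b.h"), ("b.c", "a.h")]
print(sorted(module_edges(file_edges)))  # [('a', 'b'), ('b', 'a')]
```

At the file level there is no cycle at all; the reciprocal dependency
only shows up once the four files are collapsed into two modules.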

Note that such a reciprocal dependency may not be pathological. Many
projects tend to split a large module into several files for clarity,
even when those files are interdependent. It is much more often in cycles
of unidirectional dependencies that we find dependencies that should

In other cases, headers would just have been used to access a type
definition from b.h, and the associated b.o would not be needed. In
such cases, you may want to consider splitting such "low-level"
declarations into their own headers. Not only would this simplify the
graph, allowing you to get a better grasp on your source code, but it
can also lead to faster compilations, since each file will be able
to include fewer unrelated definitions.

2. possible strategies to help locating abusive dependencies

See "graph-includes --help".

The default output is a .dot file on standard output, suitable for
formatting by dot (from the graphviz toolkit), or interactive editing
by dotty (also from graphviz).

You can ask graph-includes to do the formatting for you, e.g. using
"--output=<file>.<suffix>". It will run "dot -T<suffix>", so that
"--output=mydeps.ps" or "--output=mydeps.jpg" will have the expected
behaviour. If your suffix is not known to dot, dot itself will
complain, so asking for --output=foo.bar will cause a message like:

  Warning: language bar not recognized, use one of: canon cmap cmapx dia dot fig gd gd2 gif hpgl imap ismap jpeg jpg mif mp pcl pic plain plain-ext png ps ps2 svg svgz vrml vtx wbmp xdot
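
As a rough illustration of the suffix handling described above, the
following Python sketch derives a dot command line from an output file
name. The function name dot_command() is hypothetical, and passing the
file name through dot's -o option is an assumption: the text above only
states that "dot -T<suffix>" gets run, not how its output is written.

```python
import os

def dot_command(output_path):
    """Build a dot invocation whose -T format is the output file's suffix."""
    suffix = os.path.splitext(output_path)[1].lstrip(".")
    if not suffix:
        # Assumption of this sketch: a name without suffix is rejected.
        raise ValueError("no suffix in %r, cannot guess a dot format" % output_path)
    return ["dot", "-T" + suffix, "-o", output_path]

print(dot_command("mydeps.ps"))  # ['dot', '-Tps', '-o', 'mydeps.ps']
```

No list of known formats is kept here: an unknown suffix such as "bar"
is simply passed along, which matches the behaviour described above
where dot itself prints the warning.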

If you intend to print the result on paper, the default layout will
likely be too large. You can use --paper=a4 to select parameters that
will produce a smaller graph and split it into pages. This flag also
changes the default output format to PostScript. Be warned that dot
may not honor the page-splitting parameter for all output formats.

Since the transitive reduction can take time, you may like the
--verbose switch, which will show a progress bar.
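
For readers unfamiliar with transitive reduction, here is a minimal
Python sketch of the idea: an edge u -> v is dropped when v can still
be reached from u through another direct dependency. This is a naive
textbook approach for acyclic graphs, not the algorithm graph-includes
actually uses; the node names are invented for the example. The
repeated reachability searches also hint at why the operation can be
slow on large graphs.

```python
def reachable(graph, start):
    """Return all nodes reachable from start by following one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def transitive_reduction(graph):
    """Drop every edge implied by a longer path (assumes an acyclic graph)."""
    reduced = {}
    for node, succs in graph.items():
        keep = set(succs)
        for other in succs:
            # Anything reachable through 'other' needs no direct edge.
            keep -= reachable(graph, other)
        reduced[node] = keep
    return reduced

graph = {
    "main": {"ui", "util", "types"},
    "ui": {"util"},
    "util": {"types"},
    "types": set(),
}
reduced = transitive_reduction(graph)
print(sorted((u, v) for u in reduced for v in reduced[u]))
# [('main', 'ui'), ('ui', 'util'), ('util', 'types')]
```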

The files to be analyzed are given as non-option arguments, and are
typically generated by a "find" command. E.g.:

  $ graph-includes `find src -name '*.[ch]'`

How dependencies get extracted from the source files depends on the
language used in those files. You can specify it with the --language
flag. The default value is C (which should also be used for other
languages based on the C preprocessor, like C++). There is also some
partial support for Perl; see comments in
lib/graphincludes/extractor/perl.pm for more details.
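
To give an idea of what a C extractor has to do, here is a small Python
sketch that pulls file names out of #include lines with a regular
expression. It is a simplified stand-in, not the extractor shipped with
graph-includes: it ignores comments spanning several lines and
preprocessor conditionals, and the name extract_includes() is made up
for this example.

```python
import re

# Matches '#include "name"' and '#include <name>', allowing spaces
# before and after the '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')

def extract_includes(source_text):
    """Return the names found in #include lines, in order of appearance."""
    names = []
    for line in source_text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            names.append(match.group(1) or match.group(2))
    return names

sample = '''#include "foo/bar.h"
  #  include <stdio.h>
int x; /* #include "no.h" */
'''
print(extract_includes(sample))  # ['foo/bar.h', 'stdio.h']
```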

In order to tell the #include resolver where to look for included
files, you can use the cpp-like -I (aka. --Include) flag. E.g.:

  $ graph-includes -I src `find src -name '*.[ch]'`

Dependencies not found in the project (i.e. files appearing in #include
but not given on the command line) are listed as "not found" in the
graph-includes.report file for diagnostic purposes, unless they are
found in a system directory. System directories are declared in a
similar fashion, with the --sysInclude option. E.g.:

  $ graph-includes -I src -sysI /usr/include `find src -name '*.[ch]'`
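
The three possible outcomes for an #include (project file, system file,
not found) can be sketched as follows in Python. Several things here
are assumptions made for the sake of a self-contained example: the
function name classify_include(), the choice to look in the including
file's own directory first (as cpp does for quoted includes), and the
use of a plain set of header names in place of real lookups in the
system directories.

```python
import posixpath

def classify_include(name, including_file, project_files, include_dirs, system_files):
    """Classify one #include as 'project', 'system' or 'not found'."""
    # Assumption: try the including file's directory, then each -I directory.
    candidates = [posixpath.join(posixpath.dirname(including_file), name)]
    candidates += [posixpath.join(d, name) for d in include_dirs]
    for cand in candidates:
        cand = posixpath.normpath(cand)
        if cand in project_files:
            return ("project", cand)
    # system_files stands in for "exists below some --sysInclude directory".
    if name in system_files:
        return ("system", name)
    return ("not found", name)

project_files = {"src/a.c", "src/a.h", "src/util/b.h"}
include_dirs = ["src"]
system_files = {"stdio.h"}

for header in ["util/b.h", "stdio.h", "missing.h"]:
    print(classify_include(header, "src/a.c", project_files, include_dirs, system_files))
# ('project', 'src/util/b.h')
# ('system', 'stdio.h')
# ('not found', 'missing.h')
```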

To avoid cluttering the graph with useless information,
--prefixstrip=<prefix> can be used to remove a given prefix
from all node labels. Typically:

  $ graph-includes --prefixstrip=src/ `find src -name '*.[ch]'`
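
The effect on node labels can be pictured with this tiny Python sketch.
It assumes the prefix is only removed when a label actually starts with
it; strip_prefix() is a name invented for the example.

```python
def strip_prefix(label, prefix):
    """Remove prefix from the start of label, if present."""
    if prefix and label.startswith(prefix):
        return label[len(prefix):]
    return label

print(strip_prefix("src/video.c", "src/"))  # video.c
print(strip_prefix("lib/util.c", "src/"))   # lib/util.c
```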

Files will be grouped in a hierarchy of groups, level 0 groups
typically containing just one file. Groups are defined by the
project class, which is selected with the --class=<class> option. See
below for descriptions of the project classes available by default,
and for instructions to write customized project classes.

The range of group levels to be drawn is selected with
--group=<min>-<max>, which defaults to 1-1. E.g., the group levels of
class "default" are defined as:

  0: one file per group
  1: what/ever.* go into a "what/ever" group (usually interface + implementation)
  2: what/* go into a "what" group, supposing directories denote modules of some sort

Group levels below "min" or above "max" are not displayed as nodes.
Groups of level "min" are drawn as nodes of the graph. If "max" is
strictly greater than "min", then groups of levels "min+1" through
"max" are drawn as box clusters containing lower-level groups.
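
The rules above can be restated as a short Python sketch, which maps
each group level to the way it would be drawn for a given --group
value. The function names and the "node" / "cluster" / "hidden" labels
are invented for this illustration; the error handling for malformed
ranges is also an assumption, not a description of what the tool does.

```python
def parse_group_range(spec):
    """Parse a '<min>-<max>' string into a pair of integers."""
    low, sep, high = spec.partition("-")
    if not sep:
        raise ValueError("expected <min>-<max>, got %r" % spec)
    low, high = int(low), int(high)
    if low < 0 or high < low:
        raise ValueError("invalid group range %r" % spec)
    return low, high

def drawing_roles(spec, levels):
    """Tell, for each group level, how it is drawn for the given range."""
    low, high = parse_group_range(spec)
    roles = {}
    for level in levels:
        if level == low:
            roles[level] = "node"      # groups of level "min" are nodes
        elif low < level <= high:
            roles[level] = "cluster"   # "min+1" through "max" are box clusters
        else:
            roles[level] = "hidden"    # outside the range: not drawn
    return roles

print(drawing_roles("1-1", [0, 1, 2]))  # {0: 'hidden', 1: 'node', 2: 'hidden'}
print(drawing_roles("1-2", [0, 1, 2]))  # {0: 'hidden', 1: 'node', 2: 'cluster'}
```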

Since this way of grouping nodes will not improve readability in
projects where the inter-group dependencies have not been cleaned up
yet, higher-level groups can instead be colored, using a class-defined
color scheme, possibly modified by "--color <n>:<label>=<color>[,<label>=<color>...]"
options, where <n> is the group level in which the group named <label> will
receive a background of the specified color. The color can be given
either as a named X11 color (like "blue" or "palegreen"), or as an RGB
color using the standard X11 "#RRGGBB" syntax.
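
To make the --color argument syntax concrete, here is a Python sketch
that splits such a string into a level and a label-to-color mapping.
It only illustrates the syntax given above; parse_color_spec() is a
hypothetical name, the labels "gui" and "net" are invented, and no
validation of the color names themselves is attempted.

```python
def parse_color_spec(spec):
    """Parse '<n>:<label>=<color>[,<label>=<color>...]'."""
    level_text, sep, rest = spec.partition(":")
    if not sep:
        raise ValueError("expected <n>:<label>=<color>, got %r" % spec)
    level = int(level_text)
    colors = {}
    for item in rest.split(","):
        label, eq, color = item.partition("=")
        if not eq or not label or not color:
            raise ValueError("bad <label>=<color> item %r" % item)
        colors[label] = color
    return level, colors

print(parse_color_spec("2:gui=palegreen,net=#FFCC00"))
# (2, {'gui': 'palegreen', 'net': '#FFCC00'})
```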

For those wanting to see what edges the transitive reduction dropped,
the --showdropped option will add them to the graph in a different color. Be
prepared for your computer room to get a noticeable temperature
increase for anything other than a small set of files with only few

On the other hand, --focus=<node-label> will do the same, but only for the
dependencies of a specified node. That should prevent the nasty
effects described above, and will be useful for various purposes,
including debugging the transitive reducer. The node-label refers to
a node in the lowest group-level drawn, i.e. the "min" argument to

People still getting cold may also like to circumvent the
transitive-reduction engine completely, using --alldeps. The author
assumes no responsibility for losses of mental health induced by
trying to make any serious use of the resulting graph.

EXISTING PROJECT CLASSES
------------------------

As implied by its name, it is the one which will be used unless you
use the --class option. Although it is the default one, it may still
be quite rough at the moment, still using some ad-hoc heuristics, and
will be improved in the near future. Here are its main

- looks at C-style #include lines
- creates level-1 groups for all files sharing the same path and
  (disregarding the suffix) filename. E.g., files "foo/bar.c" and
  "foo/bar.h" would be grouped in a "foo/bar" level-1 group.
  In other words, headers will not be grouped with their
  implementation files if they all live in a separate include/
  directory.
- creates by-directory level-2 groups. E.g. in the above example, a
  group "foo" would exist at level-2.
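
The grouping rules of the default class can be approximated by the
following Python sketch, which computes the label of the group a file
belongs to at each level. It is a simplification written for this
document, not the class's real code: group_label() is a made-up name,
and what happens to files that sit directly at the top of the tree
(empty directory part) is left aside.

```python
import posixpath

def group_label(path, level):
    """Label of the group containing 'path' at the given level."""
    if level == 0:
        return path                         # one file per group
    if level == 1:
        return posixpath.splitext(path)[0]  # same path and name, suffix dropped
    if level == 2:
        return posixpath.dirname(path)      # by-directory group
    raise ValueError("this sketch only knows levels 0 to 2")

for path in ["foo/bar.c", "foo/bar.h", "include/bar.h"]:
    print(path, "->", group_label(path, 1), "/", group_label(path, 2))
# foo/bar.c -> foo/bar / foo
# foo/bar.h -> foo/bar / foo
# include/bar.h -> include/bar / include
```

The last line shows the limitation mentioned above: a header kept in a
separate include/ directory ends up in a different level-1 group than
its implementation.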

2. class "uniqueincludes"

Built on top of the default class, it is meant for projects where file
names are kept unique across all directories. If the ad-hoc #include
processing of the default class does not suit your project, it is the
only out-of-the-box alternative available today. Here are its main

- provides a single grouping level based on filenames, disregarding
  all the directory hierarchy.

Note that it is not meant for general use, as:

- it will group any files with the same name in the same level-0
  group, possibly causing confusion.
- it does not make any directory name appear in the node names
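
The confusion mentioned in the first point is easy to reproduce with a
Python sketch that groups files by name only. This is an illustration
under assumptions, not the class's code: the helper name and the file
names are invented, and the suffix is assumed to be part of the
"filename" used for grouping.

```python
import posixpath
from collections import defaultdict

def group_by_filename(paths):
    """Group files by their name alone, ignoring the directory part."""
    groups = defaultdict(list)
    for path in paths:
        groups[posixpath.basename(path)].append(path)
    return dict(groups)

paths = ["gui/button.h", "net/button.h", "net/socket.h"]
print(group_by_filename(paths))
# {'button.h': ['gui/button.h', 'net/button.h'], 'socket.h': ['net/socket.h']}
```

Two unrelated headers that happen to share a name land in the same
group, which is why this class only makes sense when file names are
unique across the whole project.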

DEFINING YOUR OWN PROJECT CLASS
-------------------------------

See graphincludes::project::wesnoth in the examples/ dir as an example.

Keep in mind that the API is not frozen yet, and will probably be
overhauled more than once before an official API gets blessed.

  Maelstrom-3.0.6$ graph-includes -v -sysI /usr/include -sysI /usr/include/SDL -I . -I ./netlogic -I ./maclib -I ./screenlib --prefixstrip ./ -o deps.ps $(find . -name '*.[ch]' -o -name '*.cpp')

  [ a rather clean dependency graph ]

  wesnoth-0.9.1$ graph-includes -v --class wesnoth --group 1-1 -sysI /usr/include/c++/3.3 -sysI /usr/include -sysI /usr/include/SDL --prefixstrip src/ -I src -o deps.ps `find src -name '*.[ch]pp'`

  [ more work has to be put in the wesnoth example class,
    especially since the graph-includes-0.7 layout change ]

Examples only here as a reminder to write proper project classes for them:

  qemu-0.7.0$ graph-includes -v -sysI /usr/include/ -sysI /usr/include/SDL $(find -name CVS -prune -o -type d -printf '-I %p\n') -o deps.ps $(find . -name '*.[ch]')

  [ needs supporting features for multi-arch source trees ]

  mesag-6.2.1$ graph-includes -o -sysI /usr/include -I ./include -I ./include/GL -I ./src/mesa -I ./src/mesa/main -I ./src/glu/sgi/include -I ./src/glu/sgi/libnurbs/internals -I ./src/mesa/glapi -o deps.ps $(find . -name '*.[ch]')

  [ needs proper file-grouping ]

- this script only handles explicitly declared dependencies. It
  won't notice if, e.g., a prototype was cut and pasted instead of
  using the correct #include, but you shouldn't do that anyway :)

I finally found a couple of tools out there, from which I may borrow
ideas some day. I'd be happy to hear about more of them.

- cinclude2dot, originally from Darxus
  (http://www.chaosreigns.com/code/cinclude2dot/), then taken over by
  F. Flourish (http://www.flourish.org/cinclude2dot/) is a GPL
  C/C++-only tool, which apparently has support for grouping, but not
  for transitive reduction. Had I searched better, and found it
  a couple of months ago, maybe graph-includes would have never been

- http://www.tarind.com/depgraph.html has a dependency grapher for
  python, without transitive reduction as well. It does however allow
  customisation of project classes, somewhat similar to graph-includes.

- OptimalAdvisor (http://javacentral.compuware.com/pasta/) is a
  refactoring tool, which goes far beyond simple dependency analysis,
  but is non-free/libre/open-source (though they have a
  functionally-limited free/gratis edition) and seems to support only

- codeproject.com has some VisualStudio(tm) plugins targeting C++,
  which I cannot test, but appear to scale badly for large projects
  (http://www.codeproject.com/csharp/DependencyGraph.asp).

- continue merging the verbose/debug behaviour into the global report file.
- change case of class names when the API gets stabilized
- allow associating attributes with files (e.g. an ARCH attribute for
  multi-architecture trees, like kernels, development tools and emulators)
- modularization (finish the restructuring into a cleaner and more modular design)
+ rework the recording of edges to make them apply to files, not to graph nodes,
  since more advanced features will need more flexibility
- allow passing options to modules (-O param=value ?)
- graph output syntax (allow generating tulip graphs)
- separate styling from project classes
- allow defining several views in a project-class, several of which
  can be generated by default.
- find out whether we can declare protocols/pure-virtual-classes in
  some way, to clean up the class graph
- generalize --prefix-strip
- give consistent access to all commonly-needed features through
  command-line and class customization
- graph-includes tool
+ find the accessory classes as easily as possible (like bugzilla ?)
- better robustness to incorrect arguments (e.g. --group 1:2)
- automate --help production (see Pod::Usage ?)
+ multi-sheet paper support may be broken
- use an existing source of paper formats (libpaper, LC_PAPER, whatever)
- maybe use graphviz' tred(1) to check our transitive reductions.
- some autodetection of the language to use based on filenames ?
- provide an initial list of system directories to avoid repeating them (ask compiler)

- provide an interactive tool to help understanding a project's
  structure. Maybe with graphviz' lefty, or as a specialized tulip

+ allow -I syntax for programs using e.g. -I. from source subdirectory
+ behave as expected wrt leading "./", use File::Spec for more portability
- consider using Cwd::realpath or so, for correct "../" handling
- write other extractors (java, python, ...)

- some support for CPP symbol conditionals (mostly #ifdef), perhaps coupling

- write an openc++-based dependency extractor
- extract more fine-grained dependency (depending on a header does
  not necessarily imply depending on code)
- handle (warn about) the case where the declarations for a given
  implementation file are scattered in more than one header
- detect undeclared dependencies (e.g. manually inserted prototypes)
- check necessity of declared includes

- improve the perl extractor

- proper way to define include paths in project class
- make default project-class consider multiple levels of directories
  as group levels, but only if they (consistently ?) have multiple

- write a linux-kernel class and others as examples :)
- provide a simple hash-based filelabel implementation
- provide tools for automatic grouping (e.g. using cycles, or
  selected external deps, or from leaves)

+ allow coloring other things than just level 2
- generalize the special_edge() mechanism (use a hash of edge attributes ?)
- allow different node shapes when mixing high-level nodes with
  lower-level ones through the default singleton groups
  (special_node mechanism similar to the special_edge one ?)
+ optionally show labels (using attributes ?) or count for files
  (subnodes) in a node and color arcs according to them
- optionally show external deps (deps on files not on command-line)
- limit graph to one or more given group(s) of files (specified by <level>:<label>)
- draw cycles in a given color
- draw a specific path
- allow setting fg color for a specific group level
- provide automatic coloring schemes
- color intra-group edges with the same color as nodes (post-processing ?)
- allow requesting the drawing of who in a high-level node points to
  another node (i.e. violates some constraint)
- propagate excuses in some way when they are dropped by the transitive reducer
- investigate candidate tools for hyperbolic layout ?

- write more documentation
- lift the doc to docbook

- ensure that all provided non-abstract classes are self-contained

- --showdropped mode draws too many edges as dropped (i.e. does not
  consider marked edges as dropped when deciding whether to consider
  subsequent edges as dropped)
- when showing only 3-3, colors from level 2 get propagated to level-3 groups
- transitive reduction may not be complete, some more edges could
  possibly be dropped - wesnoth tree at 2005-03-25 exhibits the problem
  with the "display -> builder -> animated -> image" path

Copyright (c) 2005 Yann Dirson <ydirson@altern.org>

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License, version 2,
as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.