README

   1
   2                 graph-includes toolkit
   3                 ======================
   4
   5 IN SHORT
   6 --------
   7
   8 Graph-includes creates a graph of dependencies between source-files
   9 and/or groups of source-files, with an emphasis on getting readable
  10 and usable graphs even for large projects.
  11
  12 Usability of the dependency graphs is currently improved by:
  13 - customizable grouping of several source files into a single node
  14 - transitive reduction of the graph
  15
  16 It currently supports graphing the C/C++ #include relationship, using
  17 graphviz.
  18
  19
  20 IMPORTANT NOTICE
  21 ----------------
  22
  23 This tool has evolved from a 50-line script written for a particular
  24 project (Battle for Wesnoth).  Although it has been generalized much,
  25 there are still somewhat ad-hoc heuristics hardcoded here and there,
  26 especially in the default project class (see class descriptions below).
  27
  28 Although work is under way to make this tool as generic as possible,
  29 work still has to be done at all levels.  It is still under
  30 development, and may not suit your needs (at least, not yet).
  31
  32
  33 INSTALLATION INSTRUCTIONS
  34 -------------------------
  35
  36 Install it like any standard Perl package.  Eg:
  37
  38 $ perl Makefile.PL prefix=/usr/local
  39 $ make
  40 $ su
  41 # make install
  42
  43
  44 New versions can be found at http://ydirson.free.fr/soft/graph-includes/.
  45
  46
  47 HOW TO TAKE ADVANTAGE OF THIS TOOL TO IMPROVE YOUR CODE
  48 -------------------------------------------------------
  49
  50 1. on the spirit of dependency cleanup
  51
  52 When developing a project of medium size (we'll talk mostly C/C++
  53 here, but that will apply to most languages), especially with many
  54 people writing code, it is quite easy to get to a point where each
  55 file (out of several tens or hundreds of files) depends on too many
  56 other files.
  57
  58 The most obvious relation is the #include one.  The more #includes a
  59 file has, the more time it takes to build - especially when those
  60 included files themselves #include a bunch of other files.  For a
  61 project of about 100 files, just producing a graph of all those files,
  62 with arrows representing the #include dependencies, will usually give
  63 an unreadable graph, and will show very little about possible
  64 improvements.
  65
  66 A less obvious relation appears more clearly when you consider not
  67 files by themselves, but the set of files made of an interface and the
  68 matching implementation.  Let's consider two such sets, made of the
  69 files a.h, a.c, b.h, b.c.  a.c includes b.h, and b.c includes a.h, and
  70 each implementation, following good practice, includes its own
  71 interface.  A simple dependency graph as described above would look
  72 like this:
  73
  74         a.c -> b.h
  75            \  /|
  76             \/
  77             /\
  78            /  \|
  79         b.c -> a.h
  80
  81 If OTOH we represent those sets of files instead of the files
  82 themselves, we now have something like:
  83
  84         a <--> b
  85
  86 This shows much more clearly that those two modules are intrinsically
  87 related.  In many cases, this will express that whenever you use the
  88 a.o file resulting from the build of a.c, you'll need to link b.o as
  89 well, and vice versa.  This will be the case when each file uses the
  90 headers to get function prototypes.  Hunting for abusive
  91 dependencies then lets you, for example, select at a finer grain which
  92 of those modules of code need to go into which executable, thus
  93 producing lighter executables.
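
As a rough illustration of the difference between the two graphs
above, the following Python sketch collapses file-level #include
edges into module-level edges.  It is not graph-includes code (the
tool itself is written in Perl); the function names and the "drop the
suffix" grouping rule are assumptions made for this example only.

```python
import posixpath

def group_of(path):
    """Hypothetical grouping rule: drop the suffix, so a.c and a.h share group 'a'."""
    return posixpath.splitext(path)[0]

def group_edges(file_edges):
    """Collapse file-level #include edges into edges between groups.

    Edges inside a single group (such as a.c -> a.h) disappear, and
    duplicate edges between the same two groups are merged.
    """
    edges = set()
    for src, dst in file_edges:
        gsrc, gdst = group_of(src), group_of(dst)
        if gsrc != gdst:
            edges.add((gsrc, gdst))
    return sorted(edges)

# The four #include relations of the a/b example.
file_edges = [("a.c", "a.h"), ("a.c", "b.h"), ("b.c", "b.h"), ("b.c", "a.h")]
print(group_edges(file_edges))   # [('a', 'b'), ('b', 'a')]
```

The two remaining edges are the "a <--> b" cycle, which the
file-level graph hides among four arrows.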
  94
  95 In other cases, headers would just have been used to access a type
  96 definition from b.h, and the associated b.o would not be needed.  In
  97 such cases, you may want to consider splitting such "low-level"
  98 declarations into their own headers.  Not only would this simplify the
  99 graph, allowing you to get a better grasp of your source code, but it
 100 can also lead to faster compilations, since each file will be able
 101 to include fewer unrelated definitions.
 102
 103
 104 2. possible strategies to help locating abusive dependencies
 105
 106 More to be written.
 107
 108
 109
 110 COMMAND-LINE USAGE
 111 ------------------
 112
 113 See "graph-includes --help".
 114
 115 1. output type
 116
 117 The default output is a .dot file on standard output, suitable for
 118 formatting by dot (from the graphviz toolkit), or interactive editing
 119 by dotty (also from graphviz).
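
For readers unfamiliar with the format, here is a minimal Python
sketch of what a .dot description of a dependency graph looks like.
It only shows the general shape of the DOT language; the graph name
and the function are assumptions, and the actual output of
graph-includes carries more attributes than this.

```python
def to_dot(edges, name="includes"):
    """Render (source, target) pairs as a minimal DOT digraph.

    Labels are quoted but not escaped, so this sketch assumes they
    contain no double quotes or backslashes.
    """
    lines = ["digraph %s {" % name]
    for src, dst in edges:
        lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines) + "\n"

print(to_dot([("a", "b"), ("b", "a")]), end="")
```

Text of this form can be piped into dot, for instance with
"dot -Tps", to obtain a formatted graph.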
 120
 121 You can ask graph-includes to do the formatting for you, eg. using
 122 "--output=<file>.<suffix>".  It will run "dot -T<suffix>", so that
 123 "--output=mydeps.ps" or "--output=mydeps.jpg" will have the expected
 124 behaviour.  If your suffix is not known to dot, dot itself will
 125 complain, so asking for --output=foo.bar will cause a message like:
 126
 127 Warning: language bar not recognized, use one of: canon cmap cmapx dia dot fig gd gd2 gif hpgl imap ismap jpeg jpg mif mp pcl pic plain plain-ext png ps ps2 svg svgz vrml vtx wbmp xdot
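
The mapping from output file name to dot invocation can be sketched
as follows, in Python.  This README only states that "dot -T<suffix>"
is run; passing the file name through dot's -o option, and the
function name itself, are assumptions of this sketch rather than a
description of the actual implementation.

```python
import posixpath

def dot_command(output):
    """Build a dot command line for --output=<file>.<suffix> (sketch only)."""
    suffix = posixpath.splitext(output)[1].lstrip(".")
    if not suffix:
        raise ValueError("output file needs a suffix to select a dot format")
    # The suffix is handed to dot unchecked: dot itself reports unknown formats.
    return ["dot", "-T" + suffix, "-o", output]

print(dot_command("mydeps.ps"))   # ['dot', '-Tps', '-o', 'mydeps.ps']
```

Leaving the validation to dot is what produces the warning quoted
above for a suffix such as "bar".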
 128
 129 If you intend to print the result on paper, the default layout will
 130 likely be too large.  You can use --paper=a4 to select parameters that
 131 will produce a smaller graph and split it into pages.  This flag also
 132 changes the default output format to PostScript.  Be warned that dot
 133 may not honor the page-splitting parameter for all output formats.
 134
 135 Since the transitive reduction can take time, you may like the
 136 --verbose switch, which will show a progress bar.
 137
 138
 139 2. what to draw
 140
 141 The files to be analyzed are given as non-option arguments, and are
 142 typically generated by a "find" command.  Eg:
 143
 144         $ graph-includes `find src -name '*.[ch]'`
 145
 146 In order to tell the #include resolver where to look for included
 147 files, you can use the cpp-like -I flag.  Eg:
 148
 149         $ graph-includes -I src `find src -name '*.[ch]'`
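
To give an idea of what an #include resolver has to do with the -I
directories, here is a small Python sketch.  It mimics the usual cpp
lookup order for quoted includes (directory of the including file
first, then each -I directory in turn); whether graph-includes
follows exactly this order is an assumption, and all names below are
hypothetical.

```python
import posixpath
import re

# Matches '#include "x.h"' and '#include <x.h>', with optional spaces.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')

def scan_includes(text):
    """Return the names mentioned on the #include lines of a source text."""
    names = []
    for line in text.splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            names.append(match.group(1))
    return names

def resolve(name, including_file, include_dirs, known_files):
    """Map an included name to one of the analyzed files, or None.

    Tries the directory of the including file, then each -I directory.
    """
    for directory in [posixpath.dirname(including_file)] + list(include_dirs):
        path = posixpath.normpath(posixpath.join(directory, name))
        if path in known_files:
            return path
    return None   # e.g. a system header not given on the command line

known = {"src/main.c", "src/util/log.h"}
source = '#include "log.h"\n#include <stdio.h>\nint main(void) { return 0; }\n'
for name in scan_includes(source):
    print(name, "->", resolve(name, "src/main.c", ["src/util"], known))
```

Here "log.h" is only found thanks to the extra "src/util" directory,
while "stdio.h" stays unresolved because it is not among the analyzed
files.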
 150
 151 To avoid having useless information on the graph,
 152 --prefixstrip=<prefix> can be used to avoid repeating a given prefix
 153 in all node labels.  Typically:
 154
 155         $ graph-includes --prefixstrip=src/ `find src -name '*.[ch]'`
 156
 157 Files will be grouped in a hierarchy of groups, level 0 groups
 158 typically containing just one file.  Groups are defined by the
 159 selected project class, selected by the --class=<class> option.  See
 160 below for descriptions of the project classes available by default,
 161 and for instructions to write customized project classes.
 162
 163 The range of group levels to be drawn is selected with
 164 --group=<min>-<max>, which defaults to 1-1.  Eg, class "default"
 165 defines its group levels as:
 166
 167 0: one file per group
 168 1: what/ever.* go into a "what/ever" group (usually interface + implementation)
 169 2: what/* go into a "what" group, supposing directories denote modules of some sort
 170
 171 Group levels below "min" or above "max" are not displayed as nodes.
 172 Groups of level "min" are drawn as nodes of the graph.  If "max" is
 173 strictly greater than "min", then groups of levels "min+1" through
 174 "max" are drawn as box clusters containing lower-level groups.
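
The three group levels of class "default", and the way a
--group=<min>-<max> range splits levels into nodes and clusters, can
be sketched in Python as below.  This is an illustration of the rules
stated above under simplifying assumptions, not the code of the
project class; both function names are hypothetical.

```python
import posixpath

def group_label(path, level):
    """Label of the group containing `path` at levels 0, 1 and 2."""
    if level == 0:
        return path                          # one file per group
    if level == 1:
        return posixpath.splitext(path)[0]   # what/ever.* -> what/ever
    if level == 2:
        return posixpath.dirname(path)       # what/* -> what
    raise ValueError("this sketch only knows levels 0 to 2")

def drawn_levels(group_range):
    """Split a '<min>-<max>' range into (node level, cluster levels)."""
    low, high = (int(part) for part in group_range.split("-"))
    return low, list(range(low + 1, high + 1))

print(group_label("what/ever.c", 1))   # what/ever
print(group_label("what/ever.c", 2))   # what
print(drawn_levels("1-2"))             # (1, [2])
```

With the default range 1-1 the list of cluster levels is empty, so
only plain level-1 nodes are drawn.  Note that in this sketch a file
at the top of the tree gets an empty level-2 label.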
 175
 176 Since such a way of grouping nodes will not improve the readability in
 177 projects where the inter-groups dependencies have not been cleaned up
 178 yet, higher-level groups can instead be colored, using a class-defined
 179 color scheme, possibly modified by "--color <n>:<label>=<color>[,<label>=<color>...]"
 180 options, where <n> is the group level in which the group name <label> will
 181 receive a background of the specified color, which can be defined
 182 either by a named X11 color (like "blue" or "palegreen"), or by an RGB
 183 color using the standard X11 "#RRGGBB" syntax.
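
The argument syntax of --color can be made concrete with a short
Python sketch of a parser for it.  The function name, the error
messages and the exact validation of color names are assumptions;
only the "<n>:<label>=<color>[,<label>=<color>...]" shape comes from
the description above.

```python
import re

# A named color (letters and digits) or the "#RRGGBB" form.
COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}|[A-Za-z][A-Za-z0-9]*")

def parse_color_spec(spec):
    """Parse '<n>:<label>=<color>[,...]' into (level, {label: color})."""
    level_text, colon, pairs = spec.partition(":")
    if not colon or not pairs:
        raise ValueError("expected <n>:<label>=<color>, got %r" % spec)
    level = int(level_text)   # raises ValueError if <n> is not a number
    colors = {}
    for pair in pairs.split(","):
        label, equals, color = pair.partition("=")
        if not equals or not label or not COLOR_RE.fullmatch(color):
            raise ValueError("bad <label>=<color> pair: %r" % pair)
        colors[label] = color
    return level, colors

print(parse_color_spec("2:gui=palegreen,net=#3366CC"))
# (2, {'gui': 'palegreen', 'net': '#3366CC'})
```

The labels "gui" and "net" are made-up group names; in practice they
would be labels of groups existing at level <n> in your project.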
 184
 185
 186 For those wanting to see what edges the transitive reduction dropped,
 187 the --showdropped option will add them to the graph in a different
 188 color.  Be prepared for your computer room to get a noticeable
 189 temperature increase for anything other than a small set of files with
 190 only a few dependencies.
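
For readers who have not met transitive reduction before, the Python
sketch below shows the idea on a tiny graph: an edge is dropped when
its target can still be reached through another successor of its
source.  This is a naive textbook approach for acyclic graphs, given
here as an assumption-laden illustration; it is not the algorithm
used by graph-includes, and it ignores the question of cycles.

```python
def transitive_reduction(edges):
    """Split the edges of an acyclic graph into (kept, dropped) lists."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)
        successors.setdefault(dst, set())

    def reachable(start):
        """Nodes reachable from `start` in one or more steps."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    kept, dropped = [], []
    for src, dst in sorted(set(edges)):
        # Redundant if some other successor of src already leads to dst.
        redundant = any(dst in reachable(other)
                        for other in successors[src] if other != dst)
        (dropped if redundant else kept).append((src, dst))
    return kept, dropped

kept, dropped = transitive_reduction(
    [("main", "gui"), ("gui", "util"), ("main", "util")])
print(kept)      # [('gui', 'util'), ('main', 'gui')]
print(dropped)   # [('main', 'util')]
```

Every edge triggers a fresh graph traversal here, which hints at why
the reduction gets slow on large inputs and why drawing the dropped
edges again can clutter the graph.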
 191
 192 OTOH, --focus=<node-label> will do the same, but only for the
 193 dependencies of a specified node.  That should prevent the nasty
 194 effects described above, and will be useful for various purposes,
 195 including debugging the transitive reducer.  The node-label refers to
 196 a node in the lowest group-level drawn, ie. the "min" argument to
 197 --group.
 198
 199 People still getting cold may also like to circumvent the
 200 transitive-reduction engine completely, using --alldeps.  The author
 201 assumes no responsibility for losses of mental health induced by
 202 trying to make any serious use of the resulting graph.
 203
 204
 205 EXISTING PROJECT CLASSES
 206 ------------------------
 207
 208 1. class "default"
 209
 210 As implied by its name, it is the one which will be used unless you
 211 use the --class option.  Although it is the default one, it may still
 212 be quite rough at the moment, still using some ad-hoc heuristics, and
 213 will be improved in the near future.  Here are its main
 214 characteristics:
 215
 216  - looks at C-style #include lines
 217  - creates level-1 groups for all files sharing the same path and
 218    (disregarding the suffix) filename.  Eg, files "foo/bar.c" and
 219    "foo/bar.h" would be grouped in a "foo/bar" level-1 group.
 220    Put plainly, headers are not grouped with their implementation
 221    files if they all live in a separate include/ directory.
 222  - creates by-directory level-2 groups.  Eg. in the above example, a
 223    group "foo" would exist at level 2.
 224
 225
 226 2. class "uniqueincludes"
 227
 228 Built on top of the default class, it is meant for projects where file
 229 names are kept unique across all directories.  If the ad-hoc #include
 230 processing of the default class does not suit your project, it is the
 231 only out-of-the-box alternative available today.  Here are its main
 232 characteristics:
 233
 234  - provides a single grouping level based on filenames, disregarding
 235    all the directory hierarchy.
 236
 237 Note that it is not meant for general use, as:
 238
 239  - it will group any files with the same name in the same level-0
 240    group, possibly causing confusion.
 241  - it does not make any directory name appear in the node names
 242
 243
 244 DEFINING YOUR OWN PROJECT CLASS
 245 -------------------------------
 246
 247 See graphincludes::project::wesnoth in the examples/ directory for an
 247 example.
 248
 249 Keep in mind that the API is not frozen yet, and will probably be
 250 overhauled more than once before an official API gets blessed.
 251
 252
 253 CAVEATS
 254 -------
 255
 256 - this script only handles explicitly declared dependencies; it
 257   won't notice if eg. a prototype cut'n'paste was used instead of
 258   the correct #include, but you shouldn't do that anyway :)
 259
 260
 261 TODO
 262 ----
 263
 264 - misc improvements
 265  - automate --help production (see Pod::Usage ?)
 266  - make default project-class consider multiple levels of directories
 267    as group levels, but only if they (consistently ?) have multiple
 268    subgroups ?
 269  - write more documentation
 270  - merge the verbose/debug behaviour into a global "report" file.
 271  - write a linux-kernel class as example :)
 272  - use an existing source of paper formats (libpaper, LC_PAPER, whatever)
 273 - modularization (finish the restructuring into a cleaner and more modular design)
 274   + allow coloring other things than just level 2
 275   + include syntax definition (allow support for java imports)
 276   - graph output syntax (allow to generate tulip graphs)
 277   - provide a simple hash-based filelabel implementation
 278   + use hash-based constructors (eg. like CGI.pm)
 279   + find the accessory classes as easily as possible (like nagios-plugins ?)
 280   - separate styling from project classes
 281   - allow to define several views in a project-class, several of which
 282     can be generated by default.
 283 - presentation
 284   - generalize the special_edge() mechanism (use a hash of edge attributes ?)
 285   - allow different node shapes when mixing high-level nodes with
 286     lower-level ones through the default singleton groups
 287     (special_node mechanism similar to the special_edge one ?)
 288   + optionally show labels or count for files (subnodes) in a node and
 289     color arcs according to them
 290   - optionally show external deps (deps on files not on command-line)
 291   - limit graph to one or more given group(s) of files (specified by <level>:<label>)
 292   - draw cycles in a given color
 293   - draw a specific path
 294   - allow setting fg color for a specific group level
 295   - provide automatic coloring schemes
 296   - color intra-group edges with the same color as nodes (post-processing ?)
 297   - allow to request drawing of who in a high-level node points to
 298     another node (ie. violates some constraint)
 299   + label edges with the number of explicit inclusions flowing through them
 300   - propagate excuses in some way when they are dropped by the transitive reducer
 301   - provide tools for automatic grouping (eg. using cycles, or selected external deps)
 302   - investigate candidate tools for hyperbolic layout ?
 303 - CLI improvements
 304   + recursive directory search to avoid long command-lines
 305 - provide an interactive tool to help understanding a project's
 306   structure.  Maybe with graphviz' lefty, or as a specialized tulip
 307   gui ?
 308 - bugs
 309   - --showdropped mode draws too many edges as dropped (ie. does not
 310     consider marked edges as dropped when deciding whether to consider
 311     subsequent edges as dropped)
 312   - when showing only 3-3, colors from level 2 get propagated to level-3 groups
 313   - transitive reduction may not be complete, some more edges could
 314     possibly be dropped - wesnoth tree at 2005-03-25 exhibits the problem
 315     with the "display -> builder -> animated -> image" path
 316
 317
 318 LICENSE
 319 -------
 320
 321     Copyright (c) 2005 Yann Dirson <ydirson@altern.org>
 322
 323     This program is free software; you can redistribute it and/or modify
 324     it under the terms of the GNU General Public License, version 2,
 325     as published by the Free Software Foundation.
 326
 327     This program is distributed in the hope that it will be useful,
 328     but WITHOUT ANY WARRANTY; without even the implied warranty of
 329     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 330     GNU General Public License for more details.