README

   1
   2                 graph-includes toolkit
   3                 ======================
   4
   5 IN SHORT
   6 --------
   7
   8 Graph-includes creates a graph of dependencies between source-files
   9 and/or groups of source-files, with an emphasis on getting readable
  10 and usable graphs even for large projects.
  11
  12 Usability of the dependency graphs is currently improved by:
  13 - customizable grouping of several source files into a single node
  14 - transitive reduction of the graph
  15
  16 It currently supports graphing the C/C++ #include relationship, using
  17 graphviz.
  18
  19
  20 IMPORTANT NOTICE
  21 ----------------
  22
  23 This tool has evolved from a 50-line script written for a particular
  24 project (Battle for Wesnoth).  Although it has been generalized a lot,
  25 there are still somewhat ad-hoc heuristics hardcoded here and there,
  26 especially in the default project class (see class descriptions below).
  27
  28 Although work is under way to make this tool as generic as possible,
  29 work still has to be done at all levels.  It is still under
  30 development, and may not suit your needs (at least, not yet).
  31
  32
  33 INSTALLATION INSTRUCTIONS
  34 -------------------------
  35
  36 Install it like any standard Perl package.  Eg:
  37
  38 $ perl Makefile.PL prefix=/usr/local
  39 $ make
  40 $ su
  41 # make install
  42
  43
  44 New versions can be found at http://ydirson.free.fr/soft/graph-includes/.
  45
  46
  47 HOW TO TAKE ADVANTAGE OF THIS TOOL TO IMPROVE YOUR CODE
  48 -------------------------------------------------------
  49
  50 1. on the spirit of dependency cleanup
  51
  52 When developing a project of medium size (we'll talk mostly C/C++
  53 here, but that will apply to most languages), especially with many
  54 people writing code, it is quite easy to get to a point where each
  55 file (out of several tens or hundreds of files) depends on too many
  56 other files.
  57
  58 The most obvious relation is the #include one.  The more #includes a
  59 file has, the more time it takes to build - especially when those
  60 included files themselves #include a bunch of other files.  For a
  61 project of about 100 files, just producing a graph of all those files,
  62 with arrows representing the #include dependencies, will usually give
  63 an unreadable graph, and will show very little about possible
  64 improvements.
  65
  66 A less obvious relation appears more clearly when you consider not
  67 files by themselves, but the set of files made of an interface and the
  68 matching implementation.  Let's consider two such sets, made of the
  69 files a.h, a.c, b.h, b.c.  a.c includes b.h, and b.c includes a.h, and
  70 each implementation, following good practice, includes its own
  71 interface.  A simple dependency graph as described above would look
  72 like this:
  73
  74         a.c -> b.h
  75            \  /|
  76             \/
  77             /\
  78            /  \|
  79         b.c -> a.h
  80
  81 If OTOH we represent those sets of files instead of the files
  82 themselves, we now have something like:
  83
  84         a <--> b
  85
  86 This shows much more clearly that those two modules are intrinsically
  87 related.  In many cases, this will express that whenever you use the
  88 a.o file resulting from the build of a.c, you'll need to link b.o as
  89 well, and vice versa.  This will be the case when each file uses the
  90 headers to get function prototypes.  Hunting for abusive dependencies
  91 will then let you, for example, select at a finer grain which of
  92 those modules of code need to go into which executable, thus
  93 producing lighter executables.
  94
  95 In other cases, headers would just have been used to access a type
  96 definition from b.h, and the associated b.o would not be needed.  In
  97 such cases, you may want to consider splitting such "low-level"
  98 declarations into their own headers.  Not only would this simplify the
  99 graph, allowing you to get a better grasp on your source code, but it
 100 can also lead to faster compilations, since each file will be able
 101 to include fewer unrelated definitions.
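
The collapse from file-level edges to module-level edges discussed in
this section can be sketched as follows.  This is only an
illustration in Python, not the code used by graph-includes; the
functions module_of() and collapse() are hypothetical names, and the
grouping rule (strip the suffix) is a simplifying assumption.

```python
import os

def module_of(path):
    """Hypothetical grouping rule: drop the suffix, so a.c and a.h give 'a'."""
    return os.path.splitext(path)[0]

def collapse(file_edges):
    """Map file-level #include edges to module-level edges.

    Edges inside one module (a.c -> a.h) disappear, because both ends
    land in the same group.
    """
    edges = set()
    for src, dst in file_edges:
        a, b = module_of(src), module_of(dst)
        if a != b:
            edges.add((a, b))
    return edges

# The four #include relations from the a/b example above.
file_edges = [("a.c", "a.h"), ("a.c", "b.h"), ("b.c", "b.h"), ("b.c", "a.h")]
module_edges = collapse(file_edges)
print(sorted(module_edges))  # [('a', 'b'), ('b', 'a')]
```

The two remaining edges are the "a <--> b" cycle shown above.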
 102
 103
 104 2. possible strategies to help locating abusive dependencies
 105
 106 More to be written.
 107
 108
 109
 110 COMMAND-LINE USAGE
 111 ------------------
 112
 113 See "graph-includes --help".
 114
 115 1. output type
 116
 117 The default output is a .dot file on standard output, suitable for
 118 formatting by dot (from the graphviz toolkit), or interactive editing
 119 by dotty (also from graphviz).
 120
 121 You can ask graph-includes to do the formatting for you, eg. using
 122 "--output=<file>.<suffix>".  It will run "dot -T<suffix>", so that
 123 "--output=mydeps.ps" or "--output=mydeps.jpg" will have the expected
 124 behaviour.  If your suffix is not known to dot, dot itself will
 125 complain, so asking for --output=foo.bar will cause a message like:
 126
 127 Warning: language bar not recognized, use one of: canon cmap cmapx dia dot fig gd gd2 gif hpgl imap ismap jpeg jpg mif mp pcl pic plain plain-ext png ps ps2 svg svgz vrml vtx wbmp xdot
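
As a rough illustration of how an output file name maps to a dot
format, here is a small Python sketch.  The function dot_command() is
hypothetical, and passing the file name through dot's -o flag is an
assumption; this text only states that "dot -T<suffix>" gets run.

```python
import os

def dot_command(output):
    """Build a dot invocation from an output file name (hypothetical helper)."""
    suffix = os.path.splitext(output)[1].lstrip(".")
    if not suffix:
        raise ValueError("output file name needs a suffix")
    # Assumption: the result is written with dot's -o flag.
    return ["dot", "-T" + suffix, "-o", output]

print(dot_command("mydeps.ps"))  # ['dot', '-Tps', '-o', 'mydeps.ps']
```

No check is made on the suffix here: as described above, an unknown
one is left for dot to reject.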
 128
 129 If you intend to print the result on paper, the default layout will
 130 likely be too large.  You can use --paper=a4 to select parameters that
 131 will produce a smaller graph and split it into pages.  This flag also
 132 changes the default output format to PostScript.  Be warned that dot
 133 may not honor the page-splitting parameter for all output formats.
 134
 135 Since the transitive reduction can take time, you may like the
 136 --verbose switch, which will show a progress bar.
 137
 138
 139 2. what to draw
 140
 141 The files to be analyzed are given as non-option arguments, and are
 142 typically generated by a "find" command.  Eg:
 143
 144         $ graph-includes `find src -name '*.[ch]'`
 145
 146 How dependencies get extracted from the source files depends on the
 147 language used in those files.  You can specify it with the --language
 148 flag.  Default value is C (which should also be used for other
 149 languages based on the C preprocessor, like C++).  There is also some
 150 partial support for Perl - see comments in
 151 lib/graphincludes/extractor/perl.pm for more details.
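
To give an idea of what extracting C-style dependencies involves,
here is a minimal Python sketch.  It is not the extractor shipped with
graph-includes; INCLUDE_RE and extract_includes() are hypothetical
names, and the sketch deliberately ignores block comments and
conditional compilation.

```python
import re

# Matches '#include "x.h"' and '#include <x.h>', allowing spaces around '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*(?:"([^"]+)"|<([^>]+)>)')

def extract_includes(source):
    """Return the file names named by #include lines in C/C++ source text."""
    found = []
    for line in source.splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # Exactly one of the two groups is set, depending on the quoting.
            found.append(m.group(1) or m.group(2))
    return found

sample = '#include "a.h"\n  # include <stdio.h>\nint x; // #include "no.h"\n'
print(extract_includes(sample))  # ['a.h', 'stdio.h']
```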
 152
 153 In order to tell the #include resolver where to look for included
 154 files, you can use the cpp-like -I (aka. --Include) flag.  Eg:
 155
 156         $ graph-includes -I src `find src -name '*.[ch]'`
 157
 158 Dependencies not found in the project (ie. files appearing in #include
 159 but not given on command-line) are listed as "not found" in the
 160 graph-includes.report file for diagnostic purposes, unless they are
 161 found in a system directory.  System directories are declared in a
 162 similar fashion, with the --sysInclude option.  Eg:
 163
 164         $ graph-includes -I src -sysI /usr/include `find src -name '*.[ch]'`
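
The three possible outcomes for an #include (found in the project,
found in a system directory, not found) can be sketched in Python as
below.  This is an illustration under assumptions, not the actual
resolver: classify() is a hypothetical name, and trying the directory
of the including file first is borrowed from cpp's usual behaviour.

```python
import os

def classify(include, including_file, project_files, include_dirs, sys_dirs,
             exists=os.path.exists):
    """Classify one #include as 'project', 'system' or 'not found'."""
    # Assumption: like cpp, try the including file's directory first,
    # then the -I directories.
    search = [os.path.dirname(including_file)] + list(include_dirs)
    for d in search:
        candidate = os.path.normpath(os.path.join(d, include))
        if candidate in project_files:
            return ("project", candidate)
    # System directories are only checked for existence on disk.
    for d in sys_dirs:
        candidate = os.path.join(d, include)
        if exists(candidate):
            return ("system", candidate)
    return ("not found", None)

project = {"src/main.c", "src/util.h"}
fake_fs = {"/usr/include/stdio.h"}  # stand-in for the real filesystem
for inc in ("util.h", "stdio.h", "missing.h"):
    print(inc, classify(inc, "src/main.c", project, ["src"],
                        ["/usr/include"], exists=fake_fs.__contains__))
```

Only the last case would end up as "not found" in the report file.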
 165
 166 To reduce clutter in the graph,
 167 --prefixstrip=<prefix> can be used to avoid repeating a given prefix
 168 in all node labels.  Typically:
 169
 170         $ graph-includes --prefixstrip=src/ `find src -name '*.[ch]'`
 171
 172 Files will be grouped in a hierarchy of groups, level 0 groups
 173 typically containing just one file.  Groups are defined by the
 174 project class, which is selected with the --class=<class> option.  See
 175 below for descriptions of the project classes available by default,
 176 and for instructions to write customized project classes.
 177
 178 The range of group levels to be drawn is selected with
 179 --group=<min>-<max>, which defaults to 1-1.  Eg, class "default"
 180 defines its group levels as:
 181
 182 0: one file per group
 183 1: what/ever.* go into a "what/ever" group (usually interface + implementation)
 184 2: what/* go into a "what" group, supposing directories denote modules of some sort
 185
 186 Group levels below "min" or above "max" are not displayed as nodes.
 187 Groups of level "min" are drawn as nodes of the graph.  If "max" is
 188 strictly greater than "min", then groups of levels "min+1" through
 189 "max" are drawn as box clusters containing lower-level groups.
 190
 191 Since this way of grouping nodes will not improve readability in
 192 projects where inter-group dependencies have not been cleaned up yet,
 193 higher-level groups can instead be colored, using a class-defined color
 194 scheme.  The scheme can be modified with
 195 "--color <n>:<label>=<color>[,<label>=<color>...]" options, where <n> is
 196 the group level in which the group named <label> will receive a
 197 background of the specified color.  A color is either a named X11 color
 198 (like "blue" or "palegreen") or an RGB color in the "#RRGGBB" syntax.
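
The structure of a --color argument can be made explicit with a short
Python sketch.  parse_color_spec() is a hypothetical helper written
for this illustration, and the labels "gui" and "net" are made up; the
real option parsing may differ in its details.

```python
def parse_color_spec(spec):
    """Parse '<n>:<label>=<color>[,<label>=<color>...]'.

    Returns (level, {label: color}).
    """
    level_text, _, pairs_text = spec.partition(":")
    if not pairs_text:
        raise ValueError("expected <n>:<label>=<color>")
    colors = {}
    for pair in pairs_text.split(","):
        label, sep, color = pair.partition("=")
        if not sep or not label or not color:
            raise ValueError("bad <label>=<color> pair: %r" % pair)
        colors[label] = color
    return int(level_text), colors

print(parse_color_spec("2:gui=palegreen,net=#FFD0D0"))
```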
 199
 200
 201 For those wanting to see what edges the transitive reduction dropped,
 202 the --showdropped option will add them to the graph in a different
 203 color.  Be prepared for your computer room to get a noticeable
 204 temperature increase for anything other than a small set of files
 205 with only a few dependencies.
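
For readers unfamiliar with transitive reduction, the idea of "kept"
and "dropped" edges can be sketched in Python as below.  This is a
naive textbook-style approach, not the reducer implemented in
graph-includes: an edge is dropped when its target can still be
reached through some other path.  The node names are made up.

```python
def reachable(graph, start, goal, skip_edge):
    """Depth-first search from start to goal that ignores one edge."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if (node, nxt) == skip_edge:
                continue
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def transitive_reduction(edges):
    """Return (kept, dropped) edge lists; reachability is preserved."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
    kept, dropped = [], []
    for src, dst in sorted(set(edges)):
        if reachable(graph, src, dst, (src, dst)):
            graph[src].discard(dst)  # redundant: another path exists
            dropped.append((src, dst))
        else:
            kept.append((src, dst))
    return kept, dropped

edges = [("main", "gui"), ("gui", "util"), ("main", "util")]
kept, dropped = transitive_reduction(edges)
print("kept:", kept)
print("dropped:", dropped)  # main -> util is implied by main -> gui -> util
```

Each edge triggers its own graph search, which hints at why the
reduction gets slow on large inputs.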
 206
 207 OTOH, --focus=<node-label> will do the same, but only for the
 208 dependencies of a specified node.  That should prevent the nasty
 209 effects described above, and will be useful for various purposes,
 210 including debugging the transitive reducer.  The node-label refers to
 211 a node in the lowest group-level drawn, ie. the "min" argument to
 212 --group.
 213
 214 People still getting cold may also like to circumvent the
 215 transitive-reduction engine completely, using --alldeps.  The author
 216 assumes no responsibility for losses of mental health induced by
 217 trying to make any serious use of the resulting graph.
 218
 219
 220 EXISTING PROJECT CLASSES
 221 ------------------------
 222
 223 1. class "default"
 224
 225 As implied by its name, it is the one which will be used unless you
 226 use the --class option.  Although it is the default one, it may still
 227 be quite rough at the moment, still using some ad-hoc heuristics, and
 228 will be improved in the near future.  Here are its main
 229 characteristics:
 230
 231  - looks at C-style #include lines
 232  - creates level-1 groups for all files sharing the same path and
 233    (disregarding the suffix) filename.  Eg, files "foo/bar.c" and
 234    "foo/bar.h" would be grouped in a "foo/bar" level-1 group.
 235    In other words, headers will not be grouped with their
 236    implementation files if they all live in a separate include/ directory.
 237  - creates by-directory level-2 groups.  Eg. in the above example, a
 238    group "foo" would exist at level-2.
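
The grouping rules listed above can be approximated by the following
Python sketch.  group_labels() is a hypothetical function, not part of
the default class; in particular, what happens to files without a
directory, or with several levels of directories, is an assumption
made by this sketch.

```python
import os

def group_labels(path):
    """Return the level-0, level-1 and level-2 group labels for a file."""
    level0 = path                       # one file per group
    level1 = os.path.splitext(path)[0]  # same path and name, suffix dropped
    level2 = os.path.dirname(path)      # assumption: the whole directory part
    return [level0, level1, level2]

print(group_labels("foo/bar.c"))  # ['foo/bar.c', 'foo/bar', 'foo']
print(group_labels("foo/bar.h"))  # ['foo/bar.h', 'foo/bar', 'foo']
```

Both files share the "foo/bar" label at level 1 and the "foo" label
at level 2, which is what puts them in the same groups.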
 239
 240
 241 2. class "uniqueincludes"
 242
 243 Built on top of the default class, it is meant for projects where file
 244 names are kept unique across all directories.  If the ad-hoc #include
 245 processing of the default class does not suit your project, it is the
 246 only out-of-the-box alternative available today.  Here are its main
 247 characteristics:
 248
 249  - provides a single grouping level based on filenames, disregarding
 250    all the directory hierarchy.
 251
 252 Note that it is not meant for general use, as:
 253
 254  - it will group any files with the same name in the same level-0
 255    group, possibly causing confusion.
 256  - it does not make any directory name appear in the node names
 257
 258
 259 DEFINING YOUR OWN PROJECT CLASS
 260 -------------------------------
 261
 262 See graphincludes::project::wesnoth in the examples/ dir as an example.
 263
 264 Keep in mind that the API is not frozen yet, and will probably be
 265 overhauled more than once before an official API gets blessed.
 266
 267
 268 CAVEATS
 269 -------
 270
 271 - this script only handles explicitly declared dependencies; it
 272   won't notice if eg. a prototype cut'n'paste was used instead of
 273   using the correct #include, but you shouldn't do that anyway :)
 274
 275
 276 TODO
 277 ----
 278
 279 - misc improvements
 280  - automate --help production (see Pod::Usage ?)
 281  - make default project-class consider multiple levels of directories
 282    as group levels, but only if they (consistently ?) have multiple
 283    subgroups ?
 284  - write more documentation
 285  - continue merging the verbose/debug behaviour into the global report file.
 286  - write a linux-kernel class as example :)
 287  - use an existing source of paper formats (libpaper, LC_PAPER, whatever)
 288  - find out how to use this damn Exporter mechanism for
 289    graphincludes::params, or find another way of getting rid of those
 290    "used only once" warnings.
 291  - maybe use graphviz' tred(1) to check our transitive reductions.
 292  - write a testsuite.
 293 - modularization (finish the restructuring into a cleaner and more modular design)
 294   + allow coloring other things than just level 2
 295   - write a perl extractor
 296   - graph output syntax (allow to generate tulip graphs)
 297   - provide a simple hash-based filelabel implementation
 298   + find the accessory classes as easily as possible (like nagios-plugins ?)
 299   - separate styling from project classes
 300   - allow to define several views in a project-class, several of which
 301     can be generated by default.
 302   - find out whether we can declare protocols/pure-virtual-classes in
 303     some way, to cleanup the class graph
 304   - generalize --prefixstrip
 305   - give consistent access to all commonly-needed features through
 306     command-line and class customization
 307   - ensure in the testsuite that all provided non-abstract classes are
 308     self-contained
 309 - write an openc++-based dependency extractor
 310  - extract more fine-grained dependency (depending on a header does
 311    not necessarily imply depending on code)
 312  - handle (warn about) the case where the declarations for a given
 313    implementation file are scattered in more than one header
 314  - detect undeclared dependencies (eg. manually inserted prototypes)
 315  - check necessity of declared includes
 316 - presentation
 317   - generalize the special_edge() mechanism (use a hash of edge attributes ?)
 318   - allow different node shapes when mixing high-level nodes with
 319     lower-level ones through the default singleton groups
 320     (special_node mechanism similar to the special_edge one ?)
 321   + optionally show labels or count for files (subnodes) in a node and
 322     color arcs according to them
 323   - optionally show external deps (deps on files not on command-line)
 324   - limit graph to one or more given group(s) of files (specified by <level>:<label>)
 325   - draw cycles in a given color
 326   - draw a specific path
 327   - allow setting fg color for a specific group level
 328   - provide automatic coloring schemes
 329   - color intra-group edges with the same color as nodes (post-processing ?)
 330   - allow to request drawing of who in a high-level node points to
 331     another node (ie. violates some constraint)
 332   + label edges with the number of explicit inclusions flowing through them
 333   - propagate excuses in some way when they are dropped by the transitive reducer
 334   - provide tools for automatic grouping (eg. using cycles, or selected external deps)
 335   - investigate candidate tools for hyperbolic layout ?
 336 - CLI improvements
 337   + recursive directory search to avoid long command-lines
 338   - provide an initial list of system directories to avoid repeating them (ask compiler)
 339 - provide an interactive tool to help understanding a project's
 340   structure.  Maybe with graphviz' lefty, or as a specialized tulip
 341   gui ?
 342 - bugs
 343   - --showdropped mode draws too many edges as dropped (ie. does not
 344     consider marked edges as dropped when deciding whether to consider
 345     subsequent edges as dropped)
 346   - when showing only 3-3, colors from level 2 get propagated to level-3 groups
 347   - transitive reduction may not be complete, some more edges could
 348     possibly be dropped - wesnoth tree at 2005-03-25 exhibits the problem
 349     with the "display -> builder -> animated -> image" path
 350
 351
 352 LICENSE
 353 -------
 354
 355     Copyright (c) 2005 Yann Dirson <ydirson@altern.org>
 356
 357     This program is free software; you can redistribute it and/or modify
 358     it under the terms of the GNU General Public License, version 2,
 359     as published by the Free Software Foundation.
 360
 361     This program is distributed in the hope that it will be useful,
 362     but WITHOUT ANY WARRANTY; without even the implied warranty of
 363     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 364     GNU General Public License for more details.