=head1 Introduction

B<Cons> is a system for constructing, primarily, software, but is quite
different from previous software construction systems. Cons was designed
from the ground up to deal easily with the construction of software spread
over multiple source directories. Cons makes it easy to create build scripts
that are simple, understandable and maintainable. Cons ensures that complex
software is easily and accurately reproducible.

Cons uses a number of techniques to accomplish all of this. Construction
scripts are just Perl scripts, making them both easy to comprehend and very
flexible. Global scoping of variables is replaced with an import/export
mechanism for sharing information between scripts, significantly improving
the readability and maintainability of each script. B<Construction
environments> are introduced: these are Perl objects that capture the
information required for controlling the build process. Multiple
environments are used when different semantics are required for generating
products in the build tree. Cons implements automatic dependency analysis
and uses this to globally sequence the entire build. Variant builds are
easily produced from a single source tree. Intelligent build subsetting is
possible when working on localized changes. Overrides can be set up to
easily override build instructions without modifying any scripts. MD5
cryptographic B<signatures> are associated with derived files, and are used
to accurately determine whether a given file needs to be rebuilt.

While offering all of the above, and more, Cons remains simple and easy to
use. This will, hopefully, become clear as you read the remainder of this
document.

=head2 Automatic global build sequencing

Because Cons does full and accurate dependency analysis, and does this
globally, for the entire build, Cons is able to use this information to take
full control of the B<sequencing> of the build. This sequencing is evident
in the above examples, and is equivalent to what you would expect for
C<make>, given a full set of dependencies. With Cons, this extends trivially
to larger, multi-directory builds. As a result, all of the complexity
involved in making sure that a build is organized correctly--including
multi-pass hierarchical builds--is eliminated. We'll discuss this further in
the next sections.

=head1 A Model for sharing files

=head2 Some simple conventions

In any complex software system, a method for sharing build products needs to
be established. We propose a simple set of conventions which are trivial to
implement with Cons, but very effective.

The basic rule is to require that all build products which need to be shared
between directories are shared via an intermediate directory. We have
typically called this F<export>, and, in a C environment, provided
conventional sub-directories of this directory, such as F<include>, F<lib>,
F<bin>, etc.

These directories are defined by the top-level F<Construct> file. A simple
F<Construct> file for a B<Hello, World!> application, organized using
multiple directories, might look like this:

  # Construct file for Hello, World!

  # Where to put all our shared products.
  $EXPORT = '#export';

  Export qw( CONS INCLUDE LIB BIN );

  # Standard directories for sharing products.
  $INCLUDE = "$EXPORT/include";
  $LIB = "$EXPORT/lib";
  $BIN = "$EXPORT/bin";

  # A standard construction environment.
  $CONS = new cons (
      CPPPATH => $INCLUDE,   # Include path for C compilations
      LIBPATH => $LIB,       # Library path for linking programs
      LIBS => '-lworld',     # List of standard libraries
  );

  Build qw(
      hello/Conscript
      world/Conscript
  );

The F<world> directory's F<Conscript> file looks like this:

  # Conscript file for directory world
  Import qw( CONS INCLUDE LIB );

  # Install the products of this directory
  Install $CONS $LIB, 'libworld.a';
  Install $CONS $INCLUDE, 'world.h';

  # Internal products
  Library $CONS 'libworld.a', 'world.c';

and the F<hello> directory's F<Conscript> file looks like this:

  # Conscript file for directory hello
  Import qw( CONS BIN );

  # Exported products
  Install $CONS $BIN, 'hello';

  # Internal products
  Program $CONS 'hello', 'hello.c';

To construct a B<Hello, World!> program with this directory structure, go to
the top-level directory, and invoke C<cons> with the appropriate
arguments. In the following example, we tell Cons to build the directory
F<export>. To build a directory, Cons recursively builds all known products
within that directory (only if they need rebuilding, of course). If any of
those products depend upon other products in other directories, then those
will be built, too.

  % cons export
  Install world/world.h as export/include/world.h
  cc -Iexport/include -c hello/hello.c -o hello/hello.o
  cc -Iexport/include -c world/world.c -o world/world.o
  ar r world/libworld.a world/world.o
  ar: creating world/libworld.a
  ranlib world/libworld.a
  Install world/libworld.a as export/lib/libworld.a
  cc -o hello/hello hello/hello.o -Lexport/lib -lworld
  Install hello/hello as export/bin/hello

=head2 Clean, understandable, location-independent scripts

You'll note that the two F<Conscript> files are very clean and
to-the-point. They simply specify products of the directory and how to build
those products. The build instructions are minimal: they specify which
construction environment to use, the name of the product, and the name of
the inputs. Note also that the scripts are location-independent. If you wish
to reorganize your source tree, you are free to do so; you only have to
change the F<Construct> file (in this example) to specify the new locations
of the F<Conscript> files. The use of an export tree makes this goal easy.

Note, too, how Cons takes care of little details for you. All the F<export>
directories, for example, were made automatically. And the installed files
were really hard-linked into the respective export directories, to save
space and time. This attention to detail saves considerable work, and makes
it even easier to produce simple, maintainable scripts.

=head1 Signatures

Cons uses file B<signatures> to decide if a derived file is out-of-date
and needs rebuilding. In essence, if the contents of a file change,
or the manner in which the file is built changes, the file's signature
changes as well. This allows Cons to decide with certainty when a file
needs rebuilding, because Cons can detect, quickly and reliably, whether
any of its dependency files have been changed.

=head2 MD5 content and build signatures

Cons uses the B<MD5> (B<Message Digest 5>) algorithm to compute file
signatures. The MD5 algorithm computes a strong cryptographic checksum
for any given input string. Cons can, based on configuration, use two
different MD5 signatures for a given file:

The B<content signature> of a file is an MD5 checksum of the file's
contents. Consequently, when the contents of a file change, its content
signature changes as well.

The B<build signature> of a file is a combined MD5 checksum of:

=over 4

=item *

the signatures of all the input files used to build the file

=item *

the signatures of all dependency files discovered by source scanners
(for example, C<.h> files)

=item *

the signatures of all dependency files specified explicitly via the
C<Depends> method

=item *

the command-line string used to build the file

=back

The build signature is, in effect, a digest of all the dependency
information for the specified file. Consequently, a file's build
signature changes whenever any part of its dependency information
changes: a new file is added, the contents of a file on which it depends
change, there's a change to the command line used to build the file (or
any of its dependency files), etc.

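As a rough illustration of the two kinds of signature, the following standalone Perl sketch computes a content signature with the standard C<Digest::MD5> module and forms a build signature by hashing dependency signatures together with the command line. The helper names C<content_signature> and C<build_signature>, and the choice of joining the values with newlines, are assumptions for illustration only; they are not Cons's internal functions or its exact combination scheme.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Content signature: the MD5 digest of a file's raw bytes.
sub content_signature {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    my $ctx = Digest::MD5->new;
    $ctx->addfile($fh);
    close $fh;
    return $ctx->hexdigest;
}

# Hypothetical build signature: one MD5 digest over the signatures of
# all inputs and dependencies, followed by the command-line string.
# Joining with "\n" is an assumption, not Cons's actual encoding.
sub build_signature {
    my ($command, @dependency_sigs) = @_;
    return md5_hex(join("\n", @dependency_sigs, $command));
}
```

Changing either the command line or any dependency signature changes the resulting digest, which is the property the build signature depends on.
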
For example, in the previous section, the build signature of the
F<world.o> file will include:

=over 4

=item *

the signature of the F<world.c> file

=item *

the signatures of any header files that Cons detects are included,
directly or indirectly, by F<world.c>

=item *

the text of the actual command line that was used to generate F<world.o>

=back

Similarly, the build signature of the F<libworld.a> file will include
all the signatures of its constituents (and hence, transitively, the
signatures of B<their> constituents), as well as the command line that
created the file.

Note that there is no need for a derived file to depend upon any
particular F<Construct> or F<Conscript> file. If changes to these files
affect a file, then this will be automatically reflected in its build
signature, since relevant parts of the command line are included in the
signature. Unrelated F<Construct> or F<Conscript> changes will have no
effect.

=head2 Storing signatures in .consign files

Before Cons exits, it stores the calculated signatures for all of the
files it built or examined in F<.consign> files, one per directory.
Cons uses this stored information on later invocations to decide if
derived files need to be rebuilt.

After the previous example was compiled, the F<.consign> file in the
F<build/peach/world> directory looked like this:

  world.h:985533370 - d181712f2fdc07c1f05d97b16bfad904
  world.o:985533372 2a0f71e0766927c0532977b0d2158981
  world.c:985533370 - c712f77189307907f4189b5a7ab62ff3
  libworld.a:985533374 69e568fc5241d7d25be86d581e1fb6aa

After the file name and colon, the first number is a timestamp of the
file's modification time (on UNIX systems, this is typically the number
of seconds since January 1st, 1970). The second value is the build
signature of the file (or C<-> in the case of files with no build
signature--that is, source files). The third value, if any, is the
content signature of the file.

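A sketch of reading one of these entries, assuming exactly the layout described above: name, colon, timestamp, then a build signature (or C<->) and an optional content signature. The C<parse_consign_line> helper is hypothetical; Cons's own F<.consign> reader may handle details such as unusual file names differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split one .consign entry into its fields.
# Returns a hash reference, or nothing if the line is not recognized.
sub parse_consign_line {
    my ($line) = @_;
    chomp $line;
    my ($name, $ts, $build, $content) =
        $line =~ /^(.*):(\d+)\s+(\S+)(?:\s+(\S+))?\s*$/
        or return;
    return {
        name      => $name,
        timestamp => $ts,
        build     => ($build eq '-' ? undef : $build),  # '-' = source file
        content   => $content,                           # may be undef
    };
}
```
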
=head2 Using build signatures to decide when to rebuild files

When Cons is deciding whether to build or rebuild a derived file, it
first computes the file's current build signature. If the file doesn't
exist, it must obviously be built.

If, however, the file already exists, Cons next compares the
modification timestamp of the file against the timestamp value in
the F<.consign> file. If the timestamps match, Cons compares the
newly-computed build signature against the build signature in the
F<.consign> file. If the timestamps do not match or the build
signatures do not match, the derived file is rebuilt.

After the file is built or rebuilt, Cons arranges to store the
newly-computed build signature in the F<.consign> file when it exits.

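The decision procedure above can be sketched in Perl as follows. C<needs_rebuild> is a hypothetical helper, the stored entry is assumed to be a hash with C<timestamp> and C<build> keys, and treating a file with no F<.consign> entry as needing a rebuild is an assumption not stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a derived file must be (re)built.
# $current_sig is the freshly computed build signature; $stored is the
# file's .consign entry (hash ref), or undef if there is none.
sub needs_rebuild {
    my ($path, $current_sig, $stored) = @_;
    return 1 unless -e $path;     # file missing: must be built
    return 1 unless $stored;      # no recorded entry (assumed: rebuild)
    my $mtime = (stat $path)[9];
    return 1 if $mtime != $stored->{timestamp};    # timestamps differ
    return 1 if $current_sig ne $stored->{build};  # signatures differ
    return 0;                                      # up-to-date
}
```
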
=head2 Signature example

The use of these signatures is an extremely simple, efficient, and
effective method of improving--dramatically--the reproducibility of a
system.

We'll demonstrate this with a simple example:

  # Simple "Hello, World!" Construct file
  $CFLAGS = '-g' if $ARG{DEBUG} eq 'on';
  $CONS = new cons(CFLAGS => $CFLAGS);
  Program $CONS 'hello', 'hello.c';

Notice how Cons recompiles at the appropriate times:

  % cons hello
  cc -c hello.c -o hello.o
  cc -o hello hello.o
  % cons hello
  cons: "hello" is up-to-date.
  % cons DEBUG=on hello
  cc -g -c hello.c -o hello.o
  cc -o hello hello.o
  % cons DEBUG=on hello
  cons: "hello" is up-to-date.
  % cons hello
  cc -c hello.c -o hello.o
  cc -o hello hello.o

=head2 Source-file signature configuration

Cons provides a C<SourceSignature> method that allows you to configure
how the signature should be calculated for any source file when its
signature is being used to decide if a dependent file is up-to-date.
The arguments to the C<SourceSignature> method consist of one or more
pairs of strings:

  SourceSignature 'auto/*.c' => 'content',
                  '*' => 'stored-content';

The first string in each pair is a pattern to match against source file
path names. The pattern is a file-globbing pattern, not a Perl regular
expression; the pattern C<*.l> will match all Lex source files. The C<*>
wildcard will match across directory separators; the pattern C<foo/*.c>
would match all C source files in any subdirectory underneath the C<foo>
subdirectory.

The second string in each pair contains one of the following keywords to
specify how signatures should be calculated for source files that match
the pattern. The available keywords are:

=over 4

=item content

Use the content signature of the source file when calculating signatures
of files that depend on it. This guarantees correct calculation of the
file's signature for all builds, by telling Cons to read the contents of
a source file to calculate its content signature each time it is run.

=item stored-content

Use the source file's content signature as stored in the F<.consign>
file, provided the file's timestamp matches the cached timestamp value
in the F<.consign> file. This optimizes performance, with the slight
risk of an incorrect build if a source file's contents have been changed
so quickly after its previous update that the timestamp still matches
the stored timestamp in the F<.consign> file even though the contents
have changed.

=back

The Cons default behavior of always calculating a source file's
signature from the file's contents is equivalent to specifying:

  SourceSignature '*' => 'content';

The C<*> will match all source files. The C<content> keyword
specifies that Cons will read the contents of a source file to calculate
its signature each time it is run.

A useful global performance optimization is:

  SourceSignature '*' => 'stored-content';

This specifies that Cons will use pre-computed content signatures
from F<.consign> files, when available, rather than re-calculating a
signature from the source file's contents each time Cons is run. In
practice, this is safe for most build situations, and only a problem
when source files are changed automatically (by scripts, for example).
The Cons default, however, errs on the side of guaranteeing a correct
build in all situations.

Cons tries to match source file path names against the patterns in the
order they are specified in the C<SourceSignature> arguments:

  SourceSignature '/usr/repository/objects/*' => 'stored-content',
                  '/usr/repository/*' => 'content',
                  '*.y' => 'content',
                  '*' => 'stored-content';

In this example, all source files under the F</usr/repository/objects>
directory will use F<.consign> file content signatures, source files
anywhere else underneath F</usr/repository> will not use F<.consign>
signature values, all Yacc source files (C<*.y>) anywhere else will not
use F<.consign> signature values, and any other source file will use
F<.consign> signature values.

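To make the matching rules concrete, here is a sketch of a first-match lookup over pattern/keyword pairs. It assumes that a pattern must match the whole path name and that C<*> is the only special character (matching any run of characters, including directory separators); C<glob_to_regex> and C<signature_type> are hypothetical helpers, not part of Cons.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a glob such as 'foo/*.c' into an anchored
# regex. Only '*' is special here; it matches anything, including '/'.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '.*', map { quotemeta } split /\*/, $glob, -1;
    return qr/\A$re\z/;
}

# Hypothetical helper: return the keyword of the FIRST pattern that
# matches $path, mirroring the in-order matching described above.
sub signature_type {
    my ($path, @pairs) = @_;
    while (my ($pattern, $keyword) = splice @pairs, 0, 2) {
        return $keyword if $path =~ glob_to_regex($pattern);
    }
    return;    # no pattern matched
}
```

With the example pairs above, a path such as F</usr/repository/src/a.c> yields C<content> in this sketch, because the second pattern is reached before the catch-all C<*>. The same ordered lookup would apply to the pairs in a C<SIGNATURE> array reference.
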
=head2 Derived-file signature configuration

Cons provides a C<SIGNATURE> construction variable that allows you to
configure how signatures are calculated for any derived file when its
signature is being used to decide if a dependent file is up-to-date.
The value of the C<SIGNATURE> construction variable is a Perl array
reference that holds one or more pairs of strings, like the arguments to
the C<SourceSignature> method.

The first string in each pair is a pattern to match against derived file
path names. The pattern is a file-globbing pattern, not a Perl regular
expression; the pattern C<*.obj> will match all (Win32) object files.
The C<*> wildcard will match across directory separators; the pattern
C<foo/*.a> would match all (UNIX) library archives in any subdirectory
underneath the F<foo> subdirectory.

The second string in each pair contains one of the following keywords
to specify how signatures should be calculated for derived files that
match the pattern. The available keywords are the same as for the
C<SourceSignature> method, with an additional keyword:

=over 4

=item build

Use the build signature of the derived file when calculating signatures
of files that depend on it. This guarantees correct builds by forcing
Cons to rebuild any and all files that depend on the derived file.

=item content

Use the content signature of the derived file when calculating signatures
of files that depend on it. This guarantees correct calculation of the
file's signature for all builds, by telling Cons to read the contents of
a derived file to calculate its content signature each time it is run.

=item stored-content

Use the derived file's content signature as stored in the F<.consign>
file, provided the file's timestamp matches the cached timestamp value
in the F<.consign> file. This optimizes performance, with the slight
risk of an incorrect build if a derived file's contents have been
changed so quickly after a Cons build that the file's timestamp still
matches the stored timestamp in the F<.consign> file.

=back

The Cons default behavior (as previously described) for using
derived-file signatures is equivalent to:

  $env = new cons(SIGNATURE => ['*' => 'build']);

The C<*> will match all derived files. The C<build> keyword specifies
that all derived files' build signatures will be used when calculating
whether a dependent file is up-to-date.

A useful alternative default C<SIGNATURE> configuration for many sites:

  $env = new cons(SIGNATURE => ['*' => 'content']);

In this configuration, derived files have their signatures calculated
from the file contents. This adds slightly to Cons' workload, but has
the useful effect of "stopping" further rebuilds if a derived file is
rebuilt to exactly the same file contents as before, a benefit that
usually outweighs the additional computation Cons must perform.

For example, changing a comment in a C file and recompiling should
generate the exact same object file (assuming the compiler doesn't
insert a timestamp in the object file's header). In that case,
specifying C<content> or C<stored-content> for the signature calculation
will cause Cons to recognize that the object file did not actually
change as a result of being rebuilt, and libraries or programs that
include the object file will not be rebuilt. When C<build> is
specified, however, Cons will only "know" that the object file was
rebuilt, and proceed to rebuild any additional files that include the
object file.

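The following sketch illustrates that difference with simulated data: the two source versions differ only in a comment, and the "object code" strings stand in for compiler output that is assumed to be identical for both compilations. The build signature here is a hypothetical digest of the source's signature and the command line, not Cons's exact formula.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Two versions of a C source file that differ only in a comment, plus
# stand-in "object code" assumed identical for both compilations.
my $old_source = "/* version 1 */ int f(void) { return 1; }\n";
my $new_source = "/* version 2 */ int f(void) { return 1; }\n";
my $old_object = "OBJ f: return 1";
my $new_object = "OBJ f: return 1";
my $cmd        = 'cc -c %< -o %>';

# Hypothetical build signature: digest of the source's content
# signature followed by the command line.
sub build_sig {
    my ($source, $command) = @_;
    return md5_hex(md5_hex($source) . "\n" . $command);
}

my $build_changed   = build_sig($old_source, $cmd) ne build_sig($new_source, $cmd);
my $content_changed = md5_hex($old_object) ne md5_hex($new_object);

print 'build signature changed:   ', ($build_changed   ? 'yes' : 'no'), "\n";
print 'content signature changed: ', ($content_changed ? 'yes' : 'no'), "\n";
# prints:
#   build signature changed:   yes
#   content signature changed: no
```

Under C<build>, files that include the object would see a changed signature and be rebuilt; under C<content>, the unchanged digest lets Cons stop there.
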
Note that Cons tries to match derived file path names against the
patterns in the order they are specified in the C<SIGNATURE> array
reference:

  $env = new cons(SIGNATURE => ['foo/*.o' => 'build',
                                '*.o' => 'content',
                                '*.a' => 'stored-content',
                                '*' => 'content']);

In this example, all object files underneath the F<foo> subdirectory
will use build signatures, all other object files (including object
files underneath other subdirectories!) will use content signatures
calculated from their contents, libraries will use F<.consign> file
content signatures, and all other derived files will use content
signatures.

=head2 Debugging signature calculation

Cons provides a C<-S> option that can be used to specify what internal
Perl package Cons should use to calculate signatures. The default Cons
behavior is equivalent to specifying C<-S md5> on the command line.

The only other package (currently) available is a C<md5::debug>
package that prints out detailed information about the MD5 signature
calculations performed by Cons:

  % cons -S md5::debug hello
  sig::md5::srcsig(hello.c)
          => |52d891204c62fe93ecb95281e1571938|
  sig::md5::collect(52d891204c62fe93ecb95281e1571938)
          => |fb0660af4002c40461a2f01fbb5ffd03|
  sig::md5::collect(52d891204c62fe93ecb95281e1571938,
                    fb0660af4002c40461a2f01fbb5ffd03,
                    cc -c %< -o %>)
          => |f7128da6c3fe3c377dc22ade70647b39|
  sig::md5::current(||
                    eq |f7128da6c3fe3c377dc22ade70647b39|)
  cc -c hello.c -o hello.o
  sig::md5::collect()
          => |d41d8cd98f00b204e9800998ecf8427e|
  sig::md5::collect(f7128da6c3fe3c377dc22ade70647b39,
                    d41d8cd98f00b204e9800998ecf8427e,
                    cc -o %> %< )
          => |a0bdce7fd09e0350e7efbbdb043a00b0|
  sig::md5::current(||
                    eq |a0bdce7fd09e0350e7efbbdb043a00b0|)
  cc -o hello hello.o

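When reading this output, it helps to recognize C<d41d8cd98f00b204e9800998ecf8427e>: it is the MD5 digest of the empty string, which is the value C<sig::md5::collect()> reports above when it is called with nothing to combine. The standard C<Digest::MD5> module confirms the value:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# MD5 of an empty input: the value shown for the empty collect() call.
my $empty = md5_hex('');
print "$empty\n";   # prints d41d8cd98f00b204e9800998ecf8427e
```
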
=head1 Temporary overrides

Cons provides a very simple mechanism for overriding aspects of a build. The
essence is that you write an override file containing one or more
C<Override> commands, and you specify this on the command line, when you run
C<cons>:

  % cons -o over export

will build the F<export> directory, with all derived files subject to the
overrides present in the F<over> file. If you leave out the C<-o> option,
then everything necessary to remove all overrides will be rebuilt.

=head2 Overriding environment variables

The override file can contain two types of overrides. The first is incoming
environment variables. These are normally accessible by the F<Construct>
file from the C<%ENV> hash variable. These can trivially be overridden in
the override file by setting the appropriate elements of C<%ENV> (these
could also be overridden in the user's environment, of course).

=head2 The Override command

The second type of override is accomplished with the C<Override> command,
which looks like this:

  Override <regexp>, <var1> => <value1>, <var2> => <value2>, ...;

The regular expression I<regexp> is matched against every derived file that
is a candidate for the build. If the derived file matches, then the
variable/value pairs are used to override the values in the construction
environment associated with the derived file.

Let's suppose that we have a construction environment like this:

  $CONS = new cons(
      COPT => '',
      CDBG => '-g',
      CFLAGS => '%COPT %CDBG',
  );

Then if we have an override file F<over> containing this command:

  Override '\.o$', COPT => '-O', CDBG => '';

then any C<cons> invocation with C<-o over> that creates F<.o> files via
this environment will cause them to be compiled with C<-O> and no C<-g>. The
override could, of course, be restricted to a single directory by the
appropriate selection of a regular expression.

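The matching step can be sketched as follows, modelling a construction environment as a plain Perl hash. C<apply_override> is a hypothetical helper for illustration; the real C<Override> command works inside Cons and also drives which derived files get rebuilt, which this sketch does not model.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of an environment hash with the
# override pairs applied, but only if the derived file name matches
# the override regular expression. The original hash is not modified.
sub apply_override {
    my ($file, $env, $regexp, %override) = @_;
    my %result = %$env;
    @result{keys %override} = values %override if $file =~ /$regexp/;
    return \%result;
}
```
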
Here's the original version of the Hello, World! program, built with this
environment. Note that Cons rebuilds the appropriate pieces when the
override is applied or removed:

  % cons hello
  cc -g -c hello.c -o hello.o
  cc -o hello hello.o
  % cons -o over hello
  cc -O -c hello.c -o hello.o
  cc -o hello hello.o
  % cons -o over hello
  cons: "hello" is up-to-date.
  % cons hello
  cc -g -c hello.c -o hello.o
  cc -o hello hello.o

It's important that the C<Override> command only be used for temporary,
on-the-fly overrides necessary for development, because the overrides are
not platform-independent and because they rely too much on intimate
knowledge of the workings of the scripts. For temporary use, however, they
are exactly what you want.

Note that it is still useful to provide, say, the ability to create a fully
optimized version of a system for production use--from the F<Construct> and
F<Conscript> files. This way you can tailor the optimized system to the
platform. Where optimizer trade-offs need to be made (particular files may
not be compiled with full optimization, for example), these can be
recorded for posterity (and reproducibility) directly in the scripts.

=head2 The C<Module> method

The C<Module> method is a combination of the C<Program> and C<Command>
methods. Rather than generating an executable program directly, this command
allows you to specify your own command to actually generate a module. The
method is invoked as follows:

  Module $env <module name>, <source or object files>, <construction command>;

This command is useful in instances where you wish to create, for example,
dynamically loaded modules, or statically linked code libraries.

=head2 The C<RuleSet> method

The C<RuleSet> method returns the construction variables for building
various components with one of the rule sets supported by Cons. The
currently supported rule sets are:

=over 4

=item msvc

Rules for the Microsoft Visual C++ compiler suite.

=item unix

Generic rules for most UNIX-like compiler suites.

=back

On systems with more than one available compiler suite, this allows you
to easily create side-by-side environments for building software with
multiple tools:

  $msvcenv = new cons(RuleSet("msvc"));
  $cygnusenv = new cons(RuleSet("unix"));

In the future, this could also be extended to other platforms that
have different default rule sets.

=head2 The C<DefaultRules> method

The C<DefaultRules> method sets the default construction variables that
will be returned by the C<new> method to the specified arguments:

  DefaultRules(CC     => 'gcc',
               CFLAGS => '',
               CCCOM  => '%CC %CFLAGS %_IFLAGS -c %< -o %>');
  $env = new cons();
  # $env now contains *only* the CC, CFLAGS,
  # and CCCOM construction variables

Combined with the C<RuleSet> method, this also provides an easy way
to explicitly set the default build environment to use some supported
toolset other than the Cons defaults:

  # use a UNIX-like tool suite (like cygwin) on Win32
  DefaultRules(RuleSet('unix'));
  $env = new cons();

Note that the C<DefaultRules> method completely replaces the default
construction environment with the specified arguments; it does not
simply override the existing defaults. To override one or more
variables in a supported C<RuleSet>, append the variables and values:

  DefaultRules(RuleSet('unix'), CFLAGS => '-O3');
  $env1 = new cons();
  $env2 = new cons();
  # both $env1 and $env2 have 'unix' defaults
  # with CFLAGS set to '-O3'

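Appending pairs plausibly works through ordinary Perl list-to-hash assignment, in which a later duplicate key replaces an earlier one. The sketch below shows that language behavior with a hypothetical C<unix_rules> function standing in for C<RuleSet('unix')>; its variable values are illustrative, and whether C<DefaultRules> builds its hash exactly this way is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for RuleSet('unix'): a flat list of
# construction-variable pairs (values are illustrative only).
sub unix_rules {
    return (CC     => 'cc',
            CFLAGS => '',
            CCCOM  => '%CC %CFLAGS %_IFLAGS -c %< -o %>');
}

# In a list-to-hash assignment the last duplicate key wins, so the
# trailing CFLAGS pair replaces the one from the rule set.
my %defaults = (unix_rules(), CFLAGS => '-O3');
print "$defaults{CC} $defaults{CFLAGS}\n";   # prints "cc -O3"
```

Note that the resulting hash contains only the keys supplied, which matches the "completely replaces" behavior described above.
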
=head2 The C<SourcePath> method

The C<SourcePath> method returns the real source path name of a file,
as opposed to the path name within a build directory. It is invoked
as follows:

  $path = SourcePath <buildpath>;

=head2 The C<ConsPath> method

The C<ConsPath> method returns true if the supplied path is a derivable
file, and returns undef (false) otherwise. It is invoked as follows:

  $result = ConsPath <path>;

=head2 The C<SplitPath> method

The C<SplitPath> method looks up multiple path names in a string separated
by the default path separator for the operating system (C<:> on UNIX
systems, C<;> on Windows NT), and returns the fully-qualified names.
It is invoked as follows:

  @paths = SplitPath <pathlist>;

The C<SplitPath> method will convert names prefixed C<#> to the
appropriate top-level build name (without the C<#>) and will convert
relative names to top-level names.

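A sketch of the conversions described above, under stated assumptions: relative names are resolved against a caller-supplied directory standing in for the current F<Conscript> directory, and only UNIX-style absolute paths are recognized. C<split_path> is a hypothetical helper, not the Cons implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SplitPath. $conscript_dir is assumed to be
# the current Conscript directory relative to the top of the tree.
sub split_path {
    my ($list, $conscript_dir, $sep) = @_;
    $sep //= ($^O eq 'MSWin32' ? ';' : ':');   # default OS separator
    my @paths;
    for my $name (split /\Q$sep\E/, $list) {
        next if $name eq '';
        if ($name =~ /^#(.*)/) {
            push @paths, $1;                    # '#name' -> top-level name
        } elsif ($name =~ m{^/}) {
            push @paths, $name;                 # UNIX absolute: keep as-is
        } else {
            push @paths, "$conscript_dir/$name";  # relative -> top-level
        }
    }
    return @paths;
}
```
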
=head2 The C<DirPath> method

The C<DirPath> method returns the build path name(s) of a directory or
list of directories. It is invoked as follows:

  $cwd = DirPath <paths>;

The most common use for the C<DirPath> method is:

  $cwd = DirPath '.';

to fetch the path to the current directory of a subsidiary F<Conscript>
file.

=head2 The C<FilePath> method

The C<FilePath> method returns the build path name(s) of a file or
list of files. It is invoked as follows:

  $file = FilePath <path>;