acd \- a compiler driver
.B Acd
is a compiler driver, a program that calls the several passes that are needed
to compile a source file. It keeps track of all the temporary files used
between the passes. It also defines the interface of the compiler, the
options the user gets to see.
This text only describes
.B acd
itself; it says nothing about the different options the C-compiler accepts.
(It has nothing to do with any language, other than being a tool to give
a compiler a user interface.)
.B Acd
itself takes five options:
Sets the diagnostic level to
does not produce any output.
prints the basenames of the programs called.
prints names and arguments of the programs called.
shows the commands executed from the description file too.
shows the program read from the description file too. Levels 3 and 4 use
backspace overstrikes that look good when viewing the output with a smart
except that no command is executed. The driver is just play-acting.
.B Acd
is normally linked to the name the compiler is to be called with by the
user. The basename of this, say
is the call name of the driver. It plays a role in selecting the proper
description file. With the
option one can change this.
has the same effect as calling the program as
Allows one to choose the pass description file of the driver. By default
the call name of the program. If
.BI /usr/lib/ descr /descr
will be used for the description, otherwise
calls the C-compiler with a different description file without changing the
call name. Finally, if
is \fB"\-"\fP, standard input is read. (The default lib directory
at compile time by \fB\-DLIB=\e"\fP\fIdir\fP\fB\e"\fP. The default
may be set with \fB\-DDESCR=\e"\fP\fIdescr\fP\fB\e"\fP for simple
installations on a system without symlinks.)
Temporary files are made in
by default, which may be overridden by the environment variable
which may be overridden by the
.SH "THE DESCRIPTION FILE"
The description file is a program interpreted by the driver. It has variables,
lists of files, argument parsing commands, and rules for transforming input
There are four simple objects:
Words, Substitutions, Letters, and Operators.
And there are two ways to group objects:
Lists, forming sequences of anything but letters,
Strings, forming sequences of anything but Words and Operators.
Each object has the following syntax:
They are sequences of characters, like
.BR \-I/usr/include ,
No whitespace and no special characters. The backslash character
may be used to make special characters common, except whitespace. A backslash
followed by whitespace is completely removed from the input. The sequence
is changed to a newline.
A substitution (henceforth called 'subst') is formed with a
The variable name after the
is made of letters, digits and underscores, or any sequence of characters
between parentheses or braces, or a single other character. A subst indicates
that the value of the named variable must be substituted in the list or string
when fully evaluated.
Letters are the single characters that would make up a word.
are the operators. The first four must be surrounded by whitespace if they
are to be seen as special (they are often used in arguments). The last two
One line of objects in the description file forms a list. Put parentheses
around it and you have a sublist. The values of variables are lists.
Anything that is not yet a word is a string. All it needs is that the substs
in it are evaluated, e.g.
.BR $LIBPATH/lib$key.a .
A single subst doesn't make a string, it expands to a list. You need at
least one letter or other subst next to it. Strings (and words) may also
be formed by enclosing them in double quotes. Only
keep their special meaning within quotes.
One thing has to be carefully understood: Substitutions are delayed until
the last possible moment, and description files make heavy use of this.
Only if a subst is tainted, either because its variable is declared local, or
because a subst in its variable's value is tainted, is it immediately
substituted. So if a list is assigned to a variable then this list is only
checked for tainted substs. Those substs are replaced by the value
of their variable. This is called partial evaluation.
Full evaluation expands all substs and flattens the list, i.e. all
parentheses are removed from sublists.
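Full evaluation can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the driver's code: lists are modelled as Python lists, a subst is modelled as a word that starts with a dollar sign, and an undefined variable is assumed to expand to nothing. The function name full_evaluate and the variable table are hypothetical.

```python
def full_evaluate(items, variables):
    """Expand every subst and remove all sublist nesting.

    items     -- a list of words (str) and sublists (list)
    variables -- maps a variable name to its value, which is itself a list
    Assumption: a word of the form "$NAME" is a lone subst; an undefined
    variable expands to an empty list.
    """
    flat = []
    for item in items:
        if isinstance(item, list):
            # A sublist loses its parentheses: its members join the result.
            flat.extend(full_evaluate(item, variables))
        elif item.startswith("$"):
            # A lone subst expands to the (fully evaluated) list it names.
            flat.extend(full_evaluate(variables.get(item[1:], []), variables))
        else:
            flat.append(item)
    return flat


variables = {
    "CPP_F": ["-D__EXAMPLE_CC__"],
    "CCOM": [["/lib/ccom"], "$CPP_F"],
}
print(full_evaluate(["$CCOM", "x.c"], variables))
```

Substs embedded in strings are left out here on purpose; turning those into words is the job of implosion.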
Implosive evaluation is the last that has to be done to a list before it
can be used as a command to execute. The substs within a string have been
evaluated to lists after full expansion, but a string must be turned into
a single word, not a list. To make this happen, a string is first exploded
to all possible combinations of words choosing one member of the lists within
the string. These words are tried one by one to see if they exist as a
file. The first one that exists is taken; if none exists then the first
choice is used. As an example, assume
.BR "(/lib /usr/lib)" ,
happens to be local. Then we have:
\fB"$LIBPATH/lib$key.a"\fP
\fB"$LIBPATH/lib(c).a"\fP
after partial evaluation,
\fB"(/lib/libc.a /usr/lib/libc.a)"\fP
after full evaluation, and finally
after implosion, if the file exists.
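The implosion step described above can be sketched in Python as follows. This is a minimal illustration, not the driver's implementation: a string is modelled as a sequence of parts, each part being either literal text or a list of alternatives, and the function name implode and its exists parameter are hypothetical. Raising an error for an empty list of alternatives is an assumption; the text does not say what the driver does in that case.

```python
import itertools
import os


def implode(parts, exists=os.path.exists):
    """Turn a string with list-valued parts into a single word.

    parts  -- literal text (str) or a list of alternatives (list of str)
    exists -- file test; replaceable so the sketch can run without real files
    """
    choices = [part if isinstance(part, list) else [part] for part in parts]
    # Explode: every combination that picks one member from each list.
    candidates = ["".join(combo) for combo in itertools.product(*choices)]
    if not candidates:
        raise ValueError("a list inside the string is empty")
    # The first candidate that exists as a file wins ...
    for candidate in candidates:
        if exists(candidate):
            return candidate
    # ... and if none exists the first choice is used.
    return candidates[0]


parts = [["/lib", "/usr/lib"], "/lib", ["c"], ".a"]
print(implode(parts, exists=lambda path: path == "/usr/lib/libc.a"))
```

With the default exists argument the result depends on which of the candidate files are present on the machine running the sketch.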
The operators modify the way evaluation is done and perform a special
Forces full evaluation on all the list elements following it. Use it to
force substitution of the current value of a variable. This is the only
operator that forces immediate evaluation.
exists in a list that is fully evaluated, then all the elements before the
are imploded and all elements after the
are imploded and added to the list if they are not already in the list. So
this operator can be used either for set addition, or to force implosive
expansion within a sublist.
except that elements after the
are removed from the list.
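The effect of the two set operators on a list of words can be sketched in Python. The sketch assumes that both sides have already been imploded to plain words; the function names set_add and set_remove are hypothetical and only illustrate the behaviour described above.

```python
def set_add(before, after):
    """Append each word of 'after' that is not already in the list."""
    result = list(before)
    for word in after:
        if word not in result:
            result.append(word)
    return result


def set_remove(before, after):
    """Drop every word of 'before' that also appears in 'after'."""
    return [word for word in before if word not in after]


print(set_add(["-O", "-g"], ["-g", "-p"]))
print(set_remove(["-O", "-g"], ["-g"]))
```

Order is kept in both functions, because the result is still a list that ends up as command arguments.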
The set operators can be used to gather options that exclude each other
or for their side effect of implosive expansion. You may want to write:
\fBcpp \-I$LIBPATH/include\fP
to call cpp with an extra include directory, but
is expanded using a filename starting with
so this won't work. Given that any problem in Computer Science can be solved
with an extra level of indirection, use this instead:
INCLUDE = $LIBPATH/include +
.SS "Special Variables"
There are three special variables used in a description file:
These variables are always local and mostly read-only. They will be
The lists in a description file form a program that is executed from the
first to the last list. The first word in a list may be recognized as a
builtin command (only if the first list element is indeed simply a word).
If it is not a builtin command then the list is imploded and used as a
\s-2UNIX\s+2 command with arguments.
Indentation (by tabs or spaces) is not just makeup for a program, but is
used to group lines together. Some builtin commands need a body. These
bodies are simply lines at a deeper indentation.
Empty lines are not ignored either; they have the same indentation level as
the line before them. Comments (starting with a
and ending at end of line) have an indentation of their own and can be used
.B Acd
will complain about unexpected indentation shifts and empty bodies. Commands
can share the same body by placing them at the same indentation level before
the indented body. They are then "guards" to the same body, and are tried
one by one until one succeeds, after which the body is executed.
Semicolons may be used to separate commands instead of newlines. The commands
are then all at the indentation level of the first.
.SS "Execution phases"
The driver runs in three phases: Initialization, Argument scanning, and
Compilation. Not all commands work in all phases. This is further explained
The commands accept arguments that are usually generic expressions that
implode to a word or a list of words. When
is specified, then a single word or subst needs to be given, so
an assignment can be either
.IB "var " = " expr ..."
The partially evaluated list of expressions is assigned to
.IR var .
During the evaluation
.I var
is marked as local, and after the assignment it is set from undefined to defined.
is set to null and is marked as undefined.
is defined in the environment of
then it is assigned to
The environment variable is split into words at whitespace and colons. Empty
space between two colons
.BI mktemp " var " [ suffix ]
the name of a new temporary file, usually something like /tmp/acd12345x. If
is present then it will be added to the temporary file's name. (Use it
because some programs require it, or just because it looks good.)
.B Acd
remembers this file, and will delete it as soon as you stop referencing it.
.BI temporary " word"
Mark the file named by
as a temporary file. You have to make sure that the name is stored in some
list in imploded form, and not just temporarily created when
is evaluated, because then it will be immediately removed and forgotten.
Sets the target suffix for the compilation phase. Something like
means that the source files must be compiled to object files. At least one
command must be executed before the compilation phase begins. It may not be
changed during the compilation phase. (Note: There is no restriction on
it need not start with a dot.)
.BI treat " file suffix"
Marks the file as having the given suffix for the compile phase. Useful
option directly to the loader by treating it as having the
is a number. If not then
.B acd
will exit with a nice error message.
.BI error " expr ..."
Makes the driver print the error message
.BI if " expr " = " expr"
tests if the two expressions are equal using set comparison, i.e. each
expression should contain all the words in the other expression. If the
test succeeds then the if-body is executed.
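The set comparison used by the if command can be sketched in a few lines of Python. This assumes both expressions have already been imploded to lists of words; the function name same_words is hypothetical.

```python
def same_words(left, right):
    """True if each list contains all the words of the other.

    Order and repetition do not matter, which is what makes this a set
    comparison rather than a list comparison.
    """
    return set(left) == set(right)


print(same_words(["-g", "-O"], ["-O", "-g", "-g"]))
print(same_words(["-g"], ["-g", "-O"]))
```

A consequence of comparing as sets is that a test against a single word succeeds only when the other side holds that word and nothing else.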
Executes the ifdef-body if
Executes the ifndef-body if
Executes the iftemp-body if
is a temporary file. Use it when a command has the same file as input and
output and you don't want to clobber the source file:
Executes the ifhash-body if
is an existing file with a '\fB#\fP' as the very first character. This
usually indicates that the file must be pre-processed:
Executes the else-body if the last executed
was unsuccessful. Note that
need not immediately follow an if, but you are advised not to make use of
this. It is a "feature" that may not last.
.BI apply " suffix1 suffix2"
Executed inside a transform rule body to transform the input file according
to another transform rule that has the given input and output suffixes. The
will be replaced by the new file. So if there is a
preprocessor rule then the example of
Reads another description file and replaces the
with it. Execution continues with the first list in the new program. The
is the same as used for the
to switch in different front ends or back ends, or to call a shared
description file with a different initialization. Note that
is only evaluated the first time the
is called. After that the
has been replaced with the included program, so changing its argument won't
get you a different file.
.BI arg " string ..."
may be executed in the initialization and scanning phase to post an argument
scanning rule, that's all the command itself does. Like an
that fails it allows more guards to share the same body.
.BI transform " suffix1 suffix2"
only posts a rule to transform a file with the suffix
into a file with the suffix
.BI prefer " suffix1 suffix2"
Tells that the transformation rule from
is to be preferred when looking for a transformation path to the stop suffix.
Normally the shortest route to the stop suffix is used.
because the special nature of combines does not allow ambiguity.
The two suffixes on a
may be the same, giving a rule that is only executed when preferred.
.BI combine " suffix-list suffix"
except that it allows a list of input suffixes to match several types of
input files that must be combined into one.
The scanning phase may be run early from the initialization phase with the
command. Use it if you need to make choices based on the arguments before
posting the transformation rules. After running this,
Move on to the compilation phase early, so that you have a chance to run
a few extra commands before exiting. This command implies a
Any other command is seen as a \s-2UNIX\s+2 command. This is where the
operators come into play. They redirect standard input and standard output
to the file mentioned after them, just like the shell.
.B Acd
will stop with an error if the command is not successful.
.SS "The Initialization Phase"
The driver starts by executing the program once from top to bottom to
initialize variables and post argument scanning and transformation rules.
.SS "The Scanning Phase"
In this phase the driver makes a pass over the command line arguments to
process options. Each
rule is tried one by one in the order they were posted against the front of
the argument list. If a match is made then the matched arguments are removed
from the argument list and the arg-body is executed. If no match can be made
then the first argument is moved to the list of files waiting to be
transformed and the scan is restarted.
The match is done as follows: Each of the strings after
must match one argument at the front of the argument list. A character
in a string must match a character in an argument word, a subst in a string
may match 1 to all remaining characters in the argument, preferring the
shortest possible match. The hyphen in an argument starting with a hyphen
cannot be matched by a subst. Therefore:
matches only the argument
matches any argument that starts with
and is at least three characters long. Lastly,
and the argument following it, unless that argument starts with a hyphen.
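The matching rules above can be sketched in Python. This is an illustration under assumptions, not the driver's algorithm: a pattern is a plain string in which a dollar sign followed by letters, digits and underscores stands for a subst (the parenthesized and single-character variable names are not handled), and the names parse, match_one and match_rule, as well as the option names in the example, are hypothetical.

```python
def parse(pattern):
    """Split a pattern into ("lit", char) and ("subst", name) tokens."""
    tokens, i = [], 0
    while i < len(pattern):
        if pattern[i] == "$":
            j = i + 1
            while j < len(pattern) and (pattern[j].isalnum() or pattern[j] == "_"):
                j += 1
            tokens.append(("subst", pattern[i + 1:j]))
            i = j
        else:
            tokens.append(("lit", pattern[i]))
            i += 1
    return tokens


def match_one(pattern, argument):
    """Match one pattern string against one argument.

    Returns a dict of subst bindings, or None if there is no match.
    """
    tokens = parse(pattern)

    def walk(t, a, bound):
        if t == len(tokens):
            return bound if a == len(argument) else None
        kind, value = tokens[t]
        if kind == "lit":
            if a < len(argument) and argument[a] == value:
                return walk(t + 1, a + 1, bound)
            return None
        # A subst cannot match the leading hyphen of an option.
        if a == 0 and argument.startswith("-"):
            return None
        # A subst takes at least one character; try the shortest match first.
        for end in range(a + 1, len(argument) + 1):
            result = walk(t + 1, end, {**bound, value: argument[a:end]})
            if result is not None:
                return result
        return None

    return walk(0, 0, {})


def match_rule(patterns, arguments):
    """Match each pattern against one argument at the front of the list."""
    if len(arguments) < len(patterns):
        return None
    bound = {}
    for pattern, argument in zip(patterns, arguments):
        result = match_one(pattern, argument)
        if result is None:
            return None
        bound.update(result)
    return bound


print(match_rule(["-o", "$out"], ["-o", "prog", "x.c"]))
print(match_rule(["-o", "$out"], ["-o", "-c"]))
```

The second call fails because the argument after the option starts with a hyphen, which the subst is not allowed to match.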
is set to all the matched arguments before the arg-body is executed. All
the substs in the arg strings are set to the characters they match. The
is set to null. All the values of the variables are saved and the variables
marked local. All variables except
are marked read-only. After the arg-body is executed, the value of
is concatenated to the file list. This allows one to stuff new files into the
transformation phase. These added names are not evaluated until the start
.SS "The Compilation Phase"
The files gathered in the file list in the scanning phase are now transformed
one by one using the transformation rules. The shortest, or preferred route
is computed for each file all the way to the stop suffix. Each file is
transformed until it lands at the stop suffix, or at a combine rule. After
a while all files are either fully transformed or at a combine rule.
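Finding the shortest route through the transformation rules can be sketched as a breadth-first search in Python. This is a simplified illustration, not the driver's algorithm: it ignores preferred rules and combine rules, and the function name shortest_route and the suffixes in the example are hypothetical.

```python
from collections import deque


def shortest_route(rules, start, stop):
    """Return the shortest list of suffixes leading from start to stop.

    rules -- (input suffix, output suffix) pairs, one per transform rule
    Returns None if the stop suffix cannot be reached.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == stop:
            return path
        for source, target in rules:
            # Visiting each suffix once also keeps cycles from looping forever.
            if source == path[-1] and target not in seen:
                seen.add(target)
                queue.append(path + [target])
    return None


rules = [(".c", ".i"), (".i", ".k"), (".c", ".k"), (".k", ".m"),
         (".m", ".s"), (".k", ".s"), (".s", ".o")]
print(shortest_route(rules, ".c", ".o"))
```

In this example the search skips both the separate preprocessing step and the optimizer step, because neither is needed to reach the stop suffix.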
The driver chooses a combine rule that is not on a path from another combine
rule and executes it. The file that results is then transformed until it
again lands at a combine rule or the stop suffix. This continues until all
files are at the stop suffix and the program exits.
The paths through transform rules may be ambiguous and have cycles, they will
be resolved. But paths through combines must be unambiguous, because of
the many paths from the different files that meet there. A description file
will usually have only one combine rule for the loader. However if you do
have a combine conflict then put a no-op transform rule in front of one to
If a file matches a long and a short suffix then the long suffix is preferred.
By putting a null input suffix (\fB""\fP) in a rule one can match any file
that no other rule matches. You can send unknown files to the loader this
is set to the file to be transformed or the files to be combined before the
transform or combine-body is executed.
is set to the output file name, it may again be modified.
is set to the original name of the first file of
with the leading directories and the suffix removed.
will be made up of temporary files after the first rule.
will be another temporary file or the name of the target file
plus the stop suffix), if the stop suffix is reached.
is passed to the next rule; it is imploded and checked to be a single word.
This driver does not store intermediate object files in the current directory
like most other compilers, but keeps them in
too. (Who knows whether files can be created in the current directory?) As an
example, here is how you can express the "normal" method:
is not called if the target is already the object file, or you would lose
is known to be a word, because
is local. (Any string whose substs are all expanded changes to a word.)
.SS "Predefined Variables"
The driver has three variables predefined:
set to the call name of the driver,
the driver's version number, and
set to the name of the default output architecture. The latter is optional,
was compiled with \fB\-DARCH=\e"\fP\fIarch-name\fP\fB\e"\fP.
As an example a description file for a C compiler is given. It has a
front end (ccom), an intermediate code optimizer (opt), a code generator (cg),
an assembler (as), and a loader (ld). The compiler can pre-process, but
there is also a separate cpp. If the
and options like it are changed to look like
then this example is even as required by \s-2POSIX\s+2.
# The compiler support search path.
C = /lib /usr/lib /usr/local/lib
CCOM = $C/ccom $CPP_F
# Predefined symbols.
CPP_F = \-D__EXAMPLE_CC__
LIBPATH = $USERLIBPATH $C
# Default transformation target.
# Preprocessor directives.
# Add debug info to the executable.
# Add directories to the library path.
USERLIBPATH = $USERLIBPATH $dir
# \-llib must be searched in $LIBPATH later.
$> = $LIBPATH/lib$lib.a
# Change output file.
# Complain about a missing argument.
error "argument expected after '$\(**'"
# Any other options (like \-s) are for the loader.
# Preprocess C-source.
# Preprocess C-source and send it to standard output or $OUT.
# Compile C-source to intermediate code.
# Intermediate code optimizer.
# Intermediate to assembly.
# Assembler to object code.
# Combine object files and libraries to an executable.
$LD \-o $OUT $C/crtso.o $\(** $C/libc.a
.RI /usr/lib/ descr /descr
\- compiler driver description file.
Even though the end result doesn't look much like it, many ideas were
nevertheless derived from the ACK compiler driver by Ed Keizer.
\s-2POSIX\s+2 requires that if compiling one source file to an object file
fails then the compiler should continue with the next source file. There is
can do this, it always stops after an error. It doesn't even know what an
object file is! (The requirement is stupid anyhow.)
If you don't think that tabs are 8 spaces wide, then don't mix them with
spaces for indentation.
Kees J. Bot (kjb@cs.vu.nl)