4 .if !
\a\\$1
\a\a \&\\$1 \\$2 \\$3 \\$4 \\$5 \\$6 \f1
7 .}S 5 1 \& "\\$1" "\\$2" "\\$3" "\\$4" "\\$5" "\\$6"
10 .}S 1 5 \& "\\$1" "\\$2" "\\$3" "\\$4" "\\$5" "\\$6"
12 .de EX \" start example
28 .SH NAME \" @(#)pp.3 (gsf@research.att.com) 04/01/92
29 pp \- ANSI C preprocessor library
34 %include "pptokens.yacc"
40 library provides a tokenizing implementation of the C language preprocessor
41 and supports K&R (Reiser), ANSI and C++ dialects.
42 The preprocessor consists of 12 public functions,
43 a global character class table accessed by macros, and
44 a single global struct with 10 public elements.
47 operates in two modes.
49 mode is used to implement the traditional standalone C preprocessor.
51 mode provides a function interface to a stream of preprocessed tokens.
53 is by default ANSI; the only default predefined symbols are
57 Dialects (K&R, C++) and local conventions are determined by
60 information that is included at runtime.
63 information can be overridden by providing a file
65 with pragmas and definitions for each compiler implementation.
66 This file is usually located in the compiler specific
67 default include directory.
69 Directive, command line argument, option and pragma syntax is described in
72 specific semantics are described below.
73 Most semantic differences with standard or classic implementations are in the
74 form of optimizations.
76 Options and pragmas map to
78 function calls described below.
79 For the remaining descriptions,
80 ``setting \f5ppop(PP_\fP\fIoperation\fP\f5)\fP''
81 is a shorthand for calling
83 with the arguments appropriate for
84 \f5PP_\fP\fIoperation\fP.
86 The library interface describes only the public functions and struct elements.
87 Static structs and pointers to structs are provided by the library.
88 The user should not attempt to allocate structs.
97 provides readonly information.
100 must be done using the functions described below.
102 has the following public elements:
107 implementation version string.
110 The current line sync directive name.
111 Used for standalone line sync output.
112 The default value is the empty string.
118 The current output file name.
121 The pragma pass name for
127 The string representation for the current input token.
140 .L ppop(PP_COMPATIBILITY)
144 Set if standalone line syncs require a file argument.
147 Set if standalone line syncs require a third argument.
148 The third argument is
150 for include file push,
152 for include file pop and null otherwise.
161 .L ppop(PP_TRANSITION)
165 .L "struct ppdirs* lcldirs"
166 The list of directories to be searched for "..." include files.
167 If the first directory name is "" then it is replaced by the
168 directory of the including file at include time.
169 The public elements of
175 The directory pathname.
177 .L "struct ppdirs* next"
180 if it is the last in the list.
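The rule for an empty first directory name can be sketched in isolation.
This is a minimal illustration, not the library code: struct demo_dirs,
demo_dirname() and demo_effective_dir() are hypothetical names, and the
struct only mirrors the two public elements described above
(a pathname and a next pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct ppdirs: a pathname and a next pointer. */
struct demo_dirs {
    const char* name;        /* directory pathname */
    struct demo_dirs* next;  /* 0 if this is the last entry */
};

/* Copy the directory part of path into buf ("." if path has no '/'). */
static void demo_dirname(const char* path, char* buf, size_t size)
{
    const char* slash = strrchr(path, '/');
    size_t n;

    assert(size > 1);
    if (!slash) {
        buf[0] = '.';
        buf[1] = 0;
        return;
    }
    n = (size_t)(slash - path);
    if (n == 0)
        n = 1;               /* a path like "/x.h" lives in "/" */
    if (n >= size)
        n = size - 1;        /* truncate rather than overflow */
    memcpy(buf, path, n);
    buf[n] = 0;
}

/* Effective name of entry dp: an empty name on the first entry stands
 * for the directory of the including file. */
static const char* demo_effective_dir(const struct demo_dirs* head,
    const struct demo_dirs* dp, const char* including,
    char* buf, size_t size)
{
    if (dp == head && dp->name[0] == 0) {
        demo_dirname(including, buf, size);
        return buf;
    }
    return dp->name;
}
```

A caller would walk the list from the head and pass each entry through
demo_effective_dir() before joining it with the include file name.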
183 .L "struct ppdirs* stddirs"
185 is the list of directories to be searched for <...> include files.
189 .L "struct ppsymbol* symbol"
194 points to the symbol table entry for the current identifier token.
196 is undefined for non-identifier tokens.
197 Once defined, an identifier will always have the same
204 is defined for macro and keyword tokens and
206 for all other identifiers.
216 The inclusive or of the following flags:
221 Currently being expanded.
227 Macro expansion currently disabled.
233 Initialization macro.
239 Loaded checkpoint macro.
246 No identifiers in macro body.
263 Variadic function-like macro.
266 First unused symbol flag bit index.
269 on are initially unset and may be set by the user.
273 .L "struct ppmacro* macro"
274 Non-zero if the identifier is a macro.
275 .L "int macro\->arity"
276 is the number of formal arguments for function-like macros and
277 .L "char* macro\->value"
278 is the macro definition value, a
280 terminated string that may contain internal mark sequences.
285 and never modified by
287 This field may be set by the user.
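The symbol flags element described above leaves the bits from the first
unused bit index upward to the user.
The following sketch shows one way to build user flags on top of such an
index.
The bit index value, the DEMO_ macro names and struct demo_symbol are
assumptions made for illustration; the real names and values come from
the library header.

```c
#include <assert.h>

/* Hypothetical value: the real first unused bit index is defined by
 * the library header, not by this sketch. */
#define DEMO_FIRST_UNUSED_BIT 12

/* The n-th user flag (n = 0, 1, ...) sits above the library's bits. */
#define DEMO_USER_FLAG(n) (1UL << (DEMO_FIRST_UNUSED_BIT + (n)))

/* Stand-in for the flags element of a symbol table entry. */
struct demo_symbol {
    unsigned long flags;
};

/* Set a user flag without disturbing any other bit. */
static void demo_set(struct demo_symbol* sp, unsigned long flag)
{
    sp->flags |= flag;
}

/* Clear a user flag without disturbing any other bit. */
static void demo_clear(struct demo_symbol* sp, unsigned long flag)
{
    sp->flags &= ~flag;
}

/* Non-zero if the flag is set. */
static int demo_test(const struct demo_symbol* sp, unsigned long flag)
{
    return (sp->flags & flag) != 0;
}
```

Because user flags start at the first unused bit, setting or clearing
them cannot change the flags maintained by the library.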
290 .L "Hash_table_t* symtab"
291 The macro and identifier
296 routines may be used to examine the table, with the exception that the
297 following macros must be used for individual
302 .L "struct ppsymbol* ppsymget(Hash_table_t* table, char* name)"
311 .L "struct ppsymbol* ppsymset(Hash_table_t* table, char* name)"
318 is not defined then allocate and return a new
324 Error messages are reported using
326 and the following globals relate to
329 .L "int error_info.errors"
330 The level 2 error count.
331 Error levels above 2 cause immediate exit.
334 is non-zero then the user program exit status should also be non-zero.
336 .L "char* error_info.file"
337 The current input file name.
339 .L "int error_info.line"
340 The current input line number.
342 .L "int error_info.trace"
343 The debug trace level,
346 Larger negative numbers produce more trace information.
347 Enabled when the user program is linked with the
352 .L "int error_info.warnings"
353 The level 1 error count.
354 Warnings do not affect the exit status.
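The relationship between error levels, the two counters and the exit
status can be sketched as follows.
This is an illustration only: struct demo_error_info, demo_count() and
demo_exit_status() are hypothetical names that model the level 1 and
level 2 counts described above, not the error(3) implementation.

```c
#include <assert.h>

/* Hypothetical model of the two counters described above. */
struct demo_error_info {
    int errors;    /* level 2 count */
    int warnings;  /* level 1 count */
};

/* Record one message at the given level.
 * Returns non-zero if the level calls for immediate exit (above 2). */
static int demo_count(struct demo_error_info* ei, int level)
{
    if (level == 1)
        ei->warnings++;
    else if (level == 2)
        ei->errors++;
    return level > 2;
}

/* Exit status for the user program: non-zero only if errors were
 * counted; warnings do not affect it. */
static int demo_exit_status(const struct demo_error_info* ei)
{
    return ei->errors != 0;
}
```

A user program would typically end with a return of the value computed
by a check like demo_exit_status().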
358 .L "extern int ppargs(char** argv, int last);"
363 style options and arguments.
364 The user may also supply application specific option parsers.
365 Also handles non-standard options like the sun
369 Hello in there, ever hear of
372 .L "extern void ppcpp(void);"
373 This is the standalone
377 consumes all of the input and writes the preprocessed text to the output.
380 is equivalent to, but more efficient than:
382 ppop(PP_SPACEOUT, 1);
384 ppprintf(" %s", pp.token);
387 .L "extern int ppcomment(char* head, char* comment, char* tail, int line);"
388 The default comment handler that passes comments to the output.
389 May be used as an argument to
390 .LR ppop(PP_COMMENT) ,
391 or the user may supply an application specific handler.
393 is the comment head text,
401 is the comment tail text,
407 is the comment starting line number.
409 .L "extern void pperror(int level, char* format, ...);"
414 error and warning messages pass through
416 The user may link with an application specific
418 to override the library default.
420 .L "extern int ppincref(char* parent, char* file, int line, int push);"
421 The default include reference handler that outputs
423 to the standard error.
424 May be used as an argument to the
425 .LR ppop(PP_INCREF) ,
426 or the user may supply an application specific handler.
428 is the including file name,
430 is the current include file name,
432 is the current line number in
440 if file is being popped.
442 .L "extern void ppinput(char* buffer, char* file, int line);"
451 is the pseudo file name used in line syncs for
455 is the starting line number.
458 Returns the token type of the next input token.
462 are updated to refer to the new token.
463 The token type constants are defined in
472 The token constant names match
474 some are encoded by oring with
478 The numeric constant tokens and encodings are:
480 T_DOUBLE (N_NUMBER|N_REAL)
481 T_DOUBLE_L (N_NUMBER|N_REAL|N_LONG)
482 T_FLOAT (N_NUMBER|N_REAL|N_FLOAT)
484 T_DECIMAL_L (N_NUMBER|N_LONG)
485 T_DECIMAL_U (N_NUMBER|N_UNSIGNED)
486 T_DECIMAL_UL (N_NUMBER|N_UNSIGNED|N_LONG)
487 T_OCTAL (N_NUMBER|N_OCTAL)
488 T_OCTAL_L (N_NUMBER|N_OCTAL|N_LONG)
489 T_OCTAL_U (N_NUMBER|N_OCTAL|N_UNSIGNED)
490 T_OCTAL_UL (N_NUMBER|N_OCTAL|N_UNSIGNED|N_LONG)
491 T_HEXADECIMAL (N_NUMBER|N_HEXADECIMAL)
492 T_HEXADECIMAL_L (N_NUMBER|N_HEXADECIMAL|N_LONG)
493 T_HEXADECIMAL_U (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED)
494 T_HEXADECIMAL_UL (N_NUMBER|N_HEXADECIMAL|N_UNSIGNED|N_LONG)
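The table above builds each numeric token from attribute bits.
The sketch below shows how such an encoding lets a token be classified
with simple mask tests.
All bit values and the DEMO_ and demo_ names are assumptions chosen for
illustration; the real constants are defined by the library header.

```c
#include <assert.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_N_NUMBER      0x0100
#define DEMO_N_REAL        0x0001
#define DEMO_N_LONG        0x0002
#define DEMO_N_UNSIGNED    0x0004
#define DEMO_N_FLOAT       0x0008
#define DEMO_N_OCTAL       0x0010
#define DEMO_N_HEXADECIMAL 0x0020

/* A few tokens composed the same way as the rows in the table. */
#define DEMO_T_DOUBLE        (DEMO_N_NUMBER|DEMO_N_REAL)
#define DEMO_T_FLOAT         (DEMO_N_NUMBER|DEMO_N_REAL|DEMO_N_FLOAT)
#define DEMO_T_DECIMAL_UL    (DEMO_N_NUMBER|DEMO_N_UNSIGNED|DEMO_N_LONG)
#define DEMO_T_HEXADECIMAL_U (DEMO_N_NUMBER|DEMO_N_HEXADECIMAL|DEMO_N_UNSIGNED)

/* Any numeric constant carries the number bit. */
static int demo_isnumber(int token)
{
    return (token & DEMO_N_NUMBER) != 0;
}

/* A floating point constant is a number with the real bit set. */
static int demo_isreal(int token)
{
    return demo_isnumber(token) && (token & DEMO_N_REAL) != 0;
}

/* An integral constant is a number without the real bit. */
static int demo_isinteger(int token)
{
    return demo_isnumber(token) && (token & DEMO_N_REAL) == 0;
}
```

With this layout a suffix such as long or unsigned can be tested
independently of the radix, since each attribute has its own bit.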
496 The normal C tokens are:
498 T_ID \fIC identifier\fP
499 T_INVALID \fIinvalid token\fP
528 T_DOTREF .* [\fIif\fP PP_PLUSPLUS]
529 T_PTRMEMREF ->* [\fIif\fP PP_PLUSPLUS]
530 T_SCOPE :: [\fIif\fP PP_PLUSPLUS]
531 T_UMINUS \fIunary minus\fP
535 was set then the keyword tokens are also defined.
536 Compiler differences and dialects are detected by the
539 information, and only the appropriate keywords are enabled.
540 The ANSI keyword tokens are:
542 T_AUTO T_BREAK T_CASE T_CHAR
543 T_CONTINUE T_DEFAULT T_DO T_DOUBLE_T
544 T_ELSE T_EXTERN T_FLOAT_T T_FOR
545 T_GOTO T_IF T_INT T_LONG
546 T_REGISTER T_RETURN T_SHORT T_SIZEOF
547 T_STATIC T_STRUCT T_SWITCH T_TYPEDEF
548 T_UNION T_UNSIGNED T_WHILE T_CONST
549 T_ENUM T_SIGNED T_VOID T_VOLATILE
551 and the C++ keyword tokens are:
553 T_CATCH T_CLASS T_DELETE T_FRIEND
554 T_INLINE T_NEW T_OPERATOR T_OVERLOAD
555 T_PRIVATE T_PROTECTED T_PUBLIC T_TEMPLATE
556 T_THIS T_THROW T_TRY T_VIRTUAL
560 is recognized where appropriate.
561 Additional keyword tokens
564 .LR ppop(PP_COMPILE) .
566 Many C implementations show no restraint in adding new keywords; some
567 PC compilers have tripled the number of keywords.
568 For the most part these new keywords introduce noise constructs that
569 can be ignored for standard
571 analysis and compilation.
572 The noise keywords fall in four syntactic categories that map into the two
580 points to the entire noise construct, including the offending noise keyword.
581 The basic noise keyword categories are:
585 The simplest noise: a single keyword that is noise in any context and maps to
589 A noise keyword that precedes an optional grouping construct, either
597 A noise keyword that consumes the remaining tokens in the line
602 A noise keyword that consumes the tokens up to the next
612 then implementation specific noise constructs are mapped to either
620 then noise constructs are completely ignored,
621 otherwise the unmapped grouping noise tokens
625 Token encodings may be tested by the following macros:
628 .L "int isnumber(int token);"
631 is an integral or floating point numeric constant.
633 .L "int isinteger(int token);"
636 is an integral numeric constant.
638 .L "int isreal(int token);"
641 is a floating point numeric constant.
643 .L "int isassignop(int token);"
646 is a C assignment operator.
648 .L "int isseparate(int token);"
651 must be separated from other tokens by
654 .L "int isnoise(int token);"
660 .L "extern int ppline(int line, char* file);"
661 The default line sync handler that outputs line sync pragmas for the C compiler
663 May be used as an argument to
665 or the user may supply an application specific handler.
667 is the line number and
672 was set then the directive
673 \fB#\fP \fIlineid line \fP"\fIfile\fP" is output.
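The directive form shown above can be sketched as a small formatter.
demo_linesync() is a hypothetical helper, not part of the library, and
the handling of an empty line sync directive name (omitting it along
with its separating space) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a line sync directive of the form
 *     # lineid line "file"
 * into buf. An empty or null lineid is left out (an assumption).
 * Returns the snprintf() result. */
static int demo_linesync(char* buf, size_t size, const char* lineid,
    int line, const char* file)
{
    assert(buf && size > 0 && file);
    if (lineid && *lineid)
        return snprintf(buf, size, "# %s %d \"%s\"", lineid, line, file);
    return snprintf(buf, size, "# %d \"%s\"", line, file);
}
```

An application specific line sync handler could use a formatter like
this and write the result wherever its consumer expects it.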
675 .L "extern int ppmacref(struct ppsymbol* symbol, char* file, int line, int type);"
676 The default macro reference handler that outputs macro reference pragmas.
677 May be used as an argument to
678 .LR ppop(PP_MACREF) ,
679 or the user may supply an application specific handler.
685 is the reference file,
687 is the reference line,
690 is non-zero a macro value checksum is also output.
692 \fB#pragma pp:macref\fP "\fIsymbol\->name\fP" \fIline checksum\fP.
694 .L "int ppop(int op, ...)"
696 is the option control interface.
698 determines the type(s) of the remaining argument(s).
705 .L "(PP_ASSERT, char* string) /*INIT*/"
710 .L "(PP_BUILTIN, char*(*fun)(char* buf, char* name, char* args)) /*INIT*/"
713 as the unknown builtin macro handler.
714 Builtin macros are of the form
719 set to the unknown builtin macro name and
721 set to the arguments.
725 buffer that can be used for the
729 should be returned on error.
731 .L "(PP_COMMENT, void (*fun)(char* head, char* body, char* tail, int line)) /*INIT*/"
733 .L "(PP_COMPATIBILITY, char* string) /*INIT*/"
735 .L "(PP_COMPILE, char* string) /*INIT*/"
737 .L "(PP_DEBUG, char* string) /*INIT*/"
739 .L "(PP_DEFAULT, char* string) /*INIT*/"
741 .L "(PP_DEFINE, char* string) /*INIT*/"
746 .L "(PP_DIRECTIVE, char* string) /*INIT*/"
751 .L "(PP_DONE, char* string) /*INIT*/"
753 .L "(PP_DUMP, char* string) /*INIT*/"
755 .L "(PP_FILEDEPS, char* string) /*INIT*/"
757 .L "(PP_FILENAME, char* string) /*INIT*/"
759 .L "(PP_HOSTDIR, char* string) /*INIT*/"
761 .L "(PP_HOSTED, char* string) /*INIT*/"
763 .L "(PP_ID, char* string) /*INIT*/"
765 .L "(PP_IGNORE, char* string) /*INIT*/"
767 .L "(PP_INCLUDE, char* string) /*INIT*/"
769 .L "(PP_INCREF, char* string) /*INIT*/"
771 .L "(PP_INIT, char* string) /*INIT*/"
773 .L "(PP_INPUT, char* string) /*INIT*/"
775 .L "(PP_LINE, char* string) /*INIT*/"
777 .L "(PP_LINEFILE, char* string) /*INIT*/"
779 .L "(PP_LINEID, char* string) /*INIT*/"
781 .L "(PP_LINETYPE, char* string) /*INIT*/"
783 .L "(PP_LOCAL, char* string) /*INIT*/"
785 .L "(PP_MACREF, char* string) /*INIT*/"
787 .L "(PP_MULTIPLE, char* string) /*INIT*/"
789 .L "(PP_NOHASH, char* string) /*INIT*/"
791 .L "(PP_NOID, char* string) /*INIT*/"
793 .L "(PP_NOISE, char* string) /*INIT*/"
795 .L "(PP_OPTION, char* string) /*INIT*/"
797 \fB#pragma pp:\fP\fIstring\fP
800 .L "(PP_OPTARG, char* string) /*INIT*/"
802 .L "(PP_OUTPUT, char* string) /*INIT*/"
804 .L "(PP_PASSNEWLINE, char* string) /*INIT*/"
806 .L "(PP_PASSTHROUGH, char* string) /*INIT*/"
808 .L "(PP_PLUSPLUS, char* string) /*INIT*/"
810 .L "(PP_PRAGMA, char* string) /*INIT*/"
812 .L "(PP_PREFIX, char* string) /*INIT*/"
814 .L "(PP_PROBE, char* string) /*INIT*/"
816 .L "(PP_READ, char* string) /*INIT*/"
818 .L "(PP_RESERVED, char* string) /*INIT*/"
820 .L "(PP_SPACEOUT, char* string) /*INIT*/"
822 .L "(PP_STANDALONE, char* string) /*INIT*/"
824 .L "(PP_STANDARD, char* string) /*INIT*/"
826 .L "(PP_STRICT, char* string) /*INIT*/"
828 .L "(PP_TEST, char* string) /*INIT*/"
830 .L "(PP_TRUNCATE, char* string) /*INIT*/"
832 .L "(PP_UNDEF, char* string) /*INIT*/"
834 .L "(PP_WARN, char* string) /*INIT*/"
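As an illustration of the handler type listed for PP_BUILTIN among the
operations above, the following sketch defines a function with that
signature.
The builtin name demo_strlen, the buffer size DEMO_BUFSIZE and the
choice of 0 as the error return are assumptions made for this example;
the text above does not state the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer size; the real size is set by the library. */
#define DEMO_BUFSIZE 256

/* Handler with the shape char*(*fun)(char* buf, char* name, char* args).
 * It knows one made-up builtin, "demo_strlen", whose value is the
 * length of its argument text as a decimal string.
 * Unknown names yield 0 (assumed here to signal an error). */
static char* demo_builtin(char* buf, char* name, char* args)
{
    assert(buf && name);
    if (strcmp(name, "demo_strlen") != 0)
        return 0;
    snprintf(buf, DEMO_BUFSIZE, "%lu",
        (unsigned long)strlen(args ? args : ""));
    return buf;
}
```

A handler like this would be registered once during initialization with
the PP_BUILTIN operation.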
837 .L "int pppragma(char* dir, char* pass, char* name, char* value, int nl);"
838 The default handler that
839 copies unknown directives and pragmas to the output.
840 May be used as an argument to
841 .LR ppop(PP_PRAGMA) ,
842 or the user may supply an application specific handler.
843 This function is most often called after directive and pragma mapping.
844 Any of the arguments may be
847 is the directive name,
849 is the pragma pass name,
851 is the pragma option name,
853 is the pragma option value, and
856 if a trailing newline is required when the pragma is copied to the output.
858 .L "int ppprintf(char* format, ...);"
861 interface to the standalone
864 Macros provide limited control over output buffering:
865 .L "void ppflushout()"
866 flushes the output buffer,
867 .L "void ppcheckout()"
868 flushes the output buffer if over
870 characters are buffered,
872 returns the number of pending characters in the output buffer, and
873 .L "void ppputchar(int c)"
876 in the output buffer.
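The division of labor among the four output macros can be modeled with
a small buffer.
This is a sketch under stated assumptions: struct demo_out, the demo_
function names, the buffer size and the flush threshold are all
hypothetical, and flushing is modeled by counting characters rather
than writing them.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BUFSIZE 16  /* hypothetical buffer capacity */
#define DEMO_CHECK   8   /* hypothetical flush threshold */

/* Hypothetical output buffer model. */
struct demo_out {
    char buf[DEMO_BUFSIZE];
    size_t pending;  /* characters waiting in buf */
    size_t flushed;  /* characters already handed to the output */
};

/* Unconditional flush. */
static void demo_flushout(struct demo_out* op)
{
    op->flushed += op->pending;
    op->pending = 0;
}

/* Flush only if more than DEMO_CHECK characters are buffered. */
static void demo_checkout(struct demo_out* op)
{
    if (op->pending > DEMO_CHECK)
        demo_flushout(op);
}

/* Number of pending characters. */
static size_t demo_pendout(const struct demo_out* op)
{
    return op->pending;
}

/* Place one character in the buffer, flushing first if it is full. */
static void demo_putchar(struct demo_out* op, int c)
{
    if (op->pending == DEMO_BUFSIZE)
        demo_flushout(op);
    op->buf[op->pending++] = (char)c;
}
```

The conditional flush lets a caller emit many short tokens cheaply and
check the buffer at convenient points, such as the end of each line.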
878 The ANSI mode is intended to be true to the standard.
879 The compatibility mode has been proven in practice, but there are
880 surely dark corners of some implementations that may have been omitted.
882 cc(1), cpp(1), nmake(1), probe(1), yacc(1),
884 ast(3), error(3), hash(3), optjoin(3)
888 (Dennis Ritchie provided the original table driven lexer.)
890 AT&T Bell Laboratories