commands/flex-2.5.4/MISC/texinfo/flex.texi

   1 \input texinfo
   2 @c %**start of header
   3 @setfilename flex.info
   4 @settitle Flex - a scanner generator
   5 @c @finalout
   6 @c @setchapternewpage odd
   7 @c %**end of header
   8
   9 @set EDITION 2.5
  10 @set UPDATED March 1995
  11 @set VERSION 2.5
  12
  13 @c FIXME - Reread a printed copy with a red pen and patience.
  14 @c FIXME - Modify all "See ..." references and replace with @xref's.
  15
  16 @ifinfo
  17 @format
  18 START-INFO-DIR-ENTRY
  19 * Flex: (flex).         A fast scanner generator.
  20 END-INFO-DIR-ENTRY
  21 @end format
  22 @end ifinfo
  23
  24 @c Define new indices for commands, filenames, and options.
  25 @c @defcodeindex cm
  26 @c @defcodeindex fl
  27 @c @defcodeindex op
  28
  29 @c Put everything in one index (arbitrarily chosen to be the concept index).
  30 @c @syncodeindex cm cp
  31 @c @syncodeindex fl cp
  32 @syncodeindex fn cp
  33 @syncodeindex ky cp
  34 @c @syncodeindex op cp
  35 @syncodeindex pg cp
  36 @syncodeindex vr cp
  37
  38 @ifinfo
  39 This file documents Flex.
  40
  41 Copyright (c) 1990 The Regents of the University of California.
  42 All rights reserved.
  43
  44 This code is derived from software contributed to Berkeley by
  45 Vern Paxson.
  46
  47 The United States Government has rights in this work pursuant
  48 to contract no. DE-AC03-76SF00098 between the United States
  49 Department of Energy and the University of California.
  50
  51 Redistribution and use in source and binary forms with or without
  52 modification are permitted provided that: (1) source distributions
  53 retain this entire copyright notice and comment, and (2)
  54 distributions including binaries display the following
  55 acknowledgement:  ``This product includes software developed by the
  56 University of California, Berkeley and its contributors'' in the
  57 documentation or other materials provided with the distribution and
  58 in all advertising materials mentioning features or use of this
  59 software.  Neither the name of the University nor the names of its
  60 contributors may be used to endorse or promote products derived
  61 from this software without specific prior written permission.
  62
  63 THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
  64 IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
  65 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  66 PURPOSE.
  67
  68 @ignore
  69 Permission is granted to process this file through TeX and print the
  70 results, provided the printed document carries copying permission
  71 notice identical to this one except for the removal of this paragraph
  72 (this paragraph not being relevant to the printed manual).
  73
  74 @end ignore
  75 @end ifinfo
  76
  77 @titlepage
  78 @title Flex, version @value{VERSION}
  79 @subtitle A fast scanner generator
  80 @subtitle Edition @value{EDITION}, @value{UPDATED}
  81 @author Vern Paxson
  82
  83 @page
  84 @vskip 0pt plus 1filll
  85 Copyright @copyright{} 1990 The Regents of the University of California.
  86 All rights reserved.
  87
  88 This code is derived from software contributed to Berkeley by
  89 Vern Paxson.
  90
  91 The United States Government has rights in this work pursuant
  92 to contract no. DE-AC03-76SF00098 between the United States
  93 Department of Energy and the University of California.
  94
  95 Redistribution and use in source and binary forms with or without
  96 modification are permitted provided that: (1) source distributions
  97 retain this entire copyright notice and comment, and (2)
  98 distributions including binaries display the following
  99 acknowledgement:  ``This product includes software developed by the
 100 University of California, Berkeley and its contributors'' in the
 101 documentation or other materials provided with the distribution and
 102 in all advertising materials mentioning features or use of this
 103 software.  Neither the name of the University nor the names of its
 104 contributors may be used to endorse or promote products derived
 105 from this software without specific prior written permission.
 106
 107 THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR
 108 IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
 109 WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 110 PURPOSE.
 111 @end titlepage
 112
 113 @ifinfo
 114
 115 @node Top, Name, (dir), (dir)
 116 @top flex
 117
 118 @cindex scanner generator
 119
 120 This manual documents @code{flex}.  It covers release @value{VERSION}.
 121
 122 @menu
 123 * Name::                        Name
 124 * Synopsis::                    Synopsis
 125 * Overview::                    Overview
 126 * Description::                 Description
 127 * Examples::                    Some simple examples
 128 * Format::                      Format of the input file
 129 * Patterns::                    Patterns
 130 * Matching::                    How the input is matched
 131 * Actions::                     Actions
 132 * Generated scanner::           The generated scanner
 133 * Start conditions::            Start conditions
 134 * Multiple buffers::            Multiple input buffers
 135 * End-of-file rules::           End-of-file rules
 136 * Miscellaneous::               Miscellaneous macros
 137 * User variables::              Values available to the user
 138 * YACC interface::              Interfacing with @code{yacc}
 139 * Options::                     Options
 140 * Performance::                 Performance considerations
 141 * C++::                         Generating C++ scanners
 142 * Incompatibilities::           Incompatibilities with @code{lex} and POSIX
 143 * Diagnostics::                 Diagnostics
 144 * Files::                       Files
 145 * Deficiencies::                Deficiencies / Bugs
 146 * See also::                    See also
 147 * Author::                      Author
 148 @c * Index::                       Index
 149 @end menu
 150
 151 @end ifinfo
 152
 153 @node Name, Synopsis, Top, Top
 154 @section Name
 155
 156 flex - fast lexical analyzer generator
 157
 158 @node Synopsis, Overview, Name, Top
 159 @section Synopsis
 160
 161 @example
 162 flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
 163 [--help --version] [@var{filename} @dots{}]
 164 @end example
 165
 166 @node Overview, Description, Synopsis, Top
 167 @section Overview
 168
 169 This manual describes @code{flex}, a tool for generating programs
 170 that perform pattern-matching on text.  The manual
 171 includes both tutorial and reference sections:
 172
 173 @table @asis
 174 @item Description
 175 a brief overview of the tool
 176
 177 @item Some Simple Examples
 178
 179 @item Format Of The Input File
 180
 181 @item Patterns
 182 the extended regular expressions used by flex
 183
 184 @item How The Input Is Matched
 185 the rules for determining what has been matched
 186
 187 @item Actions
 188 how to specify what to do when a pattern is matched
 189
 190 @item The Generated Scanner
 191 details regarding the scanner that flex produces;
 192 how to control the input source
 193
 194 @item Start Conditions
 195 introducing context into your scanners, and
 196 managing "mini-scanners"
 197
 198 @item Multiple Input Buffers
 199 how to manipulate multiple input sources; how to
 200 scan from strings instead of files
 201
 202 @item End-of-file Rules
 203 special rules for matching the end of the input
 204
 205 @item Miscellaneous Macros
 206 a summary of macros available to the actions
 207
 208 @item Values Available To The User
 209 a summary of values available to the actions
 210
 211 @item Interfacing With Yacc
 212 connecting flex scanners together with yacc parsers
 213
 214 @item Options
 215 flex command-line options, and the "%option"
 216 directive
 217
 218 @item Performance Considerations
 219 how to make your scanner go as fast as possible
 220
 221 @item Generating C++ Scanners
 222 the (experimental) facility for generating C++
 223 scanner classes
 224
 225 @item Incompatibilities With Lex And POSIX
 226 how flex differs from AT&T lex and the POSIX lex
 227 standard
 228
 229 @item Diagnostics
 230 those error messages produced by flex (or scanners
 231 it generates) whose meanings might not be apparent
 232
 233 @item Files
 234 files used by flex
 235
 236 @item Deficiencies / Bugs
 237 known problems with flex
 238
 239 @item See Also
 240 other documentation, related tools
 241
 242 @item Author
 243 includes contact information
 244 @end table
 245
 246 @node Description, Examples, Overview, Top
 247 @section Description
 248
 249 @code{flex} is a tool for generating @dfn{scanners}: programs which
 250 recognized lexical patterns in text.  @code{flex} reads the given
 251 input files, or its standard input if no file names are
 252 given, for a description of a scanner to generate.  The
 253 description is in the form of pairs of regular expressions
 254 and C code, called @dfn{rules}. @code{flex} generates as output a C
 255 source file, @file{lex.yy.c}, which defines a routine @samp{yylex()}.
 256 This file is compiled and linked with the @samp{-lfl} library to
 257 produce an executable.  When the executable is run, it
 258 analyzes its input for occurrences of the regular
 259 expressions.  Whenever it finds one, it executes the
 260 corresponding C code.
 261
 262 @node Examples, Format, Description, Top
 263 @section Some simple examples
 264
 265 First some simple examples to get the flavor of how one
 266 uses @code{flex}.  The following @code{flex} input specifies a scanner
 267 which whenever it encounters the string "username" will
 268 replace it with the user's login name:
 269
 270 @example
 271 %%
 272 username    printf( "%s", getlogin() );
 273 @end example
 274
 275 By default, any text not matched by a @code{flex} scanner is
 276 copied to the output, so the net effect of this scanner is
 277 to copy its input file to its output with each occurrence
 278 of "username" expanded.  In this input, there is just one
 279 rule.  "username" is the @var{pattern} and the "printf" is the
 280 @var{action}.  The "%%" marks the beginning of the rules.
 281
 282 Here's another simple example:
 283
 284 @example
 285         int num_lines = 0, num_chars = 0;
 286
 287 %%
 288 \n      ++num_lines; ++num_chars;
 289 .       ++num_chars;
 290
 291 %%
 292 main()
 293         @{
 294         yylex();
 295         printf( "# of lines = %d, # of chars = %d\n",
 296                 num_lines, num_chars );
 297         @}
 298 @end example
 299
 300 This scanner counts the number of characters and the
 301 number of lines in its input (it produces no output other
 302 than the final report on the counts).  The first line
 303 declares two globals, "num_lines" and "num_chars", which
 304 are accessible both inside @samp{yylex()} and in the @samp{main()}
 305 routine declared after the second "%%".  There are two rules,
 306 one which matches a newline ("\n") and increments both the
 307 line count and the character count, and one which matches
 308 any character other than a newline (indicated by the "."
 309 regular expression).
 310
 311 A somewhat more complicated example:
 312
 313 @example
 314 /* scanner for a toy Pascal-like language */
 315
 316 %@{
 317 /* need this for the call to atof() below */
 318 #include <math.h>
 319 %@}
 320
 321 DIGIT    [0-9]
 322 ID       [a-z][a-z0-9]*
 323
 324 %%
 325
 326 @{DIGIT@}+    @{
 327             printf( "An integer: %s (%d)\n", yytext,
 328                     atoi( yytext ) );
 329             @}
 330
 331 @{DIGIT@}+"."@{DIGIT@}*        @{
 332             printf( "A float: %s (%g)\n", yytext,
 333                     atof( yytext ) );
 334             @}
 335
 336 if|then|begin|end|procedure|function        @{
 337             printf( "A keyword: %s\n", yytext );
 338             @}
 339
 340 @{ID@}        printf( "An identifier: %s\n", yytext );
 341
 342 "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
 343
 344 "@{"[^@}\n]*"@}"     /* eat up one-line comments */
 345
 346 [ \t\n]+          /* eat up whitespace */
 347
 348 .           printf( "Unrecognized character: %s\n", yytext );
 349
 350 %%
 351
 352 main( argc, argv )
 353 int argc;
 354 char **argv;
 355     @{
 356     ++argv, --argc;  /* skip over program name */
 357     if ( argc > 0 )
 358             yyin = fopen( argv[0], "r" );
 359     else
 360             yyin = stdin;
 361
 362     yylex();
 363     @}
 364 @end example
 365
 366 This is the beginnings of a simple scanner for a language
 367 like Pascal.  It identifies different types of @var{tokens} and
 368 reports on what it has seen.
 369
 370 The details of this example will be explained in the
 371 following sections.
 372
 373 @node Format, Patterns, Examples, Top
 374 @section Format of the input file
 375
 376 The @code{flex} input file consists of three sections, separated
 377 by a line with just @samp{%%} in it:
 378
 379 @example
 380 definitions
 381 %%
 382 rules
 383 %%
 384 user code
 385 @end example
 386
 387 The @dfn{definitions} section contains declarations of simple
 388 @dfn{name} definitions to simplify the scanner specification,
 389 and declarations of @dfn{start conditions}, which are explained
 390 in a later section.
 391 Name definitions have the form:
 392
 393 @example
 394 name definition
 395 @end example
 396
 397 The "name" is a word beginning with a letter or an
 398 underscore ('_') followed by zero or more letters, digits, '_',
 399 or '-' (dash).  The definition is taken to begin at the
 400 first non-white-space character following the name and
 401 continuing to the end of the line.  The definition can
 402 subsequently be referred to using "@{name@}", which will
 403 expand to "(definition)".  For example,
 404
 405 @example
 406 DIGIT    [0-9]
 407 ID       [a-z][a-z0-9]*
 408 @end example
 409
 410 @noindent
 411 defines "DIGIT" to be a regular expression which matches a
 412 single digit, and "ID" to be a regular expression which
 413 matches a letter followed by zero-or-more
 414 letters-or-digits.  A subsequent reference to
 415
 416 @example
 417 @{DIGIT@}+"."@{DIGIT@}*
 418 @end example
 419
 420 @noindent
 421 is identical to
 422
 423 @example
 424 ([0-9])+"."([0-9])*
 425 @end example
 426
 427 @noindent
 428 and matches one-or-more digits followed by a '.' followed
 429 by zero-or-more digits.
 430
 431 The @var{rules} section of the @code{flex} input contains a series of
 432 rules of the form:
 433
 434 @example
 435 pattern   action
 436 @end example
 437
 438 @noindent
 439 where the pattern must be unindented and the action must
 440 begin on the same line.
 441
 442 See below for a further description of patterns and
 443 actions.
 444
 445 Finally, the user code section is simply copied to
 446 @file{lex.yy.c} verbatim.  It is used for companion routines
 447 which call or are called by the scanner.  The presence of
 448 this section is optional; if it is missing, the second @samp{%%}
 449 in the input file may be skipped, too.
 450
 451 In the definitions and rules sections, any @emph{indented} text or
 452 text enclosed in @samp{%@{} and @samp{%@}} is copied verbatim to the
 453 output (with the @samp{%@{@}}'s removed).  The @samp{%@{@}}'s must
 454 appear unindented on lines by themselves.
 455
 456 In the rules section, any indented or %@{@} text appearing
 457 before the first rule may be used to declare variables
 458 which are local to the scanning routine and (after the
 459 declarations) code which is to be executed whenever the
 460 scanning routine is entered.  Other indented or %@{@} text
 461 in the rule section is still copied to the output, but its
 462 meaning is not well-defined and it may well cause
 463 compile-time errors (this feature is present for @code{POSIX} compliance;
 464 see below for other such features).
 465
 466 In the definitions section (but not in the rules section),
 467 an unindented comment (i.e., a line beginning with "/*")
 468 is also copied verbatim to the output up to the next "*/".
 469
 470 @node Patterns, Matching, Format, Top
 471 @section Patterns
 472
 473 The patterns in the input are written using an extended
 474 set of regular expressions.  These are:
 475
 476 @table @samp
 477 @item x
 478 match the character @samp{x}
 479 @item .
 480 any character (byte) except newline
 481 @item [xyz]
 482 a "character class"; in this case, the pattern
 483 matches either an @samp{x}, a @samp{y}, or a @samp{z}
 484 @item [abj-oZ]
 485 a "character class" with a range in it; matches
 486 an @samp{a}, a @samp{b}, any letter from @samp{j} through @samp{o},
 487 or a @samp{Z}
 488 @item [^A-Z]
 489 a "negated character class", i.e., any character
 490 but those in the class.  In this case, any
 491 character EXCEPT an uppercase letter.
 492 @item [^A-Z\n]
 493 any character EXCEPT an uppercase letter or
 494 a newline
 495 @item @var{r}*
 496 zero or more @var{r}'s, where @var{r} is any regular expression
 497 @item @var{r}+
 498 one or more @var{r}'s
 499 @item @var{r}?
 500 zero or one @var{r}'s (that is, "an optional @var{r}")
 501 @item @var{r}@{2,5@}
 502 anywhere from two to five @var{r}'s
 503 @item @var{r}@{2,@}
 504 two or more @var{r}'s
 505 @item @var{r}@{4@}
 506 exactly 4 @var{r}'s
 507 @item @{@var{name}@}
 508 the expansion of the "@var{name}" definition
 509 (see above)
 510 @item "[xyz]\"foo"
 511 the literal string: @samp{[xyz]"foo}
 512 @item \@var{x}
 513 if @var{x} is an @samp{a}, @samp{b}, @samp{f}, @samp{n}, @samp{r}, @samp{t}, or @samp{v},
 514 then the ANSI-C interpretation of \@var{x}.
 515 Otherwise, a literal @samp{@var{x}} (used to escape
 516 operators such as @samp{*})
 517 @item \0
 518 a NUL character (ASCII code 0)
 519 @item \123
 520 the character with octal value 123
 521 @item \x2a
 522 the character with hexadecimal value @code{2a}
 523 @item (@var{r})
 524 match an @var{r}; parentheses are used to override
 525 precedence (see below)
 526 @item @var{r}@var{s}
 527 the regular expression @var{r} followed by the
 528 regular expression @var{s}; called "concatenation"
 529 @item @var{r}|@var{s}
 530 either an @var{r} or an @var{s}
 531 @item @var{r}/@var{s}
 532 an @var{r} but only if it is followed by an @var{s}.  The text
 533 matched by @var{s} is included when determining whether this rule is
 534 the @dfn{longest match}, but is then returned to the input before
 535 the action is executed.  So the action only sees the text matched
 536 by @var{r}.  This type of pattern is called @dfn{trailing context}.
 537 (There are some combinations of @samp{@var{r}/@var{s}} that @code{flex}
 538 cannot match correctly; see notes in the Deficiencies / Bugs section
 539 below regarding "dangerous trailing context".)
 540 @item ^@var{r}
 541 an @var{r}, but only at the beginning of a line (i.e.,
 542 which just starting to scan, or right after a
 543 newline has been scanned).
 544 @item @var{r}$
 545 an @var{r}, but only at the end of a line (i.e., just
 546 before a newline).  Equivalent to "@var{r}/\n".
 547
 548 Note that flex's notion of "newline" is exactly
 549 whatever the C compiler used to compile flex
 550 interprets '\n' as; in particular, on some DOS
 551 systems you must either filter out \r's in the
 552 input yourself, or explicitly use @var{r}/\r\n for "r$".
 553 @item <@var{s}>@var{r}
 554 an @var{r}, but only in start condition @var{s} (see
 555 below for discussion of start conditions)
 556 <@var{s1},@var{s2},@var{s3}>@var{r}
 557 same, but in any of start conditions @var{s1},
 558 @var{s2}, or @var{s3}
 559 @item <*>@var{r}
 560 an @var{r} in any start condition, even an exclusive one.
 561 @item <<EOF>>
 562 an end-of-file
 563 <@var{s1},@var{s2}><<EOF>>
 564 an end-of-file when in start condition @var{s1} or @var{s2}
 565 @end table
 566
 567 Note that inside of a character class, all regular
 568 expression operators lose their special meaning except escape
 569 ('\') and the character class operators, '-', ']', and, at
 570 the beginning of the class, '^'.
 571
 572 The regular expressions listed above are grouped according
 573 to precedence, from highest precedence at the top to
 574 lowest at the bottom.  Those grouped together have equal
 575 precedence.  For example,
 576
 577 @example
 578 foo|bar*
 579 @end example
 580
 581 @noindent
 582 is the same as
 583
 584 @example
 585 (foo)|(ba(r*))
 586 @end example
 587
 588 @noindent
 589 since the '*' operator has higher precedence than
 590 concatenation, and concatenation higher than alternation ('|').
 591 This pattern therefore matches @emph{either} the string "foo" @emph{or}
 592 the string "ba" followed by zero-or-more r's.  To match
 593 "foo" or zero-or-more "bar"'s, use:
 594
 595 @example
 596 foo|(bar)*
 597 @end example
 598
 599 @noindent
 600 and to match zero-or-more "foo"'s-or-"bar"'s:
 601
 602 @example
 603 (foo|bar)*
 604 @end example
 605
 606 In addition to characters and ranges of characters,
 607 character classes can also contain character class
 608 @dfn{expressions}.  These are expressions enclosed inside @samp{[}: and @samp{:}]
 609 delimiters (which themselves must appear between the '['
 610 and ']' of the character class; other elements may occur
 611 inside the character class, too).  The valid expressions
 612 are:
 613
 614 @example
 615 [:alnum:] [:alpha:] [:blank:]
 616 [:cntrl:] [:digit:] [:graph:]
 617 [:lower:] [:print:] [:punct:]
 618 [:space:] [:upper:] [:xdigit:]
 619 @end example
 620
 621 These expressions all designate a set of characters
 622 equivalent to the corresponding standard C @samp{isXXX} function.  For
 623 example, @samp{[:alnum:]} designates those characters for which
 624 @samp{isalnum()} returns true - i.e., any alphabetic or numeric.
 625 Some systems don't provide @samp{isblank()}, so flex defines
 626 @samp{[:blank:]} as a blank or a tab.
 627
 628 For example, the following character classes are all
 629 equivalent:
 630
 631 @example
 632 [[:alnum:]]
 633 [[:alpha:][:digit:]
 634 [[:alpha:]0-9]
 635 [a-zA-Z0-9]
 636 @end example
 637
 638 If your scanner is case-insensitive (the @samp{-i} flag), then
 639 @samp{[:upper:]} and @samp{[:lower:]} are equivalent to @samp{[:alpha:]}.
 640
 641 Some notes on patterns:
 642
 643 @itemize -
 644 @item
 645 A negated character class such as the example
 646 "[^A-Z]" above @emph{will match a newline} unless "\n" (or an
 647 equivalent escape sequence) is one of the
 648 characters explicitly present in the negated character
 649 class (e.g., "[^A-Z\n]").  This is unlike how many
 650 other regular expression tools treat negated
 651 character classes, but unfortunately the inconsistency
 652 is historically entrenched.  Matching newlines
 653 means that a pattern like [^"]* can match the
 654 entire input unless there's another quote in the
 655 input.
 656
 657 @item
 658 A rule can have at most one instance of trailing
 659 context (the '/' operator or the '$' operator).
 660 The start condition, '^', and "<<EOF>>" patterns
 661 can only occur at the beginning of a pattern, and,
 662 as well as with '/' and '$', cannot be grouped
 663 inside parentheses.  A '^' which does not occur at
 664 the beginning of a rule or a '$' which does not
 665 occur at the end of a rule loses its special
 666 properties and is treated as a normal character.
 667
 668 The following are illegal:
 669
 670 @example
 671 foo/bar$
 672 <sc1>foo<sc2>bar
 673 @end example
 674
 675 Note that the first of these, can be written
 676 "foo/bar\n".
 677
 678 The following will result in '$' or '^' being
 679 treated as a normal character:
 680
 681 @example
 682 foo|(bar$)
 683 foo|^bar
 684 @end example
 685
 686 If what's wanted is a "foo" or a
 687 bar-followed-by-a-newline, the following could be used (the special
 688 '|' action is explained below):
 689
 690 @example
 691 foo      |
 692 bar$     /* action goes here */
 693 @end example
 694
 695 A similar trick will work for matching a foo or a
 696 bar-at-the-beginning-of-a-line.
 697 @end itemize
 698
 699 @node Matching, Actions, Patterns, Top
 700 @section How the input is matched
 701
 702 When the generated scanner is run, it analyzes its input
 703 looking for strings which match any of its patterns.  If
 704 it finds more than one match, it takes the one matching
 705 the most text (for trailing context rules, this includes
 706 the length of the trailing part, even though it will then
 707 be returned to the input).  If it finds two or more
 708 matches of the same length, the rule listed first in the
 709 @code{flex} input file is chosen.
 710
 711 Once the match is determined, the text corresponding to
 712 the match (called the @var{token}) is made available in the
 713 global character pointer @code{yytext}, and its length in the
 714 global integer @code{yyleng}.  The @var{action} corresponding to the
 715 matched pattern is then executed (a more detailed
 716 description of actions follows), and then the remaining input is
 717 scanned for another match.
 718
 719 If no match is found, then the @dfn{default rule} is executed:
 720 the next character in the input is considered matched and
 721 copied to the standard output.  Thus, the simplest legal
 722 @code{flex} input is:
 723
 724 @example
 725 %%
 726 @end example
 727
 728 which generates a scanner that simply copies its input
 729 (one character at a time) to its output.
 730
 731 Note that @code{yytext} can be defined in two different ways:
 732 either as a character @emph{pointer} or as a character @emph{array}.
 733 You can control which definition @code{flex} uses by including
 734 one of the special directives @samp{%pointer} or @samp{%array} in the
 735 first (definitions) section of your flex input.  The
 736 default is @samp{%pointer}, unless you use the @samp{-l} lex
 737 compatibility option, in which case @code{yytext} will be an array.  The
 738 advantage of using @samp{%pointer} is substantially faster
 739 scanning and no buffer overflow when matching very large
 740 tokens (unless you run out of dynamic memory).  The
 741 disadvantage is that you are restricted in how your actions can
 742 modify @code{yytext} (see the next section), and calls to the
 743 @samp{unput()} function destroys the present contents of @code{yytext},
 744 which can be a considerable porting headache when moving
 745 between different @code{lex} versions.
 746
 747 The advantage of @samp{%array} is that you can then modify @code{yytext}
 748 to your heart's content, and calls to @samp{unput()} do not
 749 destroy @code{yytext} (see below).  Furthermore, existing @code{lex}
 750 programs sometimes access @code{yytext} externally using
 751 declarations of the form:
 752 @example
 753 extern char yytext[];
 754 @end example
 755 This definition is erroneous when used with @samp{%pointer}, but
 756 correct for @samp{%array}.
 757
 758 @samp{%array} defines @code{yytext} to be an array of @code{YYLMAX} characters,
 759 which defaults to a fairly large value.  You can change
 760 the size by simply #define'ing @code{YYLMAX} to a different value
 761 in the first section of your @code{flex} input.  As mentioned
 762 above, with @samp{%pointer} yytext grows dynamically to
 763 accommodate large tokens.  While this means your @samp{%pointer} scanner
 764 can accommodate very large tokens (such as matching entire
 765 blocks of comments), bear in mind that each time the
 766 scanner must resize @code{yytext} it also must rescan the entire
 767 token from the beginning, so matching such tokens can
 768 prove slow.  @code{yytext} presently does @emph{not} dynamically grow if
 769 a call to @samp{unput()} results in too much text being pushed
 770 back; instead, a run-time error results.
 771
 772 Also note that you cannot use @samp{%array} with C++ scanner
 773 classes (the @code{c++} option; see below).
 774
 775 @node Actions, Generated scanner, Matching, Top
 776 @section Actions
 777
 778 Each pattern in a rule has a corresponding action, which
 779 can be any arbitrary C statement.  The pattern ends at the
 780 first non-escaped whitespace character; the remainder of
 781 the line is its action.  If the action is empty, then when
 782 the pattern is matched the input token is simply
 783 discarded.  For example, here is the specification for a
 784 program which deletes all occurrences of "zap me" from its
 785 input:
 786
 787 @example
 788 %%
 789 "zap me"
 790 @end example
 791
 792 (It will copy all other characters in the input to the
 793 output since they will be matched by the default rule.)
 794
 795 Here is a program which compresses multiple blanks and
 796 tabs down to a single blank, and throws away whitespace
 797 found at the end of a line:
 798
 799 @example
 800 %%
 801 [ \t]+        putchar( ' ' );
 802 [ \t]+$       /* ignore this token */
 803 @end example
 804
 805 If the action contains a '@{', then the action spans till
 806 the balancing '@}' is found, and the action may cross
 807 multiple lines.  @code{flex} knows about C strings and comments and
 808 won't be fooled by braces found within them, but also
 809 allows actions to begin with @samp{%@{} and will consider the
 810 action to be all the text up to the next @samp{%@}} (regardless of
 811 ordinary braces inside the action).
 812
 813 An action consisting solely of a vertical bar ('|') means
 814 "same as the action for the next rule." See below for an
 815 illustration.
 816
 817 Actions can include arbitrary C code, including @code{return}
 818 statements to return a value to whatever routine called
 819 @samp{yylex()}.  Each time @samp{yylex()} is called it continues
 820 processing tokens from where it last left off until it either
 821 reaches the end of the file or executes a return.
 822
 823 Actions are free to modify @code{yytext} except for lengthening
 824 it (adding characters to its end--these will overwrite
 825 later characters in the input stream).  This however does
 826 not apply when using @samp{%array} (see above); in that case,
 827 @code{yytext} may be freely modified in any way.
 828
 829 Actions are free to modify @code{yyleng} except they should not
 830 do so if the action also includes use of @samp{yymore()} (see
 831 below).
 832
 833 There are a number of special directives which can be
 834 included within an action:
 835
 836 @itemize -
 837 @item
 838 @samp{ECHO} copies yytext to the scanner's output.
 839
 840 @item
 841 @code{BEGIN} followed by the name of a start condition
 842 places the scanner in the corresponding start
 843 condition (see below).
 844
 845 @item
 846 @code{REJECT} directs the scanner to proceed on to the
 847 "second best" rule which matched the input (or a
 848 prefix of the input).  The rule is chosen as
 849 described above in "How the Input is Matched", and
 850 @code{yytext} and @code{yyleng} set up appropriately.  It may
 851 either be one which matched as much text as the
 852 originally chosen rule but came later in the @code{flex}
 853 input file, or one which matched less text.  For
 854 example, the following will both count the words in
 855 the input and call the routine special() whenever
 856 "frob" is seen:
 857
 858 @example
 859         int word_count = 0;
 860 %%
 861
 862 frob        special(); REJECT;
 863 [^ \t\n]+   ++word_count;
 864 @end example
 865
 866 Without the @code{REJECT}, any "frob"'s in the input would
 867 not be counted as words, since the scanner normally
 868 executes only one action per token.  Multiple
 869 @code{REJECT's} are allowed, each one finding the next
 870 best choice to the currently active rule.  For
 871 example, when the following scanner scans the token
 872 "abcd", it will write "abcdabcaba" to the output:
 873
 874 @example
 875 %%
 876 a        |
 877 ab       |
 878 abc      |
 879 abcd     ECHO; REJECT;
 880 .|\n     /* eat up any unmatched character */
 881 @end example
 882
 883 (The first three rules share the fourth's action
 884 since they use the special '|' action.)  @code{REJECT} is
 885 a particularly expensive feature in terms of
 886 scanner performance; if it is used in @emph{any} of the
 887 scanner's actions it will slow down @emph{all} of the
 888 scanner's matching.  Furthermore, @code{REJECT} cannot be used
 889 with the @samp{-Cf} or @samp{-CF} options (see below).
 890
 891 Note also that unlike the other special actions,
 892 @code{REJECT} is a @emph{branch}; code immediately following it
 893 in the action will @emph{not} be executed.
 894
 895 @item
 896 @samp{yymore()} tells the scanner that the next time it
 897 matches a rule, the corresponding token should be
 898 @emph{appended} onto the current value of @code{yytext} rather
 899 than replacing it.  For example, given the input
 900 "mega-kludge" the following will write
 901 "mega-mega-kludge" to the output:
 902
 903 @example
 904 %%
 905 mega-    ECHO; yymore();
 906 kludge   ECHO;
 907 @end example
 908
 909 First "mega-" is matched and echoed to the output.
 910 Then "kludge" is matched, but the previous "mega-"
 911 is still hanging around at the beginning of @code{yytext}
 912 so the @samp{ECHO} for the "kludge" rule will actually
 913 write "mega-kludge".
 914 @end itemize
 915
 916 Two notes regarding use of @samp{yymore()}.  First, @samp{yymore()}
 917 depends on the value of @code{yyleng} correctly reflecting the
 918 size of the current token, so you must not modify @code{yyleng}
 919 if you are using @samp{yymore()}.  Second, the presence of
 920 @samp{yymore()} in the scanner's action entails a minor
 921 performance penalty in the scanner's matching speed.
 922
 923 @itemize -
 924 @item
 925 @samp{yyless(n)} returns all but the first @var{n} characters of
 926 the current token back to the input stream, where
 927 they will be rescanned when the scanner looks for
 928 the next match.  @code{yytext} and @code{yyleng} are adjusted
 929 appropriately (e.g., @code{yyleng} will now be equal to @var{n}
 930 ).  For example, on the input "foobar" the
 931 following will write out "foobarbar":
 932
 933 @example
 934 %%
 935 foobar    ECHO; yyless(3);
 936 [a-z]+    ECHO;
 937 @end example
 938
 939 An argument of 0 to @code{yyless} will cause the entire
 940 current input string to be scanned again.  Unless
 941 you've changed how the scanner will subsequently
 942 process its input (using @code{BEGIN}, for example), this
 943 will result in an endless loop.
 944
 945 Note that @code{yyless} is a macro and can only be used in the
 946 flex input file, not from other source files.
 947
 948 @item
 949 @samp{unput(c)} puts the character @code{c} back onto the input
 950 stream.  It will be the next character scanned.
 951 The following action will take the current token
 952 and cause it to be rescanned enclosed in
 953 parentheses.
 954
 955 @example
 956 @{
 957 int i;
 958 /* Copy yytext because unput() trashes yytext */
 959 char *yycopy = strdup( yytext );
 960 unput( ')' );
 961 for ( i = yyleng - 1; i >= 0; --i )
 962     unput( yycopy[i] );
 963 unput( '(' );
 964 free( yycopy );
 965 @}
 966 @end example
 967
 968 Note that since each @samp{unput()} puts the given
 969 character back at the @emph{beginning} of the input stream,
 970 pushing back strings must be done back-to-front.
 971 An important potential problem when using @samp{unput()} is that
 972 if you are using @samp{%pointer} (the default), a call to @samp{unput()}
 973 @emph{destroys} the contents of @code{yytext}, starting with its
 974 rightmost character and devouring one character to the left
 975 with each call.  If you need the value of yytext preserved
 976 after a call to @samp{unput()} (as in the above example), you
 977 must either first copy it elsewhere, or build your scanner
 978 using @samp{%array} instead (see How The Input Is Matched).
 979
 980 Finally, note that you cannot put back @code{EOF} to attempt to
 981 mark the input stream with an end-of-file.
 982
 983 @item
 984 @samp{input()} reads the next character from the input
 985 stream.  For example, the following is one way to
 986 eat up C comments:
 987
 988 @example
 989 %%
 990 "/*"        @{
 991             register int c;
 992
 993             for ( ; ; )
 994                 @{
 995                 while ( (c = input()) != '*' &&
 996                         c != EOF )
 997                     ;    /* eat up text of comment */
 998
 999                 if ( c == '*' )
1000                     @{
1001                     while ( (c = input()) == '*' )
1002                         ;
1003                     if ( c == '/' )
1004                         break;    /* found the end */
1005                     @}
1006
1007                 if ( c == EOF )
1008                     @{
1009                     error( "EOF in comment" );
1010                     break;
1011                     @}
1012                 @}
1013             @}
1014 @end example
1015
1016 (Note that if the scanner is compiled using @samp{C++},
1017 then @samp{input()} is instead referred to as @samp{yyinput()},
1018 in order to avoid a name clash with the @samp{C++} stream
1019 by the name of @code{input}.)
1020
1021 @item YY_FLUSH_BUFFER
1022 flushes the scanner's internal buffer so that the next time the scanner
1023 attempts to match a token, it will first refill the buffer using
1024 @code{YY_INPUT} (see The Generated Scanner, below).  This action is
1025 a special case of the more general @samp{yy_flush_buffer()} function,
1026 described below in the section Multiple Input Buffers.
1027
1028 @item
1029 @samp{yyterminate()} can be used in lieu of a return
1030 statement in an action.  It terminates the scanner
1031 and returns a 0 to the scanner's caller, indicating
1032 "all done".  By default, @samp{yyterminate()} is also
1033 called when an end-of-file is encountered.  It is a
1034 macro and may be redefined.
1035 @end itemize
1036
1037 @node Generated scanner, Start conditions, Actions, Top
1038 @section The generated scanner
1039
1040 The output of @code{flex} is the file @file{lex.yy.c}, which contains
1041 the scanning routine @samp{yylex()}, a number of tables used by
1042 it for matching tokens, and a number of auxiliary routines
1043 and macros.  By default, @samp{yylex()} is declared as follows:
1044
1045 @example
1046 int yylex()
1047     @{
1048     @dots{} various definitions and the actions in here @dots{}
1049     @}
1050 @end example
1051
1052 (If your environment supports function prototypes, then it
1053 will be "int yylex( void  )".)   This  definition  may  be
1054 changed by defining the "YY_DECL" macro.  For example, you
1055 could use:
1056
1057 @example
1058 #define YY_DECL float lexscan( a, b ) float a, b;
1059 @end example
1060
1061 to give the scanning routine the name @code{lexscan}, returning a
1062 float, and taking two floats as arguments.  Note that if
1063 you give arguments to the scanning routine using a
1064 K&R-style/non-prototyped function declaration, you must
1065 terminate the definition with a semi-colon (@samp{;}).
1066
1067 Whenever @samp{yylex()} is called, it scans tokens from the
1068 global input file @code{yyin} (which defaults to stdin).  It
1069 continues until it either reaches an end-of-file (at which
1070 point it returns the value 0) or one of its actions
1071 executes a @code{return} statement.
1072
1073 If the scanner reaches an end-of-file, subsequent calls are undefined
1074 unless either @code{yyin} is pointed at a new input file (in which case
1075 scanning continues from that file), or @samp{yyrestart()} is called.
1076 @samp{yyrestart()} takes one argument, a @samp{FILE *} pointer (which
1077 can be nil, if you've set up @code{YY_INPUT} to scan from a source
1078 other than @code{yyin}), and initializes @code{yyin} for scanning from
1079 that file.  Essentially there is no difference between just assigning
1080 @code{yyin} to a new input file or using @samp{yyrestart()} to do so;
1081 the latter is available for compatibility with previous versions of
1082 @code{flex}, and because it can be used to switch input files in the
1083 middle of scanning.  It can also be used to throw away the current
1084 input buffer, by calling it with an argument of @code{yyin}; but
1085 better is to use @code{YY_FLUSH_BUFFER} (see above).  Note that
1086 @samp{yyrestart()} does @emph{not} reset the start condition to
1087 @code{INITIAL} (see Start Conditions, below).
1088
1089
1090 If @samp{yylex()} stops scanning due to executing a @code{return}
1091 statement in one of the actions, the scanner may then be called
1092 again and it will resume scanning where it left off.
1093
1094 By default (and for purposes of efficiency), the scanner
1095 uses block-reads rather than simple @samp{getc()} calls to read
1096 characters from @code{yyin}.  The nature of how it gets its input
1097 can be controlled by defining the @code{YY_INPUT} macro.
1098 YY_INPUT's calling sequence is
1099 "YY_INPUT(buf,result,max_size)".  Its action is to place
1100 up to @var{max_size} characters in the character array @var{buf} and
1101 return in the integer variable @var{result} either the number of
1102 characters read or the constant YY_NULL (0 on Unix
1103 systems) to indicate EOF.  The default YY_INPUT reads from
1104 the global file-pointer "yyin".
1105
1106 A sample definition of YY_INPUT (in the definitions
1107 section of the input file):
1108
1109 @example
1110 %@{
1111 #define YY_INPUT(buf,result,max_size) \
1112     @{ \
1113     int c = getchar(); \
1114     result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
1115     @}
1116 %@}
1117 @end example
1118
1119 This definition will change the input processing to occur
1120 one character at a time.
1121
1122 When the scanner receives an end-of-file indication from
1123 YY_INPUT, it then checks the @samp{yywrap()} function.  If
1124 @samp{yywrap()} returns false (zero), then it is assumed that the
1125 function has gone ahead and set up @code{yyin} to point to
1126 another input file, and scanning continues.  If it returns
1127 true (non-zero), then the scanner terminates, returning 0
1128 to its caller.  Note that in either case, the start
1129 condition remains unchanged; it does @emph{not} revert to @code{INITIAL}.
1130
1131 If you do not supply your own version of @samp{yywrap()}, then you
1132 must either use @samp{%option noyywrap} (in which case the scanner
1133 behaves as though @samp{yywrap()} returned 1), or you must link with
1134 @samp{-lfl} to obtain the default version of the routine, which always
1135 returns 1.
1136
1137 Three routines are available for scanning from in-memory
1138 buffers rather than files: @samp{yy_scan_string()},
1139 @samp{yy_scan_bytes()}, and @samp{yy_scan_buffer()}.  See the discussion
1140 of them below in the section Multiple Input Buffers.
1141
1142 The scanner writes its @samp{ECHO} output to the @code{yyout} global
1143 (default, stdout), which may be redefined by the user
1144 simply by assigning it to some other @code{FILE} pointer.
1145
1146 @node Start conditions, Multiple buffers, Generated scanner, Top
1147 @section Start conditions
1148
1149 @code{flex} provides a mechanism for conditionally activating
1150 rules.  Any rule whose pattern is prefixed with "<sc>"
1151 will only be active when the scanner is in the start
1152 condition named "sc".  For example,
1153
1154 @example
1155 <STRING>[^"]*        @{ /* eat up the string body ... */
1156             @dots{}
1157             @}
1158 @end example
1159
1160 @noindent
1161 will be active only when the scanner is in the "STRING"
1162 start condition, and
1163
1164 @example
1165 <INITIAL,STRING,QUOTE>\.        @{ /* handle an escape ... */
1166             @dots{}
1167             @}
1168 @end example
1169
1170 @noindent
1171 will be active only when the current start condition is
1172 either "INITIAL", "STRING", or "QUOTE".
1173
1174 Start conditions are declared in the definitions (first)
1175 section of the input using unindented lines beginning with
1176 either @samp{%s} or @samp{%x} followed by a list of names.  The former
1177 declares @emph{inclusive} start conditions, the latter @emph{exclusive}
1178 start conditions.  A start condition is activated using
1179 the @code{BEGIN} action.  Until the next @code{BEGIN} action is
1180 executed, rules with the given start condition will be active
1181 and rules with other start conditions will be inactive.
1182 If the start condition is @emph{inclusive}, then rules with no
1183 start conditions at all will also be active.  If it is
1184 @emph{exclusive}, then @emph{only} rules qualified with the start
1185 condition will be active.  A set of rules contingent on the
1186 same exclusive start condition describe a scanner which is
1187 independent of any of the other rules in the @code{flex} input.
1188 Because of this, exclusive start conditions make it easy
1189 to specify "mini-scanners" which scan portions of the
1190 input that are syntactically different from the rest
1191 (e.g., comments).
1192
1193 If the distinction between inclusive and exclusive start
1194 conditions is still a little vague, here's a simple
1195 example illustrating the connection between the two.  The set
1196 of rules:
1197
1198 @example
1199 %s example
1200 %%
1201
1202 <example>foo   do_something();
1203
1204 bar            something_else();
1205 @end example
1206
1207 @noindent
1208 is equivalent to
1209
1210 @example
1211 %x example
1212 %%
1213
1214 <example>foo   do_something();
1215
1216 <INITIAL,example>bar    something_else();
1217 @end example
1218
1219 Without the @samp{<INITIAL,example>} qualifier, the @samp{bar} pattern
1220 in the second example wouldn't be active (i.e., couldn't match) when
1221 in start condition @samp{example}.  If we just used @samp{<example>}
1222 to qualify @samp{bar}, though, then it would only be active in
1223 @samp{example} and not in @code{INITIAL}, while in the first example
1224 it's active in both, because in the first example the @samp{example}
1225 starting condition is an @emph{inclusive} (@samp{%s}) start condition.
1226
1227 Also note that the special start-condition specifier @samp{<*>}
1228 matches every start condition.  Thus, the above example
1229 could also have been written;
1230
1231 @example
1232 %x example
1233 %%
1234
1235 <example>foo   do_something();
1236
1237 <*>bar    something_else();
1238 @end example
1239
1240 The default rule (to @samp{ECHO} any unmatched character) remains
1241 active in start conditions.  It is equivalent to:
1242
1243 @example
1244 <*>.|\\n     ECHO;
1245 @end example
1246
1247 @samp{BEGIN(0)} returns to the original state where only the
1248 rules with no start conditions are active.  This state can
1249 also be referred to as the start-condition "INITIAL", so
1250 @samp{BEGIN(INITIAL)} is equivalent to @samp{BEGIN(0)}.  (The
1251 parentheses around the start condition name are not required but
1252 are considered good style.)
1253
1254 @code{BEGIN} actions can also be given as indented code at the
1255 beginning of the rules section.  For example, the
1256 following will cause the scanner to enter the "SPECIAL" start
1257 condition whenever @samp{yylex()} is called and the global
1258 variable @code{enter_special} is true:
1259
1260 @example
1261         int enter_special;
1262
1263 %x SPECIAL
1264 %%
1265         if ( enter_special )
1266             BEGIN(SPECIAL);
1267
1268 <SPECIAL>blahblahblah
1269 @dots{}more rules follow@dots{}
1270 @end example
1271
1272 To illustrate the uses of start conditions, here is a
1273 scanner which provides two different interpretations of a
1274 string like "123.456".  By default it will treat it as as
1275 three tokens, the integer "123", a dot ('.'), and the
1276 integer "456".  But if the string is preceded earlier in
1277 the line by the string "expect-floats" it will treat it as
1278 a single token, the floating-point number 123.456:
1279
1280 @example
1281 %@{
1282 #include <math.h>
1283 %@}
1284 %s expect
1285
1286 %%
1287 expect-floats        BEGIN(expect);
1288
1289 <expect>[0-9]+"."[0-9]+      @{
1290             printf( "found a float, = %f\n",
1291                     atof( yytext ) );
1292             @}
1293 <expect>\n           @{
1294             /* that's the end of the line, so
1295              * we need another "expect-number"
1296              * before we'll recognize any more
1297              * numbers
1298              */
1299             BEGIN(INITIAL);
1300             @}
1301
1302 [0-9]+      @{
1303
1304 Version 2.5               December 1994                        18
1305
1306             printf( "found an integer, = %d\n",
1307                     atoi( yytext ) );
1308             @}
1309
1310 "."         printf( "found a dot\n" );
1311 @end example
1312
1313 Here is a scanner which recognizes (and discards) C
1314 comments while maintaining a count of the current input line.
1315
1316 @example
1317 %x comment
1318 %%
1319         int line_num = 1;
1320
1321 "/*"         BEGIN(comment);
1322
1323 <comment>[^*\n]*        /* eat anything that's not a '*' */
1324 <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1325 <comment>\n             ++line_num;
1326 <comment>"*"+"/"        BEGIN(INITIAL);
1327 @end example
1328
1329 This scanner goes to a bit of trouble to match as much
1330 text as possible with each rule.  In general, when
1331 attempting to write a high-speed scanner try to match as
1332 much possible in each rule, as it's a big win.
1333
1334 Note that start-conditions names are really integer values
1335 and can be stored as such.  Thus, the above could be
1336 extended in the following fashion:
1337
1338 @example
1339 %x comment foo
1340 %%
1341         int line_num = 1;
1342         int comment_caller;
1343
1344 "/*"         @{
1345              comment_caller = INITIAL;
1346              BEGIN(comment);
1347              @}
1348
1349 @dots{}
1350
1351 <foo>"/*"    @{
1352              comment_caller = foo;
1353              BEGIN(comment);
1354              @}
1355
1356 <comment>[^*\n]*        /* eat anything that's not a '*' */
1357 <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1358 <comment>\n             ++line_num;
1359 <comment>"*"+"/"        BEGIN(comment_caller);
1360 @end example
1361
1362 Furthermore, you can access the current start condition
1363 using the integer-valued @code{YY_START} macro.  For example, the
1364 above assignments to @code{comment_caller} could instead be
1365 written
1366
1367 @example
1368 comment_caller = YY_START;
1369 @end example
1370
1371 Flex provides @code{YYSTATE} as an alias for @code{YY_START} (since that
1372 is what's used by AT&T @code{lex}).
1373
1374 Note that start conditions do not have their own
1375 name-space; %s's and %x's declare names in the same fashion as
1376 #define's.
1377
1378 Finally, here's an example of how to match C-style quoted
1379 strings using exclusive start conditions, including
1380 expanded escape sequences (but not including checking for
1381 a string that's too long):
1382
1383 @example
1384 %x str
1385
1386 %%
1387         char string_buf[MAX_STR_CONST];
1388         char *string_buf_ptr;
1389
1390 \"      string_buf_ptr = string_buf; BEGIN(str);
1391
1392 <str>\"        @{ /* saw closing quote - all done */
1393         BEGIN(INITIAL);
1394         *string_buf_ptr = '\0';
1395         /* return string constant token type and
1396          * value to parser
1397          */
1398         @}
1399
1400 <str>\n        @{
1401         /* error - unterminated string constant */
1402         /* generate error message */
1403         @}
1404
1405 <str>\\[0-7]@{1,3@} @{
1406         /* octal escape sequence */
1407         int result;
1408
1409         (void) sscanf( yytext + 1, "%o", &result );
1410
1411         if ( result > 0xff )
1412                 /* error, constant is out-of-bounds */
1413
1414         *string_buf_ptr++ = result;
1415         @}
1416
1417 <str>\\[0-9]+ @{
1418         /* generate error - bad escape sequence; something
1419          * like '\48' or '\0777777'
1420          */
1421         @}
1422
1423 <str>\\n  *string_buf_ptr++ = '\n';
1424 <str>\\t  *string_buf_ptr++ = '\t';
1425 <str>\\r  *string_buf_ptr++ = '\r';
1426 <str>\\b  *string_buf_ptr++ = '\b';
1427 <str>\\f  *string_buf_ptr++ = '\f';
1428
1429 <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1430
1431 <str>[^\\\n\"]+        @{
1432         char *yptr = yytext;
1433
1434         while ( *yptr )
1435                 *string_buf_ptr++ = *yptr++;
1436         @}
1437 @end example
1438
1439 Often, such as in some of the examples above, you wind up
1440 writing a whole bunch of rules all preceded by the same
1441 start condition(s).  Flex makes this a little easier and
1442 cleaner by introducing a notion of start condition @dfn{scope}.
1443 A start condition scope is begun with:
1444
1445 @example
1446 <SCs>@{
1447 @end example
1448
1449 @noindent
1450 where SCs is a list of one or more start conditions.
1451 Inside the start condition scope, every rule automatically
1452 has the prefix @samp{<SCs>} applied to it, until a @samp{@}} which
1453 matches the initial @samp{@{}.  So, for example,
1454
1455 @example
1456 <ESC>@{
1457     "\\n"   return '\n';
1458     "\\r"   return '\r';
1459     "\\f"   return '\f';
1460     "\\0"   return '\0';
1461 @}
1462 @end example
1463
1464 @noindent
1465 is equivalent to:
1466
1467 @example
1468 <ESC>"\\n"  return '\n';
1469 <ESC>"\\r"  return '\r';
1470 <ESC>"\\f"  return '\f';
1471 <ESC>"\\0"  return '\0';
1472 @end example
1473
1474 Start condition scopes may be nested.
1475
1476 Three routines are available for manipulating stacks of
1477 start conditions:
1478
1479 @table @samp
1480 @item void yy_push_state(int new_state)
1481 pushes the current start condition onto the top of
1482 the start condition stack and switches to @var{new_state}
1483 as though you had used @samp{BEGIN new_state} (recall that
1484 start condition names are also integers).
1485
1486 @item void yy_pop_state()
1487 pops the top of the stack and switches to it via
1488 @code{BEGIN}.
1489
1490 @item int yy_top_state()
1491 returns the top of the stack without altering the
1492 stack's contents.
1493 @end table
1494
1495 The start condition stack grows dynamically and so has no
1496 built-in size limitation.  If memory is exhausted, program
1497 execution aborts.
1498
1499 To use start condition stacks, your scanner must include a
1500 @samp{%option stack} directive (see Options below).
1501
1502 @node Multiple buffers, End-of-file rules, Start conditions, Top
1503 @section Multiple input buffers
1504
1505 Some scanners (such as those which support "include"
1506 files) require reading from several input streams.  As
1507 @code{flex} scanners do a large amount of buffering, one cannot
1508 control where the next input will be read from by simply
1509 writing a @code{YY_INPUT} which is sensitive to the scanning
1510 context.  @code{YY_INPUT} is only called when the scanner reaches
1511 the end of its buffer, which may be a long time after
1512 scanning a statement such as an "include" which requires
1513 switching the input source.
1514
1515 To negotiate these sorts of problems, @code{flex} provides a
1516 mechanism for creating and switching between multiple
1517 input buffers.  An input buffer is created by using:
1518
1519 @example
1520 YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1521 @end example
1522
1523 @noindent
1524 which takes a @code{FILE} pointer and a size and creates a buffer
1525 associated with the given file and large enough to hold
1526 @var{size} characters (when in doubt, use @code{YY_BUF_SIZE} for the
1527 size).  It returns a @code{YY_BUFFER_STATE} handle, which may
1528 then be passed to other routines (see below).  The
1529 @code{YY_BUFFER_STATE} type is a pointer to an opaque @code{struct}
1530 @code{yy_buffer_state} structure, so you may safely initialize
1531 YY_BUFFER_STATE variables to @samp{((YY_BUFFER_STATE) 0)} if you
1532 wish, and also refer to the opaque structure in order to
1533 correctly declare input buffers in source files other than
1534 that of your scanner.  Note that the @code{FILE} pointer in the
1535 call to @code{yy_create_buffer} is only used as the value of @code{yyin}
1536 seen by @code{YY_INPUT}; if you redefine @code{YY_INPUT} so it no longer
1537 uses @code{yyin}, then you can safely pass a nil @code{FILE} pointer to
1538 @code{yy_create_buffer}.  You select a particular buffer to scan
1539 from using:
1540
1541 @example
1542 void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1543 @end example
1544
1545 switches the scanner's input buffer so subsequent tokens
1546 will come from @var{new_buffer}.  Note that
1547 @samp{yy_switch_to_buffer()} may be used by @samp{yywrap()} to set
1548 things up for continued scanning, instead of opening a new
1549 file and pointing @code{yyin} at it.  Note also that switching
1550 input sources via either @samp{yy_switch_to_buffer()} or @samp{yywrap()}
1551 does @emph{not} change the start condition.
1552
1553 @example
1554 void yy_delete_buffer( YY_BUFFER_STATE buffer )
1555 @end example
1556
1557 @noindent
1558 is used to reclaim the storage associated with a buffer.
1559 You can also clear the current contents of a buffer using:
1560
1561 @example
1562 void yy_flush_buffer( YY_BUFFER_STATE buffer )
1563 @end example
1564
1565 This function discards the buffer's contents, so the next time the
1566 scanner attempts to match a token from the buffer, it will first fill
1567 the buffer anew using @code{YY_INPUT}.
1568
1569 @samp{yy_new_buffer()} is an alias for @samp{yy_create_buffer()},
1570 provided for compatibility with the C++ use of @code{new} and @code{delete}
1571 for creating and destroying dynamic objects.
1572
1573 Finally, the @code{YY_CURRENT_BUFFER} macro returns a
1574 @code{YY_BUFFER_STATE} handle to the current buffer.
1575
1576 Here is an example of using these features for writing a
1577 scanner which expands include files (the @samp{<<EOF>>} feature
1578 is discussed below):
1579
1580 @example
1581 /* the "incl" state is used for picking up the name
1582  * of an include file
1583  */
1584 %x incl
1585
1586 %@{
1587 #define MAX_INCLUDE_DEPTH 10
1588 YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1589 int include_stack_ptr = 0;
1590 %@}
1591
1592 %%
1593 include             BEGIN(incl);
1594
1595 [a-z]+              ECHO;
1596 [^a-z\n]*\n?        ECHO;
1597
1598 <incl>[ \t]*      /* eat the whitespace */
1599 <incl>[^ \t\n]+   @{ /* got the include file name */
1600         if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1601             @{
1602             fprintf( stderr, "Includes nested too deeply" );
1603             exit( 1 );
1604             @}
1605
1606         include_stack[include_stack_ptr++] =
1607             YY_CURRENT_BUFFER;
1608
1609         yyin = fopen( yytext, "r" );
1610
1611         if ( ! yyin )
1612             error( @dots{} );
1613
1614         yy_switch_to_buffer(
1615             yy_create_buffer( yyin, YY_BUF_SIZE ) );
1616
1617         BEGIN(INITIAL);
1618         @}
1619
1620 <<EOF>> @{
1621         if ( --include_stack_ptr < 0 )
1622             @{
1623             yyterminate();
1624             @}
1625
1626         else
1627             @{
1628             yy_delete_buffer( YY_CURRENT_BUFFER );
1629             yy_switch_to_buffer(
1630                  include_stack[include_stack_ptr] );
1631             @}
1632         @}
1633 @end example
1634
1635 Three routines are available for setting up input buffers
1636 for scanning in-memory strings instead of files.  All of
1637 them create a new input buffer for scanning the string,
1638 and return a corresponding @code{YY_BUFFER_STATE} handle (which
1639 you should delete with @samp{yy_delete_buffer()} when done with
1640 it).  They also switch to the new buffer using
1641 @samp{yy_switch_to_buffer()}, so the next call to @samp{yylex()} will
1642 start scanning the string.
1643
1644 @table @samp
1645 @item yy_scan_string(const char *str)
1646 scans a NUL-terminated string.
1647
1648 @item yy_scan_bytes(const char *bytes, int len)
1649 scans @code{len} bytes (including possibly NUL's) starting
1650 at location @var{bytes}.
1651 @end table
1652
1653 Note that both of these functions create and scan a @emph{copy}
1654 of the string or bytes.  (This may be desirable, since
1655 @samp{yylex()} modifies the contents of the buffer it is
1656 scanning.) You can avoid the copy by using:
1657
1658 @table @samp
1659 @item yy_scan_buffer(char *base, yy_size_t size)
1660 which scans in place the buffer starting at @var{base},
1661 consisting of @var{size} bytes, the last two bytes of
1662 which @emph{must} be @code{YY_END_OF_BUFFER_CHAR} (ASCII NUL).
1663 These last two bytes are not scanned; thus,
1664 scanning consists of @samp{base[0]} through @samp{base[size-2]},
1665 inclusive.
1666
1667 If you fail to set up @var{base} in this manner (i.e.,
1668 forget the final two @code{YY_END_OF_BUFFER_CHAR} bytes),
1669 then @samp{yy_scan_buffer()} returns a nil pointer instead
1670 of creating a new input buffer.
1671
1672 The type @code{yy_size_t} is an integral type to which you
1673 can cast an integer expression reflecting the size
1674 of the buffer.
1675 @end table
1676
1677 @node End-of-file rules, Miscellaneous, Multiple buffers, Top
1678 @section End-of-file rules
1679
1680 The special rule "<<EOF>>" indicates actions which are to
1681 be taken when an end-of-file is encountered and yywrap()
1682 returns non-zero (i.e., indicates no further files to
1683 process).  The action must finish by doing one of four
1684 things:
1685
1686 @itemize -
1687 @item
1688 assigning @code{yyin} to a new input file (in previous
1689 versions of flex, after doing the assignment you
1690 had to call the special action @code{YY_NEW_FILE}; this is
1691 no longer necessary);
1692
1693 @item
1694 executing a @code{return} statement;
1695
1696 @item
1697 executing the special @samp{yyterminate()} action;
1698
1699 @item
1700 or, switching to a new buffer using
1701 @samp{yy_switch_to_buffer()} as shown in the example
1702 above.
1703 @end itemize
1704
1705 <<EOF>> rules may not be used with other patterns; they
1706 may only be qualified with a list of start conditions.  If
1707 an unqualified <<EOF>> rule is given, it applies to @emph{all}
1708 start conditions which do not already have <<EOF>>
1709 actions.  To specify an <<EOF>> rule for only the initial
1710 start condition, use
1711
1712 @example
1713 <INITIAL><<EOF>>
1714 @end example
1715
1716 These rules are useful for catching things like unclosed
1717 comments.  An example:
1718
1719 @example
1720 %x quote
1721 %%
1722
1723 @dots{}other rules for dealing with quotes@dots{}
1724
1725 <quote><<EOF>>   @{
1726          error( "unterminated quote" );
1727          yyterminate();
1728          @}
1729 <<EOF>>  @{
1730          if ( *++filelist )
1731              yyin = fopen( *filelist, "r" );
1732          else
1733             yyterminate();
1734          @}
1735 @end example
1736
1737 @node Miscellaneous, User variables, End-of-file rules, Top
1738 @section Miscellaneous macros
1739
1740 The macro @code{YY_USER_ACTION} can be defined to provide an
1741 action which is always executed prior to the matched
1742 rule's action.  For example, it could be #define'd to call
1743 a routine to convert yytext to lower-case.  When
1744 @code{YY_USER_ACTION} is invoked, the variable @code{yy_act} gives the
1745 number of the matched rule (rules are numbered starting
1746 with 1).  Suppose you want to profile how often each of
1747 your rules is matched.  The following would do the trick:
1748
1749 @example
1750 #define YY_USER_ACTION ++ctr[yy_act]
1751 @end example
1752
1753 where @code{ctr} is an array to hold the counts for the different
1754 rules.  Note that the macro @code{YY_NUM_RULES} gives the total number
1755 of rules (including the default rule, even if you use @samp{-s}, so
1756 a correct declaration for @code{ctr} is:
1757
1758 @example
1759 int ctr[YY_NUM_RULES];
1760 @end example
1761
1762 The macro @code{YY_USER_INIT} may be defined to provide an action
1763 which is always executed before the first scan (and before
1764 the scanner's internal initializations are done).  For
1765 example, it could be used to call a routine to read in a
1766 data table or open a logging file.
1767
1768 The macro @samp{yy_set_interactive(is_interactive)} can be used
1769 to control whether the current buffer is considered
1770 @emph{interactive}.  An interactive buffer is processed more slowly,
1771 but must be used when the scanner's input source is indeed
1772 interactive to avoid problems due to waiting to fill
1773 buffers (see the discussion of the @samp{-I} flag below).  A
1774 non-zero value in the macro invocation marks the buffer as
1775 interactive, a zero value as non-interactive.  Note that
1776 use of this macro overrides @samp{%option always-interactive} or
1777 @samp{%option never-interactive} (see Options below).
1778 @samp{yy_set_interactive()} must be invoked prior to beginning to
1779 scan the buffer that is (or is not) to be considered
1780 interactive.
1781
1782 The macro @samp{yy_set_bol(at_bol)} can be used to control
1783 whether the current buffer's scanning context for the next
1784 token match is done as though at the beginning of a line.
1785 A non-zero macro argument makes rules anchored with
1786
1787 The macro @samp{YY_AT_BOL()} returns true if the next token
1788 scanned from the current buffer will have '^' rules
1789 active, false otherwise.
1790
1791 In the generated scanner, the actions are all gathered in
1792 one large switch statement and separated using @code{YY_BREAK},
1793 which may be redefined.  By default, it is simply a
1794 "break", to separate each rule's action from the following
1795 rule's.  Redefining @code{YY_BREAK} allows, for example, C++
1796 users to #define YY_BREAK to do nothing (while being very
1797 careful that every rule ends with a "break" or a
1798 "return"!) to avoid suffering from unreachable statement
1799 warnings where because a rule's action ends with "return",
1800 the @code{YY_BREAK} is inaccessible.
1801
1802 @node User variables, YACC interface, Miscellaneous, Top
1803 @section Values available to the user
1804
1805 This section summarizes the various values available to
1806 the user in the rule actions.
1807
1808 @itemize -
1809 @item
1810 @samp{char *yytext} holds the text of the current token.
1811 It may be modified but not lengthened (you cannot
1812 append characters to the end).
1813
1814 If the special directive @samp{%array} appears in the
1815 first section of the scanner description, then
1816 @code{yytext} is instead declared @samp{char yytext[YYLMAX]},
1817 where @code{YYLMAX} is a macro definition that you can
1818 redefine in the first section if you don't like the
1819 default value (generally 8KB).  Using @samp{%array}
1820 results in somewhat slower scanners, but the value
1821 of @code{yytext} becomes immune to calls to @samp{input()} and
1822 @samp{unput()}, which potentially destroy its value when
1823 @code{yytext} is a character pointer.  The opposite of
1824 @samp{%array} is @samp{%pointer}, which is the default.
1825
1826 You cannot use @samp{%array} when generating C++ scanner
1827 classes (the @samp{-+} flag).
1828
1829 @item
1830 @samp{int yyleng} holds the length of the current token.
1831
1832 @item
1833 @samp{FILE *yyin} is the file which by default @code{flex} reads
1834 from.  It may be redefined but doing so only makes
1835 sense before scanning begins or after an EOF has
1836 been encountered.  Changing it in the midst of
1837 scanning will have unexpected results since @code{flex}
1838 buffers its input; use @samp{yyrestart()} instead.  Once
1839 scanning terminates because an end-of-file has been
1840 seen, you can assign @code{yyin} at the new input file and
1841 then call the scanner again to continue scanning.
1842
1843 @item
1844 @samp{void yyrestart( FILE *new_file )} may be called to
1845 point @code{yyin} at the new input file.  The switch-over
1846 to the new file is immediate (any previously
1847 buffered-up input is lost).  Note that calling
1848 @samp{yyrestart()} with @code{yyin} as an argument thus throws
1849 away the current input buffer and continues
1850 scanning the same input file.
1851
1852 @item
1853 @samp{FILE *yyout} is the file to which @samp{ECHO} actions are
1854 done.  It can be reassigned by the user.
1855
1856 @item
1857 @code{YY_CURRENT_BUFFER} returns a @code{YY_BUFFER_STATE} handle
1858 to the current buffer.
1859
1860 @item
1861 @code{YY_START} returns an integer value corresponding to
1862 the current start condition.  You can subsequently
1863 use this value with @code{BEGIN} to return to that start
1864 condition.
1865 @end itemize
1866
1867 @node YACC interface, Options, User variables, Top
1868 @section Interfacing with @code{yacc}
1869
1870 One of the main uses of @code{flex} is as a companion to the @code{yacc}
1871 parser-generator.  @code{yacc} parsers expect to call a routine
1872 named @samp{yylex()} to find the next input token.  The routine
1873 is supposed to return the type of the next token as well
1874 as putting any associated value in the global @code{yylval}.  To
1875 use @code{flex} with @code{yacc}, one specifies the @samp{-d} option to @code{yacc} to
1876 instruct it to generate the file @file{y.tab.h} containing
1877 definitions of all the @samp{%tokens} appearing in the @code{yacc} input.
1878 This file is then included in the @code{flex} scanner.  For
1879 example, if one of the tokens is "TOK_NUMBER", part of the
1880 scanner might look like:
1881
1882 @example
1883 %@{
1884 #include "y.tab.h"
1885 %@}
1886
1887 %%
1888
1889 [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
1890 @end example
1891
1892 @node Options, Performance, YACC interface, Top
1893 @section Options
1894 @code{flex} has the following options:
1895
1896 @table @samp
1897 @item -b
1898 Generate backing-up information to @file{lex.backup}.
1899 This is a list of scanner states which require
1900 backing up and the input characters on which they
1901 do so.  By adding rules one can remove backing-up
1902 states.  If @emph{all} backing-up states are eliminated
1903 and @samp{-Cf} or @samp{-CF} is used, the generated scanner will
1904 run faster (see the @samp{-p} flag).  Only users who wish
1905 to squeeze every last cycle out of their scanners
1906 need worry about this option.  (See the section on
1907 Performance Considerations below.)
1908
1909 @item -c
1910 is a do-nothing, deprecated option included for
1911 POSIX compliance.
1912
1913 @item -d
1914 makes the generated scanner run in @dfn{debug} mode.
1915 Whenever a pattern is recognized and the global
1916 @code{yy_flex_debug} is non-zero (which is the default),
1917 the scanner will write to @code{stderr} a line of the
1918 form:
1919
1920 @example
1921 --accepting rule at line 53 ("the matched text")
1922 @end example
1923
1924 The line number refers to the location of the rule
1925 in the file defining the scanner (i.e., the file
1926 that was fed to flex).  Messages are also generated
1927 when the scanner backs up, accepts the default
1928 rule, reaches the end of its input buffer (or
1929 encounters a NUL; at this point, the two look the
1930 same as far as the scanner's concerned), or reaches
1931 an end-of-file.
1932
1933 @item -f
1934 specifies @dfn{fast scanner}.  No table compression is
1935 done and stdio is bypassed.  The result is large
1936 but fast.  This option is equivalent to @samp{-Cfr} (see
1937 below).
1938
1939 @item -h
1940 generates a "help" summary of @code{flex's} options to
1941 @code{stdout} and then exits.  @samp{-?} and @samp{--help} are synonyms
1942 for @samp{-h}.
1943
1944 @item -i
1945 instructs @code{flex} to generate a @emph{case-insensitive}
1946 scanner.  The case of letters given in the @code{flex} input
1947 patterns will be ignored, and tokens in the input
1948 will be matched regardless of case.  The matched
1949 text given in @code{yytext} will have the preserved case
1950 (i.e., it will not be folded).
1951
1952 @item -l
1953 turns on maximum compatibility with the original
1954 AT&T @code{lex} implementation.  Note that this does not
1955 mean @emph{full} compatibility.  Use of this option costs
1956 a considerable amount of performance, and it cannot
1957 be used with the @samp{-+, -f, -F, -Cf}, or @samp{-CF} options.
1958 For details on the compatibilities it provides, see
1959 the section "Incompatibilities With Lex And POSIX"
1960 below.  This option also results in the name
1961 @code{YY_FLEX_LEX_COMPAT} being #define'd in the generated
1962 scanner.
1963
1964 @item -n
1965 is another do-nothing, deprecated option included
1966 only for POSIX compliance.
1967
1968 @item -p
1969 generates a performance report to stderr.  The
1970 report consists of comments regarding features of
1971 the @code{flex} input file which will cause a serious loss
1972 of performance in the resulting scanner.  If you
1973 give the flag twice, you will also get comments
1974 regarding features that lead to minor performance
1975 losses.
1976
1977 Note that the use of @code{REJECT}, @samp{%option yylineno} and
1978 variable trailing context (see the Deficiencies / Bugs section below)
1979 entails a substantial performance penalty; use of @samp{yymore()},
1980 the @samp{^} operator, and the @samp{-I} flag entail minor performance
1981 penalties.
1982
1983 @item -s
1984 causes the @dfn{default rule} (that unmatched scanner
1985 input is echoed to @code{stdout}) to be suppressed.  If
1986 the scanner encounters input that does not match
1987 any of its rules, it aborts with an error.  This
1988 option is useful for finding holes in a scanner's
1989 rule set.
1990
1991 @item -t
1992 instructs @code{flex} to write the scanner it generates to
1993 standard output instead of @file{lex.yy.c}.
1994
1995 @item -v
1996 specifies that @code{flex} should write to @code{stderr} a
1997 summary of statistics regarding the scanner it
1998 generates.  Most of the statistics are meaningless to
1999 the casual @code{flex} user, but the first line identifies
2000 the version of @code{flex} (same as reported by @samp{-V}), and
2001 the next line the flags used when generating the
2002 scanner, including those that are on by default.
2003
2004 @item -w
2005 suppresses warning messages.
2006
2007 @item -B
2008 instructs @code{flex} to generate a @emph{batch} scanner, the
2009 opposite of @emph{interactive} scanners generated by @samp{-I}
2010 (see below).  In general, you use @samp{-B} when you are
2011 @emph{certain} that your scanner will never be used
2012 interactively, and you want to squeeze a @emph{little} more
2013 performance out of it.  If your goal is instead to
2014 squeeze out a @emph{lot} more performance, you should be
2015 using the @samp{-Cf} or @samp{-CF} options (discussed below),
2016 which turn on @samp{-B} automatically anyway.
2017
2018 @item -F
2019 specifies that the @dfn{fast} scanner table
2020 representation should be used (and stdio bypassed).  This
2021 representation is about as fast as the full table
2022 representation @samp{(-f)}, and for some sets of patterns
2023 will be considerably smaller (and for others,
2024 larger).  In general, if the pattern set contains
2025 both "keywords" and a catch-all, "identifier" rule,
2026 such as in the set:
2027
2028 @example
2029 "case"    return TOK_CASE;
2030 "switch"  return TOK_SWITCH;
2031 ...
2032 "default" return TOK_DEFAULT;
2033 [a-z]+    return TOK_ID;
2034 @end example
2035
2036 @noindent
2037 then you're better off using the full table
2038 representation.  If only the "identifier" rule is
2039 present and you then use a hash table or some such to
2040 detect the keywords, you're better off using @samp{-F}.
2041
2042 This option is equivalent to @samp{-CFr} (see below).  It
2043 cannot be used with @samp{-+}.
2044
2045 @item -I
2046 instructs @code{flex} to generate an @emph{interactive} scanner.
2047 An interactive scanner is one that only looks ahead
2048 to decide what token has been matched if it
2049 absolutely must.  It turns out that always looking one
2050 extra character ahead, even if the scanner has
2051 already seen enough text to disambiguate the
2052 current token, is a bit faster than only looking ahead
2053 when necessary.  But scanners that always look
2054 ahead give dreadful interactive performance; for
2055 example, when a user types a newline, it is not
2056 recognized as a newline token until they enter
2057 @emph{another} token, which often means typing in another
2058 whole line.
2059
2060 @code{Flex} scanners default to @emph{interactive} unless you use
2061 the @samp{-Cf} or @samp{-CF} table-compression options (see
2062 below).  That's because if you're looking for
2063 high-performance you should be using one of these
2064 options, so if you didn't, @code{flex} assumes you'd
2065 rather trade off a bit of run-time performance for
2066 intuitive interactive behavior.  Note also that you
2067 @emph{cannot} use @samp{-I} in conjunction with @samp{-Cf} or @samp{-CF}.
2068 Thus, this option is not really needed; it is on by
2069 default for all those cases in which it is allowed.
2070
2071 You can force a scanner to @emph{not} be interactive by
2072 using @samp{-B} (see above).
2073
2074 @item -L
2075 instructs @code{flex} not to generate @samp{#line} directives.
2076 Without this option, @code{flex} peppers the generated
2077 scanner with #line directives so error messages in
2078 the actions will be correctly located with respect
2079 to either the original @code{flex} input file (if the
2080 errors are due to code in the input file), or
2081 @file{lex.yy.c} (if the errors are @code{flex's} fault -- you
2082 should report these sorts of errors to the email
2083 address given below).
2084
2085 @item -T
2086 makes @code{flex} run in @code{trace} mode.  It will generate a
2087 lot of messages to @code{stderr} concerning the form of
2088 the input and the resultant non-deterministic and
2089 deterministic finite automata.  This option is
2090 mostly for use in maintaining @code{flex}.
2091
2092 @item -V
2093 prints the version number to @code{stdout} and exits.
2094 @samp{--version} is a synonym for @samp{-V}.
2095
2096 @item -7
2097 instructs @code{flex} to generate a 7-bit scanner, i.e.,
2098 one which can only recognized 7-bit characters in
2099 its input.  The advantage of using @samp{-7} is that the
2100 scanner's tables can be up to half the size of
2101 those generated using the @samp{-8} option (see below).
2102 The disadvantage is that such scanners often hang
2103 or crash if their input contains an 8-bit
2104 character.
2105
2106 Note, however, that unless you generate your
2107 scanner using the @samp{-Cf} or @samp{-CF} table compression options,
2108 use of @samp{-7} will save only a small amount of table
2109 space, and make your scanner considerably less
2110 portable.  @code{Flex's} default behavior is to generate
2111 an 8-bit scanner unless you use the @samp{-Cf} or @samp{-CF}, in
2112 which case @code{flex} defaults to generating 7-bit
2113 scanners unless your site was always configured to
2114 generate 8-bit scanners (as will often be the case
2115 with non-USA sites).  You can tell whether flex
2116 generated a 7-bit or an 8-bit scanner by inspecting
2117 the flag summary in the @samp{-v} output as described
2118 above.
2119
2120 Note that if you use @samp{-Cfe} or @samp{-CFe} (those table
2121 compression options, but also using equivalence
2122 classes as discussed see below), flex still
2123 defaults to generating an 8-bit scanner, since
2124 usually with these compression options full 8-bit
2125 tables are not much more expensive than 7-bit
2126 tables.
2127
2128 @item -8
2129 instructs @code{flex} to generate an 8-bit scanner, i.e.,
2130 one which can recognize 8-bit characters.  This
2131 flag is only needed for scanners generated using
2132 @samp{-Cf} or @samp{-CF}, as otherwise flex defaults to
2133 generating an 8-bit scanner anyway.
2134
2135 See the discussion of @samp{-7} above for flex's default
2136 behavior and the tradeoffs between 7-bit and 8-bit
2137 scanners.
2138
2139 @item -+
2140 specifies that you want flex to generate a C++
2141 scanner class.  See the section on Generating C++
2142 Scanners below for details.
2143
2144 @item -C[aefFmr]
2145 controls the degree of table compression and, more
2146 generally, trade-offs between small scanners and
2147 fast scanners.
2148
2149 @samp{-Ca} ("align") instructs flex to trade off larger
2150 tables in the generated scanner for faster
2151 performance because the elements of the tables are better
2152 aligned for memory access and computation.  On some
2153 RISC architectures, fetching and manipulating
2154 long-words is more efficient than with smaller-sized
2155 units such as shortwords.  This option can double
2156 the size of the tables used by your scanner.
2157
2158 @samp{-Ce} directs @code{flex} to construct @dfn{equivalence classes},
2159 i.e., sets of characters which have identical
2160 lexical properties (for example, if the only appearance
2161 of digits in the @code{flex} input is in the character
2162 class "[0-9]" then the digits '0', '1', @dots{}, '9'
2163 will all be put in the same equivalence class).
2164 Equivalence classes usually give dramatic
2165 reductions in the final table/object file sizes
2166 (typically a factor of 2-5) and are pretty cheap
2167 performance-wise (one array look-up per character
2168 scanned).
2169
2170 @samp{-Cf} specifies that the @emph{full} scanner tables should
2171 be generated - @code{flex} should not compress the tables
2172 by taking advantages of similar transition
2173 functions for different states.
2174
2175 @samp{-CF} specifies that the alternate fast scanner
2176 representation (described above under the @samp{-F} flag)
2177 should be used.  This option cannot be used with
2178 @samp{-+}.
2179
2180 @samp{-Cm} directs @code{flex} to construct @dfn{meta-equivalence
2181 classes}, which are sets of equivalence classes (or
2182 characters, if equivalence classes are not being
2183 used) that are commonly used together.
2184 Meta-equivalence classes are often a big win when using
2185 compressed tables, but they have a moderate
2186 performance impact (one or two "if" tests and one array
2187 look-up per character scanned).
2188
2189 @samp{-Cr} causes the generated scanner to @emph{bypass} use of
2190 the standard I/O library (stdio) for input.
2191 Instead of calling @samp{fread()} or @samp{getc()}, the scanner
2192 will use the @samp{read()} system call, resulting in a
2193 performance gain which varies from system to
2194 system, but in general is probably negligible unless
2195 you are also using @samp{-Cf} or @samp{-CF}.  Using @samp{-Cr} can cause
2196 strange behavior if, for example, you read from
2197 @code{yyin} using stdio prior to calling the scanner
2198 (because the scanner will miss whatever text your
2199 previous reads left in the stdio input buffer).
2200
2201 @samp{-Cr} has no effect if you define @code{YY_INPUT} (see The
2202 Generated Scanner above).
2203
2204 A lone @samp{-C} specifies that the scanner tables should
2205 be compressed but neither equivalence classes nor
2206 meta-equivalence classes should be used.
2207
2208 The options @samp{-Cf} or @samp{-CF} and @samp{-Cm} do not make sense
2209 together - there is no opportunity for
2210 meta-equivalence classes if the table is not being
2211 compressed.  Otherwise the options may be freely
2212 mixed, and are cumulative.
2213
2214 The default setting is @samp{-Cem}, which specifies that
2215 @code{flex} should generate equivalence classes and
2216 meta-equivalence classes.  This setting provides the
2217 highest degree of table compression.  You can trade
2218 off faster-executing scanners at the cost of larger
2219 tables with the following generally being true:
2220
2221 @example
2222 slowest & smallest
2223       -Cem
2224       -Cm
2225       -Ce
2226       -C
2227       -C@{f,F@}e
2228       -C@{f,F@}
2229       -C@{f,F@}a
2230 fastest & largest
2231 @end example
2232
2233 Note that scanners with the smallest tables are
2234 usually generated and compiled the quickest, so
2235 during development you will usually want to use the
2236 default, maximal compression.
2237
2238 @samp{-Cfe} is often a good compromise between speed and
2239 size for production scanners.
2240
2241 @item -ooutput
2242 directs flex to write the scanner to the file @samp{out-}
2243 @code{put} instead of @file{lex.yy.c}.  If you combine @samp{-o} with
2244 the @samp{-t} option, then the scanner is written to
2245 @code{stdout} but its @samp{#line} directives (see the @samp{-L} option
2246 above) refer to the file @code{output}.
2247
2248 @item -Pprefix
2249 changes the default @samp{yy} prefix used by @code{flex} for all
2250 globally-visible variable and function names to
2251 instead be @var{prefix}.  For example, @samp{-Pfoo} changes the
2252 name of @code{yytext} to @file{footext}.  It also changes the
2253 name of the default output file from @file{lex.yy.c} to
2254 @file{lex.foo.c}.  Here are all of the names affected:
2255
2256 @example
2257 yy_create_buffer
2258 yy_delete_buffer
2259 yy_flex_debug
2260 yy_init_buffer
2261 yy_flush_buffer
2262 yy_load_buffer_state
2263 yy_switch_to_buffer
2264 yyin
2265 yyleng
2266 yylex
2267 yylineno
2268 yyout
2269 yyrestart
2270 yytext
2271 yywrap
2272 @end example
2273
2274 (If you are using a C++ scanner, then only @code{yywrap}
2275 and @code{yyFlexLexer} are affected.) Within your scanner
2276 itself, you can still refer to the global variables
2277 and functions using either version of their name;
2278 but externally, they have the modified name.
2279
2280 This option lets you easily link together multiple
2281 @code{flex} programs into the same executable.  Note,
2282 though, that using this option also renames
2283 @samp{yywrap()}, so you now @emph{must} either provide your own
2284 (appropriately-named) version of the routine for
2285 your scanner, or use @samp{%option noyywrap}, as linking
2286 with @samp{-lfl} no longer provides one for you by
2287 default.
2288
2289 @item -Sskeleton_file
2290 overrides the default skeleton file from which @code{flex}
2291 constructs its scanners.  You'll never need this
2292 option unless you are doing @code{flex} maintenance or
2293 development.
2294 @end table
2295
2296 @code{flex} also provides a mechanism for controlling options
2297 within the scanner specification itself, rather than from
2298 the flex command-line.  This is done by including @samp{%option}
2299 directives in the first section of the scanner
2300 specification.  You can specify multiple options with a single
2301 @samp{%option} directive, and multiple directives in the first
2302 section of your flex input file.  Most options are given
2303 simply as names, optionally preceded by the word "no"
2304 (with no intervening whitespace) to negate their meaning.
2305 A number are equivalent to flex flags or their negation:
2306
2307 @example
2308 7bit            -7 option
2309 8bit            -8 option
2310 align           -Ca option
2311 backup          -b option
2312 batch           -B option
2313 c++             -+ option
2314
2315 caseful or
2316 case-sensitive  opposite of -i (default)
2317
2318 case-insensitive or
2319 caseless        -i option
2320
2321 debug           -d option
2322 default         opposite of -s option
2323 ecs             -Ce option
2324 fast            -F option
2325 full            -f option
2326 interactive     -I option
2327 lex-compat      -l option
2328 meta-ecs        -Cm option
2329 perf-report     -p option
2330 read            -Cr option
2331 stdout          -t option
2332 verbose         -v option
2333 warn            opposite of -w option
2334                 (use "%option nowarn" for -w)
2335
2336 array           equivalent to "%array"
2337 pointer         equivalent to "%pointer" (default)
2338 @end example
2339
2340 Some @samp{%option's} provide features otherwise not available:
2341
2342 @table @samp
2343 @item always-interactive
2344 instructs flex to generate a scanner which always
2345 considers its input "interactive".  Normally, on
2346 each new input file the scanner calls @samp{isatty()} in
2347 an attempt to determine whether the scanner's input
2348 source is interactive and thus should be read a
2349 character at a time.  When this option is used,
2350 however, then no such call is made.
2351
2352 @item main
2353 directs flex to provide a default @samp{main()} program
2354 for the scanner, which simply calls @samp{yylex()}.  This
2355 option implies @code{noyywrap} (see below).
2356
2357 @item never-interactive
2358 instructs flex to generate a scanner which never
2359 considers its input "interactive" (again, no call
2360 made to @samp{isatty())}.  This is the opposite of @samp{always-}
2361 @emph{interactive}.
2362
2363 @item stack
2364 enables the use of start condition stacks (see
2365 Start Conditions above).
2366
2367 @item stdinit
2368 if unset (i.e., @samp{%option nostdinit}) initializes @code{yyin}
2369 and @code{yyout} to nil @code{FILE} pointers, instead of @code{stdin}
2370 and @code{stdout}.
2371
2372 @item yylineno
2373 directs @code{flex} to generate a scanner that maintains the number
2374 of the current line read from its input in the global variable
2375 @code{yylineno}.  This option is implied by @samp{%option lex-compat}.
2376
2377 @item yywrap
2378 if unset (i.e., @samp{%option noyywrap}), makes the
2379 scanner not call @samp{yywrap()} upon an end-of-file, but
2380 simply assume that there are no more files to scan
2381 (until the user points @code{yyin} at a new file and calls
2382 @samp{yylex()} again).
2383 @end table
2384
2385 @code{flex} scans your rule actions to determine whether you use
2386 the @code{REJECT} or @samp{yymore()} features.  The @code{reject} and @code{yymore}
2387 options are available to override its decision as to
2388 whether you use the options, either by setting them (e.g.,
2389 @samp{%option reject}) to indicate the feature is indeed used, or
2390 unsetting them to indicate it actually is not used (e.g.,
2391 @samp{%option noyymore}).
2392
2393 Three options take string-delimited values, offset with '=':
2394
2395 @example
2396 %option outfile="ABC"
2397 @end example
2398
2399 @noindent
2400 is equivalent to @samp{-oABC}, and
2401
2402 @example
2403 %option prefix="XYZ"
2404 @end example
2405
2406 @noindent
2407 is equivalent to @samp{-PXYZ}.
2408
2409 Finally,
2410
2411 @example
2412 %option yyclass="foo"
2413 @end example
2414
2415 @noindent
2416 only applies when generating a C++ scanner (@samp{-+} option).  It
2417 informs @code{flex} that you have derived @samp{foo} as a subclass of
2418 @code{yyFlexLexer} so @code{flex} will place your actions in the member
2419 function @samp{foo::yylex()} instead of @samp{yyFlexLexer::yylex()}.
2420 It also generates a @samp{yyFlexLexer::yylex()} member function that
2421 emits a run-time error (by invoking @samp{yyFlexLexer::LexerError()})
2422 if called.  See Generating C++ Scanners, below, for additional
2423 information.
2424
2425 A number of options are available for lint purists who
2426 want to suppress the appearance of unneeded routines in
2427 the generated scanner.  Each of the following, if unset,
2428 results in the corresponding routine not appearing in the
2429 generated scanner:
2430
2431 @example
2432 input, unput
2433 yy_push_state, yy_pop_state, yy_top_state
2434 yy_scan_buffer, yy_scan_bytes, yy_scan_string
2435 @end example
2436
2437 @noindent
2438 (though @samp{yy_push_state()} and friends won't appear anyway
2439 unless you use @samp{%option stack}).
2440
2441 @node Performance, C++, Options, Top
2442 @section Performance considerations
2443
2444 The main design goal of @code{flex} is that it generate
2445 high-performance scanners.  It has been optimized for dealing
2446 well with large sets of rules.  Aside from the effects on
2447 scanner speed of the table compression @samp{-C} options outlined
2448 above, there are a number of options/actions which degrade
2449 performance.  These are, from most expensive to least:
2450
2451 @example
2452 REJECT
2453 %option yylineno
2454 arbitrary trailing context
2455
2456 pattern sets that require backing up
2457 %array
2458 %option interactive
2459 %option always-interactive
2460
2461 '^' beginning-of-line operator
2462 yymore()
2463 @end example
2464
2465 with the first three all being quite expensive and the
2466 last two being quite cheap.  Note also that @samp{unput()} is
2467 implemented as a routine call that potentially does quite
2468 a bit of work, while @samp{yyless()} is a quite-cheap macro; so
2469 if just putting back some excess text you scanned, use
2470 @samp{yyless()}.
2471
2472 @code{REJECT} should be avoided at all costs when performance is
2473 important.  It is a particularly expensive option.
2474
2475 Getting rid of backing up is messy and often may be an
2476 enormous amount of work for a complicated scanner.  In
2477 principal, one begins by using the @samp{-b} flag to generate a
2478 @file{lex.backup} file.  For example, on the input
2479
2480 @example
2481 %%
2482 foo        return TOK_KEYWORD;
2483 foobar     return TOK_KEYWORD;
2484 @end example
2485
2486 @noindent
2487 the file looks like:
2488
2489 @example
2490 State #6 is non-accepting -
2491  associated rule line numbers:
2492        2       3
2493  out-transitions: [ o ]
2494  jam-transitions: EOF [ \001-n  p-\177 ]
2495
2496 State #8 is non-accepting -
2497  associated rule line numbers:
2498        3
2499  out-transitions: [ a ]
2500  jam-transitions: EOF [ \001-`  b-\177 ]
2501
2502 State #9 is non-accepting -
2503  associated rule line numbers:
2504        3
2505  out-transitions: [ r ]
2506  jam-transitions: EOF [ \001-q  s-\177 ]
2507
2508 Compressed tables always back up.
2509 @end example
2510
2511 The first few lines tell us that there's a scanner state
2512 in which it can make a transition on an 'o' but not on any
2513 other character, and that in that state the currently
2514 scanned text does not match any rule.  The state occurs
2515 when trying to match the rules found at lines 2 and 3 in
2516 the input file.  If the scanner is in that state and then
2517 reads something other than an 'o', it will have to back up
2518 to find a rule which is matched.  With a bit of
2519 head-scratching one can see that this must be the state it's in
2520 when it has seen "fo".  When this has happened, if
2521 anything other than another 'o' is seen, the scanner will
2522 have to back up to simply match the 'f' (by the default
2523 rule).
2524
2525 The comment regarding State #8 indicates there's a problem
2526 when "foob" has been scanned.  Indeed, on any character
2527 other than an 'a', the scanner will have to back up to
2528 accept "foo".  Similarly, the comment for State #9
2529 concerns when "fooba" has been scanned and an 'r' does not
2530 follow.
2531
2532 The final comment reminds us that there's no point going
2533 to all the trouble of removing backing up from the rules
2534 unless we're using @samp{-Cf} or @samp{-CF}, since there's no
2535 performance gain doing so with compressed scanners.
2536
2537 The way to remove the backing up is to add "error" rules:
2538
2539 @example
2540 %%
2541 foo         return TOK_KEYWORD;
2542 foobar      return TOK_KEYWORD;
2543
2544 fooba       |
2545 foob        |
2546 fo          @{
2547             /* false alarm, not really a keyword */
2548             return TOK_ID;
2549             @}
2550 @end example
2551
2552 Eliminating backing up among a list of keywords can also
2553 be done using a "catch-all" rule:
2554
2555 @example
2556 %%
2557 foo         return TOK_KEYWORD;
2558 foobar      return TOK_KEYWORD;
2559
2560 [a-z]+      return TOK_ID;
2561 @end example
2562
2563 This is usually the best solution when appropriate.
2564
2565 Backing up messages tend to cascade.  With a complicated
2566 set of rules it's not uncommon to get hundreds of
2567 messages.  If one can decipher them, though, it often only
2568 takes a dozen or so rules to eliminate the backing up
2569 (though it's easy to make a mistake and have an error rule
2570 accidentally match a valid token.  A possible future @code{flex}
2571 feature will be to automatically add rules to eliminate
2572 backing up).
2573
2574 It's important to keep in mind that you gain the benefits
2575 of eliminating backing up only if you eliminate @emph{every}
2576 instance of backing up.  Leaving just one means you gain
2577 nothing.
2578
2579 @var{Variable} trailing context (where both the leading and
2580 trailing parts do not have a fixed length) entails almost
2581 the same performance loss as @code{REJECT} (i.e., substantial).
2582 So when possible a rule like:
2583
2584 @example
2585 %%
2586 mouse|rat/(cat|dog)   run();
2587 @end example
2588
2589 @noindent
2590 is better written:
2591
2592 @example
2593 %%
2594 mouse/cat|dog         run();
2595 rat/cat|dog           run();
2596 @end example
2597
2598 @noindent
2599 or as
2600
2601 @example
2602 %%
2603 mouse|rat/cat         run();
2604 mouse|rat/dog         run();
2605 @end example
2606
2607 Note that here the special '|' action does @emph{not} provide any
2608 savings, and can even make things worse (see Deficiencies
2609 / Bugs below).
2610
2611 Another area where the user can increase a scanner's
2612 performance (and one that's easier to implement) arises from
2613 the fact that the longer the tokens matched, the faster
2614 the scanner will run.  This is because with long tokens
2615 the processing of most input characters takes place in the
2616 (short) inner scanning loop, and does not often have to go
2617 through the additional work of setting up the scanning
2618 environment (e.g., @code{yytext}) for the action.  Recall the
2619 scanner for C comments:
2620
2621 @example
2622 %x comment
2623 %%
2624         int line_num = 1;
2625
2626 "/*"         BEGIN(comment);
2627
2628 <comment>[^*\n]*
2629 <comment>"*"+[^*/\n]*
2630 <comment>\n             ++line_num;
2631 <comment>"*"+"/"        BEGIN(INITIAL);
2632 @end example
2633
2634 This could be sped up by writing it as:
2635
2636 @example
2637 %x comment
2638 %%
2639         int line_num = 1;
2640
2641 "/*"         BEGIN(comment);
2642
2643 <comment>[^*\n]*
2644 <comment>[^*\n]*\n      ++line_num;
2645 <comment>"*"+[^*/\n]*
2646 <comment>"*"+[^*/\n]*\n ++line_num;
2647 <comment>"*"+"/"        BEGIN(INITIAL);
2648 @end example
2649
2650 Now instead of each newline requiring the processing of
2651 another action, recognizing the newlines is "distributed"
2652 over the other rules to keep the matched text as long as
2653 possible.  Note that @emph{adding} rules does @emph{not} slow down the
2654 scanner!  The speed of the scanner is independent of the
2655 number of rules or (modulo the considerations given at the
2656 beginning of this section) how complicated the rules are
2657 with regard to operators such as '*' and '|'.
2658
2659 A final example in speeding up a scanner: suppose you want
2660 to scan through a file containing identifiers and
2661 keywords, one per line and with no other extraneous
2662 characters, and recognize all the keywords.  A natural first
2663 approach is:
2664
2665 @example
2666 %%
2667 asm      |
2668 auto     |
2669 break    |
2670 @dots{} etc @dots{}
2671 volatile |
2672 while    /* it's a keyword */
2673
2674 .|\n     /* it's not a keyword */
2675 @end example
2676
2677 To eliminate the back-tracking, introduce a catch-all
2678 rule:
2679
2680 @example
2681 %%
2682 asm      |
2683 auto     |
2684 break    |
2685 ... etc ...
2686 volatile |
2687 while    /* it's a keyword */
2688
2689 [a-z]+   |
2690 .|\n     /* it's not a keyword */
2691 @end example
2692
2693 Now, if it's guaranteed that there's exactly one word per
2694 line, then we can reduce the total number of matches by a
2695 half by merging in the recognition of newlines with that
2696 of the other tokens:
2697
2698 @example
2699 %%
2700 asm\n    |
2701 auto\n   |
2702 break\n  |
2703 @dots{} etc @dots{}
2704 volatile\n |
2705 while\n  /* it's a keyword */
2706
2707 [a-z]+\n |
2708 .|\n     /* it's not a keyword */
2709 @end example
2710
2711 One has to be careful here, as we have now reintroduced
2712 backing up into the scanner.  In particular, while @emph{we} know
2713 that there will never be any characters in the input
2714 stream other than letters or newlines, @code{flex} can't figure
2715 this out, and it will plan for possibly needing to back up
2716 when it has scanned a token like "auto" and then the next
2717 character is something other than a newline or a letter.
2718 Previously it would then just match the "auto" rule and be
2719 done, but now it has no "auto" rule, only a "auto\n" rule.
2720 To eliminate the possibility of backing up, we could
2721 either duplicate all rules but without final newlines, or,
2722 since we never expect to encounter such an input and
2723 therefore don't how it's classified, we can introduce one
2724 more catch-all rule, this one which doesn't include a
2725 newline:
2726
2727 @example
2728 %%
2729 asm\n    |
2730 auto\n   |
2731 break\n  |
2732 @dots{} etc @dots{}
2733 volatile\n |
2734 while\n  /* it's a keyword */
2735
2736 [a-z]+\n |
2737 [a-z]+   |
2738 .|\n     /* it's not a keyword */
2739 @end example
2740
2741 Compiled with @samp{-Cf}, this is about as fast as one can get a
2742 @code{flex} scanner to go for this particular problem.
2743
2744 A final note: @code{flex} is slow when matching NUL's,
2745 particularly when a token contains multiple NUL's.  It's best to
2746 write rules which match @emph{short} amounts of text if it's
2747 anticipated that the text will often include NUL's.
2748
2749 Another final note regarding performance: as mentioned
2750 above in the section How the Input is Matched, dynamically
2751 resizing @code{yytext} to accommodate huge tokens is a slow
2752 process because it presently requires that the (huge) token
2753 be rescanned from the beginning.  Thus if performance is
2754 vital, you should attempt to match "large" quantities of
2755 text but not "huge" quantities, where the cutoff between
2756 the two is at about 8K characters/token.
2757
2758 @node C++, Incompatibilities, Performance, Top
2759 @section Generating C++ scanners
2760
2761 @code{flex} provides two different ways to generate scanners for
2762 use with C++.  The first way is to simply compile a
2763 scanner generated by @code{flex} using a C++ compiler instead of a C
2764 compiler.  You should not encounter any compilations
2765 errors (please report any you find to the email address
2766 given in the Author section below).  You can then use C++
2767 code in your rule actions instead of C code.  Note that
2768 the default input source for your scanner remains @code{yyin},
2769 and default echoing is still done to @code{yyout}.  Both of these
2770 remain @samp{FILE *} variables and not C++ @code{streams}.
2771
2772 You can also use @code{flex} to generate a C++ scanner class, using
2773 the @samp{-+} option, (or, equivalently, @samp{%option c++}), which
2774 is automatically specified if the name of the flex executable ends
2775 in a @samp{+}, such as @code{flex++}.  When using this option, flex
2776 defaults to generating the scanner to the file @file{lex.yy.cc} instead
2777 of @file{lex.yy.c}.  The generated scanner includes the header file
2778 @file{FlexLexer.h}, which defines the interface to two C++ classes.
2779
2780 The first class, @code{FlexLexer}, provides an abstract base
2781 class defining the general scanner class interface.  It
2782 provides the following member functions:
2783
2784 @table @samp
2785 @item const char* YYText()
2786 returns the text of the most recently matched
2787 token, the equivalent of @code{yytext}.
2788
2789 @item int YYLeng()
2790 returns the length of the most recently matched
2791 token, the equivalent of @code{yyleng}.
2792
2793 @item int lineno() const
2794 returns the current input line number (see @samp{%option yylineno}),
2795 or 1 if @samp{%option yylineno} was not used.
2796
2797 @item void set_debug( int flag )
2798 sets the debugging flag for the scanner, equivalent to assigning to
2799 @code{yy_flex_debug} (see the Options section above).  Note that you
2800 must build the scanner using @samp{%option debug} to include debugging
2801 information in it.
2802
2803 @item int debug() const
2804 returns the current setting of the debugging flag.
2805 @end table
2806
2807 Also provided are member functions equivalent to
2808 @samp{yy_switch_to_buffer(), yy_create_buffer()} (though the
2809 first argument is an @samp{istream*} object pointer and not a
2810 @samp{FILE*}, @samp{yy_flush_buffer()}, @samp{yy_delete_buffer()},
2811 and @samp{yyrestart()} (again, the first argument is a @samp{istream*}
2812 object pointer).
2813
2814 The second class defined in @file{FlexLexer.h} is @code{yyFlexLexer},
2815 which is derived from @code{FlexLexer}.  It defines the following
2816 additional member functions:
2817
2818 @table @samp
2819 @item yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
2820 constructs a @code{yyFlexLexer} object using the given
2821 streams for input and output.  If not specified,
2822 the streams default to @code{cin} and @code{cout}, respectively.
2823
2824 @item virtual int yylex()
2825 performs the same role is @samp{yylex()} does for ordinary
2826 flex scanners: it scans the input stream, consuming
2827 tokens, until a rule's action returns a value.  If you derive a subclass
2828 @var{S}
2829 from @code{yyFlexLexer}
2830 and want to access the member functions and variables of
2831 @var{S}
2832 inside @samp{yylex()},
2833 then you need to use @samp{%option yyclass="@var{S}"}
2834 to inform @code{flex}
2835 that you will be using that subclass instead of @code{yyFlexLexer}.
2836 In this case, rather than generating @samp{yyFlexLexer::yylex()},
2837 @code{flex} generates @samp{@var{S}::yylex()}
2838 (and also generates a dummy @samp{yyFlexLexer::yylex()}
2839 that calls @samp{yyFlexLexer::LexerError()}
2840 if called).
2841
2842 @item virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)
2843 reassigns @code{yyin} to @code{new_in}
2844 (if non-nil)
2845 and @code{yyout} to @code{new_out}
2846 (ditto), deleting the previous input buffer if @code{yyin}
2847 is reassigned.
2848
2849 @item int yylex( istream* new_in = 0, ostream* new_out = 0 )
2850 first switches the input streams via @samp{switch_streams( new_in, new_out )}
2851 and then returns the value of @samp{yylex()}.
2852 @end table
2853
2854 In addition, @code{yyFlexLexer} defines the following protected
2855 virtual functions which you can redefine in derived
2856 classes to tailor the scanner:
2857
2858 @table @samp
2859 @item virtual int LexerInput( char* buf, int max_size )
2860 reads up to @samp{max_size} characters into @var{buf} and
2861 returns the number of characters read.  To indicate
2862 end-of-input, return 0 characters.  Note that
2863 "interactive" scanners (see the @samp{-B} and @samp{-I} flags)
2864 define the macro @code{YY_INTERACTIVE}.  If you redefine
2865 @code{LexerInput()} and need to take different actions
2866 depending on whether or not the scanner might be
2867 scanning an interactive input source, you can test
2868 for the presence of this name via @samp{#ifdef}.
2869
2870 @item virtual void LexerOutput( const char* buf, int size )
2871 writes out @var{size} characters from the buffer @var{buf},
2872 which, while NUL-terminated, may also contain
2873 "internal" NUL's if the scanner's rules can match
2874 text with NUL's in them.
2875
2876 @item virtual void LexerError( const char* msg )
2877 reports a fatal error message.  The default version
2878 of this function writes the message to the stream
2879 @code{cerr} and exits.
2880 @end table
2881
2882 Note that a @code{yyFlexLexer} object contains its @emph{entire}
2883 scanning state.  Thus you can use such objects to create
2884 reentrant scanners.  You can instantiate multiple instances of
2885 the same @code{yyFlexLexer} class, and you can also combine
2886 multiple C++ scanner classes together in the same program
2887 using the @samp{-P} option discussed above.
2888 Finally, note that the @samp{%array} feature is not available to
2889 C++ scanner classes; you must use @samp{%pointer} (the default).
2890
2891 Here is an example of a simple C++ scanner:
2892
2893 @example
2894     // An example of using the flex C++ scanner class.
2895
2896 %@{
2897 int mylineno = 0;
2898 %@}
2899
2900 string  \"[^\n"]+\"
2901
2902 ws      [ \t]+
2903
2904 alpha   [A-Za-z]
2905 dig     [0-9]
2906 name    (@{alpha@}|@{dig@}|\$)(@{alpha@}|@{dig@}|[_.\-/$])*
2907 num1    [-+]?@{dig@}+\.?([eE][-+]?@{dig@}+)?
2908 num2    [-+]?@{dig@}*\.@{dig@}+([eE][-+]?@{dig@}+)?
2909 number  @{num1@}|@{num2@}
2910
2911 %%
2912
2913 @{ws@}    /* skip blanks and tabs */
2914
2915 "/*"    @{
2916         int c;
2917
2918         while((c = yyinput()) != 0)
2919             @{
2920             if(c == '\n')
2921                 ++mylineno;
2922
2923             else if(c == '*')
2924                 @{
2925                 if((c = yyinput()) == '/')
2926                     break;
2927                 else
2928                     unput(c);
2929                 @}
2930             @}
2931         @}
2932
2933 @{number@}  cout << "number " << YYText() << '\n';
2934
2935 \n        mylineno++;
2936
2937 @{name@}    cout << "name " << YYText() << '\n';
2938
2939 @{string@}  cout << "string " << YYText() << '\n';
2940
2941 %%
2942
2943 Version 2.5               December 1994                        44
2944
2945 int main( int /* argc */, char** /* argv */ )
2946     @{
2947     FlexLexer* lexer = new yyFlexLexer;
2948     while(lexer->yylex() != 0)
2949         ;
2950     return 0;
2951     @}
2952 @end example
2953
2954 If you want to create multiple (different) lexer classes,
2955 you use the @samp{-P} flag (or the @samp{prefix=} option) to rename each
2956 @code{yyFlexLexer} to some other @code{xxFlexLexer}.  You then can
2957 include @samp{<FlexLexer.h>} in your other sources once per lexer
2958 class, first renaming @code{yyFlexLexer} as follows:
2959
2960 @example
2961 #undef yyFlexLexer
2962 #define yyFlexLexer xxFlexLexer
2963 #include <FlexLexer.h>
2964
2965 #undef yyFlexLexer
2966 #define yyFlexLexer zzFlexLexer
2967 #include <FlexLexer.h>
2968 @end example
2969
2970 if, for example, you used @samp{%option prefix="xx"} for one of
2971 your scanners and @samp{%option prefix="zz"} for the other.
2972
2973 IMPORTANT: the present form of the scanning class is
2974 @emph{experimental} and may change considerably between major
2975 releases.
2976
2977 @node Incompatibilities, Diagnostics, C++, Top
2978 @section Incompatibilities with @code{lex} and POSIX
2979
2980 @code{flex} is a rewrite of the AT&T Unix @code{lex} tool (the two
2981 implementations do not share any code, though), with some
2982 extensions and incompatibilities, both of which are of
2983 concern to those who wish to write scanners acceptable to
2984 either implementation.  Flex is fully compliant with the
2985 POSIX @code{lex} specification, except that when using @samp{%pointer}
2986 (the default), a call to @samp{unput()} destroys the contents of
2987 @code{yytext}, which is counter to the POSIX specification.
2988
2989 In this section we discuss all of the known areas of
2990 incompatibility between flex, AT&T lex, and the POSIX
2991 specification.
2992
2993 @code{flex's} @samp{-l} option turns on maximum compatibility with the
2994 original AT&T @code{lex} implementation, at the cost of a major
2995 loss in the generated scanner's performance.  We note
2996 below which incompatibilities can be overcome using the @samp{-l}
2997 option.
2998
2999 @code{flex} is fully compatible with @code{lex} with the following
3000 exceptions:
3001
3002 @itemize -
3003 @item
3004 The undocumented @code{lex} scanner internal variable @code{yylineno}
3005 is not supported unless @samp{-l} or @samp{%option yylineno} is used.
3006 @code{yylineno} should be maintained on a per-buffer basis, rather
3007 than a per-scanner (single global variable) basis.  @code{yylineno} is
3008 not part of the POSIX specification.
3009
3010 @item
3011 The @samp{input()} routine is not redefinable, though it
3012 may be called to read characters following whatever
3013 has been matched by a rule.  If @samp{input()} encounters
3014 an end-of-file the normal @samp{yywrap()} processing is
3015 done.  A ``real'' end-of-file is returned by
3016 @samp{input()} as @code{EOF}.
3017
3018 Input is instead controlled by defining the
3019 @code{YY_INPUT} macro.
3020
3021 The @code{flex} restriction that @samp{input()} cannot be
3022 redefined is in accordance with the POSIX
3023 specification, which simply does not specify any way of
3024 controlling the scanner's input other than by making
3025 an initial assignment to @code{yyin}.
3026
3027 @item
3028 The @samp{unput()} routine is not redefinable.  This
3029 restriction is in accordance with POSIX.
3030
3031 @item
3032 @code{flex} scanners are not as reentrant as @code{lex} scanners.
3033 In particular, if you have an interactive scanner
3034 and an interrupt handler which long-jumps out of
3035 the scanner, and the scanner is subsequently called
3036 again, you may get the following message:
3037
3038 @example
3039 fatal flex scanner internal error--end of buffer missed
3040 @end example
3041
3042 To reenter the scanner, first use
3043
3044 @example
3045 yyrestart( yyin );
3046 @end example
3047
3048 Note that this call will throw away any buffered
3049 input; usually this isn't a problem with an
3050 interactive scanner.
3051
3052 Also note that flex C++ scanner classes @emph{are}
3053 reentrant, so if using C++ is an option for you, you
3054 should use them instead.  See "Generating C++
3055 Scanners" above for details.
3056
3057 @item
3058 @samp{output()} is not supported.  Output from the @samp{ECHO}
3059 macro is done to the file-pointer @code{yyout} (default
3060 @code{stdout}).
3061
3062 @samp{output()} is not part of the POSIX specification.
3063
3064 @item
3065 @code{lex} does not support exclusive start conditions
3066 (%x), though they are in the POSIX specification.
3067
3068 @item
3069 When definitions are expanded, @code{flex} encloses them
3070 in parentheses.  With lex, the following:
3071
3072 @example
3073 NAME    [A-Z][A-Z0-9]*
3074 %%
3075 foo@{NAME@}?      printf( "Found it\n" );
3076 %%
3077 @end example
3078
3079 will not match the string "foo" because when the
3080 macro is expanded the rule is equivalent to
3081 "foo[A-Z][A-Z0-9]*?" and the precedence is such that the
3082 '?' is associated with "[A-Z0-9]*".  With @code{flex}, the
3083 rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and
3084 so the string "foo" will match.
3085
3086 Note that if the definition begins with @samp{^} or ends
3087 with @samp{$} then it is @emph{not} expanded with parentheses, to
3088 allow these operators to appear in definitions
3089 without losing their special meanings.  But the
3090 @samp{<s>, /}, and @samp{<<EOF>>} operators cannot be used in a
3091 @code{flex} definition.
3092
3093 Using @samp{-l} results in the @code{lex} behavior of no
3094 parentheses around the definition.
3095
3096 The POSIX specification is that the definition be enclosed in
3097 parentheses.
3098
3099 @item
3100 Some implementations of @code{lex} allow a rule's action to begin on
3101 a separate line, if the rule's pattern has trailing whitespace:
3102
3103 @example
3104 %%
3105 foo|bar<space here>
3106   @{ foobar_action(); @}
3107 @end example
3108
3109 @code{flex} does not support this feature.
3110
3111 @item
3112 The @code{lex} @samp{%r} (generate a Ratfor scanner) option is
3113 not supported.  It is not part of the POSIX
3114 specification.
3115
3116 @item
3117 After a call to @samp{unput()}, @code{yytext} is undefined until
3118 the next token is matched, unless the scanner was
3119 built using @samp{%array}.  This is not the case with @code{lex}
3120 or the POSIX specification.  The @samp{-l} option does
3121 away with this incompatibility.
3122
3123 @item
3124 The precedence of the @samp{@{@}} (numeric range) operator
3125 is different.  @code{lex} interprets "abc@{1,3@}" as "match
3126 one, two, or three occurrences of 'abc'", whereas
3127 @code{flex} interprets it as "match 'ab' followed by one,
3128 two, or three occurrences of 'c'".  The latter is
3129 in agreement with the POSIX specification.
3130
3131 @item
3132 The precedence of the @samp{^} operator is different.  @code{lex}
3133 interprets "^foo|bar" as "match either 'foo' at the
3134 beginning of a line, or 'bar' anywhere", whereas
3135 @code{flex} interprets it as "match either 'foo' or 'bar'
3136 if they come at the beginning of a line".  The
3137 latter is in agreement with the POSIX specification.
3138
3139 @item
3140 The special table-size declarations such as @samp{%a}
3141 supported by @code{lex} are not required by @code{flex} scanners;
3142 @code{flex} ignores them.
3143
3144 @item
3145 The name FLEX_SCANNER is #define'd so scanners may
3146 be written for use with either @code{flex} or @code{lex}.
3147 Scanners also include @code{YY_FLEX_MAJOR_VERSION} and
3148 @code{YY_FLEX_MINOR_VERSION} indicating which version of
3149 @code{flex} generated the scanner (for example, for the
3150 2.5 release, these defines would be 2 and 5
3151 respectively).
3152 @end itemize
3153
3154 The following @code{flex} features are not included in @code{lex} or the
3155 POSIX specification:
3156
3157 @example
3158 C++ scanners
3159 %option
3160 start condition scopes
3161 start condition stacks
3162 interactive/non-interactive scanners
3163 yy_scan_string() and friends
3164 yyterminate()
3165 yy_set_interactive()
3166 yy_set_bol()
3167 YY_AT_BOL()
3168 <<EOF>>
3169 <*>
3170 YY_DECL
3171 YY_START
3172 YY_USER_ACTION
3173 YY_USER_INIT
3174 #line directives
3175 %@{@}'s around actions
3176 multiple actions on a line
3177 @end example
3178
3179 @noindent
3180 plus almost all of the flex flags.  The last feature in
3181 the list refers to the fact that with @code{flex} you can put
3182 multiple actions on the same line, separated with
3183 semicolons, while with @code{lex}, the following
3184
3185 @example
3186 foo    handle_foo(); ++num_foos_seen;
3187 @end example
3188
3189 @noindent
3190 is (rather surprisingly) truncated to
3191
3192 @example
3193 foo    handle_foo();
3194 @end example
3195
3196 @code{flex} does not truncate the action.  Actions that are not
3197 enclosed in braces are simply terminated at the end of the
3198 line.
3199
3200 @node Diagnostics, Files, Incompatibilities, Top
3201 @section Diagnostics
3202
3203 @table @samp
3204 @item warning, rule cannot be matched
3205 indicates that the given
3206 rule cannot be matched because it follows other rules that
3207 will always match the same text as it.  For example, in
3208 the following "foo" cannot be matched because it comes
3209 after an identifier "catch-all" rule:
3210
3211 @example
3212 [a-z]+    got_identifier();
3213 foo       got_foo();
3214 @end example
3215
3216 Using @code{REJECT} in a scanner suppresses this warning.
3217
3218 @item warning, -s option given but default rule can be matched
3219 means that it is possible (perhaps only in a particular
3220 start condition) that the default rule (match any single
3221 character) is the only one that will match a particular
3222 input.  Since @samp{-s} was given, presumably this is not
3223 intended.
3224
3225 @item reject_used_but_not_detected undefined
3226 @itemx yymore_used_but_not_detected undefined
3227 These errors can
3228 occur at compile time.  They indicate that the scanner
3229 uses @code{REJECT} or @samp{yymore()} but that @code{flex} failed to notice the
3230 fact, meaning that @code{flex} scanned the first two sections
3231 looking for occurrences of these actions and failed to
3232 find any, but somehow you snuck some in (via a #include
3233 file, for example).  Use @samp{%option reject} or @samp{%option yymore}
3234 to indicate to flex that you really do use these features.
3235
3236 @item flex scanner jammed
3237 a scanner compiled with @samp{-s} has
3238 encountered an input string which wasn't matched by any of
3239 its rules.  This error can also occur due to internal
3240 problems.
3241
3242 @item token too large, exceeds YYLMAX
3243 your scanner uses @samp{%array}
3244 and one of its rules matched a string longer than the @samp{YYL-}
3245 @code{MAX} constant (8K bytes by default).  You can increase the
3246 value by #define'ing @code{YYLMAX} in the definitions section of
3247 your @code{flex} input.
3248
3249 @item scanner requires -8 flag to use the character '@var{x}'
3250 Your
3251 scanner specification includes recognizing the 8-bit
3252 character @var{x} and you did not specify the -8 flag, and your
3253 scanner defaulted to 7-bit because you used the @samp{-Cf} or @samp{-CF}
3254 table compression options.  See the discussion of the @samp{-7}
3255 flag for details.
3256
3257 @item flex scanner push-back overflow
3258 you used @samp{unput()} to push
3259 back so much text that the scanner's buffer could not hold
3260 both the pushed-back text and the current token in @code{yytext}.
3261 Ideally the scanner should dynamically resize the buffer
3262 in this case, but at present it does not.
3263
3264 @item input buffer overflow, can't enlarge buffer because scanner uses REJECT
3265 the scanner was working on matching an
3266 extremely large token and needed to expand the input
3267 buffer.  This doesn't work with scanners that use @code{REJECT}.
3268
3269 @item fatal flex scanner internal error--end of buffer missed
3270 This can occur in an scanner which is reentered after a
3271 long-jump has jumped out (or over) the scanner's
3272 activation frame.  Before reentering the scanner, use:
3273
3274 @example
3275 yyrestart( yyin );
3276 @end example
3277
3278 @noindent
3279 or, as noted above, switch to using the C++ scanner class.
3280
3281 @item too many start conditions in <> construct!
3282 you listed
3283 more start conditions in a <> construct than exist (so you
3284 must have listed at least one of them twice).
3285 @end table
3286
3287 @node Files, Deficiencies, Diagnostics, Top
3288 @section Files
3289
3290 @table @file
3291 @item -lfl
3292 library with which scanners must be linked.
3293
3294 @item lex.yy.c
3295 generated scanner (called @file{lexyy.c} on some systems).
3296
3297 @item lex.yy.cc
3298 generated C++ scanner class, when using @samp{-+}.
3299
3300 @item <FlexLexer.h>
3301 header file defining the C++ scanner base class,
3302 @code{FlexLexer}, and its derived class, @code{yyFlexLexer}.
3303
3304 @item flex.skl
3305 skeleton scanner.  This file is only used when
3306 building flex, not when flex executes.
3307
3308 @item lex.backup
3309 backing-up information for @samp{-b} flag (called @file{lex.bck}
3310 on some systems).
3311 @end table
3312
3313 @node Deficiencies, See also, Files, Top
3314 @section Deficiencies / Bugs
3315
3316 Some trailing context patterns cannot be properly matched
3317 and generate warning messages ("dangerous trailing
3318 context").  These are patterns where the ending of the first
3319 part of the rule matches the beginning of the second part,
3320 such as "zx*/xy*", where the 'x*' matches the 'x' at the
3321 beginning of the trailing context.  (Note that the POSIX
3322 draft states that the text matched by such patterns is
3323 undefined.)
3324
3325 For some trailing context rules, parts which are actually
3326 fixed-length are not recognized as such, leading to the
3327 abovementioned performance loss.  In particular, parts
3328 using '|' or @{n@} (such as "foo@{3@}") are always considered
3329 variable-length.
3330
3331 Combining trailing context with the special '|' action can
3332 result in @emph{fixed} trailing context being turned into the
3333 more expensive @var{variable} trailing context.  For example, in
3334 the following:
3335
3336 @example
3337 %%
3338 abc      |
3339 xyz/def
3340 @end example
3341
3342 Use of @samp{unput()} invalidates yytext and yyleng, unless the
3343 @samp{%array} directive or the @samp{-l} option has been used.
3344
3345 Pattern-matching of NUL's is substantially slower than
3346 matching other characters.
3347
3348 Dynamic resizing of the input buffer is slow, as it
3349 entails rescanning all the text matched so far by the
3350 current (generally huge) token.
3351
3352 Due to both buffering of input and read-ahead, you cannot
3353 intermix calls to <stdio.h> routines, such as, for
3354 example, @samp{getchar()}, with @code{flex} rules and expect it to work.
3355 Call @samp{input()} instead.
3356
3357 The total table entries listed by the @samp{-v} flag excludes the
3358 number of table entries needed to determine what rule has
3359 been matched.  The number of entries is equal to the
3360 number of DFA states if the scanner does not use @code{REJECT}, and
3361 somewhat greater than the number of states if it does.
3362
3363 @code{REJECT} cannot be used with the @samp{-f} or @samp{-F} options.
3364
3365 The @code{flex} internal algorithms need documentation.
3366
3367 @node See also, Author, Deficiencies, Top
3368 @section See also
3369
3370 @code{lex}(1), @code{yacc}(1), @code{sed}(1), @code{awk}(1).
3371
3372 John Levine, Tony Mason, and Doug Brown: Lex & Yacc;
3373 O'Reilly and Associates.  Be sure to get the 2nd edition.
3374
3375 M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
3376
3377 Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers:
3378 Principles, Techniques and Tools; Addison-Wesley (1986).
3379 Describes the pattern-matching techniques used by @code{flex}
3380 (deterministic finite automata).
3381
3382 @node Author,  , See also, Top
3383 @section Author
3384
3385 Vern Paxson, with the help of many ideas and much inspiration from
3386 Van Jacobson.  Original version by Jef Poskanzer.  The fast table
3387 representation is a partial implementation of a design done by Van
3388 Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
3389
3390 Thanks to the many @code{flex} beta-testers, feedbackers, and
3391 contributors, especially Francois Pinard, Casey Leedom, Stan
3392 Adermann, Terry Allen, David Barker-Plummer, John Basrai, Nelson
3393 H.F. Beebe, @samp{benson@@odi.com}, Karl Berry, Peter A. Bigot,
3394 Simon Blanchard, Keith Bostic, Frederic Brehm, Ian Brockbank, Kin
3395 Cho, Nick Christopher, Brian Clapper, J.T. Conklin, Jason Coughlin,
3396 Bill Cox, Nick Cropper, Dave Curtis, Scott David Daniels, Chris
3397 G. Demetriou, Theo Deraadt, Mike Donahue, Chuck Doucette, Tom Epperly,
3398 Leo Eskin, Chris Faylor, Chris Flatters, Jon Forrest, Joe Gayda, Kaveh
3399 R. Ghazi, Eric Goldman, Christopher M.  Gould, Ulrich Grepel, Peer
3400 Griebel, Jan Hajic, Charles Hemphill, NORO Hideo, Jarkko Hietaniemi,
3401 Scott Hofmann, Jeff Honig, Dana Hudes, Eric Hughes, John Interrante,
3402 Ceriel Jacobs, Michal Jaegermann, Sakari Jalovaara, Jeffrey R. Jones,
3403 Henry Juengst, Klaus Kaempf, Jonathan I. Kamens, Terrence O Kane,
3404 Amir Katz, @samp{ken@@ken.hilco.com}, Kevin B. Kenny, Steve Kirsch,
3405 Winfried Koenig, Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard,
3406 Craig Leres, John Levine, Steve Liddle, Mike Long, Mohamed el Lozy,
3407 Brian Madsen, Malte, Joe Marshall, Bengt Martensson, Chris Metcalf,
3408 Luke Mewburn, Jim Meyering, R.  Alexander Milowski, Erik Naggum,
3409 G.T. Nicol, Landon Noll, James Nordby, Marc Nozell, Richard Ohnemus,
3410 Karsten Pahnke, Sven Panne, Roland Pesch, Walter Pelissero, Gaumond
3411 Pierre, Esmond Pitt, Jef Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic
3412 Raimbault, Pat Rankin, Rick Richardson, Kevin Rodgers, Kai Uwe Rommel,
3413 Jim Roskind, Alberto Santini, Andreas Scherer, Darrell Schiebel, Raf
3414 Schietekat, Doug Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex
3415 Siegel, Eckehard Stolz, Jan-Erik Strvmquist, Mike Stump, Paul Stuart,
3416 Dave Tallman, Ian Lance Taylor, Chris Thewalt, Richard M. Timoney,
3417 Jodi Tsai, Paul Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms,
3418 Kent Williams, Ken Yap, Ron Zellar, Nathan Zelle, David Zuhn, and
3419 those whose names have slipped my marginal mail-archiving skills but
3420 whose contributions are appreciated all the same.
3421
3422 Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
3423 Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol, Francois Pinard,
3424 Rich Salz, and Richard Stallman for help with various distribution
3425 headaches.
3426
3427 Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
3428 to Benson Margulies and Fred Burke for C++ support; to Kent Williams
3429 and Tom Epperly for C++ class support; to Ove Ewerlid for support of
3430 NUL's; and to Eric Hughes for support of multiple buffers.
3431
3432 This work was primarily done when I was with the Real Time Systems
3433 Group at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks
3434 to all there for the support I received.
3435
3436 Send comments to @samp{vern@@ee.lbl.gov}.
3437
3438 @c @node Index,  , Top, Top
3439 @c @unnumbered Index
3440 @c
3441 @c @printindex cp
3442
3443 @contents
3444 @bye
3445
3446 @c Local variables:
3447 @c texinfo-column-for-description: 32
3448 @c End: