man/man1/flexdoc.1

   1 .TH FLEX 1 "26 May 1990" "Version 2.3"
   2 .SH NAME
   3 flexdoc - fast lexical analyzer generator
   4 .SH SYNOPSIS
   5 .B flex
   6 .B [-bcdfinpstvFILT8 -C[efmF] -Sskeleton]
   7 .I [filename ...]
   8 .SH DESCRIPTION
   9 .I flex
  10 is a tool for generating
  11 .I scanners:
  12 programs which recognize lexical patterns in text.
  13 .I flex
  14 reads
  15 the given input files, or its standard input if no file names are given,
  16 for a description of a scanner to generate.  The description is in
  17 the form of pairs
  18 of regular expressions and C code, called
  19 .I rules.  flex
  20 generates as output a C source file,
  21 .B lex.yy.c,
  22 which defines a routine
  23 .B yylex().
  24 This file is compiled and linked with the
  25 .B -lfl
  26 library to produce an executable.  When the executable is run,
  27 it analyzes its input for occurrences
  28 of the regular expressions.  Whenever it finds one, it executes
  29 the corresponding C code.
  30 .SH SOME SIMPLE EXAMPLES
  31 .LP
  32 First some simple examples to get the flavor of how one uses
  33 .I flex.
  34 The following
  35 .I flex
  36 input specifies a scanner which, whenever it encounters the string
  37 "username", replaces it with the user's login name:
  38 .nf
  39
  40     %%
  41     username    printf( "%s", getlogin() );
  42
  43 .fi
  44 By default, any text not matched by a
  45 .I flex
  46 scanner
  47 is copied to the output, so the net effect of this scanner is
  48 to copy its input file to its output with each occurrence
  49 of "username" expanded.
  50 In this input, there is just one rule.  "username" is the
  51 .I pattern
  52 and the "printf" is the
  53 .I action.
  54 The "%%" marks the beginning of the rules.
  55 .LP
  56 Here's another simple example:
  57 .nf
  58
  59         int num_lines = 0, num_chars = 0;
  60
  61     %%
  62     \\n    ++num_lines; ++num_chars;
  63     .     ++num_chars;
  64
  65     %%
  66     main()
  67         {
  68         yylex();
  69         printf( "# of lines = %d, # of chars = %d\\n",
  70                 num_lines, num_chars );
  71         }
  72
  73 .fi
  74 This scanner counts the number of characters and the number
  75 of lines in its input (it produces no output other than the
  76 final report on the counts).  The first line
  77 declares two globals, "num_lines" and "num_chars", which are accessible
  78 both inside
  79 .B yylex()
  80 and in the
  81 .B main()
  82 routine declared after the second "%%".  There are two rules, one
  83 which matches a newline ("\\n") and increments both the line count and
  84 the character count, and one which matches any character other than
  85 a newline (indicated by the "." regular expression).
  86 .LP
  87 A somewhat more complicated example:
  88 .nf
  89
  90     /* scanner for a toy Pascal-like language */
  91
  92     %{
  93     /* need this for the call to atof() below */
  94     #include <math.h>
  95     %}
  96
  97     DIGIT    [0-9]
  98     ID       [a-z][a-z0-9]*
  99
 100     %%
 101
 102     {DIGIT}+    {
 103                 printf( "An integer: %s (%d)\\n", yytext,
 104                         atoi( yytext ) );
 105                 }
 106
 107     {DIGIT}+"."{DIGIT}*        {
 108                 printf( "A float: %s (%g)\\n", yytext,
 109                         atof( yytext ) );
 110                 }
 111
 112     if|then|begin|end|procedure|function        {
 113                 printf( "A keyword: %s\\n", yytext );
 114                 }
 115
 116     {ID}        printf( "An identifier: %s\\n", yytext );
 117
 118     "+"|"-"|"*"|"/"   printf( "An operator: %s\\n", yytext );
 119
 120     "{"[^}\\n]*"}"     /* eat up one-line comments */
 121
 122     [ \\t\\n]+          /* eat up whitespace */
 123
 124     .           printf( "Unrecognized character: %s\\n", yytext );
 125
 126     %%
 127
 128     main( argc, argv )
 129     int argc;
 130     char **argv;
 131         {
 132         ++argv, --argc;  /* skip over program name */
 133         if ( argc > 0 )
 134                 yyin = fopen( argv[0], "r" );
 135         else
 136                 yyin = stdin;
 137
 138         yylex();
 139         }
 140
 141 .fi
 142 This is the beginning of a simple scanner for a language like
 143 Pascal.  It identifies different types of
 144 .I tokens
 145 and reports on what it has seen.
 146 .LP
 147 The details of this example will be explained in the following
 148 sections.
 149 .SH FORMAT OF THE INPUT FILE
 150 The
 151 .I flex
 152 input file consists of three sections, separated by a line with just
 153 .B %%
 154 in it:
 155 .nf
 156
 157     definitions
 158     %%
 159     rules
 160     %%
 161     user code
 162
 163 .fi
 164 The
 165 .I definitions
 166 section contains declarations of simple
 167 .I name
 168 definitions to simplify the scanner specification, and declarations of
 169 .I start conditions,
 170 which are explained in a later section.
 171 .LP
 172 Name definitions have the form:
 173 .nf
 174
 175     name definition
 176
 177 .fi
 178 The "name" is a word beginning with a letter or an underscore ('_')
 179 followed by zero or more letters, digits, '_', or '-' (dash).
 180 The definition is taken to begin at the first non-white-space character
 181 following the name and to continue to the end of the line.
 182 The definition can subsequently be referred to using "{name}", which
 183 will expand to "(definition)".  For example,
 184 .nf
 185
 186     DIGIT    [0-9]
 187     ID       [a-z][a-z0-9]*
 188
 189 .fi
 190 defines "DIGIT" to be a regular expression which matches a
 191 single digit, and
 192 "ID" to be a regular expression which matches a letter
 193 followed by zero-or-more letters-or-digits.
 194 A subsequent reference to
 195 .nf
 196
 197     {DIGIT}+"."{DIGIT}*
 198
 199 .fi
 200 is identical to
 201 .nf
 202
 203     ([0-9])+"."([0-9])*
 204
 205 .fi
 206 and matches one-or-more digits followed by a '.' followed
 207 by zero-or-more digits.
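.LP
The textual nature of this expansion can be sketched in plain C.
The following is only an illustration, not how
flex
itself is implemented: the type "name_def" and the function
"expand_names" are hypothetical, and the sketch ignores quoting, so a
"{name}" inside a quoted string would also be expanded here.
.nf

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: one name definition. */
struct name_def {
    const char *name;
    const char *definition;
};

/* Copy `pattern` into `out`, replacing each "{name}" that appears in
 * `defs` with "(definition)".  Anything else in braces (for example
 * a repeat count such as "{2,5}") is copied unchanged.
 * Returns 0 on success, -1 if `out` is too small. */
int expand_names(const struct name_def *defs, int ndefs,
                 const char *pattern, char *out, size_t out_size)
{
    size_t used = 0;

    while (*pattern) {
        const char *close = (*pattern == '{') ? strchr(pattern, '}') : NULL;
        const char *def = NULL;
        int i;

        if (close != NULL) {
            size_t name_len = (size_t) (close - pattern - 1);

            for (i = 0; i < ndefs; ++i)
                if (strlen(defs[i].name) == name_len &&
                    strncmp(defs[i].name, pattern + 1, name_len) == 0)
                    def = defs[i].definition;
        }

        if (def != NULL) {
            /* Known name: emit the parenthesized definition. */
            int n = snprintf(out + used, out_size - used, "(%s)", def);

            if (n < 0 || (size_t) n >= out_size - used)
                return -1;
            used += (size_t) n;
            pattern = close + 1;
        } else {
            /* Ordinary character: copy it through. */
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *pattern++;
        }
    }

    if (used >= out_size)
        return -1;
    out[used] = 0;
    return 0;
}
```

.fi
The parentheses added around each definition are what keep a
following operator such as '+' applied to the whole definition rather
than to its last element.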
 208 .LP
 209 The
 210 .I rules
 211 section of the
 212 .I flex
 213 input contains a series of rules of the form:
 214 .nf
 215
 216     pattern   action
 217
 218 .fi
 219 where the pattern must be unindented and the action must begin
 220 on the same line.
 221 .LP
 222 See below for a further description of patterns and actions.
 223 .LP
 224 Finally, the user code section is simply copied to
 225 .B lex.yy.c
 226 verbatim.
 227 It is used for companion routines which call or are called
 228 by the scanner.  The presence of this section is optional;
 229 if it is missing, the second
 230 .B %%
 231 in the input file may be skipped, too.
 232 .LP
 233 In the definitions and rules sections, any
 234 .I indented
 235 text or text enclosed in
 236 .B %{
 237 and
 238 .B %}
 239 is copied verbatim to the output (with the %{}'s removed).
 240 The %{}'s must appear unindented on lines by themselves.
 241 .LP
 242 In the rules section,
 243 any indented or %{} text appearing before the
 244 first rule may be used to declare variables
 245 which are local to the scanning routine and (after the declarations)
 246 code which is to be executed whenever the scanning routine is entered.
 247 Other indented or %{} text in the rule section is still copied to the output,
 248 but its meaning is not well-defined and it may well cause compile-time
 249 errors (this feature is present for
 250 .I POSIX
 251 compliance; see below for other such features).
 252 .LP
 253 In the definitions section, an unindented comment (i.e., a line
 254 beginning with "/*") is also copied verbatim to the output up
 255 to the next "*/".  Also, any line in the definitions section
 256 beginning with '#' is ignored, though this style of comment is
 257 deprecated and may go away in the future.
 258 .SH PATTERNS
 259 The patterns in the input are written using an extended set of regular
 260 expressions.  These are:
 261 .nf
 262
 263     x          match the character 'x'
 264     .          any character except newline
 265     [xyz]      a "character class"; in this case, the pattern
 266                  matches either an 'x', a 'y', or a 'z'
 267     [abj-oZ]   a "character class" with a range in it; matches
 268                  an 'a', a 'b', any letter from 'j' through 'o',
 269                  or a 'Z'
 270     [^A-Z]     a "negated character class", i.e., any character
 271                  but those in the class.  In this case, any
 272                  character EXCEPT an uppercase letter.
 273     [^A-Z\\n]   any character EXCEPT an uppercase letter or
 274                  a newline
 275     r*         zero or more r's, where r is any regular expression
 276     r+         one or more r's
 277     r?         zero or one r's (that is, "an optional r")
 278     r{2,5}     anywhere from two to five r's
 279     r{2,}      two or more r's
 280     r{4}       exactly 4 r's
 281     {name}     the expansion of the "name" definition
 282                (see above)
 283     "[xyz]\\"foo"
 284                the literal string: [xyz]"foo
 285     \\X         if X is an 'a', 'b', 'f', 'n', 'r', 't', or 'v',
 286                  then the ANSI-C interpretation of \\x.
 287                  Otherwise, a literal 'X' (used to escape
 288                  operators such as '*')
 289     \\123       the character with octal value 123
 290     \\x2a       the character with hexadecimal value 2a
 291     (r)        match an r; parentheses are used to override
 292                  precedence (see below)
 293
 294
 295     rs         the regular expression r followed by the
 296                  regular expression s; called "concatenation"
 297
 298
 299     r|s        either an r or an s
 300
 301
 302     r/s        an r but only if it is followed by an s.  The
 303                  s is not part of the matched text.  This type
 304                  of pattern is called "trailing context".
 305     ^r         an r, but only at the beginning of a line
 306     r$         an r, but only at the end of a line.  Equivalent
 307                  to "r/\\n".
 308
 309
 310     <s>r       an r, but only in start condition s (see
 311                below for discussion of start conditions)
 312     <s1,s2,s3>r
 313                same, but in any of start conditions s1,
 314                s2, or s3
 315
 316
 317     <<EOF>>    an end-of-file
 318     <s1,s2><<EOF>>
 319                an end-of-file when in start condition s1 or s2
 320
 321 .fi
 322 The regular expressions listed above are grouped according to
 323 precedence, from highest precedence at the top to lowest at the bottom.
 324 Those grouped together have equal precedence.  For example,
 325 .nf
 326
 327     foo|bar*
 328
 329 .fi
 330 is the same as
 331 .nf
 332
 333     (foo)|(ba(r*))
 334
 335 .fi
 336 since the '*' operator has higher precedence than concatenation,
 337 and concatenation higher than alternation ('|').  This pattern
 338 therefore matches
 339 .I either
 340 the string "foo"
 341 .I or
 342 the string "ba" followed by zero-or-more r's.
 343 To match "foo" or zero-or-more "bar"'s, use:
 344 .nf
 345
 346     foo|(bar)*
 347
 348 .fi
 349 and to match zero-or-more "foo"'s-or-"bar"'s:
 350 .nf
 351
 352     (foo|bar)*
 353
 354 .fi
 355 .LP
 356 Some notes on patterns:
 357 .IP -
 358 A negated character class such as the example "[^A-Z]"
 359 above
 360 .I will match a newline
 361 unless "\\n" (or an equivalent escape sequence) is one of the
 362 characters explicitly present in the negated character class
 363 (e.g., "[^A-Z\\n]").  This is unlike how many other regular
 364 expression tools treat negated character classes, but unfortunately
 365 the inconsistency is historically entrenched.
 366 Matching newlines means that a pattern like [^"]* can match an entire
 367 input (overflowing the scanner's input buffer) unless there's another
 368 quote in the input.
 369 .IP -
 370 A rule can have at most one instance of trailing context (the '/' operator
 371 or the '$' operator).  The start condition, '^', and "<<EOF>>" patterns
 372 can only occur at the beginning of a pattern and, like '/' and '$',
 373 cannot be grouped inside parentheses.  A '^' which does not occur at
 374 the beginning of a rule or a '$' which does not occur at the end of
 375 a rule loses its special properties and is treated as a normal character.
 376 .IP
 377 The following are illegal:
 378 .nf
 379
 380     foo/bar$
 381     <sc1>foo<sc2>bar
 382
 383 .fi
 384 Note that the first of these can be written "foo/bar\\n".
 385 .IP
 386 The following will result in '$' or '^' being treated as a normal character:
 387 .nf
 388
 389     foo|(bar$)
 390     foo|^bar
 391
 392 .fi
 393 If what's wanted is a "foo" or a bar-followed-by-a-newline, the following
 394 could be used (the special '|' action is explained below):
 395 .nf
 396
 397     foo      |
 398     bar$     /* action goes here */
 399
 400 .fi
 401 A similar trick will work for matching a foo or a
 402 bar-at-the-beginning-of-a-line.
 403 .SH HOW THE INPUT IS MATCHED
 404 When the generated scanner is run, it analyzes its input looking
 405 for strings which match any of its patterns.  If it finds more than
 406 one match, it takes the one matching the most text (for trailing
 407 context rules, this includes the length of the trailing part, even
 408 though it will then be returned to the input).  If it finds two
 409 or more matches of the same length, the
 410 rule listed first in the
 411 .I flex
 412 input file is chosen.
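.LP
The selection rule just described (longest match wins, ties go to the
rule listed first) can be sketched in C.  This is a simplified
illustration only: the "patterns" are literal strings rather than
regular expressions, trailing context is ignored, and the names
"literal_match" and "pick_rule" are hypothetical.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the match of the literal `pat` at
 * the start of `input`, or 0 if it does not match there. */
static size_t literal_match(const char *pat, const char *input)
{
    size_t n = strlen(pat);

    return strncmp(pat, input, n) == 0 ? n : 0;
}

/* Return the index of the rule chosen at the start of `input`, or -1
 * if no rule matches.  The length of the match is stored in
 * *match_len.  A later rule replaces the current best only when it is
 * strictly longer, so among equal-length matches the first one wins. */
int pick_rule(const char *const rules[], int nrules,
              const char *input, size_t *match_len)
{
    int best = -1;
    size_t best_len = 0;
    int i;

    for (i = 0; i < nrules; ++i) {
        size_t len = literal_match(rules[i], input);

        if (len > best_len) {
            best = i;
            best_len = len;
        }
    }

    *match_len = best_len;
    return best;
}
```

.fi
With the rules "if", "iffy", and a second "if", the input "iffy stuff"
selects the second rule (the longest match), while the input "if x"
selects the first rule rather than the third.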
 413 .LP
 414 Once the match is determined, the text corresponding to the match
 415 (called the
 416 .I token)
 417 is made available in the global character pointer
 418 .B yytext,
 419 and its length in the global integer
 420 .B yyleng.
 421 The
 422 .I action
 423 corresponding to the matched pattern is then executed (a more
 424 detailed description of actions follows), and then the remaining
 425 input is scanned for another match.
 426 .LP
 427 If no match is found, then the
 428 .I default rule
 429 is executed: the next character in the input is considered matched and
 430 copied to the standard output.  Thus, the simplest legal
 431 .I flex
 432 input is:
 433 .nf
 434
 435     %%
 436
 437 .fi
 438 which generates a scanner that simply copies its input (one character
 439 at a time) to its output.
 440 .SH ACTIONS
 441 Each pattern in a rule has a corresponding action, which can be any
 442 arbitrary C statement.  The pattern ends at the first non-escaped
 443 whitespace character; the remainder of the line is its action.  If the
 444 action is empty, then when the pattern is matched the input token
 445 is simply discarded.  For example, here is the specification for a program
 446 which deletes all occurrences of "zap me" from its input:
 447 .nf
 448
 449     %%
 450     "zap me"
 451
 452 .fi
 453 (It will copy all other characters in the input to the output since
 454 they will be matched by the default rule.)
 455 .LP
 456 Here is a program which compresses multiple blanks and tabs down to
 457 a single blank, and throws away whitespace found at the end of a line:
 458 .nf
 459
 460     %%
 461     [ \\t]+        putchar( ' ' );
 462     [ \\t]+$       /* ignore this token */
 463
 464 .fi
 465 .LP
 466 If the action contains a '{', then the action spans till the balancing '}'
 467 is found, and the action may cross multiple lines.
 468 .I flex
 469 knows about C strings and comments and won't be fooled by braces found
 470 within them, but also allows actions to begin with
 471 .B %{
 472 and will consider the action to be all the text up to the next
 473 .B %}
 474 (regardless of ordinary braces inside the action).
 475 .LP
 476 An action consisting solely of a vertical bar ('|') means "same as
 477 the action for the next rule."  See below for an illustration.
 478 .LP
 479 Actions can include arbitrary C code, including
 480 .B return
 481 statements to return a value to whatever routine called
 482 .B yylex().
 483 Each time
 484 .B yylex()
 485 is called it continues processing tokens from where it last left
 486 off until it either reaches
 487 the end of the file or executes a return.  Once it reaches an end-of-file,
 488 however, then any subsequent call to
 489 .B yylex()
 490 will simply immediately return, unless
 491 .B yyrestart()
 492 is first called (see below).
 493 .LP
 494 Actions are not allowed to modify yytext or yyleng.
 495 .LP
 496 There are a number of special directives which can be included within
 497 an action:
 498 .IP -
 499 .B ECHO
 500 copies yytext to the scanner's output.
 501 .IP -
 502 .B BEGIN
 503 followed by the name of a start condition places the scanner in the
 504 corresponding start condition (see below).
 505 .IP -
 506 .B REJECT
 507 directs the scanner to proceed on to the "second best" rule which matched the
 508 input (or a prefix of the input).  The rule is chosen as described
 509 above in "How the Input is Matched", and
 510 .B yytext
 511 and
 512 .B yyleng
 513 are set up appropriately.
 514 It may either be one which matched as much text
 515 as the originally chosen rule but came later in the
 516 .I flex
 517 input file, or one which matched less text.
 518 For example, the following will both count the
 519 words in the input and call the routine special() whenever "frob" is seen:
 520 .nf
 521
 522             int word_count = 0;
 523     %%
 524
 525     frob        special(); REJECT;
 526     [^ \\t\\n]+   ++word_count;
 527
 528 .fi
 529 Without the
 530 .B REJECT,
 531 any "frob"'s in the input would not be counted as words, since the
 532 scanner normally executes only one action per token.
 533 Multiple
 534 .B REJECT's
 535 are allowed, each one finding the next best choice to the currently
 536 active rule.  For example, when the following scanner scans the token
 537 "abcd", it will write "abcdabcaba" to the output:
 538 .nf
 539
 540     %%
 541     a        |
 542     ab       |
 543     abc      |
 544     abcd     ECHO; REJECT;
 545     .|\\n     /* eat up any unmatched character */
 546
 547 .fi
 548 (The first three rules share the fourth's action since they use
 549 the special '|' action.)
 550 .B REJECT
 551 is a particularly expensive feature in terms of scanner performance;
 552 if it is used in
 553 .I any
 554 of the scanner's actions it will slow down
 555 .I all
 556 of the scanner's matching.  Furthermore,
 557 .B REJECT
 558 cannot be used with the
 559 .I -f
 560 or
 561 .I -F
 562 options (see below).
 563 .IP
 564 Note also that unlike the other special actions,
 565 .B REJECT
 566 is a
 567 .I branch;
 568 code immediately following it in the action will
 569 .I not
 570 be executed.
 571 .IP -
 572 .B yymore()
 573 tells the scanner that the next time it matches a rule, the corresponding
 574 token should be
 575 .I appended
 576 onto the current value of
 577 .B yytext
 578 rather than replacing it.  For example, given the input "mega-kludge"
 579 the following will write "mega-mega-kludge" to the output:
 580 .nf
 581
 582     %%
 583     mega-    ECHO; yymore();
 584     kludge   ECHO;
 585
 586 .fi
 587 First "mega-" is matched and echoed to the output.  Then "kludge"
 588 is matched, but the previous "mega-" is still hanging around at the
 589 beginning of
 590 .B yytext
 591 so the
 592 .B ECHO
 593 for the "kludge" rule will actually write "mega-kludge".
 594 The presence of
 595 .B yymore()
 596 in the scanner's action entails a minor performance penalty in the
 597 scanner's matching speed.
 598 .IP -
 599 .B yyless(n)
 600 returns all but the first
 601 .I n
 602 characters of the current token back to the input stream, where they
 603 will be rescanned when the scanner looks for the next match.
 604 .B yytext
 605 and
 606 .B yyleng
 607 are adjusted appropriately (e.g.,
 608 .B yyleng
 609 will now be equal to
 610 .I n
 611 ).  For example, on the input "foobar" the following will write out
 612 "foobarbar":
 613 .nf
 614
 615     %%
 616     foobar    ECHO; yyless(3);
 617     [a-z]+    ECHO;
 618
 619 .fi
 620 An argument of 0 to
 621 .B yyless
 622 will cause the entire current input string to be scanned again.  Unless you've
 623 changed how the scanner will subsequently process its input (using
 624 .B BEGIN,
 625 for example), this will result in an endless loop.
 626 .IP -
 627 .B unput(c)
 628 puts the character
 629 .I c
 630 back onto the input stream.  It will be the next character scanned.
 631 The following action will take the current token and cause it
 632 to be rescanned enclosed in parentheses.
 633 .nf
 634
 635     {
 636     int i;
 637     unput( ')' );
 638     for ( i = yyleng - 1; i >= 0; --i )
 639         unput( yytext[i] );
 640     unput( '(' );
 641     }
 642
 643 .fi
 644 Note that since each
 645 .B unput()
 646 puts the given character back at the
 647 .I beginning
 648 of the input stream, pushing back strings must be done back-to-front.
 649 .IP -
 650 .B input()
 651 reads the next character from the input stream.  For example,
 652 the following is one way to eat up C comments:
 653 .nf
 654
 655     %%
 656     "/*"        {
 657                 register int c;
 658
 659                 for ( ; ; )
 660                     {
 661                     while ( (c = input()) != '*' &&
 662                             c != EOF )
 663                         ;    /* eat up text of comment */
 664
 665                     if ( c == '*' )
 666                         {
 667                         while ( (c = input()) == '*' )
 668                             ;
 669                         if ( c == '/' )
 670                             break;    /* found the end */
 671                         }
 672
 673                     if ( c == EOF )
 674                         {
 675                         error( "EOF in comment" );
 676                         break;
 677                         }
 678                     }
 679                 }
 680
 681 .fi
 682 (Note that if the scanner is compiled using
 683 .B C++,
 684 then
 685 .B input()
 686 is instead referred to as
 687 .B yyinput(),
 688 in order to avoid a name clash with the
 689 .B C++
 690 stream by the name of
 691 .I input.)
 692 .IP -
 693 .B yyterminate()
 694 can be used in lieu of a return statement in an action.  It terminates
 695 the scanner and returns a 0 to the scanner's caller, indicating "all done".
 696 Subsequent calls to the scanner will immediately return unless preceded
 697 by a call to
 698 .B yyrestart()
 699 (see below).
 700 By default,
 701 .B yyterminate()
 702 is also called when an end-of-file is encountered.  It is a macro and
 703 may be redefined.
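.LP
The back-to-front rule noted above for unput() can be seen in a toy
model of a push-back stream.  This sketch does not use the scanner's
real buffers; "toy_unput", "toy_input", and "push_back_parenthesized"
are hypothetical stand-ins that only mimic the documented behavior
(the character most recently put back is the next one read).
.nf

```c
#define PUSHBACK_MAX 64

/* Characters waiting to be re-read; the next one to be read is at
 * the highest used index. */
static char pending[PUSHBACK_MAX];
static int npending = 0;

/* Put `c` back so that it becomes the next character read.
 * Returns 0 on success, -1 if the toy buffer is full. */
int toy_unput(int c)
{
    if (npending >= PUSHBACK_MAX)
        return -1;
    pending[npending++] = (char) c;
    return 0;
}

/* Read the next pending character, or -1 if there is none. */
int toy_input(void)
{
    if (npending == 0)
        return -1;
    return (unsigned char) pending[--npending];
}

/* Push back "(" + text + ")" so that it is re-read in that order:
 * the closing parenthesis goes first and the opening one last. */
int push_back_parenthesized(const char *text, int len)
{
    int i;

    if (toy_unput(')') != 0)
        return -1;
    for (i = len - 1; i >= 0; --i)
        if (toy_unput(text[i]) != 0)
            return -1;
    return toy_unput('(');
}
```

.fi
After push_back_parenthesized("abc", 3), successive calls to
toy_input() yield '(', 'a', 'b', 'c', and ')' in that order.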
 704 .SH THE GENERATED SCANNER
 705 The output of
 706 .I flex
 707 is the file
 708 .B lex.yy.c,
 709 which contains the scanning routine
 710 .B yylex(),
 711 a number of tables used by it for matching tokens, and a number
 712 of auxiliary routines and macros.  By default,
 713 .B yylex()
 714 is declared as follows:
 715 .nf
 716
 717     int yylex()
 718         {
 719         ... various definitions and the actions in here ...
 720         }
 721
 722 .fi
 723 (If your environment supports function prototypes, then it will
 724 be "int yylex( void )".)  This definition may be changed by redefining
 725 the "YY_DECL" macro.  For example, you could use:
 726 .nf
 727
 728     #undef YY_DECL
 729     #define YY_DECL float lexscan( a, b ) float a, b;
 730
 731 .fi
 732 to give the scanning routine the name
 733 .I lexscan,
 734 returning a float, and taking two floats as arguments.  Note that
 735 if you give arguments to the scanning routine using a
 736 K&R-style/non-prototyped function declaration, you must terminate
 737 the definition with a semi-colon (;).
 738 .LP
 739 Whenever
 740 .B yylex()
 741 is called, it scans tokens from the global input file
 742 .I yyin
 743 (which defaults to stdin).  It continues until it either reaches
 744 an end-of-file (at which point it returns the value 0) or
 745 one of its actions executes a
 746 .I return
 747 statement.
 748 In the former case, when called again the scanner will immediately
 749 return unless
 750 .B yyrestart()
 751 is called to point
 752 .I yyin
 753 at the new input file.  (
 754 .B yyrestart()
 755 takes one argument, a
 756 .B FILE *
 757 pointer.)
 758 In the latter case (i.e., when an action
 759 executes a return), the scanner may then be called again and it
 760 will resume scanning where it left off.
 761 .LP
 762 By default (and for purposes of efficiency), the scanner uses
 763 block-reads rather than simple
 764 .I getc()
 765 calls to read characters from
 766 .I yyin.
 767 The nature of how it gets its input can be controlled by redefining the
 768 .B YY_INPUT
 769 macro.
 770 YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)".  Its
 771 action is to place up to
 772 .I max_size
 773 characters in the character array
 774 .I buf
 775 and return in the integer variable
 776 .I result
 777 either the
 778 number of characters read or the constant YY_NULL (0 on Unix systems)
 779 to indicate EOF.  The default YY_INPUT reads from the
 780 global file-pointer "yyin".
 781 .LP
 782 A sample redefinition of YY_INPUT (in the definitions
 783 section of the input file):
 784 .nf
 785
 786     %{
 787     #undef YY_INPUT
 788     #define YY_INPUT(buf,result,max_size) \\
 789         { \\
 790         int c = getchar(); \\
 791         result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \\
 792         }
 793     %}
 794
 795 .fi
 796 This definition will change the input processing to occur
 797 one character at a time.
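.LP
The same calling sequence can deliver whole blocks rather than single
characters.  As a rough sketch, the function below fills a buffer from
an in-memory string and reports how many characters it stored, with 0
standing in for YY_NULL as described above for Unix systems.  The
names "set_source" and "read_block" are hypothetical; a redefined
YY_INPUT could assign the function's return value to
.I result.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory input source and read position. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Start reading from the beginning of `text`. */
void set_source(const char *text)
{
    src_text = text;
    src_pos = 0;
}

/* Store up to `max_size` characters in `buf` and return the number
 * stored.  A return value of 0 means end of input. */
int read_block(char *buf, int max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n;

    if (max_size <= 0)
        return 0;
    n = left < (size_t) max_size ? left : (size_t) max_size;
    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int) n;
}
```

.fi
Reading "hello" through a four-character buffer returns 4, then 1,
then 0 to signal the end of the input.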
 798 .LP
 799 You also can add in things like keeping track of the
 800 input line number this way; but don't expect your scanner to
 801 go very fast.
 802 .LP
 803 When the scanner receives an end-of-file indication from YY_INPUT,
 804 it then checks the
 805 .B yywrap()
 806 function.  If
 807 .B yywrap()
 808 returns false (zero), then it is assumed that the
 809 function has gone ahead and set up
 810 .I yyin
 811 to point to another input file, and scanning continues.  If it returns
 812 true (non-zero), then the scanner terminates, returning 0 to its
 813 caller.
 814 .LP
 815 The default
 816 .B yywrap()
 817 always returns 1.  Presently, to redefine it you must first
 818 "#undef yywrap", as it is currently implemented as a macro.  As indicated
 819 by the hedging in the previous sentence, it may be changed to
 820 a true function in the near future.
 821 .LP
 822 The scanner writes its
 823 .B ECHO
 824 output to the
 825 .I yyout
 826 global (default, stdout), which may be redefined by the user simply
 827 by assigning it to some other
 828 .B FILE
 829 pointer.
 830 .SH START CONDITIONS
 831 .I flex
 832 provides a mechanism for conditionally activating rules.  Any rule
 833 whose pattern is prefixed with "<sc>" will only be active when
 834 the scanner is in the start condition named "sc".  For example,
 835 .nf
 836
 837     <STRING>[^"]*        { /* eat up the string body ... */
 838                 ...
 839                 }
 840
 841 .fi
 842 will be active only when the scanner is in the "STRING" start
 843 condition, and
 844 .nf
 845
 846     <INITIAL,STRING,QUOTE>\\.        { /* handle an escape ... */
 847                 ...
 848                 }
 849
 850 .fi
 851 will be active only when the current start condition is
 852 one of "INITIAL", "STRING", or "QUOTE".
 853 .LP
 854 Start conditions
 855 are declared in the definitions (first) section of the input
 856 using unindented lines beginning with either
 857 .B %s
 858 or
 859 .B %x
 860 followed by a list of names.
 861 The former declares
 862 .I inclusive
 863 start conditions, the latter
 864 .I exclusive
 865 start conditions.  A start condition is activated using the
 866 .B BEGIN
 867 action.  Until the next
 868 .B BEGIN
 869 action is executed, rules with the given start
 870 condition will be active and
 871 rules with other start conditions will be inactive.
 872 If the start condition is
 873 .I inclusive,
 874 then rules with no start conditions at all will also be active.
 875 If it is
 876 .I exclusive,
 877 then
 878 .I only
 879 rules qualified with the start condition will be active.
 880 A set of rules contingent on the same exclusive start condition
 881 describes a scanner which is independent of any of the other rules in the
 882 .I flex
 883 input.  Because of this,
 884 exclusive start conditions make it easy to specify "mini-scanners"
 885 which scan portions of the input that are syntactically different
 886 from the rest (e.g., comments).
 887 .LP
 888 If the distinction between inclusive and exclusive start conditions
 889 is still a little vague, here's a simple example illustrating the
 890 connection between the two.  The set of rules:
 891 .nf
 892
 893     %s example
 894     %%
 895     <example>foo           /* do something */
 896
 897 .fi
 898 is equivalent to
 899 .nf
 900
 901     %x example
 902     %%
 903     <INITIAL,example>foo   /* do something */
 904
 905 .fi
 906 .LP
 907 The default rule (to
 908 .B ECHO
 909 any unmatched character) remains active in start conditions.
 910 .LP
 911 .B BEGIN(0)
 912 returns to the original state where only the rules with
 913 no start conditions are active.  This state can also be
 914 referred to as the start-condition "INITIAL", so
 915 .B BEGIN(INITIAL)
 916 is equivalent to
 917 .B BEGIN(0).
 918 (The parentheses around the start condition name are not required but
 919 are considered good style.)
 920 .LP
 921 .B BEGIN
 922 actions can also be given as indented code at the beginning
 923 of the rules section.  For example, the following will cause
 924 the scanner to enter the "SPECIAL" start condition whenever
 925 .I yylex()
 926 is called and the global variable
 927 .I enter_special
 928 is true:
 929 .nf
 930
 931             int enter_special;
 932
 933     %x SPECIAL
 934     %%
 935             if ( enter_special )
 936                 BEGIN(SPECIAL);
 937
 938     <SPECIAL>blahblahblah
 939     ...more rules follow...
 940
 941 .fi
 942 .LP
 943 To illustrate the uses of start conditions,
 944 here is a scanner which provides two different interpretations
 945 of a string like "123.456".  By default it will treat it as
 946 three tokens, the integer "123", a dot ('.'), and the integer "456".
 947 But if the string is preceded earlier in the line by the string
 948 "expect-floats"
 949 it will treat it as a single token, the floating-point number
 950 123.456:
 951 .nf
 952
 953     %{
 954     #include <math.h>
 955     %}
 956     %s expect
 957
 958     %%
 959     expect-floats        BEGIN(expect);
 960
 961     <expect>[0-9]+"."[0-9]+      {
 962                 printf( "found a float, = %f\\n",
 963                         atof( yytext ) );
 964                 }
 965     <expect>\\n           {
 966                 /* that's the end of the line, so
 967                  * we need another "expect-floats"
 968                  * before we'll recognize any more
 969                  * numbers
 970                  */
 971                 BEGIN(INITIAL);
 972                 }
 973
 974     [0-9]+      {
 975                 printf( "found an integer, = %d\\n",
 976                         atoi( yytext ) );
 977                 }
 978
 979     "."         printf( "found a dot\\n" );
 980
 981 .fi
 982 Here is a scanner which recognizes (and discards) C comments while
 983 maintaining a count of the current input line.
 984 .nf
 985
 986     %x comment
 987     %%
 988             int line_num = 1;
 989
 990     "/*"         BEGIN(comment);
 991
 992     <comment>[^*\\n]*        /* eat anything that's not a '*' */
 993     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
 994     <comment>\\n             ++line_num;
 995     <comment>"*"+"/"        BEGIN(INITIAL);
 996
 997 .fi
 998 Note that start condition names are really integer values and
 999 can be stored as such.  Thus, the above could be extended in the
1000 following fashion:
1001 .nf
1002
1003     %x comment foo
1004     %%
1005             int line_num = 1;
1006             int comment_caller;
1007
1008     "/*"         {
1009                  comment_caller = INITIAL;
1010                  BEGIN(comment);
1011                  }
1012
1013     ...
1014
1015     <foo>"/*"    {
1016                  comment_caller = foo;
1017                  BEGIN(comment);
1018                  }
1019
1020     <comment>[^*\\n]*        /* eat anything that's not a '*' */
1021     <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
1022     <comment>\\n             ++line_num;
1023     <comment>"*"+"/"        BEGIN(comment_caller);
1024
1025 .fi
1026 One can then implement a "stack" of start conditions using an
1027 array of integers.  (It is likely that such stacks will become
1028 a full-fledged
1029 .I flex
1030 feature in the future.)  Note, though, that
1031 start conditions do not have their own name-space; %s's and %x's
1032 declare names in the same fashion as #define's.
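.LP
As a rough sketch of the array-based stack mentioned above, the
fragment below records the caller's start condition before entering
the "comment" state and restores it on leaving.  The names
.I sc_stack,
.I sc_stack_ptr,
and
.I MAX_SC_DEPTH
are invented for this illustration and are not part of
.I flex:
.nf

    %{
    #define MAX_SC_DEPTH 16
    int sc_stack[MAX_SC_DEPTH];   /* saved start conditions */
    int sc_stack_ptr = 0;         /* next free slot */
    %}
    %x comment foo
    %%
    "/*"         {
                 /* push the caller, then enter the comment */
                 sc_stack[sc_stack_ptr++] = INITIAL;
                 BEGIN(comment);
                 }

    ...

    <foo>"/*"    {
                 sc_stack[sc_stack_ptr++] = foo;
                 BEGIN(comment);
                 }

    <comment>[^*\\n]*        /* eat anything that's not a '*' */
    <comment>"*"+[^*/\\n]*   /* eat up '*'s not followed by '/'s */
    <comment>\\n
    <comment>"*"+"/"        BEGIN(sc_stack[--sc_stack_ptr]);

.fi
Since comments do not nest here, the stack never holds more than one
entry; a scanner that pushes from several nested states should check
.I sc_stack_ptr
against
.I MAX_SC_DEPTH
before each push.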
1033 .SH MULTIPLE INPUT BUFFERS
1034 Some scanners (such as those which support "include" files)
1035 require reading from several input streams.  As
1036 .I flex
1037 scanners do a large amount of buffering, one cannot control
1038 where the next input will be read from by simply writing a
1039 .B YY_INPUT
1040 which is sensitive to the scanning context.
1041 .B YY_INPUT
1042 is only called when the scanner reaches the end of its buffer, which
1043 may be a long time after scanning a statement such as an "include"
1044 which requires switching the input source.
1045 .LP
1046 To negotiate these sorts of problems,
1047 .I flex
1048 provides a mechanism for creating and switching between multiple
1049 input buffers.  An input buffer is created by using:
1050 .nf
1051
1052     YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1053
1054 .fi
1055 which takes a
1056 .I FILE
1057 pointer and a size and creates a buffer associated with the given
1058 file and large enough to hold
1059 .I size
1060 characters (when in doubt, use
1061 .B YY_BUF_SIZE
1062 for the size).  It returns a
1063 .B YY_BUFFER_STATE
1064 handle, which may then be passed to other routines:
1065 .nf
1066
1067     void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1068
1069 .fi
1070 switches the scanner's input buffer so subsequent tokens will
1071 come from
1072 .I new_buffer.
1073 Note that
1074 .B yy_switch_to_buffer()
1075 may be used by yywrap() to set things up for continued scanning, instead
1076 of opening a new file and pointing
1077 .I yyin
1078 at it.
1079 .nf
1080
1081     void yy_delete_buffer( YY_BUFFER_STATE buffer )
1082
1083 .fi
1084 is used to reclaim the storage associated with a buffer.
1085 .LP
1086 .B yy_new_buffer()
1087 is an alias for
1088 .B yy_create_buffer(),
1089 provided for compatibility with the C++ use of
1090 .I new
1091 and
1092 .I delete
1093 for creating and destroying dynamic objects.
1094 .LP
1095 Finally, the
1096 .B YY_CURRENT_BUFFER
1097 macro returns a
1098 .B YY_BUFFER_STATE
1099 handle to the current buffer.
1100 .LP
1101 Here is an example of using these features for writing a scanner
1102 which expands include files (the
1103 .B <<EOF>>
1104 feature is discussed below):
1105 .nf
1106
1107     /* the "incl" state is used for picking up the name
1108      * of an include file
1109      */
1110     %x incl
1111
1112     %{
1113     #define MAX_INCLUDE_DEPTH 10
1114     YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1115     int include_stack_ptr = 0;
1116     %}
1117
1118     %%
1119     include             BEGIN(incl);
1120
1121     [a-z]+              ECHO;
1122     [^a-z\\n]*\\n?        ECHO;
1123
1124     <incl>[ \\t]*      /* eat the whitespace */
1125     <incl>[^ \\t\\n]+   { /* got the include file name */
1126             if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1127                 {
1128                 fprintf( stderr, "Includes nested too deeply\\n" );
1129                 exit( 1 );
1130                 }
1131
1132             include_stack[include_stack_ptr++] =
1133                 YY_CURRENT_BUFFER;
1134
1135             yyin = fopen( yytext, "r" );
1136
1137             if ( ! yyin )
1138                 error( ... );
1139
1140             yy_switch_to_buffer(
1141                 yy_create_buffer( yyin, YY_BUF_SIZE ) );
1142
1143             BEGIN(INITIAL);
1144             }
1145
1146     <<EOF>> {
1147             if ( --include_stack_ptr < 0 )
1148                 {
1149                 yyterminate();
1150                 }
1151
1152             else
1153                 yy_switch_to_buffer(
1154                      include_stack[include_stack_ptr] );
1155             }
1156
1157 .fi
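.LP
The example above never frees the buffer of an include file once it
has been read.  A possible variant of its
.B <<EOF>>
rule, offered only as a sketch, hands the finished buffer to
.B yy_delete_buffer()
after switching back to the including file's buffer (the local
variable
.I finished
is invented for this illustration):
.nf

    <<EOF>> {
            if ( --include_stack_ptr < 0 )
                {
                yyterminate();
                }

            else
                {
                /* remember the exhausted buffer ... */
                YY_BUFFER_STATE finished = YY_CURRENT_BUFFER;

                /* ... resume the including file ... */
                yy_switch_to_buffer(
                     include_stack[include_stack_ptr] );

                /* ... and reclaim the old buffer's storage */
                yy_delete_buffer( finished );
                }
            }

.fi
This reclaims only the buffer storage; the
.I FILE
pointer opened for the include file would have to be saved and
closed separately.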
1158 .SH END-OF-FILE RULES
1159 The special rule "<<EOF>>" indicates
1160 actions which are to be taken when an end-of-file is
1161 encountered and yywrap() returns non-zero (i.e., indicates
1162 no further files to process).  The action must finish
1163 by doing one of four things:
1164 .IP -
1165 the special
1166 .B YY_NEW_FILE
1167 action, if
1168 .I yyin
1169 has been pointed at a new file to process;
1170 .IP -
1171 a
1172 .I return
1173 statement;
1174 .IP -
1175 the special
1176 .B yyterminate()
1177 action;
1178 .IP -
1179 or, switching to a new buffer using
1180 .B yy_switch_to_buffer()
1181 as shown in the example above.
1182 .LP
1183 <<EOF>> rules may not be used with other
1184 patterns; they may only be qualified with a list of start
1185 conditions.  If an unqualified <<EOF>> rule is given, it
1186 applies to
1187 .I all
1188 start conditions which do not already have <<EOF>> actions.  To
1189 specify an <<EOF>> rule for only the initial start condition, use
1190 .nf
1191
1192     <INITIAL><<EOF>>
1193
1194 .fi
1195 .LP
1196 These rules are useful for catching things like unclosed comments.
1197 An example:
1198 .nf
1199
1200     %x quote
1201     %%
1202
1203     ...other rules for dealing with quotes...
1204
1205     <quote><<EOF>>   {
1206              error( "unterminated quote" );
1207              yyterminate();
1208              }
1209     <<EOF>>  {
1210              if ( *++filelist )
1211                  {
1212                  yyin = fopen( *filelist, "r" );
1213                  YY_NEW_FILE;
1214                  }
1215              else
1216                  yyterminate();
1217              }
1218
1219 .fi
1220 .SH MISCELLANEOUS MACROS
1221 The macro
1222 .B YY_USER_ACTION
1223 can be redefined to provide an action
1224 which is always executed prior to the matched rule's action.  For example,
1225 it could be #define'd to call a routine to convert yytext to lower-case.
1226 .LP
1227 The macro
1228 .B YY_USER_INIT
1229 may be redefined to provide an action which is always executed before
1230 the first scan (and before the scanner's internal initializations are done).
1231 For example, it could be used to call a routine to read
1232 in a data table or open a logging file.
1233 .LP
1234 In the generated scanner, the actions are all gathered in one large
1235 switch statement and separated using
1236 .B YY_BREAK,
1237 which may be redefined.  By default, it is simply a "break", to separate
1238 each rule's action from the following rule's.
1239 Redefining
1240 .B YY_BREAK
1241 allows, for example, C++ users to
1242 #define YY_BREAK to do nothing (while being very careful that every
1243 rule ends with a "break" or a "return"!) to avoid
1244 unreachable-statement warnings in cases where a rule's action ends with
1245 "return", which makes the
1246 .B YY_BREAK
1247 that follows it unreachable.
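.LP
As an illustration of
.B YY_USER_ACTION
and
.B YY_USER_INIT,
the following sketch folds every matched token to lower case and
opens a log file before the first scan.  The routine
.I fold_yytext(),
the variable
.I scan_log,
and the file name "scan.log" are all invented for this example:
.nf

    %{
    #include <stdio.h>
    #include <ctype.h>

    FILE *scan_log;         /* hypothetical log file */
    void fold_yytext();     /* defined in the last section */

    #define YY_USER_ACTION fold_yytext();

    /* fall back to stderr if the log cannot be opened */
    #define YY_USER_INIT \\
        if ( ! (scan_log = fopen( "scan.log", "w" )) ) \\
            scan_log = stderr;
    %}

    %%
    [a-zA-Z]+    fprintf( scan_log, "word: %s\\n", yytext );
    .|\\n         /* discard everything else */

    %%
    /* convert the current token to lower case in place */
    void fold_yytext()
        {
        int i;

        for ( i = 0; i < yyleng; ++i )
            if ( isupper( (unsigned char) yytext[i] ) )
                yytext[i] = tolower( (unsigned char) yytext[i] );
        }

.fi
The helper is defined after the second "%%" so that
.I yytext
and
.I yyleng
are already declared when it is compiled.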
1248 .SH INTERFACING WITH YACC
1249 One of the main uses of
1250 .I flex
1251 is as a companion to the
1252 .I yacc
1253 parser-generator.
1254 .I yacc
1255 parsers expect to call a routine named
1256 .B yylex()
1257 to find the next input token.  The routine is supposed to
1258 return the type of the next token as well as putting any associated
1259 value in the global
1260 .B yylval.
1261 To use
1262 .I flex
1263 with
1264 .I yacc,
1265 one specifies the
1266 .B -d
1267 option to
1268 .I yacc
1269 to instruct it to generate the file
1270 .B y.tab.h
1271 containing definitions of all the
1272 .B %tokens
1273 appearing in the
1274 .I yacc
1275 input.  This file is then included in the
1276 .I flex
1277 scanner.  For example, if one of the tokens is "TOK_NUMBER",
1278 part of the scanner might look like:
1279 .nf
1280
1281     %{
1282     #include "y.tab.h"
1283     %}
1284
1285     %%
1286
1287     [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
1288
1289 .fi
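.LP
The build then typically runs along the following lines.  The file
names "parse.y" and "scan.l" are placeholders, and the
.I yacc
input is assumed to supply
.B main()
and
.B yyerror():
.nf

    yacc -d parse.y       # writes y.tab.c and y.tab.h
    flex scan.l           # writes lex.yy.c
    cc -o parser y.tab.c lex.yy.c -lfl

.fi
.I yacc
is run first because the scanner includes
.B y.tab.h.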
1290 .SH TRANSLATION TABLE
1291 In the name of POSIX compliance,
1292 .I flex
1293 supports a
1294 .I translation table
1295 for mapping input characters into groups.
1296 The table is specified in the first section, and its format looks like:
1297 .nf
1298
1299     %t
1300     1        abcd
1301     2        ABCDEFGHIJKLMNOPQRSTUVWXYZ
1302     52       0123456789
1303     6        \\t\\ \\n
1304     %t
1305
1306 .fi
1307 This example specifies that the characters 'a', 'b', 'c', and 'd'
1308 are to all be lumped into group #1, upper-case letters
1309 in group #2, digits in group #52, tabs, blanks, and newlines into
1310 group #6, and
1311 .I
1312 no other characters will appear in the patterns.
1313 The group numbers are actually disregarded by
1314 .I flex;
1315 .B %t
1316 serves, though, to lump characters together.  Given the above
1317 table, for example, the pattern "a(AA)*5" is equivalent to "d(ZQ)*0".
1318 They both say, "match any character in group #1, followed by
1319 zero-or-more pairs of characters
1320 from group #2, followed by a character from group #52."  Thus
1321 .B %t
1322 provides a crude way for introducing equivalence classes into
1323 the scanner specification.
1324 .LP
1325 Note that the
1326 .B -i
1327 option (see below) coupled with the equivalence classes which
1328 .I flex
1329 automatically generates take care of virtually all the instances
1330 when one might consider using
1331 .B %t.
1332 But what the hell, it's there if you want it.
1333 .SH OPTIONS
1334 .I flex
1335 has the following options:
1336 .TP
1337 .B -b
1338 Generate backtracking information to
1339 .I lex.backtrack.
1340 This is a list of scanner states which require backtracking
1341 and the input characters on which they do so.  By adding rules one
1342 can remove backtracking states.  If all backtracking states
1343 are eliminated and
1344 .B -f
1345 or
1346 .B -F
1347 is used, the generated scanner will run faster (see the
1348 .B -p
1349 flag).  Only users who wish to squeeze every last cycle out of their
1350 scanners need worry about this option.  (See the section on PERFORMANCE
1351 CONSIDERATIONS below.)
1352 .TP
1353 .B -c
1354 is a do-nothing, deprecated option included for POSIX compliance.
1355 .IP
1356 .B NOTE:
1357 in previous releases of
1358 .I flex
1359 .B -c
1360 specified table-compression options.  This functionality is
1361 now given by the
1362 .B -C
1363 flag.  To ease the the impact of this change, when
1364 .I flex
1365 encounters
1366 .B -c,
1367 it currently issues a warning message and assumes that
1368 .B -C
1369 was desired instead.  In the future this "promotion" of
1370 .B -c
1371 to
1372 .B -C
1373 will go away in the name of full POSIX compliance (unless
1374 the POSIX meaning is removed first).
1375 .TP
1376 .B -d
1377 makes the generated scanner run in
1378 .I debug
1379 mode.  Whenever a pattern is recognized and the global
1380 .B yy_flex_debug
1381 is non-zero (which is the default),
1382 the scanner will write to
1383 .I stderr
1384 a line of the form:
1385 .nf
1386
1387     --accepting rule at line 53 ("the matched text")
1388
1389 .fi
1390 The line number refers to the location of the rule in the file
1391 defining the scanner (i.e., the file that was fed to flex).  Messages
1392 are also generated when the scanner backtracks, accepts the
1393 default rule, reaches the end of its input buffer (or encounters
1394 a NUL; at this point, the two look the same as far as the scanner's concerned),
1395 or reaches an end-of-file.
1396 .TP
1397 .B -f
1398 specifies (take your pick)
1399 .I full table
1400 or
1401 .I fast scanner.
1402 No table compression is done.  The result is large but fast.
1403 This option is equivalent to
1404 .B -Cf
1405 (see below).
1406 .TP
1407 .B -i
1408 instructs
1409 .I flex
1410 to generate a
1411 .I case-insensitive
1412 scanner.  The case of letters given in the
1413 .I flex
1414 input patterns will
1415 be ignored, and tokens in the input will be matched regardless of case.  The
1416 matched text given in
1417 .I yytext
1418 will retain its original case (i.e., it will not be folded).
1419 .TP
1420 .B -n
1421 is another do-nothing, deprecated option included only for
1422 POSIX compliance.
1423 .TP
1424 .B -p
1425 generates a performance report to stderr.  The report
1426 consists of comments regarding features of the
1427 .I flex
1428 input file which will cause a loss of performance in the resulting scanner.
1429 Note that the use of
1430 .I REJECT
1431 and variable trailing context (see the BUGS section in flex(1))
1432 entails a substantial performance penalty; use of
1433 .I yymore(),
1434 the
1435 .B ^
1436 operator,
1437 and the
1438 .B -I
1439 flag entail minor performance penalties.
1440 .TP
1441 .B -s
1442 causes the
1443 .I default rule
1444 (that unmatched scanner input is echoed to
1445 .I stdout)
1446 to be suppressed.  If the scanner encounters input that does not
1447 match any of its rules, it aborts with an error.  This option is
1448 useful for finding holes in a scanner's rule set.
1449 .TP
1450 .B -t
1451 instructs
1452 .I flex
1453 to write the scanner it generates to standard output instead
1454 of
1455 .B lex.yy.c.
1456 .TP
1457 .B -v
1458 specifies that
1459 .I flex
1460 should write to
1461 .I stderr
1462 a summary of statistics regarding the scanner it generates.
1463 Most of the statistics are meaningless to the casual
1464 .I flex
1465 user, but the
1466 first line identifies the version of
1467 .I flex,
1468 which is useful for figuring
1469 out where you stand with respect to patches and new releases,
1470 and the next two lines give the date when the scanner was created
1471 and a summary of the flags which were in effect.
1472 .TP
1473 .B -F
1474 specifies that the
1475 .I fast
1476 scanner table representation should be used.  This representation is
1477 about as fast as the full table representation
1478 .RB ( \-f ),
1479 and for some sets of patterns will be considerably smaller (and for
1480 others, larger).  In general, if the pattern set contains both "keywords"
1481 and a catch-all, "identifier" rule, such as in the set:
1482 .nf
1483
1484     "case"    return TOK_CASE;
1485     "switch"  return TOK_SWITCH;
1486     ...
1487     "default" return TOK_DEFAULT;
1488     [a-z]+    return TOK_ID;
1489
1490 .fi
1491 then you're better off using the full table representation.  If only
1492 the "identifier" rule is present and you then use a hash table or some such
1493 to detect the keywords, you're better off using
1494 .BR \-F .
1495 .IP
1496 This option is equivalent to
1497 .B -CF
1498 (see below).
1499 .TP
1500 .B -I
1501 instructs
1502 .I flex
1503 to generate an
1504 .I interactive
1505 scanner.  Normally, scanners generated by
1506 .I flex
1507 always look ahead one
1508 character before deciding that a rule has been matched.  At the cost of
1509 some scanning overhead,
1510 .I flex
1511 will generate a scanner which only looks ahead
1512 when needed.  Such scanners are called
1513 .I interactive
1514 because if you want to write a scanner for an interactive system such as a
1515 command shell, you will probably want the user's input to be terminated
1516 with a newline, and without
1517 .B -I
1518 the user will have to type a character in addition to the newline in order
1519 to have the newline recognized.  This leads to dreadful interactive
1520 performance.
1521 .IP
1522 If all this seems too confusing, here's the general rule: if a human will
1523 be typing in input to your scanner, use
1524 .B -I,
1525 otherwise don't; if you don't care about squeezing the utmost performance
1526 from your scanner and you
1527 don't want to make any assumptions about the input to your scanner,
1528 use
1529 .B -I.
1530 .IP
1531 Note,
1532 .B -I
1533 cannot be used in conjunction with
1534 .I full
1535 or
1536 .I fast tables,
1537 i.e., the
1538 .B -f, -F, -Cf,
1539 or
1540 .B -CF
1541 flags.
1542 .TP
1543 .B -L
1544 instructs
1545 .I flex
1546 not to generate
1547 .B #line
1548 directives.  Without this option,
1549 .I flex
1550 peppers the generated scanner
1551 with #line directives so error messages in the actions will be correctly
1552 located with respect to the original
1553 .I flex
1554 input file, and not to
1555 the fairly meaningless line numbers of
1556 .B lex.yy.c.
1557 (Unfortunately
1558 .I flex
1559 does not presently generate the necessary directives
1560 to "retarget" the line numbers for those parts of
1561 .B lex.yy.c
1562 which it generated.  So if there is an error in the generated code,
1563 a meaningless line number is reported.)
1564 .TP
1565 .B -T
1566 makes
1567 .I flex
1568 run in
1569 .I trace
1570 mode.  It will generate a lot of messages to
1571 .I stdout
1572 concerning
1573 the form of the input and the resultant non-deterministic and deterministic
1574 finite automata.  This option is mostly for use in maintaining
1575 .I flex.
1576 .TP
1577 .B -8
1578 instructs
1579 .I flex
1580 to generate an 8-bit scanner, i.e., one which can recognize 8-bit
1581 characters.  On some sites,
1582 .I flex
1583 is installed with this option as the default.  On others, the default
1584 is 7-bit characters.  To see which is the case, check the verbose
1585 .B (-v)
1586 output for "equivalence classes created".  If the denominator of
1587 the number shown is 128, then by default
1588 .I flex
1589 is generating 7-bit scanners.  If it is 256, then the default is
1590 8-bit characters and the
1591 .B -8
1592 flag is not required (but may be a good idea to keep the scanner
1593 specification portable).  Feeding a 7-bit scanner 8-bit characters
1594 will result in infinite loops, bus errors, or other such fireworks,
1595 so when in doubt, use the flag.  Note that if equivalence classes
1596 are used, 8-bit scanners take only slightly more table space than
1597 7-bit scanners (128 bytes, to be exact); if equivalence classes are
1598 not used, however, then the tables may grow up to twice their
1599 7-bit size.
1600 .TP
1601 .B -C[efmF]
1602 controls the degree of table compression.
1603 .IP
1604 .B -Ce
1605 directs
1606 .I flex
1607 to construct
1608 .I equivalence classes,
1609 i.e., sets of characters
1610 which have identical lexical properties (for example, if the only
1611 appearance of digits in the
1612 .I flex
1613 input is in the character class
1614 "[0-9]" then the digits '0', '1', ..., '9' will all be put
1615 in the same equivalence class).  Equivalence classes usually give
1616 dramatic reductions in the final table/object file sizes (typically
1617 a factor of 2-5) and are pretty cheap performance-wise (one array
1618 look-up per character scanned).
1619 .IP
1620 .B -Cf
1621 specifies that the
1622 .I full
1623 scanner tables should be generated -
1624 .I flex
1625 should not compress the
1626 tables by taking advantages of similar transition functions for
1627 different states.
1628 .IP
1629 .B -CF
1630 specifies that the alternate fast scanner representation (described
1631 above under the
1632 .B -F
1633 flag)
1634 should be used.
1635 .IP
1636 .B -Cm
1637 directs
1638 .I flex
1639 to construct
1640 .I meta-equivalence classes,
1641 which are sets of equivalence classes (or characters, if equivalence
1642 classes are not being used) that are commonly used together.  Meta-equivalence
1643 classes are often a big win when using compressed tables, but they
1644 have a moderate performance impact (one or two "if" tests and one
1645 array look-up per character scanned).
1646 .IP
1647 A lone
1648 .B -C
1649 specifies that the scanner tables should be compressed but neither
1650 equivalence classes nor meta-equivalence classes should be used.
1651 .IP
1652 The options
1653 .B -Cf
1654 or
1655 .B -CF
1656 and
1657 .B -Cm
1658 do not make sense together - there is no opportunity for meta-equivalence
1659 classes if the table is not being compressed.  Otherwise the options
1660 may be freely mixed.
1661 .IP
1662 The default setting is
1663 .B -Cem,
1664 which specifies that
1665 .I flex
1666 should generate equivalence classes
1667 and meta-equivalence classes.  This setting provides the highest
1668 degree of table compression.  You can obtain
1669 faster-executing scanners at the cost of larger tables, with
1670 the following generally being true:
1671 .nf
1672
1673     slowest & smallest
1674           -Cem
1675           -Cm
1676           -Ce
1677           -C
1678           -C{f,F}e
1679           -C{f,F}
1680     fastest & largest
1681
1682 .fi
1683 Note that scanners with the smallest tables are usually generated and
1684 compiled the quickest, so
1685 during development you will usually want to use the default, maximal
1686 compression.
1687 .IP
1688 .B -Cfe
1689 is often a good compromise between speed and size for production
1690 scanners.
1691 .IP
1692 .B -C
1693 options are not cumulative; whenever the flag is encountered, the
1694 previous -C settings are forgotten.
1695 .TP
1696 .B -Sskeleton_file
1697 overrides the default skeleton file from which
1698 .I flex
1699 constructs its scanners.  You'll never need this option unless you are doing
1700 .I flex
1701 maintenance or development.
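.LP
By way of illustration, here are some typical invocations combining
the options described above (the input file name "scan.l" is a
placeholder):
.nf

    flex -v scan.l              # default -Cem tables, with statistics
    flex -Cfe -8 scan.l         # fast 8-bit production scanner
    flex -b -Cf scan.l          # also write lex.backtrack
    flex -p -s scan.l           # performance report, no default rule
    flex -t scan.l >scanner.c   # scanner on standard output

.fi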
1702 .SH PERFORMANCE CONSIDERATIONS
1703 The main design goal of
1704 .I flex
1705 is that it generate high-performance scanners.  It has been optimized
1706 for dealing well with large sets of rules.  Aside from the effects
1707 of table compression on scanner speed outlined above,
1708 there are a number of options/actions which degrade performance.  These
1709 are, from most expensive to least:
1710 .nf
1711
1712     REJECT
1713
1714     pattern sets that require backtracking
1715     arbitrary trailing context
1716
1717     '^' beginning-of-line operator
1718     yymore()
1719
1720 .fi
1721 with the first three all being quite expensive and the last two
1722 being quite cheap.
1723 .LP
1724 .B REJECT
1725 should be avoided at all costs when performance is important.
1726 It is a particularly expensive option.
1727 .LP
1728 Getting rid of backtracking is messy and often may be an enormous
1729 amount of work for a complicated scanner.  In principle, one begins
1730 by using the
1731 .B -b
1732 flag to generate a
1733 .I lex.backtrack
1734 file.  For example, on the input
1735 .nf
1736
1737     %%
1738     foo        return TOK_KEYWORD;
1739     foobar     return TOK_KEYWORD;
1740
1741 .fi
1742 the file looks like:
1743 .nf
1744
1745     State #6 is non-accepting -
1746      associated rule line numbers:
1747            2       3
1748      out-transitions: [ o ]
1749      jam-transitions: EOF [ \\001-n  p-\\177 ]
1750
1751     State #8 is non-accepting -
1752      associated rule line numbers:
1753            3
1754      out-transitions: [ a ]
1755      jam-transitions: EOF [ \\001-`  b-\\177 ]
1756
1757     State #9 is non-accepting -
1758      associated rule line numbers:
1759            3
1760      out-transitions: [ r ]
1761      jam-transitions: EOF [ \\001-q  s-\\177 ]
1762
1763     Compressed tables always backtrack.
1764
1765 .fi
1766 The first few lines tell us that there's a scanner state in
1767 which it can make a transition on an 'o' but not on any other
1768 character, and that in that state the currently scanned text does not match
1769 any rule.  The state occurs when trying to match the rules found
1770 at lines 2 and 3 in the input file.
1771 If the scanner is in that state and then reads
1772 something other than an 'o', it will have to backtrack to find
1773 a rule which is matched.  With
1774 a bit of headscratching one can see that this must be the
1775 state it's in when it has seen "fo".  When this has happened,
1776 if anything other than another 'o' is seen, the scanner will
1777 have to back up to simply match the 'f' (by the default rule).
1778 .LP
1779 The comment regarding State #8 indicates there's a problem
1780 when "foob" has been scanned.  Indeed, on any character other
1781 than an 'a', the scanner will have to back up to accept "foo".
1782 Similarly, the comment for State #9 concerns when "fooba" has
1783 been scanned.
1784 .LP
1785 The final comment reminds us that there's no point going to
1786 all the trouble of removing backtracking from the rules unless
1787 we're using
1788 .B -f
1789 or
1790 .B -F,
1791 since there's no performance gain doing so with compressed scanners.
1792 .LP
1793 The way to remove the backtracking is to add "error" rules:
1794 .nf
1795
1796     %%
1797     foo         return TOK_KEYWORD;
1798     foobar      return TOK_KEYWORD;
1799
1800     fooba       |
1801     foob        |
1802     fo          {
1803                 /* false alarm, not really a keyword */
1804                 return TOK_ID;
1805                 }
1806
1807 .fi
1808 .LP
1809 Eliminating backtracking among a list of keywords can also be
1810 done using a "catch-all" rule:
1811 .nf
1812
1813     %%
1814     foo         return TOK_KEYWORD;
1815     foobar      return TOK_KEYWORD;
1816
1817     [a-z]+      return TOK_ID;
1818
1819 .fi
1820 This is usually the best solution when appropriate.
1821 .LP
1822 Backtracking messages tend to cascade.
1823 With a complicated set of rules it's not uncommon to get hundreds
1824 of messages.  If one can decipher them, though, it often
1825 only takes a dozen or so rules to eliminate the backtracking (though
1826 it's easy to make a mistake and have an error rule accidentally match
1827 a valid token).  A possible future
1828 .I flex
1829 feature will be to automatically add rules to eliminate backtracking.
1830 .LP
1831 .I Variable
1832 trailing context (where both the leading and trailing parts do not have
1833 a fixed length) entails almost the same performance loss as
1834 .I REJECT
1835 (i.e., substantial).  So when possible a rule like:
1836 .nf
1837
1838     %%
1839     mouse|rat/(cat|dog)   run();
1840
1841 .fi
1842 is better written:
1843 .nf
1844
1845     %%
1846     mouse/cat|dog         run();
1847     rat/cat|dog           run();
1848
1849 .fi
1850 or as
1851 .nf
1852
1853     %%
1854     mouse|rat/cat         run();
1855     mouse|rat/dog         run();
1856
1857 .fi
1858 Note that here the special '|' action does
1859 .I not
1860 provide any savings, and can even make things worse (see
1861 .B BUGS
1862 in flex(1)).
1863 .LP
1864 Another area where the user can increase a scanner's performance
1865 (and one that's easier to implement) arises from the fact that
1866 the longer the tokens matched, the faster the scanner will run.
1867 This is because with long tokens the processing of most input
1868 characters takes place in the (short) inner scanning loop, and
1869 does not often have to go through the additional work of setting up
1870 the scanning environment (e.g.,
1871 .B yytext)
1872 for the action.  Recall the scanner for C comments:
1873 .nf
1874
1875     %x comment
1876     %%
1877             int line_num = 1;
1878
1879     "/*"         BEGIN(comment);
1880
1881     <comment>[^*\\n]*
1882     <comment>"*"+[^*/\\n]*
1883     <comment>\\n             ++line_num;
1884     <comment>"*"+"/"        BEGIN(INITIAL);
1885
1886 .fi
1887 This could be sped up by writing it as:
1888 .nf
1889
1890     %x comment
1891     %%
1892             int line_num = 1;
1893
1894     "/*"         BEGIN(comment);
1895
1896     <comment>[^*\\n]*
1897     <comment>[^*\\n]*\\n      ++line_num;
1898     <comment>"*"+[^*/\\n]*
1899     <comment>"*"+[^*/\\n]*\\n ++line_num;
1900     <comment>"*"+"/"        BEGIN(INITIAL);
1901
1902 .fi
1903 Now instead of each newline requiring the processing of another
1904 action, recognizing the newlines is "distributed" over the other rules
1905 to keep the matched text as long as possible.  Note that
1906 .I adding
1907 rules does
1908 .I not
1909 slow down the scanner!  The speed of the scanner is independent
1910 of the number of rules or (modulo the considerations given at the
1911 beginning of this section) how complicated the rules are with
1912 regard to operators such as '*' and '|'.
1913 .LP
1914 A final example in speeding up a scanner: suppose you want to scan
1915 through a file containing identifiers and keywords, one per line
1916 and with no other extraneous characters, and recognize all the
1917 keywords.  A natural first approach is:
1918 .nf
1919
1920     %%
1921     asm      |
1922     auto     |
1923     break    |
1924     ... etc ...
1925     volatile |
1926     while    /* it's a keyword */
1927
1928     .|\\n     /* it's not a keyword */
1929
1930 .fi
1931 To eliminate the backtracking, introduce a catch-all rule:
1932 .nf
1933
1934     %%
1935     asm      |
1936     auto     |
1937     break    |
1938     ... etc ...
1939     volatile |
1940     while    /* it's a keyword */
1941
1942     [a-z]+   |
1943     .|\\n     /* it's not a keyword */
1944
1945 .fi
1946 Now, if it's guaranteed that there's exactly one word per line,
1947 then we can reduce the total number of matches by a half by
1948 merging in the recognition of newlines with that of the other
1949 tokens:
1950 .nf
1951
1952     %%
1953     asm\\n    |
1954     auto\\n   |
1955     break\\n  |
1956     ... etc ...
1957     volatile\\n |
1958     while\\n  /* it's a keyword */
1959
1960     [a-z]+\\n |
1961     .|\\n     /* it's not a keyword */
1962
1963 .fi
1964 One has to be careful here, as we have now reintroduced backtracking
1965 into the scanner.  In particular, while
1966 .I we
1967 know that there will never be any characters in the input stream
1968 other than letters or newlines,
1969 .I flex
1970 can't figure this out, and it will plan for possibly needing backtracking
1971 when it has scanned a token like "auto" and then the next character
1972 is something other than a newline or a letter.  Previously it would
1973 then just match the "auto" rule and be done, but now it has no "auto"
1974 rule, only an "auto\\n" rule.  To eliminate the possibility of backtracking,
1975 we could either duplicate all rules but without final newlines, or,
1976 since we never expect to encounter such an input and therefore don't
1977 care how it's classified, we can introduce one more catch-all rule, this
1978 one which doesn't include a newline:
1979 .nf
1980
1981     %%
1982     asm\\n    |
1983     auto\\n   |
1984     break\\n  |
1985     ... etc ...
1986     volatile\\n |
1987     while\\n  /* it's a keyword */
1988
1989     [a-z]+\\n |
1990     [a-z]+   |
1991     .|\\n     /* it's not a keyword */
1992
1993 .fi
1994 Compiled with
1995 .B -Cf,
1996 this is about as fast as one can get a
1997 .I flex
1998 scanner to go for this particular problem.
1999 .LP
2000 A final note:
2001 .I flex
2002 is slow when matching NUL's, particularly when a token contains
2003 multiple NUL's.
2004 It's best to write rules which match
2005 .I short
2006 amounts of text if it's anticipated that the text will often include NUL's.
2007 .SH INCOMPATIBILITIES WITH LEX AND POSIX
2008 .I flex
2009 is a rewrite of the Unix
2010 .I lex
2011 tool (the two implementations do not share any code, though),
2012 with some extensions and incompatibilities, both of which
2013 are of concern to those who wish to write scanners acceptable
2014 to either implementation.  At present, the POSIX
2015 .I lex
2016 draft is
2017 very close to the original
2018 .I lex
2019 implementation, so some of these
2020 incompatibilities are also in conflict with the POSIX draft.  But
2021 the intent is that except as noted below,
2022 .I flex
2023 as it presently stands will
2024 ultimately be POSIX conformant (i.e., that those areas of conflict with
2025 the POSIX draft will be resolved in
2026 .I flex's
2027 favor).  Please bear in
2028 mind that all the comments which follow are with regard to the POSIX
2029 .I draft
2030 standard of Summer 1989, and not the final document (or subsequent
2031 drafts); they are included so
2032 .I flex
2033 users can be aware of the standardization issues and those areas where
2034 .I flex
2035 may in the near future undergo changes incompatible with
2036 its current definition.
2037 .LP
2038 .I flex
2039 is fully compatible with
2040 .I lex
2041 with the following exceptions:
2042 .IP -
2043 The undocumented
2044 .I lex
2045 scanner internal variable
2046 .B yylineno
2047 is not supported.  It is difficult to support this variable efficiently,
2048 since it requires examining every character scanned and reexamining
2049 the characters when the scanner backs up.
2050 Things get more complicated when the end of buffer or file is reached or a
2051 NUL is scanned (since the scan must then be restarted with the proper line
2052 number count), or the user uses the yyless(), unput(), or REJECT actions,
2053 or the multiple input buffer functions.
2054 .IP
2055 The fix is to add rules which, upon seeing a newline, increment
2056 yylineno.  This is usually an easy process, though it can be a drag if some
2057 of the patterns can match multiple newlines along with other characters.
2058 .IP
2059 yylineno is not part of the POSIX draft.
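.IP
As a rough sketch of that fix, an action can hand the matched text to a
small helper which counts the newlines in it.  The names count_newlines()
and line_number are hypothetical and not part of flex; in a real scanner
the arguments would be yytext and yyleng.
.nf

```c
#include <stddef.h>

/* Hypothetical line counter kept up to date by the scanner's actions. */
static int line_number = 1;

/*
 * Count the newlines in the first len bytes of text and add them to
 * line_number.  A length is used instead of strlen() so that tokens
 * containing NUL's are still counted correctly.  Returns the number
 * of newlines found.
 */
static int count_newlines(const char *text, size_t len)
{
    int found = 0;
    size_t i;

    for (i = 0; i < len; ++i)
        if (text[i] == '\n')
            ++found;

    line_number += found;
    return found;
}
```

.fi
.IP
An action for a pattern that can match several newlines would then call
count_newlines( yytext, yyleng ) before doing its other work.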
2060 .IP -
2061 The
2062 .B input()
2063 routine is not redefinable, though it may be called to read characters
2064 following whatever has been matched by a rule.  If
2065 .B input()
2066 encounters an end-of-file the normal
2067 .B yywrap()
2068 processing is done.  A ``real'' end-of-file is returned by
2069 .B input()
2070 as
2071 .I EOF.
2072 .IP
2073 Input is instead controlled by redefining the
2074 .B YY_INPUT
2075 macro.
2076 .IP
2077 The
2078 .I flex
2079 restriction that
2080 .B input()
2081 cannot be redefined is in accordance with the POSIX draft, but
2082 .B YY_INPUT
2083 has not yet been accepted into the draft (and probably won't; it looks
2084 like the draft will simply not specify any way of controlling the
2085 scanner's input other than by making an initial assignment to
2086 .I yyin).
2087 .IP -
2088 .I flex
2089 scanners do not use stdio for input.  Because of this, when writing an
2090 interactive scanner one must explicitly call fflush() on the
2091 stream associated with the terminal after writing out a prompt.
2092 With
2093 .I lex
2094 such writes are automatically flushed since
2095 .I lex
2096 scanners use
2097 .B getchar()
2098 for their input.  Also, when writing interactive scanners with
2099 .I flex,
2100 the
2101 .B -I
2102 flag must be used.
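.IP
A minimal sketch of the explicit flush, assuming the prompt goes to a
stdio stream such as stdout.  The helper name show_prompt() is
hypothetical.
.nf

```c
#include <stdio.h>

/*
 * Write a prompt and flush it immediately, so that it is visible
 * before the scanner blocks waiting for input.
 * Returns 0 on success and EOF on a write error.
 */
static int show_prompt(FILE *out, const char *text)
{
    if (fputs(text, out) == EOF)
        return EOF;
    return fflush(out);
}
```

.fi
.IP
A program would call show_prompt( stdout, "> " ) just before each call
to the scanner.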
2103 .IP -
2104 .I flex
2105 scanners are not as reentrant as
2106 .I lex
2107 scanners.  In particular, if you have an interactive scanner and
2108 an interrupt handler which long-jumps out of the scanner, and
2109 the scanner is subsequently called again, you may get the following
2110 message:
2111 .nf
2112
2113     fatal flex scanner internal error--end of buffer missed
2114
2115 .fi
2116 To reenter the scanner, first use
2117 .nf
2118
2119     yyrestart( yyin );
2120
2121 .fi
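.IP
The recovery pattern might look roughly like the following sketch.
It uses stand-ins so that it is self-contained: scan_stub() plays the
role of yylex() and reset_scanner_stub() the role of yyrestart().
Both names are hypothetical, and in a real program the long jump would
come from an interrupt handler rather than from the scanner itself.
.nf

```c
#include <setjmp.h>
#include <stdio.h>

static jmp_buf restart_point;  /* where the long jump lands */
static int restarts = 0;       /* how often the stand-in reset ran */

/* Stand-in for yyrestart( yyin ); it only records that it was called. */
static void reset_scanner_stub(FILE *in)
{
    (void) in;
    ++restarts;
}

/* Stand-in for yylex(): its first call is "interrupted" by a long jump. */
static int scan_stub(void)
{
    static int calls = 0;

    if (++calls == 1)
        longjmp(restart_point, 1);
    return 0;
}

/*
 * Call the scanner; if control comes back through the long jump,
 * reset the scanner state before calling it again.
 */
static int scan_with_recovery(FILE *in)
{
    if (setjmp(restart_point) != 0)
        reset_scanner_stub(in);
    return scan_stub();
}
```

.fi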
2122 .IP -
2123 .B output()
2124 is not supported.
2125 Output from the
2126 .B ECHO
2127 macro is done to the file-pointer
2128 .I yyout
2129 (default
2130 .I stdout).
2131 .IP
2132 The POSIX draft mentions that an
2133 .B output()
2134 routine exists but currently gives no details as to what it does.
2135 .IP -
2136 .I lex
2137 does not support exclusive start conditions (%x), though they
2138 are in the current POSIX draft.
2139 .IP -
2140 When definitions are expanded,
2141 .I flex
2142 encloses them in parentheses.
2143 With lex, the following:
2144 .nf
2145
2146     NAME    [A-Z][A-Z0-9]*
2147     %%
2148     foo{NAME}?      printf( "Found it\\n" );
2149     %%
2150
2151 .fi
2152 will not match the string "foo" because when the macro
2153 is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
2154 and the precedence is such that the '?' is associated with
2155 "[A-Z0-9]*".  With
2156 .I flex,
2157 the rule will be expanded to
2158 "foo([A-Z][A-Z0-9]*)?" and so the string "foo" will match.
2159 Note that because of this, the
2160 .B ^, $, <s>, /,
2161 and
2162 .B <<EOF>>
2163 operators cannot be used in a
2164 .I flex
2165 definition.
2166 .IP
2167 The POSIX draft interpretation is the same as
2168 .I flex's.
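.IP
The difference between the two expansions can be checked outside of
either tool.  The sketch below uses the POSIX regcomp() and regexec()
routines rather than lex or flex, and spells out the grouping of each
reading explicitly; the helper matches() and the two pattern names are
hypothetical.
.nf

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern does not compile.
 */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* flex reading: the whole definition is grouped, then made optional. */
static const char flex_reading[] = "^foo([A-Z][A-Z0-9]*)?$";

/* lex reading: only the trailing "[A-Z0-9]*" is made optional. */
static const char lex_reading[] = "^foo[A-Z]([A-Z0-9]*)?$";
```

.fi
.IP
With these patterns, matches( flex_reading, "foo" ) succeeds while
matches( lex_reading, "foo" ) fails, since the lex reading still
requires one upper-case letter after "foo".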
2169 .IP -
2170 To specify a character class which matches anything but a right bracket (']'),
2171 in
2172 .I lex
2173 one can use "[^]]" but with
2174 .I flex
2175 one must use "[^\\]]".  The latter works with
2176 .I lex,
2177 too.
2178 .IP -
2179 The
2180 .I lex
2181 .B %r
2182 (generate a Ratfor scanner) option is not supported.  It is not part
2183 of the POSIX draft.
2184 .IP -
2185 If you are providing your own yywrap() routine, you must include a
2186 "#undef yywrap" in the definitions section (section 1).  Note that
2187 the "#undef" will have to be enclosed in %{}'s.
2188 .IP
2189 The POSIX draft
2190 specifies that yywrap() is a function and this is very unlikely to change; so
2191 .I flex
2192 users are warned that
2193 .B yywrap()
2194 is likely to be changed to a function in the near future.
2195 .IP -
2196 After a call to
2197 .B unput(),
2198 .I yytext
2199 and
2200 .I yyleng
2201 are undefined until the next token is matched.  This is not the case with
2202 .I lex
2203 or the present POSIX draft.
2204 .IP -
2205 The precedence of the
2206 .B {}
2207 (numeric range) operator is different.
2208 .I lex
2209 interprets "abc{1,3}" as "match one, two, or
2210 three occurrences of 'abc'", whereas
2211 .I flex
2212 interprets it as "match 'ab'
2213 followed by one, two, or three occurrences of 'c'".  The latter is
2214 in agreement with the current POSIX draft.
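.IP
The two readings can be compared with explicit grouping.  The sketch
below again relies on the POSIX regcomp() and regexec() routines
instead of lex or flex; the helper matches() and the pattern names are
hypothetical.
.nf

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if text matches the POSIX extended regular expression
 * pattern, 0 if it does not, and -1 if the pattern does not compile.
 */
static int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* flex reading: "ab" followed by one to three occurrences of "c". */
static const char flex_reading[] = "^ab(c{1,3})$";

/* lex reading: one to three occurrences of the whole string "abc". */
static const char lex_reading[] = "^(abc){1,3}$";
```

.fi
.IP
Here "abccc" is accepted only by the flex reading and "abcabc" only by
the lex reading, so a scanner meant for both tools should write the
grouping out with parentheses.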
2215 .IP -
2216 The precedence of the
2217 .B ^
2218 operator is different.
2219 .I lex
2220 interprets "^foo|bar" as "match either 'foo' at the beginning of a line,
2221 or 'bar' anywhere", whereas
2222 .I flex
2223 interprets it as "match either 'foo' or 'bar' if they come at the beginning
2224 of a line".  The latter is in agreement with the current POSIX draft.
2225 .IP -
2226 To refer to yytext outside of the scanner source file,
2227 the correct definition with
2228 .I flex
2229 is "extern char *yytext" rather than "extern char yytext[]".
2230 This is contrary to the current POSIX draft but a point on which
2231 .I flex
2232 will not be changing, as the array representation entails a
2233 serious performance penalty.  It is hoped that the POSIX draft will
2234 be emended to support the
2235 .I flex
2236 variety of declaration (as this is a fairly painless change to
2237 require of
2238 .I lex
2239 users).
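.IP
A small sketch of the pointer form of the declaration follows.  So that
it stands on its own, it supplies a stand-in definition of yytext; in a
real program that definition comes from the generated scanner.  The
helper current_token_length() and the buffer token_buffer are
hypothetical.
.nf

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the scanner's own definition of yytext. */
static char token_buffer[] = "token";
char *yytext = token_buffer;

/* What a separate source file would declare when used with flex. */
extern char *yytext;
/* extern char yytext[];  -- the lex form, wrong for a flex scanner */

/* Length of the current token, assuming it contains no NUL's. */
size_t current_token_length(void)
{
    return strlen(yytext);
}
```

.fi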
2240 .IP -
2241 .I yyin
2242 is
2243 .I initialized
2244 by
2245 .I lex
2246 to be
2247 .I stdin;
2248 .I flex,
2249 on the other hand,
2250 initializes
2251 .I yyin
2252 to NULL
2253 and then
2254 .I assigns
2255 it to
2256 .I stdin
2257 the first time the scanner is called, providing
2258 .I yyin
2259 has not already been assigned to a non-NULL value.  The difference is
2260 subtle, but the net effect is that with
2261 .I flex
2262 scanners,
2263 .I yyin
2264 does not have a valid value until the scanner has been called.
2265 .IP -
2266 The special table-size declarations such as
2267 .B %a
2268 supported by
2269 .I lex
2270 are not required by
2271 .I flex
2272 scanners;
2273 .I flex
2274 ignores them.
2275 .IP -
2276 The name
2277 .B FLEX_SCANNER
2278 is #define'd so scanners may be written for use with either
2279 .I flex
2280 or
2281 .I lex.
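.IP
A short sketch of how code shared between the two tools might test for
the name.  The function scanner_generator() is hypothetical; only
FLEX_SCANNER itself comes from
.I flex.
.nf

```c
/*
 * Report which generator this file was built for.  FLEX_SCANNER is
 * defined inside flex-generated scanners; when this file is compiled
 * on its own, or as part of a lex scanner, the #else branch is used.
 */
static const char *scanner_generator(void)
{
#ifdef FLEX_SCANNER
    return "flex";
#else
    return "lex";
#endif
}
```

.fi
.IP
The same test can guard flex-only constructs, such as a call to
yyrestart(), in code that must also build with
.I lex.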
2282 .LP
2283 The following
2284 .I flex
2285 features are not included in
2286 .I lex
2287 or the POSIX draft standard:
2288 .nf
2289
2290     yyterminate()
2291     <<EOF>>
2292     YY_DECL
2293     #line directives
2294     %{}'s around actions
2295     yyrestart()
2296     comments beginning with '#' (deprecated)
2297     multiple actions on a line
2298
2299 .fi
2300 This last feature refers to the fact that with
2301 .I flex
2302 you can put multiple actions on the same line, separated with
2303 semi-colons, while with
2304 .I lex,
2305 the following
2306 .nf
2307
2308     foo    handle_foo(); ++num_foos_seen;
2309
2310 .fi
2311 is (rather surprisingly) truncated to
2312 .nf
2313
2314     foo    handle_foo();
2315
2316 .fi
2317 .I flex
2318 does not truncate the action.  Actions that are not enclosed in
2319 braces are simply terminated at the end of the line.
2320 .SH DIAGNOSTICS
2321 .I reject_used_but_not_detected undefined
2322 or
2323 .I yymore_used_but_not_detected undefined -
2324 These errors can occur at compile time.  They indicate that the
2325 scanner uses
2326 .B REJECT
2327 or
2328 .B yymore()
2329 but that
2330 .I flex
2331 failed to notice the fact, meaning that
2332 .I flex
2333 scanned the first two sections looking for occurrences of these actions
2334 and failed to find any, but somehow you snuck some in (via a #include
2335 file, for example).  Make an explicit reference to the action in your
2336 .I flex
2337 input file.  (Note that previously
2338 .I flex
2339 supported a
2340 .B %used/%unused
2341 mechanism for dealing with this problem; this feature is still supported
2342 but now deprecated, and will go away soon unless the author hears from
2343 people who can argue compellingly that they need it.)
2344 .LP
2345 .I flex scanner jammed -
2346 a scanner compiled with
2347 .B -s
2348 has encountered an input string which wasn't matched by
2349 any of its rules.
2350 .LP
2351 .I flex input buffer overflowed -
2352 a scanner rule matched a string long enough to overflow the
2353 scanner's internal input buffer (16K bytes by default - controlled by
2354 .B YY_BUF_SIZE
2355 in "flex.skel".  Note that to redefine this macro, you must first
2356 .B #undef
2357 it).
2358 .LP
2359 .I scanner requires -8 flag -
2360 Your scanner specification includes recognizing 8-bit characters and
2361 you did not specify the -8 flag (and your site has not installed flex
2362 with -8 as the default).
2363 .LP
2364 .I
2365 fatal flex scanner internal error--end of buffer missed -
2366 This can occur in a scanner which is reentered after a long-jump
2367 has jumped out (or over) the scanner's activation frame.  Before
2368 reentering the scanner, use:
2369 .nf
2370
2371     yyrestart( yyin );
2372
2373 .fi
2374 .LP
2375 .I too many %t classes! -
2376 You managed to put every single character into its own %t class.
2377 .I flex
2378 requires that at least one of the classes share characters.
2379 .SH DEFICIENCIES / BUGS
2380 See flex(1).
2381 .SH "SEE ALSO"
2382 .LP
2383 flex(1), lex(1), yacc(1), sed(1), awk(1x).
2384 .LP
2385 M. E. Lesk and E. Schmidt,
2386 .I LEX - Lexical Analyzer Generator
2387 .SH AUTHOR
2388 Vern Paxson, with the help of many ideas and much inspiration from
2389 Van Jacobson.  Original version by Jef Poskanzer.  The fast table
2390 representation is a partial implementation of a design done by Van
2391 Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
2392 .LP
2393 Thanks to the many
2394 .I flex
2395 beta-testers, feedbackers, and contributors, especially Casey
2396 Leedom, benson@odi.com, Keith Bostic,
2397 Frederic Brehm, Nick Christopher, Jason Coughlin,
2398 Scott David Daniels, Leo Eskin,
2399 Chris Faylor, Eric Goldman, Eric
2400 Hughes, Jeffrey R. Jones, Kevin B. Kenny, Ronald Lamprecht,
2401 Greg Lee, Craig Leres, Mohamed el Lozy, Jim Meyering, Marc Nozell, Esmond Pitt,
2402 Jef Poskanzer, Jim Roskind,
2403 Dave Tallman, Frank Whaley, Ken Yap, and those whose names
2404 have slipped my marginal mail-archiving skills but whose contributions
2405 are appreciated all the same.
2406 .LP
2407 Thanks to Keith Bostic, John Gilmore, Craig Leres, Bob
2408 Mulcahy, Rich Salz, and Richard Stallman for help with various distribution
2409 headaches.
2410 .LP
2411 Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
2412 to Benson Margulies and Fred
2413 Burke for C++ support; to Ove Ewerlid for the basics of support for
2414 NUL's; and to Eric Hughes for the basics of support for multiple buffers.
2415 .LP
2416 Work is being done on extending
2417 .I flex
2418 to generate scanners in which the
2419 state machine is directly represented in C code rather than tables.
2420 These scanners may well be substantially faster than those generated
2421 using -f or -F.  If you are working in this area and are interested
2422 in comparing notes and seeing whether redundant work can be avoided,
2423 contact Ove Ewerlid (ewerlid@mizar.DoCS.UU.SE).
2424 .LP
2425 This work was primarily done when I was at the Real Time Systems Group
2426 at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks to all there
2427 for the support I received.
2428 .LP
2429 Send comments to:
2430 .nf
2431
2432      Vern Paxson
2433      Computer Science Department
2434      4126 Upson Hall
2435      Cornell University
2436      Ithaca, NY 14853-7501
2437
2438      vern@cs.cornell.edu
2439      decvax!cornell!vern
2440
2441 .fi
2442 .\" ref. to awk(9) man page corrected -- ASW 2005-01-15