   1 This is Info file flex.info, produced by Makeinfo-1.55 from the input
   2 file flex.texi.
   3
   4 START-INFO-DIR-ENTRY
   5 * Flex: (flex).         A fast scanner generator.
   6 END-INFO-DIR-ENTRY
   7
   8    This file documents Flex.
   9
  10    Copyright (c) 1990 The Regents of the University of California.  All
  11 rights reserved.
  12
  13    This code is derived from software contributed to Berkeley by Vern
  14 Paxson.
  15
  16    The United States Government has rights in this work pursuant to
  17 contract no. DE-AC03-76SF00098 between the United States Department of
  18 Energy and the University of California.
  19
  20    Redistribution and use in source and binary forms with or without
  21 modification are permitted provided that: (1) source distributions
  22 retain this entire copyright notice and comment, and (2) distributions
  23 including binaries display the following acknowledgement:  "This
  24 product includes software developed by the University of California,
  25 Berkeley and its contributors" in the documentation or other materials
  26 provided with the distribution and in all advertising materials
  27 mentioning features or use of this software.  Neither the name of the
  28 University nor the names of its contributors may be used to endorse or
  29 promote products derived from this software without specific prior
  30 written permission.
  31
  32    THIS SOFTWARE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED
  33 WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
  34 MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
  35
  36 \x1f
  37 File: flex.info,  Node: Top,  Next: Name,  Prev: (dir),  Up: (dir)
  38
  39 flex
  40 ****
  41
  42    This manual documents `flex'.  It covers release 2.5.
  43
  44 * Menu:
  45
  46 * Name::                        Name
  47 * Synopsis::                    Synopsis
  48 * Overview::                    Overview
  49 * Description::                 Description
  50 * Examples::                    Some simple examples
  51 * Format::                      Format of the input file
  52 * Patterns::                    Patterns
  53 * Matching::                    How the input is matched
  54 * Actions::                     Actions
  55 * Generated scanner::           The generated scanner
  56 * Start conditions::            Start conditions
  57 * Multiple buffers::            Multiple input buffers
  58 * End-of-file rules::           End-of-file rules
  59 * Miscellaneous::               Miscellaneous macros
  60 * User variables::              Values available to the user
  61 * YACC interface::              Interfacing with `yacc'
  62 * Options::                     Options
  63 * Performance::                 Performance considerations
  64 * C++::                         Generating C++ scanners
  65 * Incompatibilities::           Incompatibilities with `lex' and POSIX
  66 * Diagnostics::                 Diagnostics
  67 * Files::                       Files
  68 * Deficiencies::                Deficiencies / Bugs
  69 * See also::                    See also
  70 * Author::                      Author
  71
  72 \x1f
  73 File: flex.info,  Node: Name,  Next: Synopsis,  Prev: Top,  Up: Top
  74
  75 Name
  76 ====
  77
  78    flex - fast lexical analyzer generator
  79
  80 \x1f
  81 File: flex.info,  Node: Synopsis,  Next: Overview,  Prev: Name,  Up: Top
  82
  83 Synopsis
  84 ========
  85
  86      flex [-bcdfhilnpstvwBFILTV78+? -C[aefFmr] -ooutput -Pprefix -Sskeleton]
  87      [--help --version] [FILENAME ...]
  88
  89 \x1f
  90 File: flex.info,  Node: Overview,  Next: Description,  Prev: Synopsis,  Up: Top
  91
  92 Overview
  93 ========
  94
  95    This manual describes `flex', a tool for generating programs that
  96 perform pattern-matching on text.  The manual includes both tutorial
  97 and reference sections:
  98
  99 Description
 100      a brief overview of the tool
 101
 102 Some Simple Examples
 103 Format Of The Input File
 104 Patterns
 105      the extended regular expressions used by flex
 106
 107 How The Input Is Matched
 108      the rules for determining what has been matched
 109
 110 Actions
 111      how to specify what to do when a pattern is matched
 112
 113 The Generated Scanner
 114      details regarding the scanner that flex produces; how to control
 115      the input source
 116
 117 Start Conditions
 118      introducing context into your scanners, and managing
 119      "mini-scanners"
 120
 121 Multiple Input Buffers
 122      how to manipulate multiple input sources; how to scan from strings
 123      instead of files
 124
 125 End-of-file Rules
 126      special rules for matching the end of the input
 127
 128 Miscellaneous Macros
 129      a summary of macros available to the actions
 130
 131 Values Available To The User
 132      a summary of values available to the actions
 133
 134 Interfacing With Yacc
 135      connecting flex scanners together with yacc parsers
 136
 137 Options
 138      flex command-line options, and the "%option" directive
 139
 140 Performance Considerations
 141      how to make your scanner go as fast as possible
 142
 143 Generating C++ Scanners
 144      the (experimental) facility for generating C++ scanner classes
 145
 146 Incompatibilities With Lex And POSIX
 147      how flex differs from AT&T lex and the POSIX lex standard
 148
 149 Diagnostics
 150      those error messages produced by flex (or scanners it generates)
 151      whose meanings might not be apparent
 152
 153 Files
 154      files used by flex
 155
 156 Deficiencies / Bugs
 157      known problems with flex
 158
 159 See Also
 160      other documentation, related tools
 161
 162 Author
 163      includes contact information
 164
 165 \x1f
 166 File: flex.info,  Node: Description,  Next: Examples,  Prev: Overview,  Up: Top
 167
 168 Description
 169 ===========
 170
 171    `flex' is a tool for generating "scanners": programs which
 172 recognize lexical patterns in text.  `flex' reads the given input
 173 files, or its standard input if no file names are given, for a
 174 description of a scanner to generate.  The description is in the form
 175 of pairs of regular expressions and C code, called "rules". `flex'
 176 generates as output a C source file, `lex.yy.c', which defines a
 177 routine `yylex()'.  This file is compiled and linked with the `-lfl'
 178 library to produce an executable.  When the executable is run, it
 179 analyzes its input for occurrences of the regular expressions.
 180 Whenever it finds one, it executes the corresponding C code.
 181
 182 \x1f
 183 File: flex.info,  Node: Examples,  Next: Format,  Prev: Description,  Up: Top
 184
 185 Some simple examples
 186 ====================
 187
 188    First some simple examples to get the flavor of how one uses `flex'.
 189 The following `flex' input specifies a scanner which, whenever it
 190 encounters the string "username", replaces it with the user's login
 191 name:
 192
 193      %%
 194      username    printf( "%s", getlogin() );
 195
 196    By default, any text not matched by a `flex' scanner is copied to
 197 the output, so the net effect of this scanner is to copy its input file
 198 to its output with each occurrence of "username" expanded.  In this
 199 input, there is just one rule.  "username" is the PATTERN and the
 200 "printf" is the ACTION.  The "%%" marks the beginning of the rules.
 201
 202    Here's another simple example:
 203
 204              int num_lines = 0, num_chars = 0;
 205
 206      %%
 207      \n      ++num_lines; ++num_chars;
 208      .       ++num_chars;
 209
 210      %%
 211      int main()
 212              {
 213              yylex();
 214              printf( "# of lines = %d, # of chars = %d\n",
 215                      num_lines, num_chars );
 216              }
 217
 218    This scanner counts the number of characters and the number of lines
 219 in its input (it produces no output other than the final report on the
 220 counts).  The first line declares two globals, "num_lines" and
 221 "num_chars", which are accessible both inside `yylex()' and in the
 222 `main()' routine declared after the second "%%".  There are two rules,
 223 one which matches a newline ("\n") and increments both the line count
 224 and the character count, and one which matches any character other than
 225 a newline (indicated by the "." regular expression).
 226
 227    A somewhat more complicated example:
 228
 229      /* scanner for a toy Pascal-like language */
 230
 231      %{
 232      /* need this for the calls to atoi() and atof() below */
 233      #include <stdlib.h>
 234      %}
 235
 236      DIGIT    [0-9]
 237      ID       [a-z][a-z0-9]*
 238
 239      %%
 240
 241      {DIGIT}+    {
 242                  printf( "An integer: %s (%d)\n", yytext,
 243                          atoi( yytext ) );
 244                  }
 245
 246      {DIGIT}+"."{DIGIT}*        {
 247                  printf( "A float: %s (%g)\n", yytext,
 248                          atof( yytext ) );
 249                  }
 250
 251      if|then|begin|end|procedure|function        {
 252                  printf( "A keyword: %s\n", yytext );
 253                  }
 254
 255      {ID}        printf( "An identifier: %s\n", yytext );
 256
 257      "+"|"-"|"*"|"/"   printf( "An operator: %s\n", yytext );
 258
 259      "{"[^}\n]*"}"     /* eat up one-line comments */
 260
 261      [ \t\n]+          /* eat up whitespace */
 262
 263      .           printf( "Unrecognized character: %s\n", yytext );
 264
 265      %%
 266
 267      int main( int argc, char **argv )
 270          {
 271          ++argv, --argc;  /* skip over program name */
 272          if ( argc > 0 )
 273                  yyin = fopen( argv[0], "r" );
 274          else
 275                  yyin = stdin;
 276
 277          yylex();
 278          }
 279
 280    This is the beginnings of a simple scanner for a language like
 281 Pascal.  It identifies different types of TOKENS and reports on what it
 282 has seen.
 283
 284    The details of this example will be explained in the following
 285 sections.
 286
 287 \x1f
 288 File: flex.info,  Node: Format,  Next: Patterns,  Prev: Examples,  Up: Top
 289
 290 Format of the input file
 291 ========================
 292
 293    The `flex' input file consists of three sections, separated by a
 294 line with just `%%' in it:
 295
 296      definitions
 297      %%
 298      rules
 299      %%
 300      user code
 301
 302    The "definitions" section contains declarations of simple "name"
 303 definitions to simplify the scanner specification, and declarations of
 304 "start conditions", which are explained in a later section.  Name
 305 definitions have the form:
 306
 307      name definition
 308
 309    The "name" is a word beginning with a letter or an underscore ('_')
 310 followed by zero or more letters, digits, '_', or '-' (dash).  The
 311 definition is taken to begin at the first non-white-space character
 312 following the name and to continue to the end of the line.  The
 313 definition can subsequently be referred to using "{name}", which will
 314 expand to "(definition)".  For example,
 315
 316      DIGIT    [0-9]
 317      ID       [a-z][a-z0-9]*
 318
 319 defines "DIGIT" to be a regular expression which matches a single
 320 digit, and "ID" to be a regular expression which matches a letter
 321 followed by zero-or-more letters-or-digits.  A subsequent reference to
 322
 323      {DIGIT}+"."{DIGIT}*
 324
 325 is identical to
 326
 327      ([0-9])+"."([0-9])*
 328
 329 and matches one-or-more digits followed by a '.' followed by
 330 zero-or-more digits.
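
   The expansion rule can be sketched in plain C.  This is only an
illustration of the textual substitution described above, not code
taken from `flex': the function name `expand_name' and its buffer-size
convention are assumptions made for the example, and the sketch ignores
quoting and escapes inside the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not flex's own code): copy `pattern' into
 * `out', replacing every occurrence of "{name}" with
 * "(definition)".  Returns 0 on success, or -1 if the result does
 * not fit in `out_size' bytes (including the terminating NUL). */
int expand_name( const char *pattern, const char *name,
                 const char *definition, char *out, size_t out_size )
    {
    size_t name_len = strlen( name );
    size_t def_len = strlen( definition );
    size_t used = 0;

    assert( name_len > 0 );

    while ( *pattern != '\0' )
        {
        if ( pattern[0] == '{' &&
             strncmp( pattern + 1, name, name_len ) == 0 &&
             pattern[1 + name_len] == '}' )
            {
            /* Need room for '(' + definition + ')' + NUL. */
            if ( used + def_len + 3 > out_size )
                return -1;
            out[used++] = '(';
            memcpy( out + used, definition, def_len );
            used += def_len;
            out[used++] = ')';
            pattern += name_len + 2;    /* skip "{name}" */
            }
        else
            {
            /* Need room for this character + NUL. */
            if ( used + 2 > out_size )
                return -1;
            out[used++] = *pattern++;
            }
        }

    if ( used + 1 > out_size )
        return -1;
    out[used] = '\0';
    return 0;
    }
```

   The added parentheses matter for definitions that contain
alternation: if a name X were defined as `a|b', the pattern `{X}c'
would become `(a|b)c' rather than `a|bc'.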
 331
 332    The RULES section of the `flex' input contains a series of rules of
 333 the form:
 334
 335      pattern   action
 336
 337 where the pattern must be unindented and the action must begin on the
 338 same line.
 339
 340    See below for a further description of patterns and actions.
 341
 342    Finally, the user code section is simply copied to `lex.yy.c'
 343 verbatim.  It is used for companion routines which call or are called
 344 by the scanner.  The presence of this section is optional; if it is
 345 missing, the second `%%' in the input file may be skipped, too.
 346
 347    In the definitions and rules sections, any *indented* text or text
 348 enclosed in `%{' and `%}' is copied verbatim to the output (with the
 349 `%{}''s removed).  The `%{}''s must appear unindented on lines by
 350 themselves.
 351
 352    In the rules section, any indented or %{} text appearing before the
 353 first rule may be used to declare variables which are local to the
 354 scanning routine and (after the declarations) code which is to be
 355 executed whenever the scanning routine is entered.  Other indented or
 356 %{} text in the rule section is still copied to the output, but its
 357 meaning is not well-defined and it may well cause compile-time errors
 358 (this feature is present for `POSIX' compliance; see below for other
 359 such features).
 360
 361    In the definitions section (but not in the rules section), an
 362 unindented comment (i.e., a line beginning with "/*") is also copied
 363 verbatim to the output up to the next "*/".
 364
 365 \x1f
 366 File: flex.info,  Node: Patterns,  Next: Matching,  Prev: Format,  Up: Top
 367
 368 Patterns
 369 ========
 370
 371    The patterns in the input are written using an extended set of
 372 regular expressions.  These are:
 373
 374 `x'
 375      match the character `x'
 376
 377 `.'
 378      any character (byte) except newline
 379
 380 `[xyz]'
 381      a "character class"; in this case, the pattern matches either an
 382      `x', a `y', or a `z'
 383
 384 `[abj-oZ]'
 385      a "character class" with a range in it; matches an `a', a `b', any
 386      letter from `j' through `o', or a `Z'
 387
 388 `[^A-Z]'
 389      a "negated character class", i.e., any character but those in the
 390      class.  In this case, any character EXCEPT an uppercase letter.
 391
 392 `[^A-Z\n]'
 393      any character EXCEPT an uppercase letter or a newline
 394
 395 `R*'
 396      zero or more R's, where R is any regular expression
 397
 398 `R+'
 399      one or more R's
 400
 401 `R?'
 402      zero or one R's (that is, "an optional R")
 403
 404 `R{2,5}'
 405      anywhere from two to five R's
 406
 407 `R{2,}'
 408      two or more R's
 409
 410 `R{4}'
 411      exactly 4 R's
 412
 413 `{NAME}'
 414      the expansion of the "NAME" definition (see above)
 415
 416 `"[xyz]\"foo"'
 417      the literal string: `[xyz]"foo'
 418
 419 `\X'
 420      if X is an `a', `b', `f', `n', `r', `t', or `v', then the ANSI-C
 421      interpretation of \X.  Otherwise, a literal `X' (used to escape
 422      operators such as `*')
 423
 424 `\0'
 425      a NUL character (ASCII code 0)
 426
 427 `\123'
 428      the character with octal value 123
 429
 430 `\x2a'
 431      the character with hexadecimal value `2a'
 432
 433 `(R)'
 434      match an R; parentheses are used to override precedence (see below)
 435
 436 `RS'
 437      the regular expression R followed by the regular expression S;
 438      called "concatenation"
 439
 440 `R|S'
 441      either an R or an S
 442
 443 `R/S'
 444      an R but only if it is followed by an S.  The text matched by S is
 445      included when determining whether this rule is the "longest
 446      match", but is then returned to the input before the action is
 447      executed.  So the action only sees the text matched by R.  This
 448      type of pattern is called "trailing context".  (There are some
 449      combinations of `R/S' that `flex' cannot match correctly; see
 450      notes in the Deficiencies / Bugs section below regarding
 451      "dangerous trailing context".)
 452
 453 `^R'
 454      an R, but only at the beginning of a line (i.e., when just
 455      starting to scan, or right after a newline has been scanned).
 456
 457 `R$'
 458      an R, but only at the end of a line (i.e., just before a newline).
 459      Equivalent to "R/\n".
 460
 461      Note that flex's notion of "newline" is exactly whatever the C
 462      compiler used to compile flex interprets '\n' as; in particular,
 463      on some DOS systems you must either filter out \r's in the input
 464      yourself, or explicitly use R/\r\n for "R$".
 465
 466 `<S>R'
 467      an R, but only in start condition S (see below for discussion of
 468      start conditions)

 469 `<S1,S2,S3>R'
          same, but in any of start conditions S1, S2, or S3
 470
 471 `<*>R'
 472      an R in any start condition, even an exclusive one.
 473
 474 `<<EOF>>'
 475      an end-of-file

 476 `<S1,S2><<EOF>>'
          an end-of-file when in start condition S1 or S2
 477
 478    Note that inside of a character class, all regular expression
 479 operators lose their special meaning except escape ('\') and the
 480 character class operators, '-', ']', and, at the beginning of the
 481 class, '^'.
 482
 483    The regular expressions listed above are grouped according to
 484 precedence, from highest precedence at the top to lowest at the bottom.
 485 Those grouped together have equal precedence.  For example,
 486
 487      foo|bar*
 488
 489 is the same as
 490
 491      (foo)|(ba(r*))
 492
 493 since the '*' operator has higher precedence than concatenation, and
 494 concatenation higher than alternation ('|').  This pattern therefore
 495 matches *either* the string "foo" *or* the string "ba" followed by
 496 zero-or-more r's.  To match "foo" or zero-or-more "bar"'s, use:
 497
 498      foo|(bar)*
 499
 500 and to match zero-or-more "foo"'s-or-"bar"'s:
 501
 502      (foo|bar)*
 503
 504    In addition to characters and ranges of characters, character
 505 classes can also contain character class "expressions".  These are
 506 expressions enclosed inside `[:' and `:]' delimiters (which themselves
 507 must appear between the '[' and ']' of the character class; other
 508 elements may occur inside the character class, too).  The valid
 509 expressions are:
 510
 511      [:alnum:] [:alpha:] [:blank:]
 512      [:cntrl:] [:digit:] [:graph:]
 513      [:lower:] [:print:] [:punct:]
 514      [:space:] [:upper:] [:xdigit:]
 515
 516    These expressions all designate a set of characters equivalent to
 517 the corresponding standard C `isXXX' function.  For example,
 518 `[:alnum:]' designates those characters for which `isalnum()' returns
 519 true - i.e., any alphabetic or numeric character.  Some systems don't
 520 provide `isblank()', so flex defines `[:blank:]' as a blank or a tab.
 521
 522    For example, the following character classes are all equivalent:
 523
 524      [[:alnum:]]
 525      [[:alpha:][:digit:]]
 526      [[:alpha:]0-9]
 527      [a-zA-Z0-9]
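
   The correspondence with the `isXXX' functions can be sketched in C.
The helper names below are hypothetical and are not part of `flex';
they only mirror three of the expressions listed above.

```c
#include <ctype.h>

/* Hypothetical helper mirroring [[:alnum:]].  The argument is cast
 * to unsigned char because passing a negative char value to the
 * <ctype.h> functions is undefined. */
int in_alnum( char c )
    {
    return isalnum( (unsigned char) c ) != 0;
    }

/* Hypothetical helper mirroring [[:alpha:][:digit:]]. */
int in_alpha_or_digit( char c )
    {
    unsigned char uc = (unsigned char) c;

    return isalpha( uc ) != 0 || isdigit( uc ) != 0;
    }

/* [:blank:] as the text above defines it: a blank or a tab. */
int in_blank( char c )
    {
    return c == ' ' || c == '\t';
    }
```

   Since C defines `isalnum()' as true exactly when `isalpha()' or
`isdigit()' is true, the first two helpers accept the same characters,
just as the first two character classes above match the same input.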
 528
 529    If your scanner is case-insensitive (the `-i' flag), then
 530 `[:upper:]' and `[:lower:]' are equivalent to `[:alpha:]'.
 531
 532    Some notes on patterns:
 533
 534    - A negated character class such as the example "[^A-Z]" above *will
 535      match a newline* unless "\n" (or an equivalent escape sequence) is
 536      one of the characters explicitly present in the negated character
 537      class (e.g., "[^A-Z\n]").  This is unlike how many other regular
 538      expression tools treat negated character classes, but
 539      unfortunately the inconsistency is historically entrenched.
 540      Matching newlines means that a pattern like [^"]* can match the
 541      entire input unless there's another quote in the input.
 542
 543    - A rule can have at most one instance of trailing context (the '/'
 544      operator or the '$' operator).  The start condition, '^', and
 545      "<<EOF>>" patterns can only occur at the beginning of a pattern,
 546      and, like '/' and '$', cannot be grouped inside
 547      parentheses.  A '^' which does not occur at the beginning of a
 548      rule or a '$' which does not occur at the end of a rule loses its
 549      special properties and is treated as a normal character.
 550
 551      The following are illegal:
 552
 553           foo/bar$
 554           <sc1>foo<sc2>bar
 555
 556      Note that the first of these can be written "foo/bar\n".
 557
 558      The following will result in '$' or '^' being treated as a normal
 559      character:
 560
 561           foo|(bar$)
 562           foo|^bar
 563
 564      If what's wanted is a "foo" or a bar-followed-by-a-newline, the
 565      following could be used (the special '|' action is explained
 566      below):
 567
 568           foo      |
 569           bar$     /* action goes here */
 570
 571      A similar trick will work for matching a foo or a
 572      bar-at-the-beginning-of-a-line.
 573
 574 \x1f
 575 File: flex.info,  Node: Matching,  Next: Actions,  Prev: Patterns,  Up: Top
 576
 577 How the input is matched
 578 ========================
 579
 580    When the generated scanner is run, it analyzes its input looking for
 581 strings which match any of its patterns.  If it finds more than one
 582 match, it takes the one matching the most text (for trailing context
 583 rules, this includes the length of the trailing part, even though it
 584 will then be returned to the input).  If it finds two or more matches
 585 of the same length, the rule listed first in the `flex' input file is
 586 chosen.
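
   The two selection rules above can be sketched in plain C.  This is
an illustration only, not code from `flex' itself: the function name
`choose_rule' and the convention that a length of 0 means "no match"
are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of flex): given the number of
 * characters each rule matched at the current input position, in
 * the order the rules appear in the input file, return the index of
 * the rule the scanner would select, or -1 if no rule matched.
 * A length of 0 means "this rule did not match". */
int choose_rule( const size_t match_len[], size_t num_rules )
    {
    int best = -1;
    size_t best_len = 0;
    size_t i;

    assert( match_len != NULL || num_rules == 0 );

    for ( i = 0; i < num_rules; ++i )
        {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if ( match_len[i] > best_len )
            {
            best = (int) i;
            best_len = match_len[i];
            }
        }

    return best;
    }
```

   In the toy Pascal scanner shown earlier, the input "if" is matched
by both the keyword rule and the {ID} rule with length 2, so the
keyword rule, which is listed first, is chosen.  On the input "iffy"
the {ID} rule matches four characters and wins on length.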
 587
 588    Once the match is determined, the text corresponding to the match
 589 (called the TOKEN) is made available in the global character pointer
 590 `yytext', and its length in the global integer `yyleng'.  The ACTION
 591 corresponding to the matched pattern is then executed (a more detailed
 592 description of actions follows), and then the remaining input is
 593 scanned for another match.
 594
 595    If no match is found, then the "default rule" is executed: the next
 596 character in the input is considered matched and copied to the standard
 597 output.  Thus, the simplest legal `flex' input is:
 598
 599      %%
 600
 601    which generates a scanner that simply copies its input (one
 602 character at a time) to its output.
 603
 604    Note that `yytext' can be defined in two different ways: either as a
 605 character *pointer* or as a character *array*.  You can control which
 606 definition `flex' uses by including one of the special directives
 607 `%pointer' or `%array' in the first (definitions) section of your flex
 608 input.  The default is `%pointer', unless you use the `-l' lex
 609 compatibility option, in which case `yytext' will be an array.  The
 610 advantage of using `%pointer' is substantially faster scanning and no
 611 buffer overflow when matching very large tokens (unless you run out of
 612 dynamic memory).  The disadvantage is that you are restricted in how
 613 your actions can modify `yytext' (see the next section), and calls to
 614 the `unput()' function destroy the present contents of `yytext', which
 615 can be a considerable porting headache when moving between different
 616 `lex' versions.
 617
 618    The advantage of `%array' is that you can then modify `yytext' to
 619 your heart's content, and calls to `unput()' do not destroy `yytext'
 620 (see below).  Furthermore, existing `lex' programs sometimes access
 621 `yytext' externally using declarations of the form:
 622      extern char yytext[];
 623    This definition is erroneous when used with `%pointer', but correct
 624 for `%array'.
 625
 626    `%array' defines `yytext' to be an array of `YYLMAX' characters,
 627 which defaults to a fairly large value.  You can change the size by
 628 simply #define'ing `YYLMAX' to a different value in the first section
 629 of your `flex' input.  As mentioned above, with `%pointer' yytext grows
 630 dynamically to accommodate large tokens.  While this means your
 631 `%pointer' scanner can accommodate very large tokens (such as matching
 632 entire blocks of comments), bear in mind that each time the scanner
 633 must resize `yytext' it also must rescan the entire token from the
 634 beginning, so matching such tokens can prove slow.  `yytext' presently
 635 does *not* dynamically grow if a call to `unput()' results in too much
 636 text being pushed back; instead, a run-time error results.
 637
 638    Also note that you cannot use `%array' with C++ scanner classes (the
 639 `c++' option; see below).
 640
 641 \x1f
 642 File: flex.info,  Node: Actions,  Next: Generated scanner,  Prev: Matching,  Up: Top
 643
 644 Actions
 645 =======
 646
 647    Each pattern in a rule has a corresponding action, which can be any
 648 arbitrary C statement.  The pattern ends at the first non-escaped
 649 whitespace character; the remainder of the line is its action.  If the
 650 action is empty, then when the pattern is matched the input token is
 651 simply discarded.  For example, here is the specification for a program
 652 which deletes all occurrences of "zap me" from its input:
 653
 654      %%
 655      "zap me"
 656
 657    (It will copy all other characters in the input to the output since
 658 they will be matched by the default rule.)
 659
 660    Here is a program which compresses multiple blanks and tabs down to
 661 a single blank, and throws away whitespace found at the end of a line:
 662
 663      %%
 664      [ \t]+        putchar( ' ' );
 665      [ \t]+$       /* ignore this token */
 666
 667    If the action contains a '{', then the action spans till the
 668 balancing '}' is found, and the action may cross multiple lines.
 669 `flex' knows about C strings and comments and won't be fooled by braces
 670 found within them, but also allows actions to begin with `%{' and will
 671 consider the action to be all the text up to the next `%}' (regardless
 672 of ordinary braces inside the action).
 673
 674    An action consisting solely of a vertical bar ('|') means "same as
 675 the action for the next rule." See below for an illustration.
 676
 677    Actions can include arbitrary C code, including `return' statements
 678 to return a value to whatever routine called `yylex()'.  Each time
 679 `yylex()' is called it continues processing tokens from where it last
 680 left off until it either reaches the end of the file or executes a
 681 return.
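
   The calling pattern this implies can be sketched with a small
hand-written stand-in for `yylex()'.  Nothing below is generated by
`flex': the names `scan_string', `next_token' and the `TOK_' codes are
hypothetical, and returning 0 at the end of the input is an assumption
of the sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum { TOK_EOF = 0, TOK_NUMBER = 1, TOK_WORD = 2 };

/* Scanner state kept between calls, playing the role of the
 * generated scanner's position in its input. */
static const char *scan_pos = "";

void scan_string( const char *s )
    {
    assert( s != NULL );
    scan_pos = s;
    }

/* Hand-written stand-in for yylex(): other characters are skipped
 * (like an action with no `return'), while numbers and words each
 * return a token code to the caller. */
int next_token( void )
    {
    while ( *scan_pos != '\0' )
        {
        unsigned char c = (unsigned char) *scan_pos;

        if ( isdigit( c ) )
            {
            while ( isdigit( (unsigned char) *scan_pos ) )
                ++scan_pos;
            return TOK_NUMBER;
            }

        if ( isalpha( c ) )
            {
            while ( isalpha( (unsigned char) *scan_pos ) )
                ++scan_pos;
            return TOK_WORD;
            }

        ++scan_pos;     /* no return: keep scanning */
        }

    return TOK_EOF;     /* end of input */
    }
```

   A caller would typically loop until the end-of-input code is seen,
for example `while ( (tok = next_token()) != TOK_EOF )'.  Each call
resumes at the saved position rather than starting over.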
 682
 683    Actions are free to modify `yytext' except for lengthening it
 684 (adding characters to its end; these will overwrite later characters in
 685 the input stream).  This however does not apply when using `%array'
 686 (see above); in that case, `yytext' may be freely modified in any way.
 687
 688    Actions are free to modify `yyleng' except they should not do so if
 689 the action also includes use of `yymore()' (see below).
 690
 691    There are a number of special directives which can be included
 692 within an action:
 693
 694    - `ECHO' copies yytext to the scanner's output.
 695
 696    - `BEGIN' followed by the name of a start condition places the
 697      scanner in the corresponding start condition (see below).
 698
 699    - `REJECT' directs the scanner to proceed on to the "second best"
 700      rule which matched the input (or a prefix of the input).  The rule
 701      is chosen as described above in "How the Input is Matched", and
 702      `yytext' and `yyleng' are set up appropriately.  It may either be one
 703      which matched as much text as the originally chosen rule but came
 704      later in the `flex' input file, or one which matched less text.
 705      For example, the following will both count the words in the input
 706      and call the routine special() whenever "frob" is seen:
 707
 708                   int word_count = 0;
 709           %%
 710
 711           frob        special(); REJECT;
 712           [^ \t\n]+   ++word_count;
 713
 714      Without the `REJECT', any "frob"'s in the input would not be
 715      counted as words, since the scanner normally executes only one
 716      action per token.  Multiple `REJECT's' are allowed, each one
 717      finding the next best choice to the currently active rule.  For
 718      example, when the following scanner scans the token "abcd", it
 719      will write "abcdabcaba" to the output:
 720
 721           %%
 722           a        |
 723           ab       |
 724           abc      |
 725           abcd     ECHO; REJECT;
 726           .|\n     /* eat up any unmatched character */
 727
 728      (The first three rules share the fourth's action since they use
 729      the special '|' action.)  `REJECT' is a particularly expensive
 730      feature in terms of scanner performance; if it is used in *any* of
 731      the scanner's actions it will slow down *all* of the scanner's
 732      matching.  Furthermore, `REJECT' cannot be used with the `-Cf' or
 733      `-CF' options (see below).
 734
 735      Note also that unlike the other special actions, `REJECT' is a
 736      *branch*; code immediately following it in the action will *not*
 737      be executed.
 738
 739    - `yymore()' tells the scanner that the next time it matches a rule,
 740      the corresponding token should be *appended* onto the current
 741      value of `yytext' rather than replacing it.  For example, given
 742      the input "mega-kludge" the following will write
 743      "mega-mega-kludge" to the output:
 744
 745           %%
 746           mega-    ECHO; yymore();
 747           kludge   ECHO;
 748
 749      First "mega-" is matched and echoed to the output.  Then "kludge"
 750      is matched, but the previous "mega-" is still hanging around at
 751      the beginning of `yytext' so the `ECHO' for the "kludge" rule will
 752      actually write "mega-kludge".
 753
 754    Two notes regarding use of `yymore()'.  First, `yymore()' depends on
 755 the value of `yyleng' correctly reflecting the size of the current
 756 token, so you must not modify `yyleng' if you are using `yymore()'.
 757 Second, the presence of `yymore()' in the scanner's action entails a
 758 minor performance penalty in the scanner's matching speed.
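
   The first note can be made concrete with a small model in C.  This
is a simplified sketch based only on the description above, not the
scanner's actual implementation: `struct token_buf', `request_more' and
`set_token' are hypothetical stand-ins for `yytext', `yyleng' and the
effect of `yymore()'.

```c
#include <assert.h>
#include <string.h>

#define MAX_TEXT 256

/* Hypothetical stand-ins for the scanner's `yytext', `yyleng' and
 * an internal "yymore() was called" flag. */
struct token_buf
    {
    char text[MAX_TEXT];
    size_t leng;
    int more;
    };

/* Called where a real action would call yymore(). */
void request_more( struct token_buf *tb )
    {
    tb->more = 1;
    }

/* Called when the next rule matches `match': append it if the
 * previous action asked for more, otherwise replace the old token.
 * The append position is taken from `leng', which is why changing
 * the length by hand would corrupt the combined token. */
void set_token( struct token_buf *tb, const char *match )
    {
    size_t keep = tb->more ? tb->leng : 0;
    size_t add = strlen( match );

    assert( keep + add < MAX_TEXT );
    memcpy( tb->text + keep, match, add + 1 );  /* +1 copies the NUL */
    tb->leng = keep + add;
    tb->more = 0;
    }
```

   Replaying the "mega-kludge" example with this model, the first match
sets the text to "mega-", and after `request_more' the second match
leaves "mega-kludge" with a length of 11.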
 759
 760    - `yyless(n)' returns all but the first N characters of the current
 761      token back to the input stream, where they will be rescanned when
 762      the scanner looks for the next match.  `yytext' and `yyleng' are
 763      adjusted appropriately (e.g., `yyleng' will now be equal to N).
 764      For example, on the input "foobar" the following will write out
 765      "foobarbar":
 766
 767           %%
 768           foobar    ECHO; yyless(3);
 769           [a-z]+    ECHO;
 770
 771      An argument of 0 to `yyless' will cause the entire current input
 772      string to be scanned again.  Unless you've changed how the scanner
 773      will subsequently process its input (using `BEGIN', for example),
 774      this will result in an endless loop.
 775
 776      Note that `yyless' is a macro and can only be used in the flex
 777      input file, not from other source files.
 778
 779    - `unput(c)' puts the character `c' back onto the input stream.  It
 780      will be the next character scanned.  The following action will
 781      take the current token and cause it to be rescanned enclosed in
 782      parentheses.
 783
 784           {
 785           int i;
 786           /* Copy yytext because unput() trashes yytext */
 787           char *yycopy = strdup( yytext );
 788           unput( ')' );
 789           for ( i = yyleng - 1; i >= 0; --i )
 790               unput( yycopy[i] );
 791           unput( '(' );
 792           free( yycopy );
 793           }
 794
 795      Note that since each `unput()' puts the given character back at
 796      the *beginning* of the input stream, pushing back strings must be
 797      done back-to-front.  An important potential problem when using
 798      `unput()' is that if you are using `%pointer' (the default), a
 799      call to `unput()' *destroys* the contents of `yytext', starting
 800      with its rightmost character and devouring one character to the
 801      left with each call.  If you need the value of yytext preserved
 802      after a call to `unput()' (as in the above example), you must
 803      either first copy it elsewhere, or build your scanner using
 804      `%array' instead (see How The Input Is Matched).
 805
 806      Finally, note that you cannot put back `EOF' to attempt to mark
 807      the input stream with an end-of-file.
 808
 809    - `input()' reads the next character from the input stream.  For
 810      example, the following is one way to eat up C comments:
 811
 812           %%
 813           "/*"        {
 814                       register int c;
 815
 816                       for ( ; ; )
 817                           {
 818                           while ( (c = input()) != '*' &&
 819                                   c != EOF )
 820                               ;    /* eat up text of comment */
 821
 822                           if ( c == '*' )
 823                               {
 824                               while ( (c = input()) == '*' )
 825                                   ;
 826                               if ( c == '/' )
 827                                   break;    /* found the end */
 828                               }
 829
 830                           if ( c == EOF )
 831                               {
 832                               error( "EOF in comment" );
 833                               break;
 834                               }
 835                           }
 836                       }
 837
 838      (Note that if the scanner is compiled using `C++', then `input()'
 839      is instead referred to as `yyinput()', in order to avoid a name
 840      clash with the `C++' stream by the name of `input'.)
 841
 842    - YY_FLUSH_BUFFER flushes the scanner's internal buffer so that the
 843      next time the scanner attempts to match a token, it will first
 844      refill the buffer using `YY_INPUT' (see The Generated Scanner,
 845      below).  This action is a special case of the more general
 846      `yy_flush_buffer()' function, described below in the section
 847      Multiple Input Buffers.
 848
 849    - `yyterminate()' can be used in lieu of a return statement in an
 850      action.  It terminates the scanner and returns a 0 to the
 851      scanner's caller, indicating "all done".  By default,
 852      `yyterminate()' is also called when an end-of-file is encountered.
 853      It is a macro and may be redefined.
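
   The back-to-front rule for `unput()' described above can be
captured in a small helper.  The following is only a sketch:
`push_back_string()', `model_unput()', `model_pending()' and the
`model_' variables are hypothetical names that are not part of `flex',
and the model stream merely stands in for the scanner's input so that
the ordering can be observed outside a scanner.  In a real action, the
function passed as `put' would be a one-line wrapper around `unput()',
since `unput()' itself is a macro.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the scanner's input stream (an assumption made for
 * illustration): model_stream + model_front is the next character
 * that would be read.  The final byte is left as '\0'. */
static char model_stream[64];
static size_t model_front = sizeof model_stream - 1;

/* Plays the role of unput(): puts c in front of the pending input. */
static void model_unput(int c)
{
    if (model_front > 0)
        model_stream[--model_front] = (char) c;
}

/* Returns the pending input as a NUL-terminated string. */
static const char *model_pending(void)
{
    return model_stream + model_front;
}

/* Hypothetical helper: pushes the string s back so that it will be
 * rescanned in its original order. */
static void push_back_string(const char *s, void (*put)(int c))
{
    size_t i = strlen(s);

    /* Each call places its character in front of everything pushed
     * so far, so the last character of s must be pushed first. */
    while (i > 0)
        put((unsigned char) s[--i]);
}
```

   After `push_back_string( "(foo)", model_unput )', the pending input
returned by `model_pending()' is "(foo)", i.e., the text would be
rescanned in its original left-to-right order.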
 854
 855 \x1f
 856 File: flex.info,  Node: Generated scanner,  Next: Start conditions,  Prev: Actions,  Up: Top
 857
 858 The generated scanner
 859 =====================
 860
 861    The output of `flex' is the file `lex.yy.c', which contains the
 862 scanning routine `yylex()', a number of tables used by it for matching
 863 tokens, and a number of auxiliary routines and macros.  By default,
 864 `yylex()' is declared as follows:
 865
 866      int yylex()
 867          {
 868          ... various definitions and the actions in here ...
 869          }
 870
 871    (If your environment supports function prototypes, then it will be
"int yylex( void )".)  This definition may be changed by defining
 873 the "YY_DECL" macro.  For example, you could use:
 874
 875      #define YY_DECL float lexscan( a, b ) float a, b;
 876
 877    to give the scanning routine the name `lexscan', returning a float,
 878 and taking two floats as arguments.  Note that if you give arguments to
 879 the scanning routine using a K&R-style/non-prototyped function
 880 declaration, you must terminate the definition with a semi-colon (`;').
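
   With a compiler that supports function prototypes, the same routine
can also be declared in prototype form, in which case no trailing
semi-colon is needed.  The sketch below only shows how the macro
expands into an ordinary function header; the function body is a
stand-in, since in a real scanner `flex' supplies the body itself.

```c
/* Prototype-style counterpart of the K&R-style definition. */
#define YY_DECL float lexscan( float a, float b )

/* flex emits the scanner body immediately after YY_DECL.  This
 * stand-in body is an assumption for illustration only. */
YY_DECL
    {
    return a + b;
    }
```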
 881
 882    Whenever `yylex()' is called, it scans tokens from the global input
 883 file `yyin' (which defaults to stdin).  It continues until it either
 884 reaches an end-of-file (at which point it returns the value 0) or one
 885 of its actions executes a `return' statement.
 886
 887    If the scanner reaches an end-of-file, subsequent calls are undefined
 888 unless either `yyin' is pointed at a new input file (in which case
 889 scanning continues from that file), or `yyrestart()' is called.
 890 `yyrestart()' takes one argument, a `FILE *' pointer (which can be nil,
 891 if you've set up `YY_INPUT' to scan from a source other than `yyin'),
 892 and initializes `yyin' for scanning from that file.  Essentially there
 893 is no difference between just assigning `yyin' to a new input file or
 894 using `yyrestart()' to do so; the latter is available for compatibility
 895 with previous versions of `flex', and because it can be used to switch
 896 input files in the middle of scanning.  It can also be used to throw
 897 away the current input buffer, by calling it with an argument of
 898 `yyin'; but better is to use `YY_FLUSH_BUFFER' (see above).  Note that
 899 `yyrestart()' does *not* reset the start condition to `INITIAL' (see
 900 Start Conditions, below).
 901
 902    If `yylex()' stops scanning due to executing a `return' statement in
 903 one of the actions, the scanner may then be called again and it will
 904 resume scanning where it left off.
 905
 906    By default (and for purposes of efficiency), the scanner uses
 907 block-reads rather than simple `getc()' calls to read characters from
 908 `yyin'.  The nature of how it gets its input can be controlled by
 909 defining the `YY_INPUT' macro.  YY_INPUT's calling sequence is
 910 "YY_INPUT(buf,result,max_size)".  Its action is to place up to MAX_SIZE
 911 characters in the character array BUF and return in the integer
 912 variable RESULT either the number of characters read or the constant
 913 YY_NULL (0 on Unix systems) to indicate EOF.  The default YY_INPUT
 914 reads from the global file-pointer "yyin".
 915
 916    A sample definition of YY_INPUT (in the definitions section of the
 917 input file):
 918
 919      %{
 920      #define YY_INPUT(buf,result,max_size) \
 921          { \
 922          int c = getchar(); \
 923          result = (c == EOF) ? YY_NULL : (buf[0] = c, 1); \
 924          }
 925      %}
 926
 927    This definition will change the input processing to occur one
 928 character at a time.
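
   The same contract (fill BUF with at most MAX_SIZE characters and
report how many were stored, or 0 at end of input) can be met by an
ordinary C function that `YY_INPUT' forwards to.  The following is a
minimal sketch that reads from a string held in memory; `src_text',
`src_pos' and `string_input()' are hypothetical names, not part of
`flex'.  For plain in-memory strings the scanning routines mentioned
later in this section are usually simpler; the sketch is only meant to
illustrate the calling sequence.

```c
#include <string.h>

/* Hypothetical in-memory source; neither name is part of flex. */
static const char *src_text = "";
static size_t src_pos = 0;

/* Copies up to max_size characters into buf and returns how many
 * were copied.  A return of 0 (the value of YY_NULL on Unix systems)
 * signals end of input.  max_size is assumed to be positive. */
static int string_input(char *buf, int max_size)
{
    size_t left = strlen(src_text) - src_pos;
    size_t n = left < (size_t) max_size ? left : (size_t) max_size;

    memcpy(buf, src_text + src_pos, n);
    src_pos += n;
    return (int) n;
}

/* In the definitions section of the flex input one would then write:
 *
 *   #define YY_INPUT(buf,result,max_size) \
 *       { result = string_input( buf, max_size ); }
 */
```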
 929
 930    When the scanner receives an end-of-file indication from YY_INPUT,
 931 it then checks the `yywrap()' function.  If `yywrap()' returns false
 932 (zero), then it is assumed that the function has gone ahead and set up
 933 `yyin' to point to another input file, and scanning continues.  If it
 934 returns true (non-zero), then the scanner terminates, returning 0 to
 935 its caller.  Note that in either case, the start condition remains
 936 unchanged; it does *not* revert to `INITIAL'.
 937
 938    If you do not supply your own version of `yywrap()', then you must
 939 either use `%option noyywrap' (in which case the scanner behaves as
 940 though `yywrap()' returned 1), or you must link with `-lfl' to obtain
 941 the default version of the routine, which always returns 1.
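
   A user-supplied `yywrap()' that moves on to the next of several
input files might be built on a helper such as the one sketched below.
`next_input()' and `file_list' are hypothetical names, and skipping
files that cannot be opened is a design choice of this sketch rather
than anything `flex' prescribes.

```c
#include <stdio.h>

/* Hypothetical helper, not part of flex: advances *listp through a
 * NULL-terminated array of file names and returns the first file
 * that can be opened for reading, or NULL once the list is
 * exhausted.  Names that cannot be opened are skipped. */
static FILE *next_input(char ***listp)
{
    while (**listp != NULL)
        {
        FILE *fp = fopen(**listp, "r");

        ++*listp;
        if (fp != NULL)
            return fp;
        }
    return NULL;
}

/* In the user code section of the scanner, yywrap() could then read
 * (file_list being a global set up by main()):
 *
 *     int yywrap()
 *         {
 *         yyin = next_input( &file_list );
 *         return yyin == NULL;
 *         }
 */
```

   Returning zero while `yyin' points at the newly opened file lets
scanning continue; returning non-zero once the list is exhausted ends
the scan.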
 942
 943    Three routines are available for scanning from in-memory buffers
 944 rather than files: `yy_scan_string()', `yy_scan_bytes()', and
 945 `yy_scan_buffer()'.  See the discussion of them below in the section
 946 Multiple Input Buffers.
 947
 948    The scanner writes its `ECHO' output to the `yyout' global (default,
 949 stdout), which may be redefined by the user simply by assigning it to
 950 some other `FILE' pointer.
 951
 952 \x1f
 953 File: flex.info,  Node: Start conditions,  Next: Multiple buffers,  Prev: Generated scanner,  Up: Top
 954
 955 Start conditions
 956 ================
 957
 958    `flex' provides a mechanism for conditionally activating rules.  Any
 959 rule whose pattern is prefixed with "<sc>" will only be active when the
 960 scanner is in the start condition named "sc".  For example,
 961
 962      <STRING>[^"]*        { /* eat up the string body ... */
 963                  ...
 964                  }
 965
 966 will be active only when the scanner is in the "STRING" start
 967 condition, and
 968
 969      <INITIAL,STRING,QUOTE>\.        { /* handle an escape ... */
 970                  ...
 971                  }
 972
 973 will be active only when the current start condition is either
 974 "INITIAL", "STRING", or "QUOTE".
 975
 976    Start conditions are declared in the definitions (first) section of
 977 the input using unindented lines beginning with either `%s' or `%x'
 978 followed by a list of names.  The former declares *inclusive* start
 979 conditions, the latter *exclusive* start conditions.  A start condition
 980 is activated using the `BEGIN' action.  Until the next `BEGIN' action is
 981 executed, rules with the given start condition will be active and rules
 982 with other start conditions will be inactive.  If the start condition
 983 is *inclusive*, then rules with no start conditions at all will also be
 984 active.  If it is *exclusive*, then *only* rules qualified with the
 985 start condition will be active.  A set of rules contingent on the same
 986 exclusive start condition describe a scanner which is independent of
 987 any of the other rules in the `flex' input.  Because of this, exclusive
 988 start conditions make it easy to specify "mini-scanners" which scan
 989 portions of the input that are syntactically different from the rest
 990 (e.g., comments).
 991
 992    If the distinction between inclusive and exclusive start conditions
 993 is still a little vague, here's a simple example illustrating the
 994 connection between the two.  The set of rules:
 995
 996      %s example
 997      %%
 998
 999      <example>foo   do_something();
1000
1001      bar            something_else();
1002
1003 is equivalent to
1004
1005      %x example
1006      %%
1007
1008      <example>foo   do_something();
1009
1010      <INITIAL,example>bar    something_else();
1011
1012    Without the `<INITIAL,example>' qualifier, the `bar' pattern in the
1013 second example wouldn't be active (i.e., couldn't match) when in start
1014 condition `example'.  If we just used `<example>' to qualify `bar',
1015 though, then it would only be active in `example' and not in `INITIAL',
1016 while in the first example it's active in both, because in the first
1017 example the `example' starting condition is an *inclusive* (`%s') start
1018 condition.
1019
1020    Also note that the special start-condition specifier `<*>' matches
1021 every start condition.  Thus, the above example could also have been
written:
1023
1024      %x example
1025      %%
1026
1027      <example>foo   do_something();
1028
1029      <*>bar    something_else();
1030
1031    The default rule (to `ECHO' any unmatched character) remains active
1032 in start conditions.  It is equivalent to:
1033
     <*>.|\n     ECHO;
1035
1036    `BEGIN(0)' returns to the original state where only the rules with
1037 no start conditions are active.  This state can also be referred to as
1038 the start-condition "INITIAL", so `BEGIN(INITIAL)' is equivalent to
1039 `BEGIN(0)'.  (The parentheses around the start condition name are not
1040 required but are considered good style.)
1041
1042    `BEGIN' actions can also be given as indented code at the beginning
1043 of the rules section.  For example, the following will cause the
1044 scanner to enter the "SPECIAL" start condition whenever `yylex()' is
1045 called and the global variable `enter_special' is true:
1046
1047              int enter_special;
1048
1049      %x SPECIAL
1050      %%
1051              if ( enter_special )
1052                  BEGIN(SPECIAL);
1053
1054      <SPECIAL>blahblahblah
1055      ...more rules follow...
1056
1057    To illustrate the uses of start conditions, here is a scanner which
1058 provides two different interpretations of a string like "123.456".  By
default it will treat it as three tokens, the integer "123", a dot
1060 ('.'), and the integer "456".  But if the string is preceded earlier in
1061 the line by the string "expect-floats" it will treat it as a single
1062 token, the floating-point number 123.456:
1063
     %{
     #include <math.h>
     #include <stdlib.h>
     %}
     %s expect

     %%
     expect-floats        BEGIN(expect);

     <expect>[0-9]+"."[0-9]+      {
                 printf( "found a float, = %f\n",
                         atof( yytext ) );
                 }
     <expect>\n           {
                 /* that's the end of the line, so
                  * we need another "expect-floats"
                  * before we'll recognize any more
                  * numbers
                  */
                 BEGIN(INITIAL);
                 }

     [0-9]+      {
                 printf( "found an integer, = %d\n",
                         atoi( yytext ) );
                 }

     "."         printf( "found a dot\n" );
1094
1095    Here is a scanner which recognizes (and discards) C comments while
1096 maintaining a count of the current input line.
1097
1098      %x comment
1099      %%
1100              int line_num = 1;
1101
1102      "/*"         BEGIN(comment);
1103
1104      <comment>[^*\n]*        /* eat anything that's not a '*' */
1105      <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1106      <comment>\n             ++line_num;
1107      <comment>"*"+"/"        BEGIN(INITIAL);
1108
   This scanner goes to a bit of trouble to match as much text as
possible with each rule.  In general, when attempting to write a
high-speed scanner, try to match as much as possible in each rule, as
it's a big win.
1113
   Note that start condition names are really integer values and can
1115 be stored as such.  Thus, the above could be extended in the following
1116 fashion:
1117
1118      %x comment foo
1119      %%
1120              int line_num = 1;
1121              int comment_caller;
1122
1123      "/*"         {
1124                   comment_caller = INITIAL;
1125                   BEGIN(comment);
1126                   }
1127
1128      ...
1129
1130      <foo>"/*"    {
1131                   comment_caller = foo;
1132                   BEGIN(comment);
1133                   }
1134
1135      <comment>[^*\n]*        /* eat anything that's not a '*' */
1136      <comment>"*"+[^*/\n]*   /* eat up '*'s not followed by '/'s */
1137      <comment>\n             ++line_num;
1138      <comment>"*"+"/"        BEGIN(comment_caller);
1139
1140    Furthermore, you can access the current start condition using the
1141 integer-valued `YY_START' macro.  For example, the above assignments to
1142 `comment_caller' could instead be written
1143
1144      comment_caller = YY_START;
1145
1146    Flex provides `YYSTATE' as an alias for `YY_START' (since that is
1147 what's used by AT&T `lex').
1148
1149    Note that start conditions do not have their own name-space; %s's
1150 and %x's declare names in the same fashion as #define's.
1151
1152    Finally, here's an example of how to match C-style quoted strings
1153 using exclusive start conditions, including expanded escape sequences
1154 (but not including checking for a string that's too long):
1155
1156      %x str
1157
1158      %%
1159              char string_buf[MAX_STR_CONST];
1160              char *string_buf_ptr;
1161
1162      \"      string_buf_ptr = string_buf; BEGIN(str);
1163
1164      <str>\"        { /* saw closing quote - all done */
1165              BEGIN(INITIAL);
1166              *string_buf_ptr = '\0';
1167              /* return string constant token type and
1168               * value to parser
1169               */
1170              }
1171
1172      <str>\n        {
1173              /* error - unterminated string constant */
1174              /* generate error message */
1175              }
1176
     <str>\\[0-7]{1,3} {
             /* octal escape sequence */
             unsigned int result;

             (void) sscanf( yytext + 1, "%o", &result );

             if ( result > 0xff )
                     {
                     /* error, constant is out-of-bounds */
                     }

             *string_buf_ptr++ = result;
             }
1188
1189      <str>\\[0-9]+ {
1190              /* generate error - bad escape sequence; something
1191               * like '\48' or '\0777777'
1192               */
1193              }
1194
1195      <str>\\n  *string_buf_ptr++ = '\n';
1196      <str>\\t  *string_buf_ptr++ = '\t';
1197      <str>\\r  *string_buf_ptr++ = '\r';
1198      <str>\\b  *string_buf_ptr++ = '\b';
1199      <str>\\f  *string_buf_ptr++ = '\f';
1200
1201      <str>\\(.|\n)  *string_buf_ptr++ = yytext[1];
1202
1203      <str>[^\\\n\"]+        {
1204              char *yptr = yytext;
1205
1206              while ( *yptr )
1207                      *string_buf_ptr++ = *yptr++;
1208              }
1209
1210    Often, such as in some of the examples above, you wind up writing a
1211 whole bunch of rules all preceded by the same start condition(s).  Flex
1212 makes this a little easier and cleaner by introducing a notion of start
1213 condition "scope".  A start condition scope is begun with:
1214
1215      <SCs>{
1216
1217 where SCs is a list of one or more start conditions.  Inside the start
1218 condition scope, every rule automatically has the prefix `<SCs>'
1219 applied to it, until a `}' which matches the initial `{'.  So, for
1220 example,
1221
1222      <ESC>{
1223          "\\n"   return '\n';
1224          "\\r"   return '\r';
1225          "\\f"   return '\f';
1226          "\\0"   return '\0';
1227      }
1228
1229 is equivalent to:
1230
1231      <ESC>"\\n"  return '\n';
1232      <ESC>"\\r"  return '\r';
1233      <ESC>"\\f"  return '\f';
1234      <ESC>"\\0"  return '\0';
1235
1236    Start condition scopes may be nested.
1237
1238    Three routines are available for manipulating stacks of start
1239 conditions:
1240
1241 `void yy_push_state(int new_state)'
1242      pushes the current start condition onto the top of the start
1243      condition stack and switches to NEW_STATE as though you had used
1244      `BEGIN new_state' (recall that start condition names are also
1245      integers).
1246
1247 `void yy_pop_state()'
1248      pops the top of the stack and switches to it via `BEGIN'.
1249
1250 `int yy_top_state()'
1251      returns the top of the stack without altering the stack's contents.
1252
1253    The start condition stack grows dynamically and so has no built-in
1254 size limitation.  If memory is exhausted, program execution aborts.
1255
1256    To use start condition stacks, your scanner must include a `%option
1257 stack' directive (see Options below).
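
   The stack semantics described above (in particular, that it is the
*current* start condition which is pushed, not the new one) can be
modeled in a few lines of plain C.  This is a model only: every name
below is hypothetical, the integer `current_state' stands in for the
scanner's start condition, and treating a pop from an empty stack as
fatal is an assumption of the sketch.

```c
#include <stdlib.h>

/* A model of the semantics only; none of these names belong to flex. */
static int  current_state = 0;      /* 0 plays the role of INITIAL */
static int *stack = NULL;
static int  depth = 0, capacity = 0;

static void model_push_state(int new_state)
{
    if (depth == capacity)
        {
        /* Grow on demand, so there is no built-in size limit. */
        int new_cap = capacity ? 2 * capacity : 8;
        int *grown = realloc(stack, new_cap * sizeof *grown);

        if (grown == NULL)
            abort();                /* memory exhausted */
        stack = grown;
        capacity = new_cap;
        }
    stack[depth++] = current_state; /* save the current condition */
    current_state = new_state;      /* as though BEGIN new_state */
}

static void model_pop_state(void)
{
    if (depth == 0)
        abort();                    /* assumption: underflow is fatal */
    current_state = stack[--depth];
}

/* The caller must ensure that the stack is not empty. */
static int model_top_state(void)
{
    return stack[depth - 1];
}
```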
1258
1259 \x1f
1260 File: flex.info,  Node: Multiple buffers,  Next: End-of-file rules,  Prev: Start conditions,  Up: Top
1261
1262 Multiple input buffers
1263 ======================
1264
1265    Some scanners (such as those which support "include" files) require
1266 reading from several input streams.  As `flex' scanners do a large
1267 amount of buffering, one cannot control where the next input will be
1268 read from by simply writing a `YY_INPUT' which is sensitive to the
1269 scanning context.  `YY_INPUT' is only called when the scanner reaches
1270 the end of its buffer, which may be a long time after scanning a
1271 statement such as an "include" which requires switching the input
1272 source.
1273
1274    To negotiate these sorts of problems, `flex' provides a mechanism
1275 for creating and switching between multiple input buffers.  An input
1276 buffer is created by using:
1277
1278      YY_BUFFER_STATE yy_create_buffer( FILE *file, int size )
1279
1280 which takes a `FILE' pointer and a size and creates a buffer associated
1281 with the given file and large enough to hold SIZE characters (when in
1282 doubt, use `YY_BUF_SIZE' for the size).  It returns a `YY_BUFFER_STATE'
1283 handle, which may then be passed to other routines (see below).  The
1284 `YY_BUFFER_STATE' type is a pointer to an opaque `struct'
1285 `yy_buffer_state' structure, so you may safely initialize
1286 YY_BUFFER_STATE variables to `((YY_BUFFER_STATE) 0)' if you wish, and
1287 also refer to the opaque structure in order to correctly declare input
1288 buffers in source files other than that of your scanner.  Note that the
1289 `FILE' pointer in the call to `yy_create_buffer' is only used as the
1290 value of `yyin' seen by `YY_INPUT'; if you redefine `YY_INPUT' so it no
1291 longer uses `yyin', then you can safely pass a nil `FILE' pointer to
1292 `yy_create_buffer'.  You select a particular buffer to scan from using:
1293
1294      void yy_switch_to_buffer( YY_BUFFER_STATE new_buffer )
1295
   This switches the scanner's input buffer so subsequent tokens will come
1297 from NEW_BUFFER.  Note that `yy_switch_to_buffer()' may be used by
1298 `yywrap()' to set things up for continued scanning, instead of opening
1299 a new file and pointing `yyin' at it.  Note also that switching input
1300 sources via either `yy_switch_to_buffer()' or `yywrap()' does *not*
1301 change the start condition.
1302
1303      void yy_delete_buffer( YY_BUFFER_STATE buffer )
1304
1305 is used to reclaim the storage associated with a buffer.  You can also
1306 clear the current contents of a buffer using:
1307
1308      void yy_flush_buffer( YY_BUFFER_STATE buffer )
1309
1310    This function discards the buffer's contents, so the next time the
1311 scanner attempts to match a token from the buffer, it will first fill
1312 the buffer anew using `YY_INPUT'.
1313
1314    `yy_new_buffer()' is an alias for `yy_create_buffer()', provided for
1315 compatibility with the C++ use of `new' and `delete' for creating and
1316 destroying dynamic objects.
1317
1318    Finally, the `YY_CURRENT_BUFFER' macro returns a `YY_BUFFER_STATE'
1319 handle to the current buffer.
1320
1321    Here is an example of using these features for writing a scanner
1322 which expands include files (the `<<EOF>>' feature is discussed below):
1323
1324      /* the "incl" state is used for picking up the name
1325       * of an include file
1326       */
1327      %x incl
1328
1329      %{
1330      #define MAX_INCLUDE_DEPTH 10
1331      YY_BUFFER_STATE include_stack[MAX_INCLUDE_DEPTH];
1332      int include_stack_ptr = 0;
1333      %}
1334
1335      %%
1336      include             BEGIN(incl);
1337
1338      [a-z]+              ECHO;
1339      [^a-z\n]*\n?        ECHO;
1340
1341      <incl>[ \t]*      /* eat the whitespace */
1342      <incl>[^ \t\n]+   { /* got the include file name */
1343              if ( include_stack_ptr >= MAX_INCLUDE_DEPTH )
1344                  {
1345                  fprintf( stderr, "Includes nested too deeply" );
1346                  exit( 1 );
1347                  }
1348
1349              include_stack[include_stack_ptr++] =
1350                  YY_CURRENT_BUFFER;
1351
1352              yyin = fopen( yytext, "r" );
1353
1354              if ( ! yyin )
1355                  error( ... );
1356
1357              yy_switch_to_buffer(
1358                  yy_create_buffer( yyin, YY_BUF_SIZE ) );
1359
1360              BEGIN(INITIAL);
1361              }
1362
1363      <<EOF>> {
1364              if ( --include_stack_ptr < 0 )
1365                  {
1366                  yyterminate();
1367                  }
1368
1369              else
1370                  {
1371                  yy_delete_buffer( YY_CURRENT_BUFFER );
1372                  yy_switch_to_buffer(
1373                       include_stack[include_stack_ptr] );
1374                  }
1375              }
1376
1377    Three routines are available for setting up input buffers for
1378 scanning in-memory strings instead of files.  All of them create a new
1379 input buffer for scanning the string, and return a corresponding
1380 `YY_BUFFER_STATE' handle (which you should delete with
1381 `yy_delete_buffer()' when done with it).  They also switch to the new
1382 buffer using `yy_switch_to_buffer()', so the next call to `yylex()' will
1383 start scanning the string.
1384
1385 `yy_scan_string(const char *str)'
1386      scans a NUL-terminated string.
1387
1388 `yy_scan_bytes(const char *bytes, int len)'
1389      scans `len' bytes (including possibly NUL's) starting at location
1390      BYTES.
1391
1392    Note that both of these functions create and scan a *copy* of the
1393 string or bytes.  (This may be desirable, since `yylex()' modifies the
1394 contents of the buffer it is scanning.) You can avoid the copy by using:
1395
1396 `yy_scan_buffer(char *base, yy_size_t size)'
1397      which scans in place the buffer starting at BASE, consisting of
1398      SIZE bytes, the last two bytes of which *must* be
     `YY_END_OF_BUFFER_CHAR' (ASCII NUL).  These last two bytes are not
     scanned; thus, scanning consists of `base[0]' through
     `base[size-3]', inclusive.
1402
1403      If you fail to set up BASE in this manner (i.e., forget the final
1404      two `YY_END_OF_BUFFER_CHAR' bytes), then `yy_scan_buffer()'
1405      returns a nil pointer instead of creating a new input buffer.
1406
1407      The type `yy_size_t' is an integral type to which you can cast an
1408      integer expression reflecting the size of the buffer.
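
   Preparing a buffer in the layout `yy_scan_buffer()' expects might
look like the following sketch.  `make_scan_buffer()' is a hypothetical
helper, not part of `flex'; it copies the text, which of course gives
up the saving that `yy_scan_buffer()' is meant to provide, but it
makes the required layout explicit.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of flex: returns a malloc'ed copy
 * of text followed by the two NUL bytes that yy_scan_buffer()
 * requires, and stores the total size (text length + 2) in *size.
 * Returns NULL if memory is exhausted. */
static char *make_scan_buffer(const char *text, size_t *size)
{
    size_t len = strlen(text);
    char *base = malloc(len + 2);

    if (base == NULL)
        return NULL;

    memcpy(base, text, len);
    base[len] = '\0';         /* first YY_END_OF_BUFFER_CHAR */
    base[len + 1] = '\0';     /* second YY_END_OF_BUFFER_CHAR */
    *size = len + 2;
    return base;
}

/* Inside the scanner source one would then call, for example:
 *
 *     yy_scan_buffer( base, (yy_size_t) size );
 */
```

   Because the buffer is scanned in place, it must remain allocated
until the corresponding `YY_BUFFER_STATE' has been deleted.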
1409
1410 \x1f
1411 File: flex.info,  Node: End-of-file rules,  Next: Miscellaneous,  Prev: Multiple buffers,  Up: Top
1412
1413 End-of-file rules
1414 =================
1415
1416    The special rule "<<EOF>>" indicates actions which are to be taken
1417 when an end-of-file is encountered and yywrap() returns non-zero (i.e.,
1418 indicates no further files to process).  The action must finish by
1419 doing one of four things:
1420
1421    - assigning `yyin' to a new input file (in previous versions of
1422      flex, after doing the assignment you had to call the special
1423      action `YY_NEW_FILE'; this is no longer necessary);
1424
1425    - executing a `return' statement;
1426
1427    - executing the special `yyterminate()' action;
1428
1429    - or, switching to a new buffer using `yy_switch_to_buffer()' as
1430      shown in the example above.
1431
1432    <<EOF>> rules may not be used with other patterns; they may only be
1433 qualified with a list of start conditions.  If an unqualified <<EOF>>
1434 rule is given, it applies to *all* start conditions which do not
1435 already have <<EOF>> actions.  To specify an <<EOF>> rule for only the
1436 initial start condition, use
1437
1438      <INITIAL><<EOF>>
1439
1440    These rules are useful for catching things like unclosed comments.
1441 An example:
1442
1443      %x quote
1444      %%
1445
1446      ...other rules for dealing with quotes...
1447
1448      <quote><<EOF>>   {
1449               error( "unterminated quote" );
1450               yyterminate();
1451               }
1452      <<EOF>>  {
1453               if ( *++filelist )
1454                   yyin = fopen( *filelist, "r" );
1455               else
1456                  yyterminate();
1457               }
1458
1459 \x1f
1460 File: flex.info,  Node: Miscellaneous,  Next: User variables,  Prev: End-of-file rules,  Up: Top
1461
1462 Miscellaneous macros
1463 ====================
1464
1465    The macro `YY_USER_ACTION' can be defined to provide an action which
1466 is always executed prior to the matched rule's action.  For example, it
1467 could be #define'd to call a routine to convert yytext to lower-case.
1468 When `YY_USER_ACTION' is invoked, the variable `yy_act' gives the
1469 number of the matched rule (rules are numbered starting with 1).
1470 Suppose you want to profile how often each of your rules is matched.
1471 The following would do the trick:
1472
1473      #define YY_USER_ACTION ++ctr[yy_act]
1474
1475    where `ctr' is an array to hold the counts for the different rules.
1476 Note that the macro `YY_NUM_RULES' gives the total number of rules
(including the default rule, even if you use `-s'), so a correct
declaration for `ctr' is:
1479
1480      int ctr[YY_NUM_RULES];
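
   Once scanning is finished, the counts can be printed with a routine
along the following lines.  `report_rule_counts()' is a hypothetical
name, and the array length is passed in explicitly so that the sketch
makes no assumption of its own about the value of `YY_NUM_RULES'.

```c
#include <stdio.h>

/* Hypothetical reporting routine, not part of flex.  ctr is the
 * array incremented by YY_USER_ACTION and n is the number of
 * elements in it.  Prints one line per rule that matched at least
 * once and returns how many such rules there were. */
static int report_rule_counts(const int *ctr, int n, FILE *out)
{
    int rule, used = 0;

    /* Rules are numbered starting with 1, so element 0 is unused. */
    for (rule = 1; rule < n; ++rule)
        if (ctr[rule] > 0)
            {
            fprintf(out, "rule %d: %d matches\n", rule, ctr[rule]);
            ++used;
            }
    return used;
}
```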
1481
1482    The macro `YY_USER_INIT' may be defined to provide an action which
1483 is always executed before the first scan (and before the scanner's
1484 internal initializations are done).  For example, it could be used to
1485 call a routine to read in a data table or open a logging file.
1486
1487    The macro `yy_set_interactive(is_interactive)' can be used to
1488 control whether the current buffer is considered *interactive*.  An
1489 interactive buffer is processed more slowly, but must be used when the
1490 scanner's input source is indeed interactive to avoid problems due to
1491 waiting to fill buffers (see the discussion of the `-I' flag below).  A
1492 non-zero value in the macro invocation marks the buffer as interactive,
1493 a zero value as non-interactive.  Note that use of this macro overrides
1494 `%option always-interactive' or `%option never-interactive' (see
1495 Options below).  `yy_set_interactive()' must be invoked prior to
1496 beginning to scan the buffer that is (or is not) to be considered
1497 interactive.
1498
1499    The macro `yy_set_bol(at_bol)' can be used to control whether the
1500 current buffer's scanning context for the next token match is done as
though at the beginning of a line.  A non-zero macro argument makes
rules anchored with '^' active, while a zero argument makes '^' rules
inactive.
1503
1504    The macro `YY_AT_BOL()' returns true if the next token scanned from
1505 the current buffer will have '^' rules active, false otherwise.
1506
1507    In the generated scanner, the actions are all gathered in one large
1508 switch statement and separated using `YY_BREAK', which may be
1509 redefined.  By default, it is simply a "break", to separate each rule's
1510 action from the following rule's.  Redefining `YY_BREAK' allows, for
1511 example, C++ users to #define YY_BREAK to do nothing (while being very
1512 careful that every rule ends with a "break" or a "return"!) to avoid
unreachable-statement warnings, which arise when a rule's action ends
with "return" and so makes the following `YY_BREAK' unreachable.
1515
1516 \x1f
1517 File: flex.info,  Node: User variables,  Next: YACC interface,  Prev: Miscellaneous,  Up: Top
1518
1519 Values available to the user
1520 ============================
1521
1522    This section summarizes the various values available to the user in
1523 the rule actions.
1524
1525    - `char *yytext' holds the text of the current token.  It may be
1526      modified but not lengthened (you cannot append characters to the
1527      end).
1528
1529      If the special directive `%array' appears in the first section of
1530      the scanner description, then `yytext' is instead declared `char
1531      yytext[YYLMAX]', where `YYLMAX' is a macro definition that you can
1532      redefine in the first section if you don't like the default value
1533      (generally 8KB).  Using `%array' results in somewhat slower
1534      scanners, but the value of `yytext' becomes immune to calls to
1535      `input()' and `unput()', which potentially destroy its value when
1536      `yytext' is a character pointer.  The opposite of `%array' is
1537      `%pointer', which is the default.
1538
1539      You cannot use `%array' when generating C++ scanner classes (the
1540      `-+' flag).
1541
1542    - `int yyleng' holds the length of the current token.
1543
1544    - `FILE *yyin' is the file which by default `flex' reads from.  It
1545      may be redefined but doing so only makes sense before scanning
1546      begins or after an EOF has been encountered.  Changing it in the
1547      midst of scanning will have unexpected results since `flex'
1548      buffers its input; use `yyrestart()' instead.  Once scanning
     terminates because an end-of-file has been seen, you can point
     `yyin' at the new input file and then call the scanner again to
1551      continue scanning.
1552
1553    - `void yyrestart( FILE *new_file )' may be called to point `yyin'
1554      at the new input file.  The switch-over to the new file is
1555      immediate (any previously buffered-up input is lost).  Note that
1556      calling `yyrestart()' with `yyin' as an argument thus throws away
1557      the current input buffer and continues scanning the same input
1558      file.
1559
1560    - `FILE *yyout' is the file to which `ECHO' actions are done.  It
1561      can be reassigned by the user.
1562
1563    - `YY_CURRENT_BUFFER' returns a `YY_BUFFER_STATE' handle to the
1564      current buffer.
1565
1566    - `YY_START' returns an integer value corresponding to the current
1567      start condition.  You can subsequently use this value with `BEGIN'
1568      to return to that start condition.
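
   The rule that `yytext' may be modified but not lengthened in the
default `%pointer' mode can be pictured with a toy model in C.  This is
only a sketch under stated assumptions: the structure and the `toy_'
functions are hypothetical and are not part of `flex', and the model
simply assumes that the token is a window into the input buffer,
terminated by a NUL planted over the first unread character.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a %pointer scanner (hypothetical names throughout):
 * the token is a window into the input buffer, NUL-terminated by
 * temporarily overwriting the first character that follows it. */
struct toy_scanner {
    char  *buf;    /* whole input buffer */
    char  *text;   /* plays the role of yytext */
    size_t leng;   /* plays the role of yyleng */
    char   hold;   /* input character hidden under the NUL */
};

/* Expose buf[start .. start+len) as the current token. */
static void toy_set_token(struct toy_scanner *s, size_t start, size_t len)
{
    assert(start + len <= strlen(s->buf));
    s->text = s->buf + start;
    s->leng = len;
    s->hold = s->text[len];
    s->text[len] = '\0';
}

/* Put the hidden character back before scanning resumes. */
static void toy_release_token(struct toy_scanner *s)
{
    s->text[s->leng] = s->hold;
}

/* Safe: rewrite the token in place without changing its length. */
static void toy_upcase_token(struct toy_scanner *s)
{
    for (size_t i = 0; i < s->leng; i++)
        s->text[i] = (char)toupper((unsigned char)s->text[i]);
}

/* Unsafe: "append" one character.  The new terminator lands on an
 * unread input character and destroys it. */
static void toy_append_char(struct toy_scanner *s, char c)
{
    s->text[s->leng] = c;
    s->text[s->leng + 1] = '\0';
}
```

   In this model, upcasing the token "foo" in the buffer "foo bar"
leaves " bar" intact, whereas appending a character overwrites the 'b'
that the scanner has not yet read.  With `%array' the token lives in
its own array, which is why such damage cannot occur there.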
1569
1570 \x1f
1571 File: flex.info,  Node: YACC interface,  Next: Options,  Prev: User variables,  Up: Top
1572
1573 Interfacing with `yacc'
1574 =======================
1575
1576    One of the main uses of `flex' is as a companion to the `yacc'
1577 parser-generator.  `yacc' parsers expect to call a routine named
1578 `yylex()' to find the next input token.  The routine is supposed to
1579 return the type of the next token as well as putting any associated
1580 value in the global `yylval'.  To use `flex' with `yacc', one specifies
1581 the `-d' option to `yacc' to instruct it to generate the file `y.tab.h'
1582 containing definitions of all the `%tokens' appearing in the `yacc'
1583 input.  This file is then included in the `flex' scanner.  For example,
1584 if one of the tokens is "TOK_NUMBER", part of the scanner might look
1585 like:
1586
1587      %{
1588      #include "y.tab.h"
1589      %}
1590
1591      %%
1592
1593      [0-9]+        yylval = atoi( yytext ); return TOK_NUMBER;
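
   The contract that `yacc' expects of `yylex()' can also be seen in a
hand-written stand-in.  The following C sketch is not what `flex'
generates; it only mimics the rule above.  The numeric value of
`TOK_NUMBER', the `toy_input' variable, and the `toy_set_input()'
helper are assumptions made for illustration (the real token value
comes from `y.tab.h'), and `yylval' is assumed to have the default
type `int'.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Normally supplied by y.tab.h; 258 is a made-up stand-in value. */
#define TOK_NUMBER 258

int yylval;                         /* value of the most recent token */
static const char *toy_input = "";  /* hypothetical input source */

/* Hypothetical helper: choose the string to scan. */
void toy_set_input(const char *s)
{
    assert(s != NULL);
    toy_input = s;
}

/* Return the type of the next token and leave its value in yylval.
 * A return value of 0 tells the parser that the input is exhausted. */
int yylex(void)
{
    /* Skip everything that cannot start a number. */
    while (*toy_input != '\0' && !isdigit((unsigned char)*toy_input))
        toy_input++;
    if (*toy_input == '\0')
        return 0;

    char *end;
    yylval = (int)strtol(toy_input, &end, 10);
    toy_input = end;
    return TOK_NUMBER;
}
```

   Scanning the string "12 abc 7" with this stand-in yields
`TOK_NUMBER' twice, with `yylval' set to 12 and then 7, followed by 0.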
1594
1595 \x1f
1596 File: flex.info,  Node: Options,  Next: Performance,  Prev: YACC interface,  Up: Top
1597
1598 Options
1599 =======
1600
1601    `flex' has the following options:
1602
1603 `-b'
1604      Generate backing-up information to `lex.backup'.  This is a list
1605      of scanner states which require backing up and the input
1606      characters on which they do so.  By adding rules one can remove
1607      backing-up states.  If *all* backing-up states are eliminated and
1608      `-Cf' or `-CF' is used, the generated scanner will run faster (see
1609      the `-p' flag).  Only users who wish to squeeze every last cycle
1610      out of their scanners need worry about this option.  (See the
1611      section on Performance Considerations below.)
1612
1613 `-c'
1614      is a do-nothing, deprecated option included for POSIX compliance.
1615
1616 `-d'
1617      makes the generated scanner run in "debug" mode.  Whenever a
1618      pattern is recognized and the global `yy_flex_debug' is non-zero
1619      (which is the default), the scanner will write to `stderr' a line
1620      of the form:
1621
1622           --accepting rule at line 53 ("the matched text")
1623
1624      The line number refers to the location of the rule in the file
1625      defining the scanner (i.e., the file that was fed to flex).
1626      Messages are also generated when the scanner backs up, accepts the
1627      default rule, reaches the end of its input buffer (or encounters a
1628      NUL; at this point, the two look the same as far as the scanner's
1629      concerned), or reaches an end-of-file.
1630
1631 `-f'
1632      specifies "fast scanner".  No table compression is done and stdio
1633      is bypassed.  The result is large but fast.  This option is
1634      equivalent to `-Cfr' (see below).
1635
1636 `-h'
1637      generates a "help" summary of `flex's' options to `stdout' and
1638      then exits.  `-?' and `--help' are synonyms for `-h'.
1639
1640 `-i'
1641      instructs `flex' to generate a *case-insensitive* scanner.  The
1642      case of letters given in the `flex' input patterns will be
1643      ignored, and tokens in the input will be matched regardless of
1644      case.  The matched text given in `yytext' will have its case
1645      preserved (i.e., it will not be folded).
1646
1647 `-l'
1648      turns on maximum compatibility with the original AT&T `lex'
1649      implementation.  Note that this does not mean *full*
1650      compatibility.  Use of this option costs a considerable amount of
1651      performance, and it cannot be used with the `-+, -f, -F, -Cf', or
1652      `-CF' options.  For details on the compatibilities it provides, see
1653      the section "Incompatibilities With Lex And POSIX" below.  This
1654      option also results in the name `YY_FLEX_LEX_COMPAT' being
1655      #define'd in the generated scanner.
1656
1657 `-n'
1658      is another do-nothing, deprecated option included only for POSIX
1659      compliance.
1660
1661 `-p'
1662      generates a performance report to stderr.  The report consists of
1663      comments regarding features of the `flex' input file which will
1664      cause a serious loss of performance in the resulting scanner.  If
1665      you give the flag twice, you will also get comments regarding
1666      features that lead to minor performance losses.
1667
1668      Note that the use of `REJECT', `%option yylineno' and variable
1669      trailing context (see the Deficiencies / Bugs section below)
1670      entails a substantial performance penalty; use of `yymore()', the
1671      `^' operator, and the `-I' flag entail minor performance penalties.
1672
1673 `-s'
1674      causes the "default rule" (that unmatched scanner input is echoed
1675      to `stdout') to be suppressed.  If the scanner encounters input
1676      that does not match any of its rules, it aborts with an error.
1677      This option is useful for finding holes in a scanner's rule set.
1678
1679 `-t'
1680      instructs `flex' to write the scanner it generates to standard
1681      output instead of `lex.yy.c'.
1682
1683 `-v'
1684      specifies that `flex' should write to `stderr' a summary of
1685      statistics regarding the scanner it generates.  Most of the
1686      statistics are meaningless to the casual `flex' user, but the
1687      first line identifies the version of `flex' (same as reported by
1688      `-V'), and the next line the flags used when generating the
1689      scanner, including those that are on by default.
1690
1691 `-w'
1692      suppresses warning messages.
1693
1694 `-B'
1695      instructs `flex' to generate a *batch* scanner, the opposite of
1696      *interactive* scanners generated by `-I' (see below).  In general,
1697      you use `-B' when you are *certain* that your scanner will never
1698      be used interactively, and you want to squeeze a *little* more
1699      performance out of it.  If your goal is instead to squeeze out a
1700      *lot* more performance, you should be using the `-Cf' or `-CF'
1701      options (discussed below), which turn on `-B' automatically anyway.
1702
1703 `-F'
1704      specifies that the "fast" scanner table representation should be
1705      used (and stdio bypassed).  This representation is about as fast
1706      as the full table representation (`-f'), and for some sets of
1707      patterns will be considerably smaller (and for others, larger).
1708      In general, if the pattern set contains both "keywords" and a
1709      catch-all, "identifier" rule, such as in the set:
1710
1711           "case"    return TOK_CASE;
1712           "switch"  return TOK_SWITCH;
1713           ...
1714           "default" return TOK_DEFAULT;
1715           [a-z]+    return TOK_ID;
1716
1717      then you're better off using the full table representation.  If
1718      only the "identifier" rule is present and you then use a hash
1719      table or some such to detect the keywords, you're better off using
1720      `-F'.
1721
1722      This option is equivalent to `-CFr' (see below).  It cannot be
1723      used with `-+'.
1724
1725 `-I'
1726      instructs `flex' to generate an *interactive* scanner.  An
1727      interactive scanner is one that only looks ahead to decide what
1728      token has been matched if it absolutely must.  It turns out that
1729      always looking one extra character ahead, even if the scanner has
1730      already seen enough text to disambiguate the current token, is a
1731      bit faster than only looking ahead when necessary.  But scanners
1732      that always look ahead give dreadful interactive performance; for
1733      example, when a user types a newline, it is not recognized as a
1734      newline token until they enter *another* token, which often means
1735      typing in another whole line.
1736
1737      `Flex' scanners default to *interactive* unless you use the `-Cf'
1738      or `-CF' table-compression options (see below).  That's because if
1739      you're looking for high-performance you should be using one of
1740      these options, so if you didn't, `flex' assumes you'd rather trade
1741      off a bit of run-time performance for intuitive interactive
1742      behavior.  Note also that you *cannot* use `-I' in conjunction
1743      with `-Cf' or `-CF'.  Thus, this option is not really needed; it
1744      is on by default for all those cases in which it is allowed.
1745
1746      You can force a scanner to *not* be interactive by using `-B' (see
1747      above).
1748
1749 `-L'
1750      instructs `flex' not to generate `#line' directives.  Without this
1751      option, `flex' peppers the generated scanner with #line directives
1752      so error messages in the actions will be correctly located with
1753      respect to either the original `flex' input file (if the errors
1754      are due to code in the input file), or `lex.yy.c' (if the errors
1755      are `flex's' fault - you should report these sorts of errors to
1756      the email address given below).
1757
1758 `-T'
1759      makes `flex' run in `trace' mode.  It will generate a lot of
1760      messages to `stderr' concerning the form of the input and the
1761      resultant non-deterministic and deterministic finite automata.
1762      This option is mostly for use in maintaining `flex'.
1763
1764 `-V'
1765      prints the version number to `stdout' and exits.  `--version' is a
1766      synonym for `-V'.
1767
1768 `-7'
1769      instructs `flex' to generate a 7-bit scanner, i.e., one which can
1770      only recognize 7-bit characters in its input.  The advantage of
1771      using `-7' is that the scanner's tables can be up to half the size
1772      of those generated using the `-8' option (see below).  The
1773      disadvantage is that such scanners often hang or crash if their
1774      input contains an 8-bit character.
1775
1776      Note, however, that unless you generate your scanner using the
1777      `-Cf' or `-CF' table compression options, use of `-7' will save
1778      only a small amount of table space, and make your scanner
1779      considerably less portable.  `Flex's' default behavior is to
1780      generate an 8-bit scanner unless you use `-Cf' or `-CF', in
1781      which case `flex' defaults to generating 7-bit scanners unless
1782      your site was always configured to generate 8-bit scanners (as
1783      will often be the case with non-USA sites).  You can tell whether
1784      flex generated a 7-bit or an 8-bit scanner by inspecting the flag
1785      summary in the `-v' output as described above.
1786
1787      Note that if you use `-Cfe' or `-CFe' (those table compression
1788      options, but also using equivalence classes as discussed
1789      below), flex still defaults to generating an 8-bit scanner, since
1790      usually with these compression options full 8-bit tables are not
1791      much more expensive than 7-bit tables.
1792
1793 `-8'
1794      instructs `flex' to generate an 8-bit scanner, i.e., one which can
1795      recognize 8-bit characters.  This flag is only needed for scanners
1796      generated using `-Cf' or `-CF', as otherwise flex defaults to
1797      generating an 8-bit scanner anyway.
1798
1799      See the discussion of `-7' above for flex's default behavior and
1800      the tradeoffs between 7-bit and 8-bit scanners.
1801
1802 `-+'
1803      specifies that you want flex to generate a C++ scanner class.  See
1804      the section on Generating C++ Scanners below for details.
1805
1806 `-C[aefFmr]'
1807      controls the degree of table compression and, more generally,
1808      trade-offs between small scanners and fast scanners.
1809
1810      `-Ca' ("align") instructs flex to trade off larger tables in the
1811      generated scanner for faster performance because the elements of
1812      the tables are better aligned for memory access and computation.
1813      On some RISC architectures, fetching and manipulating long-words
1814      is more efficient than with smaller-sized units such as
1815      shortwords.  This option can double the size of the tables used by
1816      your scanner.
1817
1818      `-Ce' directs `flex' to construct "equivalence classes", i.e.,
1819      sets of characters which have identical lexical properties (for
1820      example, if the only appearance of digits in the `flex' input is
1821      in the character class "[0-9]" then the digits '0', '1', ..., '9'
1822      will all be put in the same equivalence class).  Equivalence
1823      classes usually give dramatic reductions in the final table/object
1824      file sizes (typically a factor of 2-5) and are pretty cheap
1825      performance-wise (one array look-up per character scanned).
1826
1827      `-Cf' specifies that the *full* scanner tables should be generated
1828      - `flex' should not compress the tables by taking advantage of
1829      similar transition functions for different states.
1830
1831      `-CF' specifies that the alternate fast scanner representation
1832      (described above under the `-F' flag) should be used.  This option
1833      cannot be used with `-+'.
1834
1835      `-Cm' directs `flex' to construct "meta-equivalence classes",
1836      which are sets of equivalence classes (or characters, if
1837      equivalence classes are not being used) that are commonly used
1838      together.  Meta-equivalence classes are often a big win when using
1839      compressed tables, but they have a moderate performance impact
1840      (one or two "if" tests and one array look-up per character
1841      scanned).
1842
1843      `-Cr' causes the generated scanner to *bypass* use of the standard
1844      I/O library (stdio) for input.  Instead of calling `fread()' or
1845      `getc()', the scanner will use the `read()' system call, resulting
1846      in a performance gain which varies from system to system, but in
1847      general is probably negligible unless you are also using `-Cf' or
1848      `-CF'.  Using `-Cr' can cause strange behavior if, for example,
1849      you read from `yyin' using stdio prior to calling the scanner
1850      (because the scanner will miss whatever text your previous reads
1851      left in the stdio input buffer).
1852
1853      `-Cr' has no effect if you define `YY_INPUT' (see The Generated
1854      Scanner above).
1855
1856      A lone `-C' specifies that the scanner tables should be compressed
1857      but neither equivalence classes nor meta-equivalence classes
1858      should be used.
1859
1860      The options `-Cf' or `-CF' and `-Cm' do not make sense together -
1861      there is no opportunity for meta-equivalence classes if the table
1862      is not being compressed.  Otherwise the options may be freely
1863      mixed, and are cumulative.
1864
1865      The default setting is `-Cem', which specifies that `flex' should
1866      generate equivalence classes and meta-equivalence classes.  This
1867      setting provides the highest degree of table compression.  You can
1868      trade off faster-executing scanners at the cost of larger tables
1869      with the following generally being true:
1870
1871           slowest & smallest
1872                 -Cem
1873                 -Cm
1874                 -Ce
1875                 -C
1876                 -C{f,F}e
1877                 -C{f,F}
1878                 -C{f,F}a
1879           fastest & largest
1880
1881      Note that scanners with the smallest tables are usually generated
1882      and compiled the quickest, so during development you will usually
1883      want to use the default, maximal compression.
1884
1885      `-Cfe' is often a good compromise between speed and size for
1886      production scanners.
1887
1888 `-ooutput'
1889      directs flex to write the scanner to the file `output' instead
1890      of `lex.yy.c'.  If you combine `-o' with the `-t' option, then the
1891      scanner is written to `stdout' but its `#line' directives (see the
1892      `-L' option above) refer to the file `output'.
1893
1894 `-Pprefix'
1895      changes the default `yy' prefix used by `flex' for all
1896      globally-visible variable and function names to instead be PREFIX.
1897      For example, `-Pfoo' changes the name of `yytext' to `footext'.
1898      It also changes the name of the default output file from
1899      `lex.yy.c' to `lex.foo.c'.  Here are all of the names affected:
1900
1901           yy_create_buffer
1902           yy_delete_buffer
1903           yy_flex_debug
1904           yy_init_buffer
1905           yy_flush_buffer
1906           yy_load_buffer_state
1907           yy_switch_to_buffer
1908           yyin
1909           yyleng
1910           yylex
1911           yylineno
1912           yyout
1913           yyrestart
1914           yytext
1915           yywrap
1916
1917      (If you are using a C++ scanner, then only `yywrap' and
1918      `yyFlexLexer' are affected.) Within your scanner itself, you can
1919      still refer to the global variables and functions using either
1920      version of their name; but externally, they have the modified name.
1921
1922      This option lets you easily link together multiple `flex' programs
1923      into the same executable.  Note, though, that using this option
1924      also renames `yywrap()', so you now *must* either provide your own
1925      (appropriately-named) version of the routine for your scanner, or
1926      use `%option noyywrap', as linking with `-lfl' no longer provides
1927      one for you by default.
1928
1929 `-Sskeleton_file'
1930      overrides the default skeleton file from which `flex' constructs
1931      its scanners.  You'll never need this option unless you are doing
1932      `flex' maintenance or development.
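
   The idea behind the equivalence classes built by `-Ce' (described
above) can be sketched in C.  This is not `flex's' own algorithm, only
an illustration of the principle: two characters fall into the same
class when they belong to exactly the same character classes of the
patterns.  The function `build_ecs()', the restriction to a 7-bit
alphabet, and the limit of 32 sets are assumptions of the sketch.

```c
#include <assert.h>
#include <string.h>

#define NCHARS 128   /* 7-bit alphabet keeps the sketch small */

/* Each character class of the patterns is given as a string of its
 * members.  On return ec[c] holds the equivalence-class number
 * (counting from 1) of character c; the result is the number of
 * classes.  Handles at most 32 sets. */
static int build_ecs(const char *const sets[], int nsets, int ec[NCHARS])
{
    unsigned long sig[NCHARS];   /* bit s set: sets[s] contains c */
    unsigned long seen[NCHARS];  /* distinct signatures found so far */
    int nclasses = 0;

    assert(nsets >= 0 && nsets <= 32);
    for (int c = 0; c < NCHARS; c++) {
        sig[c] = 0;
        for (int s = 0; s < nsets; s++)
            /* strchr() would match the terminator for c == 0 */
            if (c != 0 && strchr(sets[s], c) != NULL)
                sig[c] |= 1UL << s;
    }
    for (int c = 0; c < NCHARS; c++) {
        int k = 0;
        while (k < nclasses && seen[k] != sig[c])
            k++;
        if (k == nclasses)
            seen[nclasses++] = sig[c];  /* first char of a new class */
        ec[c] = k + 1;
    }
    return nclasses;
}
```

   Given the two sets "0123456789" and "abcdefghijklmnopqrstuvwxyz",
the sketch produces three classes: one for the digits, one for the
lower-case letters, and one for everything else.  A scanner table then
needs one column per class instead of one per character, at the price
of the one array look-up per character scanned mentioned under `-Ce'.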
1933
1934    `flex' also provides a mechanism for controlling options within the
1935 scanner specification itself, rather than from the flex command-line.
1936 This is done by including `%option' directives in the first section of
1937 the scanner specification.  You can specify multiple options with a
1938 single `%option' directive, and multiple directives in the first
1939 section of your flex input file.  Most options are given simply as
1940 names, optionally preceded by the word "no" (with no intervening
1941 whitespace) to negate their meaning.  A number are equivalent to flex
1942 flags or their negation:
1943
1944      7bit            -7 option
1945      8bit            -8 option
1946      align           -Ca option
1947      backup          -b option
1948      batch           -B option
1949      c++             -+ option
1950
1951      caseful or
1952      case-sensitive  opposite of -i (default)
1953
1954      case-insensitive or
1955      caseless        -i option
1956
1957      debug           -d option
1958      default         opposite of -s option
1959      ecs             -Ce option
1960      fast            -F option
1961      full            -f option
1962      interactive     -I option
1963      lex-compat      -l option
1964      meta-ecs        -Cm option
1965      perf-report     -p option
1966      read            -Cr option
1967      stdout          -t option
1968      verbose         -v option
1969      warn            opposite of -w option
1970                      (use "%option nowarn" for -w)
1971
1972      array           equivalent to "%array"
1973      pointer         equivalent to "%pointer" (default)
1974
1975    Some `%option's' provide features otherwise not available:
1976
1977 `always-interactive'
1978      instructs flex to generate a scanner which always considers its
1979      input "interactive".  Normally, on each new input file the scanner
1980      calls `isatty()' in an attempt to determine whether the scanner's
1981      input source is interactive and thus should be read a character at
1982      a time.  When this option is used, however, then no such call is
1983      made.
1984
1985 `main'
1986      directs flex to provide a default `main()' program for the
1987      scanner, which simply calls `yylex()'.  This option implies
1988      `noyywrap' (see below).
1989
1990 `never-interactive'
1991      instructs flex to generate a scanner which never considers its
1992      input "interactive" (again, no call made to `isatty()').  This is
1993      the opposite of `always-interactive'.
1994
1995 `stack'
1996      enables the use of start condition stacks (see Start Conditions
1997      above).
1998
1999 `stdinit'
2000      if unset (i.e., `%option nostdinit') initializes `yyin' and
2001      `yyout' to nil `FILE' pointers, instead of `stdin' and `stdout'.
2002
2003 `yylineno'
2004      directs `flex' to generate a scanner that maintains the number of
2005      the current line read from its input in the global variable
2006      `yylineno'.  This option is implied by `%option lex-compat'.
2007
2008 `yywrap'
2009      if unset (i.e., `%option noyywrap'), makes the scanner not call
2010      `yywrap()' upon an end-of-file, but simply assume that there are
2011      no more files to scan (until the user points `yyin' at a new file
2012      and calls `yylex()' again).
2013
2014    `flex' scans your rule actions to determine whether you use the
2015 `REJECT' or `yymore()' features.  The `reject' and `yymore' options are
2016 available to override its decision as to whether you use the options,
2017 either by setting them (e.g., `%option reject') to indicate the feature
2018 is indeed used, or unsetting them to indicate it actually is not used
2019 (e.g., `%option noyymore').
2020
2021    Three options take string-delimited values, offset with '=':
2022
2023      %option outfile="ABC"
2024
2025 is equivalent to `-oABC', and
2026
2027      %option prefix="XYZ"
2028
2029 is equivalent to `-PXYZ'.
2030
2031    Finally,
2032
2033      %option yyclass="foo"
2034
2035 only applies when generating a C++ scanner (`-+' option).  It informs
2036 `flex' that you have derived `foo' as a subclass of `yyFlexLexer' so
2037 `flex' will place your actions in the member function `foo::yylex()'
2038 instead of `yyFlexLexer::yylex()'.  It also generates a
2039 `yyFlexLexer::yylex()' member function that emits a run-time error (by
2040 invoking `yyFlexLexer::LexerError()') if called.  See Generating C++
2041 Scanners, below, for additional information.
2042
2043    A number of options are available for lint purists who want to
2044 suppress the appearance of unneeded routines in the generated scanner.
2045 Each of the following, if unset, results in the corresponding routine
2046 not appearing in the generated scanner:
2047
2048      input, unput
2049      yy_push_state, yy_pop_state, yy_top_state
2050      yy_scan_buffer, yy_scan_bytes, yy_scan_string
2051
2052 (though `yy_push_state()' and friends won't appear anyway unless you
2053 use `%option stack').
2054
2055 \x1f
2056 File: flex.info,  Node: Performance,  Next: C++,  Prev: Options,  Up: Top
2057
2058 Performance considerations
2059 ==========================
2060
2061    The main design goal of `flex' is that it generate high-performance
2062 scanners.  It has been optimized for dealing well with large sets of
2063 rules.  Aside from the effects on scanner speed of the table
2064 compression `-C' options outlined above, there are a number of
2065 options/actions which degrade performance.  These are, from most
2066 expensive to least:
2067
2068      REJECT
2069      %option yylineno
2070      arbitrary trailing context
2071
2072      pattern sets that require backing up
2073      %array
2074      %option interactive
2075      %option always-interactive
2076
2077      '^' beginning-of-line operator
2078      yymore()
2079
2080    with the first three all being quite expensive and the last two
2081 being quite cheap.  Note also that `unput()' is implemented as a
2082 routine call that potentially does quite a bit of work, while
2083 `yyless()' is a quite-cheap macro; so if just putting back some excess
2084 text you scanned, use `yyless()'.
2085
2086    `REJECT' should be avoided at all costs when performance is
2087 important.  It is a particularly expensive option.
2088
2089    Getting rid of backing up is messy and often may be an enormous
2090 amount of work for a complicated scanner.  In principle, one begins by
2091 using the `-b' flag to generate a `lex.backup' file.  For example, on
2092 the input
2093
2094      %%
2095      foo        return TOK_KEYWORD;
2096      foobar     return TOK_KEYWORD;
2097
2098 the file looks like:
2099
2100      State #6 is non-accepting -
2101       associated rule line numbers:
2102             2       3
2103       out-transitions: [ o ]
2104       jam-transitions: EOF [ \001-n  p-\177 ]
2105
2106      State #8 is non-accepting -
2107       associated rule line numbers:
2108             3
2109       out-transitions: [ a ]
2110       jam-transitions: EOF [ \001-`  b-\177 ]
2111
2112      State #9 is non-accepting -
2113       associated rule line numbers:
2114             3
2115       out-transitions: [ r ]
2116       jam-transitions: EOF [ \001-q  s-\177 ]
2117
2118      Compressed tables always back up.
2119
2120    The first few lines tell us that there's a scanner state in which it
2121 can make a transition on an 'o' but not on any other character, and
2122 that in that state the currently scanned text does not match any rule.
2123 The state occurs when trying to match the rules found at lines 2 and 3
2124 in the input file.  If the scanner is in that state and then reads
2125 something other than an 'o', it will have to back up to find a rule
2126 which is matched.  With a bit of head-scratching one can see that this
2127 must be the state it's in when it has seen "fo".  When this has
2128 happened, if anything other than another 'o' is seen, the scanner will
2129 have to back up to simply match the 'f' (by the default rule).
2130
2131    The comment regarding State #8 indicates there's a problem when
2132 "foob" has been scanned.  Indeed, on any character other than an 'a',
2133 the scanner will have to back up to accept "foo".  Similarly, the
2134 comment for State #9 concerns when "fooba" has been scanned and an 'r'
2135 does not follow.
2136
2137    The final comment reminds us that there's no point going to all the
2138 trouble of removing backing up from the rules unless we're using `-Cf'
2139 or `-CF', since there's no performance gain doing so with compressed
2140 scanners.
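
   What "backing up" means for the two-rule example can be made
concrete with a small C model.  It is a hand-written sketch, not code
produced by `flex': the function `toy_scan()' is hypothetical, and it
hard-codes the rules "foo" and "foobar" together with a default rule
that matches any single character.

```c
#include <assert.h>
#include <stddef.h>

/* Scan one token from `in' using the rules "foo", "foobar", and a
 * one-character default rule.  Returns the length of the token and
 * stores in *backed_up the number of characters that were consumed
 * beyond the text finally accepted. */
static size_t toy_scan(const char *in, size_t *backed_up)
{
    static const char longest[] = "foobar";
    size_t pos = 0;          /* characters consumed so far */
    size_t last_accept = 0;  /* length of the longest match seen */

    assert(in != NULL && in[0] != '\0');

    /* Follow out-transitions while the input keeps spelling "foobar". */
    while (pos < 6 && in[pos] == longest[pos]) {
        pos++;
        if (pos == 1 || pos == 3 || pos == 6)
            last_accept = pos;  /* default rule, "foo", "foobar" */
    }
    if (pos == 0) {
        /* Not an 'f' at all: the default rule takes one character. */
        pos = 1;
        last_accept = 1;
    }
    /* After "fo", "foob" or "fooba" the scanner sits in a
     * non-accepting state and must return to the last accept. */
    *backed_up = pos - last_accept;
    return last_accept;
}
```

   On "foobaz" the model consumes "fooba", fails on the 'z', and backs
up two characters to accept "foo"; on "fox" it backs up one character
to accept 'f' by the default rule.  The three positions at which it can
be stranded correspond to the three states listed in `lex.backup'.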
2141
2142    The way to remove the backing up is to add "error" rules:
2143
2144      %%
2145      foo         return TOK_KEYWORD;
2146      foobar      return TOK_KEYWORD;
2147
2148      fooba       |
2149      foob        |
2150      fo          {
2151                  /* false alarm, not really a keyword */
2152                  return TOK_ID;
2153                  }
2154
2155    Eliminating backing up among a list of keywords can also be done
2156 using a "catch-all" rule:
2157
2158      %%
2159      foo         return TOK_KEYWORD;
2160      foobar      return TOK_KEYWORD;
2161
2162      [a-z]+      return TOK_ID;
2163
2164    This is usually the best solution when appropriate.
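
   A related arrangement, suggested under the `-F' option above, keeps
only the catch-all rule and separates keywords from identifiers inside
its action.  The C sketch below uses a sorted table searched with
`bsearch()' in place of the hash table mentioned there; the function
`classify_word()' and the token values are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { TOK_ID = 300, TOK_KEYWORD = 301 };  /* made-up token codes */

/* Keyword table; it must stay sorted for bsearch(). */
static const char *const keywords[] = { "foo", "foobar" };

/* Compare the search key (a string) with one table entry. */
static int cmp_keyword(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Classify text that the identifier rule has already matched. */
static int classify_word(const char *text)
{
    assert(text != NULL);
    const void *hit = bsearch(text, keywords,
                              sizeof keywords / sizeof keywords[0],
                              sizeof keywords[0], cmp_keyword);
    return hit != NULL ? TOK_KEYWORD : TOK_ID;
}
```

   The action of the `[a-z]+' rule would then read something like
`return classify_word( yytext );'.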
2165
2166    Backing up messages tend to cascade.  With a complicated set of
2167 rules it's not uncommon to get hundreds of messages.  If one can
2168 decipher them, though, it often only takes a dozen or so rules to
2169 eliminate the backing up (though it's easy to make a mistake and have
2170 an error rule accidentally match a valid token.  A possible future
2171 `flex' feature will be to automatically add rules to eliminate backing
2172 up).
2173
2174    It's important to keep in mind that you gain the benefits of
2175 eliminating backing up only if you eliminate *every* instance of
2176 backing up.  Leaving just one means you gain nothing.
2177
2178    *Variable* trailing context (where both the leading and trailing parts
2179 do not have a fixed length) entails almost the same performance loss as
2180 `REJECT' (i.e., substantial).  So when possible a rule like:
2181
2182      %%
2183      mouse|rat/(cat|dog)   run();
2184
2185 is better written:
2186
2187      %%
2188      mouse/cat|dog         run();
2189      rat/cat|dog           run();
2190
2191 or as
2192
2193      %%
2194      mouse|rat/cat         run();
2195      mouse|rat/dog         run();
2196
2197    Note that here the special '|' action does *not* provide any
2198 savings, and can even make things worse (see Deficiencies / Bugs below).
2199
2200    Another area where the user can increase a scanner's performance
2201 (and one that's easier to implement) arises from the fact that the
2202 longer the tokens matched, the faster the scanner will run.  This is
2203 because with long tokens the processing of most input characters takes
2204 place in the (short) inner scanning loop, and does not often have to go
2205 through the additional work of setting up the scanning environment
2206 (e.g., `yytext') for the action.  Recall the scanner for C comments:
2207
2208      %x comment
2209      %%
2210              int line_num = 1;
2211
2212      "/*"         BEGIN(comment);
2213
2214      <comment>[^*\n]*
2215      <comment>"*"+[^*/\n]*
2216      <comment>\n             ++line_num;
2217      <comment>"*"+"/"        BEGIN(INITIAL);
2218
2219    This could be sped up by writing it as:
2220
2221      %x comment
2222      %%
2223              int line_num = 1;
2224
2225      "/*"         BEGIN(comment);
2226
2227      <comment>[^*\n]*
2228      <comment>[^*\n]*\n      ++line_num;
2229      <comment>"*"+[^*/\n]*
2230      <comment>"*"+[^*/\n]*\n ++line_num;
2231      <comment>"*"+"/"        BEGIN(INITIAL);
2232
2233    Now instead of each newline requiring the processing of another
2234 action, recognizing the newlines is "distributed" over the other rules
2235 to keep the matched text as long as possible.  Note that *adding* rules
2236 does *not* slow down the scanner!  The speed of the scanner is
2237 independent of the number of rules or (modulo the considerations given
2238 at the beginning of this section) how complicated the rules are with
2239 regard to operators such as '*' and '|'.
2240
2241    A final example in speeding up a scanner: suppose you want to scan
2242 through a file containing identifiers and keywords, one per line and
2243 with no other extraneous characters, and recognize all the keywords.  A
2244 natural first approach is:
2245
2246      %%
2247      asm      |
2248      auto     |
2249      break    |
2250      ... etc ...
2251      volatile |
2252      while    /* it's a keyword */
2253
2254      .|\n     /* it's not a keyword */
2255
2256    To eliminate the backing up, introduce a catch-all rule:
2257
2258      %%
2259      asm      |
2260      auto     |
2261      break    |
2262      ... etc ...
2263      volatile |
2264      while    /* it's a keyword */
2265
2266      [a-z]+   |
2267      .|\n     /* it's not a keyword */
2268
2269    Now, if it's guaranteed that there's exactly one word per line, then
2270 we can reduce the total number of matches by half by merging in the
2271 recognition of newlines with that of the other tokens:
2272
2273      %%
2274      asm\n    |
2275      auto\n   |
2276      break\n  |
2277      ... etc ...
2278      volatile\n |
2279      while\n  /* it's a keyword */
2280
2281      [a-z]+\n |
2282      .|\n     /* it's not a keyword */
2283
2284    One has to be careful here, as we have now reintroduced backing up
2285 into the scanner.  In particular, while *we* know that there will never
2286 be any characters in the input stream other than letters or newlines,
2287 `flex' can't figure this out, and it will plan for possibly needing to
2288 back up when it has scanned a token like "auto" and then the next
2289 character is something other than a newline or a letter.  Previously it
2290 would then just match the "auto" rule and be done, but now it has no
2291 "auto" rule, only an "auto\n" rule.  To eliminate the possibility of
2292 backing up, we could either duplicate all rules but without final
2293 newlines, or, since we never expect to encounter such an input and
2294 therefore don't care how it's classified, we can introduce one more
2295 catch-all rule, one which doesn't include a newline:
2296
2297      %%
2298      asm\n    |
2299      auto\n   |
2300      break\n  |
2301      ... etc ...
2302      volatile\n |
2303      while\n  /* it's a keyword */
2304
2305      [a-z]+\n |
2306      [a-z]+   |
2307      .|\n     /* it's not a keyword */
2308
2309    Compiled with `-Cf', this is about as fast as one can get a `flex'
2310 scanner to go for this particular problem.
2311
2312    A final note: `flex' is slow when matching NUL's, particularly when
2313 a token contains multiple NUL's.  It's best to write rules which match
2314 *short* amounts of text if it's anticipated that the text will often
2315 include NUL's.
2316
2317    Another final note regarding performance: as mentioned above in the
2318 section How the Input is Matched, dynamically resizing `yytext' to
2319 accommodate huge tokens is a slow process because it presently requires
2320 that the (huge) token be rescanned from the beginning.  Thus if
2321 performance is vital, you should attempt to match "large" quantities of
2322 text but not "huge" quantities, where the cutoff between the two is at
2323 about 8K characters/token.
2324
2325 \x1f
2326 File: flex.info,  Node: C++,  Next: Incompatibilities,  Prev: Performance,  Up: Top
2327
2328 Generating C++ scanners
2329 =======================
2330
2331    `flex' provides two different ways to generate scanners for use with
2332 C++.  The first way is to simply compile a scanner generated by `flex'
2333 using a C++ compiler instead of a C compiler.  You should not encounter
2334 any compilation errors (please report any you find to the email address
2335 given in the Author section below).  You can then use C++ code in your
2336 rule actions instead of C code.  Note that the default input source for
2337 your scanner remains `yyin', and default echoing is still done to
2338 `yyout'.  Both of these remain `FILE *' variables and not C++ `streams'.
2339
2340    You can also use `flex' to generate a C++ scanner class, using the
2341 `-+' option (or, equivalently, `%option c++'), which is automatically
2342 specified if the name of the flex executable ends in a `+', such as
2343 `flex++'.  When using this option, flex defaults to generating the
2344 scanner to the file `lex.yy.cc' instead of `lex.yy.c'.  The generated
2345 scanner includes the header file `FlexLexer.h', which defines the
2346 interface to two C++ classes.
2347
2348    The first class, `FlexLexer', provides an abstract base class
2349 defining the general scanner class interface.  It provides the
2350 following member functions:
2351
2352 `const char* YYText()'
2353      returns the text of the most recently matched token, the
2354      equivalent of `yytext'.
2355
2356 `int YYLeng()'
2357      returns the length of the most recently matched token, the
2358      equivalent of `yyleng'.
2359
2360 `int lineno() const'
2361      returns the current input line number (see `%option yylineno'), or
2362      1 if `%option yylineno' was not used.
2363
2364 `void set_debug( int flag )'
2365      sets the debugging flag for the scanner, equivalent to assigning to
2366      `yy_flex_debug' (see the Options section above).  Note that you
2367      must build the scanner using `%option debug' to include debugging
2368      information in it.
2369
2370 `int debug() const'
2371      returns the current setting of the debugging flag.
2372
2373    Also provided are member functions equivalent to
2374 `yy_switch_to_buffer()', `yy_create_buffer()' (though the first argument
2375 is an `istream*' object pointer and not a `FILE*'), `yy_flush_buffer()',
2376 `yy_delete_buffer()', and `yyrestart()' (again, the first argument is an
2377 `istream*' object pointer).
2378
2379    The second class defined in `FlexLexer.h' is `yyFlexLexer', which is
2380 derived from `FlexLexer'.  It defines the following additional member
2381 functions:
2382
2383 `yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
2384      constructs a `yyFlexLexer' object using the given streams for
2385      input and output.  If not specified, the streams default to `cin'
2386      and `cout', respectively.
2387
2388 `virtual int yylex()'
2389      performs the same role as `yylex()' does for ordinary flex
2390      scanners: it scans the input stream, consuming tokens, until a
2391      rule's action returns a value.  If you derive a subclass S from
2392      `yyFlexLexer' and want to access the member functions and
2393      variables of S inside `yylex()', then you need to use `%option
2394      yyclass="S"' to inform `flex' that you will be using that subclass
2395      instead of `yyFlexLexer'.  In this case, rather than generating
2396      `yyFlexLexer::yylex()', `flex' generates `S::yylex()' (and also
2397      generates a dummy `yyFlexLexer::yylex()' that calls
2398      `yyFlexLexer::LexerError()' if called).
2399
2400 `virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
2401      reassigns `yyin' to `new_in' (if non-nil) and `yyout' to `new_out'
2402      (ditto), deleting the previous input buffer if `yyin' is
2403      reassigned.
2404
2405 `int yylex( istream* new_in = 0, ostream* new_out = 0 )'
2406      first switches the input streams via `switch_streams( new_in,
2407      new_out )' and then returns the value of `yylex()'.
2408
2409    In addition, `yyFlexLexer' defines the following protected virtual
2410 functions which you can redefine in derived classes to tailor the
2411 scanner:
2412
2413 `virtual int LexerInput( char* buf, int max_size )'
2414      reads up to `max_size' characters into BUF and returns the number
2415      of characters read.  To indicate end-of-input, return 0
2416      characters.  Note that "interactive" scanners (see the `-B' and
2417      `-I' flags) define the macro `YY_INTERACTIVE'.  If you redefine
2418      `LexerInput()' and need to take different actions depending on
2419      whether or not the scanner might be scanning an interactive input
2420      source, you can test for the presence of this name via `#ifdef'.
2421
2422 `virtual void LexerOutput( const char* buf, int size )'
2423      writes out SIZE characters from the buffer BUF, which, while
2424      NUL-terminated, may also contain "internal" NUL's if the scanner's
2425      rules can match text with NUL's in them.
2426
2427 `virtual void LexerError( const char* msg )'
2428      reports a fatal error message.  The default version of this
2429      function writes the message to the stream `cerr' and exits.
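
   The contracts of `LexerInput()' and `LexerOutput()' can be sketched
without a generated scanner.  In the C++ sketch below, `StringSource',
`lexer_output' and `echo_to_string' are hypothetical names, and the
class is a free-standing stand-in rather than a real subclass of
`yyFlexLexer'.  In an actual scanner the two functions would be
protected virtual overrides with the signatures listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for a class derived from yyFlexLexer that takes
// its input from a string held in memory.
class StringSource {
public:
    explicit StringSource(std::string text) : text_(std::move(text)) {}

    // Same contract as LexerInput(): copy at most max_size characters
    // into buf and return the number copied; 0 signals end-of-input.
    int LexerInput(char* buf, int max_size)
    {
        assert(buf != nullptr);
        if (max_size <= 0)
            return 0;   // no room to deliver anything
        std::size_t n = std::min(static_cast<std::size_t>(max_size),
                                 text_.size() - pos_);
        std::memcpy(buf, text_.data() + pos_, n);
        pos_ += n;
        return static_cast<int>(n);
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// Same contract as LexerOutput(): write exactly `size` characters.
// write() is used instead of operator<< so that NULs inside the matched
// text are not treated as the end of the string.
void lexer_output(std::ostream& out, const char* buf, int size)
{
    out.write(buf, size);
}

// Convenience wrapper for trying lexer_output() on an in-memory stream.
std::string echo_to_string(const char* buf, int size)
{
    std::ostringstream out;
    lexer_output(out, buf, size);
    return out.str();
}
```

   Repeated calls to `LexerInput()' on a `StringSource' hand out the
text in pieces and then return 0, which is how a scanner would learn
that the input is exhausted.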
2430
2431    Note that a `yyFlexLexer' object contains its *entire* scanning
2432 state.  Thus you can use such objects to create reentrant scanners.
2433 You can instantiate multiple instances of the same `yyFlexLexer' class,
2434 and you can also combine multiple C++ scanner classes together in the
2435 same program using the `-P' option discussed above.  Finally, note that
2436 the `%array' feature is not available to C++ scanner classes; you must
2437 use `%pointer' (the default).
2438
2439    Here is an example of a simple C++ scanner:
2440
2441          // An example of using the flex C++ scanner class.
2442
2443      %{
2444      int mylineno = 0;
2445      %}
2446
2447      string  \"[^\n"]+\"
2448
2449      ws      [ \t]+
2450
2451      alpha   [A-Za-z]
2452      dig     [0-9]
2453      name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
2454      num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
2455      num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
2456      number  {num1}|{num2}
2457
2458      %%
2459
2460      {ws}    /* skip blanks and tabs */
2461
2462      "/*"    {
2463              int c;
2464
2465              while((c = yyinput()) != 0)
2466                  {
2467                  if(c == '\n')
2468                      ++mylineno;
2469
2470                  else if(c == '*')
2471                      {
2472                      if((c = yyinput()) == '/')
2473                          break;
2474                      else
2475                          unput(c);
2476                      }
2477                  }
2478              }
2479
2480      {number}  cout << "number " << YYText() << '\n';
2481
2482      \n        mylineno++;
2483
2484      {name}    cout << "name " << YYText() << '\n';
2485
2486      {string}  cout << "string " << YYText() << '\n';
2487
2488      %%
2489
2492      int main( int /* argc */, char** /* argv */ )
2493          {
2494          FlexLexer* lexer = new yyFlexLexer;
2495          while(lexer->yylex() != 0)
2496              ;
2497          return 0;
2498          }
2499
2500    If you want to create multiple (different) lexer classes, you use
2501 the `-P' flag (or the `prefix=' option) to rename each `yyFlexLexer' to
2502 some other `xxFlexLexer'.  You can then include `<FlexLexer.h>' in your
2503 other sources once per lexer class, first renaming `yyFlexLexer' as
2504 follows:
2505
2506      #undef yyFlexLexer
2507      #define yyFlexLexer xxFlexLexer
2508      #include <FlexLexer.h>
2509
2510      #undef yyFlexLexer
2511      #define yyFlexLexer zzFlexLexer
2512      #include <FlexLexer.h>
2513
2514    if, for example, you used `%option prefix="xx"' for one of your
2515 scanners and `%option prefix="zz"' for the other.
2516
2517    IMPORTANT: the present form of the scanning class is *experimental*
2518 and may change considerably between major releases.
2519
2520 \x1f
2521 File: flex.info,  Node: Incompatibilities,  Next: Diagnostics,  Prev: C++,  Up: Top
2522
2523 Incompatibilities with `lex' and POSIX
2524 ======================================
2525
2526    `flex' is a rewrite of the AT&T Unix `lex' tool (the two
2527 implementations do not share any code, though), with some extensions
2528 and incompatibilities, both of which are of concern to those who wish
2529 to write scanners acceptable to either implementation.  Flex is fully
2530 compliant with the POSIX `lex' specification, except that when using
2531 `%pointer' (the default), a call to `unput()' destroys the contents of
2532 `yytext', which is counter to the POSIX specification.
2533
2534    In this section we discuss all of the known areas of incompatibility
2535 between flex, AT&T lex, and the POSIX specification.
2536
2537    `flex's' `-l' option turns on maximum compatibility with the
2538 original AT&T `lex' implementation, at the cost of a major loss in the
2539 generated scanner's performance.  We note below which incompatibilities
2540 can be overcome using the `-l' option.
2541
2542    `flex' is fully compatible with `lex' with the following exceptions:
2543
2544    - The undocumented `lex' scanner internal variable `yylineno' is not
2545      supported unless `-l' or `%option yylineno' is used.  `yylineno'
2546      should be maintained on a per-buffer basis, rather than a
2547      per-scanner (single global variable) basis.  `yylineno' is not
2548      part of the POSIX specification.
2549
2550    - The `input()' routine is not redefinable, though it may be called
2551      to read characters following whatever has been matched by a rule.
2552      If `input()' encounters an end-of-file the normal `yywrap()'
2553      processing is done.  A "real" end-of-file is returned by `input()'
2554      as `EOF'.
2555
2556      Input is instead controlled by defining the `YY_INPUT' macro.
2557
2558      The `flex' restriction that `input()' cannot be redefined is in
2559      accordance with the POSIX specification, which simply does not
2560      specify any way of controlling the scanner's input other than by
2561      making an initial assignment to `yyin'.
2562
2563    - The `unput()' routine is not redefinable.  This restriction is in
2564      accordance with POSIX.
2565
2566    - `flex' scanners are not as reentrant as `lex' scanners.  In
2567      particular, if you have an interactive scanner and an interrupt
2568      handler which long-jumps out of the scanner, and the scanner is
2569      subsequently called again, you may get the following message:
2570
2571           fatal flex scanner internal error--end of buffer missed
2572
2573      To reenter the scanner, first use
2574
2575           yyrestart( yyin );
2576
2577      Note that this call will throw away any buffered input; usually
2578      this isn't a problem with an interactive scanner.
2579
2580      Also note that flex C++ scanner classes *are* reentrant, so if
2581      using C++ is an option for you, you should use them instead.  See
2582      "Generating C++ Scanners" above for details.
2583
2584    - `output()' is not supported.  Output from the `ECHO' macro is done
2585      to the file-pointer `yyout' (default `stdout').
2586
2587      `output()' is not part of the POSIX specification.
2588
2589    - `lex' does not support exclusive start conditions (%x), though
2590      they are in the POSIX specification.
2591
2592    - When definitions are expanded, `flex' encloses them in
2593      parentheses.  With lex, the following:
2594
2595           NAME    [A-Z][A-Z0-9]*
2596           %%
2597           foo{NAME}?      printf( "Found it\n" );
2598           %%
2599
2600      will not match the string "foo" because when the macro is expanded
2601      the rule is equivalent to "foo[A-Z][A-Z0-9]*?" and the precedence
2602      is such that the '?' is associated with "[A-Z0-9]*".  With `flex',
2603      the rule will be expanded to "foo([A-Z][A-Z0-9]*)?" and so the
2604      string "foo" will match.
2605
2606      Note that if the definition begins with `^' or ends with `$' then
2607      it is *not* expanded with parentheses, to allow these operators to
2608      appear in definitions without losing their special meanings.  But
2609      the `<s>', `/', and `<<EOF>>' operators cannot be used in a `flex'
2610      definition.
2611
2612      Using `-l' results in the `lex' behavior of no parentheses around
2613      the definition.
2614
2615      The POSIX specification is that the definition be enclosed in
2616      parentheses.
2617
2618    - Some implementations of `lex' allow a rule's action to begin on a
2619      separate line, if the rule's pattern has trailing whitespace:
2620
2621           %%
2622           foo|bar<space here>
2623             { foobar_action(); }
2624
2625      `flex' does not support this feature.
2626
2627    - The `lex' `%r' (generate a Ratfor scanner) option is not
2628      supported.  It is not part of the POSIX specification.
2629
2630    - After a call to `unput()', `yytext' is undefined until the next
2631      token is matched, unless the scanner was built using `%array'.
2632      This is not the case with `lex' or the POSIX specification.  The
2633      `-l' option does away with this incompatibility.
2634
2635    - The precedence of the `{}' (numeric range) operator is different.
2636      `lex' interprets "abc{1,3}" as "match one, two, or three
2637      occurrences of 'abc'", whereas `flex' interprets it as "match 'ab'
2638      followed by one, two, or three occurrences of 'c'".  The latter is
2639      in agreement with the POSIX specification.
2640
2641    - The precedence of the `^' operator is different.  `lex' interprets
2642      "^foo|bar" as "match either 'foo' at the beginning of a line, or
2643      'bar' anywhere", whereas `flex' interprets it as "match either
2644      'foo' or 'bar' if they come at the beginning of a line".  The
2645      latter is in agreement with the POSIX specification.
2646
2647    - The special table-size declarations such as `%a' supported by
2648      `lex' are not required by `flex' scanners; `flex' ignores them.
2649
2650    - The name FLEX_SCANNER is #define'd so scanners may be written for
2651      use with either `flex' or `lex'.  Scanners also include
2652      `YY_FLEX_MAJOR_VERSION' and `YY_FLEX_MINOR_VERSION' indicating
2653      which version of `flex' generated the scanner (for example, for the
2654      2.5 release, these defines would be 2 and 5 respectively).
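
   The three precedence differences in the list above (definition
expansion, the `{}' operator, and the `^' operator) can be checked by
writing each reading with explicit parentheses.  The C++ sketch below
uses `std::regex' purely as a neutral regular-expression engine; it
runs neither `lex' nor `flex', and `matches', `found' and
`check_precedence_examples' are names invented for the illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `pattern` matches all of `text`.
bool matches(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(pattern));
}

// True if `pattern` matches somewhere in `text`.
bool found(const std::string& pattern, const std::string& text)
{
    return std::regex_search(text, std::regex(pattern));
}

void check_precedence_examples()
{
    // foo{NAME}? with NAME defined as [A-Z][A-Z0-9]*
    const std::string lex_expansion  = "foo[A-Z]([A-Z0-9]*)?";  // '?' on the last item
    const std::string flex_expansion = "foo([A-Z][A-Z0-9]*)?";  // '?' on the definition
    assert(!matches(lex_expansion, "foo"));
    assert(matches(flex_expansion, "foo"));
    assert(matches(lex_expansion, "fooA1") && matches(flex_expansion, "fooA1"));

    // abc{1,3}
    const std::string lex_range  = "(abc){1,3}";   // repeat all of "abc"
    const std::string flex_range = "ab(c){1,3}";   // repeat only the "c"
    assert(matches(lex_range, "abcabc") && !matches(flex_range, "abcabc"));
    assert(!matches(lex_range, "abccc") && matches(flex_range, "abccc"));

    // ^foo|bar
    const std::string lex_caret  = "(^foo)|bar";   // '^' applies to "foo" only
    const std::string flex_caret = "^(foo|bar)";   // '^' applies to both
    assert(found(lex_caret, "xbar") && !found(flex_caret, "xbar"));
    assert(found(lex_caret, "bar") && found(flex_caret, "bar"));
}
```

   Writing the parentheses out explicitly in a scanner specification is
also a simple way to get the same behavior from either tool.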
2655
2656    The following `flex' features are not included in `lex' or the POSIX
2657 specification:
2658
2659      C++ scanners
2660      %option
2661      start condition scopes
2662      start condition stacks
2663      interactive/non-interactive scanners
2664      yy_scan_string() and friends
2665      yyterminate()
2666      yy_set_interactive()
2667      yy_set_bol()
2668      YY_AT_BOL()
2669      <<EOF>>
2670      <*>
2671      YY_DECL
2672      YY_START
2673      YY_USER_ACTION
2674      YY_USER_INIT
2675      #line directives
2676      %{}'s around actions
2677      multiple actions on a line
2678
2679 plus almost all of the flex flags.  The last feature in the list refers
2680 to the fact that with `flex' you can put multiple actions on the same
2681 line, separated with semicolons, while with `lex', the following
2682
2683      foo    handle_foo(); ++num_foos_seen;
2684
2685 is (rather surprisingly) truncated to
2686
2687      foo    handle_foo();
2688
2689    `flex' does not truncate the action.  Actions that are not enclosed
2690 in braces are simply terminated at the end of the line.
2691
2692 \x1f
2693 File: flex.info,  Node: Diagnostics,  Next: Files,  Prev: Incompatibilities,  Up: Top
2694
2695 Diagnostics
2696 ===========
2697
2698 `warning, rule cannot be matched'
2699      indicates that the given rule cannot be matched because it follows
2700      other rules that will always match the same text as it.  For
2701      example, in the following "foo" cannot be matched because it comes
2702      after an identifier "catch-all" rule:
2703
2704           [a-z]+    got_identifier();
2705           foo       got_foo();
2706
2707      Using `REJECT' in a scanner suppresses this warning.
2708
2709 `warning, -s option given but default rule can be matched'
2710      means that it is possible (perhaps only in a particular start
2711      condition) that the default rule (match any single character) is
2712      the only one that will match a particular input.  Since `-s' was
2713      given, presumably this is not intended.
2714
2715 `reject_used_but_not_detected undefined'
2716 `yymore_used_but_not_detected undefined'
2717      These errors can occur at compile time.  They indicate that the
2718      scanner uses `REJECT' or `yymore()' but that `flex' failed to
2719      notice the fact, meaning that `flex' scanned the first two sections
2720      looking for occurrences of these actions and failed to find any,
2721      but somehow you snuck some in (via a #include file, for example).
2722      Use `%option reject' or `%option yymore' to indicate to flex that
2723      you really do use these features.
2724
2725 `flex scanner jammed'
2726      a scanner compiled with `-s' has encountered an input string which
2727      wasn't matched by any of its rules.  This error can also occur due
2728      to internal problems.
2729
2730 `token too large, exceeds YYLMAX'
2731      your scanner uses `%array' and one of its rules matched a string
2732      longer than the `YYLMAX' constant (8K bytes by default).  You
2733      can increase the value by #define'ing `YYLMAX' in the definitions
2734      section of your `flex' input.
2735
2736 `scanner requires -8 flag to use the character 'X''
2737      Your scanner specification includes recognizing the 8-bit
2738      character X and you did not specify the -8 flag, and your scanner
2739      defaulted to 7-bit because you used the `-Cf' or `-CF' table
2740      compression options.  See the discussion of the `-7' flag for
2741      details.
2742
2743 `flex scanner push-back overflow'
2744      you used `unput()' to push back so much text that the scanner's
2745      buffer could not hold both the pushed-back text and the current
2746      token in `yytext'.  Ideally the scanner should dynamically resize
2747      the buffer in this case, but at present it does not.
2748
2749 `input buffer overflow, can't enlarge buffer because scanner uses REJECT'
2750      the scanner was working on matching an extremely large token and
2751      needed to expand the input buffer.  This doesn't work with
2752      scanners that use `REJECT'.
2753
2754 `fatal flex scanner internal error--end of buffer missed'
2755      This can occur in a scanner which is reentered after a long-jump
2756      has jumped out of (or over) the scanner's activation frame.  Before
2757      reentering the scanner, use:
2758
2759           yyrestart( yyin );
2760
2761      or, as noted above, switch to using the C++ scanner class.
2762
2763 `too many start conditions in <> construct!'
2764      you listed more start conditions in a <> construct than exist (so
2765      you must have listed at least one of them twice).
2766
2767 \x1f
2768 File: flex.info,  Node: Files,  Next: Deficiencies,  Prev: Diagnostics,  Up: Top
2769
2770 Files
2771 =====
2772
2773 `-lfl'
2774      library with which scanners must be linked.
2775
2776 `lex.yy.c'
2777      generated scanner (called `lexyy.c' on some systems).
2778
2779 `lex.yy.cc'
2780      generated C++ scanner class, when using `-+'.
2781
2782 `<FlexLexer.h>'
2783      header file defining the C++ scanner base class, `FlexLexer', and
2784      its derived class, `yyFlexLexer'.
2785
2786 `flex.skl'
2787      skeleton scanner.  This file is only used when building flex, not
2788      when flex executes.
2789
2790 `lex.backup'
2791      backing-up information for `-b' flag (called `lex.bck' on some
2792      systems).
2793
2794 \x1f
2795 File: flex.info,  Node: Deficiencies,  Next: See also,  Prev: Files,  Up: Top
2796
2797 Deficiencies / Bugs
2798 ===================
2799
2800    Some trailing context patterns cannot be properly matched and
2801 generate warning messages ("dangerous trailing context").  These are
2802 patterns where the ending of the first part of the rule matches the
2803 beginning of the second part, such as "zx*/xy*", where the 'x*' matches
2804 the 'x' at the beginning of the trailing context.  (Note that the POSIX
2805 draft states that the text matched by such patterns is undefined.)
2806
2807    For some trailing context rules, parts which are actually
2808 fixed-length are not recognized as such, leading to the abovementioned
2809 performance loss.  In particular, parts using '|' or {n} (such as
2810 "foo{3}") are always considered variable-length.
2811
2812    Combining trailing context with the special '|' action can result in
2813 *fixed* trailing context being turned into the more expensive *variable*
2814 trailing context.  This happens, for example, in the following:
2815
2816      %%
2817      abc      |
2818      xyz/def
2819
2820    Use of `unput()' invalidates yytext and yyleng, unless the `%array'
2821 directive or the `-l' option has been used.
2822
2823    Pattern-matching of NUL's is substantially slower than matching
2824 other characters.
2825
2826    Dynamic resizing of the input buffer is slow, as it entails
2827 rescanning all the text matched so far by the current (generally huge)
2828 token.
2829
2830    Due to both buffering of input and read-ahead, you cannot intermix
2831 calls to <stdio.h> routines, such as, for example, `getchar()', with
2832 `flex' rules and expect it to work.  Call `input()' instead.
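
   The effect of read-ahead can be imitated in a few lines.  The C++
sketch below is a simplified model, not `flex' code: `BufferedScanner'
and `check_read_ahead' are hypothetical names, the 8192-character block
size is an arbitrary assumption, and an `istream' stands in for the
`FILE *' that a C scanner reads from.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Simplified model of a scanner's input side: it reads a whole block
// from the stream up front and then serves tokens from that buffer.
class BufferedScanner {
public:
    explicit BufferedScanner(std::istream& in, std::size_t block_size = 8192)
        : buffer_(block_size, '\0')
    {
        in.read(&buffer_[0], static_cast<std::streamsize>(block_size));
        buffer_.resize(static_cast<std::size_t>(in.gcount()));
    }

    // Return the next run of non-space characters from the buffer.
    std::string next_token()
    {
        while (pos_ < buffer_.size() && buffer_[pos_] == ' ')
            ++pos_;
        std::size_t start = pos_;
        while (pos_ < buffer_.size() && buffer_[pos_] != ' ')
            ++pos_;
        return buffer_.substr(start, pos_ - start);
    }

    // Analogue of input(): next buffered character, or -1 at the end.
    int input()
    {
        return pos_ < buffer_.size()
            ? static_cast<unsigned char>(buffer_[pos_++]) : -1;
    }

private:
    std::string buffer_;
    std::size_t pos_ = 0;
};

void check_read_ahead()
{
    std::istringstream in("alpha beta");
    BufferedScanner scanner(in);
    assert(scanner.next_token() == "alpha");
    // A direct read finds nothing: "beta" already sits in the buffer.
    assert(in.get() == std::char_traits<char>::eof());
    // Reading through the scanner continues where the token ended.
    assert(scanner.input() == ' ');
    assert(scanner.next_token() == "beta");
}
```

   The direct read fails even though only the first token has been
scanned, which is why characters following a token have to be fetched
through the scanner itself.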
2833
2834    The total number of table entries listed by the `-v' flag excludes
2835 the table entries needed to determine what rule has been matched.  The
2836 number of such entries is equal to the number of DFA states if the scanner
2837 does not use `REJECT', and somewhat greater than the number of states
2838 if it does.
2839
2840    `REJECT' cannot be used with the `-f' or `-F' options.
2841
2842    The `flex' internal algorithms need documentation.
2843
2844 \x1f
2845 File: flex.info,  Node: See also,  Next: Author,  Prev: Deficiencies,  Up: Top
2846
2847 See also
2848 ========
2849
2850    `lex'(1), `yacc'(1), `sed'(1), `awk'(1).
2851
2852    John Levine, Tony Mason, and Doug Brown: Lex & Yacc; O'Reilly and
2853 Associates.  Be sure to get the 2nd edition.
2854
2855    M. E. Lesk and E. Schmidt, LEX - Lexical Analyzer Generator.
2856
2857    Alfred Aho, Ravi Sethi and Jeffrey Ullman: Compilers: Principles,
2858 Techniques and Tools; Addison-Wesley (1986).  Describes the
2859 pattern-matching techniques used by `flex' (deterministic finite
2860 automata).
2861
2862 \x1f
2863 File: flex.info,  Node: Author,  Prev: See also,  Up: Top
2864
2865 Author
2866 ======
2867
2868    Vern Paxson, with the help of many ideas and much inspiration from
2869 Van Jacobson.  Original version by Jef Poskanzer.  The fast table
2870 representation is a partial implementation of a design done by Van
2871 Jacobson.  The implementation was done by Kevin Gong and Vern Paxson.
2872
2873    Thanks to the many `flex' beta-testers, feedbackers, and
2874 contributors, especially Francois Pinard, Casey Leedom, Stan Adermann,
2875 Terry Allen, David Barker-Plummer, John Basrai, Nelson H.F. Beebe,
2876 `benson@odi.com', Karl Berry, Peter A. Bigot, Simon Blanchard, Keith
2877 Bostic, Frederic Brehm, Ian Brockbank, Kin Cho, Nick Christopher, Brian
2878 Clapper, J.T. Conklin, Jason Coughlin, Bill Cox, Nick Cropper, Dave
2879 Curtis, Scott David Daniels, Chris G. Demetriou, Theo Deraadt, Mike
2880 Donahue, Chuck Doucette, Tom Epperly, Leo Eskin, Chris Faylor, Chris
2881 Flatters, Jon Forrest, Joe Gayda, Kaveh R. Ghazi, Eric Goldman,
2882 Christopher M.  Gould, Ulrich Grepel, Peer Griebel, Jan Hajic, Charles
2883 Hemphill, NORO Hideo, Jarkko Hietaniemi, Scott Hofmann, Jeff Honig,
2884 Dana Hudes, Eric Hughes, John Interrante, Ceriel Jacobs, Michal
2885 Jaegermann, Sakari Jalovaara, Jeffrey R. Jones, Henry Juengst, Klaus
2886 Kaempf, Jonathan I. Kamens, Terrence O Kane, Amir Katz,
2887 `ken@ken.hilco.com', Kevin B. Kenny, Steve Kirsch, Winfried Koenig,
2888 Marq Kole, Ronald Lamprecht, Greg Lee, Rohan Lenard, Craig Leres, John
2889 Levine, Steve Liddle, Mike Long, Mohamed el Lozy, Brian Madsen, Malte,
2890 Joe Marshall, Bengt Martensson, Chris Metcalf, Luke Mewburn, Jim
2891 Meyering, R.  Alexander Milowski, Erik Naggum, G.T. Nicol, Landon Noll,
2892 James Nordby, Marc Nozell, Richard Ohnemus, Karsten Pahnke, Sven Panne,
2893 Roland Pesch, Walter Pelissero, Gaumond Pierre, Esmond Pitt, Jef
2894 Poskanzer, Joe Rahmeh, Jarmo Raiha, Frederic Raimbault, Pat Rankin,
2895 Rick Richardson, Kevin Rodgers, Kai Uwe Rommel, Jim Roskind, Alberto
2896 Santini, Andreas Scherer, Darrell Schiebel, Raf Schietekat, Doug
2897 Schmidt, Philippe Schnoebelen, Andreas Schwab, Alex Siegel, Eckehard
2898 Stolz, Jan-Erik Strömquist, Mike Stump, Paul Stuart, Dave Tallman, Ian
2899 Lance Taylor, Chris Thewalt, Richard M. Timoney, Jodi Tsai, Paul
2900 Tuinenga, Gary Weik, Frank Whaley, Gerhard Wilhelms, Kent Williams, Ken
2901 Yap, Ron Zellar, Nathan Zelle, David Zuhn, and those whose names have
2902 slipped my marginal mail-archiving skills but whose contributions are
2903 appreciated all the same.
2904
2905    Thanks to Keith Bostic, Jon Forrest, Noah Friedman, John Gilmore,
2906 Craig Leres, John Levine, Bob Mulcahy, G.T.  Nicol, Francois Pinard,
2907 Rich Salz, and Richard Stallman for help with various distribution
2908 headaches.
2909
2910    Thanks to Esmond Pitt and Earle Horton for 8-bit character support;
2911 to Benson Margulies and Fred Burke for C++ support; to Kent Williams
2912 and Tom Epperly for C++ class support; to Ove Ewerlid for support of
2913 NUL's; and to Eric Hughes for support of multiple buffers.
2914
2915    This work was primarily done when I was with the Real Time Systems
2916 Group at the Lawrence Berkeley Laboratory in Berkeley, CA.  Many thanks
2917 to all there for the support I received.
2918
2919    Send comments to `vern@ee.lbl.gov'.
2920
2921
2922 \x1f
2923 Tag Table:
2924 Node: Top\x7f1430
2925 Node: Name\x7f2808
2926 Node: Synopsis\x7f2933
2927 Node: Overview\x7f3145
2928 Node: Description\x7f4986
2929 Node: Examples\x7f5748
2930 Node: Format\x7f8896
2931 Node: Patterns\x7f11637
2932 Node: Matching\x7f18138
2933 Node: Actions\x7f21438
2934 Node: Generated scanner\x7f30560
2935 Node: Start conditions\x7f34988
2936 Node: Multiple buffers\x7f45069
2937 Node: End-of-file rules\x7f50975
2938 Node: Miscellaneous\x7f52508
2939 Node: User variables\x7f55279
2940 Node: YACC interface\x7f57651
2941 Node: Options\x7f58542
2942 Node: Performance\x7f78234
2943 Node: C++\x7f87532
2944 Node: Incompatibilities\x7f94993
2945 Node: Diagnostics\x7f101853
2946 Node: Files\x7f105094
2947 Node: Deficiencies\x7f105715
2948 Node: See also\x7f107684
2949 Node: Author\x7f108216
2950 \x1f
2951 End Tag Table