1 -- Id: README.BTYACC,v 1.2 2014/04/22 08:18:57 Tom.Shields Exp
3 The original README from btyacc is below.
5 The backtracking enhancements to byacc have been merged into Thomas Dickey's
8 The %include and %define/%ifdef enhancements described below are not currently
11 The position management functionality ("YYPOSN", "yyposn", "YYREDUCEPOSNFUNC",
12 "YYREDUCEPOSNFUNCARG" & "YYCALLREDUCEPOSN") is replaced by a bison-compatible
13 "%locations" implementation.
15 The memory management functionality ("YYDELETEVAL" & "YYDELETEPOSN") is
16 replaced by a bison-compatible "%destructor" implementation.
18 The detailed syntax error processing functionality ("YYERROR_DETAILED"
19 & "yyerror_detailed()") is subsumed by the bison-compatible "yyerror()"
20 implementation, as modified by the %parse-param and %locations directives.
22 The debugging macro "YYDBPR()" in the parser skeleton is renamed
25 -------------------------------------------------------------------------------
26 BTYACC -- backtracking yacc
27 ===========================
29 BTYACC was created by Chris Dodd using ideas from many
30 places and lots of code from the Berkeley Yacc
31 distribution, which is a public domain yacc clone put
32 together by the good folks at Berkeley. This code is
33 distributed with NO WARRANTY and is public domain.
34 It is certain to contain bugs, which you should
35 report to: chrisd@collins.com.
37 Vadim Maslov of Siber Systems <vadik@siber.com>
38 considerably modified BTYACC to make it suitable
39 for production environment.
41 Several people have suggested bug fixes that
42 were incorporated into BtYacc.
44 See the README.BYACC files for more about
45 Berkeley Yacc and other sources of info.
47 http://www.siber.com/btyacc/ is the current home of BtYacc.
48 It is provided courtesy of Siber Systems http://www.siber.com/.
55 Changes mostly occurred in btyaccpa.ske file that
56 contains the parsing shift/reduce/backtrack algorithm.
58 Version 3.0 innovations focus on:
59 - text position computation and propagation,
60 - industrial-strength error processing and recovery.
63 ** Added mechanism for computing and propagating
64 text position of tokens and non-terminals.
66 Compilers often need to build AST trees such that every node
67 in a tree can relate to the parsed program source it came from.
68 The following applications are very likely to need this:
69 - debuggers that show actual source of the debugged program,
70 - source-to-source translators that want
71 unchanged parts of the tree to generate the unchanged code.
73 The new YYPOSN mechanism added in this version of BtYacc
74 helps you in automating the text position computation
75 and in assigning the computed text positions to the AST.
76 This mechanism is successfully used in commercial
77 parsers and source-to-source translators.
79 In standard Yaccs every token and every non-terminal
80 has an YYSTYPE semantic value attached to it.
81 In this new version every token and every non-terminal
82 also has an YYPOSN text position attached to it.
83 YYPOSN is a user-defined type that can be anything and
84 that has a meaning of text position attached to
85 token or non-terminal.
87 In addition to semantic value stack BtYacc now maintains
88 text position stack. Behavior of the text position stack
89 is similar to the behavior of the semantic value stack.
91 If using text position mechanism,
92 you need to define the following:
94 YYPOSN Preprocessor variable that contains C/C++ type of
95 the text position attached to
96 every token and non-terminal.
98 yyposn Global variable of type YYPOSN.
99 The lexer must assign text position of
100 the returned token to yyposn, just like it assigns
101 semantic value of the returned token to yylval.
104 Preprocessor variable that points to function that
105 is called after the grammar rule reduction
106 to reduce text positions located on the stack.
108 This function is called by BtYacc to reduce text
109 positions. The function is called immediately after
110 the regular rule reduction occurs.
112 The function has the following prototype:
113 void ReducePosn(YYPOSN &ret,
122 The function arguments are:
124 Reference to the text position returned by
125 the rule. The function must write the computed
126 text position returned by the rule to ret.
127 This is analogue of the $$ semantic value.
130 Array of the right-hand side rule components
131 YYPOSN text positions. These are analogues of
132 $1, $2, ..., $N in the text position world.
135 Array of the right-hand side (RHS) rule components
136 YYSTYPE values. These are the $1,...,$N themselves.
139 Number of the components in RHS of the reduced rule.
140 Equal to size of arrays term_posns and term_vals.
141 Also equal to N in $1,...,$N in the reduced rule.
144 YYSTYPE/YYPOSN stack position before the reduction.
147 Lookahead token that immediately follows
148 the reduced RHS components.
151 YYPOSN of the token that immediately follows
152 the reduced RHS components.
155 User-defined extra argument passed to ReducePosn.
157 Typically this function extracts text positions from
158 the right-hand side rule components and either
159 assigns them to the returned $$ structure/tree or
160 if no $$ value is returned, puts them into
161 the ret text position from where
162 it will be picked up by the later reduced rules.
165 Extra user-defined argument passed to
166 the ReducePosn function. This argument can use
167 any variables defined in btyaccpa.ske.
170 ** Added code to btyaccpa.ske that automatically cleans up
171 semantic semantic values and text positions of tokens
172 and non-terminals that are discarded and deleted as
173 a result of error processing.
175 In the previous versions the discarded token and non-terminal
176 semantic values were not cleaned that caused quite severe
177 leaks. The only way to fix it was to add garbage collection
180 Now BtYacc skeleton calls delete functions for semantic
181 values and positions of the discarded tokens and
184 You need to define the following functions that BtYacc
185 calls when it needs to delete semantic value or text position.
188 User-defined function that is called by BtYacc
189 to delete semantic value of the token or non-terminal.
191 The user-defined function must have the prototype:
192 void DeleteYYval(YYSTYPE v, int type);
193 v is semantic value to delete,
194 type is one of the following:
197 2 cleaning up stack when aborting
200 User-defined function that is called by BtYacc
201 to delete text position of the token or non-terminal.
203 The user-defined function must have the prototype:
204 void DeleteYYposn(YYPOSN p, int type);
205 v is semantic value to delete,
206 type is one of the following:
209 2 cleaning up stack when aborting
212 ** User can define "detailed" syntax error processing
213 function that reports an *exact* position of
214 the token that caused the error.
216 If you define preprocessor variable YYERROR_DETAILED in
217 your grammar then you need define the following
218 error processing function:
220 void yyerror_detailed(char *text,
225 It receives the following arguments:
227 errt Code of the token that caused the error.
228 errt_value Value of the token that caused the error.
229 errt_posn Text position of token that caused error.
232 ** Dropped compatibility with C.
234 Compatibility with C became increasingly difficult
235 to maintain as new features were added to btyaccpa.ske.
236 So we dropped it. If anybody wants to make the new version
237 compatible with C, we would gladly accept the changes.
239 Meanwhile we expect that you use C++ to write grammar
240 actions and everything else in grammar files.
241 Since C is (in a sense) subset of C++, your C-based
242 grammar may work if you use C++ compiler to compile it.
244 Version 3.0 bugs fixed
245 ----------------------
247 Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug:
248 BtYacc does not correctly handle typenames, if one typename
249 is a prefix of another one and if this type is used after
250 the longer one. In this case BTYacc produces invalid code.
257 ** Added preprocessor statements to BtYacc that are similar
258 in function and behavior to C/C++ preprocessor statements.
260 These statements are used to:
262 - Introduce modularity into a grammar by breaking it
263 into several *.y files and assembling different
264 grammars from the *.y modules using %include and %ifdef.
266 - Have several versions of the same grammar
267 by using %ifdef and $endif.
269 - To include automatically generated grammar fragment.
270 For instance, we use %include to include
271 automatically generated list of tokens.
273 Preprocessor statements are:
276 Define preprocessor variable named <var-name>.
279 If preprocessor variable named <var-name>
280 is defined by %define, then process the text from
281 this %ifdef to the closing %endif.
284 Closing bracket for %ifdef preprocessor statement.
285 Only one nesting level of %ifdef-%endif is allowed.
288 Process contents of the file named <file-name>.
289 If <file-name> is a relative name, it is looked up
290 in a directory in which btyacc was started.
291 Only one nesting level of %include is allowed.
299 ** Changed 16-bit short numbers to 32-bit int numbers in
300 grammar tables, so that huge grammar tables (tables that
301 are larger than 32768 elements) resulting from huge
302 grammars (Cobol grammar, for instance) can work correctly.
303 You need to have 32-bit integer to index table bigger than
304 32768 elements, 16-bit integer is not enough.
306 The original BtYacc just generated non-working tables
307 larger than 32768 elements without even notifying about
311 ** Make error recovery work correctly when error happens
312 while processing nested conflicts. Original BtYacc could
313 infinitely cycle in certain situations that involved error
314 recovery while in nested conflict.
316 More detailed explanation: when we have nested conflicts
317 (conflict that happens while trial-processing another
318 conflict), it leads btyacc into NP-complete searching of
319 conflict tree. The ultimate goal is YYVALID operator that
320 selects a particular branch of that tree as a valid one.
322 If no YYVALID is found on the tree, then error recovery
323 takes over. The problem with this is that error recovery
324 is started in the same state context that exists on the
325 last surveyed branch of the conflict tree. Sometimes this
326 last branch may be of zero length and it results in
327 recovering to exactly the same state as existed before
328 entering the conflict. BtYacc cycles then.
330 We solved this problem by memorizing the longest path in
331 the conflict tree while browsing it. If we ever get into
332 error recovery, we restore state that existed on the
333 longest path. Effectively we say: if we have an error,
334 let us move forward as far as we possibly could while we
335 were browsing the conflict tree.
338 ** Introduce YYVALID_NESTED operation in addition to
339 simply YYVALID. When we have a nested conflict (conflict
340 while processing in trial mode for another conflict), we
341 want to relate YYVALID to a particular level of conflict
344 Since we mostly anticipate only 2-level nested conflicts
345 YYVALID_NESTED tells the parser to satisfy only the
346 internal conflict. Therefore, in 1-level conflict
347 situation YYVALID_NESTED acts like a regular YYVALID, but
348 in 2-level conflict it is a no-op and the other YYVALID
349 for outer conflict will be searched for.
352 ** Improved handling of situation where /tmp directory is
353 missing. Original btyacc just died quietly when /tmp
354 directory was missing. We added code that states the
355 problem explicitly. While on UNIX /tmp directory is always
356 present, it may be missing on WIN32 systems, therefore
357 diagnosing this situation is important.
360 Version 1.0 changes: BackTracking
361 =================================
364 BTYACC is a modified version of yacc that supports
365 automatic backtracking and semantic disambiguation to
366 parse ambiguous grammars, as well as syntactic sugar for
367 inherited attributes (which tend to introduce conflicts).
368 Whenever a btyacc generated parser runs into a
369 shift-reduce or reduce-reduce error in the parse table, it
370 remembers the current parse point (yacc stack and input
371 stream state), and goes into trial parse mode. It then
372 continues parsing, ignoring most rule actions. If it runs
373 into an error (either through the parse table or through
374 an action calling YYERROR), it backtracks to the most
375 recent conflict point and tries a different alternative.
376 If it finds a successful parse (reaches the end of the
377 input or an action calls YYVALID), it backtracks to the
378 point where it first entered trial parse mode, and
379 continues with a full parse (executing all actions),
380 following the path of the successful trial.
382 Actions in btyacc come in two flavors -- {}-actions, which
383 are only executed when not in trial mode, and []-actions
384 which are executed regardless of mode. There are also
385 inherited attributes, which look like arguments (they are
386 enclosed in "()") and act like []-actions.
390 * No more lexer feedback hack. In yacc grammars for C, a
391 standard hack, know as the "lexer feedback hack" is used
392 to find typedef names. The lexer uses semantic
393 information to decide if any given identifier is a
394 typedef-name or not and returns a special token. With
395 btyacc, you no longer need to do this; the lexer should
396 just always return an identifier. The btyacc grammar then
397 needs a rule of the form:
399 typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ]
401 While the hack works adequately well for parsing C, it
402 becomes a nightmare when you try to parse something like
403 C++, where treating an ID as a typedef becomes heavily
404 dependent on context.
406 * Easy disambiguation via simple ordering. Btyacc runs
407 its trials via the rule "try shifting first, then try
408 reducing by the order that the conflicting rules appear in
409 the input file". This means you can deal with semantic a
410 disambiguation rule like:
411 [1] If it looks like a declaration it is, otherwise
412 [2] If it looks like an expression it is, otherwise
413 [3] it is a syntax error
414 [Ellis&Stroustrup, Annotated C++ Reference Manual, p93]
416 To deal with this, you need only put all the rules for
417 declarations before the rules for expressions in the
420 * No extra cost if you do not use it. Backtracking is
421 only triggered when the parse hits a shift/reduce or
422 reduce/reduce conflict in the table. If you have no
423 conflicts in your grammar, there is no extra cost, other
424 than some extra code which will never be invoked.
426 * C++ and ANSI C compatible parsers. The parsers produced
427 by btyacc can be compiled with C++ correctly. If you
428 "#define" YYSTYPE to be some C++ type with constructor and
429 destructor, everything will work fine. My favorite is
430 "#define YYSTYPE SmartPointer", where SmartPointer is a
431 smart pointer type that does garbage collection on the
434 BTYACC was originally written to make it easy to write a
435 C++ parser (my goal was to be able to use the grammar out
436 of the back of the ARM with as few modifications as
437 possible). Anyone who has ever looked at Jim Roskind
438 public domain C++ yacc grammar, or the yacc-based grammar
439 used in g++ knows how difficult this is. BTYACC is very
440 useful for parsing any ambiguous grammar, particularly
441 ones that come from trying to merge two (or more) complete
444 Limitations of the backtracking: Currently, the generated
445 parser does NO pruning of alternate parsing paths. To
446 avoid an exponential explosion of possible paths (and
447 parsing time), you need to manually tell the parser when
448 it can throw away saved paths using YYVALID. In practice,
449 this turns out to be fairly easy to do. A C++ parser (for
450 example) can just put a [YYVALID;] after every complete
451 declaration and statement rule, corresponding to pruning
452 the backtracking state after seeing a ';' or '}' -- there
453 will never be a situation in which it is useful to
454 backtrack past either of these.
456 Inherited attributes in btyacc:
458 Inherited attributes look a lot like function arguments to
459 non-terminals, which is what they end up being in a
460 recursive descent parser, but NOT how they are implemented
461 in btyacc. Basically they are just syntactic sugar for
462 embedded semantic actions and $0, $-1, ... in normal yacc.
463 btyacc gives you two big advantages besides just the
465 1. it does type checking on the inherited attributes,
466 so you do not have to specify $<type>0 and makes sure
467 you give the correct number of arguments (inherited
468 attributes) to every use of a non-terminal.
469 2. It "collapses" identical actions from that are produced
470 from inherited attributes. This eliminates many
471 potential reduce-reduce conflicts arising from
472 the inherited attributes.
474 You use inherited attributes by declaring the types of the
475 attributes in the preamble with a type declaration and
476 declaring names of the attributes on the lhs of the yacc
477 rule. You can of course have more than one rule with the
478 same lhs, and you can even give them different names in
479 each, but the type and number must be the same.
481 Here is a small example:
482 /* lhs takes 2 inherited attributes */
483 %type <t1> lhs(<t1>, <t2>)
486 lhs($i1, $i2) : { $$ = $i1 }
487 | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; }
489 This is roughly equivalent to the following yacc code:
492 | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff
496 See the file "test/t2.y" for a longer and more complete
497 example. At the current time, the start symbol cannot
502 Btyacc supports the -S flag to use a different parser
503 skeleton, changing the way that the parser is called and
504 used. The skeleton "push.skel" is included to produce a
505 "passive" parser that you feed tokens to (rather than
506 having the parser call a separate yylex routine). With
507 push.skel, yyparse is defined as follows:
509 int yyparse(int token, YYSTYPE yylval)
511 You should call yyparse repeatedly with successive tokens
512 of input. It returns 0 if more input is needed, 1 for a
513 successful parse, and -1 for an unrecoverable parse error.
516 Miscellaneous Features in ver. 1.0
517 ----------------------------------
520 The -r option has been implemented. The -r option tells
521 Yacc to put the read-only tables in y.tab.c and the code and
522 variables in y.code.c. Keith Bostic asked for this option so
523 that :yyfix could be eliminated.
525 The -l and -t options have been implemented. The -l
526 option tells Yacc not to include #line directives in the code
527 it produces. The -t option causes debugging code to be
528 included in the compiled parser.
530 The code for error recovery has been changed to
531 implement the same algorithm as AT&T Yacc. There will still
532 be differences in the way error recovery works because AT&T
533 Yacc uses more default reductions than Berkeley Yacc.
535 The environment variable TMPDIR determines the directory
536 where temporary files will be created. If TMPDIR is defined,
537 temporary files will be created in the directory whose
538 pathname is the value of TMPDIR. By default, temporary files
541 The keywords are now case-insensitive. For example,
542 %nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are
545 Commas and semicolons that are not part of C code are
546 treated as commentary.
548 Line-end comments, as in BCPL, are permitted. Line-end
549 comments begin with // and end at the next end-of-line.
550 Line-end comments are permitted in C code; they are converted
551 to C comments on output.
553 The form of y.output files has been changed to look more
554 like those produced by AT&T Yacc.
556 A new kind of declaration has been added.
557 The form of the declaration is
561 where string is a sequence of characters beginning with a
562 double quote and ending with either a double quote or the
563 next end-of-line, whichever comes first. The declaration
564 will cause a #ident directive to be written near the start
567 If a parser has been compiled with debugging code, that
568 code can be enabled by setting an environment variable.
569 If the environment variable YYDEBUG is set to 0, debugging
570 output is suppressed. If it is set to 1, debugging output
571 is written to standard output.
576 by Chris Dodd and Vadim Maslov
578 We used GCC and GNU make to compile BtYacc both on UNIX and
579 WIN32 paltforms. You are welcome to try different
580 combinations of makes and compilers. Most likely it will
581 work, but it may require Makefile changes.
583 There is no config script.
584 Just type "make" and it should compile.
586 AWK. If you want to change file btyaccpa.ske (backtracking
587 parser skeleton), you will need awk to compile it into
588 skeleton.c file. We used GNU AWK (gawk) version 3.0.
590 It is known that using older versions of gawk
591 may create problems in compilation, because older awks
592 have problems with backslashes at the end of a line.
594 For MSDOS, there a "makefile.dos" that should do the trick.
595 Note: makefile.dos was not tested for a long time.
597 The result of compilation should be a single executable called
598 "btyacc" which you can install anywhere you like;
599 it does not require any other files in the distribution to run.
604 by Chris Dodd and Vadim Maslov
606 In English: BtYacc is freeware. BtYacc is distributed with
607 no warranty whatsoever. The author and any other contributors
608 take no responsibility for any and all consequences of its use.
610 In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS
611 NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE
612 LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL
613 DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR
614 DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY
615 THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN
616 IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH