Fix file mode.
[llvm-testsuite.git] / MultiSource / Applications / d / manual.html
blob1cf61c09f6402a09c36325346a352e265990ceca
1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2 <html>
3 <head>
4 <meta http-equiv="content-type"
5 content="text/html; charset=ISO-8859-1">
6 <title>manual.html</title>
7 </head>
8 <body>
9 <div style="text-align: center;">&nbsp; <big><big><span
10 style="font-weight: bold;">DParser Manual<br>
11 </span></big></big>
12 <div style="text-align: left;"><big><br>
13 <br>
14 <span style="font-weight: bold;">Contents</span><br>
15 </big>
16 <ol>
17 <li>Installation</li>
18 <li>Getting Started</li>
19 <li>Comments<br>
20 </li>
21 <li>Productions</li>
22 <li>Global Code<br>
23 </li>
24 <li>Terminals</li>
25 <ol>
26 <li>Strings</li>
27 <li>Regular Expressions</li>
28 <li>External (C) scanners</li>
29 <li>Tokenizers</li>
30 <li>Longest Match<br>
31 </li>
32 </ol>
33 <li>Priorities and Associativity</li>
34 <ol>
35 <li>Token Priorities</li>
36 <li>Operator Priorities</li>
37 <li>Rule Priorities</li>
38 </ol>
39 <li>Actions</li>
40 <ol>
41 <li>Speculative Actions</li>
42 <li>Final Actions</li>
43 <li>Embedded<br>
44 </li>
45 <li>Pass Actions<br>
46 </li>
47 <li>Default Actions<br>
48 </li>
49 </ol>
50 <li>Attributes and Action Specifiers</li>
51 <ol>
52 <li>Global State</li>
53 <li>Parse Nodes</li>
54 <li>Misc</li>
55 </ol>
56 <li>Symbol Table</li>
57 <li>Whitespace</li>
58 <li>Ambiguities</li>
59 <li>Error Recovery</li>
60 <li>Parsing Options<br>
61 </li>
62 <li>Grammar Grammar<br>
63 </li>
64 </ol>
65 <span style="font-weight: bold;">1. Installation</span><br>
66 &nbsp;&nbsp;&nbsp; <br>
67 To build:
68 'gmake'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
69 (only available with source code package)<br>
70 <br>
71 To test: 'gmake
72 test'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (only
73 available with source code package)<br>
74 <br>
75 To install, 'gmake install'&nbsp;&nbsp;&nbsp; (binary or source code
76 packages)<br>
77 <br>
78 <span style="font-weight: bold;">2. Getting Started<span
79 style="font-weight: bold;"></span><span style="font-weight: bold;"></span></span><span
80 style="font-weight: bold;"></span><br>
81 <br>
82 2.1. Create your grammar, for example, in the file "my.g":<br>
83 &nbsp;&nbsp; <br>
84 &nbsp;&nbsp; E: E '+' E | "[abc]";<br>
85 &nbsp;&nbsp; <br>
86 2.2. Convert grammar into parsing tables:<br>
87 <br>
88 &nbsp; % make_dparser -g my.g<br>
89 <br>
90 2.3. Create a driver program, for example, in the file
91 "my.c":&nbsp;&nbsp; <br>
92 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
93 #include &lt;stdio.h&gt;<br>
94 #include &lt;dparse.h&gt;<br>
95 extern D_ParserTables parser_tables_gram;<br>
96 int<br>
97 main(int argc, char *argv[]) {<br>
98 &nbsp; char s[256], *ss;<br>
99 &nbsp; D_Parser *p = new_D_Parser(&amp;parser_tables_gram, 0);<br>
100 &nbsp; if (fgets(s,255,stdin) &amp;&amp; dparse(p, s, strlen(s))
101 &amp;&amp; !p-&gt;syntax_errors)<br>
102 &nbsp;&nbsp;&nbsp; printf("success\n");<br>
103 &nbsp; else<br>
104 &nbsp;&nbsp;&nbsp; printf("failure\n");<br>
105 }<br>
106 <br>
107 2.4. Compile:<br>
108 <br>
109 &nbsp; % cc -I/usr/local/include my.c my.g.d_parser.c -L/usr/local/lib
110 -ldparse<br>
111 &nbsp; <br>
112 2.5. Run:<br>
113 &nbsp;&nbsp; <br>
114 &nbsp; % a.out<br>
115 &nbsp; a=<br>
116 &nbsp; syntax error, '' line 1<br>
117 &nbsp; failure<br>
118 &nbsp; %<br>
119 &nbsp;&nbsp; <br>
120 &nbsp; % a.out<br>
121 &nbsp; a+b<br>
122 &nbsp; success<br>
123 &nbsp; %<br>
124 <br>
125 &nbsp;We'll come back to this example later.<br>
126 <br>
127 <span style="font-weight: bold;">3. Comments</span><br>
128 <br>
129 &nbsp; Grammars can include C/C++ style comments.<br>
130 <br>
131 EXAMPLE<br>
132 <br>
133 // My first grammar<br>
134 &nbsp;&nbsp; E: E '+' E | "[abc]";<br>
135 /* is this right? */<br>
136 <br>
137 <span style="font-weight: bold;">4. Productions</span><br>
138 <br>
139 &nbsp; 4.1. The first production is the root of your grammar (what you
140 will be trying to parse).<br>
141 &nbsp; 4.2. Productions start with the non-terminal being defined
142 followed by a colon ':', a set of right hand sides seperated by '|'
143 (or)
144 consisting of elements (non-terminals or terminals).<br>
145 &nbsp; 4.3. Elements can be grouped with parens '(', and the normal
146 regular expression symbols can be used ('+' '*' '?' '|').<br>
147 <br>
148 EXAMPLE<br>
149 <br>
150 program: statements+ | &nbsp;comment* (function | &nbsp;procedure)?;<br>
151 <br>
152 &nbsp; 4.4. <span style="font-weight: bold;">NOTE:</span> Instead of
153 using '[' ']' for optional elements we use the more familar and
154 consistent '?' operator. &nbsp;The square brackets are reserved for
155 speculative actions (below).<br>
156 <br>
157 <span style="font-weight: bold;">5. Global Code</span><br>
158 <br>
159 Global (or static) C code can be intermixed with productions by
160 surrounding the code with brackets '{'.<br>
161 <br>
162 EXAMPLE<br>
163 <br>
164 { void dr_s() { printf("Dr. S\n"); }<br>
165 S: 'the' 'cat' 'and' 'the' 'hat' { dr_s(); } | T;<br>
166 { void twain() { printf("Mark Twain\n"); }<br>
167 T: 'Huck' 'Finn' { twain(); };<br>
168 <br>
169 <span style="font-weight: bold;">6. Terminals</span><br>
170 <br>
171 &nbsp; 6.1. Strings terminals are surrounded with single quotes.
172 &nbsp;For example:<br>
173 <br>
174 <span style="font-weight: bold;"></span>block: '{' statements* '}';<br>
175 whileblock: 'while' '(' expression ')' block;<br>
176 <br>
177 &nbsp; 6.2. Regular expressions are surrounded with double quotes.
178 &nbsp;For example:<br>
179 <br>
180 hexint: "(0x|0X)[0-9a-fA-F]+[uUlL]?";<br>
181 <br>
182 &nbsp; <span style="font-weight: bold;">NOTE: </span>only the simple
183 regular expression operators are currently supported (v1.3). &nbsp;This
184 include parens, square parens, ranges, and '*', '+', '?'. &nbsp; If you
185 need something more, request a feature or implement it yourself; the
186 code is in scan.c.<br>
187 <br>
188 &nbsp; 6.3. External (C) Scanners<br>
189 <br>
190 &nbsp; There are two types of external scanners, those which read a
191 single terminal, and those which are global (called for every
192 terminal).
193 &nbsp;Here is an example of a scanner for a single terminal.
194 &nbsp;Notice how it can be mixed with regular string terminals.<br>
195 <br>
196 {<br>
197 extern char *ops;<br>
198 extern void *ops_cache;<br>
199 int ops_scan(char *ops, void *ops_cache, char **as,<br>
200 &nbsp;&nbsp;&nbsp; int *col, int *line, unsigned short *op_assoc, int
201 *op_priority);<br>
202 }<br>
203 <br>
204 X: '1' (${scan ops_scan(ops, ops_cache)} '2')*;<br>
205 <br>
206 &nbsp; The user provides the 'ops_scan' function. &nbsp;This example is
207 from tests/g4.test.g in the source distribution.<br>
208 <br>
209 &nbsp; The second type of scanner is a global scanner:<br>
210 <br>
211 {<br>
212 #include "g7.test.g.d_parser.h"<br>
213 int myscanner(char **s, int *col, int *line, unsigned short *symbol,<br>
214 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int *term_priority, unsigned short
215 *op_assoc, int *op_priority)<br>
216 {<br>
217 &nbsp; if (**s == 'a') {<br>
218 &nbsp;&nbsp;&nbsp; (*s)++;<br>
219 &nbsp;&nbsp;&nbsp; *symbol = A;<br>
220 &nbsp;&nbsp;&nbsp; return 1;<br>
221 &nbsp; } else if (**s == 'b') {<br>
222 &nbsp;&nbsp;&nbsp; (*s)++;<br>
223 &nbsp;&nbsp;&nbsp; *symbol = BB;<br>
224 &nbsp;&nbsp;&nbsp; return 1;<br>
225 &nbsp; } else if (**s == 'c') {<br>
226 &nbsp;&nbsp;&nbsp; (*s)++;<br>
227 &nbsp;&nbsp;&nbsp; *symbol = CCC;<br>
228 &nbsp;&nbsp;&nbsp; return 1;<br>
229 &nbsp; } else if (**s == 'd') {<br>
230 &nbsp;&nbsp;&nbsp; (*s)++;<br>
231 &nbsp;&nbsp;&nbsp; *symbol = DDDD;<br>
232 &nbsp;&nbsp;&nbsp; return 1;<br>
233 &nbsp; } else<br>
234 &nbsp;&nbsp;&nbsp; return 0;<br>
235 }<br>
236 ${scanner myscanner}<br>
237 ${token A BB CCC DDDD}<br>
238 <br>
239 S: A (BB CCC)+ SS;<br>
240 SS: DDDD;<br>
241 <br>
242 &nbsp; Notice how the you need to include the header file generated by <span
243 style="font-weight: bold;">make_dparser</span> which contains the
244 token
245 definitions.<br>
246 <br>
247 6.4. Tokenizers<br>
248 <br>
249 &nbsp; Tokenizers are non-context sensitive global scanners which
250 produce only one token for any given input string. &nbsp;Some
251 programming languages (for example <span style="font-weight: bold;">C</span>)
252 are easier to specify using a tokenizer because (for example) reserved
253 words can be handled simply by lowering the terminal priority for
254 identifiers.<br>
255 <br>
256 EXAMPLE:<br>
257 <br>
258 S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br>
259 ident: "[a-z]+" $term -1;<br>
260 <br>
261 &nbsp; The sentence: <span style="font-weight: bold;">if ( while ) a;</span>
262 is legal because <span style="font-weight: bold;">while</span> cannot
263 appear at the start of <span style="font-weight: bold;">S</span> and
265 it doesn't conflict with the parsing of <span
266 style="font-weight: bold;">while</span>
267 as an <span style="font-weight: bold;">ident</span> in that position.
268 &nbsp;However, if a tokenizer is specified, all tokens will be possible
269 at each position and the sentense will produce a syntax error.<br>
270 <br>
271 &nbsp; <span style="font-weight: bold;">DParser</span> provides two
272 ways to specify tokenizers: globally as an option (-T) to <span
273 style="font-weight: bold;">make_dparser</span> and locally with a
274 ${declare tokenize ...} specifier (see the ANSI C grammar for an
275 example). &nbsp;The ${declare tokenize ...} declartion allows a
276 tokenizer to be specified over a subset of the parsing states so that
277 (for example) ANSI C could be a subgrammar of another larger grammar.
278 &nbsp;Currently the parse states are not split so that the productions
279 for the substates must be disjoint.<br>
280 <br>
281 6.5 Longest Match<br>
282 <br>
283 &nbsp; Longest match lexical ambiguity resolution is a technique used
284 by seperate phase lexers to help decide (along<br>
285 with lexical priorities) which single token to select for a given input
286 string.&nbsp; It is used in the definition of ANSI-C, but not in C++
287 because of a snafu in the definition of templates whereby templates of
288 templates (List&lt;List &lt;Int&gt;&gt;) can end with the right shift
289 token ('&gt;&gt;").&nbsp; Since <span style="font-weight: bold;">DParser</span>
290 does not have a seperate lexical phase, it does not require longest
291 match disambiguation, but provides it as an option.<br>
292 <br>
293 &nbsp; There are two ways to specify longest match disabiguation:
294 globally as an option (-l) to <span style="font-weight: bold;">make_dparser</span>
295 or locally with with a ${declare ... longest_match}.&nbsp; If global
296 longest match disambiguation is <span style="font-weight: bold;">ON</span>,
297 it can be locally disabled with {$declare ... all_matches} .&nbsp; As
298 with Tokenizers above, local declarations operate on disjoint subsets
300 parsing states.<br>
301 <br>
302 <span style="font-weight: bold;">7. Priorities and Associativity<br>
303 <br>
304 </span>&nbsp; Priorities can very from MININT to MAXINT and are
305 specified as integers. &nbsp;Associativity can take the values:<br>
306 <br>
307 assoc : '$unary_op_right' | '$unary_op_left' | '$binary_op_right'<br>
308 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
309 '$binary_op_left' | '$unary_right' | '$unary_left'<br>
310 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
311 '$binary_right' | '$binary_left' | '$right' | '$left' ;<br>
312 <br>
313 7.1. Token Prioritites<br>
314 <br>
315 &nbsp; Currently (v1.0) the automatically generated scanners use the <span
316 style="font-style: italic;">longest match</span> rule so that, for
317 example:<br>
318 <br>
319 OP: '&gt;' | '&gt;&gt;';<br>
320 <br>
321 will match the string '&gt;&gt;' only as '&gt;&gt;' instead of
322 ambiguously as either '&gt;' or '&gt;&gt;'. &nbsp; A planned feature is
323 to make this optional.<br>
324 <br>
325 &nbsp; Termininal priorities apply after the longest match string has
326 been found and the terminal with the highest priority is selected.<br>
327 They are introduced after a terminal by the specifier <span
328 style="font-weight: bold;">$term</span>. &nbsp;We saw an example of
329 token priorities with the definition of <span
330 style="font-weight: bold;">ident</span>.<br>
331 <br>
332 EXAMPLE:<br>
333 <br>
334 S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br>
335 ident: "[a-z]+" $term -1;<br>
336 <br>
337 7.2. Operator Priorities<br>
338 <br>
339 &nbsp; Operator priorities specify the priority of a operator symbol
340 (either a terminal or a non-terminal). &nbsp;This corresponds to the <span
341 style="font-style: italic;">yacc</span> or <span
342 style="font-style: italic;">bison</span> <span
343 style="font-weight: bold;">%left</span> etc. declaration.
344 &nbsp;However, since <span style="font-weight: bold;">DParser</span>
346 doesn't require a global tokenizer, operator priorities and
347 associativities are specified on the reduction which creates the token.
348 &nbsp;Moreover, the associativity includes the operator usage as well
349 since it cannot be infered from rule context. &nbsp;Possible operator
350 associativies are:<br>
351 <br>
352 operator_assoc : '$unary_op_right' | '$unary_op_left' |
353 '$binary_op_right'<br>
354 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
355 '$binary_op_left' | '$unary_right' | '$unary_left'<br>
356 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
357 '$binary_right' | '$binary_left';<br>
358 <br>
359 EXAMPLE:<br>
360 <br>
361 E: ident op ident;<br>
362 ident: '[a-z]+';<br>
363 <span style="font-weight: bold;"></span>op: '*' $binary_op_left 2 |<br>
364 &nbsp; &nbsp; &nbsp; '+' $binary_op_left 1;<br>
365 <br>
366 7.3. Rule Priorities<br>
367 <br>
368 &nbsp; Rule priorities specify the priority of the reduction itself and
369 have the possible associativies:<br>
370 <br>
371 rule_assoc: '$right' | '$left';<br>
372 <br>
373 &nbsp; Rule and operator priorities can be intermixed and are
374 interpreted at run time (<span style="font-weight: bold;">not</span>
375 when
376 the tables are built). &nbsp;This make it possible for user-defined
377 scanners to return the associativities and priorities of tokens.<br>
378 <br>
379 <span style="font-weight: bold;">8. Actions<br>
380 <br>
381 </span>&nbsp; Actions are the bits of code which run when a reduction
382 occurs.<br>
383 <br>
384 EXAMPLE<br>
385 <br>
386 S: this | that;<br>
387 this: 'this' { printf("got this\n"); };<br>
388 that: 'that' { printf("got that\n"); };<br>
389 <br>
390 8.1 Speculative Action<br>
391 <br>
392 &nbsp; Speculative actions occur when the reduction takes place during
393 the speculative parsing process. &nbsp;It is possible<br>
394 that the reduction will not be part of the final parse or that it will
395 occur a different number of times. &nbsp;For example:<br>
396 <br>
397 S: this | that;<br>
398 this: hi 'mom';<br>
399 that: ho 'dad';<br>
400 ho: 'hello' [ printf("ho\n"); ];<br>
401 hi: 'hello' [ printf("hi\n"); ];<br>
402 <br>
403 Will print both 'hi' and 'ho' when given the input 'hello dad' because
404 at the time hello is reduced, the following token is not known.<br>
405 <br>
406 8.2 Final Actions<br>
407 <br>
408 &nbsp; Final actions occur only when the reduction must be part of any
409 legal final parse (committed). &nbsp;It is possible to do final actions
410 during parsing or delay them till the entire parse tree is constructed
411 (see Options). &nbsp;Final actions are executed in order and in number
412 according the the single final unambiguous parse.<br>
413 <br>
414 S: A S 'b' | 'x';<br>
415 A: [ printf("speculative e-reduce A\n"); ] <br>
416 &nbsp;&nbsp; { printf("final e-reduce A\n"); };<br>
417 <br>
418 &nbsp; On input:<br>
419 <br>
420 xbbb<br>
421 <br>
422 &nbsp; Will produce:<br>
423 <br>
424 speculative e-reduce A<br>
425 final e-reduce A<br>
426 final e-reduce A<br>
427 final e-reduce A<br>
428 <br>
429 8.3 Embedded Actions<br>
430 <br>
431 &nbsp; Actions can be embedded into rule. These actions are executed
432 as if they were replaced with a synthetic production with a single null
433 rule containing the actions.&nbsp; For example:<br>
434 <br>
435 S: A { printf("X"); } B;<br>
436 A: 'a' { printf("a"); };<br>
437 B: 'b' { printf("b"); };<br>
438 <br>
439 &nbsp; On input:<br>
440 <br>
441 ab<br>
442 <br>
443 &nbsp; Will produce:<br>
444 <br>
445 aXb<br>
446 <br>
447 8.4 Pass Actions<br>
448 <br>
449 &nbsp; <span style="font-weight: bold;">DParser</span> supports
450 multiple pass compilation.&nbsp; The passes are declared at the top of
451 the grammar, and the actions are associated with individual rules.<br>
452 <br>
453 EXAMPLE<br>
454 <br>
455 ${pass sym for_all postorder}<br>
456 ${pass gen for_all postorder}<br>
457 <br>
458 translation_unit: statement*;<br>
459 <br>
460 statement<br>
461 &nbsp; : expression ';' {<br>
462 &nbsp;&nbsp;&nbsp; d_pass(${parser}, &amp;$n, ${pass sym});<br>
463 &nbsp;&nbsp;&nbsp; d_pass(${parser}, &amp;$n, ${pass gen});<br>
464 &nbsp; }<br>
465 &nbsp; ;<br>
466 <br>
467 expression :&nbsp; integer<br>
468 &nbsp; gen: { printf("gen integer\n"); }<br>
469 &nbsp; sym: { printf("sym integer\n"); }<br>
470 &nbsp; | expression '+' expression $right 2<br>
471 &nbsp; sym: { printf("sym +\n"); }<br>
472 &nbsp; ;<br>
473 <br>
474 &nbsp; A pass name then a colon indicate that the following action is
475 associated with a particular pass. Passes can be either <span
476 style="font-weight: bold;">for_all</span> or <span
477 style="font-weight: bold;">for_undefined</span> (which means that the
478 automatic traversal only applies to rules without actions defined for
479 this pass).&nbsp;&nbsp; Furthermore, passes can be <span
480 style="font-weight: bold;">postorder</span>, <span
481 style="font-weight: bold;">preorder</span>, and <span
482 style="font-weight: bold;">manual</span> (you have to call <span
483 style="font-weight: bold;">d_pass</span> yourself).&nbsp; Passes can
484 be initiated in the final action of any rule.<br>
485 <br>
486 8.5 Default Actions<br>
487 <br>
488 &nbsp; The special production "<span style="font-weight: bold;">_</span>"
489 can be defined with a single rule whose actions become the default when
490 no other action is specified.&nbsp; Default actions can be specified
491 for speculative, final and pass actions and apply to each seperately.<br>
492 <br>
493 EXAMPLE<br>
494 <br>
495 _: { printf("final action"); }<br>
496 &nbsp;&nbsp;&nbsp; gen: { printf("default gen action"); }<br>
497 &nbsp;&nbsp;&nbsp; sym: { printf("default sym action"); }<br>
498 &nbsp;&nbsp;&nbsp; ;<br>
499 <span style="font-weight: bold;"></span><br>
500 <span style="font-weight: bold;">9. Attributes and Action Specifiers</span><br>
501 <br>
502 9.1. &nbsp;Global State (<span style="font-weight: bold;">$g</span>)<br>
503 <br>
504 &nbsp; Global state is declared by <span style="font-weight: bold;">define</span>'ing<span
505 style="font-weight: bold;"> D_ParseNodeGlobals</span> (see the ANSI C
506 grammar for a similar declaration for symbols). Global state can be
507 accessed in any action with <span style="font-weight: bold;">$g</span>.
508 &nbsp;Because <span style="font-weight: bold;">DParser</span> handles
509 ambiguous parsing global state can be accessed on different speculative
510 parses. &nbsp;In the future automatic splitting of global state may be
511 implemented (if there is demand). Currently, the global state can be
512 copied and assigned to <span style="font-weight: bold;">$g</span> to
513 ensure that the changes made only effect subsequent speculative parses
514 derived from the particular parse.<br>
515 <br>
516 EXAMPLE<br>
517 <br>
518 &nbsp; [ $g = copy_globals($g);<br>
519 &nbsp;&nbsp;&nbsp; $g-&gt;my_variable = 1;<br>
520 &nbsp; ]<br>
521 <br>
522 &nbsp; The symbol table (Section 10) can be used to manage state
523 information safely for different speculative parses.<br>
524 <br>
525 9.2. Parse Node State<br>
526 <br>
527 &nbsp; Each parse node includes a set of system state variables and can
528 have a set of user-defined state variables. &nbsp;User defined parse
529 node state is declared by <span style="font-weight: bold;">define</span>'ing<span
530 style="font-weight: bold;"> D_ParseNodeUser</span>. &nbsp; Parse node
531 state is accessed with:<br>
532 <br>
533 &nbsp;<span style="font-weight: bold;">$#</span> - number of child nodes<br>
534 &nbsp;<span style="font-weight: bold;">$$</span> - user parse node
535 state
536 for parent node (non-terminal defined by the production)<br>
537 &nbsp;<span style="font-weight: bold;">$X</span> (where X is a number)
539 the user parse node state of element X of the production<br>
540 &nbsp;<span style="font-weight: bold;">$nX</span> - the system parse
541 node state of element X of the production<br>
542 <br>
543 &nbsp; The system parse node state is defined in <span
544 style="font-weight: bold;">dparse.h</span> which is installed with <span
545 style="font-weight: bold;">DParser</span>. &nbsp;It contains such
546 information as the symbol, the location of the parsed string, and
547 pointers to the start and end of the parsed string.<br>
548 <br>
549 9.3. Misc<br>
550 <br>
551 &nbsp; <span style="font-weight: bold;">${scope}</span> - the current
552 symbol table scope<br>
553 &nbsp; <span style="font-weight: bold;">${reject}</span> - in
554 speculative actions permits the current parse to be rejected<br>
555 <br>
556 <span style="font-weight: bold;">10. Symbol Table</span><br>
557 <br>
558 <span style="font-weight: bold;"></span>&nbsp; The symbol table can be
559 updated down different speculative paths while sharing the bulk of the
560 data. &nbsp;It defines the following functions in the file (dsymtab.h):<br>
561 &nbsp; <br>
562 struct D_Scope *new_D_Scope(struct D_Scope *st);<br>
563 struct D_Scope *enter_D_Scope(struct D_Scope *current, struct D_Scope
564 *scope);<br>
565 D_Sym *NEW_D_SYM(struct D_Scope *st, char *name, char *end);<br>
566 D_Sym *find_D_Sym(struct D_Scope *st, char *name, char *end);<br>
567 D_Sym *UPDATE_D_SYM(struct D_Scope *st, D_Sym *sym);<br>
568 D_Sym *current_D_Sym(struct D_Scope *st, D_Sym *sym);<br>
569 D_Sym *find_D_Sym_in_Scope(struct D_Scope *st, char *name, char *end);<br>
570 <br>
571 'new_D_Scope' creates a new scope below 'st' or NULL for a 'top level'
572 scope.&nbsp; 'enter_D_Scope' returns to a previous scoping level.&nbsp;
573 NOTE: do not simply assign ${scope} to a previous scope as any updated
574 symbol information will be lost.&nbsp; 'commit_D_Scope' can be used in
575 final actions to compress the update list for the top level scope and
576 improve efficiency.<br>
577 <br>
578 'find_D_Sym' finds the most current version of a symbol in a given
579 scope.&nbsp; 'UPDATE_D_SYM' updates the value of symbol (creates a
580 difference record on the current speculative parse path).&nbsp;
581 'current_D_Sym' is used to retrive the current version of a symbol, the
582 pointer to which may have been stored in some other attribute or
583 variable.&nbsp; Symbols with the same name should not be created in the
584 same scope.&nbsp; The function 'find_D_Sym_in_Scope' is provided to
585 detect this case. <br>
586 &nbsp; <br>
587 &nbsp; User data can be attached to symbols by <span
588 style="font-weight: bold;">define</span>'ing <span
589 style="font-weight: bold;">D_UserSym</span>. &nbsp;See the ANSI C
590 grammar for an example. <br>
591 <br>
592 Here is a full example of scope usage (from tests/g29.test.g):<br>
593 <br>
594 <span style="font-family: monospace;">#include &lt;stdio.h&gt;<br>
595 <br>
596 typedef struct My_Sym {<br>
597 &nbsp; int value;<br>
598 } My_Sym;<br>
599 #define D_UserSym My_Sym<br>
600 typedef struct My_ParseNode {<br>
601 &nbsp; int value;<br>
602 &nbsp; struct D_Scope *scope;<br>
603 } My_ParseNode;<br>
604 #define D_ParseNode_User My_ParseNode<br>
605 }<br>
606 <br>
607 translation_unit: statement*;<br>
608 &nbsp;<br>
609 statement <br>
610 &nbsp; : expression ';' <br>
611 &nbsp; { printf("%d\n", $0.value); }<br>
612 &nbsp; | '{' new_scope statement* '}'<br>
613 &nbsp; [ ${scope} = enter_D_Scope(${scope}, $n0.scope); ]<br>
614 &nbsp; { ${scope} = commit_D_Scope(${scope}); }<br>
615 &nbsp; ;<br>
616 <br>
617 new_scope: [ ${scope} = new_D_Scope(${scope}); ];<br>
618 <br>
619 expression <br>
620 &nbsp; : identifier ':' expression <br>
621 &nbsp; [ <br>
622 &nbsp;&nbsp;&nbsp; D_Sym *s;<br>
623 &nbsp;&nbsp;&nbsp; if (find_D_Sym_in_Scope(${scope}, $n0.start_loc.s,
624 $n0.end))<br>
625 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("duplicate identifier line %d\n",
626 $n0.start_loc.line);<br>
627 &nbsp;&nbsp;&nbsp; s = NEW_D_SYM(${scope}, $n0.start_loc.s, $n0.end);<br>
628 &nbsp;&nbsp;&nbsp; s-&gt;user.value = $2.value;<br>
629 &nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
630 &nbsp; ]<br>
631 &nbsp; | identifier '=' expression<br>
632 &nbsp; [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br>
633 &nbsp;&nbsp;&nbsp; s = UPDATE_D_SYM(${scope}, s);<br>
634 &nbsp;&nbsp;&nbsp; s-&gt;user.value = $2.value;<br>
635 &nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
636 &nbsp; ]<br>
637 &nbsp; | integer <br>
638 &nbsp; [ $$.value = atoi($n0.start_loc.s); ]<br>
639 &nbsp; | identifier <br>
640 &nbsp; [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br>
641 &nbsp;&nbsp;&nbsp; if (s)<br>
642 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
643 &nbsp; ]<br>
644 &nbsp; | expression '+' expression<br>
645 &nbsp; [ $$.value = $0.value + $1.value; ]<br>
646 &nbsp; ;<br>
647 <br>
648 integer: "-?([0-9]|0(x|X))[0-9]*(u|U|b|B|w|W|L|l)*" $term -1;<br>
649 identifier: "[a-zA-Z_][a-zA-Z_0-9]*";<br>
650 <br>
651 </span> <br>
652 <span style="font-weight: bold;">11. Whitespace</span><br>
653 <br>
654 &nbsp; Whitespace can be specified two ways: C function which can be
655 user-defined, or as a subgrammar. &nbsp;The default whitespace parser
657 compatible with C/C++ #line directives and comments. &nbsp;It can be
658 replaced with any user specified function as a parsing option (see
659 Options).<br>
660 <br>
661 &nbsp; Additionally, if the (optionally) reserved production <span
662 style="font-weight: bold;">whitespace</span> is defined, the
663 subgrammar
664 it defines will be used to consume whitespace for the main grammar.
665 &nbsp; This subgrammar can include normal actions.<br>
666 <br>
667 EXAMPLE<br>
668 <br>
669 <span style="font-weight: bold;"></span>S: 'a' 'b' 'c';<br>
670 whitespace: "[ \t\n]*";<br>
671 <br>
672 &nbsp; Whitespace can be accessed on a per parse node basis using the
673 functions: <span style="font-weight: bold;">d_ws_before</span> and <span
674 style="font-weight: bold;">d_ws_after</span>, which return the start
675 of the whitespace before <span style="font-weight: bold;">start_loc.s</span>
676 and after <span style="font-weight: bold;">end</span> respectively. <br>
677 <br>
678 <span style="font-weight: bold;">12. Ambiguities<br>
679 </span> <br>
680 &nbsp; Ambiguities are resolved automatically based on priorities and
681 associativities. &nbsp;In addition, when the other resolution
682 techniques
683 fail, user defined ambiguity resolution is possible. &nbsp; The default
684 ambiguity handler produces a fatal error on an unresolved
685 ambiguity.&nbsp; This behavior can be replaced with a user defined
686 resolvers the signature of which is provided in <span
687 style="font-weight: bold;">dparse.h</span>.<br>
688 <br>
689 &nbsp; If the <span style="font-weight: bold;">verbose_level</span>
690 flag is set, the default ambiguity handler will print out parenthesized
691 versions of the ambiguous parse trees.&nbsp;&nbsp; This may be of some
692 assistence in disambiguating a grammar.<br>
693 <br>
694 <span style="font-weight: bold;">13. Error Recovery<br>
695 <br>
696 &nbsp; DParser</span> implements an error recovery scheme appropriate
698 scannerless parsers. &nbsp;I haven't had time to investigate all the
699 prior work in this area, so I am not sure if it is novel. &nbsp;Suffice
700 for now that it is optional and works well with C/C++ like grammars.<br>
701 <br>
702 <span style="font-weight: bold;">14. Parsing Options</span><br>
703 <br>
704 &nbsp; Parser are instantiated with the function <span
705 style="font-style: italic;">new_D_Parser<span
706 style="font-weight: bold;">. </span></span>&nbsp;The resulting data
707 structure contains a number of user configurable options (see <span
708 style="font-weight: bold;">dparser.h</span>). &nbsp;These are provided
709 reasonable default values and include:<br>
710 <ul>
711 <li><span style="font-weight: bold;">initial_globals</span> - the
712 initial global variables accessable through <span
713 style="font-weight: bold;">$g</span></li>
714 <li><span style="font-weight: bold;"></span><span
715 style="font-weight: bold;">initial_skip_space_fn</span> - the initial
716 whitespace function</li>
717 <li><span style="font-family: monospace;"></span><span
718 style="font-weight: bold;">initial_scope</span> - the initial symbol
719 table scope</li>
720 <li><span style="font-weight: bold;"></span><span
721 style="font-weight: bold;">syntax_error_fn</span> - the function
722 called
723 on a syntax error</li>
724 <li><span style="font-weight: bold;">ambiguity_fn</span> - the
725 function called on an unresolved ambiguity</li>
726 <li><span style="font-weight: bold;">loc</span> - the initial
727 location
728 (set on an error).</li>
729 </ul>
730 In addtion, there are the following user configurables:<br>
731 <ul>
732 <li><span style="font-weight: bold;">sizeof_user_parse_node</span> -
733 the sizeof <span style="font-weight: bold;">D_ParseNodeUser</span></li>
734 <li><span style="font-weight: bold;">save_parse_tree</span> - whether
735 or not the parse tree should be save once the final actions have been
736 executed</li>
737 <li><span style="font-style: italic;"></span><span
738 style="font-weight: bold;">dont_fixup_internal_productions</span> - to
739 not convert the Kleene star into a variable number of children from a
740 tree of reductions</li>
741 <li><span style="font-weight: bold;">dont_merge_epsilon_trees</span>
743 to not automatically remove ambiguities which result from trees of
744 epsilon reductions without actions</li>
745 <li><span style="font-weight: bold;">dont_use_eagerness_for_disambiguation</span>
746 - do not use the rule that the longest parse which reduces to the same
747 token should be used to disambiguate parses.&nbsp; This rule is used to
748 handle the case (<span style="font-weight: bold;">if then else?</span>)
749 relatively cleanly.</li>
750 <li><span style="font-weight: bold;">dont_use_height_for_disambiguation</span>
751 - do not use the rule that the least deep parse which reduces to the
752 same token should be used to disabiguate parses.&nbsp; This rule is
753 used
754 to handle recursive grammars relatiively cleanly.</li>
755 <li><span style="font-weight: bold;">dont_compare_stacks</span> -
756 disables comparing stacks to handle certain exponential cases during
757 ambiguous operator priority resolution.&nbsp; This feature is
758 relatively
759 new, and this disables the new code.<br>
760 </li>
761 <li><span style="font-weight: bold;">commit_actions_interval</span> -
762 how often to commit final actions (0 is immediate, MAXINT is
763 essentially
764 not till the end of parsing)</li>
765 <li><span style="font-weight: bold;"></span><span
766 style="font-weight: bold;">error_recovery</span> - whether or not to
767 use error recovery (defaults ON)</li>
768 </ul>
769 An the following result values:<br>
770 <ul>
771 <li><span style="font-weight: bold;">syntax_errors</span> - how many
772 syntax errors (if <span style="font-weight: bold;">error_recovery</span>
773 was on)</li>
774 </ul>
775 This final value should be checked to see if parse was successful.<br>
776 <br>
777 <span style="font-weight: bold;">15. Grammar Grammar</span><br>
778 <br>
779 &nbsp; <span style="font-weight: bold;">DParser</span> is fully
780 self-hosted (would you trust a parser generator which wasn't?). &nbsp;
781 The grammar grammar is <a href="grammar.g">here (Grammar Grammar)</a>.
782 &nbsp;<span style="font-weight: bold;"></span><br>
783 <span style="font-weight: bold;"></span><br>
784 <span style="font-weight: bold;"><br>
785 </span><big><big><span style="font-weight: bold;"></span></big></big></div>
786 </div>
787 </body>
788 </html>