MultiSource/Applications/d/manual.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
   2 <html>
   3 <head>
   4   <meta http-equiv="content-type"
   5  content="text/html; charset=ISO-8859-1">
   6   <title>manual.html</title>
   7 </head>
   8 <body>
   9 <div style="text-align: center;">&nbsp; <big><big><span
  10  style="font-weight: bold;">DParser Manual<br>
  11 </span></big></big>
  12 <div style="text-align: left;"><big><br>
  13 <br>
  14 <span style="font-weight: bold;">Contents</span><br>
  15 </big>
  16 <ol>
  17   <li>Installation</li>
  18   <li>Getting Started</li>
  19   <li>Comments<br>
  20   </li>
  21   <li>Productions</li>
  22   <li>Global Code<br>
  23   </li>
  24   <li>Terminals</li>
  25   <ol>
  26     <li>Strings</li>
  27     <li>Regular Expressions</li>
  28     <li>External (C) scanners</li>
  29     <li>Tokenizers</li>
  30     <li>Longest Match<br>
  31     </li>
  32   </ol>
  33   <li>Priorities and Associativity</li>
  34   <ol>
  35     <li>Token Priorities</li>
  36     <li>Operator Priorities</li>
  37     <li>Rule Priorities</li>
  38   </ol>
  39   <li>Actions</li>
  40   <ol>
  41     <li>Speculative Actions</li>
  42     <li>Final Actions</li>
  43     <li>Embedded<br>
  44     </li>
  45     <li>Pass Actions<br>
  46     </li>
  47     <li>Default Actions<br>
  48     </li>
  49   </ol>
  50   <li>Attributes and Action Specifiers</li>
  51   <ol>
  52     <li>Global State</li>
  53     <li>Parse Nodes</li>
  54     <li>Misc</li>
  55   </ol>
  56   <li>Symbol Table</li>
  57   <li>Whitespace</li>
  58   <li>Ambiguities</li>
  59   <li>Error Recovery</li>
  60   <li>Parsing Options<br>
  61   </li>
  62   <li>Grammar Grammar<br>
  63   </li>
  64 </ol>
  65 <span style="font-weight: bold;">1. Installation</span><br>
  66 &nbsp;&nbsp;&nbsp; <br>
  67 To build:
  68 'gmake'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
  69 (only available with source code package)<br>
  70 <br>
  71 To test: 'gmake
  72 test'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (only
  73 available with source code package)<br>
  74 <br>
  75 To install, 'gmake install'&nbsp;&nbsp;&nbsp; (binary or source code
  76 packages)<br>
  77 <br>
  78 <span style="font-weight: bold;">2. Getting Started<span
  79  style="font-weight: bold;"></span><span style="font-weight: bold;"></span></span><span
  80  style="font-weight: bold;"></span><br>
  81 <br>
  82 2.1. Create your grammar, for example, in the file "my.g":<br>
  83 &nbsp;&nbsp; <br>
  84 &nbsp;&nbsp; E: E '+' E | "[abc]";<br>
  85 &nbsp;&nbsp; <br>
  86 2.2. Convert grammar into parsing tables:<br>
  87 <br>
  88 &nbsp; % make_dparser -g my.g<br>
  89 <br>
  90 2.3. Create a driver program, for example, in the file
  91 "my.c":&nbsp;&nbsp; <br>
  92 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
  93 #include &lt;stdio.h&gt;<br>
  94 #include &lt;dparse.h&gt;<br>
  95 extern D_ParserTables parser_tables_gram;<br>
  96 int<br>
  97 main(int argc, char *argv[]) {<br>
  98 &nbsp; char s[256], *ss;<br>
  99 &nbsp; D_Parser *p = new_D_Parser(&amp;parser_tables_gram, 0);<br>
 100 &nbsp; if (fgets(s,255,stdin) &amp;&amp; dparse(p, s, strlen(s))
 101 &amp;&amp; !p-&gt;syntax_errors)<br>
 102 &nbsp;&nbsp;&nbsp; printf("success\n");<br>
 103 &nbsp; else<br>
 104 &nbsp;&nbsp;&nbsp; printf("failure\n");<br>
 105 }<br>
 106 <br>
 107 2.4. Compile:<br>
 108 <br>
 109 &nbsp; % cc -I/usr/local/include my.c my.g.d_parser.c -L/usr/local/lib
 110 -ldparse<br>
 111 &nbsp; <br>
 112 2.5. Run:<br>
 113 &nbsp;&nbsp; <br>
 114 &nbsp; % a.out<br>
 115 &nbsp; a=<br>
 116 &nbsp; syntax error, '' line 1<br>
 117 &nbsp; failure<br>
 118 &nbsp; %<br>
 119 &nbsp;&nbsp; <br>
 120 &nbsp; % a.out<br>
 121 &nbsp; a+b<br>
 122 &nbsp; success<br>
 123 &nbsp; %<br>
 124 <br>
 125 &nbsp;We'll come back to this example later.<br>
 126 <br>
 127 <span style="font-weight: bold;">3. Comments</span><br>
 128 <br>
 129 &nbsp; Grammars can include C/C++ style comments.<br>
 130 <br>
 131 EXAMPLE<br>
 132 <br>
 133 // My first grammar<br>
 134 &nbsp;&nbsp; E: E '+' E | "[abc]";<br>
 135 /* is this right? */<br>
 136 <br>
 137 <span style="font-weight: bold;">4. Productions</span><br>
 138 <br>
 139 &nbsp; 4.1. The first production is the root of your grammar (what you
 140 will be trying to parse).<br>
 141 &nbsp; 4.2. Productions start with the non-terminal being defined
 142 followed by a colon ':', a set of right hand sides seperated by '|'
 143 (or)
 144 consisting of elements (non-terminals or terminals).<br>
 145 &nbsp; 4.3. Elements can be grouped with parens '(', and the normal
 146 regular expression symbols can be used ('+' '*' '?' '|').<br>
 147 <br>
 148 EXAMPLE<br>
 149 <br>
 150 program: statements+ | &nbsp;comment* (function | &nbsp;procedure)?;<br>
 151 <br>
 152 &nbsp; 4.4. <span style="font-weight: bold;">NOTE:</span> Instead of
 153 using '[' ']' for optional elements we use the more familar and
 154 consistent '?' operator. &nbsp;The square brackets are reserved for
 155 speculative actions (below).<br>
 156 <br>
 157 <span style="font-weight: bold;">5. Global Code</span><br>
 158 <br>
 159 Global (or static) C code can be intermixed with productions by
 160 surrounding the code with brackets '{'.<br>
 161 <br>
 162 EXAMPLE<br>
 163 <br>
 164 { void dr_s() { printf("Dr. S\n"); }<br>
 165 S: 'the' 'cat' 'and' 'the' 'hat' { dr_s(); } | T;<br>
 166 { void twain() { printf("Mark Twain\n"); }<br>
 167 T: 'Huck' 'Finn' { twain(); };<br>
 168 <br>
 169 <span style="font-weight: bold;">6. Terminals</span><br>
 170 <br>
 171 &nbsp; 6.1. Strings terminals are surrounded with single quotes.
 172 &nbsp;For example:<br>
 173 <br>
 174 <span style="font-weight: bold;"></span>block: '{' statements* '}';<br>
 175 whileblock: 'while' '(' expression ')' block;<br>
 176 <br>
 177 &nbsp; 6.2. Regular expressions are surrounded with double quotes.
 178 &nbsp;For example:<br>
 179 <br>
 180 hexint: "(0x|0X)[0-9a-fA-F]+[uUlL]?";<br>
 181 <br>
 182 &nbsp; <span style="font-weight: bold;">NOTE: </span>only the simple
 183 regular expression operators are currently supported (v1.3). &nbsp;This
 184 include parens, square parens, ranges, and '*', '+', '?'. &nbsp; If you
 185 need something more, request a feature or implement it yourself; the
 186 code is in scan.c.<br>
 187 <br>
 188 &nbsp; 6.3. External (C) Scanners<br>
 189 <br>
 190 &nbsp; There are two types of external scanners, those which read a
 191 single terminal, and those which are global (called for every
 192 terminal).
 193 &nbsp;Here is an example of a scanner for a single terminal.
 194 &nbsp;Notice how it can be mixed with regular string terminals.<br>
 195 <br>
 196 {<br>
 197 extern char *ops;<br>
 198 extern void *ops_cache;<br>
 199 int ops_scan(char *ops, void *ops_cache, char **as,<br>
 200 &nbsp;&nbsp;&nbsp; int *col, int *line, unsigned short *op_assoc, int
 201 *op_priority);<br>
 202 }<br>
 203 <br>
 204 X: '1' (${scan ops_scan(ops, ops_cache)} '2')*;<br>
 205 <br>
 206 &nbsp; The user provides the 'ops_scan' function. &nbsp;This example is
 207 from tests/g4.test.g in the source distribution.<br>
 208 <br>
 209 &nbsp; The second type of scanner is a global scanner:<br>
 210 <br>
 211 {<br>
 212 #include "g7.test.g.d_parser.h"<br>
 213 int myscanner(char **s, int *col, int *line, unsigned short *symbol,<br>
 214 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; int *term_priority, unsigned short
 215 *op_assoc, int *op_priority)<br>
 216 {<br>
 217 &nbsp; if (**s == 'a') {<br>
 218 &nbsp;&nbsp;&nbsp; (*s)++;<br>
 219 &nbsp;&nbsp;&nbsp; *symbol = A;<br>
 220 &nbsp;&nbsp;&nbsp; return 1;<br>
 221 &nbsp; } else if (**s == 'b') {<br>
 222 &nbsp;&nbsp;&nbsp; (*s)++;<br>
 223 &nbsp;&nbsp;&nbsp; *symbol = BB;<br>
 224 &nbsp;&nbsp;&nbsp; return 1;<br>
 225 &nbsp; } else if (**s == 'c') {<br>
 226 &nbsp;&nbsp;&nbsp; (*s)++;<br>
 227 &nbsp;&nbsp;&nbsp; *symbol = CCC;<br>
 228 &nbsp;&nbsp;&nbsp; return 1;<br>
 229 &nbsp; } else if (**s == 'd') {<br>
 230 &nbsp;&nbsp;&nbsp; (*s)++;<br>
 231 &nbsp;&nbsp;&nbsp; *symbol = DDDD;<br>
 232 &nbsp;&nbsp;&nbsp; return 1;<br>
 233 &nbsp; } else<br>
 234 &nbsp;&nbsp;&nbsp; return 0;<br>
 235 }<br>
 236 ${scanner myscanner}<br>
 237 ${token A BB CCC DDDD}<br>
 238 <br>
 239 S: A (BB CCC)+ SS;<br>
 240 SS: DDDD;<br>
 241 <br>
 242 &nbsp; Notice how the you need to include the header file generated by <span
 243  style="font-weight: bold;">make_dparser</span> which contains the
 244 token
 245 definitions.<br>
 246 <br>
 247 6.4. Tokenizers<br>
 248 <br>
 249 &nbsp; Tokenizers are non-context sensitive global scanners which
 250 produce only one token for any given input string. &nbsp;Some
 251 programming languages (for example <span style="font-weight: bold;">C</span>)
 252 are easier to specify using a tokenizer because (for example) reserved
 253 words can be handled simply by lowering the terminal priority for
 254 identifiers.<br>
 255 <br>
 256 EXAMPLE:<br>
 257 <br>
 258 S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br>
 259 ident: "[a-z]+" $term -1;<br>
 260 <br>
 261 &nbsp; The sentence: <span style="font-weight: bold;">if ( while ) a;</span>
 262 is legal because <span style="font-weight: bold;">while</span> cannot
 263 appear at the start of <span style="font-weight: bold;">S</span> and
 264 so
 265 it doesn't conflict with the parsing of <span
 266  style="font-weight: bold;">while</span>
 267 as an <span style="font-weight: bold;">ident</span> in that position.
 268 &nbsp;However, if a tokenizer is specified, all tokens will be possible
 269 at each position and the sentense will produce a syntax error.<br>
 270 <br>
 271 &nbsp; <span style="font-weight: bold;">DParser</span> provides two
 272 ways to specify tokenizers: globally as an option (-T) to <span
 273  style="font-weight: bold;">make_dparser</span> and locally with a
 274 ${declare tokenize ...} specifier (see the ANSI C grammar for an
 275 example). &nbsp;The ${declare tokenize ...} declartion allows a
 276 tokenizer to be specified over a subset of the parsing states so that
 277 (for example) ANSI C could be a subgrammar of another larger grammar.
 278 &nbsp;Currently the parse states are not split so that the productions
 279 for the substates must be disjoint.<br>
 280 <br>
 281 6.5 Longest Match<br>
 282 <br>
 283 &nbsp; Longest match lexical ambiguity resolution is a technique used
 284 by seperate phase lexers to help decide (along<br>
 285 with lexical priorities) which single token to select for a given input
 286 string.&nbsp; It is used in the definition of ANSI-C, but not in C++
 287 because of a snafu in the definition of templates whereby templates of
 288 templates (List&lt;List &lt;Int&gt;&gt;) can end with the right shift
 289 token ('&gt;&gt;").&nbsp; Since <span style="font-weight: bold;">DParser</span>
 290 does not have a seperate lexical phase, it does not require longest
 291 match disambiguation, but provides it as an option.<br>
 292 <br>
 293 &nbsp; There are two ways to specify longest match disabiguation:
 294 globally as an option (-l) to <span style="font-weight: bold;">make_dparser</span>
 295 or locally with with a ${declare ... longest_match}.&nbsp; If global
 296 longest match disambiguation is <span style="font-weight: bold;">ON</span>,
 297 it can be locally disabled with {$declare ... all_matches} .&nbsp; As
 298 with Tokenizers above, local declarations operate on disjoint subsets
 299 of
 300 parsing states.<br>
 301 <br>
 302 <span style="font-weight: bold;">7. Priorities and Associativity<br>
 303 <br>
 304 </span>&nbsp; Priorities can very from MININT to MAXINT and are
 305 specified as integers. &nbsp;Associativity can take the values:<br>
 306 <br>
 307 assoc : '$unary_op_right' | '$unary_op_left' | '$binary_op_right'<br>
 308 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
 309 '$binary_op_left' | '$unary_right' | '$unary_left'<br>
 310 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
 311 '$binary_right' | '$binary_left' | '$right' | '$left' ;<br>
 312 <br>
 313 7.1. Token Prioritites<br>
 314 <br>
 315 &nbsp; Currently (v1.0) the automatically generated scanners use the <span
 316  style="font-style: italic;">longest match</span> rule so that, for
 317 example:<br>
 318 <br>
 319 OP: '&gt;' | '&gt;&gt;';<br>
 320 <br>
 321 will match the string '&gt;&gt;' only as '&gt;&gt;' instead of
 322 ambiguously as either '&gt;' or '&gt;&gt;'. &nbsp; A planned feature is
 323 to make this optional.<br>
 324 <br>
 325 &nbsp; Termininal priorities apply after the longest match string has
 326 been found and the terminal with the highest priority is selected.<br>
 327 They are introduced after a terminal by the specifier <span
 328  style="font-weight: bold;">$term</span>. &nbsp;We saw an example of
 329 token priorities with the definition of <span
 330  style="font-weight: bold;">ident</span>.<br>
 331 <br>
 332 EXAMPLE:<br>
 333 <br>
 334 S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br>
 335 ident: "[a-z]+" $term -1;<br>
 336 <br>
 337 7.2. Operator Priorities<br>
 338 <br>
 339 &nbsp; Operator priorities specify the priority of a operator symbol
 340 (either a terminal or a non-terminal). &nbsp;This corresponds to the <span
 341  style="font-style: italic;">yacc</span> or <span
 342  style="font-style: italic;">bison</span> <span
 343  style="font-weight: bold;">%left</span> etc. declaration.
 344 &nbsp;However, since <span style="font-weight: bold;">DParser</span>
 345 is
 346 doesn't require a global tokenizer, operator priorities and
 347 associativities are specified on the reduction which creates the token.
 348 &nbsp;Moreover, the associativity includes the operator usage as well
 349 since it cannot be infered from rule context. &nbsp;Possible operator
 350 associativies are:<br>
 351 <br>
 352 operator_assoc : '$unary_op_right' | '$unary_op_left' |
 353 '$binary_op_right'<br>
 354 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
 355 '$binary_op_left' | '$unary_right' | '$unary_left'<br>
 356 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |
 357 '$binary_right' | '$binary_left';<br>
 358 <br>
 359 EXAMPLE:<br>
 360 <br>
 361 E: ident op ident;<br>
 362 ident: '[a-z]+';<br>
 363 <span style="font-weight: bold;"></span>op: '*' $binary_op_left 2 |<br>
 364 &nbsp; &nbsp; &nbsp; '+' $binary_op_left 1;<br>
 365 <br>
 366 7.3. Rule Priorities<br>
 367 <br>
 368 &nbsp; Rule priorities specify the priority of the reduction itself and
 369 have the possible associativies:<br>
 370 <br>
 371 rule_assoc: '$right' | '$left';<br>
 372 <br>
 373 &nbsp; Rule and operator priorities can be intermixed and are
 374 interpreted at run time (<span style="font-weight: bold;">not</span>
 375 when
 376 the tables are built). &nbsp;This make it possible for user-defined
 377 scanners to return the associativities and priorities of tokens.<br>
 378 <br>
 379 <span style="font-weight: bold;">8. Actions<br>
 380 <br>
 381 </span>&nbsp; Actions are the bits of code which run when a reduction
 382 occurs.<br>
 383 <br>
 384 EXAMPLE<br>
 385 <br>
 386 S: this | that;<br>
 387 this: 'this' { printf("got this\n"); };<br>
 388 that: 'that' { printf("got that\n"); };<br>
 389 <br>
 390 8.1 Speculative Action<br>
 391 <br>
 392 &nbsp; Speculative actions occur when the reduction takes place during
 393 the speculative parsing process. &nbsp;It is possible<br>
 394 that the reduction will not be part of the final parse or that it will
 395 occur a different number of times. &nbsp;For example:<br>
 396 <br>
 397 S: this | that;<br>
 398 this: hi 'mom';<br>
 399 that: ho 'dad';<br>
 400 ho: 'hello' [ printf("ho\n"); ];<br>
 401 hi: 'hello' [ printf("hi\n"); ];<br>
 402 <br>
 403 Will print both 'hi' and 'ho' when given the input 'hello dad' because
 404 at the time hello is reduced, the following token is not known.<br>
 405 <br>
 406 8.2 Final Actions<br>
 407 <br>
 408 &nbsp; Final actions occur only when the reduction must be part of any
 409 legal final parse (committed). &nbsp;It is possible to do final actions
 410 during parsing or delay them till the entire parse tree is constructed
 411 (see Options). &nbsp;Final actions are executed in order and in number
 412 according the the single final unambiguous parse.<br>
 413 <br>
 414 S: A S 'b' | 'x';<br>
 415 A: [ printf("speculative e-reduce A\n"); ] <br>
 416 &nbsp;&nbsp; { printf("final e-reduce A\n"); };<br>
 417 <br>
 418 &nbsp; On input:<br>
 419 <br>
 420 xbbb<br>
 421 <br>
 422 &nbsp; Will produce:<br>
 423 <br>
 424 speculative e-reduce A<br>
 425 final e-reduce A<br>
 426 final e-reduce A<br>
 427 final e-reduce A<br>
 428 <br>
 429 8.3 Embedded Actions<br>
 430 <br>
 431 &nbsp; Actions can be embedded into rule. These actions are executed
 432 as if they were replaced with a synthetic production with a single null
 433 rule containing the actions.&nbsp; For example:<br>
 434 <br>
 435 S: A { printf("X"); } B;<br>
 436 A: 'a' { printf("a"); };<br>
 437 B: 'b' { printf("b"); };<br>
 438 <br>
 439 &nbsp; On input:<br>
 440 <br>
 441 ab<br>
 442 <br>
 443 &nbsp; Will produce:<br>
 444 <br>
 445 aXb<br>
 446 <br>
 447 8.4 Pass Actions<br>
 448 <br>
 449 &nbsp; <span style="font-weight: bold;">DParser</span> supports
 450 multiple pass compilation.&nbsp; The passes are declared at the top of
 451 the grammar, and the actions are associated with individual rules.<br>
 452 <br>
 453 EXAMPLE<br>
 454 <br>
 455 ${pass sym for_all postorder}<br>
 456 ${pass gen for_all postorder}<br>
 457 <br>
 458 translation_unit: statement*;<br>
 459 <br>
 460 statement<br>
 461 &nbsp; : expression ';' {<br>
 462 &nbsp;&nbsp;&nbsp; d_pass(${parser}, &amp;$n, ${pass sym});<br>
 463 &nbsp;&nbsp;&nbsp; d_pass(${parser}, &amp;$n, ${pass gen});<br>
 464 &nbsp; }<br>
 465 &nbsp; ;<br>
 466 <br>
 467 expression :&nbsp; integer<br>
 468 &nbsp; gen: { printf("gen integer\n"); }<br>
 469 &nbsp; sym: { printf("sym integer\n"); }<br>
 470 &nbsp; | expression '+' expression $right 2<br>
 471 &nbsp; sym: { printf("sym +\n"); }<br>
 472 &nbsp; ;<br>
 473 <br>
 474 &nbsp; A pass name then a colon indicate that the following action is
 475 associated with a particular pass. Passes can be either <span
 476  style="font-weight: bold;">for_all</span> or <span
 477  style="font-weight: bold;">for_undefined</span> (which means that the
 478 automatic traversal only applies to rules without actions defined for
 479 this pass).&nbsp;&nbsp; Furthermore, passes can be <span
 480  style="font-weight: bold;">postorder</span>, <span
 481  style="font-weight: bold;">preorder</span>, and <span
 482  style="font-weight: bold;">manual</span> (you have to call <span
 483  style="font-weight: bold;">d_pass</span> yourself).&nbsp; Passes can
 484 be initiated in the final action of any rule.<br>
 485 <br>
 486 8.5 Default Actions<br>
 487 <br>
 488 &nbsp; The special production "<span style="font-weight: bold;">_</span>"
 489 can be defined with a single rule whose actions become the default when
 490 no other action is specified.&nbsp; Default actions can be specified
 491 for speculative, final and pass actions and apply to each seperately.<br>
 492 <br>
 493 EXAMPLE<br>
 494 <br>
 495 _: { printf("final action"); }<br>
 496 &nbsp;&nbsp;&nbsp; gen: { printf("default gen action"); }<br>
 497 &nbsp;&nbsp;&nbsp; sym: { printf("default sym action"); }<br>
 498 &nbsp;&nbsp;&nbsp; ;<br>
 499 <span style="font-weight: bold;"></span><br>
 500 <span style="font-weight: bold;">9. Attributes and Action Specifiers</span><br>
 501 <br>
 502 9.1. &nbsp;Global State (<span style="font-weight: bold;">$g</span>)<br>
 503 <br>
 504 &nbsp; Global state is declared by <span style="font-weight: bold;">define</span>'ing<span
 505  style="font-weight: bold;"> D_ParseNodeGlobals</span> (see the ANSI C
 506 grammar for a similar declaration for symbols). Global state can be
 507 accessed in any action with <span style="font-weight: bold;">$g</span>.
 508 &nbsp;Because <span style="font-weight: bold;">DParser</span> handles
 509 ambiguous parsing global state can be accessed on different speculative
 510 parses. &nbsp;In the future automatic splitting of global state may be
 511 implemented (if there is demand). Currently, the global state can be
 512 copied and assigned to <span style="font-weight: bold;">$g</span> to
 513 ensure that the changes made only effect subsequent speculative parses
 514 derived from the particular parse.<br>
 515 <br>
 516 EXAMPLE<br>
 517 <br>
 518 &nbsp; [ $g = copy_globals($g);<br>
 519 &nbsp;&nbsp;&nbsp; $g-&gt;my_variable = 1;<br>
 520 &nbsp; ]<br>
 521 <br>
 522 &nbsp; The symbol table (Section 10) can be used to manage state
 523 information safely for different speculative parses.<br>
 524 <br>
 525 9.2. Parse Node State<br>
 526 <br>
 527 &nbsp; Each parse node includes a set of system state variables and can
 528 have a set of user-defined state variables. &nbsp;User defined parse
 529 node state is declared by <span style="font-weight: bold;">define</span>'ing<span
 530  style="font-weight: bold;"> D_ParseNodeUser</span>. &nbsp; Parse node
 531 state is accessed with:<br>
 532 <br>
 533 &nbsp;<span style="font-weight: bold;">$#</span> - number of child nodes<br>
 534 &nbsp;<span style="font-weight: bold;">$$</span> - user parse node
 535 state
 536 for parent node (non-terminal defined by the production)<br>
 537 &nbsp;<span style="font-weight: bold;">$X</span> (where X is a number)
 538 -
 539 the user parse node state of element X of the production<br>
 540 &nbsp;<span style="font-weight: bold;">$nX</span> - the system parse
 541 node state of element X of the production<br>
 542 <br>
 543 &nbsp; The system parse node state is defined in <span
 544  style="font-weight: bold;">dparse.h</span> which is installed with <span
 545  style="font-weight: bold;">DParser</span>. &nbsp;It contains such
 546 information as the symbol, the location of the parsed string, and
 547 pointers to the start and end of the parsed string.<br>
 548 <br>
 549 9.3. Misc<br>
 550 <br>
 551 &nbsp; <span style="font-weight: bold;">${scope}</span> - the current
 552 symbol table scope<br>
 553 &nbsp; <span style="font-weight: bold;">${reject}</span> - in
 554 speculative actions permits the current parse to be rejected<br>
 555 <br>
 556 <span style="font-weight: bold;">10. Symbol Table</span><br>
 557 <br>
 558 <span style="font-weight: bold;"></span>&nbsp; The symbol table can be
 559 updated down different speculative paths while sharing the bulk of the
 560 data. &nbsp;It defines the following functions in the file (dsymtab.h):<br>
 561 &nbsp; <br>
 562 struct D_Scope *new_D_Scope(struct D_Scope *st);<br>
 563 struct D_Scope *enter_D_Scope(struct D_Scope *current, struct D_Scope
 564 *scope);<br>
 565 D_Sym *NEW_D_SYM(struct D_Scope *st, char *name, char *end);<br>
 566 D_Sym *find_D_Sym(struct D_Scope *st, char *name, char *end);<br>
 567 D_Sym *UPDATE_D_SYM(struct D_Scope *st, D_Sym *sym);<br>
 568 D_Sym *current_D_Sym(struct D_Scope *st, D_Sym *sym);<br>
 569 D_Sym *find_D_Sym_in_Scope(struct D_Scope *st, char *name, char *end);<br>
 570 <br>
 571 'new_D_Scope' creates a new scope below 'st' or NULL for a 'top level'
 572 scope.&nbsp; 'enter_D_Scope' returns to a previous scoping level.&nbsp;
 573 NOTE: do not simply assign ${scope} to a previous scope as any updated
 574 symbol information will be lost.&nbsp; 'commit_D_Scope' can be used in
 575 final actions to compress the update list for the top level scope and
 576 improve efficiency.<br>
 577 <br>
 578 'find_D_Sym' finds the most current version of a symbol in a given
 579 scope.&nbsp; 'UPDATE_D_SYM' updates the value of symbol (creates a
 580 difference record on the current speculative parse path).&nbsp;
 581 'current_D_Sym' is used to retrive the current version of a symbol, the
 582 pointer to which may have been stored in some other attribute or
 583 variable.&nbsp; Symbols with the same name should not be created in the
 584 same scope.&nbsp; The function 'find_D_Sym_in_Scope' is provided to
 585 detect this case. <br>
 586 &nbsp; <br>
 587 &nbsp; User data can be attached to symbols by <span
 588  style="font-weight: bold;">define</span>'ing <span
 589  style="font-weight: bold;">D_UserSym</span>. &nbsp;See the ANSI C
 590 grammar for an example. <br>
 591 <br>
 592 Here is a full example of scope usage (from tests/g29.test.g):<br>
 593 <br>
 594 <span style="font-family: monospace;">#include &lt;stdio.h&gt;<br>
 595 <br>
 596 typedef struct My_Sym {<br>
 597 &nbsp; int value;<br>
 598 } My_Sym;<br>
 599 #define D_UserSym My_Sym<br>
 600 typedef struct My_ParseNode {<br>
 601 &nbsp; int value;<br>
 602 &nbsp; struct D_Scope *scope;<br>
 603 } My_ParseNode;<br>
 604 #define D_ParseNode_User My_ParseNode<br>
 605 }<br>
 606 <br>
 607 translation_unit: statement*;<br>
 608 &nbsp;<br>
 609 statement <br>
 610 &nbsp; : expression ';' <br>
 611 &nbsp; { printf("%d\n", $0.value); }<br>
 612 &nbsp; | '{' new_scope statement* '}'<br>
 613 &nbsp; [ ${scope} = enter_D_Scope(${scope}, $n0.scope); ]<br>
 614 &nbsp; { ${scope} = commit_D_Scope(${scope}); }<br>
 615 &nbsp; ;<br>
 616 <br>
 617 new_scope: [ ${scope} = new_D_Scope(${scope}); ];<br>
 618 <br>
 619 expression <br>
 620 &nbsp; : identifier ':' expression <br>
 621 &nbsp; [ <br>
 622 &nbsp;&nbsp;&nbsp; D_Sym *s;<br>
 623 &nbsp;&nbsp;&nbsp; if (find_D_Sym_in_Scope(${scope}, $n0.start_loc.s,
 624 $n0.end))<br>
 625 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; printf("duplicate identifier line %d\n",
 626 $n0.start_loc.line);<br>
 627 &nbsp;&nbsp;&nbsp; s = NEW_D_SYM(${scope}, $n0.start_loc.s, $n0.end);<br>
 628 &nbsp;&nbsp;&nbsp; s-&gt;user.value = $2.value;<br>
 629 &nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
 630 &nbsp; ]<br>
 631 &nbsp; | identifier '=' expression<br>
 632 &nbsp; [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br>
 633 &nbsp;&nbsp;&nbsp; s = UPDATE_D_SYM(${scope}, s);<br>
 634 &nbsp;&nbsp;&nbsp; s-&gt;user.value = $2.value;<br>
 635 &nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
 636 &nbsp; ]<br>
 637 &nbsp; | integer <br>
 638 &nbsp; [ $$.value = atoi($n0.start_loc.s); ]<br>
 639 &nbsp; | identifier <br>
 640 &nbsp; [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br>
 641 &nbsp;&nbsp;&nbsp; if (s)<br>
 642 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $$.value = s-&gt;user.value;<br>
 643 &nbsp; ]<br>
 644 &nbsp; | expression '+' expression<br>
 645 &nbsp; [ $$.value = $0.value + $1.value; ]<br>
 646 &nbsp; ;<br>
 647 <br>
 648 integer: "-?([0-9]|0(x|X))[0-9]*(u|U|b|B|w|W|L|l)*" $term -1;<br>
 649 identifier: "[a-zA-Z_][a-zA-Z_0-9]*";<br>
 650 <br>
 651 </span> <br>
 652 <span style="font-weight: bold;">11. Whitespace</span><br>
 653 <br>
 654 &nbsp; Whitespace can be specified two ways: C function which can be
 655 user-defined, or as a subgrammar. &nbsp;The default whitespace parser
 656 is
 657 compatible with C/C++ #line directives and comments. &nbsp;It can be
 658 replaced with any user specified function as a parsing option (see
 659 Options).<br>
 660 <br>
 661 &nbsp; Additionally, if the (optionally) reserved production <span
 662  style="font-weight: bold;">whitespace</span> is defined, the
 663 subgrammar
 664 it defines will be used to consume whitespace for the main grammar.
 665 &nbsp; This subgrammar can include normal actions.<br>
 666 <br>
 667 EXAMPLE<br>
 668 <br>
 669 <span style="font-weight: bold;"></span>S: 'a' 'b' 'c';<br>
 670 whitespace: "[ \t\n]*";<br>
 671 <br>
 672 &nbsp; Whitespace can be accessed on a per parse node basis using the
 673 functions: <span style="font-weight: bold;">d_ws_before</span> and <span
 674  style="font-weight: bold;">d_ws_after</span>, which return the start
 675 of the whitespace before <span style="font-weight: bold;">start_loc.s</span>
 676 and after <span style="font-weight: bold;">end</span> respectively. <br>
 677 <br>
 678 <span style="font-weight: bold;">12. Ambiguities<br>
 679 </span> <br>
 680 &nbsp; Ambiguities are resolved automatically based on priorities and
 681 associativities. &nbsp;In addition, when the other resolution
 682 techniques
 683 fail, user defined ambiguity resolution is possible. &nbsp; The default
 684 ambiguity handler produces a fatal error on an unresolved
 685 ambiguity.&nbsp; This behavior can be replaced with a user defined
 686 resolvers the signature of which is provided in <span
 687  style="font-weight: bold;">dparse.h</span>.<br>
 688 <br>
 689 &nbsp; If the <span style="font-weight: bold;">verbose_level</span>
 690 flag is set, the default ambiguity handler will print out parenthesized
 691 versions of the ambiguous parse trees.&nbsp;&nbsp; This may be of some
 692 assistence in disambiguating a grammar.<br>
 693 <br>
 694 <span style="font-weight: bold;">13. Error Recovery<br>
 695 <br>
 696 &nbsp; DParser</span> implements an error recovery scheme appropriate
 697 to
 698 scannerless parsers. &nbsp;I haven't had time to investigate all the
 699 prior work in this area, so I am not sure if it is novel. &nbsp;Suffice
 700 for now that it is optional and works well with C/C++ like grammars.<br>
 701 <br>
 702 <span style="font-weight: bold;">14. Parsing Options</span><br>
 703 <br>
 704 &nbsp; Parser are instantiated with the function <span
 705  style="font-style: italic;">new_D_Parser<span
 706  style="font-weight: bold;">. </span></span>&nbsp;The resulting data
 707 structure contains a number of user configurable options (see <span
 708  style="font-weight: bold;">dparser.h</span>). &nbsp;These are provided
 709 reasonable default values and include:<br>
 710 <ul>
 711   <li><span style="font-weight: bold;">initial_globals</span> - the
 712 initial global variables accessable through <span
 713  style="font-weight: bold;">$g</span></li>
 714   <li><span style="font-weight: bold;"></span><span
 715  style="font-weight: bold;">initial_skip_space_fn</span> - the initial
 716 whitespace function</li>
 717   <li><span style="font-family: monospace;"></span><span
 718  style="font-weight: bold;">initial_scope</span> - the initial symbol
 719 table scope</li>
 720   <li><span style="font-weight: bold;"></span><span
 721  style="font-weight: bold;">syntax_error_fn</span> - the function
 722 called
 723 on a syntax error</li>
 724   <li><span style="font-weight: bold;">ambiguity_fn</span> - the
 725 function called on an unresolved ambiguity</li>
 726   <li><span style="font-weight: bold;">loc</span> - the initial
 727 location
 728 (set on an error).</li>
 729 </ul>
 730 In addtion, there are the following user configurables:<br>
 731 <ul>
 732   <li><span style="font-weight: bold;">sizeof_user_parse_node</span> -
 733 the sizeof <span style="font-weight: bold;">D_ParseNodeUser</span></li>
 734   <li><span style="font-weight: bold;">save_parse_tree</span> - whether
 735 or not the parse tree should be save once the final actions have been
 736 executed</li>
 737   <li><span style="font-style: italic;"></span><span
 738  style="font-weight: bold;">dont_fixup_internal_productions</span> - to
 739 not convert the Kleene star into a variable number of children from a
 740 tree of reductions</li>
 741   <li><span style="font-weight: bold;">dont_merge_epsilon_trees</span>
 742 -
 743 to not automatically remove ambiguities which result from trees of
 744 epsilon reductions without actions</li>
 745   <li><span style="font-weight: bold;">dont_use_eagerness_for_disambiguation</span>
 746 - do not use the rule that the longest parse which reduces to the same
 747 token should be used to disambiguate parses.&nbsp; This rule is used to
 748 handle the case (<span style="font-weight: bold;">if then else?</span>)
 749 relatively cleanly.</li>
 750   <li><span style="font-weight: bold;">dont_use_height_for_disambiguation</span>
 751 - do not use the rule that the least deep parse which reduces to the
 752 same token should be used to disabiguate parses.&nbsp; This rule is
 753 used
 754 to handle recursive grammars relatiively cleanly.</li>
 755   <li><span style="font-weight: bold;">dont_compare_stacks</span> -
 756 disables comparing stacks to handle certain exponential cases during
 757 ambiguous operator priority resolution.&nbsp; This feature is
 758 relatively
 759 new, and this disables the new code.<br>
 760   </li>
 761   <li><span style="font-weight: bold;">commit_actions_interval</span> -
 762 how often to commit final actions (0 is immediate, MAXINT is
 763 essentially
 764 not till the end of parsing)</li>
 765   <li><span style="font-weight: bold;"></span><span
 766  style="font-weight: bold;">error_recovery</span> - whether or not to
 767 use error recovery (defaults ON)</li>
 768 </ul>
 769 An the following result values:<br>
 770 <ul>
 771   <li><span style="font-weight: bold;">syntax_errors</span> - how many
 772 syntax errors (if <span style="font-weight: bold;">error_recovery</span>
 773 was on)</li>
 774 </ul>
 775 This final value should be checked to see if parse was successful.<br>
 776 <br>
 777 <span style="font-weight: bold;">15. Grammar Grammar</span><br>
 778 <br>
 779 &nbsp; <span style="font-weight: bold;">DParser</span> is fully
 780 self-hosted (would you trust a parser generator which wasn't?). &nbsp;
 781 The grammar grammar is <a href="grammar.g">here (Grammar Grammar)</a>.
 782 &nbsp;<span style="font-weight: bold;"></span><br>
 783 <span style="font-weight: bold;"></span><br>
 784 <span style="font-weight: bold;"><br>
 785 </span><big><big><span style="font-weight: bold;"></span></big></big></div>
 786 </div>
 787 </body>
 788 </html>