doc/_syntactic_analysis.txt

   1 Syntactic analysis
   2 ==================
   3
   4 DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT DRAFT
   5
   6 The syntactic analysis in lestes project comprises of a disambiguation manager
   7 ("manager"), a stack of bison-generated LALR(1) parsers ("parsers") run by
   8 the manager, and a hinter which provides parsers with type information.
   9
  10 As analysis of some parts of the source must be delayed, the analyzer is
  11 designed to be re-entrant. A new manager with its parsers and a hinter can be
  12 spawned at any time. Method bodies defined inside classes are good example of
  13 a constructs that cannot be analysed immediately (see [3.4.1/8]).
  14
  15 Disambiguation manager
  16 ----------------------
  17
  18 The main purpose of the manager is to supervise the process of disambiguation.
  19 The disambiguation itself is controlled by the parsers - they call the manager
  20 because they "know" where, and what kind of disambiguation needs to be run.
  21 The manager is responsible for actually starting a new parser whenever a
  22 disambiguation is needed.
  23
  24 The manager also does "packing", a method for delaying analysis of some chunks
  25 from the source. Again, packing itself is controlled from the parsers yet
  26 performed by the manager. For example, when the parser expects function body
  27 to follow, it makes the manager pack it and reads it as one token which
  28 contains the original token sequence inside. The body is extracted verbatim
  29 when its analysis can be conducted -- usually when the tables are filled so
  30 that the hinter can correctly supply type information.
  31
  32 Disambiguation
  33 --------------
  34
  35 Some parts of the C++ language are ambiguous, which means that the same part
  36 of a source (sequence of tokens) can have more than one interpretation. The
  37 norm states which of the possibilities is to be considered. Sometimes, the
  38 norm explicitly says that one of the possibilities takes precedence (e.g.
  39 declaration-statement before expression-statement [6.8/1]). In other cases,
  40 these precedences can be worked out.
  41
  42 Syntactic analysis in lestes builds on the assumption that all disambiguation
  43 rules are formulated using precedences.
  44
  45 The norm mandates that the disambiguation precedes parsing [6.8/3]. That
  46 implies the analyzer has to break out from parsing and process incoming token
  47 sequence sideways to find out the possible interpretations. If there is more
  48 than one of them, the analyzer has to choose the right one according to the
  49 norm.
  50
  51 Whenever parser reaches a state requiring disambiguation, it is already known
  52 which two (or more) ambiguous constructs are allowed to follow
  53 (declaration-statement and expression-statement from the example above). For
  54 each such construct a new "one-purpose" parser is started at the very position
  55 in the source token stream. Each construct has its own specialised parser.
  56
  57 The sole purpose of the newly started parser -- which runs while the main one
  58 is paused -- is to find out whether the following token sequence can match
  59 specific language construct. During this trial pass we do not modify
  60 identifier (declaration) tables and the only outcome of the pass is a logical
  61 value: Whether parsing succeeded (the construct matches), or not.
  62
  63 This way, all of the constructs in question are tried, starting at the same
  64 position in source. As all disambiguation rules are precedence-based we can
  65 run the specific parsers in order dictated by these rules and accept the first
  66 successful try as result of the disambiguation. When the result is computed,
  67 the main parser continues knowing which of the paths to choose. Technical
  68 details of implementing this within a bison-generated parser will be discussed
  69 later in implementation section.
  70
  71 Important thing to notice is that disambiguations can nest. In other words,
  72 while resolving an ambiguity, another ambiguity might arise. The concept is
  73 the same, however. We try parsing the possible subconstructs while the parser
  74 which was itself parsing some construct is not running. After resolving the
  75 sub-ambiguity (accepting the first matching successful try) the parser knows
  76 how to carry on.
  77
  78 Hinter
  79 ------
  80
  81 Unless ambiguities are handled completely in semantic analysis (which is not
  82 the case in lestes), a C++ syntactic analyzer cannot do without type
  83 information: In some contexts it needs to know whether an identifier denotes a
  84 type or a non-type. (Yet this is not enough to solve all ambiguities.) This
  85 information is provided by hinter.
  86
  87 The hinter processes all identifiers before they are passed to the parser. It
  88 searches the tables for matching declarations, following the rules described
  89 by the norm as Name lookup [3.4]. Set of found declarations is then bound to
  90 the identifier token and may be used if needed. Note that the search takes
  91 place at all times -- even when the results are not actually required.
  92
  93 Hinter must be tightly knit to both the semantic analysis and the parsers.
  94 The former is required because the lookup performed by the hinter must be run
  95 in correct scope. Qualified names, or elaborated specifiers imply the latter,
  96 as these influence parameters of the lookup process.
  97
  98 ===
  99 OLD OLD OLD OLD:
 100
 101 If it finishes without an syntax error, the disambiguation is said to succeed
 102 (or commit). Otherwise, the disambiguation fails (rolls back).
 103
 104 NO TIME:
 105 Whenever a (possibly nested) disambiguation is started, start hooks are run.
 106 When a disambiguation succeeds (commits), commit hooks are run.
 107 When a disambiguation fails (rolls back), rollback hooks are run.
 108
 109 NO TIME:
 110 When a disambiguation fails, all disambiguations that were nested in it (and
 111 committed on their own previously) are rolled back. Rollback hooks are run
 112 only once, though. (The nested disambiguations' undo actions are run, however.
 113 See manager_hooks.txt .)