<!--===- docs/Semantics.md

   Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   See https://llvm.org/LICENSE.txt for license information.
   SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-->

# Semantic Analysis

```{contents}
---
local:
---
```

The semantic analysis pass takes a parse tree for a syntactically correct
Fortran program and determines whether the program is legal by enforcing
the constraints of the language.

The input is a parse tree with a `Program` node at the root, together with
a "cooked" character stream: a contiguous stream of characters
containing a normalized form of the Fortran source.

If the program is not legal, the result of the semantic pass is a list of
errors associated with the program.

If the program is legal, the semantic pass produces a (possibly modified)
parse tree for the semantically correct program with each name mapped to a symbol
and each expression fully analyzed.

All user errors are detected either prior to or during semantic analysis.
After it completes successfully, the program should compile with no error messages.
There may still be warnings or informational messages.

## Phases of Semantic Analysis

1. [Validate labels](#validate-labels) -
   Check all constraints on labels and branches
2. [Rewrite DO loops](#rewrite-do-loops) -
   Convert all occurrences of `LabelDoStmt` to `DoConstruct`
3. [Name resolution](#name-resolution) -
   Analyze names and declarations, build a tree of Scopes containing Symbols,
   and fill in the `Name::symbol` data member in the parse tree
4. [Rewrite parse tree](#rewrite-parse-tree) -
   Fix incorrect parses based on symbol information
5. [Expression analysis](#expression-analysis) -
   Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and
   `Variable::typedExpr` with analyzed expressions; fix incorrect parses
   based on the result of this analysis
6. [Statement semantics](#statement-semantics) -
   Perform remaining semantic checks on the execution parts of subprograms
7. [Write module files](#write-module-files) -
   If no errors have occurred, write out `.mod` files for modules and submodules

If phase 1 or phase 2 encounters an error in any program unit,
compilation terminates. Otherwise, phases 3-6 are all performed even if
errors occur.
Module files are written (phase 7) only if there are no errors.

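The error-handling policy above can be sketched as a small driver loop. This is an illustrative model rather than flang's actual driver: `Phase` and `RunPhases` are hypothetical names, and each phase is reduced to a callback that returns `false` if it reported an error.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one semantic phase: a name plus a callback that
// returns true on success and false if it reported any error.
struct Phase {
  std::string name;
  std::function<bool()> run;
};

// Runs the seven phases in order (index 0 is phase 1) and returns the
// names of the phases that actually ran.
// - An error in phase 1 or 2 terminates the run.
// - Phases 3-6 are performed even if errors occur.
// - Phase 7 (module files) runs only if no phase reported an error.
std::vector<std::string> RunPhases(const std::vector<Phase> &phases) {
  assert(phases.size() == 7 && "this sketch models exactly seven phases");
  std::vector<std::string> ran;
  bool anyErrors = false;
  for (std::size_t i = 0; i < phases.size(); ++i) {
    if (i == 6 && anyErrors) {
      break; // do not write module files after errors
    }
    ran.push_back(phases[i].name);
    if (!phases[i].run()) {
      anyErrors = true;
      if (i < 2) {
        break; // label validation or DO rewriting failed
      }
    }
  }
  return ran;
}
```

For example, if only phase 3 reports an error, this model still runs phases 4-6 and skips phase 7.
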
### Validate labels

Perform semantic checks related to labels and branches:
- check that any labels that are referenced are defined and in scope
- check branches into loop bodies
- check that labeled `DO` loops are properly nested
- check labels in data transfer statements

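As a rough illustration of the first check in the list, the sketch below reports referenced labels that have no definition. It is a deliberately simplified model, not flang's implementation: `UndefinedLabels` is a hypothetical helper, labels are plain integers, and the scoping rules that the real check enforces are ignored.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical helper: returns the referenced labels that are not defined,
// in ascending order and without duplicates. Scoping is not modeled.
std::vector<int> UndefinedLabels(const std::set<int> &defined,
                                 const std::vector<int> &referenced) {
  std::set<int> missing;
  for (int label : referenced) {
    // A Fortran statement label has one to five digits, not all zero.
    assert(label >= 1 && label <= 99999);
    if (defined.count(label) == 0) {
      missing.insert(label);
    }
  }
  return std::vector<int>(missing.begin(), missing.end());
}
```

With labels 10 and 20 defined and references to 20, 30, and 10, this model reports 30 as the only undefined label.
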
### Rewrite DO loops

This phase normalizes the parse tree by replacing every unstructured (labeled)
`DO` loop, parsed as a `LabelDoStmt`, with a `DO` construct (`DoConstruct`).

### Name resolution

The name resolution phase walks the parse tree and constructs the symbol table.

The symbol table consists of a tree of `Scope` objects rooted at the global scope.
The global scope is owned by the `SemanticsContext` object.
It contains a `Scope` for each program unit in the compilation.

Each `Scope` in the scope tree contains child scopes representing other scopes
lexically nested in it.
Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names
declared in that scope. (All names in the symbol table are represented as
`CharBlock` objects, i.e. as substrings of the cooked character stream.)

All `Symbol` objects are owned by the symbol table data structures.
Outside of the symbol table classes they should be accessed as `Symbol *` or
`Symbol &`, because they cannot be created, copied, or moved there.
The `Symbol` class has functions and data common to all symbols, and a
`details` field that contains more information specific to that kind of symbol.
Many symbols also have types, represented by `DeclTypeSpec`.
Types are also owned by scopes.

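The ownership structure described above can be sketched with a few lines of C++. This is a minimal model under stated assumptions, not flang's real classes: `Scope` and `Symbol` here are simplified stand-ins, `std::string` keys take the place of `CharBlock`, and the member functions `MakeChild`, `Declare`, and `Find` are hypothetical. The lookup through enclosing scopes is an added simplification for illustration. A caller would create a global scope, add a child with `MakeChild()`, declare names with `Declare()`, and hold the results only as `Symbol &` or `Symbol *`.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a symbol: owned by its scope, never copied or
// moved, so other code refers to it only by pointer or reference.
struct Symbol {
  explicit Symbol(std::string n) : name{std::move(n)} {}
  Symbol(const Symbol &) = delete;
  Symbol &operator=(const Symbol &) = delete;
  std::string name;
};

// Hypothetical stand-in for a scope in the scope tree.
class Scope {
public:
  explicit Scope(Scope *parent = nullptr) : parent_{parent} {}

  // Creates a child scope that is owned by this scope.
  Scope &MakeChild() {
    children_.push_back(std::make_unique<Scope>(this));
    return *children_.back();
  }

  // Declares `name` in this scope, or returns the symbol already declared.
  Symbol &Declare(const std::string &name) {
    assert(!name.empty());
    auto result = symbols_.emplace(std::piecewise_construct,
        std::forward_as_tuple(name), std::forward_as_tuple(name));
    return result.first->second;
  }

  // Looks in this scope first, then in each enclosing scope (an assumption
  // of this sketch). Returns nullptr if the name is not found.
  Symbol *Find(const std::string &name) {
    for (Scope *scope = this; scope != nullptr; scope = scope->parent_) {
      auto it = scope->symbols_.find(name);
      if (it != scope->symbols_.end()) {
        return &it->second;
      }
    }
    return nullptr;
  }

private:
  Scope *parent_;
  std::vector<std::unique_ptr<Scope>> children_;
  std::map<std::string, Symbol> symbols_;
};
```
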
Name resolution happens on the parse tree in this order:
1. Process the specification of a program unit:
   1. Create a new scope for the unit
   2. Create a symbol for each contained subprogram containing just the name
   3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.)
   4. Process the specification part of the unit
2. Apply the same process recursively to nested subprograms
3. Process the execution part of the program unit
4. Process the execution parts of nested subprograms recursively

After the completion of this phase, every `Name` corresponds to a `Symbol`
unless an error occurred.

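One way to read the ordering of steps 1-4 is as two recursive passes over a program unit: all specifications first, then all execution parts. The sketch below models only that ordering. `Unit`, `ResolveSpecs`, `ResolveExecs`, and `ResolveUnit` are hypothetical names, and the actual work of each step is replaced by an entry in a log.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a program unit and its nested subprograms.
struct Unit {
  std::string name;
  std::vector<Unit> nested;
};

// Steps 1 and 2: process the specification of `unit`, then apply the same
// process recursively to its nested subprograms.
void ResolveSpecs(const Unit &unit, std::vector<std::string> &log) {
  log.push_back("spec:" + unit.name);
  for (const Unit &child : unit.nested) {
    ResolveSpecs(child, log);
  }
}

// Steps 3 and 4: process the execution part of `unit`, then the execution
// parts of its nested subprograms, recursively.
void ResolveExecs(const Unit &unit, std::vector<std::string> &log) {
  log.push_back("exec:" + unit.name);
  for (const Unit &child : unit.nested) {
    ResolveExecs(child, log);
  }
}

// Runs both passes and returns the order in which the parts were visited.
std::vector<std::string> ResolveUnit(const Unit &unit) {
  std::vector<std::string> log;
  ResolveSpecs(unit, log);
  ResolveExecs(unit, log);
  assert(log.size() % 2 == 0); // each unit is visited once per pass
  return log;
}
```

For a main program `p` with internal subprograms `f` and `g`, this model visits `spec:p`, `spec:f`, `spec:g`, `exec:p`, `exec:f`, `exec:g`, so every specification part has been seen before any execution part is processed.
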
### Rewrite parse tree

The parser cannot build a completely correct parse tree without symbol information.
This phase corrects mis-parses based on symbols:
- Array element assignments may be parsed as statement functions: `a(i) = ...`
- Namelist group names without `NML=` may be parsed as format expressions
- A file unit number expression may be parsed as a character variable

This phase also produces an internal error if it finds a `Name` that does not
have its `symbol` data member filled in. This error is suppressed if other
errors have occurred because in that case a `Name` corresponding to an erroneous
symbol may not be resolved.

### Expression analysis

Expressions that occur in the specification part (for example, initial values,
array bounds, and type parameters) are analyzed during name resolution.
Any remaining expressions are analyzed in this phase.

For each `Variable` and top-level `Expr` (i.e. one that is not nested below
another `Expr` in the parse tree) the analyzed form of the expression is saved
in the `typedExpr` data member. After this phase has completed, the analyzed
expression can be accessed using `semantics::GetExpr()`.

This phase also corrects mis-parses based on the result of expression analysis:
- An expression like `a(b)` is parsed as a function reference but may need
  to be rewritten to an array element reference (if `a` is an object entity)
  or to a structure constructor (if `a` is a derived type)
- An expression like `a(b:c)` is parsed as an array section but may need to be
  rewritten as a substring if `a` is an object with type `CHARACTER`

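The first rewrite in the list amounts to a decision based on what the name `a` denotes. The sketch below shows that decision in isolation. It is not flang's code: `SymbolKind`, `ParenRefKind`, and `ClassifyParenRef` are hypothetical names, and the real analysis works on symbols and parse tree nodes rather than on a three-valued enumeration.

```cpp
#include <cassert>

// Hypothetical, simplified classification of what the name `a` denotes.
enum class SymbolKind { Procedure, Object, DerivedType };

// Hypothetical labels for the possible meanings of `a(b)`.
enum class ParenRefKind {
  FunctionReference,
  ArrayElement,
  StructureConstructor
};

// Decides how an expression `a(b)`, initially parsed as a function
// reference, should be interpreted once the kind of `a` is known.
ParenRefKind ClassifyParenRef(SymbolKind kind) {
  switch (kind) {
  case SymbolKind::Object:
    return ParenRefKind::ArrayElement; // rewrite to an array element
  case SymbolKind::DerivedType:
    return ParenRefKind::StructureConstructor; // rewrite to a constructor
  case SymbolKind::Procedure:
    return ParenRefKind::FunctionReference; // the original parse stands
  }
  assert(false && "unhandled SymbolKind");
  return ParenRefKind::FunctionReference;
}
```

Only the `Procedure` case leaves the parse tree unchanged; the other two cases correspond to the rewrites described in the first bullet.
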
### Statement semantics

Multiple independent checkers driven by the `SemanticsVisitor` framework
perform the remaining semantic checks.
By this phase, all names and expressions that can be successfully resolved
have been resolved. However, if errors occurred earlier, there may be names
without symbols or expressions without an analyzed form.

### Initialization processing

Fortran supports many means of specifying static initializers for variables,
object pointers, and procedure pointers, as well as default initializers for
derived type object components, pointers, and type parameters.

Non-pointer static initializers of variables and named constants are
scanned, analyzed, folded, scalar-expanded, and validated as they are
traversed during declaration processing in name resolution.
So are the default initializers of non-pointer object components in
non-parameterized derived types.
Named constant arrays with implied shapes take their actual shape from
the initialization expression.

Default initializers of non-pointer components and type parameters
in distinct parameterized derived type instantiations are processed
similarly as those instances are created, because their expressions may
depend on the values of type parameters.
Error messages produced during parameterized derived type instantiation
are decorated with contextual attachments that point to the declarations
or other type specifications that caused the instantiation.

Static initializations in `DATA` statements are collected, validated,
and converted into static initialization in the symbol table, as if
the initialized objects had used the newer style of static initialization
in their entity declarations.

All statically initialized pointers, and default component initializers for
pointers, are processed late in name resolution after all specification parts
have been traversed.
This allows for forward references even in the presence of `IMPLICIT NONE`.
Object pointer initializers in parameterized derived type instantiations are
also cloned and folded at this late stage.
Validation of pointer initializers takes place later in declaration
checking (below).

### Declaration checking

Whenever possible, the enforcement of constraints and "shalls" pertaining to
properties of symbols is deferred to a single read-only pass over the symbol table
that takes place after all name resolution and typing is complete.

### Write module files

Separate compilation information is written out on successful compilation
of modules and submodules. The resulting module files are used as input to
name resolution in program units that `USE` the modules.

Module files are stripped-down Fortran source for the module.
Parts that aren't needed to compile dependent program units (e.g. action statements)
are omitted.

The module file for module `m` is named `m.mod` and the module file for
submodule `s` of module `m` is named `m-s.mod`.
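
The naming rule can be written down as a small helper. This is a sketch of the rule as stated here, not flang's implementation: `SketchModFileName` is a hypothetical function, and it covers only the two cases described above (a module, and a submodule of a module).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the module file name for module `module`,
// or for submodule `submodule` of `module` when `submodule` is non-empty.
std::string SketchModFileName(const std::string &module,
                              const std::string &submodule = "") {
  assert(!module.empty());
  if (submodule.empty()) {
    return module + ".mod"; // module m -> m.mod
  }
  return module + "-" + submodule + ".mod"; // submodule s of m -> m-s.mod
}
```

Under this rule, `SketchModFileName("m")` yields `m.mod` and `SketchModFileName("m", "s")` yields `m-s.mod`.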