1 <!--===- docs/ControlFlowGraph.md
3 Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 See https://llvm.org/LICENSE.txt for license information.
5 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
17 After a Fortran subprogram has been parsed, its names resolved, and all its
18 semantic constraints successfully checked, the parse tree of its
19 executable part is translated into another abstract representation,
20 namely the _control flow graph_ described in this note.
22 This second representation of the subprogram's executable part is
23 suitable for analysis and incremental modification as the subprogram
24 is readied for code generation.
25 Many high-level Fortran features are implemented by rewriting portions
26 of a subprogram's control flow graph in place.
28 ### Control Flow Graph
29 A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended")
30 basic _blocks_ that comprise straight-line sequences of _actions_ with a
31 single entry point and a single exit point, and a collection of
32 directed flow _edges_ (or _arcs_) denoting all possible transitions of
33 control flow that may take place during execution from the end of
34 one basic block to the beginning of another (or itself).
36 A block that has multiple distinct successors in the flow of control
37 must end with an action that selects its successor.
39 The sequence of actions that constitutes a basic block may
40 include references to user and library procedures.
41 Subprogram calls with implicit control flow afterwards, namely
42 alternate returns and `END=`/`ERR=` labels on input/output,
43 will be lowered in translation to a representation that materializes
44 that control flow into something similar to a computed `GO TO` or
45 C language `switch` statement.
47 For convenience in optimization and to simplify the implementation of
48 data flow confluence functions, we may choose to maintain the
49 property that each flow arc is the sole outbound arc emanating from
50 its originating block, the sole inbound arc arriving at its destination,
52 Empty blocks would inserted to "split" arcs when necessary to maintain this
55 Fortran subprograms (other than internal subprograms) can have multiple
56 entry points by using the obsolescent `ENTRY` statement.
57 We will implement such subprograms by constructing a union
58 of their dummy argument lists and using it as part of the definition
59 of a new subroutine or function that can be called by each of
60 the entry points, which are then all converted into wrapper routines that
61 pass a selector value as an additional argument to drive a `switch` on entry
62 to the new subprogram.
64 This transformation ensures that every subprogram's control
65 flow graph has a well-defined `START` node.
67 Statement labels can be used in Fortran on any statement, but only
68 the labels that decorate legal destinations of `GO TO` statements
69 need to be implemented in the control flow graph.
70 Specifically, non-executable statements like `DATA`, `NAMELIST`, and
71 `FORMAT` statements will be extracted into data initialization
72 records before or during the construction of the control flow
73 graph, and will survive only as synonyms for `CONTINUE`.
75 Nests of multiple labeled `DO` loops that terminate on the same
76 label will be have that label rewritten so that `GO TO` within
77 the loop nest will arrive at the copy that most closely nests
79 The Fortran standard does not require us to do this, but XLF
80 (at least) works this way.
82 ### Expressions and Statements (Operations and Actions)
83 Expressions are trees, not DAGs, of intrinsic operations,
84 resolved function references, constant literals, and
87 Expression nodes are represented in the compiler in a type-safe manner.
88 There is a distinct class or class template for every category of
89 intrinsic type, templatized over its supported kind type parameter values.
91 Operands are storage-owning indirections to other instances
92 of `Expression`, instances of constant values, and to representations
93 of data and function references.
94 These indirections are not nullable apart from the situation in which
95 the operands of an expression are being removed for use elsewhere before
96 the expression is destructed.
98 The ranks and the extents of the shapes of the results of expressions
99 are explicit for constant arrays and recoverable by analysis otherwise.
101 Parenthesized subexpressions are scrupulously preserved in accordance with
102 the Fortran standard.
104 The expression tree is meant to be a representation that is
105 as equally well suited for use in the symbol table (e.g., for
106 a bound of an explicit shape array) as it is for an action
107 in a basic block of the control flow graph (e.g., the right
108 hand side of an assignment statement).
110 Each basic block comprises a linear sequence of _actions_.
111 These are represented as a doubly-linked list so that insertion
112 and deletion can be done in constant time.
114 Only the last action in a basic block can represent a change
115 to the flow of control.
117 ### Scope Transitions
118 Some of the various scopes of the symbol table are visible in the control flow
119 graph as `SCOPE ENTRY` and `SCOPE EXIT` actions.
120 `SCOPE ENTRY` actions are unique for their corresponding scopes,
121 while `SCOPE EXIT` actions need not be so.
122 It must be the case that
123 any flow of control within the subprogram will enter only scopes that are
124 not yet active, and exit only the most recently entered scope that has not
125 yet been deactivated; i.e., when modeled by a push-down stack that is
126 pushed by each traversal of a `SCOPE ENTRY` action,
127 the entries of the stack are always distinct, only the scope at
128 the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty
129 when the subprogram terminates.
130 Further, any references to resolved symbols must be to symbols whose scopes
133 The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped
134 lifetimes will be explicit in the sequence of actions in the control flow
137 Parallel regions might be partially represented by scopes, or by explicit
138 operations similar to the scope entry and exit operations.
140 ### Data Flow Representation
141 The subprogram text will be in static single assignment form by the time the
142 subprogram arrives at the bridge to the LLVM IR builder.
143 Merge points are actions at the heads of basic blocks whose operands
144 are definition points; definition points are actions at the ends of
145 basic blocks whose operands are expression trees (which may refer to
148 ### Rewriting Transformations
151 #### Dynamic allocation
152 #### Array constructors
154 #### Derived type initialization, deallocation, and finalization
155 The machinery behind the complicated semantics of Fortran's derived types
156 and `ALLOCATABLE` objects will be implemented in large part by the run time
159 #### Actual argument temporaries
160 #### Array assignments, `WHERE`, and `FORALL`
162 Array operations have shape.
164 `WHERE` masks have shape.
165 Their effects on array operations are by means of explicit `MASK` operands that
166 are part of array assignment operations.
168 #### Intrinsic function and subroutine calls