flang/docs/HighLevelFIR.md

   1 # High-Level Fortran IR (HLFIR)
   2
   3 The approach of FIR and lowering design so far was to start with the minimal set
   4 of IR operations that could allow implementing the core aspects of Fortran (like
   5 memory allocations, array addressing, runtime descriptors, and structured
   6 control flow operations). One notable aspect of the current FIR is that array
   7 and character operations are buffered (some storage is allocated for the result,
   8 and the storage is addressed to implement the operation).  While this proved
   9 functional so far, the code lowering expressions and assignments from the
  10 front-end representations (the evaluate::Expr and parser nodes) to FIR has
  11 significantly grown in complexity while it still lacks some F95 features around
  12 character array expressions or FORALL. This is mainly explained by the fact that
  13 the representation level gap is big, and a lot is happening in lowering.  It
  14 appears more and more that some intermediate steps would help to split concerns
  15 between translating the front-end representation to MLIR, implementing some
  16 Fortran concepts at a lower-level (like character or derived type assignments),
  17 and how bufferizations of character and array expressions should be done.
  18
  19 This document proposes the addition of two concepts and a set of related
  20 operations in a new dialect HLFIR to allow a simpler lowering to a higher-level
  21 FIR representation that would later be lowered to the current FIR representation
  22 via MLIR translation passes.  As a result of these additions, it is likely that
  23 the fir.array_load/fir.array_merge_store and related array operations could be
  24 removed from FIR since array assignment analysis could directly happen on the
  25 higher-level FIR representation.
  26
  27
  28 The main principles of the new lowering design are:
  29 -   Make expression lowering context independent and rather naive
  30 -   Do not materialize temporaries while lowering to FIR
  31 -   Preserve Fortran semantics/information for high-level optimizations
  32
  33 The core impact on lowering will be:
  34 -   Lowering expressions and assignments in the exact same way, regardless of
  35     whether it is an array assignment context and/or an expression inside a
  36     forall.
  37 -   Lowering transformational intrinsics in a verbatim way (no runtime calls and
  38     memory aspects yet).
  39 -   Lowering character expressions in a verbatim way (no memcpy/runtime calls
  40     and memory aspects yet).
  41 -   Argument association side effects will be delayed (copy-in/copy-out) to help
  42     inlining/function specialization to get rid of them when they are not
  43     relevant.
  44
  45
  46 ## Variable and Expression value concepts in HLFIR
  47
  48 ### Strengthening the variable concept
  49
  50 Fortran variables are currently represented in FIR as mlir::Value with reference
  51 or box type coming from special operations or block arguments. They are either
  52 the result of a fir.alloca, fir.allocmem, or fir.address_of operations with the
  53 mangled name of the variable as attribute, or they are function block arguments
  54 with the mangled name of the variable as attribute.
  55
  56 Fortran variables are defined with a Fortran type (both dynamic and static) that
  57 may have type parameters, a rank and shape (including lower bounds), and some
  58 attributes (like TARGET, OPTIONAL, VOLATILE...). All this information is
  59 currently not represented in FIR. Instead, lowering keeps track of all this
  60 information in the fir::ExtendedValue lowering data structure and uses it when
  61 needed. If unused in lowering, some information about variables is lost (like
  62 non-constant array bound expressions). In the IR, only the static type, the
  63 compile time constant extents, and compile time character lengths can be
  64 retrieved from the mlir::Value of a variable in the general case (more can be
  65 retrieved if the variable is tracked via a fir.box, but not if it is a bare
  66 memory reference).
  67
  68 This makes reasoning about Fortran variables in FIR harder, and in general
  69 forces lowering to apply all decisions related to the information that is lost
  70 in FIR. A more problematic point is that it does not allow generating debug
  71 information for the variables from FIR, since the bounds and type parameters
  72 information is not tightly linked to the base mlir::Value.
  73
  74 The proposal is to add a hlfir.declare operation that would anchor the
  75 fir::ExtendedValue information in the IR. A variable will be represented by a
  76 single SSA value with a memory type (fir.ref<T>, fir.ptr<T>, fir.heap<T>,
  77 fir.box<T>, fir.boxchar or fir.ref<fir.box<T>>). Not all memory types will be
  78 allowed for a variable: it should allow retrieving all the shape, type
  79 parameters, and dynamic type information without requiring to look-up for any
  80 defining operations. For instance, `fir.ref<fir.array<?xf32>>` will not be
  81 allowed as an HLFIR variable, and fir.box<fir.array<?xf32>> will be used
  82 instead. However, `fir.ref<fir.array<100xf32>>` will be allowed to represent an
  83 array whose lower bounds are all ones (if the lower bounds are different than
  84 one, a fir.box will still be needed).  The hlfir.declare operation will be
  85 responsible for producing the SSA value with the right memory type given the
  86 variable specifications. One notable point is that, except for the POINTER and
  87 ALLOCATABLE attributes that are retrievable from the SSA value type, other
  88 Fortran attributes (OPTIONAL, TARGET, VOLATILE...) will not be retrievable from
  89 the SSA value alone, and it will be required to access the defining
  90 hlfir.declare to get the full picture.
  91
  92 This means that semantically relevant attributes will need to be set by
  93 lowering on operations using variables when that is relevant (for instance when
  94 using an OPTIONAL variable in an intrinsic where it is allowed to be absent).
  95 Then, the optimizations passes will be allowed to look for the defining
  96 hlfir.declare, but not to assume that it must be visible.  For instance, simple
  97 contiguity of fir.box can be deduced in certain case from the hlfir.declare,
  98 and if the hlfir.declare cannot be found, transformation passes will have to
  99 assume that the variable may be non-contiguous.
 100
 101 In practice, it is expected that the passes will be able to leverage
 102 hlfir.declare in most cases, but that guaranteeing that it will always be the
 103 case would constraint the IR and optimizations too much.  The goal is also to
 104 remove the fir.box usages when possible while lowering to FIR, to avoid
 105 needlessly creating runtime descriptors for variables that do not really
 106 require it.
 107
 108 The hlfir.declare operation and restrained memory types will allow:
 109 - Pushing higher-level Fortran concepts into FIR operations (like array
 110   assignments or transformational intrinsics).
 111 - Generating debug information for the variables based on the hlfir.declare
 112   operation.
 113 - Generic Fortran aliasing analysis (currently implemented only around array
 114   assignments with the fir.array_load concept).
 115
 116 The hlfir.declare will have a sibling fir.declare operation in FIR that will
 117 allow keeping variable information until debug info is generated. The main
 118 difference is that the fir.declare will simply return its first operand,
 119 making its codegen a no-op, while hlfir.declare might change the type of
 120 its first operand to return an HLFIR variable compatible type.
 121 The fir.declare op is the only operation described by this change that will be
 122 added to FIR. The rational for this is that it is intended to survive until
 123 LLVM dialect codegeneration so that debug info generation can use them and
 124 alias information can take advantage of them even on FIR.
 125
 126 Note that Fortran variables are not necessarily named objects, they can also be
 127 the result of function references returning POINTERs. hlfir.declare will also
 128 accept such variables to be described in the IR (a unique name will be built
 129 from the caller scope name and the function name.). In general, fir.declare
 130 will allow to view every memory storage as a variable, and this will be used to
 131 describe and use compiler created array temporaries.
 132
 133 ### Adding an expression value concept in HLFIR
 134
 135 Currently, Fortran expressions can be represented as SSA values for scalar
 136 logical, integer, real, and complex expressions. Scalar character or
 137 derived-type expressions and all array expressions are buffered in lowering:
 138 their results are directly given a memory storage in lowering and are
 139 manipulated as variables.
 140
 141 While this keeps FIR simple, this makes the amount of IR generated for these
 142 expressions higher, and in general makes later optimization passes job harder
 143 since they present non-trivial patterns (with memory operations) and cannot be
 144 eliminated by naive dead code elimination when the result is unused. This also
 145 forces lowering to combine elemental array expressions into single loop nests to
 146 avoid bufferizing all array sub-expressions (which would yield terrible
 147 performance). These combinations, which are implemented using C++ lambdas in
 148 lowering makes lowering code harder to understand. It also makes the expression
 149 lowering code context dependent (especially designators lowering). The lowering
 150 code paths may be different when lowering a syntactically similar expression in
 151 an elemental expression context, in a forall context, or in a normal context.
 152
 153 Some of the combinations described in [Array Composition](ArrayComposition.md)
 154 are currently not implemented in lowering because they are less trivial
 155 optimizations, and do not really belong in lowering. However, deploying such
 156 combinations on the generated FIR with bufferizations requires the usage of
 157 non-trivial pattern matching and rewrites (recognizing temporary allocation,
 158 usage, and related runtime calls). Note that the goal of such combination is not
 159 only about inlining transformational runtime calls, it is mainly about never
 160 generating a temporary for an array expression sub-operand that is a
 161 transformational intrinsic call matching certain criteria. So the optimization
 162 pass will not only need to recognize the intrinsic call, it must understand the
 163 context it is being called in.
 164
 165 The usage of memory manipulations also makes some of the alias analysis more
 166 complex, especially when dealing with foralls (the alias analysis cannot simply
 167 follow an operand tree, it must understand indirect dependencies from operations
 168 stored in memory).
 169
 170 The proposal is to add a !hlfir.expr<T> SSA value type concept, and set of
 171 character operations (concatenation, TRIM, MAX, MIN, comparisons...), a set of
 172 array transformational operations (SUM, MATMUL, TRANSPOSE, ...), and a generic
 173 hlfir.elemental operation. The hlfir.expr<T> type is not intended to be used
 174 with scalar types that already have SSA value types (e.g., integer or real
 175 scalars).  Instead, these existing SSA types will implicitly be considered as
 176 being expressions when used in high-level FIR operations, which will simplify
 177 interfacing with other dialects that define operations with these types (e.g.,
 178 the arith dialect).
 179
 180 These hlfir.expr values could then be placed in memory when needed (assigned to
 181 a variable, passed as a procedure argument, or an IO output item...) via
 182 hlfir.assign or hlfir.associate operations that will later be described.
 183
 184 When no special optimization pass is run, a translation pass would lower the
 185 operations producing hlfir.expr to buffer allocations and memory operations just
 186 as in the currently generated FIR.
 187
 188 However, these high-level operations should allow the writing of optimization
 189 passes combining chains of operations producing hlfir.expr into optimized forms
 190 via pattern matching on the operand tree.
 191
 192 The hlfir.elemental operation will be discussed in more detail below. It allows
 193 simplifying lowering while keeping the ability to combine elemental
 194 sub-expressions into a single loop nest. It should also allow rewriting some of
 195 the transformational intrinsic operations to functions of the indices as
 196 described in [Array Composition](ArrayComposition.md).
 197
 198 ## Proposed design for HLFIR (High-Level Fortran IR)
 199
 200 ### HLFIR Operations and Types
 201
 202 #### Introduce a hlfir.expr<T> type
 203
 204 Motivation: avoid the need to materialize expressions in temporaries while
 205 lowering.
 206
 207 Syntax: ``` !hlfir.expr<[extent x]* T [, class]> ```
 208
 209 - `[extent x]*` represents the shape for arrays similarly to !fir.array<> type,
 210   except that the shape cannot be assumed rank (!hlfir.expr<..xT> is invalid).
 211   This restriction can be added because it is impossible to create an assumed
 212   rank expression in Fortran that is not a variable.
 213 - `T` is the element type of the static type
 214 - `class` flag can be set to denote that this a polymorphic expression (that the
 215   dynamic type should not be assumed to be the static type).
 216
 217
 218 examples: !hlfir.expr<fir.char<?>>, !hlfir.expr<10xi32>,
 219 !hlfir.expr<?x10x?xfir.complex<4>>
 220
 221 T in scalar hlfir.expr<T> can be:
 222 -   A character type (fir.char<10, kind>, fir.char<?, kind>)
 223 -   A derived type: (fir.type<t{...}>)
 224
 225 T in an array hlfir.expr< e1 x ex2 ..  : T> can be:
 226 -   A character or derived type
 227 -   A logical type (fir.logical<kind>)
 228 -   An integer type (i1, i32, ….)
 229 -   A floating point type (f32, f16…)
 230 -   A complex type (fir.complex<4> or mlir::complex<f32>...)
 231
 232 Some expressions may be polymorphic (for instance, MERGE can be used on
 233 polymorphic entities). The hlfir.expr type has an optional "class" flag to
 234 denote this: hlfir.expr<T, class>.
 235
 236 Note that the ALLOCATABLE, POINTER, TARGET, VOLATILE, ASYNCHRONOUS, OPTIONAL
 237 aspects do not apply to expressions, they apply to variables.
 238
 239 It is possible to query the following about an expression:
 240 -   What is the extent : via hlfir.get_extent %expr, dim
 241 -   What are the length parameters: via hlfir.get_typeparam %expr [, param_name]
 242 -   What is the dynamic type: via hlfir.get_dynamic_type %expr
 243
 244 It is possible to get the value of an array expression element:
 245 - %element = hlfir.apply %expr, %i, %j : (!hlfir.expr<T>, index index) ->
 246   hlfir.expr<ScalarType> | AnyConstantSizeScalarType
 247
 248 It is not directly possible to take an address for the expression, but an
 249 expression value can be associated to a new variable whose address can be used
 250 (required when passing the expression in a user call, or to concepts that are
 251 kept low level in FIR, like IO runtime calls).  The variable created may be a
 252 compiler created temporary, or may relate to a Fortran source variable if this
 253 mechanism is used to implement ASSOCIATE.
 254
 255 -   %var = hlfir.associate %expr [attributes about the association]->
 256     AnyMemoryOrBoxType
 257 -   hlfir.end_association %var
 258
 259 The intention is that the hlfir.expr<T> is the result of an operation, and
 260 should most often not be a block argument. This is because the hlfir.expr is
 261 mostly intended to allow combining chains of operations into more optimal
 262 forms. But it is possible to represent any expression result via a Fortran
 263 runtime descriptor (fir.box<T>), implying that if a hlfir.expr<T> is passed as
 264 a block argument, the expression bufferization pass will evaluate the operation
 265 producing the expression in a temporary, and transform the block operand into a
 266 fir.box describing the temporary. Clean-up for the temporary will be inserted
 267 after the last use of the hlfir.expr. Note that, at least at first, lowering
 268 may help FIR to find the last use of a hlfir.expr by explicitly inserting a
 269 hlfir.finalize %expr operation that may turn into a no-op if the expression is
 270 not later materialized in memory.
 271
 272 It is nonetheless not intended that such abstract types be used as block
 273 arguments to avoid introducing allocations and descriptor manipulations.
 274
 275 #### hlfir.declare operation
 276
 277 Motivation: represent variables, linking together a memory storage, shape,
 278 length parameters, attributes and the variable name.
 279
 280 Syntax:
 281 ```
 282 %var = hlfir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T1, T2
 283 ```
 284
 285 %var#0 will have a FIR memory type that is allowed for HLFIR variables. %var#1
 286 will have the same type as %base, it is intended to be used when lowering HLFIR
 287 to FIR in order to avoid creating unnecessary fir.box (that would become
 288 runtime descriptors). When an HLFIR operation has access to the defining
 289 hlfir.declare of its variable operands, the operation codegen will be allowed
 290 to replace the %var#0 reference by the simpler %var#1 reference.
 291
 292 - Extents should only be provided if %base is not a fir.box and the entity is an
 293   array.
 294 - lower bounds should only be provided if the entity is an array and the lower
 295   bounds are not default (all ones). It should also not be provided for POINTERs
 296   and ALLOCATABLES since the lower bounds may change.
 297 - type parameters should be provided for entities with length parameters, unless
 298   the entity is a CHARACTER where the length is constant in %base type.
 299 - The attributes will include the Fortran attributes: TARGET (fir.target),
 300   POINTER (fir.ptr), ALLOCATABLE (fir.alloc), CONTIGUOUS (fir.contiguous),
 301   OPTIONAL (fir.optional), VOLATILE (fir.volatile), ASYNCHRONOUS (fir.async).
 302   They will also indicate when an entity is part of an equivalence by giving the
 303   equivalence name (fir.equiv = mangled_equivalence_name).
 304
 305 hlfir.declare will be used for all Fortran variables, except the ones created via
 306 the ASSOCIATE construct that will use hlfir.associate described below.
 307
 308 hlfir.declare will also be used when creating compiler created temporaries, in
 309 which case the fir.tmp attribute will be given.
 310
 311 Examples:
 312
 313 | FORTRAN                                   | HLFIR                                                                                                                                                                                                                    |
 314 | ----------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 315 | REAL :: X                                 | %mem = fir.alloca f32 <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<f32>, fir.ref<f32>                                                                                                                  |
 316 | REAL, TARGET :: X(10)                     | %mem = fir.alloca f32 <br> %nval = fir.load %n <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.target} : fir.ref<fir.array<10xf32>>, fir.ref<fir.array<10xf32>>                                                 |
 317 | REAL :: X(N)                              | %mem = // … alloc or dummy argument <br> %nval = fir.load %n : i64 <br> %x = hlfir.declare %mem shape %nval {fir.def = "\_QPfooEx"} : (i64) -> fir.box<fir.array<?xf32>>, fir.ref<fir.array<?xf32>>                      |
 318 | REAL :: X(0:)                             | %mem = // … dummy argument <br> %c0 = arith.constant 0 : index <br> %x = hlfir.declare %mem lbs %c0 {fir.def = "\_QPfooEx"} : (index) -> fir.box<fir.array<?xf32>>, fir.box<fir.array<?xf32>>                            |
 319 | <br>REAL, POINTER :: X(:)                 | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} :  fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>>, fir.ref<fir.box<fir.ptr<fir.array<?xf32>>>>                |
 320 | REAL, ALLOCATABLE :: X(:)                 | %mem = // … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.alloc} :  fir.ref<fir.box<fir.heap<fir.array<?xf32>>>>, fir.ref<fir.box<fir.heap<fir.array<?xf32>>>>            |
 321 | CHARACTER(10) :: C                        | %mem = //  … dummy argument, or local, or global <br> %c = hlfir.declare %mem lbs %c0 {fir.def = "\_QPfooEc"} :  fir.ref<fir.char<10>>, fir.ref<fir.char<10>>                                                            |
 322 | CHARACTER(\*) :: C                        | %unbox = fir.unbox %bochar (fir.boxchar<1>) -> (fir.ref<fir.char<?>>, index) <br> %c = hlfir.declare %unbox#0 typeparams %unbox#1 {fir.def = "\_QPfooEc"} : (index) ->  fir.boxchar<1>, fir.ref<fir.char<?>>             |
 323 | CHARACTER(\*), OPTIONAL, ALLOCATABLE :: C | %mem = // … dummy argument <br> %c = hlfir.declare %mem {fir.def = "\_QPfooEc", fir.alloc, fir.optional, fir.assumed\_len\_alloc} :  fir.ref<fir.box<fir.heap<fir.char<?>>>>, fir.ref<fir.box<fir.heap<fir.char<?>>>>    |
 324 | TYPE(T) :: X                              | %mem = //  … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.ref<fir.type<t{...}>>, fir.ref<fir.type<t{...}>>                                                             |
 325 | TYPE(T(L)) :: X                           | %mem = //  … dummy argument, or local, or global <br> %lval = fir.load %l <br> %x = hlfir.declare %mem typeparams %lval {fir.def = "\_QPfooEx"} : fir.box<fir.type<t{...}>>, fir.box<fir.type<t{...}>>                   |
 326 | CLASS(\*), POINTER :: X                   | %mem = //  … dummy argument, or local, or global <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx", fir.ptr} : fir.class<fir.ptr<None>>  fir.class<fir.ptr<None>>                                                      |
 327 | REAL :: X(..)                             | %mem = //  … dummy argument <br> %x = hlfir.declare %mem {fir.def = "\_QPfooEx"} : fir.box<fir.array<..xf32>>, fir.box<fir.array<..xf32>>                                                                                |
 328
 329 #### fir.declare operation
 330
 331 Motivation: keep variable information available in FIR, at least with
 332 the intent to be able to produce debug information.
 333
 334 Syntax:
 335 ```
 336 %var = fir.declare %base [shape %extent1, %extent2, ...] [lbs %lb1, %lb2, ...] [typeparams %l1, ...] {fir.def = mangled_variable_name, attributes} : [(....) ->] T
 337 ```
 338
 339 %var will have the same type as %base. When no debug info is generated, the
 340 operation can be replaced by %base when lowering to LLVM. Otherwise, the
 341 operation is similar to hlfir.declare and will be produced from it.
 342
 343 #### hlfir.associate operation
 344
 345 Motivation: represent Fortran associations (both from variables and expressions)
 346 and allow keeping actual/dummy argument association information after inlining.
 347
 348 Syntax:
 349 ```
 350 %var = hlfir.associate %expr_or_var {fir.def = mangled_uniq_name, attributes} (AnyExprOrVarType) -> AnyVarType
 351 ```
 352
 353 hlfir.associate is used to represent the following associations:
 354 - Dummy/Actual association on the caller side (the callee side uses
 355   hlfir.declare).
 356 - Host association in block constructs when VOLATILE/ASYNC attributes are added
 357   locally
 358 - ASSOCIATE construct (both from variable and expressions).
 359
 360 When the operand is a variable, hlfir.associate allows changing the attributes
 361 of the variable locally, and to encode certain side-effects (like
 362 copy-in/copy-out when going from a non-contiguous variable to a contiguous
 363 variable, with the help of the related hlfir.end_association operation).
 364
 365 When the operand is an expression, hlfir.associate allows associating a storage
 366 location to an expression value.
 367
 368 A hlfir.associate must be followed by a related hlfir.end_association that will
 369 allow inserting any necessary finalization or copy-out later.
 370
 371 #### hlfir.end_association operation
 372
 373 Motivation: mark the place where some association should end and some side
 374 effects might need to occur.
 375
 376 The hlfir.end_associate is a placeholder to later insert
 377 deallocation/finalization if the variable was associated with an expression,
 378 and to insert copy-out/deallocation if the variable was associated with another
 379 variable with a copy-in.
 380
 381 Syntax:
 382 ```
 383 hlfir.end_association %var [%original_variable] {attributes}
 384 ```
 385
 386
 387 The attributes can be:
 388 -   copy_out (copy out the associated variable back into the original variable
 389     if a copy-in occurred)
 390 -   finalize_copy_in (deallocate the temporary storage for the associated
 391     variable if a copy-in occurred but the associated variable was not modified
 392     (e.g., it is intent(in))).
 393 -   finalize: indicate that a finalizer should be run on the entity associated
 394     with the variable (There is currently no way to deduce this only from the
 395     variable type in FIR). It will give the finalizer mangled name so that it
 396     can be later called.
 397
 398 If the copy_out or finalize_copy_in attribute is set, “original_variable” (the
 399 argument of the hlfir.associate that produced %var) must be provided. The
 400 rationale is that the original variable address is needed to verify if a
 401 temporary was created, and if needed, to copy the data back to it.
 402
 403 #### hlfir.finalize
 404
 405 Motivation: mark end of life of local variables
 406
 407 Mark the place where a local variable will go out of scope. The main goal is to
 408 retain this information even after local variables are inlined.
 409
 410 Syntax:
 411 ```
 412 hlfir.finalize %var {attributes}
 413 ```
 414
 415 The attributes can be:
 416 -   finalize: indicate that a finalizer should be run on the entity associated
 417     with the variable (There is currently no way to deduce this only from the
 418     variable type in FIR).
 419
 420 Note that finalization will not free the local variable storage if it was
 421 allocated on the heap. If lowering created the storage passed to hlfir.declare
 422 via a fir.allocmem, lowering should insert a fir.freemem after the
 423 hlfir.finalize.  This could help making fir.allocmem to fir.alloca promotion
 424 simpler, and also because finalization may be run without the intent to
 425 deallocate the variable storage (like on INTENT(OUT) dummies).
 426
 427
 428 #### hlfir.designate
 429
 430 Motivation: Represent designators at a high-level and allow representing some
 431 information about derived type components that would otherwise be lost, like
 432 component lower bounds.
 433
 434 Represent Fortran designators in a verbatim way: both triplet, and component
 435 parts.
 436
 437 Syntax:
 438 ```
 439 %var = hlfir.designate %base [“component”,] [(%i, %k:l%:%m)] [substr ub, lb] [imag|real] [shape extent1, extent2, ....] [lbs lb1, lb2, .....] [typeparams %l1, ...] {attributes}
 440 ```
 441
 442 hlfir.designate is intended to encode a single part-ref (as defined by the
 443 fortran standard). That means that a(:)%x(i, j, k) must be split into two
 444 hlfir.designate: one for a(:), and one for x(i, j, k).  If the base is ranked,
 445 and the component is an array, the subscripts are mandatory and must not
 446 contain triplets. This ensures that the result of a fir.designator cannot be a
 447 "super-array".
 448
 449 The subscripts passed to hlfir.designate must be based on the base lower bounds
 450 (one by default).
 451
 452 A substring is built by providing the lower and upper character indices after
 453 `substr`. Implicit substring bounds must be made explicit by lowering.  It is
 454 not possible to provide substr if a component is already provided. Instead the
 455 related Fortran designator must be split into two fir.designator. This is
 456 because the component character length will be needed to compute the right
 457 stride, and it might be lost if not placed on the first designator typeparams.
 458
 459 Real and Imaginary complex parts are represented by an optional imag or real
 460 tag. It can be added even if there is already a component.
 461
 462 The shape, lower bound, and type parameter operands represent the output entity
 463 properties. The point of having those made explicit is to allow early folding
 464 and hoisting of array section shape and length parameters (which especially in
 465 FORALL contexts, can simplify later assignment temporary insertion a lot). Also,
 466 if lower bounds of a derived type component array could not be added here, they
 467 would be lost since they are not represented by other means in FIR (the fir.type
 468 does not include this information).
 469
 470 hlfir.designate is not intended to describe vector subscripted variables.
 471 Instead, lowering will have to introduce loops to do element by element
 472 addressing. See the Examples section. This helps keeping hlfir.designate simple,
 473 and since the contexts where a vector subscripted entity is considered to be a
 474 variable (in the sense that it can be modified) are very limited, it seems
 475 reasonable to have lowering deal with this aspect. For instance, a vector
 476 subscripted entity cannot be passed as a variable, it cannot be a pointer
 477 assignment target, and when it appears as an associated entity in an ASSOCIATE,
 478 the related variable cannot be modified.
 479
 480 #### hlfir.assign
 481
 482 Motivation: represent assignment at a high-level (mainly a change for array and
 483 character assignment) so that optimization pass can clearly reason about it
 484 (value propagation, inserting temporary for right-hand side evaluation only when
 485 needed), and that lowering does not have to implement it all.
 486
 487 Syntax:
 488 ```
 489 hlfir.assign %expr_or_var to %var [attributes]
 490 ```
 491
 492 The attributes can be:
 493
 494 -   realloc: mark that assignment has F2003 semantics and that the left-hand
 495     side may have to be deallocated/reallocated…
 496 -   use_assign=@function: mark a user defined assignment
 497 -   no_overlap: mark that an assignment does not need a temporary (added by an
 498     analysis pass).
 499 -   unordered : mark that an assignment can happen in any element order (not
 500     true if there is an impure elemental function being called).
 501 -   temporary_lhs: mark that the left hand side of the assignment is
 502     a compiler generated temporary.
 503
 504 This will replace the current array_load/array_access/array_merge semantics.
 505 Instead, a more generic alias analysis will be performed on the LHS and RHS to
 506 detect aliasing, and a temporary inserted if needed. The alias analysis will
 507 look at all the memory references in the RHS operand tree and base overlap
 508 decisions on the related variable declaration operations. This same analysis
 509 should later allow moving/merging some expression evaluation between different
 510 statements.
 511
 512 Note about user defined assignments: semantics is resolving them and building
 513 the related subroutine call. So a fir.call could directly be made in lowering if
 514 the right hand side was always evaluated in a temporary. The motivation to use
 515 hlfir.assign is to help the temporary removal, and also to deal with two edge
 516 cases: user assignment in a FORALL (the forall pass will need to understand that
 517 this an assignment), and allocatable assignment mixed with user assignment
 518 (implementing this as a call in lowering would require lowering the whole
 519 reallocation logic in lowering already, duplicating the fact that hlfir.assign
 520 should deal with it).
 521
 522 #### hlfir.ptr_assign
 523
 524 Motivation: represent pointer assignment without lowering the exact pointer
 525 implementation (descriptor address, fir.ref<fir.box> or simple pointer scalar
 526 fir.llvm_ptr<fir.ptr>).
 527
 528 Syntax:
 529 ```
 530 hlfir.ptr_assign %var [[reshape %reshape] | [lbounds %lb1, …., %lbn]] to %ptr
 531 ```
 532
 533 It is important to keep pointer assignment at a high-level so that they can
 534 later correctly be processed in hlfir.forall.
 535
 536 #### hlfir.allocate
 537
 538 Motivation: keep POINTER and ALLOCATABLE allocation explicit in HLFIR, while
 539 allowing later lowering to either inlined fir.allocmem or Fortran runtime
 540 calls. Generating runtime calls allow the runtime to do Fortran specific
 541 bookkeeping or flagging and to provide better runtime error reports.
 542
 543 The main difference with the ALLOCATE statement is that one distinct
 544 hlfir.allocate has to be created for each element of the allocation-list.
 545 Otherwise, it is a naive lowering of the ALLOCATE statement.
 546
 547 Syntax:
 548 ```
 549 %stat = hlfir.allocate %var [%shape] [%type_params] [[src=%source] | [mold=%mold]] [errmsg =%errmsg]
 550 ```
 551
 552 #### hlfir.deallocate
 553
 554 Motivation: keep deallocation explicit in HLFIR, while allowing later lowering
 555 to Fortran runtime calls to allow the runtime to do Fortran specific
 556 bookkeeping or flagging of allocations.
 557
 558 Similarly to hlfir.allocate, one operation must be created for each
 559 allocate-object-list object.
 560
 561 Syntax:
 562 ```
 563 %stat = hlfir.deallocate %var [errmsg=err].
 564 ```
 565
 566 ####  hlfir.elemental
 567
 568 Motivation: represent elemental operations without defining array level
 569 operations for each of them, and allow the representation of array expressions
 570 as function of the indices.
 571
 572 The hlfir.elemental operation can be seen as a closure: it is defining a
 573 function of the indices that returns the value of the element of the
 574 represented array expression at the given indices. This an operation with an
 575 MLIR region. It allows detailing how an elemental expression is implemented at
 576 the element level, without yet requiring materializing the operands and result
 577 in memory.  The hlfir.expr<T> elements value can be obtained using hlfir.apply.
 578
 579 The element result is built with a fir.result op, whose result type can be a
 580 scalar hlfir.expr<T> or any scalar constant size types (e.g. i32, or f32).
 581
 582 Syntax:
 583 ```
 584 %op = hlfir.elemental (%indices) %shape [%type_params] [%dynamic_type] {
 585   ….
 586   fir.result %result_element
 587 }
 588 ```
 589
 590
 591 Note that %indices are not operands, they are the elemental region block
 592 arguments, representing the array iteration space in a one based fashion.
 593 The choice of using one based indices is to match Fortran default for
 594 array variables, so that there is no need to generate bound adjustments
 595 when working with one based array variables in an expression.
 596
 597 Illustration: “A + B” represented with a hlfir.elemental.
 598
 599 ```
 600 %add = hlfir.elemental (%i:index, %j:index) shape %shape (!fir.shape<2>) -> !hlfir.expr<?x?xf32> {
 601   %belt = hlfir.designate %b, %i, %j : (!fir.box<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>
 602   %celt = hlfir.designate %c, %i, %j : (!fir.box<!fir.array<?x?xf32>>, index, index) -> !fir.ref<f32>
 603   %bval = fir.load %belt : (!fir.ref<f32>) -> f32
 604   %cval = fir.load %celt : (!fir.ref<f32>) -> f32
 605   %add = arith.addf %bval, %cval : f32
 606   fir.result %res : f32
 607 }
 608 ```
 609
 610 In contexts where it can be proved that the array operands were not modified
 611 between the hlfir.elemental and the hlfir.apply, the region of the
 612 hlfir.elemental can be inlined at the hlfir.apply. Otherwise, if there is no
 613 such guarantee, or if the hlfir.elemental is not “visible” (because its result
 614 is passed as a block argument), the hlfir.elemental will be lowered to an array
 615 temporary. This will be done as a HLFIR to HLFIR optimization pass. Note that
 616 MLIR inlining could be used if hlfir.elemental implemented the
 617 CallableInterface and hlfir.apply the CallInterface.  But MLIR generic inlining
 618 is probably too generic for this case: no recursion is possible here, the call
 619 graphs are trivial, and using MLIR inlining here could introduce later
 620 conflicts or make normal function inlining more complex because FIR inlining
 621 hooks would already be used.
 622
 623 hlfir.elemental allows delaying elemental array expression buffering and
 624 combination. Its generic aspect has two advantages:
 625 - It avoids defining one operation per elemental operation or intrinsic,
 626   instead, the related arith dialect operations can be used directly in the
 627   elemental regions. This avoids growing HLFIR and having to maintain about a
 628   hundred operations.
 629 - It allows representing transformational intrinsics as functions of the indices
 630   while doing optimization as described in
 631   [Array Composition](ArrayComposition.md). This because the indices can be
 632   transformed inside the region before being applied to array variables
 633   according to any kind of transformation (semi-affine or not).
 634
 635
 636 #### Introducing the hlfir.apply operation
 637
 638 Motivation: provide a way to get the element of an array expression
 639 (hlfir.expr<?x…xT>)
 640
 641 This is the addressing equivalent for expressions. A notable difference is that
 642 it can only take simple scalar indices (no triplets) because it is not clear
 643 why supporting triplets would be needed, and keeping the indexing simple makes
 644 inlining of hlfir.elemental much easier.
 645
 646 If hlfir.elemental inlining is not performed, or if the hlfir.expr<T> array
 647 expression is produced by another operation (like fir.intrinsic) that is not
 648 rewritten, hlfir.apply will be lowered to an actual addressing operation that
 649 will address the temporary that was created for the hlfir.expr<T> value that
 650 was materialized in memory.
 651
 652 hlfir.apply indices will be one based to make further lowering simpler.
 653
 654 Syntax:
 655 ```
 656 %element = hlfir.apply %array_expr %i, %j: (hlfir.expr<?x?xi32>) -> i32
 657 ```
 658
 659 #### Introducing operations for transformational intrinsic functions
 660
 661 Motivation: Represent transformational intrinsics functions at a high-level so
 662 that they can be manipulated easily by the optimizer, and do not require
 663 materializing the result as a temporary in lowering.
 664
 665 An operation will be added for each Fortran transformational functions (SUM,
 666 MATMUL, TRANSPOSE....). It translates the Fortran expression verbatim: it takes
 667 the same number of arguments as the Fortran intrinsics and returns a
 668 hlfir.expr<T>. The arguments may be hlfir.expr<T>, simple scalar types (e.g.,
 669 i32, f32), or variables.
 670
 671 The exception being that the arguments that are statically absent would be
 672 passed to it (passing results of fir.absent operation), so that the arguments
 673 can be identified via their positions.
 674
 675 This operation is meant for the transformational intrinsics, not the elemental
 676 intrinsics, that will be implemented using hlfir.elemental + mlir math dialect
 677 operations, nor the intrinsic subroutines (like random_seed or system_clock),
 678 that will be directly lowered in lowering.
 679
 680 Syntax:
 681 ```
 682 %res = hlfir."intrinsic_name" %expr_or_var, ...
 683 ```
 684
 685 These operations will all inherit a same operation base in tablegen to make
 686 their definition and identification easy.
 687
 688 Without any optimization, codegen would then translate the operations to
 689 exactly the same FIR as currently generated by IntrinsicCall.cpp (runtime calls
 690 or inlined code with temporary allocation for array results). The fact that
 691 they are the verbatim Fortran translations should allow to move the lowering
 692 code to a translation pass without massive changes.
 693
 694 An operation will at least be created for each of the following transformational
 695 intrinsics: all, any, count, cshift, dot_product, eoshift, findloc, iall, iany,
 696 iparity, matmul, maxloc, maxval, minloc, minval, norm2, pack, parity, product,
 697 reduce, repeat, reshape, spread, sum, transfer, transpose, trim, unpack.
 698
 699 For the following transformational intrinsics, the current lowering to runtime
 700 call will probably be used since there is little point to keep them high level:
 701 - command_argument_count, get_team, null, num_images, team_number, this_image
 702   that are more program related (and cannot appear for instance in constant
 703   expressions)
 704 - selected_char_kind, selected_int_kind, selected_real_kind that returns scalar
 705   integers
 706
 707 #### Introducing operations for composed intrinsic functions
 708
 709 Motivation: optimize commonly composed intrinsic functions (e.g.
 710 MATMUL(TRANSPOSE(a), b)). This optimization is implemented in Classic Flang.
 711
 712 An operation and runtime function will be added for each commonly used
 713 composition of intrinsic functions. The operation will be the canonical way to
 714 write this chained operation (the MLIR canonicalization pass will rewrite the
 715 operations for the composed intrinsics into this one operation).
 716
 717 These new operations will be treated as though they were standard
 718 transformational intrinsic functions.
 719
 720 The composed intrinsic operation will return a hlfir.expr<T>. The arguments
 721 may be hlfir.expr<T>, boxed arrays, simple scalar types (e.g. i32, f32), or
 722 variables.
 723
 724 To keep things simple, these operations will only match one form of the composed
 725 intrinsic functions: therefore there will be no optional arguments.
 726
 727 Syntax:
 728 ```
 729 %res = hlfir."intrinsic_name" %expr_or_var, ...
 730 ```
 731
 732 The composed intrinsic operation will be lowered to a `fir.call` to the newly
 733 added runtime implementation of the operation.
 734
 735 These operations should not be added where the only improvement is to avoid
 736 creating a temporary intermediate buffer which would otherwise be removed by
 737 intelligent bufferization of a hlfir.expr. Similarly, these should not replace
 738 profitable uses of hlfir.elemental.
 739
 740 #### Introducing operations for character operations and elemental intrinsic functions
 741
 742
 743 Motivation: represent character operations without requiring the operand and
 744 results to be materialized in memory.
 745
 746 fir.char_op is intended to represent:
 747 -  Character concatenation (//)
 748 -  Character MIN/MAX
 749 -  Character MERGE
 750 -  “SET_LENGTH”
 751 -  Character conversions
 752 -  REPEAT
 753 -  INDEX
 754 -  CHAR
 755 -  Character comparisons
 756 -  LEN_TRIM
 757
 758 The arguments must be scalars, the elemental aspect should be handled by a
 759 hlfir.elemental operation.
 760
 761 Syntax:
 762 ```
 763 %res = hlfir.“char_op” %expr_or_var
 764 ```
 765
 766 Just like for the transformational intrinsics, if no optimization occurs, these
 767 operations will be lowered to memory operations with temporary results (if the
 768 result is a character), using the same generation code as the one currently used
 769 in lowering.
 770
 771 #### hlfir.array_ctor
 772
 773 Motivation: represent array constructor without creating temporary
 774
 775 Many array constructors have a limited number of elements (less than 10), the
 776 current lowering of array constructor is rather complex because it must deal
 777 with the generic cases.
 778
 779 Having a representation to represent array constructor will allow an easier
 780 lowering of array constructor, and make array ctor a lot easier to manipulate.
 781 For instance, for small array constructors, loops could could be unrolled with
 782 the array ctor elements without ever creating a dynamically allocated array
 783 temporary and loop nest using it.
 784
 785 Syntax:
 786 ```
 787 %array_ctor = hlfir.array_ctor %expr1, %expr2 ….
 788 ```
 789
 790 Note that hlfir.elemental could be used to implement some ac-implied-do,
 791 although this is not yet clarified since ac-implied-do may contain more than
 792 one scalar element (they may contain a list of scalar and array values, which
 793 would render the representation in a hlfir.elemental tricky, but maybe not
 794 impossible using if/then/else and hlfir.elemental nests using the index value).
 795 One big issue though is that hlfir.elemental requires the result shape to be
 796 pre-computed (it is an operand), and with an ac-implied-do containing user
 797 transformational calls returning allocatable or pointer arrays, it is
 798 impossible to pre-evaluate the shape without evaluating all the function calls
 799 entirely (and therefore all the array constructor elements).
 800
 801 #### hlfir.get_extent
 802
 803 Motivation: inquire about the extent of a hlfir.expr, variable, or fir.shape
 804
 805 Syntax:
 806 ```
 807 %extent = hlfir.get_extent %shape_expr_or_var, dim
 808 ```
 809
 810 dim is a constant integer attribute.
 811
 812 This allows inquiring about the extents of expressions whose shape may not be
 813 yet computable without generating detailed, low level operations (e.g, for some
 814 transformational intrinsics), or to avoid going into low level details for
 815 pointer and allocatable variables (where the descriptor needs to be read and
 816 loaded).
 817
 818 #### hlfir.get_typeparam
 819
 820 Motivation: inquire about the type parameters of a hlfir.expr, or variable.
 821
 822 Syntax:
 823 ```
 824 %param = hlfir.get_typeparam %expr_or_var [, param_name]
 825 ```
 826 - param_name is an optional string attribute that must contain the length
 827   parameter name if %expr_or_var is a derived type.
 828
 829 ####  hlfir.get_dynamic_type
 830
 831 Motivation: inquire about the dynamic type of a polymorphic hlfir.expr or
 832 variable.
 833
 834 Syntax:
 835 ```
 836 %dynamic_type = hlfir.get_dynamic_type %expr_or_var
 837 ```
 838
 839 #### hlfir.get_lbound
 840
 841 Motivation: inquire about the lower bounds of variables without digging into
 842 the implementation details of pointers and allocatables.
 843
 844 Syntax:
 845 ```
 846 %lb = hlfir.get_lbound %var, n
 847 ```
 848
 849 Note: n is an integer constant attribute for the (zero based) dimension.
 850
 851 ####  hlfir.shape_meet
 852
 853 Motivation: represent conformity requirement/information between two array
 854 operands so that later optimization can choose the best shape information
 855 source, or insert conformity runtime checks.
 856
 857 Syntax:
 858 ```
 859 %shape = hlfir.shape_meet %shape1, %shape2
 860 ```
 861
 862 Suppose A(n), B(m) are two explicit shape arrays. Currently, when A+B is
 863 lowered, lowering chose which operand shape gives the result shape information,
 864 and it is later not retrievable that both n and m can be used. If lowering
 865 chose n, but m later gets folded thanks to inlining or constant propagation, the
 866 optimization passes have no way to use this constant information to optimize the
 867 result storage allocation or vectorization of A+B.  hlfir.shape_meet intends to
 868 delay this choice until constant propagation or inlining can provide better
 869 information about n and m.
 870
 871 #### hlfir.forall
 872
 873 Motivation: segregate the Forall lowering complexity in its own unit.
 874
 875 Forall is tough to lower because:
 876 -   Lowering it in an optimal way requires analyzing several assignments/mask
 877     expressions.
 878 -   The shape of the temporary needed to store intermediate evaluation values is
 879     not a Fortran array in the general case, and cannot in the general case be
 880     maximized/pre-computed without executing the forall to compute the bounds of
 881     inner forall, and the shape of the assignment operands that may depend on
 882     the bound values.
 883 -   Mask expressions evaluation should be affected by previous assignment
 884     statements, but not by the following ones. Array temporaries may be
 885     required for the masks to cover this.
 886 -   On top of the above points, Forall can contain user assignments, pointer
 887     assignments, and assignment to whole allocatable.
 888
 889
 890 The hlfir.forall syntax would be exactly the one of a fir.do_loop. The
 891 difference would be that hlfir.assign and hlfir.ptr_assign inside hlfir.forall
 892 have specific semantics (the same as in Fortran):
 893 -   Given one hlfir.assign, all the iteration values of the LHS/RHS must be
 894     evaluated before the assignment of any value is done.
 895 -   Given two hlfir.assign, the first hlfir.assign must be fully performed
 896     before any evaluation of the operands of the second assignment is done.
 897 -   Masks (fir.if arguments), if any, should be evaluated before any nested
 898     assignments. Any assignments syntactically before the where mask occurrence
 899     must be performed before the mask evaluation.
 900
 901 Note that forall forbids impure function calls, hence, no calls should modify
 902 any other expression evaluation and can be removed if unused.
 903
 904 The translation of hlfir.forall will happen by:
 905 -   1. Determining if the where masks value may be modified by any assignments
 906     - Yes, pre-compute all masks in a pre-run of the forall loop, creating
 907       a “forall temps” (we may need a FIR concept to help here).
 908     - No, Do nothing (or indicate it is safe to evaluate masks while evaluating
 909       the rest).
 910 -   2. Determining if a hlfir.assign operand expression depends on the
 911        previous hlfir.assign left-hand side base value.
 912     - Yes, split the hlfir.assign into their own nest of hlfir.forall loops.
 913     - No, do nothing (or indicate it is safe to evaluate the assignment while
 914       evaluating previous assignments)
 915 -   3. For each assignments, check if the RHS/LHS operands value may depend
 916      on the LHS base:
 917     - Yes, split the forall loops. Insert a “forall temps” before the loops for
 918       the “smallest” part that may overlap (which may be the whole RHS, or some
 919       RHS sub-part, or some LHS indices). In the first nest, evaluate this
 920       overlapping part into the temp. In the next forall loop nest, modify the
 921       assignment to use the temporary, and add the [no_overlap] flag to indicate
 922       no further temporary is needed. Insert code to finalize the temp after its
 923       usage.
 924
 925 ## New HLFIR Transformation Passes
 926
 927 ### Mandatory Passes (translation towards lower-level representation)
 928
 929 Note that these passes could be implemented as a single MLIR pass, or successive
 930 passes.
 931
 932 -   Forall rewrites (getting rid of hlfir.forall)
 933 -   Array assignment rewrites (getting rid of array hlfir.assign)
 934 -   Bufferization: expression temporary materialization (getting rid of
 935     hlfir.expr, and all the operations that may produce it like transformational
 936     intrinsics and hlfir.elemental, hlfir.apply).
 937 -   Call interface argument association lowering (getting rid of hlfir.associate
 938     and hlfir.end_associate)
 939 -   Lowering high level operations using variables into FIR operations
 940     operating on memory (translating hlfir.designate, scalar hlfir.assign,
 941     hlfir.finalize into fir.array_coor, fir.do_loop, fir.store, fir.load.
 942     fir.embox/fir.rebox operations).
 943
 944 Note that these passes do not have to be the first one run after lowering. It is
 945 intended that CSE, DCE, algebraic simplification, inlining and some other new
 946 high-level optimization passes discused below be run before doing any of these
 947 translations.
 948
 949 After that, the current FIR pipeline could be used to continue lowering towards
 950 LLVM.
 951
 952 ### Optimization Passes
 953
 954 -   Elemental expression inlining (inlining of hlfir.elemental in hlfir.apply)
 955 -   User function Inlining
 956 -   Transformational intrinsic rewrites as hlfir.elemental expressions
 957 -   Assignments propagation
 958 -   Shape/Rank/dynamic type propagation
 959
 960 These high level optimization passes can be run any number of times in any
 961 order.
 962
 963 ## Transition Plan
 964
 965 The new higher-level steps proposed in this document will require significant
 966 refactoring of lowering. Codegen should not be impacted since the current FIR
 967 will remain untouched.
 968
 969 A lot of the code in lowering generating Fortran features (like an intrinsic or
 970 how to do assignments) is based on the fir::ExtendedValue concept. This
 971 currently is a collection of mlir::Value that allows describing a Fortran object
 972 (either a variable or an evaluated expression result). The variable and
 973 expression concepts described above should allow to keep an interface very
 974 similar to the fir::ExtendedValue, but having the fir::ExtendedValue wrap a
 975 single value or mlir::Operation* from which all of the object entity
 976 information can be inferred.
 977
 978 That way, all the helpers currently generating FIR from fir::ExtendedValue could
 979 be kept and used with the new variable and expression concepts with as little
 980 modification as possible.
 981
 982 The proposed plan is to:
 983 - 1. Introduce the new HLFIR operations.
 984 - 2. Refactor fir::ExtendedValue so that it can work with the new variable and
 985      expression concepts (requires part of 1.).
 986 - 3. Introduce the new translation passes, using the fir::ExtendedValue helpers
 987      (requires 1.).
 988 - 3.b Introduce the new optimization passes (requires 1.).
 989 - 4. Introduce the fir.declare and hlfir.finalize usage in lowering (requires 1.
 990      and 2. and part of 3.).
 991
 992 The following steps might have to be done in parallel of the current lowering,
 993 to avoid disturbing the work on performance until the new lowering is complete
 994 and on par.
 995
 996 - 5. Introduce hlfir.designate and hlfir.associate usage in lowering.
 997 - 6. Introduce lowering to hlfir.assign (with RHS that is not a hlfir.expr),
 998      hlfir.ptr_assign.
 999 - 7. Introduce lowering to hlfir.expr and related operations.
1000 - 8. Introduce lowering to hlfir.forall.
1001
1002 At that point, lowering using the high-level FIR should be in place, allowing
1003 extensive testing.
1004 - 9. Debugging correctness.
1005 - 10. Debugging execution performance.
1006
1007 The plan is to do these steps incrementally upstream, but for lowering this will
1008 most likely be safer to do have the new expression lowering implemented in
1009 parallel upstream, and to add an option to use the new lowering rather than to
1010 directly modify the current expression lowering and have it step by step
1011 equivalent functionally and performance wise.
1012
1013 ## Examples
1014
1015 ### Example 1: simple array assignment
1016
1017 ```Fortran
1018 subroutine foo(a, b)
1019   real :: a(:), b(:)
1020   a = b
1021 end subroutine
1022 ```
1023
1024 Lowering output:
1025
1026 ```
1027 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {
1028   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1029   %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1030   hlfir.assign %b#0 to %a#0 : !fir.box<!fir.array<?xf32>>
1031   return
1032 }
1033 ```
1034
1035 HLFIR array assignment lowering pass:
1036 -   Query: can %b value depend on %a? No, they are two different argument
1037     associated variables that are neither target nor pointers.
1038 -   Lower to assignment to loop:
1039
1040 ```
1041 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>) {
1042   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1043   %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1044
1045   %ashape = hlfir.shape_of %a#0
1046   %bshape = hlfir.shape_of %b#0
1047   %shape = hlfir.shape_meet %ashape, %bshape
1048   %extent = hlfir.get_extent %shape, 0
1049
1050   %c1 = arith.constant 1 : index
1051
1052   fir.do_loop %i = %c1 to %extent step %c1 unordered {
1053     %belt = hlfir.designate %b#0, %i
1054     %aelt = hlfir.designate %a#0, %i
1055     hlfir.assign %belt to %aelt : fir.ref<f32>, fir.ref<f32>
1056   }
1057   return
1058 }
1059 ```
1060
1061 HLFIR variable operations to memory translation pass:
1062 -   hlfir.designate is rewritten into fir.array_coor operation on the variable
1063     associated memory buffer, and returns the element address
1064 -   For numerical scalar, hlfir.assign is rewritten to fir.store (and fir.load
1065     of the operand if needed), for derived type and characters, memory copy
1066     (and padding for characters) is done.
1067 -   hlfir.shape_of are lowered to fir.box_dims, here, no constant information
1068     was obtained from any of the source shape, so hlfir.shape_meet is a no-op,
1069     selecting the first shape (a conformity runtime check could be inserted
1070     under debug options).
1071 -   hlfir.declare are translated into fir.declare that are no-ops and will allow
1072     generating debug information for LLVM.
1073
1074 This pass would wrap operations defining variables (hlfir.declare/hlfir.designate)
1075 as fir::ExtendedValue, and use all the current helpers operating on it
1076 (e.g.: fir::factory::genScalarAssignment).
1077
1078 ```
1079 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1:
1080   !fir.box<!fir.array<?xf32>>) {
1081   %a = fir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>
1082   %b = fir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>
1083   %c1 = arith.constant 1 : index
1084   %dims = fir.box_dims %a, 1
1085   fir.do_loop %i = %c1 to %dims#1 step %c1 unordered {
1086     %belt = fir.array_coor %b, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>
1087     %aelt = fir.array_coor %a, %i : (!fir.box<!fir.array<?xf32>>, index) -> fir.ref<f32>
1088     %bval = fir.load %belt : f32
1089     fir.store %bval to %aelt : fir.ref<f32>
1090   }
1091   return
1092 }
1093 ```
1094
1095 This reaches the current FIR level (except fir.declare that can be kept until
1096 LLVM codegen and dropped on the floor if there is no debug information
1097 generated).
1098
1099 ### Example 2: array assignment with elemental expression
1100
1101 ```Fortran
1102 subroutine foo(a, b, p, c)
1103   real, target :: a(:)
1104   real :: b(:), c(100)
1105   real, pointer :: p(:)
1106   a = b*p + c
1107 end subroutine
1108 ```
1109
1110 Lowering output:
1111
1112 ```
1113 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
1114   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32>
1115   %b =  hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1116   %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>>
1117   %c =  hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>>
1118   %bshape = hlfir.shape_of %b#0
1119   %pshape = hlfir.shape_of %p#0
1120   %shape1 = hlfir.shape_meet %bshape, %pshape
1121   %mul = hlfir.elemental(%i:index) %shape1 {
1122     %belt = hlfir.designate %b#0, %i
1123     %p_lb = hlfir.get_lbound %p#0, 1
1124     %i_zero = arith.subi %i, %c1
1125     %i_p = arith.addi %i_zero,  %p_lb
1126     %pelt = hlfir.designate %p#0, %i_p
1127     %bval = fir.load %belt : f32
1128     %pval = fir.load %pelt : f32
1129     %mulres = arith.mulf %bval, %pval : f32
1130      fir.result %mulres : f32
1131   }
1132   %cshape = hlfir.shape_of %c
1133   %shape2 = hlfir.shape_meet %cshape, %shape1
1134   %add =  hlfir.elemental(%i:index) %shape2 {
1135     %mulval = hlfir.apply %mul, %i : f32
1136     %celt = hlfir.designate %c#0, %i
1137     %cval = fir.load %celt
1138     %add_res = arith.addf %mulval, %cval
1139     fir.result %add_res
1140   }
1141   hlfir.assign %add to %a#0 : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
1142   return
1143 }
1144 ```
1145
1146 Step 1: hlfir.elemental inlining: inline the first hlfir.elemental into the
1147 second one at the hlfir.apply.
1148
1149
1150 ```
1151 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
1152   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32>
1153   %b =  hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1154   %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>>
1155   %c =  hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>>
1156   %bshape = hlfir.shape_of %b#0
1157   %pshape = hlfir.shape_of %p#0
1158   %shape1 = hlfir.shape_meet %bshape, %pshape
1159   %cshape = hlfir.shape_of %c
1160   %shape2 = hlfir.shape_meet %cshape, %shape1
1161   %add =  hlfir.elemental(%i:index) %shape2 {
1162     %belt = hlfir.designate %b#0, %i
1163     %p_lb = hlfir.get_lbound %p#0, 1
1164     %i_zero = arith.subi %i, %c1
1165     %i_p = arith.addi %i_zero,  %p_lb
1166     %pelt = hlfir.designate %p#0, %i_p
1167     %bval = fir.load %belt : f32
1168     %pval = fir.load %pelt : f32
1169     %mulval = arith.mulf %bval, %pval : f32
1170     %celt = hlfir.designate %c#0, %i
1171     %cval = fir.load %celt
1172     %add_res = arith.addf %mulval, %cval
1173     fir.result %add_res
1174   }
1175   hlfir.assign %add to %a#0 : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>
1176   return
1177 }
1178 ```
1179
1180 Step2: alias analysis around the array assignment:
1181
1182 -   May %add value depend on %a variable?
1183 -   Gather variable and function calls in %add operand tree (visiting
1184     hlfir.elemental regions)
1185 -   Gather references to %b, %p, and %c. %p is a pointer variable according to
1186     its defining operations. It may alias with %a that is a target. -> answer
1187     yes.
1188 -   Insert temporary, and duplicate array assignments, that can be lowered to
1189     loops at that point
1190
1191 Note that the alias analysis could have already occurred without inlining the
1192 %add hlfir.elemental.
1193
1194
1195 ```
1196 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
1197   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32>
1198   %b =  hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1199   %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>>
1200   %c =  hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>>
1201   %bshape = hlfir.shape_of %b#0
1202   %pshape = hlfir.shape_of %p#0
1203   %shape1 = hlfir.shape_meet %bshape, %pshape
1204   %cshape = hlfir.shape_of %c
1205   %shape2 = hlfir.shape_meet %cshape, %shape1
1206   %add =  hlfir.elemental(%i:index) %shape2 {
1207     %belt = hlfir.designate %b#0, %i
1208     %p_lb = hlfir.get_lbound %p#0, 1
1209     %i_zero = arith.subi %i, %c1
1210     %i_p = arith.addi %i_zero, %p_lb
1211     %pelt = hlfir.designate %p#0, %i_p
1212     %bval = fir.load %belt : f32
1213     %pval = fir.load %pelt : f32
1214     %mulval = arith.mulf %bval, %pval : f32
1215     %celt = hlfir.designate %c#0, %i
1216     %cval = fir.load %celt
1217     %add_res = arith.addf %mulval, %cval
1218     fir.result %add_res
1219   }
1220   %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index
1221   %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>
1222   %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>>
1223   hlfir.assign %add to %temp#0 no_overlap : hlfir.expr<?xf32>, !fir.box<!fir.array<?xf32>>
1224   hlfir.assign %temp to %a#0 : no_overlap  : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1225   hlfir.finalize %temp#0
1226   fir.freemem %tempstorage
1227   return
1228 }
1229 ```
1230
1231 Step 4: Lower assignments to regular loops since they have the no_overlap
1232 attribute, and inline the hlfir.elemental into the first loop nest.
1233
1234 ```
1235 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
1236   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} {fir.target} : !fir.box<!fir.array<?xf32>, !fir.box<!fir.array<?xf32>
1237   %b =  hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
1238   %p = hlfir.declare %arg2 {fir.def = "_QPfooEp", fir.ptr} : !fir.box<!fir.ptr<!fir.array<?xf32>>>, !fir.box<!fir.ptr<!fir.array<?xf32>>>
1239   %c =  hlfir.declare %arg3 {fir.def = "_QPfooEc"} : !fir.ref<!fir.array<100xf32>>, !fir.ref<!fir.array<100xf32>>
1240   %bshape = hlfir.shape_of %b#0
1241   %pshape = hlfir.shape_of %p#0
1242   %shape1 = hlfir.shape_meet %bshape, %pshape
1243   %cshape = hlfir.shape_of %c
1244   %shape2 = hlfir.shape_meet %cshape, %shape1
1245   %extent = hlfir.get_extent %shape2, 0: (fir.shape<1>) -> index
1246   %tempstorage = fir.allocmem %extent : fir.heap<fir.array<?xf32>>
1247   %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>>
1248   fir.do_loop %i = %c1 to %shape2 step %c1 unordered {
1249     %belt = hlfir.designate %b#0, %i
1250     %p_lb = hlfir.get_lbound %p#0, 1
1251     %i_zero = arith.subi %i, %c1
1252     %i_p = arith.addi %i_zero,  %p_lb
1253     %pelt = hlfir.designate %p#0, %i_p
1254     %bval = fir.load %belt : f32
1255     %pval = fir.load %pelt : f32
1256     %mulval = arith.mulf %bval, %pval : f32
1257     %celt = hlfir.designate %c#0, %i
1258     %cval = fir.load %celt
1259     %add_res = arith.addf %mulval, %cval
1260     %tempelt = hlfir.designate %temp#0, %i
1261     hlfir.assign %add_res to %tempelt : f32, fir.ref<f32>
1262   }
1263   fir.do_loop %i = %c1 to %shape2 step %c1 unordered {
1264     %aelt = hlfir.designate %a#0, %i
1265     %tempelt = hlfir.designate %temp#0, %i
1266     hlfir.assign %add_res to %tempelt : f32, fir.ref<f32>
1267   }
1268   hlfir.finalize %temp#0
1269   fir.freemem %tempstorage
1270   return
1271 }
1272 ```
1273
1274 Step 5 (may also occur earlier or several times): shape propagation.
1275 -   %shape2 can be inferred from %cshape that has constant shape: the
1276     hlfir.shape_meet results can be replaced by it, and if the option is set,
1277     conformance checks can be added for %a, %b and %p.
1278 -   %temp is small, and its fir.allocmem can be promoted to a stack allocation
1279
1280 ```
1281 func.func @_QPfoo(%arg0: !fir.box<!fir.array<?xf32>>, %arg1: !fir.box<!fir.array<?xf32>>, %arg2: !fir.box<!fir.ptr<!fir.array<?xf32>>>, %arg3: !fir.ref<!fir.array<100xf32>>) {
1282   // .....
1283   %cshape = fir.shape %c100
1284   %extent = %c100
1285   // updated fir.alloca
1286   %tempstorage = fir.alloca %extent : fir.ref<fir.array<100xf32>>
1287   %temp = hlfir.declare %tempstorage, shape %extent {fir.def = QPfoo.temp001} : (index) -> fir.box<fir.array<?xf32>>, fir.heap<fir.array<?xf32>>
1288   fir.do_loop %i = %c1 to %c100 step %c1 unordered {
1289     // ...
1290   }
1291   fir.do_loop %i = %c1 to %c100 step %c1 unordered {
1292     // ...
1293   }
1294   hlfir.finalize %temp#0
1295   // deleted fir.freemem %tempstorage
1296   return
1297 }
1298 ```
1299
1300 Step 6: lower hlfir.designate/hlfir.assign in a translation pass:
1301
1302 At this point, the representation is similar to the current representation after
1303 the array value copy pass, and the existing FIR flow is used (lowering
1304 fir.do_loop to cfg and doing codegen to LLVM).
1305
1306 ### Example 3: assignments with vector subscript
1307
1308 ```Fortran
1309 subroutine foo(a, b, v)
1310   real :: a(*), b(*)
1311   integer :: v(:)
1312   a(v) = b(v)
1313 end subroutine
1314 ```
1315
1316 Lowering of vector subscripted entities would happen as follow:
1317 - vector subscripted entities would be lowered as a hlfir.elemental implementing
1318   the vector subscript addressing.
1319 - If the vector appears in a context where it can be modified (which can only
1320   be an assignment LHS, or in input IO), lowering could transform the
1321   hlfir.elemental into hlfir.forall (for assignments), or a fir.iter_while (for
1322   input IO) by inlining the elemental body into the created loops, and
1323   identifying the hlfir.designate producing the result.
1324
1325 ```
1326 func.func @_QPfoo(%arg0: !fir.ref<!fir.array<?xf32>>, %arg1: !fir.ref<!fir.array<?xf32>>, %arg2: !fir.box<<!fir.array<?xi32>>) {
1327   %a = hlfir.declare %arg0 {fir.def = "_QPfooEa"} : !fir.box<!fir.array<?xf32>>, !fir.ref<!fir.array<?xf32>>
1328   %b = hlfir.declare %arg1 {fir.def = "_QPfooEb"} : !fir.box<!fir.array<?xf32>>, !fir.ref<!fir.array<?xf32>>
1329   %v = hlfir.declare %arg2 {fir.def = "_QPfooEv"} : !fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>
1330   %vshape = hlfir.shape_of %v : fir.shape<1>
1331   %bsection =  hlfir.elemental(%i:index) %vshape : (fir.shape<1>) -> hlfir.expr<?xf32> {
1332     %v_elt = hlfir.designate %v#0, %i : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>
1333     %v_val = fir.load %v_elt : fir.ref<i32>
1334     %cast = fir.convert %v_val : (i32) -> index
1335     %b_elt = hlfir.designate %b#0, %v_val : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>
1336     %b_val = fir.load %b_elt : fir.ref<f32>
1337     fir.result %b_elt
1338   }
1339   %extent = hlfir.get_extent %vshape, 0 : (fir.shape<1>) -> index
1340   %c1 = arith.constant 1 : index
1341   hlfir.forall (%i from %c1 to %extent step %c1) {
1342     %b_section_val = hlfir.apply %bsection, %i : (hlfir.expr<?xf32>, index) -> f32
1343     %v_elt = hlfir.designate %v#0, %i : (!fir.box<!fir.array<?xi32>>, index) -> fir.ref<i32>
1344     %v_val = fir.load %v_elt : fir.ref<i32>
1345     %cast = fir.convert %v_val : (i32) -> index
1346     %a_elt = hlfir.designate %a#0, %v_val : (!fir.ref<!fir.array<?xf32>>, index) -> fir.ref<f32>
1347     hlfir.assign %b_section_val to %a_elt  : f32, fir.ref<f32>
1348   }
1349   return
1350 }
1351 ```
1352
1353 This would then be lowered as described in the examples above (hlfir.elemental
1354 will be inlined, hlfir.forall will be rewritten into normal loops taking into
1355 account the alias analysis, and hlfir.assign/hlfir.designate operations will be
1356 lowered to fir.array_coor and fir.store operations).
1357
1358 ## Alternatives that were not retained
1359
1360 ### Using a non-MLIR based mutable CFG representation
1361
1362 An option would have been to extend the PFT to describe expressions in a way
1363 that can be annotated and modified with the ability to introduce temporaries.
1364 This has been rejected because this would imply a whole new set of
1365 infrastructure and data structures while FIR is already using MLIR
1366 infrastructure, so enriching FIR seems a smoother approach and will benefit from
1367 the MLIR infrastructure experience that was gained.
1368
1369 ### Using symbols for HLFIR variables
1370
1371 #### Using attributes as pseudo variable symbols
1372
1373 Instead of restricting the memory types an HLFIR variable can have, it was
1374 force the defining operation of HLFIR variable SSA values to always be
1375 retrievable. The idea was to add a fir.ref attribute that would repeat the name
1376 of the HLFIR variable. Using such an attribute would prevent MLIR from merging
1377 two operations using different variables when merging IR blocks. (which is the
1378 main reason why the defining op may become inaccessible). The advantage of
1379 forcing the defining operation to be retrievable is that it allowed all Fortran
1380 information of variables (like attributes) to always be accessible in HLFIR
1381 when looking at their uses, and avoids requiring the introduction of fir.box
1382 usages for simply contiguous variables. The big drawback is that this implies
1383 naming all HLFIR variables, and there are many more of them than there are
1384 Fortran named variables. Naming designators with unique names was not very
1385 natural, and would make designator CSE harder. It also made inlining harder,
1386 because inlining HLFIR code without any fir.def/fir.ref attributes renaming
1387 would break the name uniqueness, which could lead to some operations using
1388 different variables to be merged, and to break the assumption that parent
1389 operations must be visible. Renaming would be possible, but would increase
1390 complexity and risks. Besides, inlining may not be the only transformation
1391 doing code motion, and whose complexity would be increased by the naming
1392 constraints.
1393
1394
1395 #### Using MLIR symbols for variables
1396
1397 Using MLIR symbols for HLFIR variables has been rejected because MLIR symbols
1398 are mainly intended to deal with globals and functions that may refer to each
1399 other before being defined. Their processing is not as light as normal values,
1400 and would require to turn every FIR operation with a region into an MLIR symbol
1401 table. This would especially be annoying since fir.designator also produces
1402 variables with their own properties, which would imply creating a lot of MLIR
1403 symbols. All the operations that both accept variable and expression operands
1404 would also either need to be more complex in order to both accept SSA values or
1405 MLIR symbol operands (or some fir.as_expr %var operation should be added to
1406 turn a variable into an expression). Given all variable definitions will
1407 dominate their uses, it seems better to use an SSA model with named attributes.
1408 Using SSA values also makes the transition and mixture with lower-level FIR
1409 operations smoother: a variable SSA usage can simply be replaced by lower-level
1410 FIR operations using the same SSA value.
1411
1412 ### Using some existing MLIR dialects for the high-level Fortran.
1413
1414 #### Why not using Linalg dialect?
1415
1416 The linalg dialects offers a powerful way to represent array operations: the
1417 linalg.generic operation takes a set of input and output arrays, a related set
1418 of affine maps to represent how these inputs/outputs are to be addressed, and a
1419 region detailing what operation should happen at each iteration point, given the
1420 input and output array elements. It seems mainly intended to optimize matmul,
1421 dot, and sum.
1422
1423 Issues:
1424
1425 -   The linalg dialect is tightly linked to the tensor/memref concepts that
1426     cannot represent byte stride based discontinuity and would most likely
1427     require FIR to use MLIR memref descriptor format to take advantage of it.
1428 -   It is not clear whether all Fortran array expression addressing can be
1429     represented as semi affine maps. For instance, vector subscripted entities
1430     can probably not, which may force creating temporaries for the related
1431     designator expressions to fit in this framework. Fortran has a lot more
1432     transformational intrinsics than matmul, dot, and sum that can and should
1433     still be optimized.
1434
1435 So while there may be benefits to use linalg at the optimization level (like
1436 rewriting fir.sum/fir.matmul to a linalg sum, with dialect types plumbing
1437 around the operand and results, to get tiling done by linalg), using it as a
1438 lowering target would not cover all Fortran needs (especially for the non
1439 semi-affine cases).
1440 So using linalg is for now left as an optimization pass opportunity in some
1441 cases that could be experimented.
1442
1443 #### Why not using Shape dialect?
1444
1445 MLIR shape dialect gives a set of operations to manipulate shapes. The
1446 shape.meet operation is exactly similar with hlfir.shape_meet, except that it
1447 returns a tensor or a shape.shape.
1448
1449 The main issue with using the shape dialect is that it is dependent on tensors.
1450 Bringing the tensor toolchain in flang for the sole purpose of manipulating
1451 shape is not seen as beneficial given that the only thing Fortran needs is
1452 shape.meet The shape dialect is a lot more complex because it is intended to
1453 deal with computations involving dynamically ranked entity, which is not the
1454 case in Fortran (assumed rank usage in Fortran is greatly limited).
1455
1456 ### Using embox/rebox and box as an alternative to fir.declare/hlfir.designate and hlfir.expr/ variable concept
1457
1458 All Fortran entities (*) can be described at runtime by a fir.box, except for
1459 some attributes that are not part of the runtime descriptors (like TARGET,
1460 OPTIONAL or VOLATILE).  In that sense, it would be possible to have
1461 fir.declare, hlfir.designate, and hlfir.associate be replaced by embox/rebox,
1462 and also to have all operation creating hlfir.expr to create fir.box.
1463
1464 This was rejected because this would lack clarity, and make embox/rebox
1465 semantics way too complex (their codegen is already non-trivial), and also
1466 because it would then not really be possible to know if a fir.box is an
1467 expression or a variable when it is an operand, which would make reasoning
1468 harder: this would already imply that expressions have been buffered, and it is
1469 not clear when looking at a fir.box if the value it describe may change or not,
1470 while a hlfir.expr value cannot change, which allows moving its usages more
1471 easily.
1472
1473 This would also risk generating too many runtime descriptors read and writes
1474 that could make later optimizations harder.
1475
1476 Hence, while this would be functionally possible, this makes the reasoning about
1477 the IR harder and would not benefit high-level optimizations.
1478
1479 (*) This not true for vector subscripted variables, but the proposed plan will
1480 also not allow creating vector subscripted variables as the result of a
1481 hlfir.designate. Lowering will deal with the assignment and input IO special
1482 case using hlfir.elemental.