flang/docs/ArrayComposition.md

   1 <!--===- docs/ArrayComposition.md
   2
   3    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   4    See https://llvm.org/LICENSE.txt for license information.
   5    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
   6
   7 -->
   8
   9 # Array Composition
  10
  11 ```{contents}
  12 ---
  13 local:
  14 ---
  15 ```
  16
  17 This note attempts to describe the motivation for and design of an
  18 implementation of Fortran 90 (and later) array expression evaluation that
  19 minimizes the use of dynamically allocated temporary storage for
  20 the results of calls to transformational intrinsic functions, and
  21 that makes such expressions more amenable to acceleration.
  22
  23 The transformational intrinsic functions of Fortran of interest to
  24 us here include:
  25
  26 * Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`,
  27   `DOT_PRODUCT`,
  28   `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`)
  29 * Axial reductions (`SUM(X,DIM=)`, &c.)
  30 * Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`)
  31 * Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
  32 * `TRANSPOSE(M)` matrix transposition
  33 * `RESHAPE` without `ORDER=`
  34 * `RESHAPE` with `ORDER=`
  35 * `CSHIFT` and `EOSHIFT` with scalar `SHIFT=`
  36 * `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=`
  37 * `PACK` and `UNPACK`
  38 * `MATMUL`
  39 * `SPREAD`
  40
  41 Other Fortran intrinsic functions are technically transformational (e.g.,
  42 `COMMAND_ARGUMENT_COUNT`) but not of interest for this note.
  43 The generic `REDUCE` is also not considered here.
  44
  45 ## Arrays as functions
  46
  47 A whole array can be viewed as a function that maps its indices to the values
  48 of its elements.
  49 Specifically, it is a map from a tuple of integers to its element type.
  50 The rank of the array is the number of elements in that tuple,
  51 and the shape of the array delimits the domain of the map.
  52
  53 `REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers
  54 `(J,K)` with `1<=J<=N` and `1<=K<=M` to real values.
  55
  56 ## Array expressions as functions
  57
  58 The same perspective can be taken of an array expression comprising
  59 intrinsic operators and elemental functions.
  60 Fortran doesn't allow one to apply subscripts directly to an expression,
  61 but expressions have rank and shape, and one can view array expressions
  62 as functions over index tuples by applying those indices to the arrays
  63 and subexpressions in the expression.
  64
  65 Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`).
  66 The right-hand side of that assignment could be evaluated into a
  67 temporary array `T` and then subscripted as it is copied into `B`.
  68 ```
  69 REAL, ALLOCATABLE :: T(:,:)
  70 ALLOCATE(T(N,M))
  71 DO CONCURRENT(J=1:N,K=1:M)
  72   T(J,K)=A(J,K) + 1.0
  73 END DO
  74 DO CONCURRENT(J=1:N,K=1:M)
  75   B(J,K)=T(J,K)
  76 END DO
  77 DEALLOCATE(T)
  78 ```
  79 But we can avoid the allocation, population, and deallocation of
  80 the temporary by treating the right-hand side expression as if it
  81 were a statement function `F(J,K)=A(J,K)+1.0` and evaluating
  82 ```
  83 DO CONCURRENT(J=1:N,K=1:M)
  84   B(J,K)=F(J,K)
  85 END DO
  86 ```
  87
  88 In general, when a Fortran array assignment to a non-allocatable array
  89 does not include the left-hand
  90 side variable as an operand of the right-hand side expression, and any
  91 function calls on the right-hand side are elemental or scalar-valued,
  92 we can avoid the use of a temporary.
  93
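The condition on the left-hand side variable matters because an array
assignment must behave as if the whole right-hand side were evaluated before
any element is stored. The following is a minimal sketch, with an illustrative
program name and array size that are not part of the design, contrasting the
required result of `A(2:N) = A(1:N-1)` with a naive in-place loop that uses no
temporary.

```fortran
program overlap_demo
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), naive(n), j

  a = [(j, j=1,n)]
  naive = a

  ! Array assignment semantics: the right-hand side is evaluated in full
  ! before any element of the left-hand side is modified.
  a(2:n) = a(1:n-1)

  ! A naive element-by-element loop without a temporary reads elements
  ! that earlier iterations have already overwritten.
  do j = 2, n
    naive(j) = naive(j-1)
  end do

  print '(a,5i3)', 'array assignment:', a      ! 1 1 2 3 4
  print '(a,5i3)', 'naive loop:      ', naive  ! 1 1 1 1 1
  if (any(a /= [1,1,2,3,4])) error stop 'unexpected array assignment result'
  if (any(naive /= 1)) error stop 'unexpected naive loop result'
end program overlap_demo
```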
  94 ## Transformational intrinsic functions as function composition
  95
  96 Many of the transformational intrinsic functions listed above
  97 can, when their array arguments are viewed as functions over their
  98 index tuples, be seen as compositions of those functions with
  99 functions of the "incoming" indices -- yielding a function for
 100 an entire right-hand side of an array assignment statement.
 101
 102 For example, the application of `TRANSPOSE(A + 1.0)` to the index
 103 tuple `(J,K)` becomes `A(K,J) + 1.0`.
 104
 105 Partial (axial) reductions can be similarly composed.
 106 The application of `SUM(A,DIM=2)` to the index `J` is the
 107 complete reduction `SUM(A(J,:))`.
 108
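The two compositions above can be written out as explicit loops and checked
against the intrinsics. This is only a sketch; the program name, extents, and
tolerances are assumptions chosen for illustration.

```fortran
program compose_demo
  implicit none
  integer, parameter :: n = 3, m = 4
  real :: a(n,m), b(m,n), s(n)
  integer :: j, k

  call random_number(a)

  ! TRANSPOSE(A + 1.0) applied to (J,K) is A(K,J) + 1.0.
  do concurrent (j=1:m, k=1:n)
    b(j,k) = a(k,j) + 1.0
  end do
  if (any(abs(b - transpose(a + 1.0)) > 1.0e-6)) &
    error stop 'TRANSPOSE composition mismatch'

  ! SUM(A,DIM=2) applied to J is the complete reduction SUM(A(J,:)).
  do concurrent (j=1:n)
    s(j) = sum(a(j,:))
  end do
  if (any(abs(s - sum(a, dim=2)) > 1.0e-5)) error stop 'axial SUM mismatch'

  print *, 'ok'
end program compose_demo
```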
 109 More completely:
 110 * Reductions to scalars (`SUM(X)` without `DIM=`) become
 111   runtime calls; the result needs no dynamic allocation,
 112   being a scalar.
 113 * Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)`
 114   become scalar values like `SUM(X(J,K,:))` if `d=3`.
 115 * Location reductions to indices (`MAXLOC(X)` without `DIM=`)
 116   do not require dynamic allocation, since their results are
 117   either scalar or small vectors of length `RANK(X)`.
 118 * Axial location reductions (`MAXLOC(X,DIM=)`, &c.)
 119   are handled like other axial reductions such as `SUM(DIM=)`.
 120 * `TRANSPOSE(M)` exchanges the two components of the index tuple.
 121 * `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape
 122   vector `S`, and then use it to linearize indices into offsets
 123   in the storage order of `A` (whose shape must also be captured).
 124   These conversions can involve division and/or modulus, which
 125   can be optimized into a fixed-point multiplication using the
 126   usual technique.
 127 * `RESHAPE` with `ORDER=` is similar, but must permute the
 128   components of the index tuple; it generalizes `TRANSPOSE`.
 129 * `CSHIFT` applies addition and modulus.
 130 * `EOSHIFT` applies addition and a conditional move (`MERGE`).
 131 * `PACK` and `UNPACK` are likely to require a runtime call.
 132 * `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but
 133   might benefit from calling a highly optimized runtime
 134   routine.
 135 * `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply
 136   applies `A` to a reduced index tuple.
 137
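Several of the index mappings in the list above can be sketched as ordinary
loops and compared with the intrinsics they stand in for. The program below is
an illustration under assumed names and sizes (`sh`, `ncopies`, the `-1`
boundary value), not the code a compiler would generate. Note that the
`EOSHIFT` case clamps its index, since both operands of `MERGE` are evaluated
in source-level Fortran.

```fortran
program index_maps_demo
  implicit none
  integer, parameter :: n = 6, sh = 2, ncopies = 3
  integer :: x(n), c(n), e(n), sp(n,ncopies), a(2,3), r(3,2)
  integer :: j, k, src, off

  x = [(10*j, j=1,n)]
  a = reshape([(j, j=1,6)], [2,3])

  ! CSHIFT: addition and modulus on the incoming index.
  do concurrent (j=1:n)
    c(j) = x(1 + modulo(j - 1 + sh, n))
  end do
  if (any(c /= cshift(x, shift=sh))) error stop 'CSHIFT mismatch'

  ! EOSHIFT: addition and a conditional move; the index is clamped so the
  ! unselected MERGE operand is still a valid reference.
  do j = 1, n
    src = j + sh
    e(j) = merge(x(min(max(src, 1), n)), -1, src >= 1 .and. src <= n)
  end do
  if (any(e /= eoshift(x, shift=sh, boundary=-1))) error stop 'EOSHIFT mismatch'

  ! SPREAD along DIM=2: the result at (J,K) ignores K.
  do concurrent (j=1:n, k=1:ncopies)
    sp(j,k) = x(j)
  end do
  if (any(sp /= spread(x, dim=2, ncopies=ncopies))) error stop 'SPREAD mismatch'

  ! RESHAPE without ORDER=: linearize (J,K) in the result's shape, then
  ! split the offset by the source's first extent (division and modulus).
  do k = 1, size(r, 2)
    do j = 1, size(r, 1)
      off = (j - 1) + size(r, 1)*(k - 1)
      r(j,k) = a(1 + mod(off, size(a, 1)), 1 + off/size(a, 1))
    end do
  end do
  if (any(r /= reshape(a, shape=[3,2]))) error stop 'RESHAPE mismatch'

  print *, 'ok'
end program index_maps_demo
```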
 138 ## Determination of rank and shape
 139
 140 An important part of evaluating array expressions without the use of
 141 temporary storage is determining the shape of the result prior to,
 142 or without, evaluating the elements of the result.
 143
 144 The shapes of array objects, results of elemental intrinsic functions,
 145 and results of intrinsic operations are obvious.
 146 But it is possible to determine the shapes of the results of many
 147 transformational intrinsic function calls as well.
 148
 149 * `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed:
 150   `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general.
 151   (The `DIM=` argument is commonly a compile-time constant.)
 152 * `SHAPE(MAXLOC(X))` is `[RANK(X)]`.
 153 * `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed.
 154 * `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`.
 155 * `SHAPE(RESHAPE(..., SHAPE=S))` is `S`.
 156 * `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`.
 157 * `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)`.
 158 * `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`.
 159 * `RANK(PACK(...))` is always 1.
 160 * `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`.
 161 * `SHAPE(MATMUL(A,B))` drops the last extent of `SHAPE(A)` and the first extent of `SHAPE(B)`, and concatenates what remains.
 162 * `SHAPE(SHAPE(X))` is `[RANK(X)]`.
 163 * `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at
 164   dimension `d`.
 165
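A few of these shape identities can be checked directly. The sketch below uses
arbitrary extents and a hypothetical program name; it assumes a compiler that
provides the `RANK` intrinsic.

```fortran
program shape_rules_demo
  implicit none
  integer, parameter :: d = 2
  real :: x(2,3,4), a(2,3), b(3,5), m(2,3)
  logical :: mask(2,3)
  integer :: sm(2), j

  call random_number(x)
  call random_number(a)
  call random_number(b)
  call random_number(m)
  mask = m > 0.5

  ! Axial reduction: SHAPE(X) with extent d removed.
  if (any(shape(sum(x, dim=d)) /= pack(shape(x), [(j, j=1,rank(x))] /= d))) &
    error stop 'axial reduction shape'
  ! Location reduction without DIM=: a vector of length RANK(X).
  if (any(shape(maxloc(x)) /= [rank(x)])) error stop 'MAXLOC shape'
  ! TRANSPOSE reverses the shape.
  sm = shape(m)
  if (any(shape(transpose(m)) /= sm(2:1:-1))) error stop 'TRANSPOSE shape'
  ! PACK without VECTOR= has COUNT(MASK) elements; UNPACK takes MASK's shape.
  if (any(shape(pack(m, mask)) /= [count(mask)])) error stop 'PACK shape'
  if (any(shape(unpack(pack(m, mask), mask, 0.0)) /= shape(mask))) &
    error stop 'UNPACK shape'
  ! MATMUL keeps the first extent of A and the last extent of B.
  if (any(shape(matmul(a, b)) /= [size(a, 1), size(b, 2)])) &
    error stop 'MATMUL shape'
  ! SPREAD inserts NCOPIES at dimension DIM.
  if (any(shape(spread(a, dim=2, ncopies=7)) /= [2, 7, 3])) &
    error stop 'SPREAD shape'

  print *, 'ok'
end program shape_rules_demo
```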
 166 This is useful because expression evaluations that *do* require temporaries
 167 to hold their results (due to the context in which the evaluation occurs)
 168 can be implemented with a separation of the allocation
 169 of the temporary array and the population of the array.
 170 The code that evaluates the expression, or that implements a transformational
 171 intrinsic in the runtime library, can be designed with an API that includes
 172 a pointer to the destination array as an argument.
 173
 174 Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable
 175 of evaluating their array expressions directly into the newly-allocated
 176 storage for the allocatable array.
 177 The implementation would generate code to calculate the shape, use it
 178 to allocate the memory and populate the descriptor, and then drive a
 179 loop nest around the expression to populate the array.
 180 In cases where the analyzed shape is known at compile time, we should
 181 be able to avoid heap allocation in favor of stack storage,
 182 if the scope of the variable is local.
 183
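The three steps described for `ALLOCATE(A,SOURCE=expression)` can be sketched
by hand for one concrete expression. The example below uses
`TRANSPOSE(A) + 1.0` as an assumed source expression and hypothetical variable
names; it illustrates the separation of shape calculation from element
evaluation rather than any actual generated code.

```fortran
program alloc_source_demo
  implicit none
  real :: a(3,4)
  real, allocatable :: r(:,:), expected(:,:)
  integer :: shp(2), j, k

  call random_number(a)

  ! Step 1: compute the shape of TRANSPOSE(A) + 1.0 from the shape of A alone.
  shp = [size(a, 2), size(a, 1)]
  ! Step 2: allocate the destination, which also populates its descriptor.
  allocate(r(shp(1), shp(2)))
  ! Step 3: evaluate each element directly into the new storage.
  do concurrent (j=1:shp(1), k=1:shp(2))
    r(j,k) = a(k,j) + 1.0
  end do

  ! Compare against the statement being modeled.
  allocate(expected, source=transpose(a) + 1.0)
  if (any(shape(r) /= shape(expected))) error stop 'shape mismatch'
  if (any(abs(r - expected) > 1.0e-6)) error stop 'value mismatch'
  print *, 'ok'
end program alloc_source_demo
```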
 184 ## Automatic reallocation of allocatables
 185
 186 Fortran 2003 introduced the ability to assign non-conforming array expressions
 187 to ALLOCATABLE arrays with the implied semantics of reallocation to the
 188 new shape.
 189 The implementation of this feature also becomes more straightforward if
 190 our implementation of array expressions has decoupled calculation of shapes
 191 from the evaluation of the elements of the result.
 192
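As an illustration, an assignment such as `V = PACK(X, MASK)` to an allocatable
`V` could be expanded by hand as follows: the new shape is computed first, the
variable is reallocated only if its shape differs, and then the elements are
stored. The names and data here are assumptions made for the sketch.

```fortran
program realloc_demo
  implicit none
  real, allocatable :: v(:), w(:)
  real :: x(2,3)
  logical :: mask(2,3)
  integer :: newshape(1)

  x = reshape([1.0, -2.0, 3.0, -4.0, 5.0, -6.0], [2,3])
  mask = x > 0.0
  allocate(v(10), w(10))

  ! Hand-expanded form: the shape is known before any element is computed.
  newshape = [count(mask)]
  if (any(shape(v) /= newshape)) then
    deallocate(v)
    allocate(v(newshape(1)))
  end if
  v(:) = pack(x, mask)   ! section assignment: no further reallocation

  ! The assignment being modeled: W is reallocated automatically.
  w = pack(x, mask)

  if (size(v) /= 3 .or. size(w) /= 3) error stop 'unexpected size'
  if (any(v /= w)) error stop 'value mismatch'
  print *, 'ok'
end program realloc_demo
```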
 193 ## Rewriting rules
 194
 195 Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into
 196 the result of an array expression or subexpression.
 197
 198 * Array constructors always yield vectors; higher-rank arrays that appear as
 199   constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)])`.
 200 * Array constructors with multiple constituents are concatenations of
 201   their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))`.
 202 * Array constructors with implied DO loops are difficult when nested
 203   triangularly.
 204 * Whole array references can have lower bounds other than 1, so
 205   `A => A(LBOUND(A,1):UBOUND(A,1),...)`.
 206 * Array sections simply apply indices: `A(i:...:n){j} => A(i+n*(j-1))`.
 207 * Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`.
 208 * Scalar operands ignore indices: `X{j,k} => X`.
 209   Further, they are evaluated at most once.
 210 * Elemental operators and functions apply indices to their arguments:
 211   `(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}`.
 212 * `TRANSPOSE(X){j,k} => X{k,j}`.
 213 * `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated.
 214   If `X` is sufficiently expensive to compute elementally, it might be evaluated
 215   into a temporary.
 216
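Three of these rules (concatenation, array sections, and vector-valued
subscripts) can be exercised with a short sketch. The variable names
(`first`, `stride`, `idx`) and data are illustrative assumptions; the indices
passed to `MERGE` are clamped because both of its operands are evaluated in
source-level Fortran.

```fortran
program rewrite_rules_demo
  implicit none
  integer, parameter :: nx = 3, ny = 4, first = 2, stride = 5
  integer :: x(nx), y(ny), cat(nx+ny), a(20), sec(4), idx(3), vs(3)
  integer :: j

  x = [(j, j=1,nx)]
  y = [(100 + j, j=1,ny)]
  a = [(j*j, j=1,20)]
  idx = [3, 1, 2]

  ! [X,Y]{j} => MERGE(Y{j-SIZE(X)}, X{j}, j>SIZE(X))
  do concurrent (j=1:nx+ny)
    cat(j) = merge(y(max(j - nx, 1)), x(min(j, nx)), j > nx)
  end do
  if (any(cat /= [x, y])) error stop 'concatenation mismatch'

  ! A(first:...:stride){j} => A(first + stride*(j-1))
  do concurrent (j=1:size(sec))
    sec(j) = a(first + stride*(j - 1))
  end do
  if (any(sec /= a(first:20:stride))) error stop 'section mismatch'

  ! A(IDX(:)){j} => A(IDX(:){j})
  do concurrent (j=1:size(idx))
    vs(j) = a(idx(j))
  end do
  if (any(vs /= a(idx))) error stop 'vector subscript mismatch'

  print *, 'ok'
end program rewrite_rules_demo
```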
 217 (more...)