flang/docs/F202X.md

   1 <!--===- docs/F202X.md
   2
   3    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   4    See https://llvm.org/LICENSE.txt for license information.
   5    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
   6
   7 -->
   8
   9 # A first take on Fortran 202X features for LLVM Flang
  10
  11 I (Peter Klausler) have been studying the draft PDF of the
  12 [Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
  13 which will soon be published as ISO Fortran 2023.
  14 I have compiled this summary of its changes relative to
  15 the current Fortran 2018 standard from the perspective
  16 of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
  17 implementor.
  18
  19 ## TL;DR
  20
  21 Fortran 202X doesn't make very many changes to the language
  22 relative to Fortran 2018, which was itself a small increment
  23 over Fortran 2008.
  24 Apart from `REDUCE` clauses that were added to the
  25 [still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
  26 `DO CONCURRENT` construct, there's little here for Fortran users
  27 to get excited about.
  28
  29 ## Priority of implementation in LLVM Flang
  30
  31 We are working hard to ensure that existing working applications will
  32 port successfully to LLVM Flang with minimal effort.
  33 I am not particularly concerned with conforming to a new
  34 standard as an end in itself.
  35
  36 The only features below that appear to have already been implemented
  37 in other compilers are the `REDUCE` clauses and the degree trigonometric
  38 intrinsic functions, so those should have priority as an aid to
  39 portability.
  40 We would want to support them earlier even if they were not in a standard.
  41
  42 The `REDUCE` clause also merits early implementation due to
  43 its potential for performance improvements in real codes.
  44 I don't see any other feature here that would be relevant to
  45 performance (maybe a weak argument could be made for `SIMPLE`).
  46 The bulk of this revision unfortunately comprises changes to Fortran that
  47 are neither performance-related, already available in
  48 some compilers, nor (obviously) in use in existing codes.
  49 I will not prioritize implementing them myself over
  50 other work until they become portability concerns or are
  51 requested by actual users.
  52
  53 Given Fortran's history of the latency between new
  54 standards and the support for their features in real compilers,
  55 and then the extra lag before the features are then actually used
  56 in codes meant to be portable, I doubt that many of the items
  57 below will have to be worked on any time soon due to user demand.
  58
  59 If J3 had chosen to add more features that were material improvements
  60 to Fortran -- and there's quite a long list of worthy candidates that
  61 were passed over, like read-only pointers -- it would have made sense
  62 for me to prioritize their implementation in LLVM Flang more
  63 urgently.
  64
  65 ## Specific change descriptions
  66
  67 The individual features added to the language are summarized
  68 in what I see as their order of significance to Fortran users.
  69
  70 ### Alert: There's a breaking change!
  71
  72 The Fortran committee used to abhor making breaking changes,
  73 apart from fixes, so that conforming codes could be portable across
  74 time as well as across compilers.
  75 Fortran 202X, however, uncharacteristically perpetrates one such
  76 change to existing semantics that will silently cause existing
  77 codes to work differently, if that change were to be implemented
  78 and enabled by default.
  79
  80 Specifically, automatic reallocation of whole deferred-length character
  81 allocatable scalars is now mandated when they appear for internal output
  82 (e.g., `WRITE(A,*) ...`)
  83 or as output arguments for some statements and intrinsic procedures
  84 (e.g., `IOMSG=`, `ERRMSG=`).
  85 So existing codes that allocate output buffers
  86 for such things will, or would, now observe that their buffers are
  87 silently changing their lengths during execution, rather than being
  88 padded with blanks or being truncated.  For example:
  89
  90 ```
  91   character(:), allocatable :: buffer
  92   allocate(character(20)::buffer)
  93   write(buffer,'F5.3') 3.14159
  94   print *, len(buffer)
  95 ```
  96
  97 prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
  98
  99 There would have no problem with the new standard changing the
 100 behavior in the current error case of an unallocated variable;
 101 defining new semantics for old errors is a generally safe means
 102 for extending a programming language.
 103 However, in this case, we'll need to protect existing conforming
 104 codes from the surprising new reallocation semantics, which
 105 affect cases that are not errors.
 106
 107 When/if there are requests from real users to implement this breaking
 108 change, and if it is implemented, I'll have to ensure that users
 109 have the ability to control this change in behavior via an option &/or the
 110 runtime environment, and when it's enabled, emit a warning at code
 111 sites that are at risk.
 112 This warning should mention a source change they can make to protect
 113 themselves from this change by passing the complete substring (`A(:)`)
 114 instead of a whole character allocatable.
 115
 116 This feature reminds me of Fortran 2003's change to whole
 117 allocatable array assignment, although in that case users were
 118 put at risk only of extra runtime overhead that was needless in
 119 existing codes, not a change in behavior, and users learned to
 120 assign to whole array sections (`A(:)=...`) rather than to whole
 121 allocatable arrays where the performance hit mattered.
 122
 123 ### Major Items
 124
 125 The features in this section are expensive to implement in
 126 terms of engineering time to design, code, refactor, and test
 127 (i.e., weeks or months, not days).
 128
 129 #### `DO CONCURRENT REDUCE`
 130
 131 J3 continues to ignore the
 132 [serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
 133 with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
 134 admirable willingness to repair the standard to fix problems with
 135 other features (e.g., plugging holes in `PURE` procedure requirements)
 136 and their less admirable willingness to make breaking changes (see above).
 137 They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
 138 immediately useful to HPC codes and worth implementing soon.
 139
 140 #### `SIMPLE` procedures
 141
 142 The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
 143 procedures.
 144 There are things that one can do in a `PURE` procedure
 145 but cannot in a `SIMPLE` one.  But the virtue of being `SIMPLE` seems
 146 to be its own reward, not a requirement to access any other
 147 feature.
 148
 149 `SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
 150 changed to require callees to be `SIMPLE`, not just `PURE`.
 151
 152 The implementation of `SIMPLE` will be nontrivial: it involves
 153 some parsing and symbol table work, and some generalization of the
 154 predicate function `IsPureProcedure()`, extending the semantic checking on
 155 calls in `PURE` procedures to ensure that `SIMPLE` procedures
 156 only call other `SIMPLE` procedures, and modifying the intrinsic
 157 procedure table to note that most intrinsics are now `SIMPLE`
 158 rather than just `PURE`.
 159
 160 I don't expect any codes to rush to change their `PURE` procedures
 161 to be `SIMPLE`, since it buys little and reduces portability.
 162 This makes `SIMPLE` a lower-priority feature.
 163
 164 #### Conditional expressions and actual arguments
 165
 166 Next on the list of "big ticket" items are C-style conditional
 167 expressions.  These come in two forms, each of which is a distinct
 168 feature that would be nontrivial to implement, and I would not be
 169 surprised to see some compilers implement one before the other.
 170
 171 The first form is a new parenthesized expression primary that any C programmer
 172 would recognize.  It has straightforward parsing and semantics,
 173 but will require support in folding and all other code that
 174 processes expressions.  Lowering will be nontrivial due to
 175 control flow.
 176
 177 The second form is a conditional actual argument syntax
 178 that allows runtime selection of argument associations, as well
 179 as a `.NIL.` syntax for optional arguments to signify an absent actual
 180 argument.  This would have been more useful if it had also been
 181 allowed as a pointer assignment statement right-hand side, and
 182 that might be a worthwhile extension.  As this form is essentially
 183 a conditional variable reference it may be cleaner to have a
 184 distinct representation from the conditional expression primary
 185 in the parse tree and strongly-typed `Expr<T>` representations.
 186
 187 #### `ENUMERATION TYPE`
 188
 189 Fortran 202X has a new category of type.  The new non-interoperable
 190 `ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
 191 a powerful sum data type as in Haskell or Rust.  Unlike the
 192 current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
 193 type name and its distinct values.
 194
 195 This feature may well be the item requiring the largest patch to
 196 the compiler for its implementation, as it affects parsing,
 197 type checking on assignment and argument association, generic
 198 resolution, formatted I/O, NAMELIST, debugging symbols, &c.
 199 It will indirectly affect every switch statement in the compiler
 200 that switches over the six (now seven) type categories.
 201 This will be a big project for little useful return to users.
 202
 203 #### `TYPEOF` and `CLASSOF`
 204
 205 Last on the list of "big ticket" items are the new TYPEOF and CLASSOF
 206 type specifiers, which allow declarations to indirectly use the
 207 types of previously-defined entities.  These would have obvious utility
 208 in a language with type polymorphism but aren't going to be very
 209 useful yet in Fortran 202X (esp. `TYPEOF`), although they would be worth
 210 supporting as a utility feature for a parametric module extension.
 211
 212 `CLASSOF` has implications for semantics and lowering that need to
 213 be thought through as it seems to provide a means of
 214 declaring polymorphic local variables and function results that are
 215 neither allocatables nor pointers.
 216
 217 #### Coarray extensions:
 218
 219  * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
 220  * Arrays with coarray components
 221
 222 #### "Rank Independent" Features
 223
 224 The `RANK(n)` attribute declaration syntax is equivalent to
 225 `DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
 226 As `n` must be a constant expression, that's straightforward to implement,
 227 though not terribly useful until the language acquires additional features.
 228 (I can see some utility in being able to declare PDT components with a
 229 `RANK` that depends on a `KIND` type parameter.)
 230
 231 It is now possible to declare the lower and upper bounds of an explicit
 232 shape entity using a constant-length vector specification expression
 233 in a declaration, `ALLOCATE` statement, or pointer assignment with
 234 bounds remapping.
 235 For example, `real A([2,3])` is equivalent to `real A(2,3)`.
 236
 237 The new `A(@V)` "multiple subscript" indexing syntax uses an integer
 238 vector to supply a list of subscripts or of triplet bounds/strides.  This one
 239 has tough edge cases for lowering that need to be thought through;
 240 for example, when the lengths of two or more of the vectors in
 241 `A(@U,@V,@W)` are not known at compilation time, implementing the indexing
 242 would be tricky in generated code and might just end up filling a
 243 temporary with `[U,V,W]` first.
 244
 245 The obvious use case for "multiple subscripts" would be as a means to
 246 index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
 247 construct, but that usage is not supported in Fortran 202X.
 248
 249 This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
 250 `LEN` derived type parameters.
 251
 252 ### Minor Items
 253
 254 So much for the major features of Fortran 202X.  The longer list
 255 of minor features can be more briefly summarized.
 256
 257 #### New Edit Descriptors
 258
 259 Fortran 202X has some noncontroversial small tweaks to formatted output.
 260 The `AT` edit descriptor automatically trims character output.  The `LZP`,
 261 `LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
 262 means for controlling the output of leading zero digits.
 263
 264 #### Intrinsic Module Extensions
 265
 266 Addressing some issues and omissions in intrinsic modules:
 267
 268  * LOGICAL8/16/32/64 and REAL16
 269  * IEEE module facilities upgraded to match latest IEEE FP standard
 270  * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
 271  * C_F_POINTER(LOWER=)
 272
 273 #### Intrinsic Procedure Extensions
 274
 275 The `SYSTEM_CLOCK` intrinsic function got some semantic tweaks.
 276
 277 There are new intrinsic functions for trigonometric functions in
 278 units of degrees and half-circles.
 279 GNU Fortran already supports the forms that use degree units.
 280 These should call into math library implementations that are
 281 specialized for those units rather than simply multiplying
 282 arguments or results with conversion factors.
 283  * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
 284  * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
 285
 286 `SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`
 287
 288 There are two new character utility intrinsic
 289 functions whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
 290 `TOKENIZE` requires memory allocation to return its results,
 291 and could and should have been implemented once in some Fortran utility
 292 library for those who need a slow tokenization facility rather than
 293 requiring implementations in each vendor's runtime support library with
 294 all the extra cost and compatibilty risk that entails.
 295
 296 `SPLIT` is worse -- not only could it, like `TOKENIZE`,
 297 have been supplied by a Fortran utility library rather than being
 298 added to the standard, it's redundant;
 299 it provides nothing that cannot be already accomplished by
 300 composing today's `SCAN` intrinsic function with substring indexing:
 301
 302 ```
 303 module m
 304   interface split
 305     module procedure :: split
 306   end interface
 307   !instantiate for all possible ck/ik/lk combinations
 308   integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
 309  contains
 310   simple elemental subroutine split(string, set, pos, back)
 311     character(*, kind=ck), intent(in) :: string, set
 312     integer(kind=ik), intent(in out) :: pos
 313     logical(kind=lk), intent(in), optional :: back
 314     if (present(back)) then
 315       if (back) then
 316         pos = scan(string(:pos-1), set, .true.)
 317         return
 318       end if
 319     end if
 320     npos = scan(string(pos+1:), set)
 321     pos = merge(pos + npos, len(string) + 1, npos /= 0)
 322   end
 323 end
 324 ```
 325
 326 (The code above isn't a proposed implementation for `SPLIT`, just a
 327 demonstration of how programs could use `SCAN` to accomplish the same
 328 results today.)
 329
 330 ## Source limitations
 331
 332 Fortran 202X raises the maximum number of characters per free form
 333 source line and the maximum total number of characters per statement.
 334 Both of these have always been unlimited in this compiler (or
 335 limited only by available memory, to be more accurate.)
 336
 337 ## More BOZ usage opportunities
 338
 339 BOZ literal constants (binary, octal, and hexadecimal constants,
 340 also known as "typeless" values) have more conforming usage in the
 341 new standard in contexts where the type is unambiguously known.
 342 They may now appear as initializers, as right-hand sides of intrinsic
 343 assignments to integer and real variables, in explicitly typed
 344 array constructors, and in the definitions of enumerations.
 345
 346 ## Citation updates
 347
 348 The source base contains hundreds of references to the subclauses,
 349 requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
 350 These will need to be mapped to their Fortran 202X counterparts once the
 351 new standard is published, as the Fortran committee does not provide a
 352 means for citing these items by names that are fixed over time like the
 353 C++ committee does.
 354 If we had access to the LaTeX sources of the standard, we could generate
 355 a mapping table and automate this update.