flang/docs/FortranForCProgrammers.md

   1 <!--===- docs/FortranForCProgrammers.md
   2
   3    Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
   4    See https://llvm.org/LICENSE.txt for license information.
   5    SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
   6
   7 -->
   8
   9 # Fortran For C Programmers
  10
  11 ```eval_rst
  12 .. contents::
  13    :local:
  14 ```
  15
  16 This note is limited to essential information about Fortran so that
  17 a C or C++ programmer can get started more quickly with the language,
  18 at least as a reader, and avoid some common pitfalls when starting
  19 to write or modify Fortran code.
  20 Please see other sources to learn about Fortran's rich history,
  21 current applications, and modern best practices in new code.
  22
  23 ## Know This At Least
  24
  25 * There have been many implementations of Fortran, often from competing
  26   vendors, and the standard language has been defined by U.S. and
  27   international standards organizations.  The various editions of
  28   the standard are known as the '66, '77, '90, '95, 2003, 2008, and
  29   (now) 2018 standards.
  30 * Forward compatibility is important.  Fortran has outlasted many
  31   generations of computer systems hardware and software.  Standard
  32   compliance notwithstanding, Fortran programmers generally expect that
  33   code that has compiled successfully in the past will continue to
  34   compile and work indefinitely.  The standards sometimes designate
  35   features as being deprecated, obsolescent, or even deleted, but that
  36   can be read only as discouraging their use in new code -- they'll
  37   probably always work in any serious implementation.
  38 * Fortran has two source forms, which are typically distinguished by
  39   filename suffixes.  `foo.f` is old-style "fixed-form" source, and
  40   `foo.f90` is new-style "free-form" source.  All language features
  41   are available in both source forms.  Neither form has reserved words
  42   in the sense that C does.  Spaces are not required between tokens
  43   in fixed form, and case is not significant in either form.
  44 * Variable declarations are optional by default.  Variables whose
  45   names begin with the letters `I` through `N` are implicitly
  46   `INTEGER`, and others are implicitly `REAL`.  These implicit typing
  47   rules can be changed in the source.
  48 * Fortran uses parentheses in both array references and function calls.
  49   All arrays must be declared as such; other names followed by parenthesized
  50   expressions are assumed to be function calls.
  51 * Fortran has a _lot_ of built-in "intrinsic" functions.  They are always
  52   available without a need to declare or import them.  Their names reflect
  53   the implicit typing rules, so you will encounter names that have been
  54   modified so that they have the right type (e.g., `AIMAG` has a leading `A`
  55   so that it's `REAL` rather than `INTEGER`).
  56 * The modern language has means for declaring types, data, and subprogram
  57   interfaces in compiled "modules", as well as legacy mechanisms for
  58   sharing data and interconnecting subprograms.
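
As a minimal sketch of the implicit typing rules and of the shared parenthesis syntax described above, consider the following program. The names `kount`, `total`, `a`, and `twice` are made up for illustration.

```fortran
PROGRAM implicit_demo
  ! No IMPLICIT NONE here, so the default implicit typing rules apply.
  DIMENSION a(3)          ! a is an array; it starts with A, so it is REAL
  kount = 7 / 2           ! starts with K: implicitly INTEGER, so this is 3
  total = 7 / 2.0         ! starts with T: implicitly REAL, so this is 3.5
  a = 1.0
  PRINT *, kount, total
  PRINT *, a(2)           ! array reference
  PRINT *, twice(2.0)     ! same syntax, but twice is a function
CONTAINS
  REAL FUNCTION twice(x)
    twice = 2.0 * x       ! x is implicitly REAL
  END FUNCTION
END PROGRAM
```

Most modern code begins each scope with `IMPLICIT NONE`, which disables implicit typing so that a misspelled name becomes a compilation error.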
  59
  60 ## A Rosetta Stone
  61
   62 Fortran's language standard and other documentation use some terminology
  63 in particular ways that might be unfamiliar.
  64
  65 | Fortran | English |
  66 | ------- | ------- |
  67 | Association | Making a name refer to something else |
  68 | Assumed | Some attribute of an argument or interface that is not known until a call is made |
  69 | Companion processor | A C compiler |
  70 | Component | Class member |
  71 | Deferred | Some attribute of a variable that is not known until an allocation or assignment |
  72 | Derived type | C++ class |
  73 | Dummy argument | C++ reference argument |
  74 | Final procedure | C++ destructor |
  75 | Generic | Overloaded function, resolved by actual arguments |
  76 | Host procedure | The subprogram that contains a nested one |
  77 | Implied DO | There's a loop inside a statement |
  78 | Interface | Prototype |
  79 | Internal I/O | `sscanf` and `snprintf` |
  80 | Intrinsic | Built-in type or function |
  81 | Polymorphic | Dynamically typed |
  82 | Processor | Fortran compiler |
  83 | Rank | Number of dimensions that an array has |
  84 | `SAVE` attribute | Statically allocated |
  85 | Type-bound procedure | Kind of a C++ member function but not really |
  86 | Unformatted | Raw binary |
  87
  88 ## Data Types
  89
  90 There are five built-in ("intrinsic") types: `INTEGER`, `REAL`, `COMPLEX`,
  91 `LOGICAL`, and `CHARACTER`.
  92 They are parameterized with "kind" values, which should be treated as
  93 non-portable integer codes, although in practice today these are the
  94 byte sizes of the data.
  95 (For `COMPLEX`, the kind type parameter value is the byte size of one of the
  96 two `REAL` components, or half of the total size.)
  97 The legacy `DOUBLE PRECISION` intrinsic type is an alias for a kind of `REAL`
  98 that should be more precise, and bigger, than the default `REAL`.
  99
 100 `COMPLEX` is a simple structure that comprises two `REAL` components.
 101
 102 `CHARACTER` data also have length, which may or may not be known at compilation
 103 time.
 104 `CHARACTER` variables are fixed-length strings and they get padded out
 105 with space characters when not completely assigned.
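
The following sketch shows kind parameters and `CHARACTER` padding in practice. The named constant `dp` and the variable names are illustrative only; `SELECTED_REAL_KIND(15)` asks for a `REAL` kind with at least 15 decimal digits instead of hard-coding a kind number.

```fortran
PROGRAM kinds_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND(15)
  REAL :: x                 ! default REAL
  REAL(KIND=dp) :: y        ! at least 15 decimal digits
  DOUBLE PRECISION :: z     ! legacy spelling of a wider REAL
  COMPLEX(KIND=dp) :: c     ! two REAL(KIND=dp) components
  CHARACTER(LEN=8) :: name  ! fixed length of 8

  x = 1.0
  y = 1.0_dp
  z = 1.0D0
  c = (1.0_dp, 2.0_dp)
  name = 'abc'              ! padded on the right with five spaces
  PRINT *, KIND(x), KIND(y), KIND(z)   ! kind values are processor-dependent
  PRINT *, REAL(c), AIMAG(c)           ! real and imaginary parts
  PRINT *, '[' // name // ']', LEN(name), LEN_TRIM(name)
END PROGRAM
```

Here `LEN(name)` is 8 regardless of what was assigned, while `LEN_TRIM(name)` is 3 because it ignores the trailing padding.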
 106
 107 User-defined ("derived") data types can be synthesized from the intrinsic
 108 types and from previously-defined user types, much like a C `struct`.
 109 Derived types can be parameterized with integer values that either have
 110 to be constant at compilation time ("kind" parameters) or deferred to
 111 execution ("len" parameters).
 112
 113 Derived types can inherit ("extend") from at most one other derived type.
 114 They can have user-defined destructors (`FINAL` procedures).
 115 They can specify default initial values for their components.
 116 With some work, one can also specify a general constructor function,
 117 since Fortran allows a generic interface to have the same name as that
 118 of a derived type.
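
A hedged sketch of these derived type features follows. The types `point` and `labeled_point` and the function `point_on_diagonal` are hypothetical names chosen for this example.

```fortran
MODULE points
  IMPLICIT NONE
  TYPE :: point
    REAL :: x = 0.0, y = 0.0             ! default initial values
  END TYPE
  TYPE, EXTENDS(point) :: labeled_point  ! extends exactly one parent type
    CHARACTER(LEN=8) :: label = 'none'
  END TYPE
  ! A generic interface with the type's name serves as a constructor.
  INTERFACE point
    MODULE PROCEDURE point_on_diagonal
  END INTERFACE
CONTAINS
  FUNCTION point_on_diagonal(n) RESULT(p)
    INTEGER, INTENT(IN) :: n
    TYPE(point) :: p
    p%x = REAL(n)
    p%y = REAL(n)
  END FUNCTION
END MODULE

PROGRAM points_demo
  USE points
  IMPLICIT NONE
  TYPE(point) :: a, b, c
  TYPE(labeled_point) :: q
  b = point(1.0, 2.0)      ! built-in structure constructor
  c = point(3)             ! resolves to point_on_diagonal
  q%x = 5.0                ! inherited component
  PRINT *, a%x, a%y        ! a was never assigned: default values
  PRINT *, b%x, b%y, c%x, c%y
  PRINT *, q%x, q%point%y, q%label   ! the parent is also reachable as q%point
END PROGRAM
```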
 119
 120 Last, there are "typeless" binary constants that can be used in a few
 121 situations, like static data initialization or immediate conversion,
 122 where type is not necessary.
 123
 124 ## Arrays
 125
 126 Arrays are not types in Fortran.
 127 Being an array is a property of an object or function, not of a type.
 128 Unlike C, one cannot have an array of arrays or an array of pointers,
  129 although one can have an array of a derived type that has arrays or
 130 pointers as components.
 131 Arrays are multidimensional, and the number of dimensions is called
 132 the _rank_ of the array.
  133 Arrays are laid out in memory such that the last subscript has the
  134 largest stride, e.g. `A(1,1)` is followed by `A(2,1)`, not `A(1,2)`.
 135 And yes, the default lower bound on each dimension is 1, not 0.
 136
 137 Expressions can manipulate arrays as multidimensional values, and
 138 the compiler will create the necessary loops.
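
The storage order, the default lower bound, and whole-array expressions can be seen in a short sketch like this one (all names are illustrative):

```fortran
PROGRAM array_demo
  IMPLICIT NONE
  INTEGER :: a(2,3)          ! rank 2, bounds 1:2 and 1:3
  INTEGER :: b(0:2)          ! lower bound overridden to 0
  REAL :: v(4), w(4)

  ! RESHAPE fills in storage order: the first subscript varies fastest.
  a = RESHAPE([1, 2, 3, 4, 5, 6], SHAPE(a))
  PRINT *, a(1,1), a(2,1), a(1,2)       ! 1 2 3
  PRINT *, LBOUND(a), UBOUND(a), LBOUND(b)

  v = [1.0, 2.0, 3.0, 4.0]
  w = 2.0 * v + 1.0          ! whole-array expression; no explicit loop
  PRINT *, w
  PRINT *, SUM(v), MAXVAL(a(:,2))       ! reductions and array sections
END PROGRAM
```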
 139
 140 ## Allocatables
 141
 142 Modern Fortran programs use `ALLOCATABLE` data extensively.
 143 Such variables and derived type components are allocated dynamically.
 144 They are automatically deallocated when they go out of scope, much
 145 like C++'s `std::vector<>` class template instances are.
 146 The array bounds, derived type `LEN` parameters, and even the
 147 type of an allocatable can all be deferred to run time.
 148 (If you really want to learn all about modern Fortran, I suggest
 149 that you study everything that can be done with `ALLOCATABLE` data,
 150 and follow up all the references that are made in the documentation
 151 from the description of `ALLOCATABLE` to other topics; it's a feature
 152 that interacts with much of the rest of the language.)
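
As a small sketch of the behavior described above, the program below uses an allocatable array, a deferred-length `CHARACTER` allocatable, and a local allocatable that is released automatically. The names are hypothetical, and the automatic reallocation on assignment assumes a compiler that implements Fortran 2003 semantics.

```fortran
PROGRAM alloc_demo
  IMPLICIT NONE
  REAL, ALLOCATABLE :: v(:)
  CHARACTER(LEN=:), ALLOCATABLE :: s     ! deferred length
  INTEGER :: n

  n = 5
  ALLOCATE(v(n))                 ! bounds chosen at run time
  v = 0.0
  v = [v, 1.0, 2.0]              ! assignment reallocates v to 7 elements
  s = 'hello'                    ! allocates s with length 5
  s = s // ', world'             ! reallocates to the new length
  PRINT *, SIZE(v), LEN(s), ALLOCATED(v)
  CALL work(100)
CONTAINS
  SUBROUTINE work(m)
    INTEGER, INTENT(IN) :: m
    INTEGER, ALLOCATABLE :: tmp(:)
    ALLOCATE(tmp(m))
    tmp = 1
    PRINT *, SUM(tmp)
    ! No DEALLOCATE is needed: tmp is freed automatically on return.
  END SUBROUTINE
END PROGRAM
```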
 153
 154 ## I/O
 155
 156 Fortran's input/output features are built into the syntax of the language,
 157 rather than being defined by library interfaces as in C and C++.
 158 There are means for raw binary I/O and for "formatted" transfers to
 159 character representations.
 160 There are means for random-access I/O using fixed-size records as well as for
 161 sequential I/O.
 162 One can scan data from or format data into `CHARACTER` variables via
 163 "internal" formatted I/O.
 164 I/O from and to files uses a scheme of integer "unit" numbers that is
 165 similar to the open file descriptors of UNIX; i.e., one opens a file
 166 and assigns it a unit number, then uses that unit number in subsequent
 167 `READ` and `WRITE` statements.
 168
 169 Formatted I/O relies on format specifications to map values to fields of
 170 characters, similar to the format strings used with C's `printf` family
 171 of standard library functions.
 172 These format specifications can appear in `FORMAT` statements and
 173 be referenced by their labels, in character literals directly in I/O
 174 statements, or in character variables.
 175
 176 One can also use compiler-generated formatting in "list-directed" I/O,
 177 in which the compiler derives reasonable default formats based on
 178 data types.
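
The different flavors of I/O mentioned in this section might look as follows. This is only a sketch: the variable names are invented, and the `NEWUNIT=` specifier assumes a Fortran 2008 compiler (older code picks a unit number by hand).

```fortran
PROGRAM io_demo
  IMPLICIT NONE
  CHARACTER(LEN=32) :: buffer
  INTEGER :: i, j, k, u
  REAL :: x

  ! Formatted output with the format in a character literal.
  WRITE(*, '(A, I4, F8.3)') 'values:', 42, 3.14159
  ! The same format in a labeled FORMAT statement.
  WRITE(*, 100) 'values:', 42, 3.14159
100 FORMAT(A, I4, F8.3)
  ! List-directed output: the compiler chooses the layout.
  PRINT *, 'values:', 42, 3.14159

  ! Internal I/O: WRITE into a CHARACTER variable (like snprintf) ...
  WRITE(buffer, '(I3, 1X, I3, 1X, F6.2)') 12, 34, 5.5
  ! ... and READ back out of it (like sscanf).
  READ(buffer, *) i, j, x
  PRINT *, i + j, x

  ! File I/O goes through a unit number; a scratch file keeps this tidy.
  OPEN(NEWUNIT=u, STATUS='SCRATCH', ACTION='READWRITE')
  WRITE(u, *) 123
  REWIND(u)
  READ(u, *) k
  CLOSE(u)
  PRINT *, k
END PROGRAM
```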
 179
 180 ## Subprograms
 181
 182 Fortran has both `FUNCTION` and `SUBROUTINE` subprograms.
 183 They share the same name space, but functions cannot be called as
 184 subroutines or vice versa.
 185 Subroutines are called with the `CALL` statement, while functions are
 186 invoked with function references in expressions.
 187
 188 There is one level of subprogram nesting.
 189 A function, subroutine, or main program can have functions and subroutines
 190 nested within it, but these "internal" procedures cannot themselves have
 191 their own internal procedures.
 192 As is the case with C++ lambda expressions, internal procedures can
 193 reference names from their host subprograms.
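
A brief sketch of both kinds of subprogram, written as internal procedures that read a variable of their host (the names `factor`, `scaled`, and `report` are illustrative):

```fortran
PROGRAM subprogram_demo
  IMPLICIT NONE
  REAL :: factor, r
  factor = 10.0
  r = scaled(2.0)            ! function reference inside an expression
  CALL report(r)             ! subroutines need the CALL statement
CONTAINS
  REAL FUNCTION scaled(x)
    REAL, INTENT(IN) :: x
    scaled = factor * x      ! factor comes from the host program
  END FUNCTION
  SUBROUTINE report(v)
    REAL, INTENT(IN) :: v
    PRINT *, 'result =', v
  END SUBROUTINE
END PROGRAM
```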
 194
 195 ## Modules
 196
 197 Modern Fortran has good support for separate compilation and namespace
 198 management.
 199 The *module* is the basic unit of compilation, although independent
 200 subprograms still exist, of course, as well as the main program.
 201 Modules define types, constants, interfaces, and nested
 202 subprograms.
 203
 204 Objects from a module are made available for use in other compilation
 205 units via the `USE` statement, which has options for limiting the objects
 206 that are made available as well as for renaming them.
 207 All references to objects in modules are done with direct names or
 208 aliases that have been added to the local scope, as Fortran has no means
 209 of qualifying references with module names.
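
For instance, a module and a client that limits and renames what it imports could be sketched like this (the module `constants` and its contents are hypothetical):

```fortran
MODULE constants
  IMPLICIT NONE
  REAL, PARAMETER :: pi = 3.14159265
  REAL, PARAMETER :: e = 2.71828183
CONTAINS
  REAL FUNCTION circle_area(r)
    REAL, INTENT(IN) :: r
    circle_area = pi * r**2
  END FUNCTION
END MODULE

PROGRAM module_demo
  ! Import only two names, renaming circle_area to area locally.
  USE constants, ONLY: pi, area => circle_area
  IMPLICIT NONE
  PRINT *, pi, area(2.0)
  ! There is no qualified syntax such as constants::e; because of the
  ! ONLY clause, e is simply not visible in this scope.
END PROGRAM
```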
 210
 211 ## Arguments
 212
 213 Functions and subroutines have "dummy" arguments that are dynamically
 214 associated with actual arguments during calls.
 215 Essentially, all argument passing in Fortran is by reference, not value.
 216 One may restrict access to argument data by declaring that dummy
 217 arguments have `INTENT(IN)`, but that corresponds to the use of
 218 a `const` reference in C++ and does not imply that the data are
 219 copied; use `VALUE` for that.
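
The difference between the default argument association, `INTENT(IN)`, and `VALUE` can be sketched as follows (all names are made up for the example):

```fortran
PROGRAM argument_demo
  IMPLICIT NONE
  INTEGER :: a, b
  a = 1
  b = 1
  CALL bump(a, b)
  PRINT *, a, b, twice(a)   ! 2 1 4
CONTAINS
  SUBROUTINE bump(by_ref, by_val)
    INTEGER, INTENT(INOUT) :: by_ref   ! associated with the caller's variable
    INTEGER, VALUE :: by_val           ! a private copy, as in C
    by_ref = by_ref + 1
    by_val = by_val + 1                ! the caller never sees this
  END SUBROUTINE
  INTEGER FUNCTION twice(n)
    INTEGER, INTENT(IN) :: n           ! read-only, but no copy is implied
    twice = 2 * n
  END FUNCTION
END PROGRAM
```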
 220
 221 When it is not possible to pass a reference to an object, or a sparse
 222 regular array section of an object, as an actual argument, Fortran
 223 compilers must allocate temporary space to hold the actual argument
 224 across the call.
  225 This is guaranteed to happen when an actual argument is enclosed
  226 in parentheses.
 227
 228 The compiler is free to assume that any aliasing between dummy arguments
 229 and other data is safe.
 230 In other words, if some object can be written to under one name, it's
 231 never going to be read or written using some other name in that same
 232 scope.
 233 ```
 234   SUBROUTINE FOO(X,Y,Z)
 235   X = 3.14159
  236   Y = 2.1828 ! THE COMPILER MAY ASSUME THAT Y DOES NOT ALIAS X
 237   Z = 2 * X ! CAN BE FOLDED AT COMPILE TIME
 238   END
 239 ```
 240 This is the opposite of the assumptions under which a C or C++ compiler must
 241 labor when trying to optimize code with pointers.
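
Avoiding such aliasing is the programmer's responsibility. One way to do so, sketched below with hypothetical names, is to enclose an actual argument in parentheses so that the callee receives the value of an expression rather than the variable itself.

```fortran
PROGRAM alias_demo
  IMPLICIT NONE
  REAL :: a
  a = 1.0
  ! CALL add_twice(a, a) would be non-conforming: x is modified while the
  ! same variable is also visible as y, and no diagnostic is required.
  CALL add_twice(a, (a))   ! (a) is an expression, so y does not alias x
  PRINT *, a               ! a is now 3.0
CONTAINS
  SUBROUTINE add_twice(x, y)
    REAL, INTENT(INOUT) :: x
    REAL, INTENT(IN) :: y
    x = x + y
    x = x + y              ! the compiler may assume y was not changed above
  END SUBROUTINE
END PROGRAM
```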
 242
 243 ## Overloading
 244
 245 Fortran supports a form of overloading via its interface feature.
 246 By default, an interface is a means for specifying prototypes for a
 247 set of subroutines and functions.
 248 But when an interface is named, that name becomes a *generic* name
 249 for its specific subprograms, and calls via the generic name are
 250 mapped at compile time to one of the specific subprograms based
 251 on the types, kinds, and ranks of the actual arguments.
 252 A similar feature can be used for generic type-bound procedures.
 253
 254 This feature can be used to overload the built-in operators and some
 255 I/O statements, too.
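
A minimal sketch of a generic interface and an overloaded operator follows. The generic name `describe`, its specific procedures, and the `money` type are assumptions made for this illustration.

```fortran
MODULE describe_mod
  IMPLICIT NONE
  TYPE :: money
    INTEGER :: cents = 0
  END TYPE
  INTERFACE describe                  ! generic name
    MODULE PROCEDURE describe_int, describe_real, describe_vector
  END INTERFACE
  INTERFACE OPERATOR(+)               ! overload a built-in operator
    MODULE PROCEDURE add_money
  END INTERFACE
CONTAINS
  SUBROUTINE describe_int(i)
    INTEGER, INTENT(IN) :: i
    PRINT *, 'integer', i
  END SUBROUTINE
  SUBROUTINE describe_real(x)
    REAL, INTENT(IN) :: x
    PRINT *, 'real', x
  END SUBROUTINE
  SUBROUTINE describe_vector(v)
    REAL, INTENT(IN) :: v(:)
    PRINT *, 'rank-1 real array of size', SIZE(v)
  END SUBROUTINE
  FUNCTION add_money(a, b) RESULT(total)
    TYPE(money), INTENT(IN) :: a, b
    TYPE(money) :: total
    total%cents = a%cents + b%cents
  END FUNCTION
END MODULE

PROGRAM overload_demo
  USE describe_mod
  IMPLICIT NONE
  TYPE(money) :: total
  CALL describe(1)             ! selects describe_int by type
  CALL describe(1.0)           ! selects describe_real by type
  CALL describe([1.0, 2.0])    ! selects describe_vector by rank
  total = money(150) + money(275)
  PRINT *, total%cents         ! 425
END PROGRAM
```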
 256
 257 ## Polymorphism
 258
 259 Fortran code can be written to accept data of some derived type or
 260 any extension thereof using `CLASS`, deferring the actual type to
 261 execution, rather than the usual `TYPE` syntax.
  262 This is somewhat similar to the use of `virtual` functions in C++.
 263
 264 Fortran's `SELECT TYPE` construct is used to distinguish between
 265 possible specific types dynamically, when necessary.  It's a
 266 little like C++17's `std::visit()` on a discriminated union.
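
The sketch below shows a `CLASS` dummy argument, a polymorphic allocatable, and `SELECT TYPE`. The types `animal` and `dog` are hypothetical.

```fortran
MODULE animals
  IMPLICIT NONE
  TYPE :: animal
    CHARACTER(LEN=8) :: name = 'generic'
  END TYPE
  TYPE, EXTENDS(animal) :: dog
    INTEGER :: tricks = 0
  END TYPE
CONTAINS
  SUBROUTINE describe(a)
    CLASS(animal), INTENT(IN) :: a     ! accepts animal or any extension
    SELECT TYPE (a)
    TYPE IS (dog)
      PRINT *, TRIM(a%name), ' is a dog that knows', a%tricks, 'tricks'
    CLASS DEFAULT
      PRINT *, TRIM(a%name), ' is some kind of animal'
    END SELECT
  END SUBROUTINE
END MODULE

PROGRAM poly_demo
  USE animals
  IMPLICIT NONE
  CLASS(animal), ALLOCATABLE :: pet    ! dynamic type chosen at allocation
  CALL describe(animal('cat'))
  ALLOCATE(dog :: pet)
  pet%name = 'rex'
  CALL describe(pet)
END PROGRAM
```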
 267
 268 ## Pointers
 269
 270 Pointers are objects in Fortran, not data types.
 271 Pointers can point to data, arrays, and subprograms.
 272 A pointer can only point to data that has the `TARGET` attribute.
 273 Outside of the pointer assignment statement (`P=>X`) and some intrinsic
 274 functions and cases with pointer dummy arguments, pointers are implicitly
 275 dereferenced, and the use of their name is a reference to the data to which
 276 they point instead.
 277
  278 Unlike C, a pointer cannot point to a pointer *per se*, nor can it be
 279 used to implement a level of indirection to the management structure of
 280 an allocatable.
 281 If you assign to a Fortran pointer to make it point at another pointer,
 282 you are making the pointer point to the data (if any) to which the other
 283 pointer points.
 284 Similarly, if you assign to a Fortran pointer to make it point to an allocatable,
 285 you are making the pointer point to the current content of the allocatable,
 286 not to the metadata that manages the allocatable.
 287
 288 Unlike allocatables, pointers do not deallocate their data when they go
 289 out of scope.
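
The rules above can be seen in a short sketch (variable names are illustrative):

```fortran
PROGRAM pointer_demo
  IMPLICIT NONE
  INTEGER, TARGET :: x, a(5)
  INTEGER, POINTER :: p, q, sect(:)

  x = 1
  p => x              ! pointer assignment: p now refers to x
  p = 42              ! ordinary assignment: dereferences p, stores into x
  q => p              ! q points at x, not at p itself
  PRINT *, x, q, ASSOCIATED(q, x)     ! 42 42 T

  a = [10, 20, 30, 40, 50]
  sect => a(1:5:2)    ! pointers can also refer to array sections
  sect = 0            ! zeroes a(1), a(3), and a(5)
  PRINT *, a

  ALLOCATE(p)         ! p now refers to anonymous storage ...
  p = 7
  DEALLOCATE(p)       ! ... which must be freed explicitly
  NULLIFY(q)
  PRINT *, ASSOCIATED(q)              ! F
END PROGRAM
```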
 290
 291 A legacy feature, "Cray pointers", implements dynamic base addressing of
 292 one variable using an address stored in another.
 293
 294 ## Preprocessing
 295
 296 There is no standard preprocessing feature, but every real Fortran implementation
 297 has some support for passing Fortran source code through a variant of
 298 the standard C source preprocessor.
 299 Since Fortran is very different from C at the lexical level (e.g., line
 300 continuations, Hollerith literals, no reserved words, fixed form), using
 301 a stock modern C preprocessor on Fortran source can be difficult.
 302 Preprocessing behavior varies across implementations and one should not depend on
 303 much portability.
 304 Preprocessing is typically requested by the use of a capitalized filename
 305 suffix (e.g., "foo.F90") or a compiler command line option.
 306 (Since the F18 compiler always runs its built-in preprocessing stage,
 307 no special option or filename suffix is required.)
 308
 309 ## "Object Oriented" Programming
 310
 311 Fortran doesn't have member functions (or subroutines) in the sense
 312 that C++ does, in which a function has immediate access to the members
 313 of a specific instance of a derived type.
 314 But Fortran does have an analog to C++'s `this` via *type-bound
 315 procedures*.
 316 This is a means of binding a particular subprogram name to a derived
 317 type, possibly with aliasing, in such a way that the subprogram can
 318 be called as if it were a component of the type (e.g., `X%F(Y)`)
 319 and receive the object to the left of the `%` as an additional actual argument,
 320 exactly as if the call had been written `F(X,Y)`.
 321 The object is passed as the first argument by default, but that can be
 322 changed; indeed, the same specific subprogram can be used for multiple
 323 type-bound procedures by choosing different dummy arguments to serve as
 324 the passed object.
 325 The equivalent of a `static` member function is also available by saying
 326 that no argument is to be associated with the object via `NOPASS`.
 327
 328 There's a lot more that can be said about type-bound procedures (e.g., how they
 329 support overloading) but this should be enough to get you started with
 330 the most common usage.
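
A sketch of the three binding styles described in this section, using a hypothetical `counter` type:

```fortran
MODULE counters
  IMPLICIT NONE
  TYPE :: counter
    INTEGER :: n = 0
  CONTAINS
    PROCEDURE :: add                             ! object is the first dummy
    PROCEDURE, PASS(c) :: add_to => add_reversed ! object is the dummy named c
    PROCEDURE, NOPASS :: describe                ! like a static member function
  END TYPE
CONTAINS
  SUBROUTINE add(this, k)
    CLASS(counter), INTENT(INOUT) :: this  ! passed object must be CLASS
    INTEGER, INTENT(IN) :: k
    this%n = this%n + k
  END SUBROUTINE
  SUBROUTINE add_reversed(k, c)
    INTEGER, INTENT(IN) :: k
    CLASS(counter), INTENT(INOUT) :: c
    c%n = c%n + k
  END SUBROUTINE
  SUBROUTINE describe()
    PRINT *, 'counter: a type with one INTEGER component'
  END SUBROUTINE
END MODULE

PROGRAM tbp_demo
  USE counters
  IMPLICIT NONE
  TYPE(counter) :: x
  CALL x%add(2)          ! same as CALL add(x, 2)
  CALL x%add_to(3)       ! same as CALL add_reversed(3, x)
  CALL x%describe()      ! no object is passed
  PRINT *, x%n           ! 5
END PROGRAM
```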
 331
 332 ## Pitfalls
 333
 334 Variable initializers, e.g. `INTEGER :: J=123`, are _static_ initializers!
 335 They imply that the variable is stored in static storage, not on the stack,
 336 and the initialized value lasts only until the variable is assigned.
 337 One must use an assignment statement to implement a dynamic initializer
 338 that will apply to every fresh instance of the variable.
 339 Be especially careful when using initializers in the newish `BLOCK` construct,
 340 which perpetuates the interpretation as static data.
 341 (Derived type component initializers, however, do work as expected.)
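
The following sketch contrasts a static initializer with an assignment; the subroutine names are invented for the example.

```fortran
PROGRAM save_demo
  IMPLICIT NONE
  INTEGER :: k
  DO k = 1, 3
    CALL surprising()
    CALL expected()
  END DO
CONTAINS
  SUBROUTINE surprising()
    INTEGER :: calls = 0      ! static: initialized once, implies SAVE
    calls = calls + 1
    PRINT *, 'surprising:', calls    ! 1, then 2, then 3
  END SUBROUTINE
  SUBROUTINE expected()
    INTEGER :: calls
    calls = 0                 ! assignment runs on every call
    calls = calls + 1
    PRINT *, 'expected:  ', calls    ! always 1
  END SUBROUTINE
END PROGRAM
```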
 342
 343 If you see an assignment to an array that's never been declared as such,
 344 it's probably a definition of a *statement function*, which is like
 345 a parameterized macro definition, e.g. `A(X)=SQRT(X)**3`.
 346 In the original Fortran language, this was the only means for user
 347 function definitions.
 348 Today, of course, one should use an external or internal function instead.
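
For recognition purposes, here is a sketch of a statement function next to an internal function that does the same job (the name `cubed_root` is hypothetical):

```fortran
PROGRAM stmt_fn_demo
  IMPLICIT NONE
  REAL :: a, x
  a(x) = SQRT(x)**3       ! statement function definition, not an assignment
  PRINT *, a(4.0)         ! 8.0
  PRINT *, cubed_root(4.0)
CONTAINS
  REAL FUNCTION cubed_root(y)   ! the modern equivalent
    REAL, INTENT(IN) :: y
    cubed_root = SQRT(y)**3
  END FUNCTION
END PROGRAM
```

A statement function definition has to appear before the first executable statement of its scope, which is one more clue for telling it apart from an ordinary assignment.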
 349
 350 Fortran expressions don't bind exactly like C's do.
 351 Watch out for exponentiation with `**`, which of course C lacks; it
 352 binds more tightly than negation does (e.g., `-2**2` is -4),
  353 and it associates to the right, unlike every other Fortran operator
  354 and most C operators; e.g., `2**2**3` is 256, not 64.
 355 Logical values must be compared with special logical equivalence
 356 relations (`.EQV.` and `.NEQV.`) rather than the usual equality
 357 operators.
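
These precedence and associativity rules can be checked with a few lines:

```fortran
PROGRAM operator_demo
  IMPLICIT NONE
  LOGICAL :: p, q
  PRINT *, -2**2          ! -(2**2) = -4
  PRINT *, (-2)**2        ! 4
  PRINT *, 2**2**3        ! 2**(2**3) = 256
  PRINT *, (2**2)**3      ! 64
  p = .TRUE.
  q = .FALSE.
  PRINT *, p .EQV. q      ! F; p == q is not standard for LOGICAL operands
  PRINT *, p .NEQV. q     ! T
END PROGRAM
```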
 358
 359 A Fortran compiler is allowed to short-circuit expression evaluation,
 360 but not required to do so.
 361 If one needs to protect a use of an `OPTIONAL` argument or possibly
 362 disassociated pointer, use an `IF` statement, not a logical `.AND.`
 363 operation.
 364 In fact, Fortran can remove function calls from expressions if their
 365 values are not required to determine the value of the expression's
 366 result; e.g., if there is a `PRINT` statement in function `F`, it
 367 may or may not be executed by the assignment statement `X=0*F()`.
 368 (Well, it probably will be, in practice, but compilers always reserve
 369 the right to optimize better.)
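
A sketch of the safe way to guard an `OPTIONAL` argument, using a hypothetical subroutine `maybe_print`:

```fortran
PROGRAM optional_demo
  IMPLICIT NONE
  CALL maybe_print()
  CALL maybe_print(5)
CONTAINS
  SUBROUTINE maybe_print(n)
    INTEGER, INTENT(IN), OPTIONAL :: n
    ! Unsafe: IF (PRESENT(n) .AND. n > 0) may evaluate n > 0 when n is absent.
    IF (PRESENT(n)) THEN
      IF (n > 0) PRINT *, 'n =', n
    ELSE
      PRINT *, 'n is absent'
    END IF
  END SUBROUTINE
END PROGRAM
```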
 370
 371 Unless they have an explicit suffix (`1.0_8`, `2.0_8`) or a `D`
 372 exponent (`3.0D0`), real literal constants in Fortran have the
  373 default `REAL` type -- *not* `double` as is the case in C and C++.
 374 If you're not careful, you can lose precision at compilation time
 375 from your constant values and never know it.
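
The loss can be made visible with a sketch like the following, where `dp` is an illustrative named constant for the double precision kind:

```fortran
PROGRAM literal_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: dp = KIND(1.0D0)
  REAL(dp) :: lossy, exact
  lossy = 0.1          ! default REAL literal, rounded before it is widened
  exact = 0.1_dp       ! double precision literal
  PRINT *, lossy
  PRINT *, exact
  PRINT *, lossy == exact   ! F on typical IEEE implementations
END PROGRAM
```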