3 Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 See https://llvm.org/LICENSE.txt for license information.
5 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
9 # C++14/17 features used in f18
17 The C++ dialect used in this project constitutes a subset of the
18 standard C++ programming language and library features.
19 We want our dialect to be compatible with the LLVM C++ language
20 subset that will be in use at the time that we integrate with that
22 We also want to maximize portability, future-proofing,
23 compile-time error checking, and use of best practices.
25 To that end, we have a C++ style guide (q.v.) that lays
26 out the details of how our C++ code should look and gives
27 guidance about feature usage.
29 We have chosen to use some features of the recent C++17
30 language standard in f18.
31 The most important of these are:
32 * sum types (discriminated unions) in the form of `std::variant`
33 * `using` template parameter packs
34 * generic lambdas with `auto` argument types
35 * product types in the form of `std::tuple`
38 (`std::tuple` is actually a C++11 feature, but I include it
39 in this list because it's not particularly well known.)
43 First, some background information to explain the need for sum types
46 Fortran is notoriously problematic to lex and parse, as tokenization
47 depends on the state of the partial parse;
48 the language has no reserved words in the sense that C++ does.
49 Fortran parsers implemented with distinct lexing and parsing phases
50 (generated by hand or with tools) need to implement them as
51 coroutines with complicated state, and experience has shown that
52 it's hard to get them right and harder to extend them as the language
55 Alternatively, with the use of backtracking, one can parse Fortran with
56 a unified lexer/parser.
57 We have chosen to do so because it is simpler and should reduce
58 both initial bugs and long-term maintenance.
60 Specifically, f18's parser uses the technique of recursive descent with
62 It is constructed as the incremental composition of pure parsing functions
63 that each, when given a context (location in the input stream plus some state),
64 either _succeeds_ or _fails_ to recognize some piece of Fortran.
65 On success, they return a new state and some semantic value, and this is
66 usually an instance of a C++ `struct` type that encodes the semantic
67 content of a production in the Fortran grammar.
69 This technique allows us to specify both the Fortran grammar and the
70 representation of successfully parsed programs with C++ code
71 whose functions and data structures correspond closely to the productions
74 The specification of Fortran uses a form of BNF with alternatives,
75 optional elements, sequences, and lists. Each of these constructs
76 in the Fortran grammar maps directly in the f18 parser to both
77 the means of combining other parsers as alternatives, &c., and to
78 the declarations of the parse tree data structures that represent
79 the results of successful parses.
80 Move semantics are used in the parsing functions to acquire and
81 combine the results of sub-parses into the result of a larger
84 To represent nodes in the Fortran parse tree, we need a means of
85 handling sum types for productions that have multiple alternatives.
86 The bounded polymorphism supplied by the C++17 `std::variant` fits
88 For example, production R502 in Fortran defines the top-level
89 program unit of Fortran as being a function, subroutine, module, &c.
90 The `struct ProgramUnit` in the f18 parse tree header file
91 represents each program unit with a member that is a `std::variant`
92 over the six possibilities.
93 Similarly, the parser for that type in the f18 grammar has six alternatives,
94 each of which constructs an instance of `ProgramUnit` upon the result of
95 parsing a `Module`, `FunctionSubprogram`, and so on.
97 Code that performs semantic analysis on the result of a successful
98 parse is typically implemented with overloaded functions.
99 A function instantiated on `ProgramUnit` will use `std::visit` to
100 identify the right alternative and perform the right actions.
101 The call to `std::visit` must pass a visitor that can handle all
102 of the possibilities, and f18 will fail to build if one is missing.
104 Were we unable to use `std::variant` directly, we would likely
105 have chosen to implement a local `SumType` replacement; in the
106 absence of C++17's abilities of `using` a template parameter pack
107 and allowing `auto` arguments in anonymous lambda functions,
108 it would be less convenient to use.
110 The other options for polymorphism in C++ at the level of C++11
112 * loosen up compile-time type safety and use a unified parse tree node
113 representation with an enumeration type for an operator and generic
115 * define the sum types for the parse tree as abstract base classes from
116 which each particular alternative would derive, and then use virtual
117 functions (or the forbidden `dynamic_cast`) to identify alternatives
122 Many productions in the Fortran grammar describe a sequence of various
124 For example, R504 defines the things that may appear in the "specification
125 part" of a subprogram in the order in which they are allowed: `USE`
126 statements, then `IMPORT` statements, and so on.
128 The parse tree node that represents such a thing needs to incorporate
129 the representations of those parses, of course.
130 It turns out to be convenient to allow these data members to be anonymous
131 components of a `std::tuple` product type.
132 This type facilitates the automation of code that walks over all of the
133 members in a type-safe fashion and avoids the need to invent and remember
134 needless member names -- the components of a `std::tuple` instance can
135 be identified and accessed in terms of their types, and those tend to be
138 So we use `std::tuple` for such things.
139 It has also been handy for template metaprogramming that needs to work
144 This simple little type is used wherever a value might or might not be
146 It is especially useful for function results and
147 rvalue reference arguments.
148 It corresponds directly to the optional elements in the productions
149 of the Fortran grammar.
150 It is also used as a wrapper around a parse tree node type to define the
151 results of the various parsing functions, where presence of a value
152 signifies a successful recognition and absence denotes a failed parse.
153 It is used in data structures in place of nullable pointers to
154 avoid indirection as well as the possible confusion over whether a pointer
155 is allowed to be null.