docs/tutorial/MyFirstLanguageFrontend/LangImpl02.rst

   1 :orphan:
   2
   3 ===========================================
   4 Kaleidoscope: Implementing a Parser and AST
   5 ===========================================
   6
   7 .. contents::
   8    :local:
   9
  10 Chapter 2 Introduction
  11 ======================
  12
  13 Welcome to Chapter 2 of the "`Implementing a language with
  14 LLVM <index.html>`_" tutorial. This chapter shows you how to use the
  15 lexer, built in `Chapter 1 <LangImpl01.html>`_, to build a full
  16 `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our Kaleidoscope
  17 language. Once we have a parser, we'll define and build an `Abstract
  18 Syntax Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
  19
  20 The parser we will build uses a combination of `Recursive Descent
  21 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
  22 `Operator-Precedence
  23 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
  24 parse the Kaleidoscope language (the latter for binary expressions and
  25 the former for everything else). Before we get to parsing though, let's
  26 talk about the output of the parser: the Abstract Syntax Tree.
  27
  28 The Abstract Syntax Tree (AST)
  29 ==============================
  30
  31 The AST for a program captures its behavior in such a way that it is
  32 easy for later stages of the compiler (e.g. code generation) to
  33 interpret. We basically want one object for each construct in the
  34 language, and the AST should closely model the language. In
  35 Kaleidoscope, we have expressions, a prototype, and a function object.
  36 We'll start with expressions first:
  37
  38 .. code-block:: c++
  39
  40     /// ExprAST - Base class for all expression nodes.
  41     class ExprAST {
  42     public:
  43       virtual ~ExprAST() {}
  44     };
  45
  46     /// NumberExprAST - Expression class for numeric literals like "1.0".
  47     class NumberExprAST : public ExprAST {
  48       double Val;
  49
  50     public:
  51       NumberExprAST(double Val) : Val(Val) {}
  52     };
  53
  54 The code above shows the definition of the base ExprAST class and one
  55 subclass which we use for numeric literals. The important thing to note
  56 about this code is that the NumberExprAST class captures the numeric
  57 value of the literal as an instance variable. This allows later phases
  58 of the compiler to know what the stored numeric value is.
  59
  60 Right now we only create the AST, so there are no useful accessor
  61 methods on them. It would be very easy to add a virtual method to pretty
  62 print the code, for example. Here are the other expression AST node
  63 definitions that we'll use in the basic form of the Kaleidoscope
  64 language:
  65
  66 .. code-block:: c++
  67
  68     /// VariableExprAST - Expression class for referencing a variable, like "a".
  69     class VariableExprAST : public ExprAST {
  70       std::string Name;
  71
  72     public:
  73       VariableExprAST(const std::string &Name) : Name(Name) {}
  74     };
  75
  76     /// BinaryExprAST - Expression class for a binary operator.
  77     class BinaryExprAST : public ExprAST {
  78       char Op;
  79       std::unique_ptr<ExprAST> LHS, RHS;
  80
  81     public:
  82       BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS,
  83                     std::unique_ptr<ExprAST> RHS)
  84         : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
  85     };
  86
  87     /// CallExprAST - Expression class for function calls.
  88     class CallExprAST : public ExprAST {
  89       std::string Callee;
  90       std::vector<std::unique_ptr<ExprAST>> Args;
  91
  92     public:
  93       CallExprAST(const std::string &Callee,
  94                   std::vector<std::unique_ptr<ExprAST>> Args)
  95         : Callee(Callee), Args(std::move(Args)) {}
  96     };
  97
  98 This is all (intentionally) rather straight-forward: variables capture
  99 the variable name, binary operators capture their opcode (e.g. '+'), and
 100 calls capture a function name as well as a list of any argument
 101 expressions. One thing that is nice about our AST is that it captures
 102 the language features without talking about the syntax of the language.
 103 Note that there is no discussion about precedence of binary operators,
 104 lexical structure, etc.
 105
 106 For our basic language, these are all of the expression nodes we'll
 107 define. Because it doesn't have conditional control flow, it isn't
 108 Turing-complete; we'll fix that in a later installment. The two things
 109 we need next are a way to talk about the interface to a function, and a
 110 way to talk about functions themselves:
 111
 112 .. code-block:: c++
 113
 114     /// PrototypeAST - This class represents the "prototype" for a function,
 115     /// which captures its name, and its argument names (thus implicitly the number
 116     /// of arguments the function takes).
 117     class PrototypeAST {
 118       std::string Name;
 119       std::vector<std::string> Args;
 120
 121     public:
 122       PrototypeAST(const std::string &name, std::vector<std::string> Args)
 123         : Name(name), Args(std::move(Args)) {}
 124
 125       const std::string &getName() const { return Name; }
 126     };
 127
 128     /// FunctionAST - This class represents a function definition itself.
 129     class FunctionAST {
 130       std::unique_ptr<PrototypeAST> Proto;
 131       std::unique_ptr<ExprAST> Body;
 132
 133     public:
 134       FunctionAST(std::unique_ptr<PrototypeAST> Proto,
 135                   std::unique_ptr<ExprAST> Body)
 136         : Proto(std::move(Proto)), Body(std::move(Body)) {}
 137     };
 138
 139 In Kaleidoscope, functions are typed with just a count of their
 140 arguments. Since all values are double precision floating point, the
 141 type of each argument doesn't need to be stored anywhere. In a more
 142 aggressive and realistic language, the "ExprAST" class would probably
 143 have a type field.
 144
 145 With this scaffolding, we can now talk about parsing expressions and
 146 function bodies in Kaleidoscope.
 147
 148 Parser Basics
 149 =============
 150
 151 Now that we have an AST to build, we need to define the parser code to
 152 build it. The idea here is that we want to parse something like "x+y"
 153 (which is returned as three tokens by the lexer) into an AST that could
 154 be generated with calls like this:
 155
 156 .. code-block:: c++
 157
 158       auto LHS = std::make_unique<VariableExprAST>("x");
 159       auto RHS = std::make_unique<VariableExprAST>("y");
 160       auto Result = std::make_unique<BinaryExprAST>('+', std::move(LHS),
 161                                                     std::move(RHS));
 162
 163 In order to do this, we'll start by defining some basic helper routines:
 164
 165 .. code-block:: c++
 166
 167     /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
 168     /// token the parser is looking at.  getNextToken reads another token from the
 169     /// lexer and updates CurTok with its results.
 170     static int CurTok;
 171     static int getNextToken() {
 172       return CurTok = gettok();
 173     }
 174
 175 This implements a simple token buffer around the lexer. This allows us
 176 to look one token ahead at what the lexer is returning. Every function
 177 in our parser will assume that CurTok is the current token that needs to
 178 be parsed.
 179
 180 .. code-block:: c++
 181
 182
 183     /// LogError* - These are little helper functions for error handling.
 184     std::unique_ptr<ExprAST> LogError(const char *Str) {
 185       fprintf(stderr, "LogError: %s\n", Str);
 186       return nullptr;
 187     }
 188     std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
 189       LogError(Str);
 190       return nullptr;
 191     }
 192
 193 The ``LogError`` routines are simple helper routines that our parser will
 194 use to handle errors. The error recovery in our parser will not be the
 195 best and is not particular user-friendly, but it will be enough for our
 196 tutorial. These routines make it easier to handle errors in routines
 197 that have various return types: they always return null.
 198
 199 With these basic helper functions, we can implement the first piece of
 200 our grammar: numeric literals.
 201
 202 Basic Expression Parsing
 203 ========================
 204
 205 We start with numeric literals, because they are the simplest to
 206 process. For each production in our grammar, we'll define a function
 207 which parses that production. For numeric literals, we have:
 208
 209 .. code-block:: c++
 210
 211     /// numberexpr ::= number
 212     static std::unique_ptr<ExprAST> ParseNumberExpr() {
 213       auto Result = std::make_unique<NumberExprAST>(NumVal);
 214       getNextToken(); // consume the number
 215       return std::move(Result);
 216     }
 217
 218 This routine is very simple: it expects to be called when the current
 219 token is a ``tok_number`` token. It takes the current number value,
 220 creates a ``NumberExprAST`` node, advances the lexer to the next token,
 221 and finally returns.
 222
 223 There are some interesting aspects to this. The most important one is
 224 that this routine eats all of the tokens that correspond to the
 225 production and returns the lexer buffer with the next token (which is
 226 not part of the grammar production) ready to go. This is a fairly
 227 standard way to go for recursive descent parsers. For a better example,
 228 the parenthesis operator is defined like this:
 229
 230 .. code-block:: c++
 231
 232     /// parenexpr ::= '(' expression ')'
 233     static std::unique_ptr<ExprAST> ParseParenExpr() {
 234       getNextToken(); // eat (.
 235       auto V = ParseExpression();
 236       if (!V)
 237         return nullptr;
 238
 239       if (CurTok != ')')
 240         return LogError("expected ')'");
 241       getNextToken(); // eat ).
 242       return V;
 243     }
 244
 245 This function illustrates a number of interesting things about the
 246 parser:
 247
 248 1) It shows how we use the LogError routines. When called, this function
 249 expects that the current token is a '(' token, but after parsing the
 250 subexpression, it is possible that there is no ')' waiting. For example,
 251 if the user types in "(4 x" instead of "(4)", the parser should emit an
 252 error. Because errors can occur, the parser needs a way to indicate that
 253 they happened: in our parser, we return null on an error.
 254
 255 2) Another interesting aspect of this function is that it uses recursion
 256 by calling ``ParseExpression`` (we will soon see that
 257 ``ParseExpression`` can call ``ParseParenExpr``). This is powerful
 258 because it allows us to handle recursive grammars, and keeps each
 259 production very simple. Note that parentheses do not cause construction
 260 of AST nodes themselves. While we could do it this way, the most
 261 important role of parentheses are to guide the parser and provide
 262 grouping. Once the parser constructs the AST, parentheses are not
 263 needed.
 264
 265 The next simple production is for handling variable references and
 266 function calls:
 267
 268 .. code-block:: c++
 269
 270     /// identifierexpr
 271     ///   ::= identifier
 272     ///   ::= identifier '(' expression* ')'
 273     static std::unique_ptr<ExprAST> ParseIdentifierExpr() {
 274       std::string IdName = IdentifierStr;
 275
 276       getNextToken();  // eat identifier.
 277
 278       if (CurTok != '(') // Simple variable ref.
 279         return std::make_unique<VariableExprAST>(IdName);
 280
 281       // Call.
 282       getNextToken();  // eat (
 283       std::vector<std::unique_ptr<ExprAST>> Args;
 284       if (CurTok != ')') {
 285         while (1) {
 286           if (auto Arg = ParseExpression())
 287             Args.push_back(std::move(Arg));
 288           else
 289             return nullptr;
 290
 291           if (CurTok == ')')
 292             break;
 293
 294           if (CurTok != ',')
 295             return LogError("Expected ')' or ',' in argument list");
 296           getNextToken();
 297         }
 298       }
 299
 300       // Eat the ')'.
 301       getNextToken();
 302
 303       return std::make_unique<CallExprAST>(IdName, std::move(Args));
 304     }
 305
 306 This routine follows the same style as the other routines. (It expects
 307 to be called if the current token is a ``tok_identifier`` token). It
 308 also has recursion and error handling. One interesting aspect of this is
 309 that it uses *look-ahead* to determine if the current identifier is a
 310 stand alone variable reference or if it is a function call expression.
 311 It handles this by checking to see if the token after the identifier is
 312 a '(' token, constructing either a ``VariableExprAST`` or
 313 ``CallExprAST`` node as appropriate.
 314
 315 Now that we have all of our simple expression-parsing logic in place, we
 316 can define a helper function to wrap it together into one entry point.
 317 We call this class of expressions "primary" expressions, for reasons
 318 that will become more clear `later in the
 319 tutorial <LangImpl6.html#user-defined-unary-operators>`_. In order to parse an arbitrary
 320 primary expression, we need to determine what sort of expression it is:
 321
 322 .. code-block:: c++
 323
 324     /// primary
 325     ///   ::= identifierexpr
 326     ///   ::= numberexpr
 327     ///   ::= parenexpr
 328     static std::unique_ptr<ExprAST> ParsePrimary() {
 329       switch (CurTok) {
 330       default:
 331         return LogError("unknown token when expecting an expression");
 332       case tok_identifier:
 333         return ParseIdentifierExpr();
 334       case tok_number:
 335         return ParseNumberExpr();
 336       case '(':
 337         return ParseParenExpr();
 338       }
 339     }
 340
 341 Now that you see the definition of this function, it is more obvious why
 342 we can assume the state of CurTok in the various functions. This uses
 343 look-ahead to determine which sort of expression is being inspected, and
 344 then parses it with a function call.
 345
 346 Now that basic expressions are handled, we need to handle binary
 347 expressions. They are a bit more complex.
 348
 349 Binary Expression Parsing
 350 =========================
 351
 352 Binary expressions are significantly harder to parse because they are
 353 often ambiguous. For example, when given the string "x+y\*z", the parser
 354 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
 355 definitions from mathematics, we expect the later parse, because "\*"
 356 (multiplication) has higher *precedence* than "+" (addition).
 357
 358 There are many ways to handle this, but an elegant and efficient way is
 359 to use `Operator-Precedence
 360 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
 361 This parsing technique uses the precedence of binary operators to guide
 362 recursion. To start with, we need a table of precedences:
 363
 364 .. code-block:: c++
 365
 366     /// BinopPrecedence - This holds the precedence for each binary operator that is
 367     /// defined.
 368     static std::map<char, int> BinopPrecedence;
 369
 370     /// GetTokPrecedence - Get the precedence of the pending binary operator token.
 371     static int GetTokPrecedence() {
 372       if (!isascii(CurTok))
 373         return -1;
 374
 375       // Make sure it's a declared binop.
 376       int TokPrec = BinopPrecedence[CurTok];
 377       if (TokPrec <= 0) return -1;
 378       return TokPrec;
 379     }
 380
 381     int main() {
 382       // Install standard binary operators.
 383       // 1 is lowest precedence.
 384       BinopPrecedence['<'] = 10;
 385       BinopPrecedence['+'] = 20;
 386       BinopPrecedence['-'] = 20;
 387       BinopPrecedence['*'] = 40;  // highest.
 388       ...
 389     }
 390
 391 For the basic form of Kaleidoscope, we will only support 4 binary
 392 operators (this can obviously be extended by you, our brave and intrepid
 393 reader). The ``GetTokPrecedence`` function returns the precedence for
 394 the current token, or -1 if the token is not a binary operator. Having a
 395 map makes it easy to add new operators and makes it clear that the
 396 algorithm doesn't depend on the specific operators involved, but it
 397 would be easy enough to eliminate the map and do the comparisons in the
 398 ``GetTokPrecedence`` function. (Or just use a fixed-size array).
 399
 400 With the helper above defined, we can now start parsing binary
 401 expressions. The basic idea of operator precedence parsing is to break
 402 down an expression with potentially ambiguous binary operators into
 403 pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
 404 Operator precedence parsing considers this as a stream of primary
 405 expressions separated by binary operators. As such, it will first parse
 406 the leading primary expression "a", then it will see the pairs [+, b]
 407 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
 408 primary expressions, the binary expression parser doesn't need to worry
 409 about nested subexpressions like (c+d) at all.
 410
 411 To start, an expression is a primary expression potentially followed by
 412 a sequence of [binop,primaryexpr] pairs:
 413
 414 .. code-block:: c++
 415
 416     /// expression
 417     ///   ::= primary binoprhs
 418     ///
 419     static std::unique_ptr<ExprAST> ParseExpression() {
 420       auto LHS = ParsePrimary();
 421       if (!LHS)
 422         return nullptr;
 423
 424       return ParseBinOpRHS(0, std::move(LHS));
 425     }
 426
 427 ``ParseBinOpRHS`` is the function that parses the sequence of pairs for
 428 us. It takes a precedence and a pointer to an expression for the part
 429 that has been parsed so far. Note that "x" is a perfectly valid
 430 expression: As such, "binoprhs" is allowed to be empty, in which case it
 431 returns the expression that is passed into it. In our example above, the
 432 code passes the expression for "a" into ``ParseBinOpRHS`` and the
 433 current token is "+".
 434
 435 The precedence value passed into ``ParseBinOpRHS`` indicates the
 436 *minimal operator precedence* that the function is allowed to eat. For
 437 example, if the current pair stream is [+, x] and ``ParseBinOpRHS`` is
 438 passed in a precedence of 40, it will not consume any tokens (because
 439 the precedence of '+' is only 20). With this in mind, ``ParseBinOpRHS``
 440 starts with:
 441
 442 .. code-block:: c++
 443
 444     /// binoprhs
 445     ///   ::= ('+' primary)*
 446     static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
 447                                                   std::unique_ptr<ExprAST> LHS) {
 448       // If this is a binop, find its precedence.
 449       while (1) {
 450         int TokPrec = GetTokPrecedence();
 451
 452         // If this is a binop that binds at least as tightly as the current binop,
 453         // consume it, otherwise we are done.
 454         if (TokPrec < ExprPrec)
 455           return LHS;
 456
 457 This code gets the precedence of the current token and checks to see if
 458 if is too low. Because we defined invalid tokens to have a precedence of
 459 -1, this check implicitly knows that the pair-stream ends when the token
 460 stream runs out of binary operators. If this check succeeds, we know
 461 that the token is a binary operator and that it will be included in this
 462 expression:
 463
 464 .. code-block:: c++
 465
 466         // Okay, we know this is a binop.
 467         int BinOp = CurTok;
 468         getNextToken();  // eat binop
 469
 470         // Parse the primary expression after the binary operator.
 471         auto RHS = ParsePrimary();
 472         if (!RHS)
 473           return nullptr;
 474
 475 As such, this code eats (and remembers) the binary operator and then
 476 parses the primary expression that follows. This builds up the whole
 477 pair, the first of which is [+, b] for the running example.
 478
 479 Now that we parsed the left-hand side of an expression and one pair of
 480 the RHS sequence, we have to decide which way the expression associates.
 481 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
 482 unparsed)". To determine this, we look ahead at "binop" to determine its
 483 precedence and compare it to BinOp's precedence (which is '+' in this
 484 case):
 485
 486 .. code-block:: c++
 487
 488         // If BinOp binds less tightly with RHS than the operator after RHS, let
 489         // the pending operator take RHS as its LHS.
 490         int NextPrec = GetTokPrecedence();
 491         if (TokPrec < NextPrec) {
 492
 493 If the precedence of the binop to the right of "RHS" is lower or equal
 494 to the precedence of our current operator, then we know that the
 495 parentheses associate as "(a+b) binop ...". In our example, the current
 496 operator is "+" and the next operator is "+", we know that they have the
 497 same precedence. In this case we'll create the AST node for "a+b", and
 498 then continue parsing:
 499
 500 .. code-block:: c++
 501
 502           ... if body omitted ...
 503         }
 504
 505         // Merge LHS/RHS.
 506         LHS = std::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 507                                                std::move(RHS));
 508       }  // loop around to the top of the while loop.
 509     }
 510
 511 In our example above, this will turn "a+b+" into "(a+b)" and execute the
 512 next iteration of the loop, with "+" as the current token. The code
 513 above will eat, remember, and parse "(c+d)" as the primary expression,
 514 which makes the current pair equal to [+, (c+d)]. It will then evaluate
 515 the 'if' conditional above with "\*" as the binop to the right of the
 516 primary. In this case, the precedence of "\*" is higher than the
 517 precedence of "+" so the if condition will be entered.
 518
 519 The critical question left here is "how can the if condition parse the
 520 right hand side in full"? In particular, to build the AST correctly for
 521 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
 522 variable. The code to do this is surprisingly simple (code from the
 523 above two blocks duplicated for context):
 524
 525 .. code-block:: c++
 526
 527         // If BinOp binds less tightly with RHS than the operator after RHS, let
 528         // the pending operator take RHS as its LHS.
 529         int NextPrec = GetTokPrecedence();
 530         if (TokPrec < NextPrec) {
 531           RHS = ParseBinOpRHS(TokPrec+1, std::move(RHS));
 532           if (!RHS)
 533             return nullptr;
 534         }
 535         // Merge LHS/RHS.
 536         LHS = std::make_unique<BinaryExprAST>(BinOp, std::move(LHS),
 537                                                std::move(RHS));
 538       }  // loop around to the top of the while loop.
 539     }
 540
 541 At this point, we know that the binary operator to the RHS of our
 542 primary has higher precedence than the binop we are currently parsing.
 543 As such, we know that any sequence of pairs whose operators are all
 544 higher precedence than "+" should be parsed together and returned as
 545 "RHS". To do this, we recursively invoke the ``ParseBinOpRHS`` function
 546 specifying "TokPrec+1" as the minimum precedence required for it to
 547 continue. In our example above, this will cause it to return the AST
 548 node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of the '+'
 549 expression.
 550
 551 Finally, on the next iteration of the while loop, the "+g" piece is
 552 parsed and added to the AST. With this little bit of code (14
 553 non-trivial lines), we correctly handle fully general binary expression
 554 parsing in a very elegant way. This was a whirlwind tour of this code,
 555 and it is somewhat subtle. I recommend running through it with a few
 556 tough examples to see how it works.
 557
 558 This wraps up handling of expressions. At this point, we can point the
 559 parser at an arbitrary token stream and build an expression from it,
 560 stopping at the first token that is not part of the expression. Next up
 561 we need to handle function definitions, etc.
 562
 563 Parsing the Rest
 564 ================
 565
 566 The next thing missing is handling of function prototypes. In
 567 Kaleidoscope, these are used both for 'extern' function declarations as
 568 well as function body definitions. The code to do this is
 569 straight-forward and not very interesting (once you've survived
 570 expressions):
 571
 572 .. code-block:: c++
 573
 574     /// prototype
 575     ///   ::= id '(' id* ')'
 576     static std::unique_ptr<PrototypeAST> ParsePrototype() {
 577       if (CurTok != tok_identifier)
 578         return LogErrorP("Expected function name in prototype");
 579
 580       std::string FnName = IdentifierStr;
 581       getNextToken();
 582
 583       if (CurTok != '(')
 584         return LogErrorP("Expected '(' in prototype");
 585
 586       // Read the list of argument names.
 587       std::vector<std::string> ArgNames;
 588       while (getNextToken() == tok_identifier)
 589         ArgNames.push_back(IdentifierStr);
 590       if (CurTok != ')')
 591         return LogErrorP("Expected ')' in prototype");
 592
 593       // success.
 594       getNextToken();  // eat ')'.
 595
 596       return std::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
 597     }
 598
 599 Given this, a function definition is very simple, just a prototype plus
 600 an expression to implement the body:
 601
 602 .. code-block:: c++
 603
 604     /// definition ::= 'def' prototype expression
 605     static std::unique_ptr<FunctionAST> ParseDefinition() {
 606       getNextToken();  // eat def.
 607       auto Proto = ParsePrototype();
 608       if (!Proto) return nullptr;
 609
 610       if (auto E = ParseExpression())
 611         return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 612       return nullptr;
 613     }
 614
 615 In addition, we support 'extern' to declare functions like 'sin' and
 616 'cos' as well as to support forward declaration of user functions. These
 617 'extern's are just prototypes with no body:
 618
 619 .. code-block:: c++
 620
 621     /// external ::= 'extern' prototype
 622     static std::unique_ptr<PrototypeAST> ParseExtern() {
 623       getNextToken();  // eat extern.
 624       return ParsePrototype();
 625     }
 626
 627 Finally, we'll also let the user type in arbitrary top-level expressions
 628 and evaluate them on the fly. We will handle this by defining anonymous
 629 nullary (zero argument) functions for them:
 630
 631 .. code-block:: c++
 632
 633     /// toplevelexpr ::= expression
 634     static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
 635       if (auto E = ParseExpression()) {
 636         // Make an anonymous proto.
 637         auto Proto = std::make_unique<PrototypeAST>("", std::vector<std::string>());
 638         return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
 639       }
 640       return nullptr;
 641     }
 642
 643 Now that we have all the pieces, let's build a little driver that will
 644 let us actually *execute* this code we've built!
 645
 646 The Driver
 647 ==========
 648
 649 The driver for this simply invokes all of the parsing pieces with a
 650 top-level dispatch loop. There isn't much interesting here, so I'll just
 651 include the top-level loop. See `below <#full-code-listing>`_ for full code in the
 652 "Top-Level Parsing" section.
 653
 654 .. code-block:: c++
 655
 656     /// top ::= definition | external | expression | ';'
 657     static void MainLoop() {
 658       while (1) {
 659         fprintf(stderr, "ready> ");
 660         switch (CurTok) {
 661         case tok_eof:
 662           return;
 663         case ';': // ignore top-level semicolons.
 664           getNextToken();
 665           break;
 666         case tok_def:
 667           HandleDefinition();
 668           break;
 669         case tok_extern:
 670           HandleExtern();
 671           break;
 672         default:
 673           HandleTopLevelExpression();
 674           break;
 675         }
 676       }
 677     }
 678
 679 The most interesting part of this is that we ignore top-level
 680 semicolons. Why is this, you ask? The basic reason is that if you type
 681 "4 + 5" at the command line, the parser doesn't know whether that is the
 682 end of what you will type or not. For example, on the next line you
 683 could type "def foo..." in which case 4+5 is the end of a top-level
 684 expression. Alternatively you could type "\* 6", which would continue
 685 the expression. Having top-level semicolons allows you to type "4+5;",
 686 and the parser will know you are done.
 687
 688 Conclusions
 689 ===========
 690
 691 With just under 400 lines of commented code (240 lines of non-comment,
 692 non-blank code), we fully defined our minimal language, including a
 693 lexer, parser, and AST builder. With this done, the executable will
 694 validate Kaleidoscope code and tell us if it is grammatically invalid.
 695 For example, here is a sample interaction:
 696
 697 .. code-block:: bash
 698
 699     $ ./a.out
 700     ready> def foo(x y) x+foo(y, 4.0);
 701     Parsed a function definition.
 702     ready> def foo(x y) x+y y;
 703     Parsed a function definition.
 704     Parsed a top-level expr
 705     ready> def foo(x y) x+y );
 706     Parsed a function definition.
 707     Error: unknown token when expecting an expression
 708     ready> extern sin(a);
 709     ready> Parsed an extern
 710     ready> ^D
 711     $
 712
 713 There is a lot of room for extension here. You can define new AST nodes,
 714 extend the language in many ways, etc. In the `next
 715 installment <LangImpl03.html>`_, we will describe how to generate LLVM
 716 Intermediate Representation (IR) from the AST.
 717
 718 Full Code Listing
 719 =================
 720
 721 Here is the complete code listing for our running example. Because this
 722 uses the LLVM libraries, we need to link them in. To do this, we use the
 723 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
 724 our makefile/command line about which options to use:
 725
 726 .. code-block:: bash
 727
 728     # Compile
 729     clang++ -g -O3 toy.cpp `llvm-config --cxxflags`
 730     # Run
 731     ./a.out
 732
 733 Here is the code:
 734
 735 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter2/toy.cpp
 736    :language: c++
 737
 738 `Next: Implementing Code Generation to LLVM IR <LangImpl03.html>`_
 739