llvm/docs/tutorial/MyFirstLanguageFrontend/LangImpl07.rst

   1 =======================================================
   2 Kaleidoscope: Extending the Language: Mutable Variables
   3 =======================================================
   4
   5 .. contents::
   6    :local:
   7
   8 Chapter 7 Introduction
   9 ======================
  10
  11 Welcome to Chapter 7 of the "`Implementing a language with
  12 LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
  13 very respectable, albeit simple, `functional programming
  14 language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
  15 journey, we learned some parsing techniques, how to build and represent
  16 an AST, how to build LLVM IR, and how to optimize the resultant code as
  17 well as JIT compile it.
  18
  19 While Kaleidoscope is interesting as a functional language, the fact
  20 that it is functional makes it "too easy" to generate LLVM IR for it. In
  21 particular, a functional language makes it very easy to build LLVM IR
  22 directly in `SSA
  23 form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
  24 Since LLVM requires that the input code be in SSA form, this is a very
  25 nice property and it is often unclear to newcomers how to generate code
  26 for an imperative language with mutable variables.
  27
  28 The short (and happy) summary of this chapter is that there is no need
  29 for your front-end to build SSA form: LLVM provides highly tuned and
  30 well tested support for this, though the way it works is a bit
  31 unexpected for some.
  32
  33 Why is this a hard problem?
  34 ===========================
  35
  36 To understand why mutable variables cause complexities in SSA
  37 construction, consider this extremely simple C example:
  38
  39 .. code-block:: c
  40
  41     int G, H;
  42     int test(_Bool Condition) {
  43       int X;
  44       if (Condition)
  45         X = G;
  46       else
  47         X = H;
  48       return X;
  49     }
  50
  51 In this case, we have the variable "X", whose value depends on the path
  52 executed in the program. Because there are two different possible values
  53 for X before the return instruction, a PHI node is inserted to merge the
  54 two values. The LLVM IR that we want for this example looks like this:
  55
  56 .. code-block:: llvm
  57
  58     @G = weak global i32 0   ; type of @G is i32*
  59     @H = weak global i32 0   ; type of @H is i32*
  60
  61     define i32 @test(i1 %Condition) {
  62     entry:
  63       br i1 %Condition, label %cond_true, label %cond_false
  64
  65     cond_true:
  66       %X.0 = load i32, i32* @G
  67       br label %cond_next
  68
  69     cond_false:
  70       %X.1 = load i32, i32* @H
  71       br label %cond_next
  72
  73     cond_next:
  74       %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  75       ret i32 %X.2
  76     }
  77
  78 In this example, the loads from the G and H global variables are
  79 explicit in the LLVM IR, and they live in the then/else branches of the
  80 if statement (cond\_true/cond\_false). In order to merge the incoming
  81 values, the X.2 phi node in the cond\_next block selects the right value
  82 to use based on where control flow is coming from: if control flow comes
  83 from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
  84 control flow comes from cond\_true, it gets the value of X.0. The intent
  85 of this chapter is not to explain the details of SSA form. For more
  86 information, see one of the many `online
  87 references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
  88
  89 The question for this article is "who places the phi nodes when lowering
  90 assignments to mutable variables?". The issue here is that LLVM
  91 *requires* that its IR be in SSA form: there is no "non-ssa" mode for
  92 it. However, SSA construction requires non-trivial algorithms and data
  93 structures, so it is inconvenient and wasteful for every front-end to
  94 have to reproduce this logic.
  95
  96 Memory in LLVM
  97 ==============
  98
  99 The 'trick' here is that while LLVM does require all register values to
 100 be in SSA form, it does not require (or permit) memory objects to be in
 101 SSA form. In the example above, note that the loads from G and H are
 102 direct accesses to G and H: they are not renamed or versioned. This
 103 differs from some other compiler systems, which do try to version memory
 104 objects. In LLVM, instead of encoding dataflow analysis of memory into
 105 the LLVM IR, it is handled with `Analysis
 106 Passes <../../WritingAnLLVMPass.html>`_ which are computed on demand.
 107
 108 With this in mind, the high-level idea is that we want to make a stack
 109 variable (which lives in memory, because it is on the stack) for each
 110 mutable object in a function. To take advantage of this trick, we need
 111 to talk about how LLVM represents stack variables.
 112
 113 In LLVM, all memory accesses are explicit with load/store instructions,
 114 and it is carefully designed not to have (or need) an "address-of"
 115 operator. Notice how the type of the @G/@H global variables is actually
 116 "i32\*" even though the variable is defined as "i32". What this means is
 117 that @G defines *space* for an i32 in the global data area, but its
 118 *name* actually refers to the address for that space. Stack variables
 119 work the same way, except that instead of being declared with global
 120 variable definitions, they are declared with the `LLVM alloca
 121 instruction <../../LangRef.html#alloca-instruction>`_:
 122
 123 .. code-block:: llvm
 124
 125     define i32 @example() {
 126     entry:
 127       %X = alloca i32           ; type of %X is i32*.
 128       ...
 129       %tmp = load i32, i32* %X  ; load the stack value %X from the stack.
 130       %tmp2 = add i32 %tmp, 1   ; increment it
 131       store i32 %tmp2, i32* %X  ; store it back
 132       ...
 133
 134 This code shows an example of how you can declare and manipulate a stack
 135 variable in the LLVM IR. Stack memory allocated with the alloca
 136 instruction is fully general: you can pass the address of the stack slot
 137 to functions, you can store it in other variables, etc. In our example
 138 above, we could rewrite the example to use the alloca technique to avoid
 139 using a PHI node:
 140
 141 .. code-block:: llvm
 142
 143     @G = weak global i32 0   ; type of @G is i32*
 144     @H = weak global i32 0   ; type of @H is i32*
 145
 146     define i32 @test(i1 %Condition) {
 147     entry:
 148       %X = alloca i32           ; type of %X is i32*.
 149       br i1 %Condition, label %cond_true, label %cond_false
 150
 151     cond_true:
 152       %X.0 = load i32, i32* @G
 153       store i32 %X.0, i32* %X   ; Update X
 154       br label %cond_next
 155
 156     cond_false:
 157       %X.1 = load i32, i32* @H
 158       store i32 %X.1, i32* %X   ; Update X
 159       br label %cond_next
 160
 161     cond_next:
 162       %X.2 = load i32, i32* %X  ; Read X
 163       ret i32 %X.2
 164     }
 165
 166 With this, we have discovered a way to handle arbitrary mutable
 167 variables without the need to create Phi nodes at all:
 168
 169 #. Each mutable variable becomes a stack allocation.
 170 #. Each read of the variable becomes a load from the stack.
 171 #. Each update of the variable becomes a store to the stack.
 172 #. Taking the address of a variable just uses the stack address
 173    directly.
 174
 175 While this solution has solved our immediate problem, it introduced
 176 another one: we have now apparently introduced a lot of stack traffic
 177 for very simple and common operations, a major performance problem.
 178 Fortunately for us, the LLVM optimizer has a highly-tuned optimization
 179 pass named "mem2reg" that handles this case, promoting allocas like this
 180 into SSA registers, inserting Phi nodes as appropriate. If you run this
 181 example through the pass, for example, you'll get:
 182
 183 .. code-block:: bash
 184
 185     $ llvm-as < example.ll | opt -mem2reg | llvm-dis
 186     @G = weak global i32 0
 187     @H = weak global i32 0
 188
 189     define i32 @test(i1 %Condition) {
 190     entry:
 191       br i1 %Condition, label %cond_true, label %cond_false
 192
 193     cond_true:
 194       %X.0 = load i32, i32* @G
 195       br label %cond_next
 196
 197     cond_false:
 198       %X.1 = load i32, i32* @H
 199       br label %cond_next
 200
 201     cond_next:
 202       %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 203       ret i32 %X.01
 204     }
 205
 206 The mem2reg pass implements the standard "iterated dominance frontier"
 207 algorithm for constructing SSA form and has a number of optimizations
 208 that speed up (very common) degenerate cases. The mem2reg optimization
 209 pass is the answer to dealing with mutable variables, and we highly
 210 recommend that you depend on it. Note that mem2reg only works on
 211 variables in certain circumstances:
 212
 213 #. mem2reg is alloca-driven: it looks for allocas and if it can handle
 214    them, it promotes them. It does not apply to global variables or heap
 215    allocations.
 216 #. mem2reg only looks for alloca instructions in the entry block of the
 217    function. Being in the entry block guarantees that the alloca is only
 218    executed once, which makes analysis simpler.
 219 #. mem2reg only promotes allocas whose uses are direct loads and stores.
 220    If the address of the stack object is passed to a function, or if any
 221    funny pointer arithmetic is involved, the alloca will not be
 222    promoted.
 223 #. mem2reg only works on allocas of `first
 224    class <../../LangRef.html#first-class-types>`_ values (such as pointers,
 225    scalars and vectors), and only if the array size of the allocation is
 226    1 (or missing in the .ll file). mem2reg is not capable of promoting
 227    structs or arrays to registers. Note that the "sroa" pass is
 228    more powerful and can promote structs, "unions", and arrays in many
 229    cases.
 230
 231 All of these properties are easy to satisfy for most imperative
 232 languages, and we'll illustrate it below with Kaleidoscope. The final
 233 question you may be asking is: should I bother with this nonsense for my
 234 front-end? Wouldn't it be better if I just did SSA construction
 235 directly, avoiding use of the mem2reg optimization pass? In short, we
 236 strongly recommend that you use this technique for building SSA form,
 237 unless there is an extremely good reason not to. Using this technique
 238 is:
 239
 240 -  Proven and well tested: clang uses this technique
 241    for local mutable variables. As such, the most common clients of LLVM
 242    are using this to handle a bulk of their variables. You can be sure
 243    that bugs are found fast and fixed early.
 244 -  Extremely Fast: mem2reg has a number of special cases that make it
 245    fast in common cases as well as fully general. For example, it has
 246    fast-paths for variables that are only used in a single block,
 247    variables that only have one assignment point, good heuristics to
 248    avoid insertion of unneeded phi nodes, etc.
 249 -  Needed for debug info generation: `Debug information in
 250    LLVM <../../SourceLevelDebugging.html>`_ relies on having the address of
 251    the variable exposed so that debug info can be attached to it. This
 252    technique dovetails very naturally with this style of debug info.
 253
 254 If nothing else, this makes it much easier to get your front-end up and
 255 running, and is very simple to implement. Let's extend Kaleidoscope with
 256 mutable variables now!
 257
 258 Mutable Variables in Kaleidoscope
 259 =================================
 260
 261 Now that we know the sort of problem we want to tackle, let's see what
 262 this looks like in the context of our little Kaleidoscope language.
 263 We're going to add two features:
 264
 265 #. The ability to mutate variables with the '=' operator.
 266 #. The ability to define new variables.
 267
 268 While the first item is really what this is about, we only have
 269 variables for incoming arguments as well as for induction variables, and
 270 redefining those only goes so far :). Also, the ability to define new
 271 variables is a useful thing regardless of whether you will be mutating
 272 them. Here's a motivating example that shows how we could use these:
 273
 274 ::
 275
 276     # Define ':' for sequencing: as a low-precedence operator that ignores operands
 277     # and just returns the RHS.
 278     def binary : 1 (x y) y;
 279
 280     # Recursive fib, we could do this before.
 281     def fib(x)
 282       if (x < 3) then
 283         1
 284       else
 285         fib(x-1)+fib(x-2);
 286
 287     # Iterative fib.
 288     def fibi(x)
 289       var a = 1, b = 1, c in
 290       (for i = 3, i < x in
 291          c = a + b :
 292          a = b :
 293          b = c) :
 294       b;
 295
 296     # Call it.
 297     fibi(10);
 298
 299 In order to mutate variables, we have to change our existing variables
 300 to use the "alloca trick". Once we have that, we'll add our new
 301 operator, then extend Kaleidoscope to support new variable definitions.
 302
 303 Adjusting Existing Variables for Mutation
 304 =========================================
 305
 306 The symbol table in Kaleidoscope is managed at code generation time by
 307 the '``NamedValues``' map. This map currently keeps track of the LLVM
 308 "Value\*" that holds the double value for the named variable. In order
 309 to support mutation, we need to change this slightly, so that
 310 ``NamedValues`` holds the *memory location* of the variable in question.
 311 Note that this change is a refactoring: it changes the structure of the
 312 code, but does not (by itself) change the behavior of the compiler. All
 313 of these changes are isolated in the Kaleidoscope code generator.
 314
 315 At this point in Kaleidoscope's development, it only supports variables
 316 for two things: incoming arguments to functions and the induction
 317 variable of 'for' loops. For consistency, we'll allow mutation of these
 318 variables in addition to other user-defined variables. This means that
 319 these will both need memory locations.
 320
 321 To start our transformation of Kaleidoscope, we'll change the
 322 ``NamedValues`` map so that it maps to AllocaInst\* instead of Value\*. Once
 323 we do this, the C++ compiler will tell us what parts of the code we need
 324 to update:
 325
 326 .. code-block:: c++
 327
 328     static std::map<std::string, AllocaInst*> NamedValues;
 329
 330 Also, since we will need to create these allocas, we'll use a helper
 331 function that ensures that the allocas are created in the entry block of
 332 the function:
 333
 334 .. code-block:: c++
 335
 336     /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
 337     /// the function.  This is used for mutable variables etc.
 338     static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
 339                                               const std::string &VarName) {
 340       IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
 341                      TheFunction->getEntryBlock().begin());
 342       return TmpB.CreateAlloca(Type::getDoubleTy(*TheContext), nullptr,
 343                                VarName);
 344     }
 345
 346 This funny looking code creates an IRBuilder object that is pointing at
 347 the first instruction (.begin()) of the entry block. It then creates an
 348 alloca with the expected name and returns it. Because all values in
 349 Kaleidoscope are doubles, there is no need to pass in a type to use.
 350
 351 With this in place, the first functionality change we want to make belongs to
 352 variable references. In our new scheme, variables live on the stack, so
 353 code generating a reference to them actually needs to produce a load
 354 from the stack slot:
 355
 356 .. code-block:: c++
 357
 358     Value *VariableExprAST::codegen() {
 359       // Look this variable up in the function.
 360       AllocaInst *A = NamedValues[Name];
 361       if (!A)
 362         return LogErrorV("Unknown variable name");
 363
 364       // Load the value.
 365       return Builder->CreateLoad(A->getAllocatedType(), A, Name.c_str());
 366     }
 367
 368 As you can see, this is pretty straightforward. Now we need to update
 369 the things that define the variables to set up the alloca. We'll start
 370 with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for
 371 the unabridged code):
 372
 373 .. code-block:: c++
 374
 375       Function *TheFunction = Builder->GetInsertBlock()->getParent();
 376
 377       // Create an alloca for the variable in the entry block.
 378       AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 379
 380       // Emit the start code first, without 'variable' in scope.
 381       Value *StartVal = Start->codegen();
 382       if (!StartVal)
 383         return nullptr;
 384
 385       // Store the value into the alloca.
 386       Builder->CreateStore(StartVal, Alloca);
 387       ...
 388
 389       // Compute the end condition.
 390       Value *EndCond = End->codegen();
 391       if (!EndCond)
 392         return nullptr;
 393
 394       // Reload, increment, and restore the alloca.  This handles the case where
 395       // the body of the loop mutates the variable.
 396       Value *CurVar = Builder->CreateLoad(Alloca->getAllocatedType(), Alloca,
 397                                           VarName.c_str());
 398       Value *NextVar = Builder->CreateFAdd(CurVar, StepVal, "nextvar");
 399       Builder->CreateStore(NextVar, Alloca);
 400       ...
 401
 402 This code is virtually identical to the code `before we allowed mutable
 403 variables <LangImpl05.html#code-generation-for-the-for-loop>`_. The big difference is that we
 404 no longer have to construct a PHI node, and we use load/store to access
 405 the variable as needed.
 406
 407 To support mutable argument variables, we need to also make allocas for
 408 them. The code for this is also pretty simple:
 409
 410 .. code-block:: c++
 411
 412     Function *FunctionAST::codegen() {
 413       ...
 414       Builder->SetInsertPoint(BB);
 415
 416       // Record the function arguments in the NamedValues map.
 417       NamedValues.clear();
 418       for (auto &Arg : TheFunction->args()) {
 419         // Create an alloca for this variable.
 420         AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());
 421
 422         // Store the initial value into the alloca.
 423         Builder->CreateStore(&Arg, Alloca);
 424
 425         // Add arguments to variable symbol table.
 426         NamedValues[std::string(Arg.getName())] = Alloca;
 427       }
 428
 429       if (Value *RetVal = Body->codegen()) {
 430         ...
 431
 432 For each argument, we make an alloca, store the input value to the
 433 function into the alloca, and register the alloca as the memory location
 434 for the argument. This method gets invoked by ``FunctionAST::codegen()``
 435 right after it sets up the entry block for the function.
 436
 437 The final missing piece is adding the mem2reg pass, which allows us to
 438 get good codegen once again:
 439
 440 .. code-block:: c++
 441
 442         // Promote allocas to registers.
 443         TheFPM->add(createPromoteMemoryToRegisterPass());
 444         // Do simple "peephole" optimizations and bit-twiddling optzns.
 445         TheFPM->add(createInstructionCombiningPass());
 446         // Reassociate expressions.
 447         TheFPM->add(createReassociatePass());
 448         ...
 449
 450 It is interesting to see what the code looks like before and after the
 451 mem2reg optimization runs. For example, this is the before/after code
 452 for our recursive fib function. Before the optimization:
 453
 454 .. code-block:: llvm
 455
 456     define double @fib(double %x) {
 457     entry:
 458       %x1 = alloca double
 459       store double %x, double* %x1
 460       %x2 = load double, double* %x1
 461       %cmptmp = fcmp ult double %x2, 3.000000e+00
 462       %booltmp = uitofp i1 %cmptmp to double
 463       %ifcond = fcmp one double %booltmp, 0.000000e+00
 464       br i1 %ifcond, label %then, label %else
 465
 466     then:       ; preds = %entry
 467       br label %ifcont
 468
 469     else:       ; preds = %entry
 470       %x3 = load double, double* %x1
 471       %subtmp = fsub double %x3, 1.000000e+00
 472       %calltmp = call double @fib(double %subtmp)
 473       %x4 = load double, double* %x1
 474       %subtmp5 = fsub double %x4, 2.000000e+00
 475       %calltmp6 = call double @fib(double %subtmp5)
 476       %addtmp = fadd double %calltmp, %calltmp6
 477       br label %ifcont
 478
 479     ifcont:     ; preds = %else, %then
 480       %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 481       ret double %iftmp
 482     }
 483
 484 Here there is only one variable (x, the input argument) but you can
 485 still see the extremely simple-minded code generation strategy we are
 486 using. In the entry block, an alloca is created, and the initial input
 487 value is stored into it. Each reference to the variable does a reload
 488 from the stack. Also, note that we didn't modify the if/then/else
 489 expression, so it still inserts a PHI node. While we could make an
 490 alloca for it, it is actually easier to create a PHI node for it, so we
 491 still just make the PHI.
 492
 493 Here is the code after the mem2reg pass runs:
 494
 495 .. code-block:: llvm
 496
 497     define double @fib(double %x) {
 498     entry:
 499       %cmptmp = fcmp ult double %x, 3.000000e+00
 500       %booltmp = uitofp i1 %cmptmp to double
 501       %ifcond = fcmp one double %booltmp, 0.000000e+00
 502       br i1 %ifcond, label %then, label %else
 503
 504     then:
 505       br label %ifcont
 506
 507     else:
 508       %subtmp = fsub double %x, 1.000000e+00
 509       %calltmp = call double @fib(double %subtmp)
 510       %subtmp5 = fsub double %x, 2.000000e+00
 511       %calltmp6 = call double @fib(double %subtmp5)
 512       %addtmp = fadd double %calltmp, %calltmp6
 513       br label %ifcont
 514
 515     ifcont:     ; preds = %else, %then
 516       %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 517       ret double %iftmp
 518     }
 519
 520 This is a trivial case for mem2reg, since there are no redefinitions of
 521 the variable. The point of showing this is to calm your tension about
 522 inserting such blatant inefficiencies :).
 523
 524 After the rest of the optimizers run, we get:
 525
 526 .. code-block:: llvm
 527
 528     define double @fib(double %x) {
 529     entry:
 530       %cmptmp = fcmp ult double %x, 3.000000e+00
 531       %booltmp = uitofp i1 %cmptmp to double
 532       %ifcond = fcmp ueq double %booltmp, 0.000000e+00
 533       br i1 %ifcond, label %else, label %ifcont
 534
 535     else:
 536       %subtmp = fsub double %x, 1.000000e+00
 537       %calltmp = call double @fib(double %subtmp)
 538       %subtmp5 = fsub double %x, 2.000000e+00
 539       %calltmp6 = call double @fib(double %subtmp5)
 540       %addtmp = fadd double %calltmp, %calltmp6
 541       ret double %addtmp
 542
 543     ifcont:
 544       ret double 1.000000e+00
 545     }
 546
 547 Here we see that the simplifycfg pass decided to clone the return
 548 instruction into the end of the 'else' block. This allowed it to
 549 eliminate some branches and the PHI node.
 550
 551 Now that all symbol table references are updated to use stack variables,
 552 we'll add the assignment operator.
 553
 554 New Assignment Operator
 555 =======================
 556
 557 With our current framework, adding a new assignment operator is really
 558 simple. We will parse it just like any other binary operator, but handle
 559 it internally (instead of allowing the user to define it). The first
 560 step is to set a precedence:
 561
 562 .. code-block:: c++
 563
 564      int main() {
 565        // Install standard binary operators.
 566        // 1 is lowest precedence.
 567        BinopPrecedence['='] = 2;
 568        BinopPrecedence['<'] = 10;
 569        BinopPrecedence['+'] = 20;
 570        BinopPrecedence['-'] = 20;
 571
 572 Now that the parser knows the precedence of the binary operator, it
 573 takes care of all the parsing and AST generation. We just need to
 574 implement codegen for the assignment operator. This looks like:
 575
 576 .. code-block:: c++
 577
 578     Value *BinaryExprAST::codegen() {
 579       // Special case '=' because we don't want to emit the LHS as an expression.
 580       if (Op == '=') {
 581         // This assume we're building without RTTI because LLVM builds that way by
 582         // default. If you build LLVM with RTTI this can be changed to a
 583         // dynamic_cast for automatic error checking.
 584         VariableExprAST *LHSE = static_cast<VariableExprAST*>(LHS.get());
 585         if (!LHSE)
 586           return LogErrorV("destination of '=' must be a variable");
 587
 588 Unlike the rest of the binary operators, our assignment operator doesn't
 589 follow the "emit LHS, emit RHS, do computation" model. As such, it is
 590 handled as a special case before the other binary operators are handled.
 591 The other strange thing is that it requires the LHS to be a variable. It
 592 is invalid to have "(x+1) = expr" - only things like "x = expr" are
 593 allowed.
 594
 595 .. code-block:: c++
 596
 597         // Codegen the RHS.
 598         Value *Val = RHS->codegen();
 599         if (!Val)
 600           return nullptr;
 601
 602         // Look up the name.
 603         Value *Variable = NamedValues[LHSE->getName()];
 604         if (!Variable)
 605           return LogErrorV("Unknown variable name");
 606
 607         Builder->CreateStore(Val, Variable);
 608         return Val;
 609       }
 610       ...
 611
 612 Once we have the variable, codegen'ing the assignment is
 613 straightforward: we emit the RHS of the assignment, create a store, and
 614 return the computed value. Returning a value allows for chained
 615 assignments like "X = (Y = Z)".
 616
 617 Now that we have an assignment operator, we can mutate loop variables
 618 and arguments. For example, we can now run code like this:
 619
 620 ::
 621
 622     # Function to print a double.
 623     extern printd(x);
 624
 625     # Define ':' for sequencing: as a low-precedence operator that ignores operands
 626     # and just returns the RHS.
 627     def binary : 1 (x y) y;
 628
 629     def test(x)
 630       printd(x) :
 631       x = 4 :
 632       printd(x);
 633
 634     test(123);
 635
 636 When run, this example prints "123" and then "4", showing that we did
 637 actually mutate the value! Okay, we have now officially implemented our
 638 goal: getting this to work requires SSA construction in the general
 639 case. However, to be really useful, we want the ability to define our
 640 own local variables, let's add this next!
 641
 642 User-defined Local Variables
 643 ============================
 644
 645 Adding var/in is just like any other extension we made to
 646 Kaleidoscope: we extend the lexer, the parser, the AST and the code
 647 generator. The first step for adding our new 'var/in' construct is to
 648 extend the lexer. As before, this is pretty trivial, the code looks like
 649 this:
 650
 651 .. code-block:: c++
 652
 653     enum Token {
 654       ...
 655       // var definition
 656       tok_var = -13
 657     ...
 658     }
 659     ...
 660     static int gettok() {
 661     ...
 662         if (IdentifierStr == "in")
 663           return tok_in;
 664         if (IdentifierStr == "binary")
 665           return tok_binary;
 666         if (IdentifierStr == "unary")
 667           return tok_unary;
 668         if (IdentifierStr == "var")
 669           return tok_var;
 670         return tok_identifier;
 671     ...
 672
 673 The next step is to define the AST node that we will construct. For
 674 var/in, it looks like this:
 675
 676 .. code-block:: c++
 677
 678     /// VarExprAST - Expression class for var/in
 679     class VarExprAST : public ExprAST {
 680       std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
 681       std::unique_ptr<ExprAST> Body;
 682
 683     public:
 684       VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
 685                  std::unique_ptr<ExprAST> Body)
 686         : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
 687
 688       Value *codegen() override;
 689     };
 690
 691 var/in allows a list of names to be defined all at once, and each name
 692 can optionally have an initializer value. As such, we capture this
 693 information in the VarNames vector. Also, var/in has a body, this body
 694 is allowed to access the variables defined by the var/in.
 695
 696 With this in place, we can define the parser pieces. The first thing we
 697 do is add it as a primary expression:
 698
 699 .. code-block:: c++
 700
 701     /// primary
 702     ///   ::= identifierexpr
 703     ///   ::= numberexpr
 704     ///   ::= parenexpr
 705     ///   ::= ifexpr
 706     ///   ::= forexpr
 707     ///   ::= varexpr
 708     static std::unique_ptr<ExprAST> ParsePrimary() {
 709       switch (CurTok) {
 710       default:
 711         return LogError("unknown token when expecting an expression");
 712       case tok_identifier:
 713         return ParseIdentifierExpr();
 714       case tok_number:
 715         return ParseNumberExpr();
 716       case '(':
 717         return ParseParenExpr();
 718       case tok_if:
 719         return ParseIfExpr();
 720       case tok_for:
 721         return ParseForExpr();
 722       case tok_var:
 723         return ParseVarExpr();
 724       }
 725     }
 726
 727 Next we define ParseVarExpr:
 728
 729 .. code-block:: c++
 730
 731     /// varexpr ::= 'var' identifier ('=' expression)?
 732     //                    (',' identifier ('=' expression)?)* 'in' expression
 733     static std::unique_ptr<ExprAST> ParseVarExpr() {
 734       getNextToken();  // eat the var.
 735
 736       std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
 737
 738       // At least one variable name is required.
 739       if (CurTok != tok_identifier)
 740         return LogError("expected identifier after var");
 741
 742 The first part of this code parses the list of identifier/expr pairs
 743 into the local ``VarNames`` vector.
 744
 745 .. code-block:: c++
 746
 747       while (true) {
 748         std::string Name = IdentifierStr;
 749         getNextToken();  // eat identifier.
 750
 751         // Read the optional initializer.
 752         std::unique_ptr<ExprAST> Init;
 753         if (CurTok == '=') {
 754           getNextToken(); // eat the '='.
 755
 756           Init = ParseExpression();
 757           if (!Init) return nullptr;
 758         }
 759
 760         VarNames.push_back(std::make_pair(Name, std::move(Init)));
 761
 762         // End of var list, exit loop.
 763         if (CurTok != ',') break;
 764         getNextToken(); // eat the ','.
 765
 766         if (CurTok != tok_identifier)
 767           return LogError("expected identifier list after var");
 768       }
 769
 770 Once all the variables are parsed, we then parse the body and create the
 771 AST node:
 772
 773 .. code-block:: c++
 774
 775       // At this point, we have to have 'in'.
 776       if (CurTok != tok_in)
 777         return LogError("expected 'in' keyword after 'var'");
 778       getNextToken();  // eat 'in'.
 779
 780       auto Body = ParseExpression();
 781       if (!Body)
 782         return nullptr;
 783
 784       return std::make_unique<VarExprAST>(std::move(VarNames),
 785                                            std::move(Body));
 786     }
 787
 788 Now that we can parse and represent the code, we need to support
 789 emission of LLVM IR for it. This code starts out with:
 790
 791 .. code-block:: c++
 792
 793     Value *VarExprAST::codegen() {
 794       std::vector<AllocaInst *> OldBindings;
 795
 796       Function *TheFunction = Builder->GetInsertBlock()->getParent();
 797
 798       // Register all variables and emit their initializer.
 799       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
 800         const std::string &VarName = VarNames[i].first;
 801         ExprAST *Init = VarNames[i].second.get();
 802
 803 Basically it loops over all the variables, installing them one at a
 804 time. For each variable we put into the symbol table, we remember the
 805 previous value that we replace in OldBindings.
 806
 807 .. code-block:: c++
 808
 809         // Emit the initializer before adding the variable to scope, this prevents
 810         // the initializer from referencing the variable itself, and permits stuff
 811         // like this:
 812         //  var a = 1 in
 813         //    var a = a in ...   # refers to outer 'a'.
 814         Value *InitVal;
 815         if (Init) {
 816           InitVal = Init->codegen();
 817           if (!InitVal)
 818             return nullptr;
 819         } else { // If not specified, use 0.0.
 820           InitVal = ConstantFP::get(*TheContext, APFloat(0.0));
 821         }
 822
 823         AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 824         Builder->CreateStore(InitVal, Alloca);
 825
 826         // Remember the old variable binding so that we can restore the binding when
 827         // we unrecurse.
 828         OldBindings.push_back(NamedValues[VarName]);
 829
 830         // Remember this binding.
 831         NamedValues[VarName] = Alloca;
 832       }
 833
 834 There are more comments here than code. The basic idea is that we emit
 835 the initializer, create the alloca, then update the symbol table to
 836 point to it. Once all the variables are installed in the symbol table,
 837 we evaluate the body of the var/in expression:
 838
 839 .. code-block:: c++
 840
 841       // Codegen the body, now that all vars are in scope.
 842       Value *BodyVal = Body->codegen();
 843       if (!BodyVal)
 844         return nullptr;
 845
 846 Finally, before returning, we restore the previous variable bindings:
 847
 848 .. code-block:: c++
 849
 850       // Pop all our variables from scope.
 851       for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
 852         NamedValues[VarNames[i].first] = OldBindings[i];
 853
 854       // Return the body computation.
 855       return BodyVal;
 856     }
 857
 858 The end result of all of this is that we get properly scoped variable
 859 definitions, and we even (trivially) allow mutation of them :).
 860
 861 With this, we completed what we set out to do. Our nice iterative fib
 862 example from the intro compiles and runs just fine. The mem2reg pass
 863 optimizes all of our stack variables into SSA registers, inserting PHI
 864 nodes where needed, and our front-end remains simple: no "iterated
 865 dominance frontier" computation anywhere in sight.
 866
 867 Full Code Listing
 868 =================
 869
 870 Here is the complete code listing for our running example, enhanced with
 871 mutable variables and var/in support. To build this example, use:
 872
 873 .. code-block:: bash
 874
 875     # Compile
 876     clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
 877     # Run
 878     ./toy
 879
 880 Here is the code:
 881
 882 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter7/toy.cpp
 883    :language: c++
 884
 885 `Next: Compiling to Object Code <LangImpl08.html>`_
 886