docs/tutorial/MyFirstLanguageFrontend/LangImpl07.rst

   1 :orphan:
   2
   3 =======================================================
   4 Kaleidoscope: Extending the Language: Mutable Variables
   5 =======================================================
   6
   7 .. contents::
   8    :local:
   9
  10 Chapter 7 Introduction
  11 ======================
  12
  13 Welcome to Chapter 7 of the "`Implementing a language with
  14 LLVM <index.html>`_" tutorial. In chapters 1 through 6, we've built a
  15 very respectable, albeit simple, `functional programming
  16 language <http://en.wikipedia.org/wiki/Functional_programming>`_. In our
  17 journey, we learned some parsing techniques, how to build and represent
  18 an AST, how to build LLVM IR, and how to optimize the resultant code as
  19 well as JIT compile it.
  20
  21 While Kaleidoscope is interesting as a functional language, the fact
  22 that it is functional makes it "too easy" to generate LLVM IR for it. In
  23 particular, a functional language makes it very easy to build LLVM IR
  24 directly in `SSA
  25 form <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
  26 Since LLVM requires that the input code be in SSA form, this is a very
  27 nice property and it is often unclear to newcomers how to generate code
  28 for an imperative language with mutable variables.
  29
  30 The short (and happy) summary of this chapter is that there is no need
  31 for your front-end to build SSA form: LLVM provides highly tuned and
  32 well tested support for this, though the way it works is a bit
  33 unexpected for some.
  34
  35 Why is this a hard problem?
  36 ===========================
  37
  38 To understand why mutable variables cause complexities in SSA
  39 construction, consider this extremely simple C example:
  40
  41 .. code-block:: c
  42
  43     int G, H;
  44     int test(_Bool Condition) {
  45       int X;
  46       if (Condition)
  47         X = G;
  48       else
  49         X = H;
  50       return X;
  51     }
  52
  53 In this case, we have the variable "X", whose value depends on the path
  54 executed in the program. Because there are two different possible values
  55 for X before the return instruction, a PHI node is inserted to merge the
  56 two values. The LLVM IR that we want for this example looks like this:
  57
  58 .. code-block:: llvm
  59
  60     @G = weak global i32 0   ; type of @G is i32*
  61     @H = weak global i32 0   ; type of @H is i32*
  62
  63     define i32 @test(i1 %Condition) {
  64     entry:
  65       br i1 %Condition, label %cond_true, label %cond_false
  66
  67     cond_true:
  68       %X.0 = load i32* @G
  69       br label %cond_next
  70
  71     cond_false:
  72       %X.1 = load i32* @H
  73       br label %cond_next
  74
  75     cond_next:
  76       %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
  77       ret i32 %X.2
  78     }
  79
  80 In this example, the loads from the G and H global variables are
  81 explicit in the LLVM IR, and they live in the then/else branches of the
  82 if statement (cond\_true/cond\_false). In order to merge the incoming
  83 values, the X.2 phi node in the cond\_next block selects the right value
  84 to use based on where control flow is coming from: if control flow comes
  85 from the cond\_false block, X.2 gets the value of X.1. Alternatively, if
  86 control flow comes from cond\_true, it gets the value of X.0. The intent
  87 of this chapter is not to explain the details of SSA form. For more
  88 information, see one of the many `online
  89 references <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_.
  90
  91 The question for this article is "who places the phi nodes when lowering
  92 assignments to mutable variables?". The issue here is that LLVM
  93 *requires* that its IR be in SSA form: there is no "non-ssa" mode for
  94 it. However, SSA construction requires non-trivial algorithms and data
  95 structures, so it is inconvenient and wasteful for every front-end to
  96 have to reproduce this logic.
  97
  98 Memory in LLVM
  99 ==============
 100
 101 The 'trick' here is that while LLVM does require all register values to
 102 be in SSA form, it does not require (or permit) memory objects to be in
 103 SSA form. In the example above, note that the loads from G and H are
 104 direct accesses to G and H: they are not renamed or versioned. This
 105 differs from some other compiler systems, which do try to version memory
 106 objects. In LLVM, instead of encoding dataflow analysis of memory into
 107 the LLVM IR, it is handled with `Analysis
 108 Passes <../WritingAnLLVMPass.html>`_ which are computed on demand.
 109
 110 With this in mind, the high-level idea is that we want to make a stack
 111 variable (which lives in memory, because it is on the stack) for each
 112 mutable object in a function. To take advantage of this trick, we need
 113 to talk about how LLVM represents stack variables.
 114
 115 In LLVM, all memory accesses are explicit with load/store instructions,
 116 and it is carefully designed not to have (or need) an "address-of"
 117 operator. Notice how the type of the @G/@H global variables is actually
 118 "i32\*" even though the variable is defined as "i32". What this means is
 119 that @G defines *space* for an i32 in the global data area, but its
 120 *name* actually refers to the address for that space. Stack variables
 121 work the same way, except that instead of being declared with global
 122 variable definitions, they are declared with the `LLVM alloca
 123 instruction <../LangRef.html#alloca-instruction>`_:
 124
 125 .. code-block:: llvm
 126
 127     define i32 @example() {
 128     entry:
 129       %X = alloca i32           ; type of %X is i32*.
 130       ...
 131       %tmp = load i32* %X       ; load the stack value %X from the stack.
 132       %tmp2 = add i32 %tmp, 1   ; increment it
 133       store i32 %tmp2, i32* %X  ; store it back
 134       ...
 135
 136 This code shows an example of how you can declare and manipulate a stack
 137 variable in the LLVM IR. Stack memory allocated with the alloca
 138 instruction is fully general: you can pass the address of the stack slot
 139 to functions, you can store it in other variables, etc. In our example
 140 above, we could rewrite the example to use the alloca technique to avoid
 141 using a PHI node:
 142
 143 .. code-block:: llvm
 144
 145     @G = weak global i32 0   ; type of @G is i32*
 146     @H = weak global i32 0   ; type of @H is i32*
 147
 148     define i32 @test(i1 %Condition) {
 149     entry:
 150       %X = alloca i32           ; type of %X is i32*.
 151       br i1 %Condition, label %cond_true, label %cond_false
 152
 153     cond_true:
 154       %X.0 = load i32* @G
 155       store i32 %X.0, i32* %X   ; Update X
 156       br label %cond_next
 157
 158     cond_false:
 159       %X.1 = load i32* @H
 160       store i32 %X.1, i32* %X   ; Update X
 161       br label %cond_next
 162
 163     cond_next:
 164       %X.2 = load i32* %X       ; Read X
 165       ret i32 %X.2
 166     }
 167
 168 With this, we have discovered a way to handle arbitrary mutable
 169 variables without the need to create Phi nodes at all:
 170
 171 #. Each mutable variable becomes a stack allocation.
 172 #. Each read of the variable becomes a load from the stack.
 173 #. Each update of the variable becomes a store to the stack.
 174 #. Taking the address of a variable just uses the stack address
 175    directly.
 176
 177 While this solution has solved our immediate problem, it introduced
 178 another one: we have now apparently introduced a lot of stack traffic
 179 for very simple and common operations, a major performance problem.
 180 Fortunately for us, the LLVM optimizer has a highly-tuned optimization
 181 pass named "mem2reg" that handles this case, promoting allocas like this
 182 into SSA registers, inserting Phi nodes as appropriate. If you run this
 183 example through the pass, for example, you'll get:
 184
 185 .. code-block:: bash
 186
 187     $ llvm-as < example.ll | opt -mem2reg | llvm-dis
 188     @G = weak global i32 0
 189     @H = weak global i32 0
 190
 191     define i32 @test(i1 %Condition) {
 192     entry:
 193       br i1 %Condition, label %cond_true, label %cond_false
 194
 195     cond_true:
 196       %X.0 = load i32* @G
 197       br label %cond_next
 198
 199     cond_false:
 200       %X.1 = load i32* @H
 201       br label %cond_next
 202
 203     cond_next:
 204       %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 205       ret i32 %X.01
 206     }
 207
 208 The mem2reg pass implements the standard "iterated dominance frontier"
 209 algorithm for constructing SSA form and has a number of optimizations
 210 that speed up (very common) degenerate cases. The mem2reg optimization
 211 pass is the answer to dealing with mutable variables, and we highly
 212 recommend that you depend on it. Note that mem2reg only works on
 213 variables in certain circumstances:
 214
 215 #. mem2reg is alloca-driven: it looks for allocas and if it can handle
 216    them, it promotes them. It does not apply to global variables or heap
 217    allocations.
 218 #. mem2reg only looks for alloca instructions in the entry block of the
 219    function. Being in the entry block guarantees that the alloca is only
 220    executed once, which makes analysis simpler.
 221 #. mem2reg only promotes allocas whose uses are direct loads and stores.
 222    If the address of the stack object is passed to a function, or if any
 223    funny pointer arithmetic is involved, the alloca will not be
 224    promoted.
 225 #. mem2reg only works on allocas of `first
 226    class <../LangRef.html#first-class-types>`_ values (such as pointers,
 227    scalars and vectors), and only if the array size of the allocation is
 228    1 (or missing in the .ll file). mem2reg is not capable of promoting
 229    structs or arrays to registers. Note that the "sroa" pass is
 230    more powerful and can promote structs, "unions", and arrays in many
 231    cases.
 232
 233 All of these properties are easy to satisfy for most imperative
 234 languages, and we'll illustrate it below with Kaleidoscope. The final
 235 question you may be asking is: should I bother with this nonsense for my
 236 front-end? Wouldn't it be better if I just did SSA construction
 237 directly, avoiding use of the mem2reg optimization pass? In short, we
 238 strongly recommend that you use this technique for building SSA form,
 239 unless there is an extremely good reason not to. Using this technique
 240 is:
 241
 242 -  Proven and well tested: clang uses this technique
 243    for local mutable variables. As such, the most common clients of LLVM
 244    are using this to handle a bulk of their variables. You can be sure
 245    that bugs are found fast and fixed early.
 246 -  Extremely Fast: mem2reg has a number of special cases that make it
 247    fast in common cases as well as fully general. For example, it has
 248    fast-paths for variables that are only used in a single block,
 249    variables that only have one assignment point, good heuristics to
 250    avoid insertion of unneeded phi nodes, etc.
 251 -  Needed for debug info generation: `Debug information in
 252    LLVM <../SourceLevelDebugging.html>`_ relies on having the address of
 253    the variable exposed so that debug info can be attached to it. This
 254    technique dovetails very naturally with this style of debug info.
 255
 256 If nothing else, this makes it much easier to get your front-end up and
 257 running, and is very simple to implement. Let's extend Kaleidoscope with
 258 mutable variables now!
 259
 260 Mutable Variables in Kaleidoscope
 261 =================================
 262
 263 Now that we know the sort of problem we want to tackle, let's see what
 264 this looks like in the context of our little Kaleidoscope language.
 265 We're going to add two features:
 266
 267 #. The ability to mutate variables with the '=' operator.
 268 #. The ability to define new variables.
 269
 270 While the first item is really what this is about, we only have
 271 variables for incoming arguments as well as for induction variables, and
 272 redefining those only goes so far :). Also, the ability to define new
 273 variables is a useful thing regardless of whether you will be mutating
 274 them. Here's a motivating example that shows how we could use these:
 275
 276 ::
 277
 278     # Define ':' for sequencing: as a low-precedence operator that ignores operands
 279     # and just returns the RHS.
 280     def binary : 1 (x y) y;
 281
 282     # Recursive fib, we could do this before.
 283     def fib(x)
 284       if (x < 3) then
 285         1
 286       else
 287         fib(x-1)+fib(x-2);
 288
 289     # Iterative fib.
 290     def fibi(x)
 291       var a = 1, b = 1, c in
 292       (for i = 3, i < x in
 293          c = a + b :
 294          a = b :
 295          b = c) :
 296       b;
 297
 298     # Call it.
 299     fibi(10);
 300
 301 In order to mutate variables, we have to change our existing variables
 302 to use the "alloca trick". Once we have that, we'll add our new
 303 operator, then extend Kaleidoscope to support new variable definitions.
 304
 305 Adjusting Existing Variables for Mutation
 306 =========================================
 307
 308 The symbol table in Kaleidoscope is managed at code generation time by
 309 the '``NamedValues``' map. This map currently keeps track of the LLVM
 310 "Value\*" that holds the double value for the named variable. In order
 311 to support mutation, we need to change this slightly, so that
 312 ``NamedValues`` holds the *memory location* of the variable in question.
 313 Note that this change is a refactoring: it changes the structure of the
 314 code, but does not (by itself) change the behavior of the compiler. All
 315 of these changes are isolated in the Kaleidoscope code generator.
 316
 317 At this point in Kaleidoscope's development, it only supports variables
 318 for two things: incoming arguments to functions and the induction
 319 variable of 'for' loops. For consistency, we'll allow mutation of these
 320 variables in addition to other user-defined variables. This means that
 321 these will both need memory locations.
 322
 323 To start our transformation of Kaleidoscope, we'll change the
 324 NamedValues map so that it maps to AllocaInst\* instead of Value\*. Once
 325 we do this, the C++ compiler will tell us what parts of the code we need
 326 to update:
 327
 328 .. code-block:: c++
 329
 330     static std::map<std::string, AllocaInst*> NamedValues;
 331
 332 Also, since we will need to create these allocas, we'll use a helper
 333 function that ensures that the allocas are created in the entry block of
 334 the function:
 335
 336 .. code-block:: c++
 337
 338     /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
 339     /// the function.  This is used for mutable variables etc.
 340     static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
 341                                               const std::string &VarName) {
 342       IRBuilder<> TmpB(&TheFunction->getEntryBlock(),
 343                      TheFunction->getEntryBlock().begin());
 344       return TmpB.CreateAlloca(Type::getDoubleTy(TheContext), 0,
 345                                VarName.c_str());
 346     }
 347
 348 This funny looking code creates an IRBuilder object that is pointing at
 349 the first instruction (.begin()) of the entry block. It then creates an
 350 alloca with the expected name and returns it. Because all values in
 351 Kaleidoscope are doubles, there is no need to pass in a type to use.
 352
 353 With this in place, the first functionality change we want to make belongs to
 354 variable references. In our new scheme, variables live on the stack, so
 355 code generating a reference to them actually needs to produce a load
 356 from the stack slot:
 357
 358 .. code-block:: c++
 359
 360     Value *VariableExprAST::codegen() {
 361       // Look this variable up in the function.
 362       Value *V = NamedValues[Name];
 363       if (!V)
 364         return LogErrorV("Unknown variable name");
 365
 366       // Load the value.
 367       return Builder.CreateLoad(V, Name.c_str());
 368     }
 369
 370 As you can see, this is pretty straightforward. Now we need to update
 371 the things that define the variables to set up the alloca. We'll start
 372 with ``ForExprAST::codegen()`` (see the `full code listing <#id1>`_ for
 373 the unabridged code):
 374
 375 .. code-block:: c++
 376
 377       Function *TheFunction = Builder.GetInsertBlock()->getParent();
 378
 379       // Create an alloca for the variable in the entry block.
 380       AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 381
 382       // Emit the start code first, without 'variable' in scope.
 383       Value *StartVal = Start->codegen();
 384       if (!StartVal)
 385         return nullptr;
 386
 387       // Store the value into the alloca.
 388       Builder.CreateStore(StartVal, Alloca);
 389       ...
 390
 391       // Compute the end condition.
 392       Value *EndCond = End->codegen();
 393       if (!EndCond)
 394         return nullptr;
 395
 396       // Reload, increment, and restore the alloca.  This handles the case where
 397       // the body of the loop mutates the variable.
 398       Value *CurVar = Builder.CreateLoad(Alloca);
 399       Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
 400       Builder.CreateStore(NextVar, Alloca);
 401       ...
 402
 403 This code is virtually identical to the code `before we allowed mutable
 404 variables <LangImpl5.html#code-generation-for-the-for-loop>`_. The big difference is that we
 405 no longer have to construct a PHI node, and we use load/store to access
 406 the variable as needed.
 407
 408 To support mutable argument variables, we need to also make allocas for
 409 them. The code for this is also pretty simple:
 410
 411 .. code-block:: c++
 412
 413     Function *FunctionAST::codegen() {
 414       ...
 415       Builder.SetInsertPoint(BB);
 416
 417       // Record the function arguments in the NamedValues map.
 418       NamedValues.clear();
 419       for (auto &Arg : TheFunction->args()) {
 420         // Create an alloca for this variable.
 421         AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, Arg.getName());
 422
 423         // Store the initial value into the alloca.
 424         Builder.CreateStore(&Arg, Alloca);
 425
 426         // Add arguments to variable symbol table.
 427         NamedValues[Arg.getName()] = Alloca;
 428       }
 429
 430       if (Value *RetVal = Body->codegen()) {
 431         ...
 432
 433 For each argument, we make an alloca, store the input value to the
 434 function into the alloca, and register the alloca as the memory location
 435 for the argument. This method gets invoked by ``FunctionAST::codegen()``
 436 right after it sets up the entry block for the function.
 437
 438 The final missing piece is adding the mem2reg pass, which allows us to
 439 get good codegen once again:
 440
 441 .. code-block:: c++
 442
 443         // Promote allocas to registers.
 444         TheFPM->add(createPromoteMemoryToRegisterPass());
 445         // Do simple "peephole" optimizations and bit-twiddling optzns.
 446         TheFPM->add(createInstructionCombiningPass());
 447         // Reassociate expressions.
 448         TheFPM->add(createReassociatePass());
 449         ...
 450
 451 It is interesting to see what the code looks like before and after the
 452 mem2reg optimization runs. For example, this is the before/after code
 453 for our recursive fib function. Before the optimization:
 454
 455 .. code-block:: llvm
 456
 457     define double @fib(double %x) {
 458     entry:
 459       %x1 = alloca double
 460       store double %x, double* %x1
 461       %x2 = load double, double* %x1
 462       %cmptmp = fcmp ult double %x2, 3.000000e+00
 463       %booltmp = uitofp i1 %cmptmp to double
 464       %ifcond = fcmp one double %booltmp, 0.000000e+00
 465       br i1 %ifcond, label %then, label %else
 466
 467     then:       ; preds = %entry
 468       br label %ifcont
 469
 470     else:       ; preds = %entry
 471       %x3 = load double, double* %x1
 472       %subtmp = fsub double %x3, 1.000000e+00
 473       %calltmp = call double @fib(double %subtmp)
 474       %x4 = load double, double* %x1
 475       %subtmp5 = fsub double %x4, 2.000000e+00
 476       %calltmp6 = call double @fib(double %subtmp5)
 477       %addtmp = fadd double %calltmp, %calltmp6
 478       br label %ifcont
 479
 480     ifcont:     ; preds = %else, %then
 481       %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 482       ret double %iftmp
 483     }
 484
 485 Here there is only one variable (x, the input argument) but you can
 486 still see the extremely simple-minded code generation strategy we are
 487 using. In the entry block, an alloca is created, and the initial input
 488 value is stored into it. Each reference to the variable does a reload
 489 from the stack. Also, note that we didn't modify the if/then/else
 490 expression, so it still inserts a PHI node. While we could make an
 491 alloca for it, it is actually easier to create a PHI node for it, so we
 492 still just make the PHI.
 493
 494 Here is the code after the mem2reg pass runs:
 495
 496 .. code-block:: llvm
 497
 498     define double @fib(double %x) {
 499     entry:
 500       %cmptmp = fcmp ult double %x, 3.000000e+00
 501       %booltmp = uitofp i1 %cmptmp to double
 502       %ifcond = fcmp one double %booltmp, 0.000000e+00
 503       br i1 %ifcond, label %then, label %else
 504
 505     then:
 506       br label %ifcont
 507
 508     else:
 509       %subtmp = fsub double %x, 1.000000e+00
 510       %calltmp = call double @fib(double %subtmp)
 511       %subtmp5 = fsub double %x, 2.000000e+00
 512       %calltmp6 = call double @fib(double %subtmp5)
 513       %addtmp = fadd double %calltmp, %calltmp6
 514       br label %ifcont
 515
 516     ifcont:     ; preds = %else, %then
 517       %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 518       ret double %iftmp
 519     }
 520
 521 This is a trivial case for mem2reg, since there are no redefinitions of
 522 the variable. The point of showing this is to calm your tension about
 523 inserting such blatant inefficiencies :).
 524
 525 After the rest of the optimizers run, we get:
 526
 527 .. code-block:: llvm
 528
 529     define double @fib(double %x) {
 530     entry:
 531       %cmptmp = fcmp ult double %x, 3.000000e+00
 532       %booltmp = uitofp i1 %cmptmp to double
 533       %ifcond = fcmp ueq double %booltmp, 0.000000e+00
 534       br i1 %ifcond, label %else, label %ifcont
 535
 536     else:
 537       %subtmp = fsub double %x, 1.000000e+00
 538       %calltmp = call double @fib(double %subtmp)
 539       %subtmp5 = fsub double %x, 2.000000e+00
 540       %calltmp6 = call double @fib(double %subtmp5)
 541       %addtmp = fadd double %calltmp, %calltmp6
 542       ret double %addtmp
 543
 544     ifcont:
 545       ret double 1.000000e+00
 546     }
 547
 548 Here we see that the simplifycfg pass decided to clone the return
 549 instruction into the end of the 'else' block. This allowed it to
 550 eliminate some branches and the PHI node.
 551
 552 Now that all symbol table references are updated to use stack variables,
 553 we'll add the assignment operator.
 554
 555 New Assignment Operator
 556 =======================
 557
 558 With our current framework, adding a new assignment operator is really
 559 simple. We will parse it just like any other binary operator, but handle
 560 it internally (instead of allowing the user to define it). The first
 561 step is to set a precedence:
 562
 563 .. code-block:: c++
 564
 565      int main() {
 566        // Install standard binary operators.
 567        // 1 is lowest precedence.
 568        BinopPrecedence['='] = 2;
 569        BinopPrecedence['<'] = 10;
 570        BinopPrecedence['+'] = 20;
 571        BinopPrecedence['-'] = 20;
 572
 573 Now that the parser knows the precedence of the binary operator, it
 574 takes care of all the parsing and AST generation. We just need to
 575 implement codegen for the assignment operator. This looks like:
 576
 577 .. code-block:: c++
 578
 579     Value *BinaryExprAST::codegen() {
 580       // Special case '=' because we don't want to emit the LHS as an expression.
 581       if (Op == '=') {
 582         // Assignment requires the LHS to be an identifier.
 583         VariableExprAST *LHSE = dynamic_cast<VariableExprAST*>(LHS.get());
 584         if (!LHSE)
 585           return LogErrorV("destination of '=' must be a variable");
 586
 587 Unlike the rest of the binary operators, our assignment operator doesn't
 588 follow the "emit LHS, emit RHS, do computation" model. As such, it is
 589 handled as a special case before the other binary operators are handled.
 590 The other strange thing is that it requires the LHS to be a variable. It
 591 is invalid to have "(x+1) = expr" - only things like "x = expr" are
 592 allowed.
 593
 594 .. code-block:: c++
 595
 596         // Codegen the RHS.
 597         Value *Val = RHS->codegen();
 598         if (!Val)
 599           return nullptr;
 600
 601         // Look up the name.
 602         Value *Variable = NamedValues[LHSE->getName()];
 603         if (!Variable)
 604           return LogErrorV("Unknown variable name");
 605
 606         Builder.CreateStore(Val, Variable);
 607         return Val;
 608       }
 609       ...
 610
 611 Once we have the variable, codegen'ing the assignment is
 612 straightforward: we emit the RHS of the assignment, create a store, and
 613 return the computed value. Returning a value allows for chained
 614 assignments like "X = (Y = Z)".
 615
 616 Now that we have an assignment operator, we can mutate loop variables
 617 and arguments. For example, we can now run code like this:
 618
 619 ::
 620
 621     # Function to print a double.
 622     extern printd(x);
 623
 624     # Define ':' for sequencing: as a low-precedence operator that ignores operands
 625     # and just returns the RHS.
 626     def binary : 1 (x y) y;
 627
 628     def test(x)
 629       printd(x) :
 630       x = 4 :
 631       printd(x);
 632
 633     test(123);
 634
 635 When run, this example prints "123" and then "4", showing that we did
 636 actually mutate the value! Okay, we have now officially implemented our
 637 goal: getting this to work requires SSA construction in the general
 638 case. However, to be really useful, we want the ability to define our
 639 own local variables, let's add this next!
 640
 641 User-defined Local Variables
 642 ============================
 643
 644 Adding var/in is just like any other extension we made to
 645 Kaleidoscope: we extend the lexer, the parser, the AST and the code
 646 generator. The first step for adding our new 'var/in' construct is to
 647 extend the lexer. As before, this is pretty trivial, the code looks like
 648 this:
 649
 650 .. code-block:: c++
 651
 652     enum Token {
 653       ...
 654       // var definition
 655       tok_var = -13
 656     ...
 657     }
 658     ...
 659     static int gettok() {
 660     ...
 661         if (IdentifierStr == "in")
 662           return tok_in;
 663         if (IdentifierStr == "binary")
 664           return tok_binary;
 665         if (IdentifierStr == "unary")
 666           return tok_unary;
 667         if (IdentifierStr == "var")
 668           return tok_var;
 669         return tok_identifier;
 670     ...
 671
 672 The next step is to define the AST node that we will construct. For
 673 var/in, it looks like this:
 674
 675 .. code-block:: c++
 676
 677     /// VarExprAST - Expression class for var/in
 678     class VarExprAST : public ExprAST {
 679       std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
 680       std::unique_ptr<ExprAST> Body;
 681
 682     public:
 683       VarExprAST(std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames,
 684                  std::unique_ptr<ExprAST> Body)
 685         : VarNames(std::move(VarNames)), Body(std::move(Body)) {}
 686
 687       Value *codegen() override;
 688     };
 689
 690 var/in allows a list of names to be defined all at once, and each name
 691 can optionally have an initializer value. As such, we capture this
 692 information in the VarNames vector. Also, var/in has a body, this body
 693 is allowed to access the variables defined by the var/in.
 694
 695 With this in place, we can define the parser pieces. The first thing we
 696 do is add it as a primary expression:
 697
 698 .. code-block:: c++
 699
 700     /// primary
 701     ///   ::= identifierexpr
 702     ///   ::= numberexpr
 703     ///   ::= parenexpr
 704     ///   ::= ifexpr
 705     ///   ::= forexpr
 706     ///   ::= varexpr
 707     static std::unique_ptr<ExprAST> ParsePrimary() {
 708       switch (CurTok) {
 709       default:
 710         return LogError("unknown token when expecting an expression");
 711       case tok_identifier:
 712         return ParseIdentifierExpr();
 713       case tok_number:
 714         return ParseNumberExpr();
 715       case '(':
 716         return ParseParenExpr();
 717       case tok_if:
 718         return ParseIfExpr();
 719       case tok_for:
 720         return ParseForExpr();
 721       case tok_var:
 722         return ParseVarExpr();
 723       }
 724     }
 725
 726 Next we define ParseVarExpr:
 727
 728 .. code-block:: c++
 729
 730     /// varexpr ::= 'var' identifier ('=' expression)?
 731     //                    (',' identifier ('=' expression)?)* 'in' expression
 732     static std::unique_ptr<ExprAST> ParseVarExpr() {
 733       getNextToken();  // eat the var.
 734
 735       std::vector<std::pair<std::string, std::unique_ptr<ExprAST>>> VarNames;
 736
 737       // At least one variable name is required.
 738       if (CurTok != tok_identifier)
 739         return LogError("expected identifier after var");
 740
 741 The first part of this code parses the list of identifier/expr pairs
 742 into the local ``VarNames`` vector.
 743
 744 .. code-block:: c++
 745
 746       while (1) {
 747         std::string Name = IdentifierStr;
 748         getNextToken();  // eat identifier.
 749
 750         // Read the optional initializer.
 751         std::unique_ptr<ExprAST> Init;
 752         if (CurTok == '=') {
 753           getNextToken(); // eat the '='.
 754
 755           Init = ParseExpression();
 756           if (!Init) return nullptr;
 757         }
 758
 759         VarNames.push_back(std::make_pair(Name, std::move(Init)));
 760
 761         // End of var list, exit loop.
 762         if (CurTok != ',') break;
 763         getNextToken(); // eat the ','.
 764
 765         if (CurTok != tok_identifier)
 766           return LogError("expected identifier list after var");
 767       }
 768
 769 Once all the variables are parsed, we then parse the body and create the
 770 AST node:
 771
 772 .. code-block:: c++
 773
 774       // At this point, we have to have 'in'.
 775       if (CurTok != tok_in)
 776         return LogError("expected 'in' keyword after 'var'");
 777       getNextToken();  // eat 'in'.
 778
 779       auto Body = ParseExpression();
 780       if (!Body)
 781         return nullptr;
 782
 783       return std::make_unique<VarExprAST>(std::move(VarNames),
 784                                            std::move(Body));
 785     }
 786
 787 Now that we can parse and represent the code, we need to support
 788 emission of LLVM IR for it. This code starts out with:
 789
 790 .. code-block:: c++
 791
 792     Value *VarExprAST::codegen() {
 793       std::vector<AllocaInst *> OldBindings;
 794
 795       Function *TheFunction = Builder.GetInsertBlock()->getParent();
 796
 797       // Register all variables and emit their initializer.
 798       for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
 799         const std::string &VarName = VarNames[i].first;
 800         ExprAST *Init = VarNames[i].second.get();
 801
 802 Basically it loops over all the variables, installing them one at a
 803 time. For each variable we put into the symbol table, we remember the
 804 previous value that we replace in OldBindings.
 805
 806 .. code-block:: c++
 807
 808         // Emit the initializer before adding the variable to scope, this prevents
 809         // the initializer from referencing the variable itself, and permits stuff
 810         // like this:
 811         //  var a = 1 in
 812         //    var a = a in ...   # refers to outer 'a'.
 813         Value *InitVal;
 814         if (Init) {
 815           InitVal = Init->codegen();
 816           if (!InitVal)
 817             return nullptr;
 818         } else { // If not specified, use 0.0.
 819           InitVal = ConstantFP::get(TheContext, APFloat(0.0));
 820         }
 821
 822         AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 823         Builder.CreateStore(InitVal, Alloca);
 824
 825         // Remember the old variable binding so that we can restore the binding when
 826         // we unrecurse.
 827         OldBindings.push_back(NamedValues[VarName]);
 828
 829         // Remember this binding.
 830         NamedValues[VarName] = Alloca;
 831       }
 832
 833 There are more comments here than code. The basic idea is that we emit
 834 the initializer, create the alloca, then update the symbol table to
 835 point to it. Once all the variables are installed in the symbol table,
 836 we evaluate the body of the var/in expression:
 837
 838 .. code-block:: c++
 839
 840       // Codegen the body, now that all vars are in scope.
 841       Value *BodyVal = Body->codegen();
 842       if (!BodyVal)
 843         return nullptr;
 844
 845 Finally, before returning, we restore the previous variable bindings:
 846
 847 .. code-block:: c++
 848
 849       // Pop all our variables from scope.
 850       for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
 851         NamedValues[VarNames[i].first] = OldBindings[i];
 852
 853       // Return the body computation.
 854       return BodyVal;
 855     }
 856
 857 The end result of all of this is that we get properly scoped variable
 858 definitions, and we even (trivially) allow mutation of them :).
 859
 860 With this, we completed what we set out to do. Our nice iterative fib
 861 example from the intro compiles and runs just fine. The mem2reg pass
 862 optimizes all of our stack variables into SSA registers, inserting PHI
 863 nodes where needed, and our front-end remains simple: no "iterated
 864 dominance frontier" computation anywhere in sight.
 865
 866 Full Code Listing
 867 =================
 868
 869 Here is the complete code listing for our running example, enhanced with
 870 mutable variables and var/in support. To build this example, use:
 871
 872 .. code-block:: bash
 873
 874     # Compile
 875     clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
 876     # Run
 877     ./toy
 878
 879 Here is the code:
 880
 881 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter7/toy.cpp
 882    :language: c++
 883
 884 `Next: Compiling to Object Code <LangImpl08.html>`_
 885