docs/tutorial/LangImpl7.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
   7          construction</title>
   8   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   9   <meta name="author" content="Chris Lattner">
  10   <link rel="stylesheet" href="../llvm.css" type="text/css">
  11 </head>
  12
  13 <body>
  14
  15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
  16
  17 <ul>
  18 <li><a href="index.html">Up to Tutorial Index</a></li>
  19 <li>Chapter 7
  20   <ol>
  21     <li><a href="#intro">Chapter 7 Introduction</a></li>
  22     <li><a href="#why">Why is this a hard problem?</a></li>
  23     <li><a href="#memory">Memory in LLVM</a></li>
  24     <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
  25     <li><a href="#adjustments">Adjusting Existing Variables for
  26      Mutation</a></li>
  27     <li><a href="#assignment">New Assignment Operator</a></li>
  28     <li><a href="#localvars">User-defined Local Variables</a></li>
  29     <li><a href="#code">Full Code Listing</a></li>
  30   </ol>
  31 </li>
  32 <li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
  33  tidbits</li>
  34 </ul>
  35
  36 <div class="doc_author">
  37   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  38 </div>
  39
  40 <!-- *********************************************************************** -->
  41 <div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
  42 <!-- *********************************************************************** -->
  43
  44 <div class="doc_text">
  45
  46 <p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
  47 with LLVM</a>" tutorial.  In chapters 1 through 6, we've built a very
  48 respectable, albeit simple, <a
  49 href="http://en.wikipedia.org/wiki/Functional_programming">functional
  50 programming language</a>.  In our journey, we learned some parsing techniques,
  51 how to build and represent an AST, how to build LLVM IR, and how to optimize
  52 the resultant code as well as JIT compile it.</p>
  53
  54 <p>While Kaleidoscope is interesting as a functional language, the fact that it
  55 is functional makes it "too easy" to generate LLVM IR for it.  In particular, a
  56 functional language makes it very easy to build LLVM IR directly in <a
  57 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
  58 Since LLVM requires that the input code be in SSA form, this is a very nice
  59 property and it is often unclear to newcomers how to generate code for an
  60 imperative language with mutable variables.</p>
  61
  62 <p>The short (and happy) summary of this chapter is that there is no need for
  63 your front-end to build SSA form: LLVM provides highly tuned and well tested
  64 support for this, though the way it works is a bit unexpected for some.</p>
  65
  66 </div>
  67
  68 <!-- *********************************************************************** -->
  69 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
  70 <!-- *********************************************************************** -->
  71
  72 <div class="doc_text">
  73
  74 <p>
  75 To understand why mutable variables cause complexities in SSA construction,
  76 consider this extremely simple C example:
  77 </p>
  78
  79 <div class="doc_code">
  80 <pre>
  81 int G, H;
  82 int test(_Bool Condition) {
  83   int X;
  84   if (Condition)
  85     X = G;
  86   else
  87     X = H;
  88   return X;
  89 }
  90 </pre>
  91 </div>
  92
  93 <p>In this case, we have the variable "X", whose value depends on the path
  94 executed in the program.  Because there are two different possible values for X
  95 before the return instruction, a PHI node is inserted to merge the two values.
  96 The LLVM IR that we want for this example looks like this:</p>
  97
  98 <div class="doc_code">
  99 <pre>
 100 @G = weak global i32 0   ; type of @G is i32*
 101 @H = weak global i32 0   ; type of @H is i32*
 102
 103 define i32 @test(i1 %Condition) {
 104 entry:
 105         br i1 %Condition, label %cond_true, label %cond_false
 106
 107 cond_true:
 108         %X.0 = load i32* @G
 109         br label %cond_next
 110
 111 cond_false:
 112         %X.1 = load i32* @H
 113         br label %cond_next
 114
 115 cond_next:
 116         %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 117         ret i32 %X.2
 118 }
 119 </pre>
 120 </div>
 121
 122 <p>In this example, the loads from the G and H global variables are explicit in
 123 the LLVM IR, and they live in the then/else branches of the if statement
 124 (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
 125 in the cond_next block selects the right value to use based on where control
 126 flow is coming from: if control flow comes from the cond_false block, X.2 gets
 127 the value of X.1.  Alternatively, if control flow comes from cond_true, it gets
 128 the value of X.0.  The intent of this chapter is not to explain the details of
 129 SSA form.  For more information, see one of the many <a
 130 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
 131 references</a>.</p>
 132
 133 <p>The question for this article is "who places the phi nodes when lowering
 134 assignments to mutable variables?".  The issue here is that LLVM
 135 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
 136 However, SSA construction requires non-trivial algorithms and data structures,
 137 so it is inconvenient and wasteful for every front-end to have to reproduce this
 138 logic.</p>
 139
 140 </div>
 141
 142 <!-- *********************************************************************** -->
 143 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
 144 <!-- *********************************************************************** -->
 145
 146 <div class="doc_text">
 147
 148 <p>The 'trick' here is that while LLVM does require all register values to be
 149 in SSA form, it does not require (or permit) memory objects to be in SSA form.
 150 In the example above, note that the loads from G and H are direct accesses to
 151 G and H: they are not renamed or versioned.  This differs from some other
 152 compiler systems, which do try to version memory objects.  In LLVM, instead of
 153 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
 154 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
 155 demand.</p>
 156
 157 <p>
 158 With this in mind, the high-level idea is that we want to make a stack variable
 159 (which lives in memory, because it is on the stack) for each mutable object in
 160 a function.  To take advantage of this trick, we need to talk about how LLVM
 161 represents stack variables.
 162 </p>
 163
 164 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
 165 it is carefully designed not to have (or need) an "address-of" operator.  Notice
 166 how the type of the @G/@H global variables is actually "i32*" even though the
 167 variable is defined as "i32".  What this means is that @G defines <em>space</em>
 168 for an i32 in the global data area, but its <em>name</em> actually refers to the
 169 address for that space.  Stack variables work the same way, except that instead of
 170 being declared with global variable definitions, they are declared with the
 171 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
 172
 173 <div class="doc_code">
 174 <pre>
 175 define i32 @example() {
 176 entry:
 177         %X = alloca i32           ; type of %X is i32*.
 178         ...
 179         %tmp = load i32* %X       ; load the stack value %X from the stack.
 180         %tmp2 = add i32 %tmp, 1   ; increment it
 181         store i32 %tmp2, i32* %X  ; store it back
 182         ...
 183 </pre>
 184 </div>
 185
 186 <p>This code shows an example of how you can declare and manipulate a stack
 187 variable in the LLVM IR.  Stack memory allocated with the alloca instruction is
 188 fully general: you can pass the address of the stack slot to functions, you can
 189 store it in other variables, etc.  In our example above, we could rewrite the
 190 example to use the alloca technique to avoid using a PHI node:</p>
 191
 192 <div class="doc_code">
 193 <pre>
 194 @G = weak global i32 0   ; type of @G is i32*
 195 @H = weak global i32 0   ; type of @H is i32*
 196
 197 define i32 @test(i1 %Condition) {
 198 entry:
 199         %X = alloca i32           ; type of %X is i32*.
 200         br i1 %Condition, label %cond_true, label %cond_false
 201
 202 cond_true:
 203         %X.0 = load i32* @G
 204         store i32 %X.0, i32* %X   ; Update X
 205         br label %cond_next
 206
 207 cond_false:
 208         %X.1 = load i32* @H
 209         store i32 %X.1, i32* %X   ; Update X
 210         br label %cond_next
 211
 212 cond_next:
 213         %X.2 = load i32* %X       ; Read X
 214         ret i32 %X.2
 215 }
 216 </pre>
 217 </div>
 218
 219 <p>With this, we have discovered a way to handle arbitrary mutable variables
 220 without the need to create Phi nodes at all:</p>
 221
 222 <ol>
 223 <li>Each mutable variable becomes a stack allocation.</li>
 224 <li>Each read of the variable becomes a load from the stack.</li>
 225 <li>Each update of the variable becomes a store to the stack.</li>
 226 <li>Taking the address of a variable just uses the stack address directly.</li>
 227 </ol>
 228
 229 <p>While this solution has solved our immediate problem, it introduced another
 230 one: we have now apparently introduced a lot of stack traffic for very simple
 231 and common operations, a major performance problem.  Fortunately for us, the
 232 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
 233 this case, promoting allocas like this into SSA registers, inserting Phi nodes
 234 as appropriate.  If you run this example through the pass, for example, you'll
 235 get:</p>
 236
 237 <div class="doc_code">
 238 <pre>
 239 $ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
 240 @G = weak global i32 0
 241 @H = weak global i32 0
 242
 243 define i32 @test(i1 %Condition) {
 244 entry:
 245         br i1 %Condition, label %cond_true, label %cond_false
 246
 247 cond_true:
 248         %X.0 = load i32* @G
 249         br label %cond_next
 250
 251 cond_false:
 252         %X.1 = load i32* @H
 253         br label %cond_next
 254
 255 cond_next:
 256         %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 257         ret i32 %X.01
 258 }
 259 </pre>
 260 </div>
 261
 262 <p>The mem2reg pass implements the standard "iterated dominance frontier"
 263 algorithm for constructing SSA form and has a number of optimizations that speed
 264 up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
 265 with mutable variables, and we highly recommend that you depend on it.  Note that
 266 mem2reg only works on variables in certain circumstances:</p>
 267
 268 <ol>
 269 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
 270 promotes them.  It does not apply to global variables or heap allocations.</li>
 271
 272 <li>mem2reg only looks for alloca instructions in the entry block of the
 273 function.  Being in the entry block guarantees that the alloca is only executed
 274 once, which makes analysis simpler.</li>
 275
 276 <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If
 277 the address of the stack object is passed to a function, or if any funny pointer
 278 arithmetic is involved, the alloca will not be promoted.</li>
 279
 280 <li>mem2reg only works on allocas of <a
 281 href="../LangRef.html#t_classifications">first class</a>
 282 values (such as pointers, scalars and vectors), and only if the array size
 283 of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of
 284 promoting structs or arrays to registers.  Note that the "scalarrepl" pass is
 285 more powerful and can promote structs, "unions", and arrays in many cases.</li>
 286
 287 </ol>
 288
 289 <p>
 290 All of these properties are easy to satisfy for most imperative languages, and
 291 we'll illustrate it below with Kaleidoscope.  The final question you may be
 292 asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
 293 better if I just did SSA construction directly, avoiding use of the mem2reg
 294 optimization pass?  In short, we strongly recommend that you use this technique
 295 for building SSA form, unless there is an extremely good reason not to.  Using
 296 this technique is:</p>
 297
 298 <ul>
 299 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
 300 mutable variables.  As such, the most common clients of LLVM are using this to
 301 handle a bulk of their variables.  You can be sure that bugs are found fast and
 302 fixed early.</li>
 303
 304 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
 305 common cases as well as fully general.  For example, it has fast-paths for
 306 variables that are only used in a single block, variables that only have one
 307 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
 308 </li>
 309
 310 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
 311 Debug information in LLVM</a> relies on having the address of the variable
 312 exposed so that debug info can be attached to it.  This technique dovetails
 313 very naturally with this style of debug info.</li>
 314 </ul>
 315
 316 <p>If nothing else, this makes it much easier to get your front-end up and
 317 running, and is very simple to implement.  Lets extend Kaleidoscope with mutable
 318 variables now!
 319 </p>
 320
 321 </div>
 322
 323 <!-- *********************************************************************** -->
 324 <div class="doc_section"><a name="kalvars">Mutable Variables in
 325 Kaleidoscope</a></div>
 326 <!-- *********************************************************************** -->
 327
 328 <div class="doc_text">
 329
 330 <p>Now that we know the sort of problem we want to tackle, lets see what this
 331 looks like in the context of our little Kaleidoscope language.  We're going to
 332 add two features:</p>
 333
 334 <ol>
 335 <li>The ability to mutate variables with the '=' operator.</li>
 336 <li>The ability to define new variables.</li>
 337 </ol>
 338
 339 <p>While the first item is really what this is about, we only have variables
 340 for incoming arguments as well as for induction variables, and redefining those only
 341 goes so far :).  Also, the ability to define new variables is a
 342 useful thing regardless of whether you will be mutating them.  Here's a
 343 motivating example that shows how we could use these:</p>
 344
 345 <div class="doc_code">
 346 <pre>
 347 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 348 # and just returns the RHS.
 349 def binary : 1 (x y) y;
 350
 351 # Recursive fib, we could do this before.
 352 def fib(x)
 353   if (x &lt; 3) then
 354     1
 355   else
 356     fib(x-1)+fib(x-2);
 357
 358 # Iterative fib.
 359 def fibi(x)
 360   <b>var a = 1, b = 1, c in</b>
 361   (for i = 3, i &lt; x in
 362      <b>c = a + b</b> :
 363      <b>a = b</b> :
 364      <b>b = c</b>) :
 365   b;
 366
 367 # Call it.
 368 fibi(10);
 369 </pre>
 370 </div>
 371
 372 <p>
 373 In order to mutate variables, we have to change our existing variables to use
 374 the "alloca trick".  Once we have that, we'll add our new operator, then extend
 375 Kaleidoscope to support new variable definitions.
 376 </p>
 377
 378 </div>
 379
 380 <!-- *********************************************************************** -->
 381 <div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
 382 Mutation</a></div>
 383 <!-- *********************************************************************** -->
 384
 385 <div class="doc_text">
 386
 387 <p>
 388 The symbol table in Kaleidoscope is managed at code generation time by the
 389 '<tt>NamedValues</tt>' map.  This map currently keeps track of the LLVM "Value*"
 390 that holds the double value for the named variable.  In order to support
 391 mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
 392 the <em>memory location</em> of the variable in question.  Note that this
 393 change is a refactoring: it changes the structure of the code, but does not
 394 (by itself) change the behavior of the compiler.  All of these changes are
 395 isolated in the Kaleidoscope code generator.</p>
 396
 397 <p>
 398 At this point in Kaleidoscope's development, it only supports variables for two
 399 things: incoming arguments to functions and the induction variable of 'for'
 400 loops.  For consistency, we'll allow mutation of these variables in addition to
 401 other user-defined variables.  This means that these will both need memory
 402 locations.
 403 </p>
 404
 405 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
 406 map so that it maps to AllocaInst* instead of Value*.  Once we do this, the C++
 407 compiler will tell us what parts of the code we need to update:</p>
 408
 409 <div class="doc_code">
 410 <pre>
 411 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
 412 </pre>
 413 </div>
 414
 415 <p>Also, since we will need to create these alloca's, we'll use a helper
 416 function that ensures that the allocas are created in the entry block of the
 417 function:</p>
 418
 419 <div class="doc_code">
 420 <pre>
 421 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
 422 /// the function.  This is used for mutable variables etc.
 423 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
 424                                           const std::string &amp;VarName) {
 425   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
 426                  TheFunction-&gt;getEntryBlock().begin());
 427   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, VarName.c_str());
 428 }
 429 </pre>
 430 </div>
 431
 432 <p>This funny looking code creates an IRBuilder object that is pointing at
 433 the first instruction (.begin()) of the entry block.  It then creates an alloca
 434 with the expected name and returns it.  Because all values in Kaleidoscope are
 435 doubles, there is no need to pass in a type to use.</p>
 436
 437 <p>With this in place, the first functionality change we want to make is to
 438 variable references.  In our new scheme, variables live on the stack, so code
 439 generating a reference to them actually needs to produce a load from the stack
 440 slot:</p>
 441
 442 <div class="doc_code">
 443 <pre>
 444 Value *VariableExprAST::Codegen() {
 445   // Look this variable up in the function.
 446   Value *V = NamedValues[Name];
 447   if (V == 0) return ErrorV("Unknown variable name");
 448
 449   <b>// Load the value.
 450   return Builder.CreateLoad(V, Name.c_str());</b>
 451 }
 452 </pre>
 453 </div>
 454
 455 <p>As you can see, this is pretty straightforward.  Now we need to update the
 456 things that define the variables to set up the alloca.  We'll start with
 457 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
 458 the unabridged code):</p>
 459
 460 <div class="doc_code">
 461 <pre>
 462   Function *TheFunction = Builder.GetInsertBlock()->getParent();
 463
 464   <b>// Create an alloca for the variable in the entry block.
 465   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
 466
 467     // Emit the start code first, without 'variable' in scope.
 468   Value *StartVal = Start-&gt;Codegen();
 469   if (StartVal == 0) return 0;
 470
 471   <b>// Store the value into the alloca.
 472   Builder.CreateStore(StartVal, Alloca);</b>
 473   ...
 474
 475   // Compute the end condition.
 476   Value *EndCond = End-&gt;Codegen();
 477   if (EndCond == 0) return EndCond;
 478
 479   <b>// Reload, increment, and restore the alloca.  This handles the case where
 480   // the body of the loop mutates the variable.
 481   Value *CurVar = Builder.CreateLoad(Alloca);
 482   Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
 483   Builder.CreateStore(NextVar, Alloca);</b>
 484   ...
 485 </pre>
 486 </div>
 487
 488 <p>This code is virtually identical to the code <a
 489 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>.  The
 490 big difference is that we no longer have to construct a PHI node, and we use
 491 load/store to access the variable as needed.</p>
 492
 493 <p>To support mutable argument variables, we need to also make allocas for them.
 494 The code for this is also pretty simple:</p>
 495
 496 <div class="doc_code">
 497 <pre>
 498 /// CreateArgumentAllocas - Create an alloca for each argument and register the
 499 /// argument in the symbol table so that references to it will succeed.
 500 void PrototypeAST::CreateArgumentAllocas(Function *F) {
 501   Function::arg_iterator AI = F-&gt;arg_begin();
 502   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
 503     // Create an alloca for this variable.
 504     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
 505
 506     // Store the initial value into the alloca.
 507     Builder.CreateStore(AI, Alloca);
 508
 509     // Add arguments to variable symbol table.
 510     NamedValues[Args[Idx]] = Alloca;
 511   }
 512 }
 513 </pre>
 514 </div>
 515
 516 <p>For each argument, we make an alloca, store the input value to the function
 517 into the alloca, and register the alloca as the memory location for the
 518 argument.  This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
 519 it sets up the entry block for the function.</p>
 520
 521 <p>The final missing piece is adding the mem2reg pass, which allows us to get
 522 good codegen once again:</p>
 523
 524 <div class="doc_code">
 525 <pre>
 526     // Set up the optimizer pipeline.  Start with registering info about how the
 527     // target lays out data structures.
 528     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
 529     <b>// Promote allocas to registers.
 530     OurFPM.add(createPromoteMemoryToRegisterPass());</b>
 531     // Do simple "peephole" optimizations and bit-twiddling optzns.
 532     OurFPM.add(createInstructionCombiningPass());
 533     // Reassociate expressions.
 534     OurFPM.add(createReassociatePass());
 535 </pre>
 536 </div>
 537
 538 <p>It is interesting to see what the code looks like before and after the
 539 mem2reg optimization runs.  For example, this is the before/after code for our
 540 recursive fib function.  Before the optimization:</p>
 541
 542 <div class="doc_code">
 543 <pre>
 544 define double @fib(double %x) {
 545 entry:
 546         <b>%x1 = alloca double
 547         store double %x, double* %x1
 548         %x2 = load double* %x1</b>
 549         %cmptmp = fcmp ult double %x2, 3.000000e+00
 550         %booltmp = uitofp i1 %cmptmp to double
 551         %ifcond = fcmp one double %booltmp, 0.000000e+00
 552         br i1 %ifcond, label %then, label %else
 553
 554 then:           ; preds = %entry
 555         br label %ifcont
 556
 557 else:           ; preds = %entry
 558         <b>%x3 = load double* %x1</b>
 559         %subtmp = sub double %x3, 1.000000e+00
 560         %calltmp = call double @fib( double %subtmp )
 561         <b>%x4 = load double* %x1</b>
 562         %subtmp5 = sub double %x4, 2.000000e+00
 563         %calltmp6 = call double @fib( double %subtmp5 )
 564         %addtmp = add double %calltmp, %calltmp6
 565         br label %ifcont
 566
 567 ifcont:         ; preds = %else, %then
 568         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 569         ret double %iftmp
 570 }
 571 </pre>
 572 </div>
 573
 574 <p>Here there is only one variable (x, the input argument) but you can still
 575 see the extremely simple-minded code generation strategy we are using.  In the
 576 entry block, an alloca is created, and the initial input value is stored into
 577 it.  Each reference to the variable does a reload from the stack.  Also, note
 578 that we didn't modify the if/then/else expression, so it still inserts a PHI
 579 node.  While we could make an alloca for it, it is actually easier to create a
 580 PHI node for it, so we still just make the PHI.</p>
 581
 582 <p>Here is the code after the mem2reg pass runs:</p>
 583
 584 <div class="doc_code">
 585 <pre>
 586 define double @fib(double %x) {
 587 entry:
 588         %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
 589         %booltmp = uitofp i1 %cmptmp to double
 590         %ifcond = fcmp one double %booltmp, 0.000000e+00
 591         br i1 %ifcond, label %then, label %else
 592
 593 then:
 594         br label %ifcont
 595
 596 else:
 597         %subtmp = sub double <b>%x</b>, 1.000000e+00
 598         %calltmp = call double @fib( double %subtmp )
 599         %subtmp5 = sub double <b>%x</b>, 2.000000e+00
 600         %calltmp6 = call double @fib( double %subtmp5 )
 601         %addtmp = add double %calltmp, %calltmp6
 602         br label %ifcont
 603
 604 ifcont:         ; preds = %else, %then
 605         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 606         ret double %iftmp
 607 }
 608 </pre>
 609 </div>
 610
 611 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
 612 variable.  The point of showing this is to calm your tension about inserting
 613 such blatent inefficiencies :).</p>
 614
 615 <p>After the rest of the optimizers run, we get:</p>
 616
 617 <div class="doc_code">
 618 <pre>
 619 define double @fib(double %x) {
 620 entry:
 621         %cmptmp = fcmp ult double %x, 3.000000e+00
 622         %booltmp = uitofp i1 %cmptmp to double
 623         %ifcond = fcmp ueq double %booltmp, 0.000000e+00
 624         br i1 %ifcond, label %else, label %ifcont
 625
 626 else:
 627         %subtmp = sub double %x, 1.000000e+00
 628         %calltmp = call double @fib( double %subtmp )
 629         %subtmp5 = sub double %x, 2.000000e+00
 630         %calltmp6 = call double @fib( double %subtmp5 )
 631         %addtmp = add double %calltmp, %calltmp6
 632         ret double %addtmp
 633
 634 ifcont:
 635         ret double 1.000000e+00
 636 }
 637 </pre>
 638 </div>
 639
 640 <p>Here we see that the simplifycfg pass decided to clone the return instruction
 641 into the end of the 'else' block.  This allowed it to eliminate some branches
 642 and the PHI node.</p>
 643
 644 <p>Now that all symbol table references are updated to use stack variables,
 645 we'll add the assignment operator.</p>
 646
 647 </div>
 648
 649 <!-- *********************************************************************** -->
 650 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
 651 <!-- *********************************************************************** -->
 652
 653 <div class="doc_text">
 654
 655 <p>With our current framework, adding a new assignment operator is really
 656 simple.  We will parse it just like any other binary operator, but handle it
 657 internally (instead of allowing the user to define it).  The first step is to
 658 set a precedence:</p>
 659
 660 <div class="doc_code">
 661 <pre>
 662  int main() {
 663    // Install standard binary operators.
 664    // 1 is lowest precedence.
 665    <b>BinopPrecedence['='] = 2;</b>
 666    BinopPrecedence['&lt;'] = 10;
 667    BinopPrecedence['+'] = 20;
 668    BinopPrecedence['-'] = 20;
 669 </pre>
 670 </div>
 671
 672 <p>Now that the parser knows the precedence of the binary operator, it takes
 673 care of all the parsing and AST generation.  We just need to implement codegen
 674 for the assignment operator.  This looks like:</p>
 675
 676 <div class="doc_code">
 677 <pre>
 678 Value *BinaryExprAST::Codegen() {
 679   // Special case '=' because we don't want to emit the LHS as an expression.
 680   if (Op == '=') {
 681     // Assignment requires the LHS to be an identifier.
 682     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
 683     if (!LHSE)
 684       return ErrorV("destination of '=' must be a variable");
 685 </pre>
 686 </div>
 687
 688 <p>Unlike the rest of the binary operators, our assignment operator doesn't
 689 follow the "emit LHS, emit RHS, do computation" model.  As such, it is handled
 690 as a special case before the other binary operators are handled.  The other
 691 strange thing is that it requires the LHS to be a variable.  It is invalid to
 692 have "(x+1) = expr" - only things like "x = expr" are allowed.
 693 </p>
 694
 695 <div class="doc_code">
 696 <pre>
 697     // Codegen the RHS.
 698     Value *Val = RHS-&gt;Codegen();
 699     if (Val == 0) return 0;
 700
 701     // Look up the name.
 702     Value *Variable = NamedValues[LHSE-&gt;getName()];
 703     if (Variable == 0) return ErrorV("Unknown variable name");
 704
 705     Builder.CreateStore(Val, Variable);
 706     return Val;
 707   }
 708   ...
 709 </pre>
 710 </div>
 711
 712 <p>Once we have the variable, codegen'ing the assignment is straightforward:
 713 we emit the RHS of the assignment, create a store, and return the computed
 714 value.  Returning a value allows for chained assignments like "X = (Y = Z)".</p>
 715
 716 <p>Now that we have an assignment operator, we can mutate loop variables and
 717 arguments.  For example, we can now run code like this:</p>
 718
 719 <div class="doc_code">
 720 <pre>
 721 # Function to print a double.
 722 extern printd(x);
 723
 724 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 725 # and just returns the RHS.
 726 def binary : 1 (x y) y;
 727
 728 def test(x)
 729   printd(x) :
 730   x = 4 :
 731   printd(x);
 732
 733 test(123);
 734 </pre>
 735 </div>
 736
 737 <p>When run, this example prints "123" and then "4", showing that we did
 738 actually mutate the value!  Okay, we have now officially implemented our goal:
 739 getting this to work requires SSA construction in the general case.  However,
 740 to be really useful, we want the ability to define our own local variables, lets
 741 add this next!
 742 </p>
 743
 744 </div>
 745
 746 <!-- *********************************************************************** -->
 747 <div class="doc_section"><a name="localvars">User-defined Local
 748 Variables</a></div>
 749 <!-- *********************************************************************** -->
 750
 751 <div class="doc_text">
 752
 753 <p>Adding var/in is just like any other other extensions we made to
 754 Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
 755 The first step for adding our new 'var/in' construct is to extend the lexer.
 756 As before, this is pretty trivial, the code looks like this:</p>
 757
 758 <div class="doc_code">
 759 <pre>
 760 enum Token {
 761   ...
 762   <b>// var definition
 763   tok_var = -13</b>
 764 ...
 765 }
 766 ...
 767 static int gettok() {
 768 ...
 769     if (IdentifierStr == "in") return tok_in;
 770     if (IdentifierStr == "binary") return tok_binary;
 771     if (IdentifierStr == "unary") return tok_unary;
 772     <b>if (IdentifierStr == "var") return tok_var;</b>
 773     return tok_identifier;
 774 ...
 775 </pre>
 776 </div>
 777
 778 <p>The next step is to define the AST node that we will construct.  For var/in,
 779 it looks like this:</p>
 780
 781 <div class="doc_code">
 782 <pre>
 783 /// VarExprAST - Expression class for var/in
 784 class VarExprAST : public ExprAST {
 785   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 786   ExprAST *Body;
 787 public:
 788   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
 789              ExprAST *body)
 790   : VarNames(varnames), Body(body) {}
 791
 792   virtual Value *Codegen();
 793 };
 794 </pre>
 795 </div>
 796
 797 <p>var/in allows a list of names to be defined all at once, and each name can
 798 optionally have an initializer value.  As such, we capture this information in
 799 the VarNames vector.  Also, var/in has a body, this body is allowed to access
 800 the variables defined by the var/in.</p>
 801
 802 <p>With this in place, we can define the parser pieces.  The first thing we do is add
 803 it as a primary expression:</p>
 804
 805 <div class="doc_code">
 806 <pre>
 807 /// primary
 808 ///   ::= identifierexpr
 809 ///   ::= numberexpr
 810 ///   ::= parenexpr
 811 ///   ::= ifexpr
 812 ///   ::= forexpr
 813 <b>///   ::= varexpr</b>
 814 static ExprAST *ParsePrimary() {
 815   switch (CurTok) {
 816   default: return Error("unknown token when expecting an expression");
 817   case tok_identifier: return ParseIdentifierExpr();
 818   case tok_number:     return ParseNumberExpr();
 819   case '(':            return ParseParenExpr();
 820   case tok_if:         return ParseIfExpr();
 821   case tok_for:        return ParseForExpr();
 822   <b>case tok_var:        return ParseVarExpr();</b>
 823   }
 824 }
 825 </pre>
 826 </div>
 827
 828 <p>Next we define ParseVarExpr:</p>
 829
 830 <div class="doc_code">
 831 <pre>
 832 /// varexpr ::= 'var' identifier ('=' expression)?
 833 //                    (',' identifier ('=' expression)?)* 'in' expression
 834 static ExprAST *ParseVarExpr() {
 835   getNextToken();  // eat the var.
 836
 837   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 838
 839   // At least one variable name is required.
 840   if (CurTok != tok_identifier)
 841     return Error("expected identifier after var");
 842 </pre>
 843 </div>
 844
 845 <p>The first part of this code parses the list of identifier/expr pairs into the
 846 local <tt>VarNames</tt> vector.
 847
 848 <div class="doc_code">
 849 <pre>
 850   while (1) {
 851     std::string Name = IdentifierStr;
 852     getNextToken();  // eat identifier.
 853
 854     // Read the optional initializer.
 855     ExprAST *Init = 0;
 856     if (CurTok == '=') {
 857       getNextToken(); // eat the '='.
 858
 859       Init = ParseExpression();
 860       if (Init == 0) return 0;
 861     }
 862
 863     VarNames.push_back(std::make_pair(Name, Init));
 864
 865     // End of var list, exit loop.
 866     if (CurTok != ',') break;
 867     getNextToken(); // eat the ','.
 868
 869     if (CurTok != tok_identifier)
 870       return Error("expected identifier list after var");
 871   }
 872 </pre>
 873 </div>
 874
 875 <p>Once all the variables are parsed, we then parse the body and create the
 876 AST node:</p>
 877
 878 <div class="doc_code">
 879 <pre>
 880   // At this point, we have to have 'in'.
 881   if (CurTok != tok_in)
 882     return Error("expected 'in' keyword after 'var'");
 883   getNextToken();  // eat 'in'.
 884
 885   ExprAST *Body = ParseExpression();
 886   if (Body == 0) return 0;
 887
 888   return new VarExprAST(VarNames, Body);
 889 }
 890 </pre>
 891 </div>
 892
 893 <p>Now that we can parse and represent the code, we need to support emission of
 894 LLVM IR for it.  This code starts out with:</p>
 895
 896 <div class="doc_code">
 897 <pre>
 898 Value *VarExprAST::Codegen() {
 899   std::vector&lt;AllocaInst *&gt; OldBindings;
 900
 901   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 902
 903   // Register all variables and emit their initializer.
 904   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
 905     const std::string &amp;VarName = VarNames[i].first;
 906     ExprAST *Init = VarNames[i].second;
 907 </pre>
 908 </div>
 909
 910 <p>Basically it loops over all the variables, installing them one at a time.
 911 For each variable we put into the symbol table, we remember the previous value
 912 that we replace in OldBindings.</p>
 913
 914 <div class="doc_code">
 915 <pre>
 916     // Emit the initializer before adding the variable to scope, this prevents
 917     // the initializer from referencing the variable itself, and permits stuff
 918     // like this:
 919     //  var a = 1 in
 920     //    var a = a in ...   # refers to outer 'a'.
 921     Value *InitVal;
 922     if (Init) {
 923       InitVal = Init-&gt;Codegen();
 924       if (InitVal == 0) return 0;
 925     } else { // If not specified, use 0.0.
 926       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
 927     }
 928
 929     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 930     Builder.CreateStore(InitVal, Alloca);
 931
 932     // Remember the old variable binding so that we can restore the binding when
 933     // we unrecurse.
 934     OldBindings.push_back(NamedValues[VarName]);
 935
 936     // Remember this binding.
 937     NamedValues[VarName] = Alloca;
 938   }
 939 </pre>
 940 </div>
 941
 942 <p>There are more comments here than code.  The basic idea is that we emit the
 943 initializer, create the alloca, then update the symbol table to point to it.
 944 Once all the variables are installed in the symbol table, we evaluate the body
 945 of the var/in expression:</p>
 946
 947 <div class="doc_code">
 948 <pre>
 949   // Codegen the body, now that all vars are in scope.
 950   Value *BodyVal = Body-&gt;Codegen();
 951   if (BodyVal == 0) return 0;
 952 </pre>
 953 </div>
 954
 955 <p>Finally, before returning, we restore the previous variable bindings:</p>
 956
 957 <div class="doc_code">
 958 <pre>
 959   // Pop all our variables from scope.
 960   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
 961     NamedValues[VarNames[i].first] = OldBindings[i];
 962
 963   // Return the body computation.
 964   return BodyVal;
 965 }
 966 </pre>
 967 </div>
 968
 969 <p>The end result of all of this is that we get properly scoped variable
 970 definitions, and we even (trivially) allow mutation of them :).</p>
 971
 972 <p>With this, we completed what we set out to do.  Our nice iterative fib
 973 example from the intro compiles and runs just fine.  The mem2reg pass optimizes
 974 all of our stack variables into SSA registers, inserting PHI nodes where needed,
 975 and our front-end remains simple: no "iterated dominance frontier" computation
 976 anywhere in sight.</p>
 977
 978 </div>
 979
 980 <!-- *********************************************************************** -->
 981 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 982 <!-- *********************************************************************** -->
 983
 984 <div class="doc_text">
 985
 986 <p>
 987 Here is the complete code listing for our running example, enhanced with mutable
 988 variables and var/in support.  To build this example, use:
 989 </p>
 990
 991 <div class="doc_code">
 992 <pre>
 993    # Compile
 994    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 995    # Run
 996    ./toy
 997 </pre>
 998 </div>
 999
1000 <p>Here is the code:</p>
1001
1002 <div class="doc_code">
1003 <pre>
1004 #include "llvm/DerivedTypes.h"
1005 #include "llvm/ExecutionEngine/ExecutionEngine.h"
1006 #include "llvm/LLVMContext.h"
1007 #include "llvm/Module.h"
1008 #include "llvm/ModuleProvider.h"
1009 #include "llvm/PassManager.h"
1010 #include "llvm/Analysis/Verifier.h"
1011 #include "llvm/Target/TargetData.h"
1012 #include "llvm/Transforms/Scalar.h"
1013 #include "llvm/Support/IRBuilder.h"
1014 #include &lt;cstdio&gt;
1015 #include &lt;string&gt;
1016 #include &lt;map&gt;
1017 #include &lt;vector&gt;
1018 using namespace llvm;
1019
1020 //===----------------------------------------------------------------------===//
1021 // Lexer
1022 //===----------------------------------------------------------------------===//
1023
1024 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1025 // of these for known things.
1026 enum Token {
1027   tok_eof = -1,
1028
1029   // commands
1030   tok_def = -2, tok_extern = -3,
1031
1032   // primary
1033   tok_identifier = -4, tok_number = -5,
1034
1035   // control
1036   tok_if = -6, tok_then = -7, tok_else = -8,
1037   tok_for = -9, tok_in = -10,
1038
1039   // operators
1040   tok_binary = -11, tok_unary = -12,
1041
1042   // var definition
1043   tok_var = -13
1044 };
1045
1046 static std::string IdentifierStr;  // Filled in if tok_identifier
1047 static double NumVal;              // Filled in if tok_number
1048
1049 /// gettok - Return the next token from standard input.
1050 static int gettok() {
1051   static int LastChar = ' ';
1052
1053   // Skip any whitespace.
1054   while (isspace(LastChar))
1055     LastChar = getchar();
1056
1057   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1058     IdentifierStr = LastChar;
1059     while (isalnum((LastChar = getchar())))
1060       IdentifierStr += LastChar;
1061
1062     if (IdentifierStr == "def") return tok_def;
1063     if (IdentifierStr == "extern") return tok_extern;
1064     if (IdentifierStr == "if") return tok_if;
1065     if (IdentifierStr == "then") return tok_then;
1066     if (IdentifierStr == "else") return tok_else;
1067     if (IdentifierStr == "for") return tok_for;
1068     if (IdentifierStr == "in") return tok_in;
1069     if (IdentifierStr == "binary") return tok_binary;
1070     if (IdentifierStr == "unary") return tok_unary;
1071     if (IdentifierStr == "var") return tok_var;
1072     return tok_identifier;
1073   }
1074
1075   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
1076     std::string NumStr;
1077     do {
1078       NumStr += LastChar;
1079       LastChar = getchar();
1080     } while (isdigit(LastChar) || LastChar == '.');
1081
1082     NumVal = strtod(NumStr.c_str(), 0);
1083     return tok_number;
1084   }
1085
1086   if (LastChar == '#') {
1087     // Comment until end of line.
1088     do LastChar = getchar();
1089     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
1090
1091     if (LastChar != EOF)
1092       return gettok();
1093   }
1094
1095   // Check for end of file.  Don't eat the EOF.
1096   if (LastChar == EOF)
1097     return tok_eof;
1098
1099   // Otherwise, just return the character as its ascii value.
1100   int ThisChar = LastChar;
1101   LastChar = getchar();
1102   return ThisChar;
1103 }
1104
1105 //===----------------------------------------------------------------------===//
1106 // Abstract Syntax Tree (aka Parse Tree)
1107 //===----------------------------------------------------------------------===//
1108
1109 /// ExprAST - Base class for all expression nodes.
1110 class ExprAST {
1111 public:
1112   virtual ~ExprAST() {}
1113   virtual Value *Codegen() = 0;
1114 };
1115
1116 /// NumberExprAST - Expression class for numeric literals like "1.0".
1117 class NumberExprAST : public ExprAST {
1118   double Val;
1119 public:
1120   NumberExprAST(double val) : Val(val) {}
1121   virtual Value *Codegen();
1122 };
1123
1124 /// VariableExprAST - Expression class for referencing a variable, like "a".
1125 class VariableExprAST : public ExprAST {
1126   std::string Name;
1127 public:
1128   VariableExprAST(const std::string &amp;name) : Name(name) {}
1129   const std::string &amp;getName() const { return Name; }
1130   virtual Value *Codegen();
1131 };
1132
1133 /// UnaryExprAST - Expression class for a unary operator.
1134 class UnaryExprAST : public ExprAST {
1135   char Opcode;
1136   ExprAST *Operand;
1137 public:
1138   UnaryExprAST(char opcode, ExprAST *operand)
1139     : Opcode(opcode), Operand(operand) {}
1140   virtual Value *Codegen();
1141 };
1142
1143 /// BinaryExprAST - Expression class for a binary operator.
1144 class BinaryExprAST : public ExprAST {
1145   char Op;
1146   ExprAST *LHS, *RHS;
1147 public:
1148   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1149     : Op(op), LHS(lhs), RHS(rhs) {}
1150   virtual Value *Codegen();
1151 };
1152
1153 /// CallExprAST - Expression class for function calls.
1154 class CallExprAST : public ExprAST {
1155   std::string Callee;
1156   std::vector&lt;ExprAST*&gt; Args;
1157 public:
1158   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1159     : Callee(callee), Args(args) {}
1160   virtual Value *Codegen();
1161 };
1162
1163 /// IfExprAST - Expression class for if/then/else.
1164 class IfExprAST : public ExprAST {
1165   ExprAST *Cond, *Then, *Else;
1166 public:
1167   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1168   : Cond(cond), Then(then), Else(_else) {}
1169   virtual Value *Codegen();
1170 };
1171
1172 /// ForExprAST - Expression class for for/in.
1173 class ForExprAST : public ExprAST {
1174   std::string VarName;
1175   ExprAST *Start, *End, *Step, *Body;
1176 public:
1177   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1178              ExprAST *step, ExprAST *body)
1179     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1180   virtual Value *Codegen();
1181 };
1182
1183 /// VarExprAST - Expression class for var/in
1184 class VarExprAST : public ExprAST {
1185   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1186   ExprAST *Body;
1187 public:
1188   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1189              ExprAST *body)
1190   : VarNames(varnames), Body(body) {}
1191
1192   virtual Value *Codegen();
1193 };
1194
1195 /// PrototypeAST - This class represents the "prototype" for a function,
1196 /// which captures its argument names as well as if it is an operator.
1197 class PrototypeAST {
1198   std::string Name;
1199   std::vector&lt;std::string&gt; Args;
1200   bool isOperator;
1201   unsigned Precedence;  // Precedence if a binary op.
1202 public:
1203   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1204                bool isoperator = false, unsigned prec = 0)
1205   : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1206
1207   bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1208   bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1209
1210   char getOperatorName() const {
1211     assert(isUnaryOp() || isBinaryOp());
1212     return Name[Name.size()-1];
1213   }
1214
1215   unsigned getBinaryPrecedence() const { return Precedence; }
1216
1217   Function *Codegen();
1218
1219   void CreateArgumentAllocas(Function *F);
1220 };
1221
1222 /// FunctionAST - This class represents a function definition itself.
1223 class FunctionAST {
1224   PrototypeAST *Proto;
1225   ExprAST *Body;
1226 public:
1227   FunctionAST(PrototypeAST *proto, ExprAST *body)
1228     : Proto(proto), Body(body) {}
1229
1230   Function *Codegen();
1231 };
1232
1233 //===----------------------------------------------------------------------===//
1234 // Parser
1235 //===----------------------------------------------------------------------===//
1236
1237 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1238 /// token the parser it looking at.  getNextToken reads another token from the
1239 /// lexer and updates CurTok with its results.
1240 static int CurTok;
1241 static int getNextToken() {
1242   return CurTok = gettok();
1243 }
1244
1245 /// BinopPrecedence - This holds the precedence for each binary operator that is
1246 /// defined.
1247 static std::map&lt;char, int&gt; BinopPrecedence;
1248
1249 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1250 static int GetTokPrecedence() {
1251   if (!isascii(CurTok))
1252     return -1;
1253
1254   // Make sure it's a declared binop.
1255   int TokPrec = BinopPrecedence[CurTok];
1256   if (TokPrec &lt;= 0) return -1;
1257   return TokPrec;
1258 }
1259
1260 /// Error* - These are little helper functions for error handling.
1261 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1262 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1263 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1264
1265 static ExprAST *ParseExpression();
1266
1267 /// identifierexpr
1268 ///   ::= identifier
1269 ///   ::= identifier '(' expression* ')'
1270 static ExprAST *ParseIdentifierExpr() {
1271   std::string IdName = IdentifierStr;
1272
1273   getNextToken();  // eat identifier.
1274
1275   if (CurTok != '(') // Simple variable ref.
1276     return new VariableExprAST(IdName);
1277
1278   // Call.
1279   getNextToken();  // eat (
1280   std::vector&lt;ExprAST*&gt; Args;
1281   if (CurTok != ')') {
1282     while (1) {
1283       ExprAST *Arg = ParseExpression();
1284       if (!Arg) return 0;
1285       Args.push_back(Arg);
1286
1287       if (CurTok == ')') break;
1288
1289       if (CurTok != ',')
1290         return Error("Expected ')' or ',' in argument list");
1291       getNextToken();
1292     }
1293   }
1294
1295   // Eat the ')'.
1296   getNextToken();
1297
1298   return new CallExprAST(IdName, Args);
1299 }
1300
1301 /// numberexpr ::= number
1302 static ExprAST *ParseNumberExpr() {
1303   ExprAST *Result = new NumberExprAST(NumVal);
1304   getNextToken(); // consume the number
1305   return Result;
1306 }
1307
1308 /// parenexpr ::= '(' expression ')'
1309 static ExprAST *ParseParenExpr() {
1310   getNextToken();  // eat (.
1311   ExprAST *V = ParseExpression();
1312   if (!V) return 0;
1313
1314   if (CurTok != ')')
1315     return Error("expected ')'");
1316   getNextToken();  // eat ).
1317   return V;
1318 }
1319
1320 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1321 static ExprAST *ParseIfExpr() {
1322   getNextToken();  // eat the if.
1323
1324   // condition.
1325   ExprAST *Cond = ParseExpression();
1326   if (!Cond) return 0;
1327
1328   if (CurTok != tok_then)
1329     return Error("expected then");
1330   getNextToken();  // eat the then
1331
1332   ExprAST *Then = ParseExpression();
1333   if (Then == 0) return 0;
1334
1335   if (CurTok != tok_else)
1336     return Error("expected else");
1337
1338   getNextToken();
1339
1340   ExprAST *Else = ParseExpression();
1341   if (!Else) return 0;
1342
1343   return new IfExprAST(Cond, Then, Else);
1344 }
1345
1346 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1347 static ExprAST *ParseForExpr() {
1348   getNextToken();  // eat the for.
1349
1350   if (CurTok != tok_identifier)
1351     return Error("expected identifier after for");
1352
1353   std::string IdName = IdentifierStr;
1354   getNextToken();  // eat identifier.
1355
1356   if (CurTok != '=')
1357     return Error("expected '=' after for");
1358   getNextToken();  // eat '='.
1359
1360
1361   ExprAST *Start = ParseExpression();
1362   if (Start == 0) return 0;
1363   if (CurTok != ',')
1364     return Error("expected ',' after for start value");
1365   getNextToken();
1366
1367   ExprAST *End = ParseExpression();
1368   if (End == 0) return 0;
1369
1370   // The step value is optional.
1371   ExprAST *Step = 0;
1372   if (CurTok == ',') {
1373     getNextToken();
1374     Step = ParseExpression();
1375     if (Step == 0) return 0;
1376   }
1377
1378   if (CurTok != tok_in)
1379     return Error("expected 'in' after for");
1380   getNextToken();  // eat 'in'.
1381
1382   ExprAST *Body = ParseExpression();
1383   if (Body == 0) return 0;
1384
1385   return new ForExprAST(IdName, Start, End, Step, Body);
1386 }
1387
1388 /// varexpr ::= 'var' identifier ('=' expression)?
1389 //                    (',' identifier ('=' expression)?)* 'in' expression
1390 static ExprAST *ParseVarExpr() {
1391   getNextToken();  // eat the var.
1392
1393   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1394
1395   // At least one variable name is required.
1396   if (CurTok != tok_identifier)
1397     return Error("expected identifier after var");
1398
1399   while (1) {
1400     std::string Name = IdentifierStr;
1401     getNextToken();  // eat identifier.
1402
1403     // Read the optional initializer.
1404     ExprAST *Init = 0;
1405     if (CurTok == '=') {
1406       getNextToken(); // eat the '='.
1407
1408       Init = ParseExpression();
1409       if (Init == 0) return 0;
1410     }
1411
1412     VarNames.push_back(std::make_pair(Name, Init));
1413
1414     // End of var list, exit loop.
1415     if (CurTok != ',') break;
1416     getNextToken(); // eat the ','.
1417
1418     if (CurTok != tok_identifier)
1419       return Error("expected identifier list after var");
1420   }
1421
1422   // At this point, we have to have 'in'.
1423   if (CurTok != tok_in)
1424     return Error("expected 'in' keyword after 'var'");
1425   getNextToken();  // eat 'in'.
1426
1427   ExprAST *Body = ParseExpression();
1428   if (Body == 0) return 0;
1429
1430   return new VarExprAST(VarNames, Body);
1431 }
1432
1433
1434 /// primary
1435 ///   ::= identifierexpr
1436 ///   ::= numberexpr
1437 ///   ::= parenexpr
1438 ///   ::= ifexpr
1439 ///   ::= forexpr
1440 ///   ::= varexpr
1441 static ExprAST *ParsePrimary() {
1442   switch (CurTok) {
1443   default: return Error("unknown token when expecting an expression");
1444   case tok_identifier: return ParseIdentifierExpr();
1445   case tok_number:     return ParseNumberExpr();
1446   case '(':            return ParseParenExpr();
1447   case tok_if:         return ParseIfExpr();
1448   case tok_for:        return ParseForExpr();
1449   case tok_var:        return ParseVarExpr();
1450   }
1451 }
1452
1453 /// unary
1454 ///   ::= primary
1455 ///   ::= '!' unary
1456 static ExprAST *ParseUnary() {
1457   // If the current token is not an operator, it must be a primary expr.
1458   if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1459     return ParsePrimary();
1460
1461   // If this is a unary operator, read it.
1462   int Opc = CurTok;
1463   getNextToken();
1464   if (ExprAST *Operand = ParseUnary())
1465     return new UnaryExprAST(Opc, Operand);
1466   return 0;
1467 }
1468
1469 /// binoprhs
1470 ///   ::= ('+' unary)*
1471 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1472   // If this is a binop, find its precedence.
1473   while (1) {
1474     int TokPrec = GetTokPrecedence();
1475
1476     // If this is a binop that binds at least as tightly as the current binop,
1477     // consume it, otherwise we are done.
1478     if (TokPrec &lt; ExprPrec)
1479       return LHS;
1480
1481     // Okay, we know this is a binop.
1482     int BinOp = CurTok;
1483     getNextToken();  // eat binop
1484
1485     // Parse the unary expression after the binary operator.
1486     ExprAST *RHS = ParseUnary();
1487     if (!RHS) return 0;
1488
1489     // If BinOp binds less tightly with RHS than the operator after RHS, let
1490     // the pending operator take RHS as its LHS.
1491     int NextPrec = GetTokPrecedence();
1492     if (TokPrec &lt; NextPrec) {
1493       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1494       if (RHS == 0) return 0;
1495     }
1496
1497     // Merge LHS/RHS.
1498     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1499   }
1500 }
1501
1502 /// expression
1503 ///   ::= unary binoprhs
1504 ///
1505 static ExprAST *ParseExpression() {
1506   ExprAST *LHS = ParseUnary();
1507   if (!LHS) return 0;
1508
1509   return ParseBinOpRHS(0, LHS);
1510 }
1511
1512 /// prototype
1513 ///   ::= id '(' id* ')'
1514 ///   ::= binary LETTER number? (id, id)
1515 ///   ::= unary LETTER (id)
1516 static PrototypeAST *ParsePrototype() {
1517   std::string FnName;
1518
1519   int Kind = 0;  // 0 = identifier, 1 = unary, 2 = binary.
1520   unsigned BinaryPrecedence = 30;
1521
1522   switch (CurTok) {
1523   default:
1524     return ErrorP("Expected function name in prototype");
1525   case tok_identifier:
1526     FnName = IdentifierStr;
1527     Kind = 0;
1528     getNextToken();
1529     break;
1530   case tok_unary:
1531     getNextToken();
1532     if (!isascii(CurTok))
1533       return ErrorP("Expected unary operator");
1534     FnName = "unary";
1535     FnName += (char)CurTok;
1536     Kind = 1;
1537     getNextToken();
1538     break;
1539   case tok_binary:
1540     getNextToken();
1541     if (!isascii(CurTok))
1542       return ErrorP("Expected binary operator");
1543     FnName = "binary";
1544     FnName += (char)CurTok;
1545     Kind = 2;
1546     getNextToken();
1547
1548     // Read the precedence if present.
1549     if (CurTok == tok_number) {
1550       if (NumVal &lt; 1 || NumVal &gt; 100)
1551         return ErrorP("Invalid precedecnce: must be 1..100");
1552       BinaryPrecedence = (unsigned)NumVal;
1553       getNextToken();
1554     }
1555     break;
1556   }
1557
1558   if (CurTok != '(')
1559     return ErrorP("Expected '(' in prototype");
1560
1561   std::vector&lt;std::string&gt; ArgNames;
1562   while (getNextToken() == tok_identifier)
1563     ArgNames.push_back(IdentifierStr);
1564   if (CurTok != ')')
1565     return ErrorP("Expected ')' in prototype");
1566
1567   // success.
1568   getNextToken();  // eat ')'.
1569
1570   // Verify right number of names for operator.
1571   if (Kind &amp;&amp; ArgNames.size() != Kind)
1572     return ErrorP("Invalid number of operands for operator");
1573
1574   return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1575 }
1576
1577 /// definition ::= 'def' prototype expression
1578 static FunctionAST *ParseDefinition() {
1579   getNextToken();  // eat def.
1580   PrototypeAST *Proto = ParsePrototype();
1581   if (Proto == 0) return 0;
1582
1583   if (ExprAST *E = ParseExpression())
1584     return new FunctionAST(Proto, E);
1585   return 0;
1586 }
1587
1588 /// toplevelexpr ::= expression
1589 static FunctionAST *ParseTopLevelExpr() {
1590   if (ExprAST *E = ParseExpression()) {
1591     // Make an anonymous proto.
1592     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1593     return new FunctionAST(Proto, E);
1594   }
1595   return 0;
1596 }
1597
1598 /// external ::= 'extern' prototype
1599 static PrototypeAST *ParseExtern() {
1600   getNextToken();  // eat extern.
1601   return ParsePrototype();
1602 }
1603
1604 //===----------------------------------------------------------------------===//
1605 // Code Generation
1606 //===----------------------------------------------------------------------===//
1607
1608 static Module *TheModule;
1609 static IRBuilder&lt;&gt; Builder(getGlobalContext());
1610 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1611 static FunctionPassManager *TheFPM;
1612
1613 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1614
1615 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1616 /// the function.  This is used for mutable variables etc.
1617 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1618                                           const std::string &amp;VarName) {
1619   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1620                  TheFunction-&gt;getEntryBlock().begin());
1621   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0, VarName.c_str());
1622 }
1623
1624
1625 Value *NumberExprAST::Codegen() {
1626   return ConstantFP::get(getGlobalContext(), APFloat(Val));
1627 }
1628
1629 Value *VariableExprAST::Codegen() {
1630   // Look this variable up in the function.
1631   Value *V = NamedValues[Name];
1632   if (V == 0) return ErrorV("Unknown variable name");
1633
1634   // Load the value.
1635   return Builder.CreateLoad(V, Name.c_str());
1636 }
1637
1638 Value *UnaryExprAST::Codegen() {
1639   Value *OperandV = Operand-&gt;Codegen();
1640   if (OperandV == 0) return 0;
1641
1642   Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1643   if (F == 0)
1644     return ErrorV("Unknown unary operator");
1645
1646   return Builder.CreateCall(F, OperandV, "unop");
1647 }
1648
1649
1650 Value *BinaryExprAST::Codegen() {
1651   // Special case '=' because we don't want to emit the LHS as an expression.
1652   if (Op == '=') {
1653     // Assignment requires the LHS to be an identifier.
1654     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1655     if (!LHSE)
1656       return ErrorV("destination of '=' must be a variable");
1657     // Codegen the RHS.
1658     Value *Val = RHS-&gt;Codegen();
1659     if (Val == 0) return 0;
1660
1661     // Look up the name.
1662     Value *Variable = NamedValues[LHSE-&gt;getName()];
1663     if (Variable == 0) return ErrorV("Unknown variable name");
1664
1665     Builder.CreateStore(Val, Variable);
1666     return Val;
1667   }
1668
1669
1670   Value *L = LHS-&gt;Codegen();
1671   Value *R = RHS-&gt;Codegen();
1672   if (L == 0 || R == 0) return 0;
1673
1674   switch (Op) {
1675   case '+': return Builder.CreateAdd(L, R, "addtmp");
1676   case '-': return Builder.CreateSub(L, R, "subtmp");
1677   case '*': return Builder.CreateMul(L, R, "multmp");
1678   case '&lt;':
1679     L = Builder.CreateFCmpULT(L, R, "cmptmp");
1680     // Convert bool 0/1 to double 0.0 or 1.0
1681     return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()), "booltmp");
1682   default: break;
1683   }
1684
1685   // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1686   // a call to it.
1687   Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1688   assert(F &amp;&amp; "binary operator not found!");
1689
1690   Value *Ops[] = { L, R };
1691   return Builder.CreateCall(F, Ops, Ops+2, "binop");
1692 }
1693
1694 Value *CallExprAST::Codegen() {
1695   // Look up the name in the global module table.
1696   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1697   if (CalleeF == 0)
1698     return ErrorV("Unknown function referenced");
1699
1700   // If argument mismatch error.
1701   if (CalleeF-&gt;arg_size() != Args.size())
1702     return ErrorV("Incorrect # arguments passed");
1703
1704   std::vector&lt;Value*&gt; ArgsV;
1705   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1706     ArgsV.push_back(Args[i]-&gt;Codegen());
1707     if (ArgsV.back() == 0) return 0;
1708   }
1709
1710   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1711 }
1712
1713 Value *IfExprAST::Codegen() {
1714   Value *CondV = Cond-&gt;Codegen();
1715   if (CondV == 0) return 0;
1716
1717   // Convert condition to a bool by comparing equal to 0.0.
1718   CondV = Builder.CreateFCmpONE(CondV,
1719                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1720                                 "ifcond");
1721
1722   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1723
1724   // Create blocks for the then and else cases.  Insert the 'then' block at the
1725   // end of the function.
1726   BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1727   BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1728   BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
1729
1730   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1731
1732   // Emit then value.
1733   Builder.SetInsertPoint(ThenBB);
1734
1735   Value *ThenV = Then-&gt;Codegen();
1736   if (ThenV == 0) return 0;
1737
1738   Builder.CreateBr(MergeBB);
1739   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1740   ThenBB = Builder.GetInsertBlock();
1741
1742   // Emit else block.
1743   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1744   Builder.SetInsertPoint(ElseBB);
1745
1746   Value *ElseV = Else-&gt;Codegen();
1747   if (ElseV == 0) return 0;
1748
1749   Builder.CreateBr(MergeBB);
1750   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1751   ElseBB = Builder.GetInsertBlock();
1752
1753   // Emit merge block.
1754   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1755   Builder.SetInsertPoint(MergeBB);
1756   PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()), "iftmp");
1757
1758   PN-&gt;addIncoming(ThenV, ThenBB);
1759   PN-&gt;addIncoming(ElseV, ElseBB);
1760   return PN;
1761 }
1762
1763 Value *ForExprAST::Codegen() {
1764   // Output this as:
1765   //   var = alloca double
1766   //   ...
1767   //   start = startexpr
1768   //   store start -&gt; var
1769   //   goto loop
1770   // loop:
1771   //   ...
1772   //   bodyexpr
1773   //   ...
1774   // loopend:
1775   //   step = stepexpr
1776   //   endcond = endexpr
1777   //
1778   //   curvar = load var
1779   //   nextvar = curvar + step
1780   //   store nextvar -&gt; var
1781   //   br endcond, loop, endloop
1782   // outloop:
1783
1784   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1785
1786   // Create an alloca for the variable in the entry block.
1787   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1788
1789   // Emit the start code first, without 'variable' in scope.
1790   Value *StartVal = Start-&gt;Codegen();
1791   if (StartVal == 0) return 0;
1792
1793   // Store the value into the alloca.
1794   Builder.CreateStore(StartVal, Alloca);
1795
1796   // Make the new basic block for the loop header, inserting after current
1797   // block.
1798   BasicBlock *PreheaderBB = Builder.GetInsertBlock();
1799   BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1800
1801   // Insert an explicit fall through from the current block to the LoopBB.
1802   Builder.CreateBr(LoopBB);
1803
1804   // Start insertion in LoopBB.
1805   Builder.SetInsertPoint(LoopBB);
1806
1807   // Within the loop, the variable is defined equal to the PHI node.  If it
1808   // shadows an existing variable, we have to restore it, so save it now.
1809   AllocaInst *OldVal = NamedValues[VarName];
1810   NamedValues[VarName] = Alloca;
1811
1812   // Emit the body of the loop.  This, like any other expr, can change the
1813   // current BB.  Note that we ignore the value computed by the body, but don't
1814   // allow an error.
1815   if (Body-&gt;Codegen() == 0)
1816     return 0;
1817
1818   // Emit the step value.
1819   Value *StepVal;
1820   if (Step) {
1821     StepVal = Step-&gt;Codegen();
1822     if (StepVal == 0) return 0;
1823   } else {
1824     // If not specified, use 1.0.
1825     StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1826   }
1827
1828   // Compute the end condition.
1829   Value *EndCond = End-&gt;Codegen();
1830   if (EndCond == 0) return EndCond;
1831
1832   // Reload, increment, and restore the alloca.  This handles the case where
1833   // the body of the loop mutates the variable.
1834   Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1835   Value *NextVar = Builder.CreateAdd(CurVar, StepVal, "nextvar");
1836   Builder.CreateStore(NextVar, Alloca);
1837
1838   // Convert condition to a bool by comparing equal to 0.0.
1839   EndCond = Builder.CreateFCmpONE(EndCond,
1840                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1841                                   "loopcond");
1842
1843   // Create the "after loop" block and insert it.
1844   BasicBlock *LoopEndBB = Builder.GetInsertBlock();
1845   BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1846
1847   // Insert the conditional branch into the end of LoopEndBB.
1848   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1849
1850   // Any new code will be inserted in AfterBB.
1851   Builder.SetInsertPoint(AfterBB);
1852
1853   // Restore the unshadowed variable.
1854   if (OldVal)
1855     NamedValues[VarName] = OldVal;
1856   else
1857     NamedValues.erase(VarName);
1858
1859
1860   // for expr always returns 0.0.
1861   return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1862 }
1863
1864 Value *VarExprAST::Codegen() {
1865   std::vector&lt;AllocaInst *&gt; OldBindings;
1866
1867   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1868
1869   // Register all variables and emit their initializer.
1870   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1871     const std::string &amp;VarName = VarNames[i].first;
1872     ExprAST *Init = VarNames[i].second;
1873
1874     // Emit the initializer before adding the variable to scope, this prevents
1875     // the initializer from referencing the variable itself, and permits stuff
1876     // like this:
1877     //  var a = 1 in
1878     //    var a = a in ...   # refers to outer 'a'.
1879     Value *InitVal;
1880     if (Init) {
1881       InitVal = Init-&gt;Codegen();
1882       if (InitVal == 0) return 0;
1883     } else { // If not specified, use 0.0.
1884       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
1885     }
1886
1887     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1888     Builder.CreateStore(InitVal, Alloca);
1889
1890     // Remember the old variable binding so that we can restore the binding when
1891     // we unrecurse.
1892     OldBindings.push_back(NamedValues[VarName]);
1893
1894     // Remember this binding.
1895     NamedValues[VarName] = Alloca;
1896   }
1897
1898   // Codegen the body, now that all vars are in scope.
1899   Value *BodyVal = Body-&gt;Codegen();
1900   if (BodyVal == 0) return 0;
1901
1902   // Pop all our variables from scope.
1903   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1904     NamedValues[VarNames[i].first] = OldBindings[i];
1905
1906   // Return the body computation.
1907   return BodyVal;
1908 }
1909
1910
1911 Function *PrototypeAST::Codegen() {
1912   // Make the function type:  double(double,double) etc.
1913   std::vector&lt;const Type*&gt; Doubles(Args.size(), Type::getDoubleTy(getGlobalContext()));
1914   FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()), Doubles, false);
1915
1916   Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1917
1918   // If F conflicted, there was already something named 'Name'.  If it has a
1919   // body, don't allow redefinition or reextern.
1920   if (F-&gt;getName() != Name) {
1921     // Delete the one we just made and get the existing one.
1922     F-&gt;eraseFromParent();
1923     F = TheModule-&gt;getFunction(Name);
1924
1925     // If F already has a body, reject this.
1926     if (!F-&gt;empty()) {
1927       ErrorF("redefinition of function");
1928       return 0;
1929     }
1930
1931     // If F took a different number of args, reject.
1932     if (F-&gt;arg_size() != Args.size()) {
1933       ErrorF("redefinition of function with different # args");
1934       return 0;
1935     }
1936   }
1937
1938   // Set names for all arguments.
1939   unsigned Idx = 0;
1940   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1941        ++AI, ++Idx)
1942     AI-&gt;setName(Args[Idx]);
1943
1944   return F;
1945 }
1946
1947 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1948 /// argument in the symbol table so that references to it will succeed.
1949 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1950   Function::arg_iterator AI = F-&gt;arg_begin();
1951   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1952     // Create an alloca for this variable.
1953     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1954
1955     // Store the initial value into the alloca.
1956     Builder.CreateStore(AI, Alloca);
1957
1958     // Add arguments to variable symbol table.
1959     NamedValues[Args[Idx]] = Alloca;
1960   }
1961 }
1962
1963
1964 Function *FunctionAST::Codegen() {
1965   NamedValues.clear();
1966
1967   Function *TheFunction = Proto-&gt;Codegen();
1968   if (TheFunction == 0)
1969     return 0;
1970
1971   // If this is an operator, install it.
1972   if (Proto-&gt;isBinaryOp())
1973     BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1974
1975   // Create a new basic block to start insertion into.
1976   BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1977   Builder.SetInsertPoint(BB);
1978
1979   // Add all arguments to the symbol table and create their allocas.
1980   Proto-&gt;CreateArgumentAllocas(TheFunction);
1981
1982   if (Value *RetVal = Body-&gt;Codegen()) {
1983     // Finish off the function.
1984     Builder.CreateRet(RetVal);
1985
1986     // Validate the generated code, checking for consistency.
1987     verifyFunction(*TheFunction);
1988
1989     // Optimize the function.
1990     TheFPM-&gt;run(*TheFunction);
1991
1992     return TheFunction;
1993   }
1994
1995   // Error reading body, remove function.
1996   TheFunction-&gt;eraseFromParent();
1997
1998   if (Proto-&gt;isBinaryOp())
1999     BinopPrecedence.erase(Proto-&gt;getOperatorName());
2000   return 0;
2001 }
2002
2003 //===----------------------------------------------------------------------===//
2004 // Top-Level parsing and JIT Driver
2005 //===----------------------------------------------------------------------===//
2006
2007 static ExecutionEngine *TheExecutionEngine;
2008
2009 static void HandleDefinition() {
2010   if (FunctionAST *F = ParseDefinition()) {
2011     if (Function *LF = F-&gt;Codegen()) {
2012       fprintf(stderr, "Read function definition:");
2013       LF-&gt;dump();
2014     }
2015   } else {
2016     // Skip token for error recovery.
2017     getNextToken();
2018   }
2019 }
2020
2021 static void HandleExtern() {
2022   if (PrototypeAST *P = ParseExtern()) {
2023     if (Function *F = P-&gt;Codegen()) {
2024       fprintf(stderr, "Read extern: ");
2025       F-&gt;dump();
2026     }
2027   } else {
2028     // Skip token for error recovery.
2029     getNextToken();
2030   }
2031 }
2032
2033 static void HandleTopLevelExpression() {
2034   // Evaluate a top level expression into an anonymous function.
2035   if (FunctionAST *F = ParseTopLevelExpr()) {
2036     if (Function *LF = F-&gt;Codegen()) {
2037       // JIT the function, returning a function pointer.
2038       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2039
2040       // Cast it to the right type (takes no arguments, returns a double) so we
2041       // can call it as a native function.
2042       double (*FP)() = (double (*)())FPtr;
2043       fprintf(stderr, "Evaluated to %f\n", FP());
2044     }
2045   } else {
2046     // Skip token for error recovery.
2047     getNextToken();
2048   }
2049 }
2050
2051 /// top ::= definition | external | expression | ';'
2052 static void MainLoop() {
2053   while (1) {
2054     fprintf(stderr, "ready&gt; ");
2055     switch (CurTok) {
2056     case tok_eof:    return;
2057     case ';':        getNextToken(); break;  // ignore top level semicolons.
2058     case tok_def:    HandleDefinition(); break;
2059     case tok_extern: HandleExtern(); break;
2060     default:         HandleTopLevelExpression(); break;
2061     }
2062   }
2063 }
2064
2065
2066
2067 //===----------------------------------------------------------------------===//
2068 // "Library" functions that can be "extern'd" from user code.
2069 //===----------------------------------------------------------------------===//
2070
2071 /// putchard - putchar that takes a double and returns 0.
2072 extern "C"
2073 double putchard(double X) {
2074   putchar((char)X);
2075   return 0;
2076 }
2077
2078 /// printd - printf that takes a double prints it as "%f\n", returning 0.
2079 extern "C"
2080 double printd(double X) {
2081   printf("%f\n", X);
2082   return 0;
2083 }
2084
2085 //===----------------------------------------------------------------------===//
2086 // Main driver code.
2087 //===----------------------------------------------------------------------===//
2088
2089 int main() {
2090   // Install standard binary operators.
2091   // 1 is lowest precedence.
2092   BinopPrecedence['='] = 2;
2093   BinopPrecedence['&lt;'] = 10;
2094   BinopPrecedence['+'] = 20;
2095   BinopPrecedence['-'] = 20;
2096   BinopPrecedence['*'] = 40;  // highest.
2097
2098   // Prime the first token.
2099   fprintf(stderr, "ready&gt; ");
2100   getNextToken();
2101
2102   // Make the module, which holds all the code.
2103   TheModule = new Module("my cool jit", getGlobalContext());
2104
2105   ExistingModuleProvider *OurModuleProvider =
2106       new ExistingModuleProvider(TheModule);
2107
2108   // Create the JIT.  This takes ownership of the module and module provider.
2109   TheExecutionEngine = EngineBuilder(OurModuleProvider).create();
2110
2111   FunctionPassManager OurFPM(OurModuleProvider);
2112
2113   // Set up the optimizer pipeline.  Start with registering info about how the
2114   // target lays out data structures.
2115   OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
2116   // Do simple "peephole" optimizations and bit-twiddling optzns.
2117   OurFPM.add(createInstructionCombiningPass());
2118   // Reassociate expressions.
2119   OurFPM.add(createReassociatePass());
2120   // Eliminate Common SubExpressions.
2121   OurFPM.add(createGVNPass());
2122   // Simplify the control flow graph (deleting unreachable blocks, etc).
2123   OurFPM.add(createCFGSimplificationPass());
2124
2125   // Set the global so the code gen can use this.
2126   TheFPM = &amp;OurFPM;
2127
2128   // Run the main "interpreter loop" now.
2129   MainLoop();
2130
2131   TheFPM = 0;
2132
2133   // Print out all of the generated code.
2134   TheModule-&gt;dump();
2135
2136   return 0;
2137 }
2138 </pre>
2139 </div>
2140
2141 <a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
2142 </div>
2143
2144 <!-- *********************************************************************** -->
2145 <hr>
2146 <address>
2147   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2148   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2149   <a href="http://validator.w3.org/check/referer"><img
2150   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2151
2152   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2153   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2154   Last modified: $Date: 2007-10-17 11:05:13 -0700 (Wed, 17 Oct 2007) $
2155 </address>
2156 </body>
2157 </html>