docs/tutorial/LangImpl7.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   2                       "http://www.w3.org/TR/html4/strict.dtd">
   3
   4 <html>
   5 <head>
   6   <title>Kaleidoscope: Extending the Language: Mutable Variables / SSA
   7          construction</title>
   8   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   9   <meta name="author" content="Chris Lattner">
  10   <link rel="stylesheet" href="../llvm.css" type="text/css">
  11 </head>
  12
  13 <body>
  14
  15 <div class="doc_title">Kaleidoscope: Extending the Language: Mutable Variables</div>
  16
  17 <ul>
  18 <li><a href="index.html">Up to Tutorial Index</a></li>
  19 <li>Chapter 7
  20   <ol>
  21     <li><a href="#intro">Chapter 7 Introduction</a></li>
  22     <li><a href="#why">Why is this a hard problem?</a></li>
  23     <li><a href="#memory">Memory in LLVM</a></li>
  24     <li><a href="#kalvars">Mutable Variables in Kaleidoscope</a></li>
  25     <li><a href="#adjustments">Adjusting Existing Variables for
  26      Mutation</a></li>
  27     <li><a href="#assignment">New Assignment Operator</a></li>
  28     <li><a href="#localvars">User-defined Local Variables</a></li>
  29     <li><a href="#code">Full Code Listing</a></li>
  30   </ol>
  31 </li>
  32 <li><a href="LangImpl8.html">Chapter 8</a>: Conclusion and other useful LLVM
  33  tidbits</li>
  34 </ul>
  35
  36 <div class="doc_author">
  37   <p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a></p>
  38 </div>
  39
  40 <!-- *********************************************************************** -->
  41 <div class="doc_section"><a name="intro">Chapter 7 Introduction</a></div>
  42 <!-- *********************************************************************** -->
  43
  44 <div class="doc_text">
  45
  46 <p>Welcome to Chapter 7 of the "<a href="index.html">Implementing a language
  47 with LLVM</a>" tutorial.  In chapters 1 through 6, we've built a very
  48 respectable, albeit simple, <a
  49 href="http://en.wikipedia.org/wiki/Functional_programming">functional
  50 programming language</a>.  In our journey, we learned some parsing techniques,
  51 how to build and represent an AST, how to build LLVM IR, and how to optimize
  52 the resultant code as well as JIT compile it.</p>
  53
  54 <p>While Kaleidoscope is interesting as a functional language, the fact that it
  55 is functional makes it "too easy" to generate LLVM IR for it.  In particular, a
  56 functional language makes it very easy to build LLVM IR directly in <a
  57 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
  58 Since LLVM requires that the input code be in SSA form, this is a very nice
  59 property and it is often unclear to newcomers how to generate code for an
  60 imperative language with mutable variables.</p>
  61
  62 <p>The short (and happy) summary of this chapter is that there is no need for
  63 your front-end to build SSA form: LLVM provides highly tuned and well tested
  64 support for this, though the way it works is a bit unexpected for some.</p>
  65
  66 </div>
  67
  68 <!-- *********************************************************************** -->
  69 <div class="doc_section"><a name="why">Why is this a hard problem?</a></div>
  70 <!-- *********************************************************************** -->
  71
  72 <div class="doc_text">
  73
  74 <p>
  75 To understand why mutable variables cause complexities in SSA construction,
  76 consider this extremely simple C example:
  77 </p>
  78
  79 <div class="doc_code">
  80 <pre>
  81 int G, H;
  82 int test(_Bool Condition) {
  83   int X;
  84   if (Condition)
  85     X = G;
  86   else
  87     X = H;
  88   return X;
  89 }
  90 </pre>
  91 </div>
  92
  93 <p>In this case, we have the variable "X", whose value depends on the path
  94 executed in the program.  Because there are two different possible values for X
  95 before the return instruction, a PHI node is inserted to merge the two values.
  96 The LLVM IR that we want for this example looks like this:</p>
  97
  98 <div class="doc_code">
  99 <pre>
 100 @G = weak global i32 0   ; type of @G is i32*
 101 @H = weak global i32 0   ; type of @H is i32*
 102
 103 define i32 @test(i1 %Condition) {
 104 entry:
 105         br i1 %Condition, label %cond_true, label %cond_false
 106
 107 cond_true:
 108         %X.0 = load i32* @G
 109         br label %cond_next
 110
 111 cond_false:
 112         %X.1 = load i32* @H
 113         br label %cond_next
 114
 115 cond_next:
 116         %X.2 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 117         ret i32 %X.2
 118 }
 119 </pre>
 120 </div>
 121
 122 <p>In this example, the loads from the G and H global variables are explicit in
 123 the LLVM IR, and they live in the then/else branches of the if statement
 124 (cond_true/cond_false).  In order to merge the incoming values, the X.2 phi node
 125 in the cond_next block selects the right value to use based on where control
 126 flow is coming from: if control flow comes from the cond_false block, X.2 gets
 127 the value of X.1.  Alternatively, if control flow comes from cond_true, it gets
 128 the value of X.0.  The intent of this chapter is not to explain the details of
 129 SSA form.  For more information, see one of the many <a
 130 href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
 131 references</a>.</p>
 132
 133 <p>The question for this article is "who places the phi nodes when lowering
 134 assignments to mutable variables?".  The issue here is that LLVM
 135 <em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
 136 However, SSA construction requires non-trivial algorithms and data structures,
 137 so it is inconvenient and wasteful for every front-end to have to reproduce this
 138 logic.</p>
 139
 140 </div>
 141
 142 <!-- *********************************************************************** -->
 143 <div class="doc_section"><a name="memory">Memory in LLVM</a></div>
 144 <!-- *********************************************************************** -->
 145
 146 <div class="doc_text">
 147
 148 <p>The 'trick' here is that while LLVM does require all register values to be
 149 in SSA form, it does not require (or permit) memory objects to be in SSA form.
 150 In the example above, note that the loads from G and H are direct accesses to
 151 G and H: they are not renamed or versioned.  This differs from some other
 152 compiler systems, which do try to version memory objects.  In LLVM, instead of
 153 encoding dataflow analysis of memory into the LLVM IR, it is handled with <a
 154 href="../WritingAnLLVMPass.html">Analysis Passes</a> which are computed on
 155 demand.</p>
 156
 157 <p>
 158 With this in mind, the high-level idea is that we want to make a stack variable
 159 (which lives in memory, because it is on the stack) for each mutable object in
 160 a function.  To take advantage of this trick, we need to talk about how LLVM
 161 represents stack variables.
 162 </p>
 163
 164 <p>In LLVM, all memory accesses are explicit with load/store instructions, and
 165 it is carefully designed not to have (or need) an "address-of" operator.  Notice
 166 how the type of the @G/@H global variables is actually "i32*" even though the
 167 variable is defined as "i32".  What this means is that @G defines <em>space</em>
 168 for an i32 in the global data area, but its <em>name</em> actually refers to the
 169 address for that space.  Stack variables work the same way, except that instead of
 170 being declared with global variable definitions, they are declared with the
 171 <a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>
 172
 173 <div class="doc_code">
 174 <pre>
 175 define i32 @example() {
 176 entry:
 177         %X = alloca i32           ; type of %X is i32*.
 178         ...
 179         %tmp = load i32* %X       ; load the stack value %X from the stack.
 180         %tmp2 = add i32 %tmp, 1   ; increment it
 181         store i32 %tmp2, i32* %X  ; store it back
 182         ...
 183 </pre>
 184 </div>
 185
 186 <p>This code shows an example of how you can declare and manipulate a stack
 187 variable in the LLVM IR.  Stack memory allocated with the alloca instruction is
 188 fully general: you can pass the address of the stack slot to functions, you can
 189 store it in other variables, etc.  In our example above, we could rewrite the
 190 example to use the alloca technique to avoid using a PHI node:</p>
 191
 192 <div class="doc_code">
 193 <pre>
 194 @G = weak global i32 0   ; type of @G is i32*
 195 @H = weak global i32 0   ; type of @H is i32*
 196
 197 define i32 @test(i1 %Condition) {
 198 entry:
 199         %X = alloca i32           ; type of %X is i32*.
 200         br i1 %Condition, label %cond_true, label %cond_false
 201
 202 cond_true:
 203         %X.0 = load i32* @G
 204         store i32 %X.0, i32* %X   ; Update X
 205         br label %cond_next
 206
 207 cond_false:
 208         %X.1 = load i32* @H
 209         store i32 %X.1, i32* %X   ; Update X
 210         br label %cond_next
 211
 212 cond_next:
 213         %X.2 = load i32* %X       ; Read X
 214         ret i32 %X.2
 215 }
 216 </pre>
 217 </div>
 218
 219 <p>With this, we have discovered a way to handle arbitrary mutable variables
 220 without the need to create Phi nodes at all:</p>
 221
 222 <ol>
 223 <li>Each mutable variable becomes a stack allocation.</li>
 224 <li>Each read of the variable becomes a load from the stack.</li>
 225 <li>Each update of the variable becomes a store to the stack.</li>
 226 <li>Taking the address of a variable just uses the stack address directly.</li>
 227 </ol>
 228
 229 <p>While this solution has solved our immediate problem, it introduced another
 230 one: we have now apparently introduced a lot of stack traffic for very simple
 231 and common operations, a major performance problem.  Fortunately for us, the
 232 LLVM optimizer has a highly-tuned optimization pass named "mem2reg" that handles
 233 this case, promoting allocas like this into SSA registers, inserting Phi nodes
 234 as appropriate.  If you run this example through the pass, for example, you'll
 235 get:</p>
 236
 237 <div class="doc_code">
 238 <pre>
 239 $ <b>llvm-as &lt; example.ll | opt -mem2reg | llvm-dis</b>
 240 @G = weak global i32 0
 241 @H = weak global i32 0
 242
 243 define i32 @test(i1 %Condition) {
 244 entry:
 245         br i1 %Condition, label %cond_true, label %cond_false
 246
 247 cond_true:
 248         %X.0 = load i32* @G
 249         br label %cond_next
 250
 251 cond_false:
 252         %X.1 = load i32* @H
 253         br label %cond_next
 254
 255 cond_next:
 256         %X.01 = phi i32 [ %X.1, %cond_false ], [ %X.0, %cond_true ]
 257         ret i32 %X.01
 258 }
 259 </pre>
 260 </div>
 261
 262 <p>The mem2reg pass implements the standard "iterated dominance frontier"
 263 algorithm for constructing SSA form and has a number of optimizations that speed
 264 up (very common) degenerate cases. The mem2reg optimization pass is the answer to dealing
 265 with mutable variables, and we highly recommend that you depend on it.  Note that
 266 mem2reg only works on variables in certain circumstances:</p>
 267
 268 <ol>
 269 <li>mem2reg is alloca-driven: it looks for allocas and if it can handle them, it
 270 promotes them.  It does not apply to global variables or heap allocations.</li>
 271
 272 <li>mem2reg only looks for alloca instructions in the entry block of the
 273 function.  Being in the entry block guarantees that the alloca is only executed
 274 once, which makes analysis simpler.</li>
 275
 276 <li>mem2reg only promotes allocas whose uses are direct loads and stores.  If
 277 the address of the stack object is passed to a function, or if any funny pointer
 278 arithmetic is involved, the alloca will not be promoted.</li>
 279
 280 <li>mem2reg only works on allocas of <a
 281 href="../LangRef.html#t_classifications">first class</a>
 282 values (such as pointers, scalars and vectors), and only if the array size
 283 of the allocation is 1 (or missing in the .ll file).  mem2reg is not capable of
 284 promoting structs or arrays to registers.  Note that the "scalarrepl" pass is
 285 more powerful and can promote structs, "unions", and arrays in many cases.</li>
 286
 287 </ol>
 288
 289 <p>
 290 All of these properties are easy to satisfy for most imperative languages, and
 291 we'll illustrate it below with Kaleidoscope.  The final question you may be
 292 asking is: should I bother with this nonsense for my front-end?  Wouldn't it be
 293 better if I just did SSA construction directly, avoiding use of the mem2reg
 294 optimization pass?  In short, we strongly recommend that you use this technique
 295 for building SSA form, unless there is an extremely good reason not to.  Using
 296 this technique is:</p>
 297
 298 <ul>
 299 <li>Proven and well tested: llvm-gcc and clang both use this technique for local
 300 mutable variables.  As such, the most common clients of LLVM are using this to
 301 handle a bulk of their variables.  You can be sure that bugs are found fast and
 302 fixed early.</li>
 303
 304 <li>Extremely Fast: mem2reg has a number of special cases that make it fast in
 305 common cases as well as fully general.  For example, it has fast-paths for
 306 variables that are only used in a single block, variables that only have one
 307 assignment point, good heuristics to avoid insertion of unneeded phi nodes, etc.
 308 </li>
 309
 310 <li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
 311 Debug information in LLVM</a> relies on having the address of the variable
 312 exposed so that debug info can be attached to it.  This technique dovetails
 313 very naturally with this style of debug info.</li>
 314 </ul>
 315
 316 <p>If nothing else, this makes it much easier to get your front-end up and
 317 running, and is very simple to implement.  Lets extend Kaleidoscope with mutable
 318 variables now!
 319 </p>
 320
 321 </div>
 322
 323 <!-- *********************************************************************** -->
 324 <div class="doc_section"><a name="kalvars">Mutable Variables in
 325 Kaleidoscope</a></div>
 326 <!-- *********************************************************************** -->
 327
 328 <div class="doc_text">
 329
 330 <p>Now that we know the sort of problem we want to tackle, lets see what this
 331 looks like in the context of our little Kaleidoscope language.  We're going to
 332 add two features:</p>
 333
 334 <ol>
 335 <li>The ability to mutate variables with the '=' operator.</li>
 336 <li>The ability to define new variables.</li>
 337 </ol>
 338
 339 <p>While the first item is really what this is about, we only have variables
 340 for incoming arguments as well as for induction variables, and redefining those only
 341 goes so far :).  Also, the ability to define new variables is a
 342 useful thing regardless of whether you will be mutating them.  Here's a
 343 motivating example that shows how we could use these:</p>
 344
 345 <div class="doc_code">
 346 <pre>
 347 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 348 # and just returns the RHS.
 349 def binary : 1 (x y) y;
 350
 351 # Recursive fib, we could do this before.
 352 def fib(x)
 353   if (x &lt; 3) then
 354     1
 355   else
 356     fib(x-1)+fib(x-2);
 357
 358 # Iterative fib.
 359 def fibi(x)
 360   <b>var a = 1, b = 1, c in</b>
 361   (for i = 3, i &lt; x in
 362      <b>c = a + b</b> :
 363      <b>a = b</b> :
 364      <b>b = c</b>) :
 365   b;
 366
 367 # Call it.
 368 fibi(10);
 369 </pre>
 370 </div>
 371
 372 <p>
 373 In order to mutate variables, we have to change our existing variables to use
 374 the "alloca trick".  Once we have that, we'll add our new operator, then extend
 375 Kaleidoscope to support new variable definitions.
 376 </p>
 377
 378 </div>
 379
 380 <!-- *********************************************************************** -->
 381 <div class="doc_section"><a name="adjustments">Adjusting Existing Variables for
 382 Mutation</a></div>
 383 <!-- *********************************************************************** -->
 384
 385 <div class="doc_text">
 386
 387 <p>
 388 The symbol table in Kaleidoscope is managed at code generation time by the
 389 '<tt>NamedValues</tt>' map.  This map currently keeps track of the LLVM "Value*"
 390 that holds the double value for the named variable.  In order to support
 391 mutation, we need to change this slightly, so that it <tt>NamedValues</tt> holds
 392 the <em>memory location</em> of the variable in question.  Note that this
 393 change is a refactoring: it changes the structure of the code, but does not
 394 (by itself) change the behavior of the compiler.  All of these changes are
 395 isolated in the Kaleidoscope code generator.</p>
 396
 397 <p>
 398 At this point in Kaleidoscope's development, it only supports variables for two
 399 things: incoming arguments to functions and the induction variable of 'for'
 400 loops.  For consistency, we'll allow mutation of these variables in addition to
 401 other user-defined variables.  This means that these will both need memory
 402 locations.
 403 </p>
 404
 405 <p>To start our transformation of Kaleidoscope, we'll change the NamedValues
 406 map so that it maps to AllocaInst* instead of Value*.  Once we do this, the C++
 407 compiler will tell us what parts of the code we need to update:</p>
 408
 409 <div class="doc_code">
 410 <pre>
 411 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
 412 </pre>
 413 </div>
 414
 415 <p>Also, since we will need to create these alloca's, we'll use a helper
 416 function that ensures that the allocas are created in the entry block of the
 417 function:</p>
 418
 419 <div class="doc_code">
 420 <pre>
 421 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
 422 /// the function.  This is used for mutable variables etc.
 423 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
 424                                           const std::string &amp;VarName) {
 425   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
 426                  TheFunction-&gt;getEntryBlock().begin());
 427   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
 428                            VarName.c_str());
 429 }
 430 </pre>
 431 </div>
 432
 433 <p>This funny looking code creates an IRBuilder object that is pointing at
 434 the first instruction (.begin()) of the entry block.  It then creates an alloca
 435 with the expected name and returns it.  Because all values in Kaleidoscope are
 436 doubles, there is no need to pass in a type to use.</p>
 437
 438 <p>With this in place, the first functionality change we want to make is to
 439 variable references.  In our new scheme, variables live on the stack, so code
 440 generating a reference to them actually needs to produce a load from the stack
 441 slot:</p>
 442
 443 <div class="doc_code">
 444 <pre>
 445 Value *VariableExprAST::Codegen() {
 446   // Look this variable up in the function.
 447   Value *V = NamedValues[Name];
 448   if (V == 0) return ErrorV("Unknown variable name");
 449
 450   <b>// Load the value.
 451   return Builder.CreateLoad(V, Name.c_str());</b>
 452 }
 453 </pre>
 454 </div>
 455
 456 <p>As you can see, this is pretty straightforward.  Now we need to update the
 457 things that define the variables to set up the alloca.  We'll start with
 458 <tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
 459 the unabridged code):</p>
 460
 461 <div class="doc_code">
 462 <pre>
 463   Function *TheFunction = Builder.GetInsertBlock()->getParent();
 464
 465   <b>// Create an alloca for the variable in the entry block.
 466   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);</b>
 467
 468     // Emit the start code first, without 'variable' in scope.
 469   Value *StartVal = Start-&gt;Codegen();
 470   if (StartVal == 0) return 0;
 471
 472   <b>// Store the value into the alloca.
 473   Builder.CreateStore(StartVal, Alloca);</b>
 474   ...
 475
 476   // Compute the end condition.
 477   Value *EndCond = End-&gt;Codegen();
 478   if (EndCond == 0) return EndCond;
 479
 480   <b>// Reload, increment, and restore the alloca.  This handles the case where
 481   // the body of the loop mutates the variable.
 482   Value *CurVar = Builder.CreateLoad(Alloca);
 483   Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
 484   Builder.CreateStore(NextVar, Alloca);</b>
 485   ...
 486 </pre>
 487 </div>
 488
 489 <p>This code is virtually identical to the code <a
 490 href="LangImpl5.html#forcodegen">before we allowed mutable variables</a>.  The
 491 big difference is that we no longer have to construct a PHI node, and we use
 492 load/store to access the variable as needed.</p>
 493
 494 <p>To support mutable argument variables, we need to also make allocas for them.
 495 The code for this is also pretty simple:</p>
 496
 497 <div class="doc_code">
 498 <pre>
 499 /// CreateArgumentAllocas - Create an alloca for each argument and register the
 500 /// argument in the symbol table so that references to it will succeed.
 501 void PrototypeAST::CreateArgumentAllocas(Function *F) {
 502   Function::arg_iterator AI = F-&gt;arg_begin();
 503   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
 504     // Create an alloca for this variable.
 505     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
 506
 507     // Store the initial value into the alloca.
 508     Builder.CreateStore(AI, Alloca);
 509
 510     // Add arguments to variable symbol table.
 511     NamedValues[Args[Idx]] = Alloca;
 512   }
 513 }
 514 </pre>
 515 </div>
 516
 517 <p>For each argument, we make an alloca, store the input value to the function
 518 into the alloca, and register the alloca as the memory location for the
 519 argument.  This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
 520 it sets up the entry block for the function.</p>
 521
 522 <p>The final missing piece is adding the mem2reg pass, which allows us to get
 523 good codegen once again:</p>
 524
 525 <div class="doc_code">
 526 <pre>
 527     // Set up the optimizer pipeline.  Start with registering info about how the
 528     // target lays out data structures.
 529     OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
 530     <b>// Promote allocas to registers.
 531     OurFPM.add(createPromoteMemoryToRegisterPass());</b>
 532     // Do simple "peephole" optimizations and bit-twiddling optzns.
 533     OurFPM.add(createInstructionCombiningPass());
 534     // Reassociate expressions.
 535     OurFPM.add(createReassociatePass());
 536 </pre>
 537 </div>
 538
 539 <p>It is interesting to see what the code looks like before and after the
 540 mem2reg optimization runs.  For example, this is the before/after code for our
 541 recursive fib function.  Before the optimization:</p>
 542
 543 <div class="doc_code">
 544 <pre>
 545 define double @fib(double %x) {
 546 entry:
 547         <b>%x1 = alloca double
 548         store double %x, double* %x1
 549         %x2 = load double* %x1</b>
 550         %cmptmp = fcmp ult double %x2, 3.000000e+00
 551         %booltmp = uitofp i1 %cmptmp to double
 552         %ifcond = fcmp one double %booltmp, 0.000000e+00
 553         br i1 %ifcond, label %then, label %else
 554
 555 then:           ; preds = %entry
 556         br label %ifcont
 557
 558 else:           ; preds = %entry
 559         <b>%x3 = load double* %x1</b>
 560         %subtmp = fsub double %x3, 1.000000e+00
 561         %calltmp = call double @fib(double %subtmp)
 562         <b>%x4 = load double* %x1</b>
 563         %subtmp5 = fsub double %x4, 2.000000e+00
 564         %calltmp6 = call double @fib(double %subtmp5)
 565         %addtmp = fadd double %calltmp, %calltmp6
 566         br label %ifcont
 567
 568 ifcont:         ; preds = %else, %then
 569         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 570         ret double %iftmp
 571 }
 572 </pre>
 573 </div>
 574
 575 <p>Here there is only one variable (x, the input argument) but you can still
 576 see the extremely simple-minded code generation strategy we are using.  In the
 577 entry block, an alloca is created, and the initial input value is stored into
 578 it.  Each reference to the variable does a reload from the stack.  Also, note
 579 that we didn't modify the if/then/else expression, so it still inserts a PHI
 580 node.  While we could make an alloca for it, it is actually easier to create a
 581 PHI node for it, so we still just make the PHI.</p>
 582
 583 <p>Here is the code after the mem2reg pass runs:</p>
 584
 585 <div class="doc_code">
 586 <pre>
 587 define double @fib(double %x) {
 588 entry:
 589         %cmptmp = fcmp ult double <b>%x</b>, 3.000000e+00
 590         %booltmp = uitofp i1 %cmptmp to double
 591         %ifcond = fcmp one double %booltmp, 0.000000e+00
 592         br i1 %ifcond, label %then, label %else
 593
 594 then:
 595         br label %ifcont
 596
 597 else:
 598         %subtmp = fsub double <b>%x</b>, 1.000000e+00
 599         %calltmp = call double @fib(double %subtmp)
 600         %subtmp5 = fsub double <b>%x</b>, 2.000000e+00
 601         %calltmp6 = call double @fib(double %subtmp5)
 602         %addtmp = fadd double %calltmp, %calltmp6
 603         br label %ifcont
 604
 605 ifcont:         ; preds = %else, %then
 606         %iftmp = phi double [ 1.000000e+00, %then ], [ %addtmp, %else ]
 607         ret double %iftmp
 608 }
 609 </pre>
 610 </div>
 611
 612 <p>This is a trivial case for mem2reg, since there are no redefinitions of the
 613 variable.  The point of showing this is to calm your tension about inserting
 614 such blatent inefficiencies :).</p>
 615
 616 <p>After the rest of the optimizers run, we get:</p>
 617
 618 <div class="doc_code">
 619 <pre>
 620 define double @fib(double %x) {
 621 entry:
 622         %cmptmp = fcmp ult double %x, 3.000000e+00
 623         %booltmp = uitofp i1 %cmptmp to double
 624         %ifcond = fcmp ueq double %booltmp, 0.000000e+00
 625         br i1 %ifcond, label %else, label %ifcont
 626
 627 else:
 628         %subtmp = fsub double %x, 1.000000e+00
 629         %calltmp = call double @fib(double %subtmp)
 630         %subtmp5 = fsub double %x, 2.000000e+00
 631         %calltmp6 = call double @fib(double %subtmp5)
 632         %addtmp = fadd double %calltmp, %calltmp6
 633         ret double %addtmp
 634
 635 ifcont:
 636         ret double 1.000000e+00
 637 }
 638 </pre>
 639 </div>
 640
 641 <p>Here we see that the simplifycfg pass decided to clone the return instruction
 642 into the end of the 'else' block.  This allowed it to eliminate some branches
 643 and the PHI node.</p>
 644
 645 <p>Now that all symbol table references are updated to use stack variables,
 646 we'll add the assignment operator.</p>
 647
 648 </div>
 649
 650 <!-- *********************************************************************** -->
 651 <div class="doc_section"><a name="assignment">New Assignment Operator</a></div>
 652 <!-- *********************************************************************** -->
 653
 654 <div class="doc_text">
 655
 656 <p>With our current framework, adding a new assignment operator is really
 657 simple.  We will parse it just like any other binary operator, but handle it
 658 internally (instead of allowing the user to define it).  The first step is to
 659 set a precedence:</p>
 660
 661 <div class="doc_code">
 662 <pre>
 663  int main() {
 664    // Install standard binary operators.
 665    // 1 is lowest precedence.
 666    <b>BinopPrecedence['='] = 2;</b>
 667    BinopPrecedence['&lt;'] = 10;
 668    BinopPrecedence['+'] = 20;
 669    BinopPrecedence['-'] = 20;
 670 </pre>
 671 </div>
 672
 673 <p>Now that the parser knows the precedence of the binary operator, it takes
 674 care of all the parsing and AST generation.  We just need to implement codegen
 675 for the assignment operator.  This looks like:</p>
 676
 677 <div class="doc_code">
 678 <pre>
 679 Value *BinaryExprAST::Codegen() {
 680   // Special case '=' because we don't want to emit the LHS as an expression.
 681   if (Op == '=') {
 682     // Assignment requires the LHS to be an identifier.
 683     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
 684     if (!LHSE)
 685       return ErrorV("destination of '=' must be a variable");
 686 </pre>
 687 </div>
 688
 689 <p>Unlike the rest of the binary operators, our assignment operator doesn't
 690 follow the "emit LHS, emit RHS, do computation" model.  As such, it is handled
 691 as a special case before the other binary operators are handled.  The other
 692 strange thing is that it requires the LHS to be a variable.  It is invalid to
 693 have "(x+1) = expr" - only things like "x = expr" are allowed.
 694 </p>
 695
 696 <div class="doc_code">
 697 <pre>
 698     // Codegen the RHS.
 699     Value *Val = RHS-&gt;Codegen();
 700     if (Val == 0) return 0;
 701
 702     // Look up the name.
 703     Value *Variable = NamedValues[LHSE-&gt;getName()];
 704     if (Variable == 0) return ErrorV("Unknown variable name");
 705
 706     Builder.CreateStore(Val, Variable);
 707     return Val;
 708   }
 709   ...
 710 </pre>
 711 </div>
 712
 713 <p>Once we have the variable, codegen'ing the assignment is straightforward:
 714 we emit the RHS of the assignment, create a store, and return the computed
 715 value.  Returning a value allows for chained assignments like "X = (Y = Z)".</p>
 716
 717 <p>Now that we have an assignment operator, we can mutate loop variables and
 718 arguments.  For example, we can now run code like this:</p>
 719
 720 <div class="doc_code">
 721 <pre>
 722 # Function to print a double.
 723 extern printd(x);
 724
 725 # Define ':' for sequencing: as a low-precedence operator that ignores operands
 726 # and just returns the RHS.
 727 def binary : 1 (x y) y;
 728
 729 def test(x)
 730   printd(x) :
 731   x = 4 :
 732   printd(x);
 733
 734 test(123);
 735 </pre>
 736 </div>
 737
 738 <p>When run, this example prints "123" and then "4", showing that we did
 739 actually mutate the value!  Okay, we have now officially implemented our goal:
 740 getting this to work requires SSA construction in the general case.  However,
 741 to be really useful, we want the ability to define our own local variables, lets
 742 add this next!
 743 </p>
 744
 745 </div>
 746
 747 <!-- *********************************************************************** -->
 748 <div class="doc_section"><a name="localvars">User-defined Local
 749 Variables</a></div>
 750 <!-- *********************************************************************** -->
 751
 752 <div class="doc_text">
 753
 754 <p>Adding var/in is just like any other other extensions we made to
 755 Kaleidoscope: we extend the lexer, the parser, the AST and the code generator.
 756 The first step for adding our new 'var/in' construct is to extend the lexer.
 757 As before, this is pretty trivial, the code looks like this:</p>
 758
 759 <div class="doc_code">
 760 <pre>
 761 enum Token {
 762   ...
 763   <b>// var definition
 764   tok_var = -13</b>
 765 ...
 766 }
 767 ...
 768 static int gettok() {
 769 ...
 770     if (IdentifierStr == "in") return tok_in;
 771     if (IdentifierStr == "binary") return tok_binary;
 772     if (IdentifierStr == "unary") return tok_unary;
 773     <b>if (IdentifierStr == "var") return tok_var;</b>
 774     return tok_identifier;
 775 ...
 776 </pre>
 777 </div>
 778
 779 <p>The next step is to define the AST node that we will construct.  For var/in,
 780 it looks like this:</p>
 781
 782 <div class="doc_code">
 783 <pre>
 784 /// VarExprAST - Expression class for var/in
 785 class VarExprAST : public ExprAST {
 786   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 787   ExprAST *Body;
 788 public:
 789   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
 790              ExprAST *body)
 791   : VarNames(varnames), Body(body) {}
 792
 793   virtual Value *Codegen();
 794 };
 795 </pre>
 796 </div>
 797
 798 <p>var/in allows a list of names to be defined all at once, and each name can
 799 optionally have an initializer value.  As such, we capture this information in
 800 the VarNames vector.  Also, var/in has a body, this body is allowed to access
 801 the variables defined by the var/in.</p>
 802
 803 <p>With this in place, we can define the parser pieces.  The first thing we do is add
 804 it as a primary expression:</p>
 805
 806 <div class="doc_code">
 807 <pre>
 808 /// primary
 809 ///   ::= identifierexpr
 810 ///   ::= numberexpr
 811 ///   ::= parenexpr
 812 ///   ::= ifexpr
 813 ///   ::= forexpr
 814 <b>///   ::= varexpr</b>
 815 static ExprAST *ParsePrimary() {
 816   switch (CurTok) {
 817   default: return Error("unknown token when expecting an expression");
 818   case tok_identifier: return ParseIdentifierExpr();
 819   case tok_number:     return ParseNumberExpr();
 820   case '(':            return ParseParenExpr();
 821   case tok_if:         return ParseIfExpr();
 822   case tok_for:        return ParseForExpr();
 823   <b>case tok_var:        return ParseVarExpr();</b>
 824   }
 825 }
 826 </pre>
 827 </div>
 828
 829 <p>Next we define ParseVarExpr:</p>
 830
 831 <div class="doc_code">
 832 <pre>
 833 /// varexpr ::= 'var' identifier ('=' expression)?
 834 //                    (',' identifier ('=' expression)?)* 'in' expression
 835 static ExprAST *ParseVarExpr() {
 836   getNextToken();  // eat the var.
 837
 838   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
 839
 840   // At least one variable name is required.
 841   if (CurTok != tok_identifier)
 842     return Error("expected identifier after var");
 843 </pre>
 844 </div>
 845
 846 <p>The first part of this code parses the list of identifier/expr pairs into the
 847 local <tt>VarNames</tt> vector.
 848
 849 <div class="doc_code">
 850 <pre>
 851   while (1) {
 852     std::string Name = IdentifierStr;
 853     getNextToken();  // eat identifier.
 854
 855     // Read the optional initializer.
 856     ExprAST *Init = 0;
 857     if (CurTok == '=') {
 858       getNextToken(); // eat the '='.
 859
 860       Init = ParseExpression();
 861       if (Init == 0) return 0;
 862     }
 863
 864     VarNames.push_back(std::make_pair(Name, Init));
 865
 866     // End of var list, exit loop.
 867     if (CurTok != ',') break;
 868     getNextToken(); // eat the ','.
 869
 870     if (CurTok != tok_identifier)
 871       return Error("expected identifier list after var");
 872   }
 873 </pre>
 874 </div>
 875
 876 <p>Once all the variables are parsed, we then parse the body and create the
 877 AST node:</p>
 878
 879 <div class="doc_code">
 880 <pre>
 881   // At this point, we have to have 'in'.
 882   if (CurTok != tok_in)
 883     return Error("expected 'in' keyword after 'var'");
 884   getNextToken();  // eat 'in'.
 885
 886   ExprAST *Body = ParseExpression();
 887   if (Body == 0) return 0;
 888
 889   return new VarExprAST(VarNames, Body);
 890 }
 891 </pre>
 892 </div>
 893
 894 <p>Now that we can parse and represent the code, we need to support emission of
 895 LLVM IR for it.  This code starts out with:</p>
 896
 897 <div class="doc_code">
 898 <pre>
 899 Value *VarExprAST::Codegen() {
 900   std::vector&lt;AllocaInst *&gt; OldBindings;
 901
 902   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
 903
 904   // Register all variables and emit their initializer.
 905   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
 906     const std::string &amp;VarName = VarNames[i].first;
 907     ExprAST *Init = VarNames[i].second;
 908 </pre>
 909 </div>
 910
 911 <p>Basically it loops over all the variables, installing them one at a time.
 912 For each variable we put into the symbol table, we remember the previous value
 913 that we replace in OldBindings.</p>
 914
 915 <div class="doc_code">
 916 <pre>
 917     // Emit the initializer before adding the variable to scope, this prevents
 918     // the initializer from referencing the variable itself, and permits stuff
 919     // like this:
 920     //  var a = 1 in
 921     //    var a = a in ...   # refers to outer 'a'.
 922     Value *InitVal;
 923     if (Init) {
 924       InitVal = Init-&gt;Codegen();
 925       if (InitVal == 0) return 0;
 926     } else { // If not specified, use 0.0.
 927       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
 928     }
 929
 930     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
 931     Builder.CreateStore(InitVal, Alloca);
 932
 933     // Remember the old variable binding so that we can restore the binding when
 934     // we unrecurse.
 935     OldBindings.push_back(NamedValues[VarName]);
 936
 937     // Remember this binding.
 938     NamedValues[VarName] = Alloca;
 939   }
 940 </pre>
 941 </div>
 942
 943 <p>There are more comments here than code.  The basic idea is that we emit the
 944 initializer, create the alloca, then update the symbol table to point to it.
 945 Once all the variables are installed in the symbol table, we evaluate the body
 946 of the var/in expression:</p>
 947
 948 <div class="doc_code">
 949 <pre>
 950   // Codegen the body, now that all vars are in scope.
 951   Value *BodyVal = Body-&gt;Codegen();
 952   if (BodyVal == 0) return 0;
 953 </pre>
 954 </div>
 955
 956 <p>Finally, before returning, we restore the previous variable bindings:</p>
 957
 958 <div class="doc_code">
 959 <pre>
 960   // Pop all our variables from scope.
 961   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
 962     NamedValues[VarNames[i].first] = OldBindings[i];
 963
 964   // Return the body computation.
 965   return BodyVal;
 966 }
 967 </pre>
 968 </div>
 969
 970 <p>The end result of all of this is that we get properly scoped variable
 971 definitions, and we even (trivially) allow mutation of them :).</p>
 972
 973 <p>With this, we completed what we set out to do.  Our nice iterative fib
 974 example from the intro compiles and runs just fine.  The mem2reg pass optimizes
 975 all of our stack variables into SSA registers, inserting PHI nodes where needed,
 976 and our front-end remains simple: no "iterated dominance frontier" computation
 977 anywhere in sight.</p>
 978
 979 </div>
 980
 981 <!-- *********************************************************************** -->
 982 <div class="doc_section"><a name="code">Full Code Listing</a></div>
 983 <!-- *********************************************************************** -->
 984
 985 <div class="doc_text">
 986
 987 <p>
 988 Here is the complete code listing for our running example, enhanced with mutable
 989 variables and var/in support.  To build this example, use:
 990 </p>
 991
 992 <div class="doc_code">
 993 <pre>
 994    # Compile
 995    g++ -g toy.cpp `llvm-config --cppflags --ldflags --libs core jit native` -O3 -o toy
 996    # Run
 997    ./toy
 998 </pre>
 999 </div>
1000
1001 <p>Here is the code:</p>
1002
1003 <div class="doc_code">
1004 <pre>
1005 #include "llvm/DerivedTypes.h"
1006 #include "llvm/ExecutionEngine/ExecutionEngine.h"
1007 #include "llvm/ExecutionEngine/JIT.h"
1008 #include "llvm/LLVMContext.h"
1009 #include "llvm/Module.h"
1010 #include "llvm/PassManager.h"
1011 #include "llvm/Analysis/Verifier.h"
1012 #include "llvm/Target/TargetData.h"
1013 #include "llvm/Target/TargetSelect.h"
1014 #include "llvm/Transforms/Scalar.h"
1015 #include "llvm/Support/IRBuilder.h"
1016 #include &lt;cstdio&gt;
1017 #include &lt;string&gt;
1018 #include &lt;map&gt;
1019 #include &lt;vector&gt;
1020 using namespace llvm;
1021
1022 //===----------------------------------------------------------------------===//
1023 // Lexer
1024 //===----------------------------------------------------------------------===//
1025
1026 // The lexer returns tokens [0-255] if it is an unknown character, otherwise one
1027 // of these for known things.
1028 enum Token {
1029   tok_eof = -1,
1030
1031   // commands
1032   tok_def = -2, tok_extern = -3,
1033
1034   // primary
1035   tok_identifier = -4, tok_number = -5,
1036
1037   // control
1038   tok_if = -6, tok_then = -7, tok_else = -8,
1039   tok_for = -9, tok_in = -10,
1040
1041   // operators
1042   tok_binary = -11, tok_unary = -12,
1043
1044   // var definition
1045   tok_var = -13
1046 };
1047
1048 static std::string IdentifierStr;  // Filled in if tok_identifier
1049 static double NumVal;              // Filled in if tok_number
1050
1051 /// gettok - Return the next token from standard input.
1052 static int gettok() {
1053   static int LastChar = ' ';
1054
1055   // Skip any whitespace.
1056   while (isspace(LastChar))
1057     LastChar = getchar();
1058
1059   if (isalpha(LastChar)) { // identifier: [a-zA-Z][a-zA-Z0-9]*
1060     IdentifierStr = LastChar;
1061     while (isalnum((LastChar = getchar())))
1062       IdentifierStr += LastChar;
1063
1064     if (IdentifierStr == "def") return tok_def;
1065     if (IdentifierStr == "extern") return tok_extern;
1066     if (IdentifierStr == "if") return tok_if;
1067     if (IdentifierStr == "then") return tok_then;
1068     if (IdentifierStr == "else") return tok_else;
1069     if (IdentifierStr == "for") return tok_for;
1070     if (IdentifierStr == "in") return tok_in;
1071     if (IdentifierStr == "binary") return tok_binary;
1072     if (IdentifierStr == "unary") return tok_unary;
1073     if (IdentifierStr == "var") return tok_var;
1074     return tok_identifier;
1075   }
1076
1077   if (isdigit(LastChar) || LastChar == '.') {   // Number: [0-9.]+
1078     std::string NumStr;
1079     do {
1080       NumStr += LastChar;
1081       LastChar = getchar();
1082     } while (isdigit(LastChar) || LastChar == '.');
1083
1084     NumVal = strtod(NumStr.c_str(), 0);
1085     return tok_number;
1086   }
1087
1088   if (LastChar == '#') {
1089     // Comment until end of line.
1090     do LastChar = getchar();
1091     while (LastChar != EOF &amp;&amp; LastChar != '\n' &amp;&amp; LastChar != '\r');
1092
1093     if (LastChar != EOF)
1094       return gettok();
1095   }
1096
1097   // Check for end of file.  Don't eat the EOF.
1098   if (LastChar == EOF)
1099     return tok_eof;
1100
1101   // Otherwise, just return the character as its ascii value.
1102   int ThisChar = LastChar;
1103   LastChar = getchar();
1104   return ThisChar;
1105 }
1106
1107 //===----------------------------------------------------------------------===//
1108 // Abstract Syntax Tree (aka Parse Tree)
1109 //===----------------------------------------------------------------------===//
1110
1111 /// ExprAST - Base class for all expression nodes.
1112 class ExprAST {
1113 public:
1114   virtual ~ExprAST() {}
1115   virtual Value *Codegen() = 0;
1116 };
1117
1118 /// NumberExprAST - Expression class for numeric literals like "1.0".
1119 class NumberExprAST : public ExprAST {
1120   double Val;
1121 public:
1122   NumberExprAST(double val) : Val(val) {}
1123   virtual Value *Codegen();
1124 };
1125
1126 /// VariableExprAST - Expression class for referencing a variable, like "a".
1127 class VariableExprAST : public ExprAST {
1128   std::string Name;
1129 public:
1130   VariableExprAST(const std::string &amp;name) : Name(name) {}
1131   const std::string &amp;getName() const { return Name; }
1132   virtual Value *Codegen();
1133 };
1134
1135 /// UnaryExprAST - Expression class for a unary operator.
1136 class UnaryExprAST : public ExprAST {
1137   char Opcode;
1138   ExprAST *Operand;
1139 public:
1140   UnaryExprAST(char opcode, ExprAST *operand)
1141     : Opcode(opcode), Operand(operand) {}
1142   virtual Value *Codegen();
1143 };
1144
1145 /// BinaryExprAST - Expression class for a binary operator.
1146 class BinaryExprAST : public ExprAST {
1147   char Op;
1148   ExprAST *LHS, *RHS;
1149 public:
1150   BinaryExprAST(char op, ExprAST *lhs, ExprAST *rhs)
1151     : Op(op), LHS(lhs), RHS(rhs) {}
1152   virtual Value *Codegen();
1153 };
1154
1155 /// CallExprAST - Expression class for function calls.
1156 class CallExprAST : public ExprAST {
1157   std::string Callee;
1158   std::vector&lt;ExprAST*&gt; Args;
1159 public:
1160   CallExprAST(const std::string &amp;callee, std::vector&lt;ExprAST*&gt; &amp;args)
1161     : Callee(callee), Args(args) {}
1162   virtual Value *Codegen();
1163 };
1164
1165 /// IfExprAST - Expression class for if/then/else.
1166 class IfExprAST : public ExprAST {
1167   ExprAST *Cond, *Then, *Else;
1168 public:
1169   IfExprAST(ExprAST *cond, ExprAST *then, ExprAST *_else)
1170   : Cond(cond), Then(then), Else(_else) {}
1171   virtual Value *Codegen();
1172 };
1173
1174 /// ForExprAST - Expression class for for/in.
1175 class ForExprAST : public ExprAST {
1176   std::string VarName;
1177   ExprAST *Start, *End, *Step, *Body;
1178 public:
1179   ForExprAST(const std::string &amp;varname, ExprAST *start, ExprAST *end,
1180              ExprAST *step, ExprAST *body)
1181     : VarName(varname), Start(start), End(end), Step(step), Body(body) {}
1182   virtual Value *Codegen();
1183 };
1184
1185 /// VarExprAST - Expression class for var/in
1186 class VarExprAST : public ExprAST {
1187   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1188   ExprAST *Body;
1189 public:
1190   VarExprAST(const std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; &amp;varnames,
1191              ExprAST *body)
1192   : VarNames(varnames), Body(body) {}
1193
1194   virtual Value *Codegen();
1195 };
1196
1197 /// PrototypeAST - This class represents the "prototype" for a function,
1198 /// which captures its name, and its argument names (thus implicitly the number
1199 /// of arguments the function takes), as well as if it is an operator.
1200 class PrototypeAST {
1201   std::string Name;
1202   std::vector&lt;std::string&gt; Args;
1203   bool isOperator;
1204   unsigned Precedence;  // Precedence if a binary op.
1205 public:
1206   PrototypeAST(const std::string &amp;name, const std::vector&lt;std::string&gt; &amp;args,
1207                bool isoperator = false, unsigned prec = 0)
1208   : Name(name), Args(args), isOperator(isoperator), Precedence(prec) {}
1209
1210   bool isUnaryOp() const { return isOperator &amp;&amp; Args.size() == 1; }
1211   bool isBinaryOp() const { return isOperator &amp;&amp; Args.size() == 2; }
1212
1213   char getOperatorName() const {
1214     assert(isUnaryOp() || isBinaryOp());
1215     return Name[Name.size()-1];
1216   }
1217
1218   unsigned getBinaryPrecedence() const { return Precedence; }
1219
1220   Function *Codegen();
1221
1222   void CreateArgumentAllocas(Function *F);
1223 };
1224
1225 /// FunctionAST - This class represents a function definition itself.
1226 class FunctionAST {
1227   PrototypeAST *Proto;
1228   ExprAST *Body;
1229 public:
1230   FunctionAST(PrototypeAST *proto, ExprAST *body)
1231     : Proto(proto), Body(body) {}
1232
1233   Function *Codegen();
1234 };
1235
1236 //===----------------------------------------------------------------------===//
1237 // Parser
1238 //===----------------------------------------------------------------------===//
1239
1240 /// CurTok/getNextToken - Provide a simple token buffer.  CurTok is the current
1241 /// token the parser is looking at.  getNextToken reads another token from the
1242 /// lexer and updates CurTok with its results.
1243 static int CurTok;
1244 static int getNextToken() {
1245   return CurTok = gettok();
1246 }
1247
1248 /// BinopPrecedence - This holds the precedence for each binary operator that is
1249 /// defined.
1250 static std::map&lt;char, int&gt; BinopPrecedence;
1251
1252 /// GetTokPrecedence - Get the precedence of the pending binary operator token.
1253 static int GetTokPrecedence() {
1254   if (!isascii(CurTok))
1255     return -1;
1256
1257   // Make sure it's a declared binop.
1258   int TokPrec = BinopPrecedence[CurTok];
1259   if (TokPrec &lt;= 0) return -1;
1260   return TokPrec;
1261 }
1262
1263 /// Error* - These are little helper functions for error handling.
1264 ExprAST *Error(const char *Str) { fprintf(stderr, "Error: %s\n", Str);return 0;}
1265 PrototypeAST *ErrorP(const char *Str) { Error(Str); return 0; }
1266 FunctionAST *ErrorF(const char *Str) { Error(Str); return 0; }
1267
1268 static ExprAST *ParseExpression();
1269
1270 /// identifierexpr
1271 ///   ::= identifier
1272 ///   ::= identifier '(' expression* ')'
1273 static ExprAST *ParseIdentifierExpr() {
1274   std::string IdName = IdentifierStr;
1275
1276   getNextToken();  // eat identifier.
1277
1278   if (CurTok != '(') // Simple variable ref.
1279     return new VariableExprAST(IdName);
1280
1281   // Call.
1282   getNextToken();  // eat (
1283   std::vector&lt;ExprAST*&gt; Args;
1284   if (CurTok != ')') {
1285     while (1) {
1286       ExprAST *Arg = ParseExpression();
1287       if (!Arg) return 0;
1288       Args.push_back(Arg);
1289
1290       if (CurTok == ')') break;
1291
1292       if (CurTok != ',')
1293         return Error("Expected ')' or ',' in argument list");
1294       getNextToken();
1295     }
1296   }
1297
1298   // Eat the ')'.
1299   getNextToken();
1300
1301   return new CallExprAST(IdName, Args);
1302 }
1303
1304 /// numberexpr ::= number
1305 static ExprAST *ParseNumberExpr() {
1306   ExprAST *Result = new NumberExprAST(NumVal);
1307   getNextToken(); // consume the number
1308   return Result;
1309 }
1310
1311 /// parenexpr ::= '(' expression ')'
1312 static ExprAST *ParseParenExpr() {
1313   getNextToken();  // eat (.
1314   ExprAST *V = ParseExpression();
1315   if (!V) return 0;
1316
1317   if (CurTok != ')')
1318     return Error("expected ')'");
1319   getNextToken();  // eat ).
1320   return V;
1321 }
1322
1323 /// ifexpr ::= 'if' expression 'then' expression 'else' expression
1324 static ExprAST *ParseIfExpr() {
1325   getNextToken();  // eat the if.
1326
1327   // condition.
1328   ExprAST *Cond = ParseExpression();
1329   if (!Cond) return 0;
1330
1331   if (CurTok != tok_then)
1332     return Error("expected then");
1333   getNextToken();  // eat the then
1334
1335   ExprAST *Then = ParseExpression();
1336   if (Then == 0) return 0;
1337
1338   if (CurTok != tok_else)
1339     return Error("expected else");
1340
1341   getNextToken();
1342
1343   ExprAST *Else = ParseExpression();
1344   if (!Else) return 0;
1345
1346   return new IfExprAST(Cond, Then, Else);
1347 }
1348
1349 /// forexpr ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression
1350 static ExprAST *ParseForExpr() {
1351   getNextToken();  // eat the for.
1352
1353   if (CurTok != tok_identifier)
1354     return Error("expected identifier after for");
1355
1356   std::string IdName = IdentifierStr;
1357   getNextToken();  // eat identifier.
1358
1359   if (CurTok != '=')
1360     return Error("expected '=' after for");
1361   getNextToken();  // eat '='.
1362
1363
1364   ExprAST *Start = ParseExpression();
1365   if (Start == 0) return 0;
1366   if (CurTok != ',')
1367     return Error("expected ',' after for start value");
1368   getNextToken();
1369
1370   ExprAST *End = ParseExpression();
1371   if (End == 0) return 0;
1372
1373   // The step value is optional.
1374   ExprAST *Step = 0;
1375   if (CurTok == ',') {
1376     getNextToken();
1377     Step = ParseExpression();
1378     if (Step == 0) return 0;
1379   }
1380
1381   if (CurTok != tok_in)
1382     return Error("expected 'in' after for");
1383   getNextToken();  // eat 'in'.
1384
1385   ExprAST *Body = ParseExpression();
1386   if (Body == 0) return 0;
1387
1388   return new ForExprAST(IdName, Start, End, Step, Body);
1389 }
1390
1391 /// varexpr ::= 'var' identifier ('=' expression)?
1392 //                    (',' identifier ('=' expression)?)* 'in' expression
1393 static ExprAST *ParseVarExpr() {
1394   getNextToken();  // eat the var.
1395
1396   std::vector&lt;std::pair&lt;std::string, ExprAST*&gt; &gt; VarNames;
1397
1398   // At least one variable name is required.
1399   if (CurTok != tok_identifier)
1400     return Error("expected identifier after var");
1401
1402   while (1) {
1403     std::string Name = IdentifierStr;
1404     getNextToken();  // eat identifier.
1405
1406     // Read the optional initializer.
1407     ExprAST *Init = 0;
1408     if (CurTok == '=') {
1409       getNextToken(); // eat the '='.
1410
1411       Init = ParseExpression();
1412       if (Init == 0) return 0;
1413     }
1414
1415     VarNames.push_back(std::make_pair(Name, Init));
1416
1417     // End of var list, exit loop.
1418     if (CurTok != ',') break;
1419     getNextToken(); // eat the ','.
1420
1421     if (CurTok != tok_identifier)
1422       return Error("expected identifier list after var");
1423   }
1424
1425   // At this point, we have to have 'in'.
1426   if (CurTok != tok_in)
1427     return Error("expected 'in' keyword after 'var'");
1428   getNextToken();  // eat 'in'.
1429
1430   ExprAST *Body = ParseExpression();
1431   if (Body == 0) return 0;
1432
1433   return new VarExprAST(VarNames, Body);
1434 }
1435
1436 /// primary
1437 ///   ::= identifierexpr
1438 ///   ::= numberexpr
1439 ///   ::= parenexpr
1440 ///   ::= ifexpr
1441 ///   ::= forexpr
1442 ///   ::= varexpr
1443 static ExprAST *ParsePrimary() {
1444   switch (CurTok) {
1445   default: return Error("unknown token when expecting an expression");
1446   case tok_identifier: return ParseIdentifierExpr();
1447   case tok_number:     return ParseNumberExpr();
1448   case '(':            return ParseParenExpr();
1449   case tok_if:         return ParseIfExpr();
1450   case tok_for:        return ParseForExpr();
1451   case tok_var:        return ParseVarExpr();
1452   }
1453 }
1454
1455 /// unary
1456 ///   ::= primary
1457 ///   ::= '!' unary
1458 static ExprAST *ParseUnary() {
1459   // If the current token is not an operator, it must be a primary expr.
1460   if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
1461     return ParsePrimary();
1462
1463   // If this is a unary operator, read it.
1464   int Opc = CurTok;
1465   getNextToken();
1466   if (ExprAST *Operand = ParseUnary())
1467     return new UnaryExprAST(Opc, Operand);
1468   return 0;
1469 }
1470
1471 /// binoprhs
1472 ///   ::= ('+' unary)*
1473 static ExprAST *ParseBinOpRHS(int ExprPrec, ExprAST *LHS) {
1474   // If this is a binop, find its precedence.
1475   while (1) {
1476     int TokPrec = GetTokPrecedence();
1477
1478     // If this is a binop that binds at least as tightly as the current binop,
1479     // consume it, otherwise we are done.
1480     if (TokPrec &lt; ExprPrec)
1481       return LHS;
1482
1483     // Okay, we know this is a binop.
1484     int BinOp = CurTok;
1485     getNextToken();  // eat binop
1486
1487     // Parse the unary expression after the binary operator.
1488     ExprAST *RHS = ParseUnary();
1489     if (!RHS) return 0;
1490
1491     // If BinOp binds less tightly with RHS than the operator after RHS, let
1492     // the pending operator take RHS as its LHS.
1493     int NextPrec = GetTokPrecedence();
1494     if (TokPrec &lt; NextPrec) {
1495       RHS = ParseBinOpRHS(TokPrec+1, RHS);
1496       if (RHS == 0) return 0;
1497     }
1498
1499     // Merge LHS/RHS.
1500     LHS = new BinaryExprAST(BinOp, LHS, RHS);
1501   }
1502 }
1503
1504 /// expression
1505 ///   ::= unary binoprhs
1506 ///
1507 static ExprAST *ParseExpression() {
1508   ExprAST *LHS = ParseUnary();
1509   if (!LHS) return 0;
1510
1511   return ParseBinOpRHS(0, LHS);
1512 }
1513
1514 /// prototype
1515 ///   ::= id '(' id* ')'
1516 ///   ::= binary LETTER number? (id, id)
1517 ///   ::= unary LETTER (id)
1518 static PrototypeAST *ParsePrototype() {
1519   std::string FnName;
1520
1521   unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
1522   unsigned BinaryPrecedence = 30;
1523
1524   switch (CurTok) {
1525   default:
1526     return ErrorP("Expected function name in prototype");
1527   case tok_identifier:
1528     FnName = IdentifierStr;
1529     Kind = 0;
1530     getNextToken();
1531     break;
1532   case tok_unary:
1533     getNextToken();
1534     if (!isascii(CurTok))
1535       return ErrorP("Expected unary operator");
1536     FnName = "unary";
1537     FnName += (char)CurTok;
1538     Kind = 1;
1539     getNextToken();
1540     break;
1541   case tok_binary:
1542     getNextToken();
1543     if (!isascii(CurTok))
1544       return ErrorP("Expected binary operator");
1545     FnName = "binary";
1546     FnName += (char)CurTok;
1547     Kind = 2;
1548     getNextToken();
1549
1550     // Read the precedence if present.
1551     if (CurTok == tok_number) {
1552       if (NumVal &lt; 1 || NumVal &gt; 100)
1553         return ErrorP("Invalid precedecnce: must be 1..100");
1554       BinaryPrecedence = (unsigned)NumVal;
1555       getNextToken();
1556     }
1557     break;
1558   }
1559
1560   if (CurTok != '(')
1561     return ErrorP("Expected '(' in prototype");
1562
1563   std::vector&lt;std::string&gt; ArgNames;
1564   while (getNextToken() == tok_identifier)
1565     ArgNames.push_back(IdentifierStr);
1566   if (CurTok != ')')
1567     return ErrorP("Expected ')' in prototype");
1568
1569   // success.
1570   getNextToken();  // eat ')'.
1571
1572   // Verify right number of names for operator.
1573   if (Kind &amp;&amp; ArgNames.size() != Kind)
1574     return ErrorP("Invalid number of operands for operator");
1575
1576   return new PrototypeAST(FnName, ArgNames, Kind != 0, BinaryPrecedence);
1577 }
1578
1579 /// definition ::= 'def' prototype expression
1580 static FunctionAST *ParseDefinition() {
1581   getNextToken();  // eat def.
1582   PrototypeAST *Proto = ParsePrototype();
1583   if (Proto == 0) return 0;
1584
1585   if (ExprAST *E = ParseExpression())
1586     return new FunctionAST(Proto, E);
1587   return 0;
1588 }
1589
1590 /// toplevelexpr ::= expression
1591 static FunctionAST *ParseTopLevelExpr() {
1592   if (ExprAST *E = ParseExpression()) {
1593     // Make an anonymous proto.
1594     PrototypeAST *Proto = new PrototypeAST("", std::vector&lt;std::string&gt;());
1595     return new FunctionAST(Proto, E);
1596   }
1597   return 0;
1598 }
1599
1600 /// external ::= 'extern' prototype
1601 static PrototypeAST *ParseExtern() {
1602   getNextToken();  // eat extern.
1603   return ParsePrototype();
1604 }
1605
1606 //===----------------------------------------------------------------------===//
1607 // Code Generation
1608 //===----------------------------------------------------------------------===//
1609
1610 static Module *TheModule;
1611 static IRBuilder&lt;&gt; Builder(getGlobalContext());
1612 static std::map&lt;std::string, AllocaInst*&gt; NamedValues;
1613 static FunctionPassManager *TheFPM;
1614
1615 Value *ErrorV(const char *Str) { Error(Str); return 0; }
1616
1617 /// CreateEntryBlockAlloca - Create an alloca instruction in the entry block of
1618 /// the function.  This is used for mutable variables etc.
1619 static AllocaInst *CreateEntryBlockAlloca(Function *TheFunction,
1620                                           const std::string &amp;VarName) {
1621   IRBuilder&lt;&gt; TmpB(&amp;TheFunction-&gt;getEntryBlock(),
1622                  TheFunction-&gt;getEntryBlock().begin());
1623   return TmpB.CreateAlloca(Type::getDoubleTy(getGlobalContext()), 0,
1624                            VarName.c_str());
1625 }
1626
1627 Value *NumberExprAST::Codegen() {
1628   return ConstantFP::get(getGlobalContext(), APFloat(Val));
1629 }
1630
1631 Value *VariableExprAST::Codegen() {
1632   // Look this variable up in the function.
1633   Value *V = NamedValues[Name];
1634   if (V == 0) return ErrorV("Unknown variable name");
1635
1636   // Load the value.
1637   return Builder.CreateLoad(V, Name.c_str());
1638 }
1639
1640 Value *UnaryExprAST::Codegen() {
1641   Value *OperandV = Operand-&gt;Codegen();
1642   if (OperandV == 0) return 0;
1643
1644   Function *F = TheModule-&gt;getFunction(std::string("unary")+Opcode);
1645   if (F == 0)
1646     return ErrorV("Unknown unary operator");
1647
1648   return Builder.CreateCall(F, OperandV, "unop");
1649 }
1650
1651 Value *BinaryExprAST::Codegen() {
1652   // Special case '=' because we don't want to emit the LHS as an expression.
1653   if (Op == '=') {
1654     // Assignment requires the LHS to be an identifier.
1655     VariableExprAST *LHSE = dynamic_cast&lt;VariableExprAST*&gt;(LHS);
1656     if (!LHSE)
1657       return ErrorV("destination of '=' must be a variable");
1658     // Codegen the RHS.
1659     Value *Val = RHS-&gt;Codegen();
1660     if (Val == 0) return 0;
1661
1662     // Look up the name.
1663     Value *Variable = NamedValues[LHSE-&gt;getName()];
1664     if (Variable == 0) return ErrorV("Unknown variable name");
1665
1666     Builder.CreateStore(Val, Variable);
1667     return Val;
1668   }
1669
1670   Value *L = LHS-&gt;Codegen();
1671   Value *R = RHS-&gt;Codegen();
1672   if (L == 0 || R == 0) return 0;
1673
1674   switch (Op) {
1675   case '+': return Builder.CreateFAdd(L, R, "addtmp");
1676   case '-': return Builder.CreateFSub(L, R, "subtmp");
1677   case '*': return Builder.CreateFMul(L, R, "multmp");
1678   case '&lt;':
1679     L = Builder.CreateFCmpULT(L, R, "cmptmp");
1680     // Convert bool 0/1 to double 0.0 or 1.0
1681     return Builder.CreateUIToFP(L, Type::getDoubleTy(getGlobalContext()),
1682                                 "booltmp");
1683   default: break;
1684   }
1685
1686   // If it wasn't a builtin binary operator, it must be a user defined one. Emit
1687   // a call to it.
1688   Function *F = TheModule-&gt;getFunction(std::string("binary")+Op);
1689   assert(F &amp;&amp; "binary operator not found!");
1690
1691   Value *Ops[] = { L, R };
1692   return Builder.CreateCall(F, Ops, Ops+2, "binop");
1693 }
1694
1695 Value *CallExprAST::Codegen() {
1696   // Look up the name in the global module table.
1697   Function *CalleeF = TheModule-&gt;getFunction(Callee);
1698   if (CalleeF == 0)
1699     return ErrorV("Unknown function referenced");
1700
1701   // If argument mismatch error.
1702   if (CalleeF-&gt;arg_size() != Args.size())
1703     return ErrorV("Incorrect # arguments passed");
1704
1705   std::vector&lt;Value*&gt; ArgsV;
1706   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
1707     ArgsV.push_back(Args[i]-&gt;Codegen());
1708     if (ArgsV.back() == 0) return 0;
1709   }
1710
1711   return Builder.CreateCall(CalleeF, ArgsV.begin(), ArgsV.end(), "calltmp");
1712 }
1713
1714 Value *IfExprAST::Codegen() {
1715   Value *CondV = Cond-&gt;Codegen();
1716   if (CondV == 0) return 0;
1717
1718   // Convert condition to a bool by comparing equal to 0.0.
1719   CondV = Builder.CreateFCmpONE(CondV,
1720                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1721                                 "ifcond");
1722
1723   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1724
1725   // Create blocks for the then and else cases.  Insert the 'then' block at the
1726   // end of the function.
1727   BasicBlock *ThenBB = BasicBlock::Create(getGlobalContext(), "then", TheFunction);
1728   BasicBlock *ElseBB = BasicBlock::Create(getGlobalContext(), "else");
1729   BasicBlock *MergeBB = BasicBlock::Create(getGlobalContext(), "ifcont");
1730
1731   Builder.CreateCondBr(CondV, ThenBB, ElseBB);
1732
1733   // Emit then value.
1734   Builder.SetInsertPoint(ThenBB);
1735
1736   Value *ThenV = Then-&gt;Codegen();
1737   if (ThenV == 0) return 0;
1738
1739   Builder.CreateBr(MergeBB);
1740   // Codegen of 'Then' can change the current block, update ThenBB for the PHI.
1741   ThenBB = Builder.GetInsertBlock();
1742
1743   // Emit else block.
1744   TheFunction-&gt;getBasicBlockList().push_back(ElseBB);
1745   Builder.SetInsertPoint(ElseBB);
1746
1747   Value *ElseV = Else-&gt;Codegen();
1748   if (ElseV == 0) return 0;
1749
1750   Builder.CreateBr(MergeBB);
1751   // Codegen of 'Else' can change the current block, update ElseBB for the PHI.
1752   ElseBB = Builder.GetInsertBlock();
1753
1754   // Emit merge block.
1755   TheFunction-&gt;getBasicBlockList().push_back(MergeBB);
1756   Builder.SetInsertPoint(MergeBB);
1757   PHINode *PN = Builder.CreatePHI(Type::getDoubleTy(getGlobalContext()),
1758                                   "iftmp");
1759
1760   PN-&gt;addIncoming(ThenV, ThenBB);
1761   PN-&gt;addIncoming(ElseV, ElseBB);
1762   return PN;
1763 }
1764
1765 Value *ForExprAST::Codegen() {
1766   // Output this as:
1767   //   var = alloca double
1768   //   ...
1769   //   start = startexpr
1770   //   store start -&gt; var
1771   //   goto loop
1772   // loop:
1773   //   ...
1774   //   bodyexpr
1775   //   ...
1776   // loopend:
1777   //   step = stepexpr
1778   //   endcond = endexpr
1779   //
1780   //   curvar = load var
1781   //   nextvar = curvar + step
1782   //   store nextvar -&gt; var
1783   //   br endcond, loop, endloop
1784   // outloop:
1785
1786   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1787
1788   // Create an alloca for the variable in the entry block.
1789   AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1790
1791   // Emit the start code first, without 'variable' in scope.
1792   Value *StartVal = Start-&gt;Codegen();
1793   if (StartVal == 0) return 0;
1794
1795   // Store the value into the alloca.
1796   Builder.CreateStore(StartVal, Alloca);
1797
1798   // Make the new basic block for the loop header, inserting after current
1799   // block.
1800   BasicBlock *LoopBB = BasicBlock::Create(getGlobalContext(), "loop", TheFunction);
1801
1802   // Insert an explicit fall through from the current block to the LoopBB.
1803   Builder.CreateBr(LoopBB);
1804
1805   // Start insertion in LoopBB.
1806   Builder.SetInsertPoint(LoopBB);
1807
1808   // Within the loop, the variable is defined equal to the PHI node.  If it
1809   // shadows an existing variable, we have to restore it, so save it now.
1810   AllocaInst *OldVal = NamedValues[VarName];
1811   NamedValues[VarName] = Alloca;
1812
1813   // Emit the body of the loop.  This, like any other expr, can change the
1814   // current BB.  Note that we ignore the value computed by the body, but don't
1815   // allow an error.
1816   if (Body-&gt;Codegen() == 0)
1817     return 0;
1818
1819   // Emit the step value.
1820   Value *StepVal;
1821   if (Step) {
1822     StepVal = Step-&gt;Codegen();
1823     if (StepVal == 0) return 0;
1824   } else {
1825     // If not specified, use 1.0.
1826     StepVal = ConstantFP::get(getGlobalContext(), APFloat(1.0));
1827   }
1828
1829   // Compute the end condition.
1830   Value *EndCond = End-&gt;Codegen();
1831   if (EndCond == 0) return EndCond;
1832
1833   // Reload, increment, and restore the alloca.  This handles the case where
1834   // the body of the loop mutates the variable.
1835   Value *CurVar = Builder.CreateLoad(Alloca, VarName.c_str());
1836   Value *NextVar = Builder.CreateFAdd(CurVar, StepVal, "nextvar");
1837   Builder.CreateStore(NextVar, Alloca);
1838
1839   // Convert condition to a bool by comparing equal to 0.0.
1840   EndCond = Builder.CreateFCmpONE(EndCond,
1841                               ConstantFP::get(getGlobalContext(), APFloat(0.0)),
1842                                   "loopcond");
1843
1844   // Create the "after loop" block and insert it.
1845   BasicBlock *AfterBB = BasicBlock::Create(getGlobalContext(), "afterloop", TheFunction);
1846
1847   // Insert the conditional branch into the end of LoopEndBB.
1848   Builder.CreateCondBr(EndCond, LoopBB, AfterBB);
1849
1850   // Any new code will be inserted in AfterBB.
1851   Builder.SetInsertPoint(AfterBB);
1852
1853   // Restore the unshadowed variable.
1854   if (OldVal)
1855     NamedValues[VarName] = OldVal;
1856   else
1857     NamedValues.erase(VarName);
1858
1859
1860   // for expr always returns 0.0.
1861   return Constant::getNullValue(Type::getDoubleTy(getGlobalContext()));
1862 }
1863
1864 Value *VarExprAST::Codegen() {
1865   std::vector&lt;AllocaInst *&gt; OldBindings;
1866
1867   Function *TheFunction = Builder.GetInsertBlock()-&gt;getParent();
1868
1869   // Register all variables and emit their initializer.
1870   for (unsigned i = 0, e = VarNames.size(); i != e; ++i) {
1871     const std::string &amp;VarName = VarNames[i].first;
1872     ExprAST *Init = VarNames[i].second;
1873
1874     // Emit the initializer before adding the variable to scope, this prevents
1875     // the initializer from referencing the variable itself, and permits stuff
1876     // like this:
1877     //  var a = 1 in
1878     //    var a = a in ...   # refers to outer 'a'.
1879     Value *InitVal;
1880     if (Init) {
1881       InitVal = Init-&gt;Codegen();
1882       if (InitVal == 0) return 0;
1883     } else { // If not specified, use 0.0.
1884       InitVal = ConstantFP::get(getGlobalContext(), APFloat(0.0));
1885     }
1886
1887     AllocaInst *Alloca = CreateEntryBlockAlloca(TheFunction, VarName);
1888     Builder.CreateStore(InitVal, Alloca);
1889
1890     // Remember the old variable binding so that we can restore the binding when
1891     // we unrecurse.
1892     OldBindings.push_back(NamedValues[VarName]);
1893
1894     // Remember this binding.
1895     NamedValues[VarName] = Alloca;
1896   }
1897
1898   // Codegen the body, now that all vars are in scope.
1899   Value *BodyVal = Body-&gt;Codegen();
1900   if (BodyVal == 0) return 0;
1901
1902   // Pop all our variables from scope.
1903   for (unsigned i = 0, e = VarNames.size(); i != e; ++i)
1904     NamedValues[VarNames[i].first] = OldBindings[i];
1905
1906   // Return the body computation.
1907   return BodyVal;
1908 }
1909
1910 Function *PrototypeAST::Codegen() {
1911   // Make the function type:  double(double,double) etc.
1912   std::vector&lt;const Type*&gt; Doubles(Args.size(),
1913                                    Type::getDoubleTy(getGlobalContext()));
1914   FunctionType *FT = FunctionType::get(Type::getDoubleTy(getGlobalContext()),
1915                                        Doubles, false);
1916
1917   Function *F = Function::Create(FT, Function::ExternalLinkage, Name, TheModule);
1918
1919   // If F conflicted, there was already something named 'Name'.  If it has a
1920   // body, don't allow redefinition or reextern.
1921   if (F-&gt;getName() != Name) {
1922     // Delete the one we just made and get the existing one.
1923     F-&gt;eraseFromParent();
1924     F = TheModule-&gt;getFunction(Name);
1925
1926     // If F already has a body, reject this.
1927     if (!F-&gt;empty()) {
1928       ErrorF("redefinition of function");
1929       return 0;
1930     }
1931
1932     // If F took a different number of args, reject.
1933     if (F-&gt;arg_size() != Args.size()) {
1934       ErrorF("redefinition of function with different # args");
1935       return 0;
1936     }
1937   }
1938
1939   // Set names for all arguments.
1940   unsigned Idx = 0;
1941   for (Function::arg_iterator AI = F-&gt;arg_begin(); Idx != Args.size();
1942        ++AI, ++Idx)
1943     AI-&gt;setName(Args[Idx]);
1944
1945   return F;
1946 }
1947
1948 /// CreateArgumentAllocas - Create an alloca for each argument and register the
1949 /// argument in the symbol table so that references to it will succeed.
1950 void PrototypeAST::CreateArgumentAllocas(Function *F) {
1951   Function::arg_iterator AI = F-&gt;arg_begin();
1952   for (unsigned Idx = 0, e = Args.size(); Idx != e; ++Idx, ++AI) {
1953     // Create an alloca for this variable.
1954     AllocaInst *Alloca = CreateEntryBlockAlloca(F, Args[Idx]);
1955
1956     // Store the initial value into the alloca.
1957     Builder.CreateStore(AI, Alloca);
1958
1959     // Add arguments to variable symbol table.
1960     NamedValues[Args[Idx]] = Alloca;
1961   }
1962 }
1963
1964 Function *FunctionAST::Codegen() {
1965   NamedValues.clear();
1966
1967   Function *TheFunction = Proto-&gt;Codegen();
1968   if (TheFunction == 0)
1969     return 0;
1970
1971   // If this is an operator, install it.
1972   if (Proto-&gt;isBinaryOp())
1973     BinopPrecedence[Proto-&gt;getOperatorName()] = Proto-&gt;getBinaryPrecedence();
1974
1975   // Create a new basic block to start insertion into.
1976   BasicBlock *BB = BasicBlock::Create(getGlobalContext(), "entry", TheFunction);
1977   Builder.SetInsertPoint(BB);
1978
1979   // Add all arguments to the symbol table and create their allocas.
1980   Proto-&gt;CreateArgumentAllocas(TheFunction);
1981
1982   if (Value *RetVal = Body-&gt;Codegen()) {
1983     // Finish off the function.
1984     Builder.CreateRet(RetVal);
1985
1986     // Validate the generated code, checking for consistency.
1987     verifyFunction(*TheFunction);
1988
1989     // Optimize the function.
1990     TheFPM-&gt;run(*TheFunction);
1991
1992     return TheFunction;
1993   }
1994
1995   // Error reading body, remove function.
1996   TheFunction-&gt;eraseFromParent();
1997
1998   if (Proto-&gt;isBinaryOp())
1999     BinopPrecedence.erase(Proto-&gt;getOperatorName());
2000   return 0;
2001 }
2002
2003 //===----------------------------------------------------------------------===//
2004 // Top-Level parsing and JIT Driver
2005 //===----------------------------------------------------------------------===//
2006
2007 static ExecutionEngine *TheExecutionEngine;
2008
2009 static void HandleDefinition() {
2010   if (FunctionAST *F = ParseDefinition()) {
2011     if (Function *LF = F-&gt;Codegen()) {
2012       fprintf(stderr, "Read function definition:");
2013       LF-&gt;dump();
2014     }
2015   } else {
2016     // Skip token for error recovery.
2017     getNextToken();
2018   }
2019 }
2020
2021 static void HandleExtern() {
2022   if (PrototypeAST *P = ParseExtern()) {
2023     if (Function *F = P-&gt;Codegen()) {
2024       fprintf(stderr, "Read extern: ");
2025       F-&gt;dump();
2026     }
2027   } else {
2028     // Skip token for error recovery.
2029     getNextToken();
2030   }
2031 }
2032
2033 static void HandleTopLevelExpression() {
2034   // Evaluate a top-level expression into an anonymous function.
2035   if (FunctionAST *F = ParseTopLevelExpr()) {
2036     if (Function *LF = F-&gt;Codegen()) {
2037       // JIT the function, returning a function pointer.
2038       void *FPtr = TheExecutionEngine-&gt;getPointerToFunction(LF);
2039
2040       // Cast it to the right type (takes no arguments, returns a double) so we
2041       // can call it as a native function.
2042       double (*FP)() = (double (*)())(intptr_t)FPtr;
2043       fprintf(stderr, "Evaluated to %f\n", FP());
2044     }
2045   } else {
2046     // Skip token for error recovery.
2047     getNextToken();
2048   }
2049 }
2050
2051 /// top ::= definition | external | expression | ';'
2052 static void MainLoop() {
2053   while (1) {
2054     fprintf(stderr, "ready&gt; ");
2055     switch (CurTok) {
2056     case tok_eof:    return;
2057     case ';':        getNextToken(); break;  // ignore top-level semicolons.
2058     case tok_def:    HandleDefinition(); break;
2059     case tok_extern: HandleExtern(); break;
2060     default:         HandleTopLevelExpression(); break;
2061     }
2062   }
2063 }
2064
2065 //===----------------------------------------------------------------------===//
2066 // "Library" functions that can be "extern'd" from user code.
2067 //===----------------------------------------------------------------------===//
2068
2069 /// putchard - putchar that takes a double and returns 0.
2070 extern "C"
2071 double putchard(double X) {
2072   putchar((char)X);
2073   return 0;
2074 }
2075
2076 /// printd - printf that takes a double prints it as "%f\n", returning 0.
2077 extern "C"
2078 double printd(double X) {
2079   printf("%f\n", X);
2080   return 0;
2081 }
2082
2083 //===----------------------------------------------------------------------===//
2084 // Main driver code.
2085 //===----------------------------------------------------------------------===//
2086
2087 int main() {
2088   InitializeNativeTarget();
2089   LLVMContext &amp;Context = getGlobalContext();
2090
2091   // Install standard binary operators.
2092   // 1 is lowest precedence.
2093   BinopPrecedence['='] = 2;
2094   BinopPrecedence['&lt;'] = 10;
2095   BinopPrecedence['+'] = 20;
2096   BinopPrecedence['-'] = 20;
2097   BinopPrecedence['*'] = 40;  // highest.
2098
2099   // Prime the first token.
2100   fprintf(stderr, "ready&gt; ");
2101   getNextToken();
2102
2103   // Make the module, which holds all the code.
2104   TheModule = new Module("my cool jit", Context);
2105
2106   // Create the JIT.  This takes ownership of the module.
2107   std::string ErrStr;
2108   TheExecutionEngine = EngineBuilder(TheModule).setErrorStr(&amp;ErrStr).create();
2109   if (!TheExecutionEngine) {
2110     fprintf(stderr, "Could not create ExecutionEngine: %s\n", ErrStr.c_str());
2111     exit(1);
2112   }
2113
2114   FunctionPassManager OurFPM(TheModule);
2115
2116   // Set up the optimizer pipeline.  Start with registering info about how the
2117   // target lays out data structures.
2118   OurFPM.add(new TargetData(*TheExecutionEngine-&gt;getTargetData()));
2119   // Promote allocas to registers.
2120   OurFPM.add(createPromoteMemoryToRegisterPass());
2121   // Do simple "peephole" optimizations and bit-twiddling optzns.
2122   OurFPM.add(createInstructionCombiningPass());
2123   // Reassociate expressions.
2124   OurFPM.add(createReassociatePass());
2125   // Eliminate Common SubExpressions.
2126   OurFPM.add(createGVNPass());
2127   // Simplify the control flow graph (deleting unreachable blocks, etc).
2128   OurFPM.add(createCFGSimplificationPass());
2129
2130   OurFPM.doInitialization();
2131
2132   // Set the global so the code gen can use this.
2133   TheFPM = &amp;OurFPM;
2134
2135   // Run the main "interpreter loop" now.
2136   MainLoop();
2137
2138   TheFPM = 0;
2139
2140   // Print out all of the generated code.
2141   TheModule-&gt;dump();
2142
2143   return 0;
2144 }
2145 </pre>
2146 </div>
2147
2148 <a href="LangImpl8.html">Next: Conclusion and other useful LLVM tidbits</a>
2149 </div>
2150
2151 <!-- *********************************************************************** -->
2152 <hr>
2153 <address>
2154   <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
2155   src="http://jigsaw.w3.org/css-validator/images/vcss" alt="Valid CSS!"></a>
2156   <a href="http://validator.w3.org/check/referer"><img
2157   src="http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01!"></a>
2158
2159   <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
2160   <a href="http://llvm.org">The LLVM Compiler Infrastructure</a><br>
2161   Last modified: $Date$
2162 </address>
2163 </body>
2164 </html>