1 ==============================================
2 Kaleidoscope: Adding JIT and Optimizer Support
3 ==============================================
11 Welcome to Chapter 4 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. Chapters 1-3 described the implementation
13 of a simple language and added support for generating LLVM IR. This
14 chapter describes two new techniques: adding optimizer support to your
15 language, and adding JIT compiler support. These additions will
demonstrate how to get nice, efficient code for the Kaleidoscope language.
19 Trivial Constant Folding
20 ========================
22 Our demonstration for Chapter 3 is elegant and easy to extend.
23 Unfortunately, it does not produce wonderful code. The IRBuilder,
24 however, does give us obvious optimizations when compiling simple code:
28 ready> def test(x) 1+2+x;
29 Read function definition:
30 define double @test(double %x) {
32 %addtmp = fadd double 3.000000e+00, %x
This code is not a literal transcription of the AST built by parsing the input. That would be:
41 ready> def test(x) 1+2+x;
42 Read function definition:
43 define double @test(double %x) {
45 %addtmp = fadd double 2.000000e+00, 1.000000e+00
46 %addtmp1 = fadd double %addtmp, %x
Constant folding, as seen above, is a very common and
51 very important optimization: so much so that many language implementors
52 implement constant folding support in their AST representation.
54 With LLVM, you don't need this support in the AST. Since all calls to
build LLVM IR go through the LLVM IR builder, the builder itself checks
to see if there is a constant folding opportunity when you call it. If
so, it just does the constant fold and returns the constant instead of
creating an instruction.
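As a minimal sketch (not part of the tutorial's listing, and assuming the
tutorial's ``TheContext`` and ``Builder`` globals), you can watch the fold
happen directly:

.. code-block:: c++

    // Sketch only: with IRBuilder's default constant folder, "adding" two
    // constant operands yields a folded ConstantFP, not an fadd instruction.
    Value *L = ConstantFP::get(TheContext, APFloat(1.0));
    Value *R = ConstantFP::get(TheContext, APFloat(2.0));
    Value *Sum = Builder.CreateFAdd(L, R, "addtmp");
    assert(isa<ConstantFP>(Sum) && "expected the add to be folded to 3.0");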
60 Well, that was easy :). In practice, we recommend always using
61 ``IRBuilder`` when generating code like this. It has no "syntactic
62 overhead" for its use (you don't have to uglify your compiler with
63 constant checks everywhere) and it can dramatically reduce the amount of
LLVM IR that is generated in some cases (particularly for languages with a
65 macro preprocessor or that use a lot of constants).
67 On the other hand, the ``IRBuilder`` is limited by the fact that it does
68 all of its analysis inline with the code as it is built. If you take a
69 slightly more complex example:
73 ready> def test(x) (1+2+x)*(x+(1+2));
74 ready> Read function definition:
75 define double @test(double %x) {
77 %addtmp = fadd double 3.000000e+00, %x
78 %addtmp1 = fadd double %x, 3.000000e+00
79 %multmp = fmul double %addtmp, %addtmp1
83 In this case, the LHS and RHS of the multiplication are the same value.
84 We'd really like to see this generate "``tmp = x+3; result = tmp*tmp;``"
85 instead of computing "``x+3``" twice.
87 Unfortunately, no amount of local analysis will be able to detect and
88 correct this. This requires two transformations: reassociation of
expressions (to make the adds lexically identical) and Common
90 Subexpression Elimination (CSE) to delete the redundant add instruction.
91 Fortunately, LLVM provides a broad range of optimizations that you can
92 use, in the form of "passes".
94 LLVM Optimization Passes
95 ========================
Due to the transition to the new PassManager infrastructure, this tutorial
100 is based on ``llvm::legacy::FunctionPassManager`` which can be found in
101 `LegacyPassManager.h <https://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html>`_.
For the purposes of this tutorial, the above should be used until
the pass manager transition is complete.
105 LLVM provides many optimization passes, which do many different sorts of
106 things and have different tradeoffs. Unlike other systems, LLVM doesn't
107 hold to the mistaken notion that one set of optimizations is right for
108 all languages and for all situations. LLVM allows a compiler implementor
109 to make complete decisions about what optimizations to use, in which
110 order, and in what situation.
112 As a concrete example, LLVM supports both "whole module" passes, which
look across as large a body of code as they can (often a whole file,
114 but if run at link time, this can be a substantial portion of the whole
115 program). It also supports and includes "per-function" passes which just
116 operate on a single function at a time, without looking at other
117 functions. For more information on passes and how they are run, see the
118 `How to Write a Pass <../../WritingAnLLVMPass.html>`_ document and the
119 `List of LLVM Passes <../../Passes.html>`_.
121 For Kaleidoscope, we are currently generating functions on the fly, one
122 at a time, as the user types them in. We aren't shooting for the
123 ultimate optimization experience in this setting, but we also want to
124 catch the easy and quick stuff where possible. As such, we will choose
125 to run a few per-function optimizations as the user types the function
126 in. If we wanted to make a "static Kaleidoscope compiler", we would use
127 exactly the code we have now, except that we would defer running the
128 optimizer until the entire file has been parsed.
130 In order to get per-function optimizations going, we need to set up a
`FunctionPassManager <../../WritingAnLLVMPass.html#what-passmanager-does>`_ to hold
132 and organize the LLVM optimizations that we want to run. Once we have
133 that, we can add a set of optimizations to run. We'll need a new
134 FunctionPassManager for each module that we want to optimize, so we'll
write a function to create and initialize both the module and pass manager for us:
140 void InitializeModuleAndPassManager(void) {
141 // Open a new module.
142 TheModule = std::make_unique<Module>("my cool jit", TheContext);
144 // Create a new pass manager attached to it.
145 TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());
147 // Do simple "peephole" optimizations and bit-twiddling optzns.
148 TheFPM->add(createInstructionCombiningPass());
149 // Reassociate expressions.
150 TheFPM->add(createReassociatePass());
151 // Eliminate Common SubExpressions.
152 TheFPM->add(createGVNPass());
153 // Simplify the control flow graph (deleting unreachable blocks, etc).
154 TheFPM->add(createCFGSimplificationPass());
156 TheFPM->doInitialization();
159 This code initializes the global module ``TheModule``, and the function pass
160 manager ``TheFPM``, which is attached to ``TheModule``. Once the pass manager is
161 set up, we use a series of "add" calls to add a bunch of LLVM passes.
163 In this case, we choose to add four optimization passes.
164 The passes we choose here are a pretty standard set
165 of "cleanup" optimizations that are useful for a wide variety of code. I won't
166 delve into what they do but, believe me, they are a good starting place :).
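If you're following along in your own copy of toy.cpp, these pass-creation
functions (and the legacy ``FunctionPassManager`` itself) come from a handful of
headers, roughly as follows; exact paths and availability can vary a little
between LLVM versions:

.. code-block:: c++

    #include "llvm/IR/LegacyPassManager.h"               // legacy::FunctionPassManager
    #include "llvm/Transforms/InstCombine/InstCombine.h" // createInstructionCombiningPass
    #include "llvm/Transforms/Scalar.h"                  // createReassociatePass, createCFGSimplificationPass
    #include "llvm/Transforms/Scalar/GVN.h"              // createGVNPass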
168 Once the PassManager is set up, we need to make use of it. We do this by
169 running it after our newly created function is constructed (in
170 ``FunctionAST::codegen()``), but before it is returned to the client:
174 if (Value *RetVal = Body->codegen()) {
175 // Finish off the function.
176 Builder.CreateRet(RetVal);
178 // Validate the generated code, checking for consistency.
179 verifyFunction(*TheFunction);
181 // Optimize the function.
182 TheFPM->run(*TheFunction);
187 As you can see, this is pretty straightforward. The
188 ``FunctionPassManager`` optimizes and updates the LLVM Function\* in
189 place, improving (hopefully) its body. With this in place, we can try
190 our test above again:
194 ready> def test(x) (1+2+x)*(x+(1+2));
195 ready> Read function definition:
196 define double @test(double %x) {
198 %addtmp = fadd double %x, 3.000000e+00
199 %multmp = fmul double %addtmp, %addtmp
203 As expected, we now get our nicely optimized code, saving a floating
204 point add instruction from every execution of this function.
206 LLVM provides a wide variety of optimizations that can be used in
207 certain circumstances. Some `documentation about the various
208 passes <../../Passes.html>`_ is available, but it isn't very complete.
209 Another good source of ideas can come from looking at the passes that
210 ``Clang`` runs to get started. The "``opt``" tool allows you to
experiment with passes from the command line, so you can see if they do
anything.

Now that we have reasonable code coming out of our front-end, let's talk
about executing it!
217 Adding a JIT Compiler
218 =====================
220 Code that is available in LLVM IR can have a wide variety of tools
221 applied to it. For example, you can run optimizations on it (as we did
222 above), you can dump it out in textual or binary forms, you can compile
223 the code to an assembly file (.s) for some target, or you can JIT
224 compile it. The nice thing about the LLVM IR representation is that it
225 is the "common currency" between many different parts of the compiler.
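For example, dumping a module's textual IR is a one-liner; a tiny sketch
assuming the tutorial's ``TheModule`` global:

.. code-block:: c++

    // Print the module's textual LLVM IR to stderr (the second argument is an
    // optional annotation writer, which we don't need here).
    TheModule->print(errs(), nullptr);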
227 In this section, we'll add JIT compiler support to our interpreter. The
228 basic idea that we want for Kaleidoscope is to have the user enter
229 function bodies as they do now, but immediately evaluate the top-level
230 expressions they type in. For example, if they type in "1 + 2;", we
231 should evaluate and print out 3. If they define a function, they should
232 be able to call it from the command line.
234 In order to do this, we first prepare the environment to create code for
235 the current native target and declare and initialize the JIT. This is
236 done by calling some ``InitializeNativeTarget\*`` functions and
adding a global variable ``TheJIT``, and initializing it in ``main``:
242 static std::unique_ptr<KaleidoscopeJIT> TheJIT;
245 InitializeNativeTarget();
246 InitializeNativeTargetAsmPrinter();
247 InitializeNativeTargetAsmParser();
249 // Install standard binary operators.
250 // 1 is lowest precedence.
251 BinopPrecedence['<'] = 10;
252 BinopPrecedence['+'] = 20;
253 BinopPrecedence['-'] = 20;
254 BinopPrecedence['*'] = 40; // highest.
256 // Prime the first token.
257 fprintf(stderr, "ready> ");
260 TheJIT = std::make_unique<KaleidoscopeJIT>();
262 // Run the main "interpreter loop" now.
We also need to set up the data layout for the JIT:
272 void InitializeModuleAndPassManager(void) {
273 // Open a new module.
274 TheModule = std::make_unique<Module>("my cool jit", TheContext);
275 TheModule->setDataLayout(TheJIT->getTargetMachine().createDataLayout());
277 // Create a new pass manager attached to it.
278 TheFPM = std::make_unique<legacy::FunctionPassManager>(TheModule.get());
281 The KaleidoscopeJIT class is a simple JIT built specifically for these
282 tutorials, available inside the LLVM source code
283 at llvm-src/examples/Kaleidoscope/include/KaleidoscopeJIT.h.
284 In later chapters we will look at how it works and extend it with
285 new features, but for now we will take it as given. Its API is very simple:
286 ``addModule`` adds an LLVM IR module to the JIT, making its functions
287 available for execution; ``removeModule`` removes a module, freeing any
288 memory associated with the code in that module; and ``findSymbol`` allows us
289 to look up pointers to the compiled code.
We can take this simple API and change our code that parses top-level expressions to look like this:
296 static void HandleTopLevelExpression() {
297 // Evaluate a top-level expression into an anonymous function.
298 if (auto FnAST = ParseTopLevelExpr()) {
299 if (FnAST->codegen()) {
301 // JIT the module containing the anonymous expression, keeping a handle so
302 // we can free it later.
303 auto H = TheJIT->addModule(std::move(TheModule));
304 InitializeModuleAndPassManager();
306 // Search the JIT for the __anon_expr symbol.
307 auto ExprSymbol = TheJIT->findSymbol("__anon_expr");
308 assert(ExprSymbol && "Function not found");
310 // Get the symbol's address and cast it to the right type (takes no
311 // arguments, returns a double) so we can call it as a native function.
312 double (*FP)() = (double (*)())(intptr_t)ExprSymbol.getAddress();
313 fprintf(stderr, "Evaluated to %f\n", FP());
315 // Delete the anonymous expression module from the JIT.
316 TheJIT->removeModule(H);
319 If parsing and codegen succeed, the next step is to add the module containing
320 the top-level expression to the JIT. We do this by calling addModule, which
321 triggers code generation for all the functions in the module, and returns a
322 handle that can be used to remove the module from the JIT later. Once the module
323 has been added to the JIT it can no longer be modified, so we also open a new
324 module to hold subsequent code by calling ``InitializeModuleAndPassManager()``.
326 Once we've added the module to the JIT we need to get a pointer to the final
327 generated code. We do this by calling the JIT's findSymbol method, and passing
328 the name of the top-level expression function: ``__anon_expr``. Since we just
329 added this function, we assert that findSymbol returned a result.
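(If you're wondering where that name comes from: ``ParseTopLevelExpr()`` wraps
the expression in an anonymous nullary function whose prototype is given the
fixed name ``__anon_expr``, roughly like this:)

.. code-block:: c++

    /// toplevelexpr ::= expression
    static std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
      if (auto E = ParseExpression()) {
        // Make an anonymous proto named "__anon_expr".
        auto Proto = std::make_unique<PrototypeAST>("__anon_expr",
                                                    std::vector<std::string>());
        return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
      }
      return nullptr;
    }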
331 Next, we get the in-memory address of the ``__anon_expr`` function by calling
332 ``getAddress()`` on the symbol. Recall that we compile top-level expressions
333 into a self-contained LLVM function that takes no arguments and returns the
334 computed double. Because the LLVM JIT compiler matches the native platform ABI,
335 this means that you can just cast the result pointer to a function pointer of
that type and call it directly. This means there is no difference between JIT-compiled
code and native machine code that is statically linked into your application.
340 Finally, since we don't support re-evaluation of top-level expressions, we
341 remove the module from the JIT when we're done to free the associated memory.
342 Recall, however, that the module we created a few lines earlier (via
``InitializeModuleAndPassManager``) is still open and waiting for new code to be added.
346 With just these two changes, let's see how Kaleidoscope works now!
351 Read top-level expression:
354 ret double 9.000000e+00
357 Evaluated to 9.000000
359 Well this looks like it is basically working. The dump of the function
360 shows the "no argument function that always returns double" that we
361 synthesize for each top-level expression that is typed in. This
362 demonstrates very basic functionality, but can we do more?
366 ready> def testfunc(x y) x + y*2;
367 Read function definition:
368 define double @testfunc(double %x, double %y) {
370 %multmp = fmul double %y, 2.000000e+00
371 %addtmp = fadd double %multmp, %x
375 ready> testfunc(4, 10);
376 Read top-level expression:
379 %calltmp = call double @testfunc(double 4.000000e+00, double 1.000000e+01)
383 Evaluated to 24.000000
385 ready> testfunc(5, 10);
386 ready> LLVM ERROR: Program used external function 'testfunc' which could not be resolved!
389 Function definitions and calls also work, but something went very wrong on that
390 last line. The call looks valid, so what happened? As you may have guessed from
the API, a Module is a unit of allocation for the JIT, and ``testfunc`` was part
of the same module that contained the anonymous expression. When we removed that
393 module from the JIT to free the memory for the anonymous expression, we deleted
394 the definition of ``testfunc`` along with it. Then, when we tried to call
395 testfunc a second time, the JIT could no longer find it.
397 The easiest way to fix this is to put the anonymous expression in a separate
398 module from the rest of the function definitions. The JIT will happily resolve
399 function calls across module boundaries, as long as each of the functions called
400 has a prototype, and is added to the JIT before it is called. By putting the
401 anonymous expression in a different module we can delete it without affecting
402 the rest of the functions.
404 In fact, we're going to go a step further and put every function in its own
405 module. Doing so allows us to exploit a useful property of the KaleidoscopeJIT
406 that will make our environment more REPL-like: Functions can be added to the
407 JIT more than once (unlike a module where every function must have a unique
408 definition). When you look up a symbol in KaleidoscopeJIT it will always return
409 the most recent definition:
413 ready> def foo(x) x + 1;
414 Read function definition:
415 define double @foo(double %x) {
417 %addtmp = fadd double %x, 1.000000e+00
422 Evaluated to 3.000000
424 ready> def foo(x) x + 2;
425 define double @foo(double %x) {
427 %addtmp = fadd double %x, 2.000000e+00
432 Evaluated to 4.000000
435 To allow each function to live in its own module we'll need a way to
436 re-generate previous function declarations into each new module we open:
440 static std::unique_ptr<KaleidoscopeJIT> TheJIT;
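// Holds the most recent prototype for each function; declared alongside TheJIT
// in the chapter's full code listing (it is what getFunction() below consults).
static std::map<std::string, std::unique_ptr<PrototypeAST>> FunctionProtos;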
444 Function *getFunction(std::string Name) {
445 // First, see if the function has already been added to the current module.
  if (auto *F = TheModule->getFunction(Name))
    return F;

  // If not, check whether we can codegen the declaration from some existing
  // prototype.
451 auto FI = FunctionProtos.find(Name);
452 if (FI != FunctionProtos.end())
453 return FI->second->codegen();
  // If no existing prototype exists, return null.
  return nullptr;
}
461 Value *CallExprAST::codegen() {
462 // Look up the name in the global module table.
463 Function *CalleeF = getFunction(Callee);
467 Function *FunctionAST::codegen() {
468 // Transfer ownership of the prototype to the FunctionProtos map, but keep a
469 // reference to it for use below.
  auto &P = *Proto;
  FunctionProtos[Proto->getName()] = std::move(Proto);
472 Function *TheFunction = getFunction(P.getName());
477 To enable this, we'll start by adding a new global, ``FunctionProtos``, that
478 holds the most recent prototype for each function. We'll also add a convenience
479 method, ``getFunction()``, to replace calls to ``TheModule->getFunction()``.
480 Our convenience method searches ``TheModule`` for an existing function
481 declaration, falling back to generating a new declaration from FunctionProtos if
482 it doesn't find one. In ``CallExprAST::codegen()`` we just need to replace the
483 call to ``TheModule->getFunction()``. In ``FunctionAST::codegen()`` we need to
484 update the FunctionProtos map first, then call ``getFunction()``. With this
485 done, we can always obtain a function declaration in the current module for any
486 previously declared function.
488 We also need to update HandleDefinition and HandleExtern:
492 static void HandleDefinition() {
493 if (auto FnAST = ParseDefinition()) {
494 if (auto *FnIR = FnAST->codegen()) {
495 fprintf(stderr, "Read function definition:");
497 fprintf(stderr, "\n");
498 TheJIT->addModule(std::move(TheModule));
499 InitializeModuleAndPassManager();
502 // Skip token for error recovery.
507 static void HandleExtern() {
508 if (auto ProtoAST = ParseExtern()) {
509 if (auto *FnIR = ProtoAST->codegen()) {
510 fprintf(stderr, "Read extern: ");
512 fprintf(stderr, "\n");
513 FunctionProtos[ProtoAST->getName()] = std::move(ProtoAST);
516 // Skip token for error recovery.
521 In HandleDefinition, we add two lines to transfer the newly defined function to
522 the JIT and open a new module. In HandleExtern, we just need to add one line to
523 add the prototype to FunctionProtos.
525 With these changes made, let's try our REPL again (I removed the dump of the
526 anonymous functions this time, you should get the idea by now :) :
530 ready> def foo(x) x + 1;
532 Evaluated to 3.000000
534 ready> def foo(x) x + 2;
536 Evaluated to 4.000000
Even with this simple code, we get some surprisingly powerful capabilities - check this out:
545 ready> extern sin(x);
547 declare double @sin(double)
549 ready> extern cos(x);
551 declare double @cos(double)
554 Read top-level expression:
557 ret double 0x3FEAED548F090CEE
560 Evaluated to 0.841471
562 ready> def foo(x) sin(x)*sin(x) + cos(x)*cos(x);
563 Read function definition:
564 define double @foo(double %x) {
566 %calltmp = call double @sin(double %x)
567 %multmp = fmul double %calltmp, %calltmp
568 %calltmp2 = call double @cos(double %x)
569 %multmp4 = fmul double %calltmp2, %calltmp2
570 %addtmp = fadd double %multmp, %multmp4
575 Read top-level expression:
578 %calltmp = call double @foo(double 4.000000e+00)
582 Evaluated to 1.000000
584 Whoa, how does the JIT know about sin and cos? The answer is surprisingly
585 simple: The KaleidoscopeJIT has a straightforward symbol resolution rule that
586 it uses to find symbols that aren't available in any given module: First
587 it searches all the modules that have already been added to the JIT, from the
588 most recent to the oldest, to find the newest definition. If no definition is
589 found inside the JIT, it falls back to calling "``dlsym("sin")``" on the
590 Kaleidoscope process itself. Since "``sin``" is defined within the JIT's
591 address space, it simply patches up calls in the module to call the libm
592 version of ``sin`` directly. But in some cases this even goes further:
593 as sin and cos are names of standard math functions, the constant folder
594 will directly evaluate the function calls to the correct result when called
595 with constants like in the "``sin(1.0)``" above.
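If you're curious what that in-process fallback looks like mechanically, it is
essentially the "``dlsym`` on the running process" trick. Here is a hypothetical
sketch using LLVM's ``sys::DynamicLibrary`` helpers (not the tutorial's own code,
just an illustration of the idea):

.. code-block:: c++

    #include "llvm/Support/DynamicLibrary.h"

    // Make the host process's own symbols searchable, then resolve a name much
    // the same way dlsym("sin") would.
    void *lookupInProcess(const char *Name) {
      llvm::sys::DynamicLibrary::LoadLibraryPermanently(nullptr); // nullptr = this process
      return llvm::sys::DynamicLibrary::SearchForAddressOfSymbol(Name);
    }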
597 In the future we'll see how tweaking this symbol resolution rule can be used to
598 enable all sorts of useful features, from security (restricting the set of
599 symbols available to JIT'd code), to dynamic code generation based on symbol
600 names, and even lazy compilation.
602 One immediate benefit of the symbol resolution rule is that we can now extend
603 the language by writing arbitrary C++ code to implement operations. For example,
#ifdef _WIN32
#define DLLEXPORT __declspec(dllexport)
#else
#define DLLEXPORT
#endif
614 /// putchard - putchar that takes a double and returns 0.
615 extern "C" DLLEXPORT double putchard(double X) {
  fputc((char)X, stderr);
  return 0;
}
Note that for Windows we need to actually export the functions because
621 the dynamic symbol loader will use GetProcAddress to find the symbols.
623 Now we can produce simple output to the console by using things like:
624 "``extern putchard(x); putchard(120);``", which prints a lowercase 'x'
625 on the console (120 is the ASCII code for 'x'). Similar code could be
used to implement file I/O, console input, and many other capabilities in Kaleidoscope.
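For instance, the chapter's full code listing defines a ``printd`` helper in
exactly the same style:

.. code-block:: c++

    /// printd - printf that takes a double and prints it as "%f\n", returning 0.
    extern "C" DLLEXPORT double printd(double X) {
      fprintf(stderr, "%f\n", X);
      return 0;
    }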
629 This completes the JIT and optimizer chapter of the Kaleidoscope
630 tutorial. At this point, we can compile a non-Turing-complete
631 programming language, optimize and JIT compile it in a user-driven way.
632 Next up we'll look into `extending the language with control flow
constructs <LangImpl05.html>`_, tackling some interesting LLVM IR issues along the way.
639 Here is the complete code listing for our running example, enhanced with
640 the LLVM JIT and optimizer. To build this example, use:
645 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
649 If you are compiling this on Linux, make sure to add the "-rdynamic"
option as well. This makes sure that the external functions are resolved properly at runtime.
655 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter4/toy.cpp
658 `Next: Extending the language: control flow <LangImpl05.html>`_