1 ========================================
2 Kaleidoscope: Code generation to LLVM IR
3 ========================================
11 Welcome to Chapter 3 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. This chapter shows you how to transform
13 the `Abstract Syntax Tree <OCamlLangImpl2.html>`_, built in Chapter 2,
14 into LLVM IR. This will teach you a little bit about how LLVM does
15 things, as well as demonstrate how easy it is to use. It's much more
16 work to build a lexer and parser than it is to generate LLVM IR code. :)
18 **Please note**: the code in this chapter and later require LLVM 2.3 or
19 LLVM SVN to work. LLVM 2.2 and before will not work with it.
24 In order to generate LLVM IR, we want some simple setup to get started.
25 First we define virtual code generation (codegen) methods in each AST
30 let rec codegen_expr = function
32 | Ast.Variable name -> ...
34 The ``Codegen.codegen_expr`` function says to emit IR for that AST node
35 along with all the things it depends on, and they all return an LLVM
36 Value object. "Value" is the class used to represent a "`Static Single
38 (SSA) <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
39 register" or "SSA value" in LLVM. The most distinct aspect of SSA values
40 is that their value is computed as the related instruction executes, and
41 it does not get a new value until (and if) the instruction re-executes.
42 In other words, there is no way to "change" an SSA value. For more
43 information, please read up on `Static Single
44 Assignment <http://en.wikipedia.org/wiki/Static_single_assignment_form>`_
45 - the concepts are really quite natural once you grok them.
47 The second thing we want is an "Error" exception like we used for the
48 parser, which will be used to report errors found during code generation
49 (for example, use of an undeclared parameter):
53 exception Error of string
55 let context = global_context ()
56 let the_module = create_module context "my cool jit"
57 let builder = builder context
58 let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
59 let double_type = double_type context
61 The static variables will be used during code generation.
62 ``Codgen.the_module`` is the LLVM construct that contains all of the
63 functions and global variables in a chunk of code. In many ways, it is
64 the top-level structure that the LLVM IR uses to contain code.
66 The ``Codegen.builder`` object is a helper object that makes it easy to
67 generate LLVM instructions. Instances of the
68 `IRBuilder <http://llvm.org/doxygen/IRBuilder_8h-source.html>`_
69 class keep track of the current place to insert instructions and has
70 methods to create new instructions.
72 The ``Codegen.named_values`` map keeps track of which values are defined
73 in the current scope and what their LLVM representation is. (In other
74 words, it is a symbol table for the code). In this form of Kaleidoscope,
75 the only things that can be referenced are function parameters. As such,
76 function parameters will be in this map when generating code for their
79 With these basics in place, we can start talking about how to generate
80 code for each expression. Note that this assumes that the
81 ``Codgen.builder`` has been set up to generate code *into* something.
82 For now, we'll assume that this has already been done, and we'll just
85 Expression Code Generation
86 ==========================
88 Generating LLVM code for expression nodes is very straightforward: less
89 than 30 lines of commented code for all four of our expression nodes.
90 First we'll do numeric literals:
94 | Ast.Number n -> const_float double_type n
96 In the LLVM IR, numeric constants are represented with the
97 ``ConstantFP`` class, which holds the numeric value in an ``APFloat``
98 internally (``APFloat`` has the capability of holding floating point
99 constants of Arbitrary Precision). This code basically just creates
100 and returns a ``ConstantFP``. Note that in the LLVM IR that constants
101 are all uniqued together and shared. For this reason, the API uses "the
102 foo::get(..)" idiom instead of "new foo(..)" or "foo::Create(..)".
104 .. code-block:: ocaml
106 | Ast.Variable name ->
107 (try Hashtbl.find named_values name with
108 | Not_found -> raise (Error "unknown variable name"))
110 References to variables are also quite simple using LLVM. In the simple
111 version of Kaleidoscope, we assume that the variable has already been
112 emitted somewhere and its value is available. In practice, the only
113 values that can be in the ``Codegen.named_values`` map are function
114 arguments. This code simply checks to see that the specified name is in
115 the map (if not, an unknown variable is being referenced) and returns
116 the value for it. In future chapters, we'll add support for `loop
117 induction variables <LangImpl5.html#for-loop-expression>`_ in the symbol table, and for
118 `local variables <LangImpl7.html#user-defined-local-variables>`_.
120 .. code-block:: ocaml
122 | Ast.Binary (op, lhs, rhs) ->
123 let lhs_val = codegen_expr lhs in
124 let rhs_val = codegen_expr rhs in
127 | '+' -> build_fadd lhs_val rhs_val "addtmp" builder
128 | '-' -> build_fsub lhs_val rhs_val "subtmp" builder
129 | '*' -> build_fmul lhs_val rhs_val "multmp" builder
131 (* Convert bool 0/1 to double 0.0 or 1.0 *)
132 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
133 build_uitofp i double_type "booltmp" builder
134 | _ -> raise (Error "invalid binary operator")
137 Binary operators start to get more interesting. The basic idea here is
138 that we recursively emit code for the left-hand side of the expression,
139 then the right-hand side, then we compute the result of the binary
140 expression. In this code, we do a simple switch on the opcode to create
141 the right LLVM instruction.
143 In the example above, the LLVM builder class is starting to show its
144 value. IRBuilder knows where to insert the newly created instruction,
145 all you have to do is specify what instruction to create (e.g. with
146 ``Llvm.create_add``), which operands to use (``lhs`` and ``rhs`` here)
147 and optionally provide a name for the generated instruction.
149 One nice thing about LLVM is that the name is just a hint. For instance,
150 if the code above emits multiple "addtmp" variables, LLVM will
151 automatically provide each one with an increasing, unique numeric
152 suffix. Local value names for instructions are purely optional, but it
153 makes it much easier to read the IR dumps.
155 `LLVM instructions <../LangRef.html#instruction-reference>`_ are constrained by strict
156 rules: for example, the Left and Right operators of an `add
157 instruction <../LangRef.html#add-instruction>`_ must have the same type, and the
158 result type of the add must match the operand types. Because all values
159 in Kaleidoscope are doubles, this makes for very simple code for add,
162 On the other hand, LLVM specifies that the `fcmp
163 instruction <../LangRef.html#fcmp-instruction>`_ always returns an 'i1' value (a
164 one bit integer). The problem with this is that Kaleidoscope wants the
165 value to be a 0.0 or 1.0 value. In order to get these semantics, we
166 combine the fcmp instruction with a `uitofp
167 instruction <../LangRef.html#uitofp-to-instruction>`_. This instruction converts its
168 input integer into a floating point value by treating the input as an
169 unsigned value. In contrast, if we used the `sitofp
170 instruction <../LangRef.html#sitofp-to-instruction>`_, the Kaleidoscope '<' operator
171 would return 0.0 and -1.0, depending on the input value.
173 .. code-block:: ocaml
175 | Ast.Call (callee, args) ->
176 (* Look up the name in the module table. *)
178 match lookup_function callee the_module with
179 | Some callee -> callee
180 | None -> raise (Error "unknown function referenced")
182 let params = params callee in
184 (* If argument mismatch error. *)
185 if Array.length params == Array.length args then () else
186 raise (Error "incorrect # arguments passed");
187 let args = Array.map codegen_expr args in
188 build_call callee args "calltmp" builder
190 Code generation for function calls is quite straightforward with LLVM.
191 The code above initially does a function name lookup in the LLVM
192 Module's symbol table. Recall that the LLVM Module is the container that
193 holds all of the functions we are JIT'ing. By giving each function the
194 same name as what the user specifies, we can use the LLVM symbol table
195 to resolve function names for us.
197 Once we have the function to call, we recursively codegen each argument
198 that is to be passed in, and create an LLVM `call
199 instruction <../LangRef.html#call-instruction>`_. Note that LLVM uses the native C
200 calling conventions by default, allowing these calls to also call into
201 standard library functions like "sin" and "cos", with no additional
204 This wraps up our handling of the four basic expressions that we have so
205 far in Kaleidoscope. Feel free to go in and add some more. For example,
206 by browsing the `LLVM language reference <../LangRef.html>`_ you'll find
207 several other interesting instructions that are really easy to plug into
210 Function Code Generation
211 ========================
213 Code generation for prototypes and functions must handle a number of
214 details, which make their code less beautiful than expression code
215 generation, but allows us to illustrate some important points. First,
216 lets talk about code generation for prototypes: they are used both for
217 function bodies and external function declarations. The code starts
220 .. code-block:: ocaml
222 let codegen_proto = function
223 | Ast.Prototype (name, args) ->
224 (* Make the function type: double(double,double) etc. *)
225 let doubles = Array.make (Array.length args) double_type in
226 let ft = function_type double_type doubles in
228 match lookup_function name the_module with
230 This code packs a lot of power into a few lines. Note first that this
231 function returns a "Function\*" instead of a "Value\*" (although at the
232 moment they both are modeled by ``llvalue`` in ocaml). Because a
233 "prototype" really talks about the external interface for a function
234 (not the value computed by an expression), it makes sense for it to
235 return the LLVM Function it corresponds to when codegen'd.
237 The call to ``Llvm.function_type`` creates the ``Llvm.llvalue`` that
238 should be used for a given Prototype. Since all function arguments in
239 Kaleidoscope are of type double, the first line creates a vector of "N"
240 LLVM double types. It then uses the ``Llvm.function_type`` method to
241 create a function type that takes "N" doubles as arguments, returns one
242 double as a result, and that is not vararg (that uses the function
243 ``Llvm.var_arg_function_type``). Note that Types in LLVM are uniqued
244 just like ``Constant``'s are, so you don't "new" a type, you "get" it.
246 The final line above checks if the function has already been defined in
247 ``Codegen.the_module``. If not, we will create it.
249 .. code-block:: ocaml
251 | None -> declare_function name ft the_module
253 This indicates the type and name to use, as well as which module to
254 insert into. By default we assume a function has
255 ``Llvm.Linkage.ExternalLinkage``. "`external
256 linkage <../LangRef.html#linkage>`_" means that the function may be defined
257 outside the current module and/or that it is callable by functions
258 outside the module. The "``name``" passed in is the name the user
259 specified: this name is registered in "``Codegen.the_module``"s symbol
260 table, which is used by the function call code above.
262 In Kaleidoscope, I choose to allow redefinitions of functions in two
263 cases: first, we want to allow 'extern'ing a function more than once, as
264 long as the prototypes for the externs match (since all arguments have
265 the same type, we just have to check that the number of arguments
266 match). Second, we want to allow 'extern'ing a function and then
267 defining a body for it. This is useful when defining mutually recursive
270 .. code-block:: ocaml
272 (* If 'f' conflicted, there was already something named 'name'. If it
273 * has a body, don't allow redefinition or reextern. *)
275 (* If 'f' already has a body, reject this. *)
276 if Array.length (basic_blocks f) == 0 then () else
277 raise (Error "redefinition of function");
279 (* If 'f' took a different number of arguments, reject. *)
280 if Array.length (params f) == Array.length args then () else
281 raise (Error "redefinition of function with different # args");
285 In order to verify the logic above, we first check to see if the
286 pre-existing function is "empty". In this case, empty means that it has
287 no basic blocks in it, which means it has no body. If it has no body, it
288 is a forward declaration. Since we don't allow anything after a full
289 definition of the function, the code rejects this case. If the previous
290 reference to a function was an 'extern', we simply verify that the
291 number of arguments for that definition and this one match up. If not,
294 .. code-block:: ocaml
296 (* Set names for all arguments. *)
297 Array.iteri (fun i a ->
300 Hashtbl.add named_values n a;
304 The last bit of code for prototypes loops over all of the arguments in
305 the function, setting the name of the LLVM Argument objects to match,
306 and registering the arguments in the ``Codegen.named_values`` map for
307 future use by the ``Ast.Variable`` variant. Once this is set up, it
308 returns the Function object to the caller. Note that we don't check for
309 conflicting argument names here (e.g. "extern foo(a b a)"). Doing so
310 would be very straight-forward with the mechanics we have already used
313 .. code-block:: ocaml
315 let codegen_func = function
316 | Ast.Function (proto, body) ->
317 Hashtbl.clear named_values;
318 let the_function = codegen_proto proto in
320 Code generation for function definitions starts out simply enough: we
321 just codegen the prototype (Proto) and verify that it is ok. We then
322 clear out the ``Codegen.named_values`` map to make sure that there isn't
323 anything in it from the last function we compiled. Code generation of
324 the prototype ensures that there is an LLVM Function object that is
327 .. code-block:: ocaml
329 (* Create a new basic block to start insertion into. *)
330 let bb = append_block context "entry" the_function in
331 position_at_end bb builder;
334 let ret_val = codegen_expr body in
336 Now we get to the point where the ``Codegen.builder`` is set up. The
337 first line creates a new `basic
338 block <http://en.wikipedia.org/wiki/Basic_block>`_ (named "entry"),
339 which is inserted into ``the_function``. The second line then tells the
340 builder that new instructions should be inserted into the end of the new
341 basic block. Basic blocks in LLVM are an important part of functions
342 that define the `Control Flow
343 Graph <http://en.wikipedia.org/wiki/Control_flow_graph>`_. Since we
344 don't have any control flow, our functions will only contain one block
345 at this point. We'll fix this in `Chapter 5 <OCamlLangImpl5.html>`_ :).
347 .. code-block:: ocaml
349 let ret_val = codegen_expr body in
351 (* Finish off the function. *)
352 let _ = build_ret ret_val builder in
354 (* Validate the generated code, checking for consistency. *)
355 Llvm_analysis.assert_valid_function the_function;
359 Once the insertion point is set up, we call the ``Codegen.codegen_func``
360 method for the root expression of the function. If no error happens,
361 this emits code to compute the expression into the entry block and
362 returns the value that was computed. Assuming no error, we then create
363 an LLVM `ret instruction <../LangRef.html#ret-instruction>`_, which completes the
364 function. Once the function is built, we call
365 ``Llvm_analysis.assert_valid_function``, which is provided by LLVM. This
366 function does a variety of consistency checks on the generated code, to
367 determine if our compiler is doing everything right. Using this is
368 important: it can catch a lot of bugs. Once the function is finished and
369 validated, we return it.
371 .. code-block:: ocaml
374 delete_function the_function;
377 The only piece left here is handling of the error case. For simplicity,
378 we handle this by merely deleting the function we produced with the
379 ``Llvm.delete_function`` method. This allows the user to redefine a
380 function that they incorrectly typed in before: if we didn't delete it,
381 it would live in the symbol table, with a body, preventing future
384 This code does have a bug, though. Since the ``Codegen.codegen_proto``
385 can return a previously defined forward declaration, our code can
386 actually delete a forward declaration. There are a number of ways to fix
387 this bug, see what you can come up with! Here is a testcase:
391 extern foo(a b); # ok, defines foo.
392 def foo(a b) c; # error, 'c' is invalid.
393 def bar() foo(1, 2); # error, unknown function "foo"
395 Driver Changes and Closing Thoughts
396 ===================================
398 For now, code generation to LLVM doesn't really get us much, except that
399 we can look at the pretty IR calls. The sample code inserts calls to
400 Codegen into the "``Toplevel.main_loop``", and then dumps out the LLVM
401 IR. This gives a nice way to look at the LLVM IR for simple functions.
407 Read top-level expression:
408 define double @""() {
410 %addtmp = fadd double 4.000000e+00, 5.000000e+00
414 Note how the parser turns the top-level expression into anonymous
415 functions for us. This will be handy when we add `JIT
416 support <OCamlLangImpl4.html#adding-a-jit-compiler>`_ in the next chapter. Also note that
417 the code is very literally transcribed, no optimizations are being
418 performed. We will `add
419 optimizations <OCamlLangImpl4.html#trivial-constant-folding>`_ explicitly in the
424 ready> def foo(a b) a*a + 2*a*b + b*b;
425 Read function definition:
426 define double @foo(double %a, double %b) {
428 %multmp = fmul double %a, %a
429 %multmp1 = fmul double 2.000000e+00, %a
430 %multmp2 = fmul double %multmp1, %b
431 %addtmp = fadd double %multmp, %multmp2
432 %multmp3 = fmul double %b, %b
433 %addtmp4 = fadd double %addtmp, %multmp3
437 This shows some simple arithmetic. Notice the striking similarity to the
438 LLVM builder calls that we use to create the instructions.
442 ready> def bar(a) foo(a, 4.0) + bar(31337);
443 Read function definition:
444 define double @bar(double %a) {
446 %calltmp = call double @foo(double %a, double 4.000000e+00)
447 %calltmp1 = call double @bar(double 3.133700e+04)
448 %addtmp = fadd double %calltmp, %calltmp1
452 This shows some function calls. Note that this function will take a long
453 time to execute if you call it. In the future we'll add conditional
454 control flow to actually make recursion useful :).
458 ready> extern cos(x);
460 declare double @cos(double)
463 Read top-level expression:
464 define double @""() {
466 %calltmp = call double @cos(double 1.234000e+00)
470 This shows an extern for the libm "cos" function, and a call to it.
475 ; ModuleID = 'my cool jit'
477 define double @""() {
479 %addtmp = fadd double 4.000000e+00, 5.000000e+00
483 define double @foo(double %a, double %b) {
485 %multmp = fmul double %a, %a
486 %multmp1 = fmul double 2.000000e+00, %a
487 %multmp2 = fmul double %multmp1, %b
488 %addtmp = fadd double %multmp, %multmp2
489 %multmp3 = fmul double %b, %b
490 %addtmp4 = fadd double %addtmp, %multmp3
494 define double @bar(double %a) {
496 %calltmp = call double @foo(double %a, double 4.000000e+00)
497 %calltmp1 = call double @bar(double 3.133700e+04)
498 %addtmp = fadd double %calltmp, %calltmp1
502 declare double @cos(double)
504 define double @""() {
506 %calltmp = call double @cos(double 1.234000e+00)
510 When you quit the current demo, it dumps out the IR for the entire
511 module generated. Here you can see the big picture with all the
512 functions referencing each other.
514 This wraps up the third chapter of the Kaleidoscope tutorial. Up next,
515 we'll describe how to `add JIT codegen and optimizer
516 support <OCamlLangImpl4.html>`_ to this so we can actually start running
522 Here is the complete code listing for our running example, enhanced with
523 the LLVM code generator. Because this uses the LLVM libraries, we need
524 to link them in. To do this, we use the
525 `llvm-config <http://llvm.org/cmds/llvm-config.html>`_ tool to inform
526 our makefile/command line about which options to use:
540 <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
541 <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
544 .. code-block:: ocaml
546 open Ocamlbuild_plugin;;
548 ocaml_lib ~extern:true "llvm";;
549 ocaml_lib ~extern:true "llvm_analysis";;
551 flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"]);;
554 .. code-block:: ocaml
556 (*===----------------------------------------------------------------------===
558 *===----------------------------------------------------------------------===*)
560 (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
561 * these others for known things. *)
567 | Ident of string | Number of float
573 .. code-block:: ocaml
575 (*===----------------------------------------------------------------------===
577 *===----------------------------------------------------------------------===*)
580 (* Skip any whitespace. *)
581 | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
583 (* identifier: [a-zA-Z][a-zA-Z0-9] *)
584 | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
585 let buffer = Buffer.create 1 in
586 Buffer.add_char buffer c;
587 lex_ident buffer stream
589 (* number: [0-9.]+ *)
590 | [< ' ('0' .. '9' as c); stream >] ->
591 let buffer = Buffer.create 1 in
592 Buffer.add_char buffer c;
593 lex_number buffer stream
595 (* Comment until end of line. *)
596 | [< ' ('#'); stream >] ->
599 (* Otherwise, just return the character as its ascii value. *)
600 | [< 'c; stream >] ->
601 [< 'Token.Kwd c; lex stream >]
606 and lex_number buffer = parser
607 | [< ' ('0' .. '9' | '.' as c); stream >] ->
608 Buffer.add_char buffer c;
609 lex_number buffer stream
610 | [< stream=lex >] ->
611 [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
613 and lex_ident buffer = parser
614 | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
615 Buffer.add_char buffer c;
616 lex_ident buffer stream
617 | [< stream=lex >] ->
618 match Buffer.contents buffer with
619 | "def" -> [< 'Token.Def; stream >]
620 | "extern" -> [< 'Token.Extern; stream >]
621 | id -> [< 'Token.Ident id; stream >]
623 and lex_comment = parser
624 | [< ' ('\n'); stream=lex >] -> stream
625 | [< 'c; e=lex_comment >] -> e
629 .. code-block:: ocaml
631 (*===----------------------------------------------------------------------===
632 * Abstract Syntax Tree (aka Parse Tree)
633 *===----------------------------------------------------------------------===*)
635 (* expr - Base type for all expression nodes. *)
637 (* variant for numeric literals like "1.0". *)
640 (* variant for referencing a variable, like "a". *)
643 (* variant for a binary operator. *)
644 | Binary of char * expr * expr
646 (* variant for function calls. *)
647 | Call of string * expr array
649 (* proto - This type represents the "prototype" for a function, which captures
650 * its name, and its argument names (thus implicitly the number of arguments the
651 * function takes). *)
652 type proto = Prototype of string * string array
654 (* func - This type represents a function definition itself. *)
655 type func = Function of proto * expr
658 .. code-block:: ocaml
660 (*===---------------------------------------------------------------------===
662 *===---------------------------------------------------------------------===*)
664 (* binop_precedence - This holds the precedence for each binary operator that is
666 let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
668 (* precedence - Get the precedence of the pending binary operator token. *)
669 let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
675 let rec parse_primary = parser
676 (* numberexpr ::= number *)
677 | [< 'Token.Number n >] -> Ast.Number n
679 (* parenexpr ::= '(' expression ')' *)
680 | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
684 * ::= identifier '(' argumentexpr ')' *)
685 | [< 'Token.Ident id; stream >] ->
686 let rec parse_args accumulator = parser
687 | [< e=parse_expr; stream >] ->
689 | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
690 | [< >] -> e :: accumulator
692 | [< >] -> accumulator
694 let rec parse_ident id = parser
698 'Token.Kwd ')' ?? "expected ')'">] ->
699 Ast.Call (id, Array.of_list (List.rev args))
701 (* Simple variable ref. *)
702 | [< >] -> Ast.Variable id
704 parse_ident id stream
706 | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
709 * ::= ('+' primary)* *)
710 and parse_bin_rhs expr_prec lhs stream =
711 match Stream.peek stream with
712 (* If this is a binop, find its precedence. *)
713 | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
714 let token_prec = precedence c in
716 (* If this is a binop that binds at least as tightly as the current binop,
717 * consume it, otherwise we are done. *)
718 if token_prec < expr_prec then lhs else begin
722 (* Parse the primary expression after the binary operator. *)
723 let rhs = parse_primary stream in
725 (* Okay, we know this is a binop. *)
727 match Stream.peek stream with
728 | Some (Token.Kwd c2) ->
729 (* If BinOp binds less tightly with rhs than the operator after
730 * rhs, let the pending operator take rhs as its lhs. *)
731 let next_prec = precedence c2 in
732 if token_prec < next_prec
733 then parse_bin_rhs (token_prec + 1) rhs stream
739 let lhs = Ast.Binary (c, lhs, rhs) in
740 parse_bin_rhs expr_prec lhs stream
745 * ::= primary binoprhs *)
746 and parse_expr = parser
747 | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
750 * ::= id '(' id* ')' *)
751 let parse_prototype =
752 let rec parse_args accumulator = parser
753 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
754 | [< >] -> accumulator
758 | [< 'Token.Ident id;
759 'Token.Kwd '(' ?? "expected '(' in prototype";
761 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
763 Ast.Prototype (id, Array.of_list (List.rev args))
766 raise (Stream.Error "expected function name in prototype")
768 (* definition ::= 'def' prototype expression *)
769 let parse_definition = parser
770 | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
773 (* toplevelexpr ::= expression *)
774 let parse_toplevel = parser
775 | [< e=parse_expr >] ->
776 (* Make an anonymous proto. *)
777 Ast.Function (Ast.Prototype ("", [||]), e)
779 (* external ::= 'extern' prototype *)
780 let parse_extern = parser
781 | [< 'Token.Extern; e=parse_prototype >] -> e
784 .. code-block:: ocaml
786 (*===----------------------------------------------------------------------===
788 *===----------------------------------------------------------------------===*)
792 exception Error of string
794 let context = global_context ()
795 let the_module = create_module context "my cool jit"
796 let builder = builder context
797 let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
798 let double_type = double_type context
800 let rec codegen_expr = function
801 | Ast.Number n -> const_float double_type n
802 | Ast.Variable name ->
803 (try Hashtbl.find named_values name with
804 | Not_found -> raise (Error "unknown variable name"))
805 | Ast.Binary (op, lhs, rhs) ->
806 let lhs_val = codegen_expr lhs in
807 let rhs_val = codegen_expr rhs in
810 | '+' -> build_add lhs_val rhs_val "addtmp" builder
811 | '-' -> build_sub lhs_val rhs_val "subtmp" builder
812 | '*' -> build_mul lhs_val rhs_val "multmp" builder
814 (* Convert bool 0/1 to double 0.0 or 1.0 *)
815 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
816 build_uitofp i double_type "booltmp" builder
817 | _ -> raise (Error "invalid binary operator")
819 | Ast.Call (callee, args) ->
820 (* Look up the name in the module table. *)
822 match lookup_function callee the_module with
823 | Some callee -> callee
824 | None -> raise (Error "unknown function referenced")
826 let params = params callee in
828 (* If argument mismatch error. *)
829 if Array.length params == Array.length args then () else
830 raise (Error "incorrect # arguments passed");
831 let args = Array.map codegen_expr args in
832 build_call callee args "calltmp" builder
834 let codegen_proto = function
835 | Ast.Prototype (name, args) ->
836 (* Make the function type: double(double,double) etc. *)
837 let doubles = Array.make (Array.length args) double_type in
838 let ft = function_type double_type doubles in
840 match lookup_function name the_module with
841 | None -> declare_function name ft the_module
843 (* If 'f' conflicted, there was already something named 'name'. If it
844 * has a body, don't allow redefinition or reextern. *)
846 (* If 'f' already has a body, reject this. *)
847 if block_begin f <> At_end f then
848 raise (Error "redefinition of function");
850 (* If 'f' took a different number of arguments, reject. *)
851 if element_type (type_of f) <> ft then
852 raise (Error "redefinition of function with different # args");
856 (* Set names for all arguments. *)
857 Array.iteri (fun i a ->
860 Hashtbl.add named_values n a;
864 let codegen_func = function
865 | Ast.Function (proto, body) ->
866 Hashtbl.clear named_values;
867 let the_function = codegen_proto proto in
869 (* Create a new basic block to start insertion into. *)
870 let bb = append_block context "entry" the_function in
871 position_at_end bb builder;
874 let ret_val = codegen_expr body in
876 (* Finish off the function. *)
877 let _ = build_ret ret_val builder in
879 (* Validate the generated code, checking for consistency. *)
880 Llvm_analysis.assert_valid_function the_function;
884 delete_function the_function;
888 .. code-block:: ocaml
890 (*===----------------------------------------------------------------------===
891 * Top-Level parsing and JIT Driver
892 *===----------------------------------------------------------------------===*)
896 (* top ::= definition | external | expression | ';' *)
897 let rec main_loop stream =
898 match Stream.peek stream with
901 (* ignore top-level semicolons. *)
902 | Some (Token.Kwd ';') ->
910 let e = Parser.parse_definition stream in
911 print_endline "parsed a function definition.";
912 dump_value (Codegen.codegen_func e);
914 let e = Parser.parse_extern stream in
915 print_endline "parsed an extern.";
916 dump_value (Codegen.codegen_proto e);
918 (* Evaluate a top-level expression into an anonymous function. *)
919 let e = Parser.parse_toplevel stream in
920 print_endline "parsed a top-level expr";
921 dump_value (Codegen.codegen_func e);
922 with Stream.Error s | Codegen.Error s ->
923 (* Skip token for error recovery. *)
927 print_string "ready> "; flush stdout;
931 .. code-block:: ocaml
933 (*===----------------------------------------------------------------------===
935 *===----------------------------------------------------------------------===*)
940 (* Install standard binary operators.
941 * 1 is the lowest precedence. *)
942 Hashtbl.add Parser.binop_precedence '<' 10;
943 Hashtbl.add Parser.binop_precedence '+' 20;
944 Hashtbl.add Parser.binop_precedence '-' 20;
945 Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
947 (* Prime the first token. *)
948 print_string "ready> "; flush stdout;
949 let stream = Lexer.lex (Stream.of_channel stdin) in
951 (* Run the main "interpreter loop" now. *)
952 Toplevel.main_loop stream;
954 (* Print out all the generated code. *)
955 dump_module Codegen.the_module
960 `Next: Adding JIT and Optimizer Support <OCamlLangImpl4.html>`_