1 ============================================================
2 Kaleidoscope: Extending the Language: User-defined Operators
3 ============================================================
11 Welcome to Chapter 6 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13 have a fully functional language that is fairly minimal, but also
14 useful. There is still one big problem with it, however. Our language
15 doesn't have many useful operators (like division, logical negation, or
16 even any comparisons besides less-than).
18 This chapter of the tutorial takes a wild digression into adding
19 user-defined operators to the simple and beautiful Kaleidoscope
20 language. This digression now gives us a simple and ugly language in
21 some ways, but also a powerful one at the same time. One of the great
22 things about creating your own language is that you get to decide what
23 is good or bad. In this tutorial we'll assume that it is okay to use
24 this as a way to show some interesting parsing techniques.
26 At the end of this tutorial, we'll run through an example Kaleidoscope
27 application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an
28 example of what you can build with Kaleidoscope and its feature set.
30 User-defined Operators: the Idea
31 ================================
33 The "operator overloading" that we will add to Kaleidoscope is more
34 general than languages like C++. In C++, you are only allowed to
35 redefine existing operators: you can't programmatically change the
36 grammar, introduce new operators, change precedence levels, etc. In this
37 chapter, we will add this capability to Kaleidoscope, which will let the
38 user round out the set of operators that are supported.
40 The point of going into user-defined operators in a tutorial like this
41 is to show the power and flexibility of using a hand-written parser.
42 Thus far, the parser we have been implementing uses recursive descent
43 for most parts of the grammar and operator precedence parsing for the
44 expressions. See `Chapter 2 <OCamlLangImpl2.html>`_ for details. Without
45 using operator precedence parsing, it would be very difficult to allow
46 the programmer to introduce new operators into the grammar: the grammar
47 is dynamically extensible as the JIT runs.
49 The two specific features we'll add are programmable unary operators
50 (right now, Kaleidoscope has no unary operators at all) as well as
51 binary operators. An example of this is:
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
79 Many languages aspire to being able to implement their standard runtime
80 library in the language itself. In Kaleidoscope, we can implement
81 significant parts of the language in the library!
83 We will break down implementation of these features into two parts:
84 implementing support for user-defined binary operators and adding unary
87 User-defined Binary Operators
88 =============================
90 Adding support for user-defined binary operators is pretty simple with
91 our current framework. We'll first add support for the unary/binary
103 and lex_ident buffer = parser
105 | "for" -> [< 'Token.For; stream >]
106 | "in" -> [< 'Token.In; stream >]
107 | "binary" -> [< 'Token.Binary; stream >]
108 | "unary" -> [< 'Token.Unary; stream >]
110 This just adds lexer support for the unary and binary keywords, like we
111 did in `previous chapters <OCamlLangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice
112 thing about our current AST, is that we represent binary operators with
113 full generalisation by using their ASCII code as the opcode. For our
114 extended operators, we'll use this same representation, so we don't need
115 any new AST or parser support.
117 On the other hand, we have to be able to represent the definitions of
118 these new operators, in the "def binary\| 5" part of the function
119 definition. In our grammar so far, the "name" for the function
120 definition is parsed as the "prototype" production and into the
121 ``Ast.Prototype`` AST node. To represent our new user-defined operators
122 as prototypes, we have to extend the ``Ast.Prototype`` AST node like
125 .. code-block:: ocaml
127 (* proto - This type represents the "prototype" for a function, which captures
128 * its name, and its argument names (thus implicitly the number of arguments the
129 * function takes). *)
131 | Prototype of string * string array
132 | BinOpPrototype of string * string array * int
134 Basically, in addition to knowing a name for the prototype, we now keep
135 track of whether it was an operator, and if it was, what precedence
136 level the operator is at. The precedence is only used for binary
137 operators (as you'll see below, it just doesn't apply for unary
138 operators). Now that we have a way to represent the prototype for a
139 user-defined operator, we need to parse it:
141 .. code-block:: ocaml
145 * ::= binary LETTER number? (id, id)
146 * ::= unary LETTER number? (id) *)
147 let parse_prototype =
148 let rec parse_args accumulator = parser
149 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
150 | [< >] -> accumulator
152 let parse_operator = parser
153 | [< 'Token.Unary >] -> "unary", 1
154 | [< 'Token.Binary >] -> "binary", 2
156 let parse_binary_precedence = parser
157 | [< 'Token.Number n >] -> int_of_float n
161 | [< 'Token.Ident id;
162 'Token.Kwd '(' ?? "expected '(' in prototype";
164 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
166 Ast.Prototype (id, Array.of_list (List.rev args))
167 | [< (prefix, kind)=parse_operator;
168 'Token.Kwd op ?? "expected an operator";
169 (* Read the precedence if present. *)
170 binary_precedence=parse_binary_precedence;
171 'Token.Kwd '(' ?? "expected '(' in prototype";
173 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
174 let name = prefix ^ (String.make 1 op) in
175 let args = Array.of_list (List.rev args) in
177 (* Verify right number of arguments for operator. *)
178 if Array.length args != kind
179 then raise (Stream.Error "invalid number of operands for operator")
182 Ast.Prototype (name, args)
184 Ast.BinOpPrototype (name, args, binary_precedence)
186 raise (Stream.Error "expected function name in prototype")
188 This is all fairly straightforward parsing code, and we have already
189 seen a lot of similar code in the past. One interesting part about the
190 code above is the couple lines that set up ``name`` for binary
191 operators. This builds names like "binary@" for a newly defined "@"
192 operator. This then takes advantage of the fact that symbol names in the
193 LLVM symbol table are allowed to have any character in them, including
194 embedded nul characters.
196 The next interesting thing to add, is codegen support for these binary
197 operators. Given our current structure, this is a simple addition of a
198 default case for our existing binary operator node:
200 .. code-block:: ocaml
202 let codegen_expr = function
204 | Ast.Binary (op, lhs, rhs) ->
205 let lhs_val = codegen_expr lhs in
206 let rhs_val = codegen_expr rhs in
209 | '+' -> build_add lhs_val rhs_val "addtmp" builder
210 | '-' -> build_sub lhs_val rhs_val "subtmp" builder
211 | '*' -> build_mul lhs_val rhs_val "multmp" builder
213 (* Convert bool 0/1 to double 0.0 or 1.0 *)
214 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
215 build_uitofp i double_type "booltmp" builder
217 (* If it wasn't a builtin binary operator, it must be a user defined
218 * one. Emit a call to it. *)
219 let callee = "binary" ^ (String.make 1 op) in
221 match lookup_function callee the_module with
222 | Some callee -> callee
223 | None -> raise (Error "binary operator not found!")
225 build_call callee [|lhs_val; rhs_val|] "binop" builder
228 As you can see above, the new code is actually really simple. It just
229 does a lookup for the appropriate operator in the symbol table and
230 generates a function call to it. Since user-defined operators are just
231 built as normal functions (because the "prototype" boils down to a
232 function with the right name) everything falls into place.
234 The final piece of code we are missing, is a bit of top level magic:
236 .. code-block:: ocaml
238 let codegen_func the_fpm = function
239 | Ast.Function (proto, body) ->
240 Hashtbl.clear named_values;
241 let the_function = codegen_proto proto in
243 (* If this is an operator, install it. *)
244 begin match proto with
245 | Ast.BinOpPrototype (name, args, prec) ->
246 let op = name.[String.length name - 1] in
247 Hashtbl.add Parser.binop_precedence op prec;
251 (* Create a new basic block to start insertion into. *)
252 let bb = append_block context "entry" the_function in
253 position_at_end bb builder;
256 Basically, before codegening a function, if it is a user-defined
257 operator, we register it in the precedence table. This allows the binary
258 operator parsing logic we already have in place to handle it. Since we
259 are working on a fully-general operator precedence parser, this is all
260 we need to do to "extend the grammar".
262 Now we have useful user-defined binary operators. This builds a lot on
263 the previous framework we built for other operators. Adding unary
264 operators is a bit more challenging, because we don't have any framework
265 for it yet - lets see what it takes.
267 User-defined Unary Operators
268 ============================
270 Since we don't currently support unary operators in the Kaleidoscope
271 language, we'll need to add everything to support them. Above, we added
272 simple support for the 'unary' keyword to the lexer. In addition to
273 that, we need an AST node:
275 .. code-block:: ocaml
279 (* variant for a unary operator. *)
280 | Unary of char * expr
283 This AST node is very simple and obvious by now. It directly mirrors the
284 binary operator AST node, except that it only has one child. With this,
285 we need to add the parsing logic. Parsing a unary operator is pretty
286 simple: we'll add a new function to do it:
288 .. code-block:: ocaml
293 and parse_unary = parser
294 (* If this is a unary operator, read it. *)
295 | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] ->
296 Ast.Unary (op, operand)
298 (* If the current token is not an operator, it must be a primary expr. *)
299 | [< stream >] -> parse_primary stream
301 The grammar we add is pretty straightforward here. If we see a unary
302 operator when parsing a primary operator, we eat the operator as a
303 prefix and parse the remaining piece as another unary operator. This
304 allows us to handle multiple unary operators (e.g. "!!x"). Note that
305 unary operators can't have ambiguous parses like binary operators can,
306 so there is no need for precedence information.
308 The problem with this function, is that we need to call ParseUnary from
309 somewhere. To do this, we change previous callers of ParsePrimary to
310 call ``parse_unary`` instead:
312 .. code-block:: ocaml
315 * ::= ('+' primary)* *)
316 and parse_bin_rhs expr_prec lhs stream =
318 (* Parse the unary expression after the binary operator. *)
319 let rhs = parse_unary stream in
325 * ::= primary binoprhs *)
326 and parse_expr = parser
327 | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream
329 With these two simple changes, we are now able to parse unary operators
330 and build the AST for them. Next up, we need to add parser support for
331 prototypes, to parse the unary operator prototype. We extend the binary
332 operator code above with:
334 .. code-block:: ocaml
338 * ::= binary LETTER number? (id, id)
339 * ::= unary LETTER number? (id) *)
340 let parse_prototype =
341 let rec parse_args accumulator = parser
342 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
343 | [< >] -> accumulator
345 let parse_operator = parser
346 | [< 'Token.Unary >] -> "unary", 1
347 | [< 'Token.Binary >] -> "binary", 2
349 let parse_binary_precedence = parser
350 | [< 'Token.Number n >] -> int_of_float n
354 | [< 'Token.Ident id;
355 'Token.Kwd '(' ?? "expected '(' in prototype";
357 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
359 Ast.Prototype (id, Array.of_list (List.rev args))
360 | [< (prefix, kind)=parse_operator;
361 'Token.Kwd op ?? "expected an operator";
362 (* Read the precedence if present. *)
363 binary_precedence=parse_binary_precedence;
364 'Token.Kwd '(' ?? "expected '(' in prototype";
366 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
367 let name = prefix ^ (String.make 1 op) in
368 let args = Array.of_list (List.rev args) in
370 (* Verify right number of arguments for operator. *)
371 if Array.length args != kind
372 then raise (Stream.Error "invalid number of operands for operator")
375 Ast.Prototype (name, args)
377 Ast.BinOpPrototype (name, args, binary_precedence)
379 raise (Stream.Error "expected function name in prototype")
381 As with binary operators, we name unary operators with a name that
382 includes the operator character. This assists us at code generation
383 time. Speaking of, the final piece we need to add is codegen support for
384 unary operators. It looks like this:
386 .. code-block:: ocaml
388 let rec codegen_expr = function
390 | Ast.Unary (op, operand) ->
391 let operand = codegen_expr operand in
392 let callee = "unary" ^ (String.make 1 op) in
394 match lookup_function callee the_module with
395 | Some callee -> callee
396 | None -> raise (Error "unknown unary operator")
398 build_call callee [|operand|] "unop" builder
400 This code is similar to, but simpler than, the code for binary
401 operators. It is simpler primarily because it doesn't need to handle any
402 predefined operators.
407 It is somewhat hard to believe, but with a few simple extensions we've
408 covered in the last chapters, we have grown a real-ish language. With
409 this, we can do a lot of interesting things, including I/O, math, and a
410 bunch of other things. For example, we can now add a nice sequencing
411 operator (printd is defined to print out the specified value and a
416 ready> extern printd(x);
417 Read extern: declare double @printd(double)
418 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
420 ready> printd(123) : printd(456) : printd(789);
424 Evaluated to 0.000000
426 We can also define a bunch of other "primitive" operations, such as:
441 # Define > with the same precedence as <.
442 def binary> 10 (LHS RHS)
445 # Binary logical or, which does not short circuit.
446 def binary| 5 (LHS RHS)
454 # Binary logical and, which does not short circuit.
455 def binary& 6 (LHS RHS)
461 # Define = with slightly lower precedence than relationals.
462 def binary = 9 (LHS RHS)
463 !(LHS < RHS | LHS > RHS);
465 Given the previous if/then/else support, we can also define interesting
466 functions for I/O. For example, the following prints out a character
467 whose "density" reflects the value passed in: the lower the value, the
468 denser the character:
474 extern putchard(char)
485 ready> printdensity(1): printdensity(2): printdensity(3) :
486 printdensity(4): printdensity(5): printdensity(9): putchard(10);
488 Evaluated to 0.000000
490 Based on these simple primitive operations, we can start to define more
491 interesting things. For example, here's a little function that solves
492 for the number of iterations it takes a function in the complex plane to
497 # determine whether the specific location diverges.
498 # Solve for z = z^2 + c in the complex plane.
499 def mandelconverger(real imag iters creal cimag)
500 if iters > 255 | (real*real + imag*imag > 4) then
503 mandelconverger(real*real - imag*imag + creal,
505 iters+1, creal, cimag);
507 # return the number of iterations required for the iteration to escape
508 def mandelconverge(real imag)
509 mandelconverger(real, imag, 0, real, imag);
511 This "z = z\ :sup:`2`\ + c" function is a beautiful little creature
512 that is the basis for computation of the `Mandelbrot
513 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
514 ``mandelconverge`` function returns the number of iterations that it
515 takes for a complex orbit to escape, saturating to 255. This is not a
516 very useful function by itself, but if you plot its value over a
517 two-dimensional plane, you can see the Mandelbrot set. Given that we are
518 limited to using putchard here, our amazing graphical output is limited,
519 but we can whip together something using the density plotter above:
523 # compute and plot the mandelbrot set with the specified 2 dimensional range
525 def mandelhelp(xmin xmax xstep ymin ymax ystep)
526 for y = ymin, y < ymax, ystep in (
527 (for x = xmin, x < xmax, xstep in
528 printdensity(mandelconverge(x,y)))
532 # mandel - This is a convenient helper function for plotting the mandelbrot set
533 # from the specified position with the specified Magnification.
534 def mandel(realstart imagstart realmag imagmag)
535 mandelhelp(realstart, realstart+realmag*78, realmag,
536 imagstart, imagstart+imagmag*40, imagmag);
538 Given this, we can try plotting out the mandelbrot set! Lets try it out:
542 ready> mandel(-2.3, -1.3, 0.05, 0.07);
543 *******************************+++++++++++*************************************
544 *************************+++++++++++++++++++++++*******************************
545 **********************+++++++++++++++++++++++++++++****************************
546 *******************+++++++++++++++++++++.. ...++++++++*************************
547 *****************++++++++++++++++++++++.... ...+++++++++***********************
548 ***************+++++++++++++++++++++++..... ...+++++++++*********************
549 **************+++++++++++++++++++++++.... ....+++++++++********************
550 *************++++++++++++++++++++++...... .....++++++++*******************
551 ************+++++++++++++++++++++....... .......+++++++******************
552 ***********+++++++++++++++++++.... ... .+++++++*****************
553 **********+++++++++++++++++....... .+++++++****************
554 *********++++++++++++++........... ...+++++++***************
555 ********++++++++++++............ ...++++++++**************
556 ********++++++++++... .......... .++++++++**************
557 *******+++++++++..... .+++++++++*************
558 *******++++++++...... ..+++++++++*************
559 *******++++++....... ..+++++++++*************
560 *******+++++...... ..+++++++++*************
561 *******.... .... ...+++++++++*************
562 *******.... . ...+++++++++*************
563 *******+++++...... ...+++++++++*************
564 *******++++++....... ..+++++++++*************
565 *******++++++++...... .+++++++++*************
566 *******+++++++++..... ..+++++++++*************
567 ********++++++++++... .......... .++++++++**************
568 ********++++++++++++............ ...++++++++**************
569 *********++++++++++++++.......... ...+++++++***************
570 **********++++++++++++++++........ .+++++++****************
571 **********++++++++++++++++++++.... ... ..+++++++****************
572 ***********++++++++++++++++++++++....... .......++++++++*****************
573 ************+++++++++++++++++++++++...... ......++++++++******************
574 **************+++++++++++++++++++++++.... ....++++++++********************
575 ***************+++++++++++++++++++++++..... ...+++++++++*********************
576 *****************++++++++++++++++++++++.... ...++++++++***********************
577 *******************+++++++++++++++++++++......++++++++*************************
578 *********************++++++++++++++++++++++.++++++++***************************
579 *************************+++++++++++++++++++++++*******************************
580 ******************************+++++++++++++************************************
581 *******************************************************************************
582 *******************************************************************************
583 *******************************************************************************
584 Evaluated to 0.000000
585 ready> mandel(-2, -1, 0.02, 0.04);
586 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
587 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
588 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
589 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
590 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
591 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
592 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
593 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
594 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
595 **********++++++++++++++++++++++++++++++++++++++++++++++.............
596 ********+++++++++++++++++++++++++++++++++++++++++++..................
597 *******+++++++++++++++++++++++++++++++++++++++.......................
598 ******+++++++++++++++++++++++++++++++++++...........................
599 *****++++++++++++++++++++++++++++++++............................
600 *****++++++++++++++++++++++++++++...............................
601 ****++++++++++++++++++++++++++...... .........................
602 ***++++++++++++++++++++++++......... ...... ...........
603 ***++++++++++++++++++++++............
604 **+++++++++++++++++++++..............
605 **+++++++++++++++++++................
606 *++++++++++++++++++.................
607 *++++++++++++++++............ ...
608 *++++++++++++++..............
609 *+++....++++................
610 *.......... ...........
612 *.......... ...........
613 *+++....++++................
614 *++++++++++++++..............
615 *++++++++++++++++............ ...
616 *++++++++++++++++++.................
617 **+++++++++++++++++++................
618 **+++++++++++++++++++++..............
619 ***++++++++++++++++++++++............
620 ***++++++++++++++++++++++++......... ...... ...........
621 ****++++++++++++++++++++++++++...... .........................
622 *****++++++++++++++++++++++++++++...............................
623 *****++++++++++++++++++++++++++++++++............................
624 ******+++++++++++++++++++++++++++++++++++...........................
625 *******+++++++++++++++++++++++++++++++++++++++.......................
626 ********+++++++++++++++++++++++++++++++++++++++++++..................
627 Evaluated to 0.000000
628 ready> mandel(-0.9, -1.4, 0.02, 0.03);
629 *******************************************************************************
630 *******************************************************************************
631 *******************************************************************************
632 **********+++++++++++++++++++++************************************************
633 *+++++++++++++++++++++++++++++++++++++++***************************************
634 +++++++++++++++++++++++++++++++++++++++++++++**********************************
635 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
636 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
637 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
638 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
639 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
640 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
641 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
642 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
643 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
644 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
645 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
646 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
647 ++++++++++++++++++++........... .........++++++++++++++++++++++*
648 ++++++++++++++++++............ ...........++++++++++++++++++++
649 ++++++++++++++++............... .............++++++++++++++++++
650 ++++++++++++++................. ...............++++++++++++++++
651 ++++++++++++.................. .................++++++++++++++
652 +++++++++.................. .................+++++++++++++
653 ++++++........ . ......... ..++++++++++++
654 ++............ ...... ....++++++++++
655 .............. ...++++++++++
656 .............. ....+++++++++
657 .............. .....++++++++
658 ............. ......++++++++
659 ........... .......++++++++
660 ......... ........+++++++
661 ......... ........+++++++
662 ......... ....+++++++
670 Evaluated to 0.000000
673 At this point, you may be starting to realize that Kaleidoscope is a
674 real and powerful language. It may not be self-similar :), but it can be
675 used to plot things that are!
677 With this, we conclude the "adding user-defined operators" chapter of
678 the tutorial. We have successfully augmented our language, adding the
679 ability to extend the language in the library, and we have shown how
680 this can be used to build a simple but interesting end-user application
681 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
682 applications that are functional and can call functions with
683 side-effects, but it can't actually define and mutate a variable itself.
685 Strikingly, variable mutation is an important feature of some languages,
686 and it is not at all obvious how to `add support for mutable
687 variables <OCamlLangImpl7.html>`_ without having to add an "SSA
688 construction" phase to your front-end. In the next chapter, we will
689 describe how you can add variable mutation without building SSA in your
695 Here is the complete code listing for our running example, enhanced with
696 the if/then/else and for expressions.. To build this example, use:
710 <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
711 <*.{byte,native}>: g++, use_llvm, use_llvm_analysis
712 <*.{byte,native}>: use_llvm_executionengine, use_llvm_target
713 <*.{byte,native}>: use_llvm_scalar_opts, use_bindings
716 .. code-block:: ocaml
718 open Ocamlbuild_plugin;;
720 ocaml_lib ~extern:true "llvm";;
721 ocaml_lib ~extern:true "llvm_analysis";;
722 ocaml_lib ~extern:true "llvm_executionengine";;
723 ocaml_lib ~extern:true "llvm_target";;
724 ocaml_lib ~extern:true "llvm_scalar_opts";;
726 flag ["link"; "ocaml"; "g++"] (S[A"-cc"; A"g++"; A"-cclib"; A"-rdynamic"]);;
727 dep ["link"; "ocaml"; "use_bindings"] ["bindings.o"];;
730 .. code-block:: ocaml
732 (*===----------------------------------------------------------------------===
734 *===----------------------------------------------------------------------===*)
736 (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
737 * these others for known things. *)
743 | Ident of string | Number of float
756 .. code-block:: ocaml
758 (*===----------------------------------------------------------------------===
760 *===----------------------------------------------------------------------===*)
763 (* Skip any whitespace. *)
764 | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
766 (* identifier: [a-zA-Z][a-zA-Z0-9] *)
767 | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
768 let buffer = Buffer.create 1 in
769 Buffer.add_char buffer c;
770 lex_ident buffer stream
772 (* number: [0-9.]+ *)
773 | [< ' ('0' .. '9' as c); stream >] ->
774 let buffer = Buffer.create 1 in
775 Buffer.add_char buffer c;
776 lex_number buffer stream
778 (* Comment until end of line. *)
779 | [< ' ('#'); stream >] ->
782 (* Otherwise, just return the character as its ascii value. *)
783 | [< 'c; stream >] ->
784 [< 'Token.Kwd c; lex stream >]
789 and lex_number buffer = parser
790 | [< ' ('0' .. '9' | '.' as c); stream >] ->
791 Buffer.add_char buffer c;
792 lex_number buffer stream
793 | [< stream=lex >] ->
794 [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
796 and lex_ident buffer = parser
797 | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
798 Buffer.add_char buffer c;
799 lex_ident buffer stream
800 | [< stream=lex >] ->
801 match Buffer.contents buffer with
802 | "def" -> [< 'Token.Def; stream >]
803 | "extern" -> [< 'Token.Extern; stream >]
804 | "if" -> [< 'Token.If; stream >]
805 | "then" -> [< 'Token.Then; stream >]
806 | "else" -> [< 'Token.Else; stream >]
807 | "for" -> [< 'Token.For; stream >]
808 | "in" -> [< 'Token.In; stream >]
809 | "binary" -> [< 'Token.Binary; stream >]
810 | "unary" -> [< 'Token.Unary; stream >]
811 | id -> [< 'Token.Ident id; stream >]
813 and lex_comment = parser
814 | [< ' ('\n'); stream=lex >] -> stream
815 | [< 'c; e=lex_comment >] -> e
819 .. code-block:: ocaml
821 (*===----------------------------------------------------------------------===
822 * Abstract Syntax Tree (aka Parse Tree)
823 *===----------------------------------------------------------------------===*)
825 (* expr - Base type for all expression nodes. *)
827 (* variant for numeric literals like "1.0". *)
830 (* variant for referencing a variable, like "a". *)
833 (* variant for a unary operator. *)
834 | Unary of char * expr
836 (* variant for a binary operator. *)
837 | Binary of char * expr * expr
839 (* variant for function calls. *)
840 | Call of string * expr array
842 (* variant for if/then/else. *)
843 | If of expr * expr * expr
845 (* variant for for/in. *)
846 | For of string * expr * expr * expr option * expr
848 (* proto - This type represents the "prototype" for a function, which captures
849 * its name, and its argument names (thus implicitly the number of arguments the
850 * function takes). *)
852 | Prototype of string * string array
853 | BinOpPrototype of string * string array * int
855 (* func - This type represents a function definition itself. *)
856 type func = Function of proto * expr
859 .. code-block:: ocaml
861 (*===---------------------------------------------------------------------===
863 *===---------------------------------------------------------------------===*)
865 (* binop_precedence - This holds the precedence for each binary operator that is
867 let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
869 (* precedence - Get the precedence of the pending binary operator token. *)
870 let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
878 let rec parse_primary = parser
879 (* numberexpr ::= number *)
880 | [< 'Token.Number n >] -> Ast.Number n
882 (* parenexpr ::= '(' expression ')' *)
883 | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
887 * ::= identifier '(' argumentexpr ')' *)
888 | [< 'Token.Ident id; stream >] ->
889 let rec parse_args accumulator = parser
890 | [< e=parse_expr; stream >] ->
892 | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
893 | [< >] -> e :: accumulator
895 | [< >] -> accumulator
897 let rec parse_ident id = parser
901 'Token.Kwd ')' ?? "expected ')'">] ->
902 Ast.Call (id, Array.of_list (List.rev args))
904 (* Simple variable ref. *)
905 | [< >] -> Ast.Variable id
907 parse_ident id stream
909 (* ifexpr ::= 'if' expr 'then' expr 'else' expr *)
910 | [< 'Token.If; c=parse_expr;
911 'Token.Then ?? "expected 'then'"; t=parse_expr;
912 'Token.Else ?? "expected 'else'"; e=parse_expr >] ->
916 ::= 'for' identifier '=' expr ',' expr (',' expr)? 'in' expression *)
918 'Token.Ident id ?? "expected identifier after for";
919 'Token.Kwd '=' ?? "expected '=' after for";
924 'Token.Kwd ',' ?? "expected ',' after for";
929 | [< 'Token.Kwd ','; step=parse_expr >] -> Some step
934 | [< 'Token.In; body=parse_expr >] ->
935 Ast.For (id, start, end_, step, body)
937 raise (Stream.Error "expected 'in' after for")
940 raise (Stream.Error "expected '=' after for")
943 | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
948 and parse_unary = parser
949 (* If this is a unary operator, read it. *)
950 | [< 'Token.Kwd op when op != '(' && op != ')'; operand=parse_expr >] ->
951 Ast.Unary (op, operand)
953 (* If the current token is not an operator, it must be a primary expr. *)
954 | [< stream >] -> parse_primary stream
957 * ::= ('+' primary)* *)
958 and parse_bin_rhs expr_prec lhs stream =
959 match Stream.peek stream with
960 (* If this is a binop, find its precedence. *)
961 | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
962 let token_prec = precedence c in
964 (* If this is a binop that binds at least as tightly as the current binop,
965 * consume it, otherwise we are done. *)
966 if token_prec < expr_prec then lhs else begin
970 (* Parse the unary expression after the binary operator. *)
971 let rhs = parse_unary stream in
973 (* Okay, we know this is a binop. *)
975 match Stream.peek stream with
976 | Some (Token.Kwd c2) ->
977 (* If BinOp binds less tightly with rhs than the operator after
978 * rhs, let the pending operator take rhs as its lhs. *)
979 let next_prec = precedence c2 in
980 if token_prec < next_prec
981 then parse_bin_rhs (token_prec + 1) rhs stream
987 let lhs = Ast.Binary (c, lhs, rhs) in
988 parse_bin_rhs expr_prec lhs stream
993 * ::= primary binoprhs *)
994 and parse_expr = parser
995 | [< lhs=parse_unary; stream >] -> parse_bin_rhs 0 lhs stream
999 * ::= binary LETTER number? (id, id)
1000 * ::= unary LETTER number? (id) *)
1001 let parse_prototype =
1002 let rec parse_args accumulator = parser
1003 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
1004 | [< >] -> accumulator
1006 let parse_operator = parser
1007 | [< 'Token.Unary >] -> "unary", 1
1008 | [< 'Token.Binary >] -> "binary", 2
1010 let parse_binary_precedence = parser
1011 | [< 'Token.Number n >] -> int_of_float n
1015 | [< 'Token.Ident id;
1016 'Token.Kwd '(' ?? "expected '(' in prototype";
1018 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
1020 Ast.Prototype (id, Array.of_list (List.rev args))
1021 | [< (prefix, kind)=parse_operator;
1022 'Token.Kwd op ?? "expected an operator";
1023 (* Read the precedence if present. *)
1024 binary_precedence=parse_binary_precedence;
1025 'Token.Kwd '(' ?? "expected '(' in prototype";
1027 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
1028 let name = prefix ^ (String.make 1 op) in
1029 let args = Array.of_list (List.rev args) in
1031 (* Verify right number of arguments for operator. *)
1032 if Array.length args != kind
1033 then raise (Stream.Error "invalid number of operands for operator")
1036 Ast.Prototype (name, args)
1038 Ast.BinOpPrototype (name, args, binary_precedence)
1040 raise (Stream.Error "expected function name in prototype")
1042 (* definition ::= 'def' prototype expression *)
1043 let parse_definition = parser
1044 | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
1047 (* toplevelexpr ::= expression *)
1048 let parse_toplevel = parser
1049 | [< e=parse_expr >] ->
1050 (* Make an anonymous proto. *)
1051 Ast.Function (Ast.Prototype ("", [||]), e)
1053 (* external ::= 'extern' prototype *)
1054 let parse_extern = parser
1055 | [< 'Token.Extern; e=parse_prototype >] -> e
1058 .. code-block:: ocaml
1060 (*===----------------------------------------------------------------------===
1062 *===----------------------------------------------------------------------===*)
1066 exception Error of string
1068 let context = global_context ()
1069 let the_module = create_module context "my cool jit"
1070 let builder = builder context
1071 let named_values:(string, llvalue) Hashtbl.t = Hashtbl.create 10
1072 let double_type = double_type context
1074 let rec codegen_expr = function
1075 | Ast.Number n -> const_float double_type n
1076 | Ast.Variable name ->
1077 (try Hashtbl.find named_values name with
1078 | Not_found -> raise (Error "unknown variable name"))
1079 | Ast.Unary (op, operand) ->
1080 let operand = codegen_expr operand in
1081 let callee = "unary" ^ (String.make 1 op) in
1083 match lookup_function callee the_module with
1084 | Some callee -> callee
1085 | None -> raise (Error "unknown unary operator")
1087 build_call callee [|operand|] "unop" builder
1088 | Ast.Binary (op, lhs, rhs) ->
1089 let lhs_val = codegen_expr lhs in
1090 let rhs_val = codegen_expr rhs in
1093 | '+' -> build_add lhs_val rhs_val "addtmp" builder
1094 | '-' -> build_sub lhs_val rhs_val "subtmp" builder
1095 | '*' -> build_mul lhs_val rhs_val "multmp" builder
1097 (* Convert bool 0/1 to double 0.0 or 1.0 *)
1098 let i = build_fcmp Fcmp.Ult lhs_val rhs_val "cmptmp" builder in
1099 build_uitofp i double_type "booltmp" builder
1101 (* If it wasn't a builtin binary operator, it must be a user defined
1102 * one. Emit a call to it. *)
1103 let callee = "binary" ^ (String.make 1 op) in
1105 match lookup_function callee the_module with
1106 | Some callee -> callee
1107 | None -> raise (Error "binary operator not found!")
1109 build_call callee [|lhs_val; rhs_val|] "binop" builder
1111 | Ast.Call (callee, args) ->
1112 (* Look up the name in the module table. *)
1114 match lookup_function callee the_module with
1115 | Some callee -> callee
1116 | None -> raise (Error "unknown function referenced")
1118 let params = params callee in
1120 (* If argument mismatch error. *)
1121 if Array.length params == Array.length args then () else
1122 raise (Error "incorrect # arguments passed");
1123 let args = Array.map codegen_expr args in
1124 build_call callee args "calltmp" builder
1125 | Ast.If (cond, then_, else_) ->
1126 let cond = codegen_expr cond in
1128 (* Convert condition to a bool by comparing equal to 0.0 *)
1129 let zero = const_float double_type 0.0 in
1130 let cond_val = build_fcmp Fcmp.One cond zero "ifcond" builder in
1132 (* Grab the first block so that we might later add the conditional branch
1133 * to it at the end of the function. *)
1134 let start_bb = insertion_block builder in
1135 let the_function = block_parent start_bb in
1137 let then_bb = append_block context "then" the_function in
1139 (* Emit 'then' value. *)
1140 position_at_end then_bb builder;
1141 let then_val = codegen_expr then_ in
1143 (* Codegen of 'then' can change the current block, update then_bb for the
1144 * phi. We create a new name because one is used for the phi node, and the
1145 * other is used for the conditional branch. *)
1146 let new_then_bb = insertion_block builder in
1148 (* Emit 'else' value. *)
1149 let else_bb = append_block context "else" the_function in
1150 position_at_end else_bb builder;
1151 let else_val = codegen_expr else_ in
1153 (* Codegen of 'else' can change the current block, update else_bb for the
1155 let new_else_bb = insertion_block builder in
1157 (* Emit merge block. *)
1158 let merge_bb = append_block context "ifcont" the_function in
1159 position_at_end merge_bb builder;
1160 let incoming = [(then_val, new_then_bb); (else_val, new_else_bb)] in
1161 let phi = build_phi incoming "iftmp" builder in
1163 (* Return to the start block to add the conditional branch. *)
1164 position_at_end start_bb builder;
1165 ignore (build_cond_br cond_val then_bb else_bb builder);
1167 (* Set a unconditional branch at the end of the 'then' block and the
1168 * 'else' block to the 'merge' block. *)
1169 position_at_end new_then_bb builder; ignore (build_br merge_bb builder);
1170 position_at_end new_else_bb builder; ignore (build_br merge_bb builder);
1172 (* Finally, set the builder to the end of the merge block. *)
1173 position_at_end merge_bb builder;
1176 | Ast.For (var_name, start, end_, step, body) ->
1177 (* Emit the start code first, without 'variable' in scope. *)
1178 let start_val = codegen_expr start in
1180 (* Make the new basic block for the loop header, inserting after current
1182 let preheader_bb = insertion_block builder in
1183 let the_function = block_parent preheader_bb in
1184 let loop_bb = append_block context "loop" the_function in
1186 (* Insert an explicit fall through from the current block to the
1188 ignore (build_br loop_bb builder);
1190 (* Start insertion in loop_bb. *)
1191 position_at_end loop_bb builder;
1193 (* Start the PHI node with an entry for start. *)
1194 let variable = build_phi [(start_val, preheader_bb)] var_name builder in
1196 (* Within the loop, the variable is defined equal to the PHI node. If it
1197 * shadows an existing variable, we have to restore it, so save it
1200 try Some (Hashtbl.find named_values var_name) with Not_found -> None
1202 Hashtbl.add named_values var_name variable;
1204 (* Emit the body of the loop. This, like any other expr, can change the
1205 * current BB. Note that we ignore the value computed by the body, but
1206 * don't allow an error *)
1207 ignore (codegen_expr body);
1209 (* Emit the step value. *)
1212 | Some step -> codegen_expr step
1213 (* If not specified, use 1.0. *)
1214 | None -> const_float double_type 1.0
1217 let next_var = build_add variable step_val "nextvar" builder in
1219 (* Compute the end condition. *)
1220 let end_cond = codegen_expr end_ in
1222 (* Convert condition to a bool by comparing equal to 0.0. *)
1223 let zero = const_float double_type 0.0 in
1224 let end_cond = build_fcmp Fcmp.One end_cond zero "loopcond" builder in
1226 (* Create the "after loop" block and insert it. *)
1227 let loop_end_bb = insertion_block builder in
1228 let after_bb = append_block context "afterloop" the_function in
1230 (* Insert the conditional branch into the end of loop_end_bb. *)
1231 ignore (build_cond_br end_cond loop_bb after_bb builder);
1233 (* Any new code will be inserted in after_bb. *)
1234 position_at_end after_bb builder;
1236 (* Add a new entry to the PHI node for the backedge. *)
1237 add_incoming (next_var, loop_end_bb) variable;
1239 (* Restore the unshadowed variable. *)
1240 begin match old_val with
1241 | Some old_val -> Hashtbl.add named_values var_name old_val
1245 (* for expr always returns 0.0. *)
1246 const_null double_type
1248 let codegen_proto = function
1249 | Ast.Prototype (name, args) | Ast.BinOpPrototype (name, args, _) ->
1250 (* Make the function type: double(double,double) etc. *)
1251 let doubles = Array.make (Array.length args) double_type in
1252 let ft = function_type double_type doubles in
1254 match lookup_function name the_module with
1255 | None -> declare_function name ft the_module
1257 (* If 'f' conflicted, there was already something named 'name'. If it
1258 * has a body, don't allow redefinition or reextern. *)
1260 (* If 'f' already has a body, reject this. *)
1261 if block_begin f <> At_end f then
1262 raise (Error "redefinition of function");
1264 (* If 'f' took a different number of arguments, reject. *)
1265 if element_type (type_of f) <> ft then
1266 raise (Error "redefinition of function with different # args");
1270 (* Set names for all arguments. *)
1271 Array.iteri (fun i a ->
1274 Hashtbl.add named_values n a;
1278 let codegen_func the_fpm = function
1279 | Ast.Function (proto, body) ->
1280 Hashtbl.clear named_values;
1281 let the_function = codegen_proto proto in
1283 (* If this is an operator, install it. *)
1284 begin match proto with
1285 | Ast.BinOpPrototype (name, args, prec) ->
1286 let op = name.[String.length name - 1] in
1287 Hashtbl.add Parser.binop_precedence op prec;
1291 (* Create a new basic block to start insertion into. *)
1292 let bb = append_block context "entry" the_function in
1293 position_at_end bb builder;
1296 let ret_val = codegen_expr body in
1298 (* Finish off the function. *)
1299 let _ = build_ret ret_val builder in
1301 (* Validate the generated code, checking for consistency. *)
1302 Llvm_analysis.assert_valid_function the_function;
1304 (* Optimize the function. *)
1305 let _ = PassManager.run_function the_function the_fpm in
1309 delete_function the_function;
1313 .. code-block:: ocaml
1315 (*===----------------------------------------------------------------------===
1316 * Top-Level parsing and JIT Driver
1317 *===----------------------------------------------------------------------===*)
1320 open Llvm_executionengine
1322 (* top ::= definition | external | expression | ';' *)
1323 let rec main_loop the_fpm the_execution_engine stream =
1324 match Stream.peek stream with
1327 (* ignore top-level semicolons. *)
1328 | Some (Token.Kwd ';') ->
1330 main_loop the_fpm the_execution_engine stream
1334 try match token with
1336 let e = Parser.parse_definition stream in
1337 print_endline "parsed a function definition.";
1338 dump_value (Codegen.codegen_func the_fpm e);
1340 let e = Parser.parse_extern stream in
1341 print_endline "parsed an extern.";
1342 dump_value (Codegen.codegen_proto e);
1344 (* Evaluate a top-level expression into an anonymous function. *)
1345 let e = Parser.parse_toplevel stream in
1346 print_endline "parsed a top-level expr";
1347 let the_function = Codegen.codegen_func the_fpm e in
1348 dump_value the_function;
1350 (* JIT the function, returning a function pointer. *)
1351 let result = ExecutionEngine.run_function the_function [||]
1352 the_execution_engine in
1354 print_string "Evaluated to ";
1355 print_float (GenericValue.as_float Codegen.double_type result);
1357 with Stream.Error s | Codegen.Error s ->
1358 (* Skip token for error recovery. *)
1362 print_string "ready> "; flush stdout;
1363 main_loop the_fpm the_execution_engine stream
1366 .. code-block:: ocaml
1368 (*===----------------------------------------------------------------------===
1370 *===----------------------------------------------------------------------===*)
1373 open Llvm_executionengine
1375 open Llvm_scalar_opts
1378 ignore (initialize_native_target ());
1380 (* Install standard binary operators.
1381 * 1 is the lowest precedence. *)
1382 Hashtbl.add Parser.binop_precedence '<' 10;
1383 Hashtbl.add Parser.binop_precedence '+' 20;
1384 Hashtbl.add Parser.binop_precedence '-' 20;
1385 Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
1387 (* Prime the first token. *)
1388 print_string "ready> "; flush stdout;
1389 let stream = Lexer.lex (Stream.of_channel stdin) in
1391 (* Create the JIT. *)
1392 let the_execution_engine = ExecutionEngine.create Codegen.the_module in
1393 let the_fpm = PassManager.create_function Codegen.the_module in
1395 (* Set up the optimizer pipeline. Start with registering info about how the
1396 * target lays out data structures. *)
1397 DataLayout.add (ExecutionEngine.target_data the_execution_engine) the_fpm;
1399 (* Do simple "peephole" optimizations and bit-twiddling optzn. *)
1400 add_instruction_combination the_fpm;
1402 (* reassociate expressions. *)
1403 add_reassociation the_fpm;
1405 (* Eliminate Common SubExpressions. *)
1408 (* Simplify the control flow graph (deleting unreachable blocks, etc). *)
1409 add_cfg_simplification the_fpm;
1411 ignore (PassManager.initialize the_fpm);
1413 (* Run the main "interpreter loop" now. *)
1414 Toplevel.main_loop the_fpm the_execution_engine stream;
1416 (* Print out all the generated code. *)
1417 dump_module Codegen.the_module
1427 /* putchard - putchar that takes a double and returns 0. */
1428 extern double putchard(double X) {
1433 /* printd - printf that takes a double prints it as "%f\n", returning 0. */
1434 extern double printd(double X) {
1439 `Next: Extending the language: mutable variables / SSA
1440 construction <OCamlLangImpl7.html>`_