1 ===========================================
2 Kaleidoscope: Implementing a Parser and AST
3 ===========================================
11 Welcome to Chapter 2 of the "`Implementing a language with LLVM in
12 Objective Caml <index.html>`_" tutorial. This chapter shows you how to
13 use the lexer, built in `Chapter 1 <OCamlLangImpl1.html>`_, to build a
14 full `parser <http://en.wikipedia.org/wiki/Parsing>`_ for our
15 Kaleidoscope language. Once we have a parser, we'll define and build an
17 Tree <http://en.wikipedia.org/wiki/Abstract_syntax_tree>`_ (AST).
19 The parser we will build uses a combination of `Recursive Descent
20 Parsing <http://en.wikipedia.org/wiki/Recursive_descent_parser>`_ and
22 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_ to
23 parse the Kaleidoscope language (the latter for binary expressions and
24 the former for everything else). Before we get to parsing though, lets
25 talk about the output of the parser: the Abstract Syntax Tree.
27 The Abstract Syntax Tree (AST)
28 ==============================
30 The AST for a program captures its behavior in such a way that it is
31 easy for later stages of the compiler (e.g. code generation) to
32 interpret. We basically want one object for each construct in the
33 language, and the AST should closely model the language. In
34 Kaleidoscope, we have expressions, a prototype, and a function object.
35 We'll start with expressions first:
39 (* expr - Base type for all expression nodes. *)
41 (* variant for numeric literals like "1.0". *)
44 The code above shows the definition of the base ExprAST class and one
45 subclass which we use for numeric literals. The important thing to note
46 about this code is that the Number variant captures the numeric value of
47 the literal as an instance variable. This allows later phases of the
48 compiler to know what the stored numeric value is.
50 Right now we only create the AST, so there are no useful functions on
51 them. It would be very easy to add a function to pretty print the code,
52 for example. Here are the other expression AST node definitions that
53 we'll use in the basic form of the Kaleidoscope language:
57 (* variant for referencing a variable, like "a". *)
60 (* variant for a binary operator. *)
61 | Binary of char * expr * expr
63 (* variant for function calls. *)
64 | Call of string * expr array
66 This is all (intentionally) rather straight-forward: variables capture
67 the variable name, binary operators capture their opcode (e.g. '+'), and
68 calls capture a function name as well as a list of any argument
69 expressions. One thing that is nice about our AST is that it captures
70 the language features without talking about the syntax of the language.
71 Note that there is no discussion about precedence of binary operators,
72 lexical structure, etc.
74 For our basic language, these are all of the expression nodes we'll
75 define. Because it doesn't have conditional control flow, it isn't
76 Turing-complete; we'll fix that in a later installment. The two things
77 we need next are a way to talk about the interface to a function, and a
78 way to talk about functions themselves:
82 (* proto - This type represents the "prototype" for a function, which captures
83 * its name, and its argument names (thus implicitly the number of arguments the
85 type proto = Prototype of string * string array
87 (* func - This type represents a function definition itself. *)
88 type func = Function of proto * expr
90 In Kaleidoscope, functions are typed with just a count of their
91 arguments. Since all values are double precision floating point, the
92 type of each argument doesn't need to be stored anywhere. In a more
93 aggressive and realistic language, the "expr" variants would probably
96 With this scaffolding, we can now talk about parsing expressions and
97 function bodies in Kaleidoscope.
102 Now that we have an AST to build, we need to define the parser code to
103 build it. The idea here is that we want to parse something like "x+y"
104 (which is returned as three tokens by the lexer) into an AST that could
105 be generated with calls like this:
107 .. code-block:: ocaml
109 let x = Variable "x" in
110 let y = Variable "y" in
111 let result = Binary ('+', x, y) in
114 The error handling routines make use of the builtin ``Stream.Failure``
115 and ``Stream.Error``s. ``Stream.Failure`` is raised when the parser is
116 unable to find any matching token in the first position of a pattern.
117 ``Stream.Error`` is raised when the first token matches, but the rest do
118 not. The error recovery in our parser will not be the best and is not
119 particular user-friendly, but it will be enough for our tutorial. These
120 exceptions make it easier to handle errors in routines that have various
123 With these basic types and exceptions, we can implement the first piece
124 of our grammar: numeric literals.
126 Basic Expression Parsing
127 ========================
129 We start with numeric literals, because they are the simplest to
130 process. For each production in our grammar, we'll define a function
131 which parses that production. We call this class of expressions
132 "primary" expressions, for reasons that will become more clear `later in
133 the tutorial <OCamlLangImpl6.html#user-defined-unary-operators>`_. In order to parse an
134 arbitrary primary expression, we need to determine what sort of
135 expression it is. For numeric literals, we have:
137 .. code-block:: ocaml
143 parse_primary = parser
144 (* numberexpr ::= number *)
145 | [< 'Token.Number n >] -> Ast.Number n
147 This routine is very simple: it expects to be called when the current
148 token is a ``Token.Number`` token. It takes the current number value,
149 creates a ``Ast.Number`` node, advances the lexer to the next token, and
152 There are some interesting aspects to this. The most important one is
153 that this routine eats all of the tokens that correspond to the
154 production and returns the lexer buffer with the next token (which is
155 not part of the grammar production) ready to go. This is a fairly
156 standard way to go for recursive descent parsers. For a better example,
157 the parenthesis operator is defined like this:
159 .. code-block:: ocaml
161 (* parenexpr ::= '(' expression ')' *)
162 | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
164 This function illustrates a number of interesting things about the
167 1) It shows how we use the ``Stream.Error`` exception. When called, this
168 function expects that the current token is a '(' token, but after
169 parsing the subexpression, it is possible that there is no ')' waiting.
170 For example, if the user types in "(4 x" instead of "(4)", the parser
171 should emit an error. Because errors can occur, the parser needs a way
172 to indicate that they happened. In our parser, we use the camlp4
173 shortcut syntax ``token ?? "parse error"``, where if the token before
174 the ``??`` does not match, then ``Stream.Error "parse error"`` will be
177 2) Another interesting aspect of this function is that it uses recursion
178 by calling ``Parser.parse_primary`` (we will soon see that
179 ``Parser.parse_primary`` can call ``Parser.parse_primary``). This is
180 powerful because it allows us to handle recursive grammars, and keeps
181 each production very simple. Note that parentheses do not cause
182 construction of AST nodes themselves. While we could do it this way, the
183 most important role of parentheses are to guide the parser and provide
184 grouping. Once the parser constructs the AST, parentheses are not
187 The next simple production is for handling variable references and
190 .. code-block:: ocaml
194 * ::= identifier '(' argumentexpr ')' *)
195 | [< 'Token.Ident id; stream >] ->
196 let rec parse_args accumulator = parser
197 | [< e=parse_expr; stream >] ->
199 | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
200 | [< >] -> e :: accumulator
202 | [< >] -> accumulator
204 let rec parse_ident id = parser
208 'Token.Kwd ')' ?? "expected ')'">] ->
209 Ast.Call (id, Array.of_list (List.rev args))
211 (* Simple variable ref. *)
212 | [< >] -> Ast.Variable id
214 parse_ident id stream
216 This routine follows the same style as the other routines. (It expects
217 to be called if the current token is a ``Token.Ident`` token). It also
218 has recursion and error handling. One interesting aspect of this is that
219 it uses *look-ahead* to determine if the current identifier is a stand
220 alone variable reference or if it is a function call expression. It
221 handles this by checking to see if the token after the identifier is a
222 '(' token, constructing either a ``Ast.Variable`` or ``Ast.Call`` node
225 We finish up by raising an exception if we received a token we didn't
228 .. code-block:: ocaml
230 | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
232 Now that basic expressions are handled, we need to handle binary
233 expressions. They are a bit more complex.
235 Binary Expression Parsing
236 =========================
238 Binary expressions are significantly harder to parse because they are
239 often ambiguous. For example, when given the string "x+y\*z", the parser
240 can choose to parse it as either "(x+y)\*z" or "x+(y\*z)". With common
241 definitions from mathematics, we expect the later parse, because "\*"
242 (multiplication) has higher *precedence* than "+" (addition).
244 There are many ways to handle this, but an elegant and efficient way is
245 to use `Operator-Precedence
246 Parsing <http://en.wikipedia.org/wiki/Operator-precedence_parser>`_.
247 This parsing technique uses the precedence of binary operators to guide
248 recursion. To start with, we need a table of precedences:
250 .. code-block:: ocaml
252 (* binop_precedence - This holds the precedence for each binary operator that is
254 let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
256 (* precedence - Get the precedence of the pending binary operator token. *)
257 let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
262 (* Install standard binary operators.
263 * 1 is the lowest precedence. *)
264 Hashtbl.add Parser.binop_precedence '<' 10;
265 Hashtbl.add Parser.binop_precedence '+' 20;
266 Hashtbl.add Parser.binop_precedence '-' 20;
267 Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
270 For the basic form of Kaleidoscope, we will only support 4 binary
271 operators (this can obviously be extended by you, our brave and intrepid
272 reader). The ``Parser.precedence`` function returns the precedence for
273 the current token, or -1 if the token is not a binary operator. Having a
274 ``Hashtbl.t`` makes it easy to add new operators and makes it clear that
275 the algorithm doesn't depend on the specific operators involved, but it
276 would be easy enough to eliminate the ``Hashtbl.t`` and do the
277 comparisons in the ``Parser.precedence`` function. (Or just use a
280 With the helper above defined, we can now start parsing binary
281 expressions. The basic idea of operator precedence parsing is to break
282 down an expression with potentially ambiguous binary operators into
283 pieces. Consider, for example, the expression "a+b+(c+d)\*e\*f+g".
284 Operator precedence parsing considers this as a stream of primary
285 expressions separated by binary operators. As such, it will first parse
286 the leading primary expression "a", then it will see the pairs [+, b]
287 [+, (c+d)] [\*, e] [\*, f] and [+, g]. Note that because parentheses are
288 primary expressions, the binary expression parser doesn't need to worry
289 about nested subexpressions like (c+d) at all.
291 To start, an expression is a primary expression potentially followed by
292 a sequence of [binop,primaryexpr] pairs:
294 .. code-block:: ocaml
297 * ::= primary binoprhs *)
298 and parse_expr = parser
299 | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
301 ``Parser.parse_bin_rhs`` is the function that parses the sequence of
302 pairs for us. It takes a precedence and a pointer to an expression for
303 the part that has been parsed so far. Note that "x" is a perfectly valid
304 expression: As such, "binoprhs" is allowed to be empty, in which case it
305 returns the expression that is passed into it. In our example above, the
306 code passes the expression for "a" into ``Parser.parse_bin_rhs`` and the
307 current token is "+".
309 The precedence value passed into ``Parser.parse_bin_rhs`` indicates the
310 *minimal operator precedence* that the function is allowed to eat. For
311 example, if the current pair stream is [+, x] and
312 ``Parser.parse_bin_rhs`` is passed in a precedence of 40, it will not
313 consume any tokens (because the precedence of '+' is only 20). With this
314 in mind, ``Parser.parse_bin_rhs`` starts with:
316 .. code-block:: ocaml
319 * ::= ('+' primary)* *)
320 and parse_bin_rhs expr_prec lhs stream =
321 match Stream.peek stream with
322 (* If this is a binop, find its precedence. *)
323 | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
324 let token_prec = precedence c in
326 (* If this is a binop that binds at least as tightly as the current binop,
327 * consume it, otherwise we are done. *)
328 if token_prec < expr_prec then lhs else begin
330 This code gets the precedence of the current token and checks to see if
331 if is too low. Because we defined invalid tokens to have a precedence of
332 -1, this check implicitly knows that the pair-stream ends when the token
333 stream runs out of binary operators. If this check succeeds, we know
334 that the token is a binary operator and that it will be included in this
337 .. code-block:: ocaml
342 (* Parse the primary expression after the binary operator *)
343 let rhs = parse_primary stream in
345 (* Okay, we know this is a binop. *)
347 match Stream.peek stream with
348 | Some (Token.Kwd c2) ->
350 As such, this code eats (and remembers) the binary operator and then
351 parses the primary expression that follows. This builds up the whole
352 pair, the first of which is [+, b] for the running example.
354 Now that we parsed the left-hand side of an expression and one pair of
355 the RHS sequence, we have to decide which way the expression associates.
356 In particular, we could have "(a+b) binop unparsed" or "a + (b binop
357 unparsed)". To determine this, we look ahead at "binop" to determine its
358 precedence and compare it to BinOp's precedence (which is '+' in this
361 .. code-block:: ocaml
363 (* If BinOp binds less tightly with rhs than the operator after
364 * rhs, let the pending operator take rhs as its lhs. *)
365 let next_prec = precedence c2 in
366 if token_prec < next_prec
368 If the precedence of the binop to the right of "RHS" is lower or equal
369 to the precedence of our current operator, then we know that the
370 parentheses associate as "(a+b) binop ...". In our example, the current
371 operator is "+" and the next operator is "+", we know that they have the
372 same precedence. In this case we'll create the AST node for "a+b", and
373 then continue parsing:
375 .. code-block:: ocaml
377 ... if body omitted ...
381 let lhs = Ast.Binary (c, lhs, rhs) in
382 parse_bin_rhs expr_prec lhs stream
385 In our example above, this will turn "a+b+" into "(a+b)" and execute the
386 next iteration of the loop, with "+" as the current token. The code
387 above will eat, remember, and parse "(c+d)" as the primary expression,
388 which makes the current pair equal to [+, (c+d)]. It will then evaluate
389 the 'if' conditional above with "\*" as the binop to the right of the
390 primary. In this case, the precedence of "\*" is higher than the
391 precedence of "+" so the if condition will be entered.
393 The critical question left here is "how can the if condition parse the
394 right hand side in full"? In particular, to build the AST correctly for
395 our example, it needs to get all of "(c+d)\*e\*f" as the RHS expression
396 variable. The code to do this is surprisingly simple (code from the
397 above two blocks duplicated for context):
399 .. code-block:: ocaml
401 match Stream.peek stream with
402 | Some (Token.Kwd c2) ->
403 (* If BinOp binds less tightly with rhs than the operator after
404 * rhs, let the pending operator take rhs as its lhs. *)
405 if token_prec < precedence c2
406 then parse_bin_rhs (token_prec + 1) rhs stream
412 let lhs = Ast.Binary (c, lhs, rhs) in
413 parse_bin_rhs expr_prec lhs stream
416 At this point, we know that the binary operator to the RHS of our
417 primary has higher precedence than the binop we are currently parsing.
418 As such, we know that any sequence of pairs whose operators are all
419 higher precedence than "+" should be parsed together and returned as
420 "RHS". To do this, we recursively invoke the ``Parser.parse_bin_rhs``
421 function specifying "token\_prec+1" as the minimum precedence required
422 for it to continue. In our example above, this will cause it to return
423 the AST node for "(c+d)\*e\*f" as RHS, which is then set as the RHS of
426 Finally, on the next iteration of the while loop, the "+g" piece is
427 parsed and added to the AST. With this little bit of code (14
428 non-trivial lines), we correctly handle fully general binary expression
429 parsing in a very elegant way. This was a whirlwind tour of this code,
430 and it is somewhat subtle. I recommend running through it with a few
431 tough examples to see how it works.
433 This wraps up handling of expressions. At this point, we can point the
434 parser at an arbitrary token stream and build an expression from it,
435 stopping at the first token that is not part of the expression. Next up
436 we need to handle function definitions, etc.
441 The next thing missing is handling of function prototypes. In
442 Kaleidoscope, these are used both for 'extern' function declarations as
443 well as function body definitions. The code to do this is
444 straight-forward and not very interesting (once you've survived
447 .. code-block:: ocaml
450 * ::= id '(' id* ')' *)
451 let parse_prototype =
452 let rec parse_args accumulator = parser
453 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
454 | [< >] -> accumulator
458 | [< 'Token.Ident id;
459 'Token.Kwd '(' ?? "expected '(' in prototype";
461 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
463 Ast.Prototype (id, Array.of_list (List.rev args))
466 raise (Stream.Error "expected function name in prototype")
468 Given this, a function definition is very simple, just a prototype plus
469 an expression to implement the body:
471 .. code-block:: ocaml
473 (* definition ::= 'def' prototype expression *)
474 let parse_definition = parser
475 | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
478 In addition, we support 'extern' to declare functions like 'sin' and
479 'cos' as well as to support forward declaration of user functions. These
480 'extern's are just prototypes with no body:
482 .. code-block:: ocaml
484 (* external ::= 'extern' prototype *)
485 let parse_extern = parser
486 | [< 'Token.Extern; e=parse_prototype >] -> e
488 Finally, we'll also let the user type in arbitrary top-level expressions
489 and evaluate them on the fly. We will handle this by defining anonymous
490 nullary (zero argument) functions for them:
492 .. code-block:: ocaml
494 (* toplevelexpr ::= expression *)
495 let parse_toplevel = parser
496 | [< e=parse_expr >] ->
497 (* Make an anonymous proto. *)
498 Ast.Function (Ast.Prototype ("", [||]), e)
500 Now that we have all the pieces, let's build a little driver that will
501 let us actually *execute* this code we've built!
506 The driver for this simply invokes all of the parsing pieces with a
507 top-level dispatch loop. There isn't much interesting here, so I'll just
508 include the top-level loop. See `below <#full-code-listing>`_ for full code in the
509 "Top-Level Parsing" section.
511 .. code-block:: ocaml
513 (* top ::= definition | external | expression | ';' *)
514 let rec main_loop stream =
515 match Stream.peek stream with
518 (* ignore top-level semicolons. *)
519 | Some (Token.Kwd ';') ->
527 ignore(Parser.parse_definition stream);
528 print_endline "parsed a function definition.";
530 ignore(Parser.parse_extern stream);
531 print_endline "parsed an extern.";
533 (* Evaluate a top-level expression into an anonymous function. *)
534 ignore(Parser.parse_toplevel stream);
535 print_endline "parsed a top-level expr";
536 with Stream.Error s ->
537 (* Skip token for error recovery. *)
541 print_string "ready> "; flush stdout;
544 The most interesting part of this is that we ignore top-level
545 semicolons. Why is this, you ask? The basic reason is that if you type
546 "4 + 5" at the command line, the parser doesn't know whether that is the
547 end of what you will type or not. For example, on the next line you
548 could type "def foo..." in which case 4+5 is the end of a top-level
549 expression. Alternatively you could type "\* 6", which would continue
550 the expression. Having top-level semicolons allows you to type "4+5;",
551 and the parser will know you are done.
556 With just under 300 lines of commented code (240 lines of non-comment,
557 non-blank code), we fully defined our minimal language, including a
558 lexer, parser, and AST builder. With this done, the executable will
559 validate Kaleidoscope code and tell us if it is grammatically invalid.
560 For example, here is a sample interaction:
565 ready> def foo(x y) x+foo(y, 4.0);
566 Parsed a function definition.
567 ready> def foo(x y) x+y y;
568 Parsed a function definition.
569 Parsed a top-level expr
570 ready> def foo(x y) x+y );
571 Parsed a function definition.
572 Error: unknown token when expecting an expression
573 ready> extern sin(a);
574 ready> Parsed an extern
578 There is a lot of room for extension here. You can define new AST nodes,
579 extend the language in many ways, etc. In the `next
580 installment <OCamlLangImpl3.html>`_, we will describe how to generate
581 LLVM Intermediate Representation (IR) from the AST.
586 Here is the complete code listing for this and the previous chapter.
587 Note that it is fully self-contained: you don't need LLVM or any
588 external libraries at all for this. (Besides the ocaml standard
589 libraries, of course.) To build this, just compile with:
603 <{lexer,parser}.ml>: use_camlp4, pp(camlp4of)
606 .. code-block:: ocaml
608 (*===----------------------------------------------------------------------===
610 *===----------------------------------------------------------------------===*)
612 (* The lexer returns these 'Kwd' if it is an unknown character, otherwise one of
613 * these others for known things. *)
619 | Ident of string | Number of float
625 .. code-block:: ocaml
627 (*===----------------------------------------------------------------------===
629 *===----------------------------------------------------------------------===*)
632 (* Skip any whitespace. *)
633 | [< ' (' ' | '\n' | '\r' | '\t'); stream >] -> lex stream
635 (* identifier: [a-zA-Z][a-zA-Z0-9] *)
636 | [< ' ('A' .. 'Z' | 'a' .. 'z' as c); stream >] ->
637 let buffer = Buffer.create 1 in
638 Buffer.add_char buffer c;
639 lex_ident buffer stream
641 (* number: [0-9.]+ *)
642 | [< ' ('0' .. '9' as c); stream >] ->
643 let buffer = Buffer.create 1 in
644 Buffer.add_char buffer c;
645 lex_number buffer stream
647 (* Comment until end of line. *)
648 | [< ' ('#'); stream >] ->
651 (* Otherwise, just return the character as its ascii value. *)
652 | [< 'c; stream >] ->
653 [< 'Token.Kwd c; lex stream >]
658 and lex_number buffer = parser
659 | [< ' ('0' .. '9' | '.' as c); stream >] ->
660 Buffer.add_char buffer c;
661 lex_number buffer stream
662 | [< stream=lex >] ->
663 [< 'Token.Number (float_of_string (Buffer.contents buffer)); stream >]
665 and lex_ident buffer = parser
666 | [< ' ('A' .. 'Z' | 'a' .. 'z' | '0' .. '9' as c); stream >] ->
667 Buffer.add_char buffer c;
668 lex_ident buffer stream
669 | [< stream=lex >] ->
670 match Buffer.contents buffer with
671 | "def" -> [< 'Token.Def; stream >]
672 | "extern" -> [< 'Token.Extern; stream >]
673 | id -> [< 'Token.Ident id; stream >]
675 and lex_comment = parser
676 | [< ' ('\n'); stream=lex >] -> stream
677 | [< 'c; e=lex_comment >] -> e
681 .. code-block:: ocaml
683 (*===----------------------------------------------------------------------===
684 * Abstract Syntax Tree (aka Parse Tree)
685 *===----------------------------------------------------------------------===*)
687 (* expr - Base type for all expression nodes. *)
689 (* variant for numeric literals like "1.0". *)
692 (* variant for referencing a variable, like "a". *)
695 (* variant for a binary operator. *)
696 | Binary of char * expr * expr
698 (* variant for function calls. *)
699 | Call of string * expr array
701 (* proto - This type represents the "prototype" for a function, which captures
702 * its name, and its argument names (thus implicitly the number of arguments the
703 * function takes). *)
704 type proto = Prototype of string * string array
706 (* func - This type represents a function definition itself. *)
707 type func = Function of proto * expr
710 .. code-block:: ocaml
712 (*===---------------------------------------------------------------------===
714 *===---------------------------------------------------------------------===*)
716 (* binop_precedence - This holds the precedence for each binary operator that is
718 let binop_precedence:(char, int) Hashtbl.t = Hashtbl.create 10
720 (* precedence - Get the precedence of the pending binary operator token. *)
721 let precedence c = try Hashtbl.find binop_precedence c with Not_found -> -1
727 let rec parse_primary = parser
728 (* numberexpr ::= number *)
729 | [< 'Token.Number n >] -> Ast.Number n
731 (* parenexpr ::= '(' expression ')' *)
732 | [< 'Token.Kwd '('; e=parse_expr; 'Token.Kwd ')' ?? "expected ')'" >] -> e
736 * ::= identifier '(' argumentexpr ')' *)
737 | [< 'Token.Ident id; stream >] ->
738 let rec parse_args accumulator = parser
739 | [< e=parse_expr; stream >] ->
741 | [< 'Token.Kwd ','; e=parse_args (e :: accumulator) >] -> e
742 | [< >] -> e :: accumulator
744 | [< >] -> accumulator
746 let rec parse_ident id = parser
750 'Token.Kwd ')' ?? "expected ')'">] ->
751 Ast.Call (id, Array.of_list (List.rev args))
753 (* Simple variable ref. *)
754 | [< >] -> Ast.Variable id
756 parse_ident id stream
758 | [< >] -> raise (Stream.Error "unknown token when expecting an expression.")
761 * ::= ('+' primary)* *)
762 and parse_bin_rhs expr_prec lhs stream =
763 match Stream.peek stream with
764 (* If this is a binop, find its precedence. *)
765 | Some (Token.Kwd c) when Hashtbl.mem binop_precedence c ->
766 let token_prec = precedence c in
768 (* If this is a binop that binds at least as tightly as the current binop,
769 * consume it, otherwise we are done. *)
770 if token_prec < expr_prec then lhs else begin
774 (* Parse the primary expression after the binary operator. *)
775 let rhs = parse_primary stream in
777 (* Okay, we know this is a binop. *)
779 match Stream.peek stream with
780 | Some (Token.Kwd c2) ->
781 (* If BinOp binds less tightly with rhs than the operator after
782 * rhs, let the pending operator take rhs as its lhs. *)
783 let next_prec = precedence c2 in
784 if token_prec < next_prec
785 then parse_bin_rhs (token_prec + 1) rhs stream
791 let lhs = Ast.Binary (c, lhs, rhs) in
792 parse_bin_rhs expr_prec lhs stream
797 * ::= primary binoprhs *)
798 and parse_expr = parser
799 | [< lhs=parse_primary; stream >] -> parse_bin_rhs 0 lhs stream
802 * ::= id '(' id* ')' *)
803 let parse_prototype =
804 let rec parse_args accumulator = parser
805 | [< 'Token.Ident id; e=parse_args (id::accumulator) >] -> e
806 | [< >] -> accumulator
810 | [< 'Token.Ident id;
811 'Token.Kwd '(' ?? "expected '(' in prototype";
813 'Token.Kwd ')' ?? "expected ')' in prototype" >] ->
815 Ast.Prototype (id, Array.of_list (List.rev args))
818 raise (Stream.Error "expected function name in prototype")
820 (* definition ::= 'def' prototype expression *)
821 let parse_definition = parser
822 | [< 'Token.Def; p=parse_prototype; e=parse_expr >] ->
825 (* toplevelexpr ::= expression *)
826 let parse_toplevel = parser
827 | [< e=parse_expr >] ->
828 (* Make an anonymous proto. *)
829 Ast.Function (Ast.Prototype ("", [||]), e)
831 (* external ::= 'extern' prototype *)
832 let parse_extern = parser
833 | [< 'Token.Extern; e=parse_prototype >] -> e
836 .. code-block:: ocaml
838 (*===----------------------------------------------------------------------===
839 * Top-Level parsing and JIT Driver
840 *===----------------------------------------------------------------------===*)
842 (* top ::= definition | external | expression | ';' *)
843 let rec main_loop stream =
844 match Stream.peek stream with
847 (* ignore top-level semicolons. *)
848 | Some (Token.Kwd ';') ->
856 ignore(Parser.parse_definition stream);
857 print_endline "parsed a function definition.";
859 ignore(Parser.parse_extern stream);
860 print_endline "parsed an extern.";
862 (* Evaluate a top-level expression into an anonymous function. *)
863 ignore(Parser.parse_toplevel stream);
864 print_endline "parsed a top-level expr";
865 with Stream.Error s ->
866 (* Skip token for error recovery. *)
870 print_string "ready> "; flush stdout;
874 .. code-block:: ocaml
876 (*===----------------------------------------------------------------------===
878 *===----------------------------------------------------------------------===*)
881 (* Install standard binary operators.
882 * 1 is the lowest precedence. *)
883 Hashtbl.add Parser.binop_precedence '<' 10;
884 Hashtbl.add Parser.binop_precedence '+' 20;
885 Hashtbl.add Parser.binop_precedence '-' 20;
886 Hashtbl.add Parser.binop_precedence '*' 40; (* highest. *)
888 (* Prime the first token. *)
889 print_string "ready> "; flush stdout;
890 let stream = Lexer.lex (Stream.of_channel stdin) in
892 (* Run the main "interpreter loop" now. *)
893 Toplevel.main_loop stream;
898 `Next: Implementing Code Generation to LLVM IR <OCamlLangImpl3.html>`_