3 ============================================================
4 Kaleidoscope: Extending the Language: User-defined Operators
5 ============================================================
10 Chapter 6 Introduction
11 ======================
13 Welcome to Chapter 6 of the "`Implementing a language with
14 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
15 have a fully functional language that is fairly minimal, but also
16 useful. There is still one big problem with it, however. Our language
17 doesn't have many useful operators (like division, logical negation, or
18 even any comparisons besides less-than).
20 This chapter of the tutorial takes a wild digression into adding
21 user-defined operators to the simple and beautiful Kaleidoscope
22 language. This digression now gives us a simple and ugly language in
23 some ways, but also a powerful one at the same time. One of the great
24 things about creating your own language is that you get to decide what
25 is good or bad. In this tutorial we'll assume that it is okay to use
26 this as a way to show some interesting parsing techniques.
28 At the end of this tutorial, we'll run through an example Kaleidoscope
29 application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an
30 example of what you can build with Kaleidoscope and its feature set.
32 User-defined Operators: the Idea
33 ================================
35 The "operator overloading" that we will add to Kaleidoscope is more
36 general than in languages like C++. In C++, you are only allowed to
37 redefine existing operators: you can't programmatically change the
38 grammar, introduce new operators, change precedence levels, etc. In this
39 chapter, we will add this capability to Kaleidoscope, which will let the
40 user round out the set of operators that are supported.
42 The point of going into user-defined operators in a tutorial like this
43 is to show the power and flexibility of using a hand-written parser.
44 Thus far, the parser we have been implementing uses recursive descent
45 for most parts of the grammar and operator precedence parsing for the
46 expressions. See `Chapter 2 <LangImpl02.html>`_ for details. By
47 using operator precedence parsing, it is very easy to allow
48 the programmer to introduce new operators into the grammar: the grammar
49 is dynamically extensible as the JIT runs.
51 The two specific features we'll add are programmable unary operators
52 (right now, Kaleidoscope has no unary operators at all) as well as
53 binary operators. An example of this is:
64 # Define > with the same precedence as <.
65 def binary> 10 (LHS RHS)
68 # Binary "logical or", (note that it does not "short circuit")
69 def binary| 5 (LHS RHS)
77 # Define = with slightly lower precedence than relationals.
78 def binary= 9 (LHS RHS)
79 !(LHS < RHS | LHS > RHS);
81 Many languages aspire to being able to implement their standard runtime
82 library in the language itself. In Kaleidoscope, we can implement
83 significant parts of the language in the library!
85 We will break down implementation of these features into two parts:
86 implementing support for user-defined binary operators and adding unary
89 User-defined Binary Operators
90 =============================
92 Adding support for user-defined binary operators is pretty simple with
93 our current framework. We'll first add support for the unary/binary
105 static int gettok() {
107 if (IdentifierStr == "for")
109 if (IdentifierStr == "in")
111 if (IdentifierStr == "binary")
113 if (IdentifierStr == "unary")
115 return tok_identifier;
117 This just adds lexer support for the unary and binary keywords, like we
118 did in `previous chapters <LangImpl5.html#lexer-extensions-for-if-then-else>`_. One nice thing
119 about our current AST, is that we represent binary operators with full
120 generalisation by using their ASCII code as the opcode. For our extended
121 operators, we'll use this same representation, so we don't need any new
122 AST or parser support.
124 On the other hand, we have to be able to represent the definitions of
125 these new operators, in the "def binary\| 5" part of the function
126 definition. In our grammar so far, the "name" for the function
127 definition is parsed as the "prototype" production and into the
128 ``PrototypeAST`` AST node. To represent our new user-defined operators
129 as prototypes, we have to extend the ``PrototypeAST`` AST node like
134 /// PrototypeAST - This class represents the "prototype" for a function,
135 /// which captures its argument names as well as if it is an operator.
138 std::vector<std::string> Args;
140 unsigned Precedence; // Precedence if a binary op.
143 PrototypeAST(const std::string &name, std::vector<std::string> Args,
144 bool IsOperator = false, unsigned Prec = 0)
145 : Name(name), Args(std::move(Args)), IsOperator(IsOperator),
149 const std::string &getName() const { return Name; }
151 bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
152 bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
154 char getOperatorName() const {
155 assert(isUnaryOp() || isBinaryOp());
156 return Name[Name.size() - 1];
159 unsigned getBinaryPrecedence() const { return Precedence; }
162 Basically, in addition to knowing a name for the prototype, we now keep
163 track of whether it was an operator, and if it was, what precedence
164 level the operator is at. The precedence is only used for binary
165 operators (as you'll see below, it just doesn't apply for unary
166 operators). Now that we have a way to represent the prototype for a
167 user-defined operator, we need to parse it:
172 /// ::= id '(' id* ')'
173 /// ::= binary LETTER number? (id, id)
174 static std::unique_ptr<PrototypeAST> ParsePrototype() {
177 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
178 unsigned BinaryPrecedence = 30;
182 return LogErrorP("Expected function name in prototype");
184 FnName = IdentifierStr;
190 if (!isascii(CurTok))
191 return LogErrorP("Expected binary operator");
193 FnName += (char)CurTok;
197 // Read the precedence if present.
198 if (CurTok == tok_number) {
199 if (NumVal < 1 || NumVal > 100)
200 return LogErrorP("Invalid precedence: must be 1..100");
201 BinaryPrecedence = (unsigned)NumVal;
208 return LogErrorP("Expected '(' in prototype");
210 std::vector<std::string> ArgNames;
211 while (getNextToken() == tok_identifier)
212 ArgNames.push_back(IdentifierStr);
214 return LogErrorP("Expected ')' in prototype");
217 getNextToken(); // eat ')'.
219 // Verify right number of names for operator.
220 if (Kind && ArgNames.size() != Kind)
221 return LogErrorP("Invalid number of operands for operator");
223 return std::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,
227 This is all fairly straightforward parsing code, and we have already
228 seen a lot of similar code in the past. One interesting part about the
229 code above is the couple lines that set up ``FnName`` for binary
230 operators. This builds names like "binary@" for a newly defined "@"
231 operator. It then takes advantage of the fact that symbol names in the
232 LLVM symbol table are allowed to have any character in them, including
233 embedded nul characters.
235 The next interesting thing to add, is codegen support for these binary
236 operators. Given our current structure, this is a simple addition of a
237 default case for our existing binary operator node:
241 Value *BinaryExprAST::codegen() {
242 Value *L = LHS->codegen();
243 Value *R = RHS->codegen();
249 return Builder.CreateFAdd(L, R, "addtmp");
251 return Builder.CreateFSub(L, R, "subtmp");
253 return Builder.CreateFMul(L, R, "multmp");
255 L = Builder.CreateFCmpULT(L, R, "cmptmp");
256 // Convert bool 0/1 to double 0.0 or 1.0
257 return Builder.CreateUIToFP(L, Type::getDoubleTy(TheContext),
263 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
265 Function *F = getFunction(std::string("binary") + Op);
266 assert(F && "binary operator not found!");
268 Value *Ops[2] = { L, R };
269 return Builder.CreateCall(F, Ops, "binop");
272 As you can see above, the new code is actually really simple. It just
273 does a lookup for the appropriate operator in the symbol table and
274 generates a function call to it. Since user-defined operators are just
275 built as normal functions (because the "prototype" boils down to a
276 function with the right name) everything falls into place.
278 The final piece of code we are missing, is a bit of top-level magic:
282 Function *FunctionAST::codegen() {
283 // Transfer ownership of the prototype to the FunctionProtos map, but keep a
284 // reference to it for use below.
286 FunctionProtos[Proto->getName()] = std::move(Proto);
287 Function *TheFunction = getFunction(P.getName());
291 // If this is an operator, install it.
293 BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();
295 // Create a new basic block to start insertion into.
296 BasicBlock *BB = BasicBlock::Create(TheContext, "entry", TheFunction);
299 Basically, before codegening a function, if it is a user-defined
300 operator, we register it in the precedence table. This allows the binary
301 operator parsing logic we already have in place to handle it. Since we
302 are working on a fully-general operator precedence parser, this is all
303 we need to do to "extend the grammar".
305 Now we have useful user-defined binary operators. This builds a lot on
306 the previous framework we built for other operators. Adding unary
307 operators is a bit more challenging, because we don't have any framework
308 for it yet - let's see what it takes.
310 User-defined Unary Operators
311 ============================
313 Since we don't currently support unary operators in the Kaleidoscope
314 language, we'll need to add everything to support them. Above, we added
315 simple support for the 'unary' keyword to the lexer. In addition to
316 that, we need an AST node:
320 /// UnaryExprAST - Expression class for a unary operator.
321 class UnaryExprAST : public ExprAST {
323 std::unique_ptr<ExprAST> Operand;
326 UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
327 : Opcode(Opcode), Operand(std::move(Operand)) {}
329 Value *codegen() override;
332 This AST node is very simple and obvious by now. It directly mirrors the
333 binary operator AST node, except that it only has one child. With this,
334 we need to add the parsing logic. Parsing a unary operator is pretty
335 simple: we'll add a new function to do it:
342 static std::unique_ptr<ExprAST> ParseUnary() {
343 // If the current token is not an operator, it must be a primary expr.
344 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
345 return ParsePrimary();
347 // If this is a unary operator, read it.
350 if (auto Operand = ParseUnary())
351 return std::make_unique<UnaryExprAST>(Opc, std::move(Operand));
355 The grammar we add is pretty straightforward here. If we see a unary
356 operator when parsing a primary operator, we eat the operator as a
357 prefix and parse the remaining piece as another unary operator. This
358 allows us to handle multiple unary operators (e.g. "!!x"). Note that
359 unary operators can't have ambiguous parses like binary operators can,
360 so there is no need for precedence information.
362 The problem with this function, is that we need to call ParseUnary from
363 somewhere. To do this, we change previous callers of ParsePrimary to
364 call ParseUnary instead:
370 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
371 std::unique_ptr<ExprAST> LHS) {
373 // Parse the unary expression after the binary operator.
374 auto RHS = ParseUnary();
380 /// ::= unary binoprhs
382 static std::unique_ptr<ExprAST> ParseExpression() {
383 auto LHS = ParseUnary();
387 return ParseBinOpRHS(0, std::move(LHS));
390 With these two simple changes, we are now able to parse unary operators
391 and build the AST for them. Next up, we need to add parser support for
392 prototypes, to parse the unary operator prototype. We extend the binary
393 operator code above with:
398 /// ::= id '(' id* ')'
399 /// ::= binary LETTER number? (id, id)
400 /// ::= unary LETTER (id)
401 static std::unique_ptr<PrototypeAST> ParsePrototype() {
404 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
405 unsigned BinaryPrecedence = 30;
409 return LogErrorP("Expected function name in prototype");
411 FnName = IdentifierStr;
417 if (!isascii(CurTok))
418 return LogErrorP("Expected unary operator");
420 FnName += (char)CurTok;
427 As with binary operators, we name unary operators with a name that
428 includes the operator character. This assists us at code generation
429 time. Speaking of, the final piece we need to add is codegen support for
430 unary operators. It looks like this:
434 Value *UnaryExprAST::codegen() {
435 Value *OperandV = Operand->codegen();
439 Function *F = getFunction(std::string("unary") + Opcode);
441 return LogErrorV("Unknown unary operator");
443 return Builder.CreateCall(F, OperandV, "unop");
446 This code is similar to, but simpler than, the code for binary
447 operators. It is simpler primarily because it doesn't need to handle any
448 predefined operators.
453 It is somewhat hard to believe, but with a few simple extensions we've
454 covered in the last chapters, we have grown a real-ish language. With
455 this, we can do a lot of interesting things, including I/O, math, and a
456 bunch of other things. For example, we can now add a nice sequencing
457 operator (printd is defined to print out the specified value and a
462 ready> extern printd(x);
464 declare double @printd(double)
466 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
468 ready> printd(123) : printd(456) : printd(789);
472 Evaluated to 0.000000
474 We can also define a bunch of other "primitive" operations, such as:
489 # Define > with the same precedence as <.
490 def binary> 10 (LHS RHS)
493 # Binary logical or, which does not short circuit.
494 def binary| 5 (LHS RHS)
502 # Binary logical and, which does not short circuit.
503 def binary& 6 (LHS RHS)
509 # Define = with slightly lower precedence than relationals.
510 def binary = 9 (LHS RHS)
511 !(LHS < RHS | LHS > RHS);
513 # Define ':' for sequencing: as a low-precedence operator that ignores operands
514 # and just returns the RHS.
515 def binary : 1 (x y) y;
517 Given the previous if/then/else support, we can also define interesting
518 functions for I/O. For example, the following prints out a character
519 whose "density" reflects the value passed in: the lower the value, the
520 denser the character:
524 ready> extern putchard(char);
526 ready> def printdensity(d)
536 ready> printdensity(1): printdensity(2): printdensity(3):
537 printdensity(4): printdensity(5): printdensity(9):
540 Evaluated to 0.000000
542 Based on these simple primitive operations, we can start to define more
543 interesting things. For example, here's a little function that determines
544 the number of iterations it takes for a certain function in the complex
549 # Determine whether the specific location diverges.
550 # Solve for z = z^2 + c in the complex plane.
551 def mandelconverger(real imag iters creal cimag)
552 if iters > 255 | (real*real + imag*imag > 4) then
555 mandelconverger(real*real - imag*imag + creal,
557 iters+1, creal, cimag);
559 # Return the number of iterations required for the iteration to escape
560 def mandelconverge(real imag)
561 mandelconverger(real, imag, 0, real, imag);
563 This "``z = z2 + c``" function is a beautiful little creature that is
564 the basis for computation of the `Mandelbrot
565 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
566 ``mandelconverge`` function returns the number of iterations that it
567 takes for a complex orbit to escape, saturating to 255. This is not a
568 very useful function by itself, but if you plot its value over a
569 two-dimensional plane, you can see the Mandelbrot set. Given that we are
570 limited to using putchard here, our amazing graphical output is limited,
571 but we can whip together something using the density plotter above:
575 # Compute and plot the mandelbrot set with the specified 2 dimensional range
577 def mandelhelp(xmin xmax xstep ymin ymax ystep)
578 for y = ymin, y < ymax, ystep in (
579 (for x = xmin, x < xmax, xstep in
580 printdensity(mandelconverge(x,y)))
584 # mandel - This is a convenient helper function for plotting the mandelbrot set
585 # from the specified position with the specified Magnification.
586 def mandel(realstart imagstart realmag imagmag)
587 mandelhelp(realstart, realstart+realmag*78, realmag,
588 imagstart, imagstart+imagmag*40, imagmag);
590 Given this, we can try plotting out the mandelbrot set! Lets try it out:
594 ready> mandel(-2.3, -1.3, 0.05, 0.07);
595 *******************************+++++++++++*************************************
596 *************************+++++++++++++++++++++++*******************************
597 **********************+++++++++++++++++++++++++++++****************************
598 *******************+++++++++++++++++++++.. ...++++++++*************************
599 *****************++++++++++++++++++++++.... ...+++++++++***********************
600 ***************+++++++++++++++++++++++..... ...+++++++++*********************
601 **************+++++++++++++++++++++++.... ....+++++++++********************
602 *************++++++++++++++++++++++...... .....++++++++*******************
603 ************+++++++++++++++++++++....... .......+++++++******************
604 ***********+++++++++++++++++++.... ... .+++++++*****************
605 **********+++++++++++++++++....... .+++++++****************
606 *********++++++++++++++........... ...+++++++***************
607 ********++++++++++++............ ...++++++++**************
608 ********++++++++++... .......... .++++++++**************
609 *******+++++++++..... .+++++++++*************
610 *******++++++++...... ..+++++++++*************
611 *******++++++....... ..+++++++++*************
612 *******+++++...... ..+++++++++*************
613 *******.... .... ...+++++++++*************
614 *******.... . ...+++++++++*************
615 *******+++++...... ...+++++++++*************
616 *******++++++....... ..+++++++++*************
617 *******++++++++...... .+++++++++*************
618 *******+++++++++..... ..+++++++++*************
619 ********++++++++++... .......... .++++++++**************
620 ********++++++++++++............ ...++++++++**************
621 *********++++++++++++++.......... ...+++++++***************
622 **********++++++++++++++++........ .+++++++****************
623 **********++++++++++++++++++++.... ... ..+++++++****************
624 ***********++++++++++++++++++++++....... .......++++++++*****************
625 ************+++++++++++++++++++++++...... ......++++++++******************
626 **************+++++++++++++++++++++++.... ....++++++++********************
627 ***************+++++++++++++++++++++++..... ...+++++++++*********************
628 *****************++++++++++++++++++++++.... ...++++++++***********************
629 *******************+++++++++++++++++++++......++++++++*************************
630 *********************++++++++++++++++++++++.++++++++***************************
631 *************************+++++++++++++++++++++++*******************************
632 ******************************+++++++++++++************************************
633 *******************************************************************************
634 *******************************************************************************
635 *******************************************************************************
636 Evaluated to 0.000000
637 ready> mandel(-2, -1, 0.02, 0.04);
638 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
639 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
640 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
641 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
642 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
643 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
644 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
645 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
646 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
647 **********++++++++++++++++++++++++++++++++++++++++++++++.............
648 ********+++++++++++++++++++++++++++++++++++++++++++..................
649 *******+++++++++++++++++++++++++++++++++++++++.......................
650 ******+++++++++++++++++++++++++++++++++++...........................
651 *****++++++++++++++++++++++++++++++++............................
652 *****++++++++++++++++++++++++++++...............................
653 ****++++++++++++++++++++++++++...... .........................
654 ***++++++++++++++++++++++++......... ...... ...........
655 ***++++++++++++++++++++++............
656 **+++++++++++++++++++++..............
657 **+++++++++++++++++++................
658 *++++++++++++++++++.................
659 *++++++++++++++++............ ...
660 *++++++++++++++..............
661 *+++....++++................
662 *.......... ...........
664 *.......... ...........
665 *+++....++++................
666 *++++++++++++++..............
667 *++++++++++++++++............ ...
668 *++++++++++++++++++.................
669 **+++++++++++++++++++................
670 **+++++++++++++++++++++..............
671 ***++++++++++++++++++++++............
672 ***++++++++++++++++++++++++......... ...... ...........
673 ****++++++++++++++++++++++++++...... .........................
674 *****++++++++++++++++++++++++++++...............................
675 *****++++++++++++++++++++++++++++++++............................
676 ******+++++++++++++++++++++++++++++++++++...........................
677 *******+++++++++++++++++++++++++++++++++++++++.......................
678 ********+++++++++++++++++++++++++++++++++++++++++++..................
679 Evaluated to 0.000000
680 ready> mandel(-0.9, -1.4, 0.02, 0.03);
681 *******************************************************************************
682 *******************************************************************************
683 *******************************************************************************
684 **********+++++++++++++++++++++************************************************
685 *+++++++++++++++++++++++++++++++++++++++***************************************
686 +++++++++++++++++++++++++++++++++++++++++++++**********************************
687 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
688 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
689 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
690 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
691 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
692 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
693 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
694 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
695 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
696 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
697 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
698 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
699 ++++++++++++++++++++........... .........++++++++++++++++++++++*
700 ++++++++++++++++++............ ...........++++++++++++++++++++
701 ++++++++++++++++............... .............++++++++++++++++++
702 ++++++++++++++................. ...............++++++++++++++++
703 ++++++++++++.................. .................++++++++++++++
704 +++++++++.................. .................+++++++++++++
705 ++++++........ . ......... ..++++++++++++
706 ++............ ...... ....++++++++++
707 .............. ...++++++++++
708 .............. ....+++++++++
709 .............. .....++++++++
710 ............. ......++++++++
711 ........... .......++++++++
712 ......... ........+++++++
713 ......... ........+++++++
714 ......... ....+++++++
722 Evaluated to 0.000000
725 At this point, you may be starting to realize that Kaleidoscope is a
726 real and powerful language. It may not be self-similar :), but it can be
727 used to plot things that are!
729 With this, we conclude the "adding user-defined operators" chapter of
730 the tutorial. We have successfully augmented our language, adding the
731 ability to extend the language in the library, and we have shown how
732 this can be used to build a simple but interesting end-user application
733 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
734 applications that are functional and can call functions with
735 side-effects, but it can't actually define and mutate a variable itself.
737 Strikingly, variable mutation is an important feature of some languages,
738 and it is not at all obvious how to `add support for mutable
739 variables <LangImpl07.html>`_ without having to add an "SSA construction"
740 phase to your front-end. In the next chapter, we will describe how you
741 can add variable mutation without building SSA in your front-end.
746 Here is the complete code listing for our running example, enhanced with
747 the support for user-defined operators. To build this example, use:
752 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core mcjit native` -O3 -o toy
756 On some platforms, you will need to specify -rdynamic or
757 -Wl,--export-dynamic when linking. This ensures that symbols defined in
758 the main executable are exported to the dynamic linker and so are
759 available for symbol resolution at run time. This is not needed if you
760 compile your support code into a shared library, although doing that
761 will cause problems on Windows.
765 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter6/toy.cpp
768 `Next: Extending the language: mutable variables / SSA
769 construction <LangImpl07.html>`_