1 ============================================================
2 Kaleidoscope: Extending the Language: User-defined Operators
3 ============================================================
11 Welcome to Chapter 6 of the "`Implementing a language with
12 LLVM <index.html>`_" tutorial. At this point in our tutorial, we now
13 have a fully functional language that is fairly minimal, but also
14 useful. There is still one big problem with it, however. Our language
15 doesn't have many useful operators (like division, logical negation, or
16 even any comparisons besides less-than).
18 This chapter of the tutorial takes a wild digression into adding
19 user-defined operators to the simple and beautiful Kaleidoscope
20 language. This digression now gives us a simple and ugly language in
21 some ways, but also a powerful one at the same time. One of the great
22 things about creating your own language is that you get to decide what
23 is good or bad. In this tutorial we'll assume that it is okay to use
24 this as a way to show some interesting parsing techniques.
26 At the end of this tutorial, we'll run through an example Kaleidoscope
27 application that `renders the Mandelbrot set <#kicking-the-tires>`_. This gives an
28 example of what you can build with Kaleidoscope and its feature set.
30 User-defined Operators: the Idea
31 ================================
33 The "operator overloading" that we will add to Kaleidoscope is more
34 general than in languages like C++. In C++, you are only allowed to
35 redefine existing operators: you can't programmatically change the
36 grammar, introduce new operators, change precedence levels, etc. In this
37 chapter, we will add this capability to Kaleidoscope, which will let the
38 user round out the set of operators that are supported.
40 The point of going into user-defined operators in a tutorial like this
41 is to show the power and flexibility of using a hand-written parser.
42 Thus far, the parser we have been implementing uses recursive descent
43 for most parts of the grammar and operator precedence parsing for the
44 expressions. See `Chapter 2 <LangImpl02.html>`_ for details. By
45 using operator precedence parsing, it is very easy to allow
46 the programmer to introduce new operators into the grammar: the grammar
47 is dynamically extensible as the JIT runs.
49 The two specific features we'll add are programmable unary operators
50 (right now, Kaleidoscope has no unary operators at all) as well as
51 binary operators. An example of this is:
62 # Define > with the same precedence as <.
63 def binary> 10 (LHS RHS)
66 # Binary "logical or", (note that it does not "short circuit")
67 def binary| 5 (LHS RHS)
75 # Define = with slightly lower precedence than relationals.
76 def binary= 9 (LHS RHS)
77 !(LHS < RHS | LHS > RHS);
79 Many languages aspire to being able to implement their standard runtime
80 library in the language itself. In Kaleidoscope, we can implement
81 significant parts of the language in the library!
83 We will break down implementation of these features into two parts:
84 implementing support for user-defined binary operators and adding unary
87 User-defined Binary Operators
88 =============================
90 Adding support for user-defined binary operators is pretty simple with
91 our current framework. We'll first add support for the unary/binary
103 static int gettok() {
105 if (IdentifierStr == "for")
107 if (IdentifierStr == "in")
109 if (IdentifierStr == "binary")
111 if (IdentifierStr == "unary")
113 return tok_identifier;
115 This just adds lexer support for the unary and binary keywords, like we
116 did in `previous chapters <LangImpl05.html#lexer-extensions-for-if-then-else>`_. One nice thing
117 about our current AST, is that we represent binary operators with full
118 generalisation by using their ASCII code as the opcode. For our extended
119 operators, we'll use this same representation, so we don't need any new
120 AST or parser support.
122 On the other hand, we have to be able to represent the definitions of
123 these new operators, in the "def binary\| 5" part of the function
124 definition. In our grammar so far, the "name" for the function
125 definition is parsed as the "prototype" production and into the
126 ``PrototypeAST`` AST node. To represent our new user-defined operators
127 as prototypes, we have to extend the ``PrototypeAST`` AST node like
132 /// PrototypeAST - This class represents the "prototype" for a function,
133 /// which captures its argument names as well as if it is an operator.
136 std::vector<std::string> Args;
138 unsigned Precedence; // Precedence if a binary op.
141 PrototypeAST(const std::string &Name, std::vector<std::string> Args,
142 bool IsOperator = false, unsigned Prec = 0)
143 : Name(Name), Args(std::move(Args)), IsOperator(IsOperator),
147 const std::string &getName() const { return Name; }
149 bool isUnaryOp() const { return IsOperator && Args.size() == 1; }
150 bool isBinaryOp() const { return IsOperator && Args.size() == 2; }
152 char getOperatorName() const {
153 assert(isUnaryOp() || isBinaryOp());
154 return Name[Name.size() - 1];
157 unsigned getBinaryPrecedence() const { return Precedence; }
160 Basically, in addition to knowing a name for the prototype, we now keep
161 track of whether it was an operator, and if it was, what precedence
162 level the operator is at. The precedence is only used for binary
163 operators (as you'll see below, it just doesn't apply for unary
164 operators). Now that we have a way to represent the prototype for a
165 user-defined operator, we need to parse it:
170 /// ::= id '(' id* ')'
171 /// ::= binary LETTER number? (id, id)
172 static std::unique_ptr<PrototypeAST> ParsePrototype() {
175 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
176 unsigned BinaryPrecedence = 30;
180 return LogErrorP("Expected function name in prototype");
182 FnName = IdentifierStr;
188 if (!isascii(CurTok))
189 return LogErrorP("Expected binary operator");
191 FnName += (char)CurTok;
195 // Read the precedence if present.
196 if (CurTok == tok_number) {
197 if (NumVal < 1 || NumVal > 100)
198 return LogErrorP("Invalid precedence: must be 1..100");
199 BinaryPrecedence = (unsigned)NumVal;
206 return LogErrorP("Expected '(' in prototype");
208 std::vector<std::string> ArgNames;
209 while (getNextToken() == tok_identifier)
210 ArgNames.push_back(IdentifierStr);
212 return LogErrorP("Expected ')' in prototype");
215 getNextToken(); // eat ')'.
217 // Verify right number of names for operator.
218 if (Kind && ArgNames.size() != Kind)
219 return LogErrorP("Invalid number of operands for operator");
221 return std::make_unique<PrototypeAST>(FnName, std::move(ArgNames), Kind != 0,
225 This is all fairly straightforward parsing code, and we have already
226 seen a lot of similar code in the past. One interesting part about the
227 code above is the couple lines that set up ``FnName`` for binary
228 operators. This builds names like "binary@" for a newly defined "@"
229 operator. It then takes advantage of the fact that symbol names in the
230 LLVM symbol table are allowed to have any character in them, including
231 embedded nul characters.
233 The next interesting thing to add, is codegen support for these binary
234 operators. Given our current structure, this is a simple addition of a
235 default case for our existing binary operator node:
239 Value *BinaryExprAST::codegen() {
240 Value *L = LHS->codegen();
241 Value *R = RHS->codegen();
247 return Builder->CreateFAdd(L, R, "addtmp");
249 return Builder->CreateFSub(L, R, "subtmp");
251 return Builder->CreateFMul(L, R, "multmp");
253 L = Builder->CreateFCmpULT(L, R, "cmptmp");
254 // Convert bool 0/1 to double 0.0 or 1.0
255 return Builder->CreateUIToFP(L, Type::getDoubleTy(*TheContext),
261 // If it wasn't a builtin binary operator, it must be a user defined one. Emit
263 Function *F = getFunction(std::string("binary") + Op);
264 assert(F && "binary operator not found!");
266 Value *Ops[2] = { L, R };
267 return Builder->CreateCall(F, Ops, "binop");
270 As you can see above, the new code is actually really simple. It just
271 does a lookup for the appropriate operator in the symbol table and
272 generates a function call to it. Since user-defined operators are just
273 built as normal functions (because the "prototype" boils down to a
274 function with the right name) everything falls into place.
276 The final piece of code we are missing, is a bit of top-level magic:
280 Function *FunctionAST::codegen() {
281 // Transfer ownership of the prototype to the FunctionProtos map, but keep a
282 // reference to it for use below.
284 FunctionProtos[Proto->getName()] = std::move(Proto);
285 Function *TheFunction = getFunction(P.getName());
289 // If this is an operator, install it.
291 BinopPrecedence[P.getOperatorName()] = P.getBinaryPrecedence();
293 // Create a new basic block to start insertion into.
294 BasicBlock *BB = BasicBlock::Create(*TheContext, "entry", TheFunction);
297 Basically, before codegening a function, if it is a user-defined
298 operator, we register it in the precedence table. This allows the binary
299 operator parsing logic we already have in place to handle it. Since we
300 are working on a fully-general operator precedence parser, this is all
301 we need to do to "extend the grammar".
303 Now we have useful user-defined binary operators. This builds a lot on
304 the previous framework we built for other operators. Adding unary
305 operators is a bit more challenging, because we don't have any framework
306 for it yet - let's see what it takes.
308 User-defined Unary Operators
309 ============================
311 Since we don't currently support unary operators in the Kaleidoscope
312 language, we'll need to add everything to support them. Above, we added
313 simple support for the 'unary' keyword to the lexer. In addition to
314 that, we need an AST node:
318 /// UnaryExprAST - Expression class for a unary operator.
319 class UnaryExprAST : public ExprAST {
321 std::unique_ptr<ExprAST> Operand;
324 UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
325 : Opcode(Opcode), Operand(std::move(Operand)) {}
327 Value *codegen() override;
330 This AST node is very simple and obvious by now. It directly mirrors the
331 binary operator AST node, except that it only has one child. With this,
332 we need to add the parsing logic. Parsing a unary operator is pretty
333 simple: we'll add a new function to do it:
340 static std::unique_ptr<ExprAST> ParseUnary() {
341 // If the current token is not an operator, it must be a primary expr.
342 if (!isascii(CurTok) || CurTok == '(' || CurTok == ',')
343 return ParsePrimary();
345 // If this is a unary operator, read it.
348 if (auto Operand = ParseUnary())
349 return std::make_unique<UnaryExprAST>(Opc, std::move(Operand));
353 The grammar we add is pretty straightforward here. If we see a unary
354 operator when parsing a primary operator, we eat the operator as a
355 prefix and parse the remaining piece as another unary operator. This
356 allows us to handle multiple unary operators (e.g. "!!x"). Note that
357 unary operators can't have ambiguous parses like binary operators can,
358 so there is no need for precedence information.
360 The problem with this function, is that we need to call ParseUnary from
361 somewhere. To do this, we change previous callers of ParsePrimary to
362 call ParseUnary instead:
368 static std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec,
369 std::unique_ptr<ExprAST> LHS) {
371 // Parse the unary expression after the binary operator.
372 auto RHS = ParseUnary();
378 /// ::= unary binoprhs
380 static std::unique_ptr<ExprAST> ParseExpression() {
381 auto LHS = ParseUnary();
385 return ParseBinOpRHS(0, std::move(LHS));
388 With these two simple changes, we are now able to parse unary operators
389 and build the AST for them. Next up, we need to add parser support for
390 prototypes, to parse the unary operator prototype. We extend the binary
391 operator code above with:
396 /// ::= id '(' id* ')'
397 /// ::= binary LETTER number? (id, id)
398 /// ::= unary LETTER (id)
399 static std::unique_ptr<PrototypeAST> ParsePrototype() {
402 unsigned Kind = 0; // 0 = identifier, 1 = unary, 2 = binary.
403 unsigned BinaryPrecedence = 30;
407 return LogErrorP("Expected function name in prototype");
409 FnName = IdentifierStr;
415 if (!isascii(CurTok))
416 return LogErrorP("Expected unary operator");
418 FnName += (char)CurTok;
425 As with binary operators, we name unary operators with a name that
426 includes the operator character. This assists us at code generation
427 time. Speaking of, the final piece we need to add is codegen support for
428 unary operators. It looks like this:
432 Value *UnaryExprAST::codegen() {
433 Value *OperandV = Operand->codegen();
437 Function *F = getFunction(std::string("unary") + Opcode);
439 return LogErrorV("Unknown unary operator");
441 return Builder->CreateCall(F, OperandV, "unop");
444 This code is similar to, but simpler than, the code for binary
445 operators. It is simpler primarily because it doesn't need to handle any
446 predefined operators.
451 It is somewhat hard to believe, but with a few simple extensions we've
452 covered in the last chapters, we have grown a real-ish language. With
453 this, we can do a lot of interesting things, including I/O, math, and a
454 bunch of other things. For example, we can now add a nice sequencing
455 operator (printd is defined to print out the specified value and a
460 ready> extern printd(x);
462 declare double @printd(double)
464 ready> def binary : 1 (x y) 0; # Low-precedence operator that ignores operands.
466 ready> printd(123) : printd(456) : printd(789);
470 Evaluated to 0.000000
472 We can also define a bunch of other "primitive" operations, such as:
487 # Define > with the same precedence as <.
488 def binary> 10 (LHS RHS)
491 # Binary logical or, which does not short circuit.
492 def binary| 5 (LHS RHS)
500 # Binary logical and, which does not short circuit.
501 def binary& 6 (LHS RHS)
507 # Define = with slightly lower precedence than relationals.
508 def binary = 9 (LHS RHS)
509 !(LHS < RHS | LHS > RHS);
511 # Define ':' for sequencing: as a low-precedence operator that ignores operands
512 # and just returns the RHS.
513 def binary : 1 (x y) y;
515 Given the previous if/then/else support, we can also define interesting
516 functions for I/O. For example, the following prints out a character
517 whose "density" reflects the value passed in: the lower the value, the
518 denser the character:
522 ready> extern putchard(char);
524 ready> def printdensity(d)
534 ready> printdensity(1): printdensity(2): printdensity(3):
535 printdensity(4): printdensity(5): printdensity(9):
538 Evaluated to 0.000000
540 Based on these simple primitive operations, we can start to define more
541 interesting things. For example, here's a little function that determines
542 the number of iterations it takes for a certain function in the complex
547 # Determine whether the specific location diverges.
548 # Solve for z = z^2 + c in the complex plane.
549 def mandelconverger(real imag iters creal cimag)
550 if iters > 255 | (real*real + imag*imag > 4) then
553 mandelconverger(real*real - imag*imag + creal,
555 iters+1, creal, cimag);
557 # Return the number of iterations required for the iteration to escape
558 def mandelconverge(real imag)
559 mandelconverger(real, imag, 0, real, imag);
561 This "``z = z2 + c``" function is a beautiful little creature that is
562 the basis for computation of the `Mandelbrot
563 Set <http://en.wikipedia.org/wiki/Mandelbrot_set>`_. Our
564 ``mandelconverge`` function returns the number of iterations that it
565 takes for a complex orbit to escape, saturating to 255. This is not a
566 very useful function by itself, but if you plot its value over a
567 two-dimensional plane, you can see the Mandelbrot set. Given that we are
568 limited to using putchard here, our amazing graphical output is limited,
569 but we can whip together something using the density plotter above:
573 # Compute and plot the mandelbrot set with the specified 2 dimensional range
575 def mandelhelp(xmin xmax xstep ymin ymax ystep)
576 for y = ymin, y < ymax, ystep in (
577 (for x = xmin, x < xmax, xstep in
578 printdensity(mandelconverge(x,y)))
582 # mandel - This is a convenient helper function for plotting the mandelbrot set
583 # from the specified position with the specified Magnification.
584 def mandel(realstart imagstart realmag imagmag)
585 mandelhelp(realstart, realstart+realmag*78, realmag,
586 imagstart, imagstart+imagmag*40, imagmag);
588 Given this, we can try plotting out the mandelbrot set! Lets try it out:
592 ready> mandel(-2.3, -1.3, 0.05, 0.07);
593 *******************************+++++++++++*************************************
594 *************************+++++++++++++++++++++++*******************************
595 **********************+++++++++++++++++++++++++++++****************************
596 *******************+++++++++++++++++++++.. ...++++++++*************************
597 *****************++++++++++++++++++++++.... ...+++++++++***********************
598 ***************+++++++++++++++++++++++..... ...+++++++++*********************
599 **************+++++++++++++++++++++++.... ....+++++++++********************
600 *************++++++++++++++++++++++...... .....++++++++*******************
601 ************+++++++++++++++++++++....... .......+++++++******************
602 ***********+++++++++++++++++++.... ... .+++++++*****************
603 **********+++++++++++++++++....... .+++++++****************
604 *********++++++++++++++........... ...+++++++***************
605 ********++++++++++++............ ...++++++++**************
606 ********++++++++++... .......... .++++++++**************
607 *******+++++++++..... .+++++++++*************
608 *******++++++++...... ..+++++++++*************
609 *******++++++....... ..+++++++++*************
610 *******+++++...... ..+++++++++*************
611 *******.... .... ...+++++++++*************
612 *******.... . ...+++++++++*************
613 *******+++++...... ...+++++++++*************
614 *******++++++....... ..+++++++++*************
615 *******++++++++...... .+++++++++*************
616 *******+++++++++..... ..+++++++++*************
617 ********++++++++++... .......... .++++++++**************
618 ********++++++++++++............ ...++++++++**************
619 *********++++++++++++++.......... ...+++++++***************
620 **********++++++++++++++++........ .+++++++****************
621 **********++++++++++++++++++++.... ... ..+++++++****************
622 ***********++++++++++++++++++++++....... .......++++++++*****************
623 ************+++++++++++++++++++++++...... ......++++++++******************
624 **************+++++++++++++++++++++++.... ....++++++++********************
625 ***************+++++++++++++++++++++++..... ...+++++++++*********************
626 *****************++++++++++++++++++++++.... ...++++++++***********************
627 *******************+++++++++++++++++++++......++++++++*************************
628 *********************++++++++++++++++++++++.++++++++***************************
629 *************************+++++++++++++++++++++++*******************************
630 ******************************+++++++++++++************************************
631 *******************************************************************************
632 *******************************************************************************
633 *******************************************************************************
634 Evaluated to 0.000000
635 ready> mandel(-2, -1, 0.02, 0.04);
636 **************************+++++++++++++++++++++++++++++++++++++++++++++++++++++
637 ***********************++++++++++++++++++++++++++++++++++++++++++++++++++++++++
638 *********************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.
639 *******************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++...
640 *****************+++++++++++++++++++++++++++++++++++++++++++++++++++++++++.....
641 ***************++++++++++++++++++++++++++++++++++++++++++++++++++++++++........
642 **************++++++++++++++++++++++++++++++++++++++++++++++++++++++...........
643 ************+++++++++++++++++++++++++++++++++++++++++++++++++++++..............
644 ***********++++++++++++++++++++++++++++++++++++++++++++++++++........ .
645 **********++++++++++++++++++++++++++++++++++++++++++++++.............
646 ********+++++++++++++++++++++++++++++++++++++++++++..................
647 *******+++++++++++++++++++++++++++++++++++++++.......................
648 ******+++++++++++++++++++++++++++++++++++...........................
649 *****++++++++++++++++++++++++++++++++............................
650 *****++++++++++++++++++++++++++++...............................
651 ****++++++++++++++++++++++++++...... .........................
652 ***++++++++++++++++++++++++......... ...... ...........
653 ***++++++++++++++++++++++............
654 **+++++++++++++++++++++..............
655 **+++++++++++++++++++................
656 *++++++++++++++++++.................
657 *++++++++++++++++............ ...
658 *++++++++++++++..............
659 *+++....++++................
660 *.......... ...........
662 *.......... ...........
663 *+++....++++................
664 *++++++++++++++..............
665 *++++++++++++++++............ ...
666 *++++++++++++++++++.................
667 **+++++++++++++++++++................
668 **+++++++++++++++++++++..............
669 ***++++++++++++++++++++++............
670 ***++++++++++++++++++++++++......... ...... ...........
671 ****++++++++++++++++++++++++++...... .........................
672 *****++++++++++++++++++++++++++++...............................
673 *****++++++++++++++++++++++++++++++++............................
674 ******+++++++++++++++++++++++++++++++++++...........................
675 *******+++++++++++++++++++++++++++++++++++++++.......................
676 ********+++++++++++++++++++++++++++++++++++++++++++..................
677 Evaluated to 0.000000
678 ready> mandel(-0.9, -1.4, 0.02, 0.03);
679 *******************************************************************************
680 *******************************************************************************
681 *******************************************************************************
682 **********+++++++++++++++++++++************************************************
683 *+++++++++++++++++++++++++++++++++++++++***************************************
684 +++++++++++++++++++++++++++++++++++++++++++++**********************************
685 ++++++++++++++++++++++++++++++++++++++++++++++++++*****************************
686 ++++++++++++++++++++++++++++++++++++++++++++++++++++++*************************
687 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++**********************
688 +++++++++++++++++++++++++++++++++.........++++++++++++++++++*******************
689 +++++++++++++++++++++++++++++++.... ......+++++++++++++++++++****************
690 +++++++++++++++++++++++++++++....... ........+++++++++++++++++++**************
691 ++++++++++++++++++++++++++++........ ........++++++++++++++++++++************
692 +++++++++++++++++++++++++++......... .. ...+++++++++++++++++++++**********
693 ++++++++++++++++++++++++++........... ....++++++++++++++++++++++********
694 ++++++++++++++++++++++++............. .......++++++++++++++++++++++******
695 +++++++++++++++++++++++............. ........+++++++++++++++++++++++****
696 ++++++++++++++++++++++........... ..........++++++++++++++++++++++***
697 ++++++++++++++++++++........... .........++++++++++++++++++++++*
698 ++++++++++++++++++............ ...........++++++++++++++++++++
699 ++++++++++++++++............... .............++++++++++++++++++
700 ++++++++++++++................. ...............++++++++++++++++
701 ++++++++++++.................. .................++++++++++++++
702 +++++++++.................. .................+++++++++++++
703 ++++++........ . ......... ..++++++++++++
704 ++............ ...... ....++++++++++
705 .............. ...++++++++++
706 .............. ....+++++++++
707 .............. .....++++++++
708 ............. ......++++++++
709 ........... .......++++++++
710 ......... ........+++++++
711 ......... ........+++++++
712 ......... ....+++++++
720 Evaluated to 0.000000
723 At this point, you may be starting to realize that Kaleidoscope is a
724 real and powerful language. It may not be self-similar :), but it can be
725 used to plot things that are!
727 With this, we conclude the "adding user-defined operators" chapter of
728 the tutorial. We have successfully augmented our language, adding the
729 ability to extend the language in the library, and we have shown how
730 this can be used to build a simple but interesting end-user application
731 in Kaleidoscope. At this point, Kaleidoscope can build a variety of
732 applications that are functional and can call functions with
733 side-effects, but it can't actually define and mutate a variable itself.
735 Strikingly, variable mutation is an important feature of some languages,
736 and it is not at all obvious how to `add support for mutable
737 variables <LangImpl07.html>`_ without having to add an "SSA construction"
738 phase to your front-end. In the next chapter, we will describe how you
739 can add variable mutation without building SSA in your front-end.
744 Here is the complete code listing for our running example, enhanced with
745 the support for user-defined operators. To build this example, use:
750 clang++ -g toy.cpp `llvm-config --cxxflags --ldflags --system-libs --libs core orcjit native` -O3 -o toy
754 On some platforms, you will need to specify -rdynamic or
755 -Wl,--export-dynamic when linking. This ensures that symbols defined in
756 the main executable are exported to the dynamic linker and so are
757 available for symbol resolution at run time. This is not needed if you
758 compile your support code into a shared library, although doing that
759 will cause problems on Windows.
763 .. literalinclude:: ../../../examples/Kaleidoscope/Chapter6/toy.cpp
766 `Next: Extending the language: mutable variables / SSA
767 construction <LangImpl07.html>`_