**Clang-Repl** is an interactive C++ interpreter that allows for incremental
compilation. It supports interactive programming for C++ in a
read-evaluate-print-loop (REPL) style. It uses Clang as a library to compile the
high-level programming language into LLVM IR. The LLVM IR is then executed by
the LLVM just-in-time (JIT) infrastructure.

Clang-Repl is suitable for exploratory programming and for settings where time
to insight is important. Clang-Repl is a project inspired by the work in
`Cling <https://github.com/root-project/cling>`_, an LLVM-based C/C++ interpreter
developed in the field of high-energy physics and used by the scientific data
analysis framework `ROOT <https://root.cern/>`_. Clang-Repl makes it possible to
move parts of Cling upstream, making them useful and available to a broader
audience.

19 Clang-Repl Basic Data Flow
20 ==========================
22 .. image:: ClangRepl_design.png
24 :alt: ClangRepl design
26 Clang-Repl data flow can be divided into roughly 8 phases:
28 1. Clang-Repl controls the input infrastructure by an interactive prompt or by
29 an interface allowing the incremental processing of input.
31 2. Then it sends the input to the underlying incremental facilities in Clang
34 3. Clang compiles the input into an AST representation.
36 4. When required the AST can be further transformed in order to attach specific
39 5. The AST representation is then lowered to LLVM IR.
6. The LLVM IR is the input format for LLVM’s JIT compilation infrastructure.
   The tool will instruct the JIT to run specified functions, translating them
   into machine code targeting the underlying device architecture (e.g., Intel
46 7. The LLVM JIT lowers the LLVM IR to machine code.
48 8. The machine code is then executed.
54 .. code-block:: console
59 $ cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DLLVM_ENABLE_PROJECTS=clang -G "Unix Makefiles" ../llvm
**Note:** ``RelWithDebInfo`` in the command above can be replaced with ``Debug`` or ``Release``.
63 .. code-block:: console
65 cmake --build . --target clang clang-repl -j n
67 cmake --build . --target clang clang-repl
The ``clang-repl`` binary is built under ``llvm-project/build/bin``. Change into that directory to run it.
71 .. code-block:: console
92 clang-repl> #include <iostream>
93 clang-repl> int f() { std::cout << "Hello Interpreted World!\n"; return 0; }
94 clang-repl> auto r = f();
95 // Prints Hello Interpreted World!
99 clang-repl> #include<iostream>
100 clang-repl> using namespace std;
101 clang-repl> std::cout << "Welcome to CLANG-REPL" << std::endl;
102 Welcome to CLANG-REPL
103 // Prints Welcome to CLANG-REPL
106 Function Definitions and Calls:
107 ===============================
111 clang-repl> #include <iostream>
112 clang-repl> int sum(int a, int b){ return a+b; };
113 clang-repl> int c = sum(9,10);
114 clang-repl> std::cout << c << std::endl;
118 Iterative Structures:
119 =====================
123 clang-repl> #include <iostream>
124 clang-repl> for (int i = 0;i < 3;i++){ std::cout << i << std::endl;}
   clang-repl> int i = 3;
   clang-repl> while (i < 7) { i++; std::cout << i << std::endl; }
134 Classes and Structures:
135 =======================
139 clang-repl> #include <iostream>
140 clang-repl> class Rectangle {int width, height; public: void set_values (int,int);\
141 clang-repl... int area() {return width*height;}};
142 clang-repl> void Rectangle::set_values (int x, int y) { width = x;height = y;}
143 clang-repl> int main () { Rectangle rect;rect.set_values (3,4);\
144 clang-repl... std::cout << "area: " << rect.area() << std::endl;\
145 clang-repl... return 0;}
// Note: A trailing '\' continues the statement on the next line.
156 clang-repl> #include <iostream>
157 clang-repl> using namespace std;
158 clang-repl> auto welcome = []() { std::cout << "Welcome to REPL" << std::endl;};
159 clang-repl> welcome();
162 Using Dynamic Library:
163 ======================
167 clang-repl> %lib print.so
168 clang-repl> #include"print.hpp"
169 clang-repl> print(9);
172 **Generation of dynamic library**
182 std::cout << a << std::endl;
189 clang++-17 -c -o print.o print.cpp
190 clang-17 -shared print.o -o print.so
197 clang-repl> // Comments in Clang-Repl
198 clang-repl> /* Comments in Clang-Repl */
201 Closure or Termination:
202 =======================
209 Just like Clang, Clang-Repl can be integrated in existing applications as a library
210 (using the clangInterpreter library). This turns your C++ compiler into a service that
211 can incrementally consume and execute code. The **Compiler as A Service** (**CaaS**)
212 concept helps support advanced use cases such as template instantiations on demand and
213 automatic language interoperability. It also helps static languages such as C/C++ become
214 apt for data science.
216 Execution Results Handling in Clang-Repl
217 ========================================
219 Execution Results Handling features discussed below help extend the Clang-Repl
220 functionality by creating an interface between the execution results of a
221 program and the compiled program.
223 1. **Capture Execution Results**: This feature helps capture the execution results
224 of a program and bring them back to the compiled program.
226 2. **Dump Captured Execution Results**: This feature helps create a temporary dump
227 for Value Printing/Automatic Printf, that is, to display the value and type of
231 1. Capture Execution Results
232 ============================
234 In many cases, it is useful to bring back the program execution result to the
235 compiled program. This result can be stored in an object of type **Value**.
237 How Execution Results are captured (Value Synthesis):
238 -----------------------------------------------------
The synthesizer chooses which expression to synthesize and then replaces
the original expression with the synthesized expression. Depending on the
expression type, it may choose to save an object (``LastValue``) of type ``Value``
while allocating memory for it (``SetValueWithAlloc()``), or without allocating
memory (``SetValueNoAlloc()``).

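The allocation decision described above can be pictured with a small,
self-contained sketch. It is not the Clang implementation: ``MiniValue``,
``setValueNoAlloc()``, ``setValueWithAlloc()`` and ``synthesize()`` are
hypothetical names chosen to mirror the terms in the text, and the rule
"built-in arithmetic types are stored inline, everything else is allocated"
is a simplifying assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a 'Value'-like result holder (not clang::Value).
class MiniValue {
public:
  bool isValid() const { return Kind != Storage::None; }
  bool ownsHeapMemory() const { return Kind == Storage::Heap; }

  // Built-in types: copy the bytes into the inline buffer, no allocation.
  template <typename T> void setValueNoAlloc(T V) {
    static_assert(std::is_arithmetic_v<T>, "inline storage is for built-ins");
    static_assert(sizeof(T) <= sizeof(Buffer), "built-in does not fit");
    Heap.reset();
    std::memcpy(Buffer, &V, sizeof(T));
    Kind = Storage::Inline;
  }

  // Class types: allocate memory that the value owns and copy the object in.
  template <typename T> void setValueWithAlloc(const T &V) {
    Heap = std::make_shared<T>(V);
    Kind = Storage::Heap;
  }

  // Reads the value back; T must match the type that was stored.
  template <typename T> T get() const {
    if constexpr (std::is_arithmetic_v<T>) {
      assert(Kind == Storage::Inline && "no built-in value stored");
      T Out;
      std::memcpy(&Out, Buffer, sizeof(T));
      return Out;
    } else {
      assert(Kind == Storage::Heap && "no object stored");
      return *std::static_pointer_cast<T>(Heap);
    }
  }

private:
  enum class Storage { None, Inline, Heap };
  Storage Kind = Storage::None;
  alignas(std::max_align_t) unsigned char Buffer[sizeof(long double)] = {};
  std::shared_ptr<void> Heap; // reference-counted, shared between copies
};

// Picks the storage strategy from the result type, as a synthesizer might.
template <typename T> MiniValue synthesize(const T &Result) {
  MiniValue V;
  if constexpr (std::is_arithmetic_v<T>)
    V.setValueNoAlloc(Result);
  else
    V.setValueWithAlloc(Result);
  return V;
}
```

In this sketch, ``synthesize(42)`` takes the no-allocation path, while
``synthesize(std::string("hello"))`` allocates. A default-constructed
``MiniValue`` reports ``isValid() == false``, which models the invalid state
of a value that was never set.
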
247 :name: valuesynthesis
248 :caption: Value Synthesis
249 :alt: Shows how an object of type 'Value' is synthesized
252 digraph "valuesynthesis" {
254 graph [fontname="Verdana", fontsize="12"];
255 node [fontname="Verdana", fontsize="12"];
256 edge [fontname="Sans", fontsize="9"];
258 start [label=" Create an Object \n 'Last Value' \n of type 'Value' ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
259 assign [label=" Assign the result \n to the 'LastValue' \n (based on respective \n Memory Allocation \n scenario) ", shape="box"]
260 print [label=" Pretty Print \n the Value Object ", shape="Msquare", fillcolor="yellow", style=filled];
264 subgraph SynthesizeExpression {
265 synth [label=" SynthesizeExpr() ", shape="note", fontcolor=white, fillcolor="#3333ff", style=filled];
266 mem [label=" New Memory \n Allocation? ", shape="diamond"];
267 withaloc [label=" SetValueWithAlloc() ", shape="box"];
268 noaloc [label=" SetValueNoAlloc() ", shape="box"];
269 right [label=" 1. RValue Structure \n (a temporary value)", shape="box"];
270 left2 [label=" 2. LValue Structure \n (a variable with \n an address)", shape="box"];
271 left3 [label=" 3. Built-In Type \n (int, float, etc.)", shape="box"];
272 output [label=" move to 'Assign' step ", shape="box"];
275 mem -> withaloc [label="Yes"];
276 mem -> noaloc [label="No"];
287 Where is the captured result stored?
288 ------------------------------------
``LastValue`` holds the last result of the value printing. It is a class member
so that it remains accessible after subsequent inputs.
293 **Note:** If no value printing happens, then it is in an invalid state.
295 Improving Efficiency and User Experience
296 ----------------------------------------
298 The Value object is essentially used to create a mapping between an expression
299 'type' and the allocated 'memory'. Built-in types (bool, char, int,
300 float, double, etc.) are copyable. Their memory allocation size is known
301 and the Value object can introduce a small-buffer optimization.
302 In case of objects, the ``Value`` class provides reference-counted memory
305 The implementation maps the type as written and the Clang Type to be able to use
306 the preprocessor to synthesize the relevant cast operations. For example,
307 ``X(char, Char_S)``, where ``char`` is the type from the language's type system
308 and ``Char_S`` is the Clang builtin type which represents it. This mapping helps
309 to import execution results from the interpreter in a compiled program and vice
310 versa. The ``Value.h`` header file can be included at runtime and this is why it
311 has a very low token count and was developed with strict constraints in mind.
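
The pairing of a type as written with a Clang builtin kind can be pictured
with an X-macro. The sketch below uses a shortened, hypothetical list: the
names ``SKETCH_BUILTIN_TYPES``, ``Kind``, ``kindOf()`` and ``spelling()`` are
illustrative and are not taken from ``Value.h``. It expands the same list
three times: into an enum, into a compile-time type lookup, and into a
spelling lookup.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical, shortened list: X(type as written, Clang-style builtin name).
#define SKETCH_BUILTIN_TYPES \
  X(bool, Bool)              \
  X(char, Char_S)            \
  X(int, Int)                \
  X(double, Double)

// Expansion 1: one enumerator per listed type.
enum class Kind {
#define X(type, name) K_##name,
  SKETCH_BUILTIN_TYPES
#undef X
  K_Unspecified
};

// Expansion 2: map a C++ type to its enumerator.
template <typename T> constexpr Kind kindOf() {
#define X(type, name)          \
  if (std::is_same_v<T, type>) \
    return Kind::K_##name;
  SKETCH_BUILTIN_TYPES
#undef X
  return Kind::K_Unspecified;
}

// Expansion 3: recover the type as written, e.g. for printing.
inline std::string spelling(Kind K) {
  switch (K) {
#define X(type, name)  \
  case Kind::K_##name: \
    return #type;
    SKETCH_BUILTIN_TYPES
#undef X
  case Kind::K_Unspecified:
    break;
  }
  assert(K == Kind::K_Unspecified && "unknown kind");
  return "<unspecified>";
}
```

Adding a type to the list updates all three expansions at once, which is the
main appeal of the pattern: the list is the single place where the language
type and its builtin kind are tied together.
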
313 This also enables the user to receive the computed 'type' back in their code
314 and then transform the type into something else (e.g., re-cast a double into
315 a float). Normally, the compiler can handle these conversions transparently,
316 but in interpreter mode, the compiler cannot see all the 'from' and 'to' types,
317 so it cannot implicitly do the conversions. So this logic enables providing
318 these conversions on request.
320 On-request conversions can help improve the user experience, by allowing
321 conversion to a desired 'to' type, when the 'from' type is unknown or unclear.
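
A minimal sketch of an on-request conversion follows. It assumes a
hypothetical ``TaggedValue`` holder built on ``std::variant``; the real
``Value`` class is implemented differently, and the name ``convertTo()`` is
used here only for illustration. The point is that the stored 'from' type is
only known at run time, while the caller names the 'to' type explicitly.

```cpp
#include <cassert>
#include <variant>

// Hypothetical holder whose stored type is only known at run time.
class TaggedValue {
public:
  template <typename T> explicit TaggedValue(T V) : Data(V) {}

  // Convert from whatever 'from' type is stored to the requested 'to' type.
  template <typename To> To convertTo() const {
    assert(!Data.valueless_by_exception() && "no value stored");
    return std::visit([](auto Stored) { return static_cast<To>(Stored); },
                      Data);
  }

private:
  std::variant<bool, char, int, float, double> Data;
};
```

For example, a ``TaggedValue`` constructed from the ``double`` 3.75 can be
read back with ``convertTo<float>()`` or ``convertTo<int>()`` without the
caller knowing that a ``double`` was stored.
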
323 Significance of this Feature
324 ----------------------------
326 The 'Value' object enables wrapping a memory region that comes from the
327 JIT, and bringing it back to the compiled code (and vice versa).
328 This is a very useful functionality when:
330 - connecting an interpreter to the compiled code, or
331 - connecting an interpreter in another language.
For example, this feature helps transport values across boundaries. A notable
example is the cppyy project, which uses this feature to enable running C++
335 within Python. It enables transporting values/information between C++
338 Note: `cppyy <https://github.com/wlav/cppyy/>`_ is an automatic, run-time,
339 Python-to-C++ bindings generator, for calling C++ from Python and Python from C++.
340 It uses LLVM along with a C++ interpreter (e.g., Cling) to enable features like
341 run-time instantiation of C++ templates, cross-inheritance, callbacks,
342 auto-casting, transparent use of smart pointers, etc.
344 In a nutshell, this feature enables a new way of developing code, paving the
345 way for language interoperability and easier interactive programming.
347 Implementation Details
348 ======================
350 Interpreter as a REPL vs. as a Library
351 --------------------------------------
353 1 - If we're using the interpreter in interactive (REPL) mode, it will dump
354 the value (i.e., value printing).
356 .. code-block:: console
358 if (LastValue.isValid()) {
363 *V = std::move(LastValue);
367 2 - If we're using the interpreter as a library, then it will pass the value
370 Incremental AST Consumer
371 ------------------------
The ``IncrementalASTConsumer`` class wraps the original code generator
``ASTConsumer`` and installs a hook that traverses all the top-level decls to
look for expressions to synthesize, based on the ``isSemiMissing()`` condition.
377 If this condition is found to be true, then ``Interp.SynthesizeExpr()`` will be
380 **Note:** Following is a sample code snippet. Actual code may vary over time.
382 .. code-block:: console
   for (Decl *D : DGR)
     if (auto *TSD = llvm::dyn_cast<TopLevelStmtDecl>(D);
         TSD && TSD->isSemiMissing())
       TSD->setStmt(Interp.SynthesizeExpr(cast<Expr>(TSD->getStmt())));

   return Consumer->HandleTopLevelDecl(DGR);

391 The synthesizer will then choose the relevant expression, based on its type.
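
The shape of this hook can be illustrated without any Clang types. The sketch
below is a string-based toy: ``TopLevelStmt``, ``synthesizeExpr()``,
``handleTopLevelDecls()`` and the wrapper name ``captureValue`` are all
hypothetical and only model the idea that statements flagged as missing a
semicolon are rewritten before being passed on to code generation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, string-based stand-in for a top-level statement decl.
struct TopLevelStmt {
  std::string Text;
  bool SemiMissing = false;
  bool isSemiMissing() const { return SemiMissing; }
};

// Stand-in for the synthesizer: wrap the expression in a capturing call.
// 'captureValue' is an invented name, not a Clang runtime function.
inline std::string synthesizeExpr(const std::string &Expr) {
  assert(!Expr.empty() && "nothing to synthesize");
  return "captureValue(" + Expr + ")";
}

// The hook: visit every top-level statement before handing it to code
// generation and rewrite only those whose trailing semicolon was missing.
inline std::vector<std::string>
handleTopLevelDecls(std::vector<TopLevelStmt> Decls) {
  std::vector<std::string> ForCodeGen;
  for (TopLevelStmt &D : Decls) {
    if (D.isSemiMissing())
      D.Text = synthesizeExpr(D.Text);
    ForCodeGen.push_back(D.Text + ";");
  }
  return ForCodeGen;
}
```

Given the two inputs ``int x = 42`` (semicolon present) and ``x`` (semicolon
missing), only the second is wrapped; the first reaches code generation
unchanged.
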
393 Communication between Compiled Code and Interpreted Code
394 --------------------------------------------------------
396 In Clang-Repl there is **interpreted code**, and this feature adds a 'value'
397 runtime that can talk to the **compiled code**.
399 Following is an example where the compiled code interacts with the interpreter
400 code. The execution results of an expression are stored in the object 'V' of
401 type Value. This value is then printed, effectively helping the interpreter
402 use a value from the compiled code.
.. code-block:: console

   int Global = 42;
   void setGlobal(int val) { Global = val; }
   int getGlobal() { return Global; }
   Interp.ParseAndExecute("void setGlobal(int val);");
   Interp.ParseAndExecute("int getGlobal();");
   Value V;
   Interp.ParseAndExecute("getGlobal()", &V);
   std::cout << V.getAs<int>() << "\n"; // Prints 42

416 **Note:** Above is an example of interoperability between the compiled code and
417 the interpreted code. Interoperability between languages (e.g., C++ and Python)
421 2. Dump Captured Execution Results
422 ==================================
424 This feature helps create a temporary dump to display the value and type
425 (pretty print) of the desired data. This is a good way to interact with the
426 interpreter during interactive programming.
428 How value printing is simplified (Automatic Printf)
429 ---------------------------------------------------
431 The ``Automatic Printf`` feature makes it easy to display variable values during
432 program execution. Using the ``printf`` function repeatedly is not required.
433 This is achieved using an extension in the ``libclangInterpreter`` library.
435 To automatically print the value of an expression, simply write the expression
436 in the global scope **without a semicolon**.
439 :name: automaticprintf
440 :caption: Automatic PrintF
441 :alt: Shows how Automatic PrintF can be used
444 digraph "AutomaticPrintF" {
447 graph [fontname="Verdana", fontsize="12"];
448 node [fontname="Verdana", fontsize="12"];
449 edge [fontname="Sans", fontsize="9"];
451 manual [label=" Manual PrintF ", shape="box"];
452 int1 [label=" int ( &) 42 ", shape="box"]
453 auto [label=" Automatic PrintF ", shape="box"];
454 int2 [label=" int ( &) 42 ", shape="box"]
456 auto -> int2 [label="int x = 42; \n x"];
       manual -> int1 [label="int x = 42; \n printf(\"(int &) %d \\n\", x);"];
461 Significance of this feature
462 ----------------------------
Inspired by a similar implementation in `Cling <https://github.com/root-project/cling>`_,
this feature, added to the upstream Clang repository, essentially extends the
syntax of C++ so that it is more helpful for people who write code for data
science applications.
469 This is useful, for example, when you want to experiment with a set of values
470 against a set of functions, and you'd like to know the results right away.
This is similar to how Python works (hence its popularity in data science
research), but the performance of C++, combined with this flexibility, can
make it an attractive option.
475 Implementation Details
476 ======================
The Interpreter in Clang-Repl (``Interpreter.cpp``) includes the function
``ParseAndExecute()``, which can accept a ``Value`` parameter to capture the
result. This parameter is optional: if it is omitted (i.e., the user does not
need the result elsewhere), the last value is validated and passed to the
``dump()`` function.
489 :caption: Parsing Mechanism
490 :alt: Shows the Parsing Mechanism for Pretty Printing
494 digraph "prettyprint" {
496 graph [fontname="Verdana", fontsize="12"];
497 node [fontname="Verdana", fontsize="12"];
498 edge [fontname="Verdana", fontsize="9"];
500 parse [label=" ParseAndExecute() \n in Clang ", shape="box"];
501 capture [label=" Capture 'Value' parameter \n for processing? ", shape="diamond"];
502 use [label=" Use for processing ", shape="box"];
503 dump [label=" Validate and push \n to dump()", shape="box"];
504 callp [label=" call print() function ", shape="box"];
505 type [label=" Print the Type \n ReplPrintTypeImpl()", shape="box"];
506 data [label=" Print the Data \n ReplPrintDataImpl() ", shape="box"];
507 output [label=" Output Pretty Print \n to the user ", shape="box", fontcolor=white, fillcolor="#3333ff", style=filled];
509 parse -> capture [label="Optional 'Value' Parameter"];
510 capture -> use [label="Yes"];
512 capture -> dump [label="No"];
520 **Note:** Following is a sample code snippet. Actual code may vary over time.
522 .. code-block:: console
   llvm::Error Interpreter::ParseAndExecute(llvm::StringRef Code, Value *V) {
     auto PTU = Parse(Code);
     if (!PTU)
       return PTU.takeError();
     if (llvm::Error Err = Execute(*PTU))
       return Err;

     if (LastValue.isValid()) {
       if (!V)
         LastValue.dump(); // No out-parameter: print the value.
       else
         *V = std::move(LastValue);
     }
     return llvm::Error::success();
   }

The ``dump()`` function (in ``Value.cpp``) calls the ``print()`` function.

Printing of the data and of the type is handled by separate functions:
``ReplPrintDataImpl()`` and ``ReplPrintTypeImpl()``.
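
The control flow around the optional ``Value`` parameter can be modeled in a
few lines of standalone C++. The sketch below does not parse or execute any
code: ``MiniResult``, ``MiniInterpreter``, ``printed()`` and the ``(int) N``
print format are hypothetical, and the result of "executing" the input is
simply passed in by the caller. It only shows the two paths: print when no
out-parameter is given, hand the value over when one is.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical result holder; the real Value class is far more general.
struct MiniResult {
  std::optional<int> Data;
  bool isValid() const { return Data.has_value(); }
  void clear() { Data.reset(); }
  void dump(std::ostream &OS) const {
    assert(isValid() && "dumping an invalid value");
    OS << "(int) " << *Data << "\n"; // made-up print format
  }
};

// Models only the control flow around the optional out-parameter.
class MiniInterpreter {
public:
  // 'Result' stands in for what executing the input produced; an empty
  // optional models input that yields no value, such as a declaration.
  void parseAndExecute(std::optional<int> Result, MiniResult *V = nullptr) {
    LastValue.Data = Result;
    if (!LastValue.isValid())
      return;
    if (!V) {
      LastValue.dump(Out); // REPL mode: print the value.
      LastValue.clear();
    } else {
      *V = std::move(LastValue); // Library mode: hand it to the caller.
    }
  }

  // Everything the REPL path has printed so far.
  std::string printed() const { return Out.str(); }

private:
  std::ostringstream Out; // collects what a REPL would print
  MiniResult LastValue;
};
```

Calling ``parseAndExecute(42)`` appends ``(int) 42`` to the printed output,
whereas ``parseAndExecute(7, &V)`` prints nothing and leaves the result in
``V`` for the caller to use.
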
548 Annotation Token (annot_repl_input_end)
549 ---------------------------------------
This feature uses a new token (``annot_repl_input_end``) to consider printing the
value of an expression if it doesn't end with a semicolon. When parsing an
Expression Statement, if the last semicolon is missing, then the code will
pretend that there is one and set a marker there for later utilization, and
557 A semicolon is normally required in C++, but this feature expands the C++
558 syntax to handle cases where a missing semicolon is expected (i.e., when
559 handling an expression statement). It also makes sure that an error is not
560 generated for the missing semicolon in this specific case.
562 This is accomplished by identifying the end position of the user input
563 (expression statement). This helps store and return the expression statement
564 effectively, so that it can be printed (displayed to the user automatically).
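
The decision described above can be modeled on a toy token stream. The sketch
below is not Clang's parser: ``TokKind``, ``Token``, ``ParsedStmt`` and
``parseExprStatement()`` are hypothetical, and a whole expression is reduced
to a single token. It shows the three outcomes: the end-of-input marker
follows the expression (semicolon missing, candidate for value printing), a
semicolon follows (ordinary statement), or neither follows (a genuine error).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy token kinds; ReplInputEnd plays the role of the end-of-input marker.
enum class TokKind { Expr, Semi, ReplInputEnd };

struct Token {
  TokKind Kind;
  std::string Text;
};

struct ParsedStmt {
  std::string Expr;
  bool SemiMissing = false; // true => candidate for value printing
  bool Invalid = false;     // true => a real "expected ';'" error
};

// Parses "<expr> ;" or "<expr> <end-of-input>" starting at Toks[Pos].
inline ParsedStmt parseExprStatement(const std::vector<Token> &Toks,
                                     std::size_t &Pos) {
  assert(Pos < Toks.size() && Toks[Pos].Kind == TokKind::Expr);
  ParsedStmt S;
  S.Expr = Toks[Pos++].Text;
  if (Pos < Toks.size() && Toks[Pos].Kind == TokKind::ReplInputEnd) {
    // Do not consume the marker: a later stage still needs to see it.
    S.SemiMissing = true;
  } else if (Pos < Toks.size() && Toks[Pos].Kind == TokKind::Semi) {
    ++Pos; // Eat the semicolon.
  } else {
    S.Invalid = true;
  }
  return S;
}
```

In this model, the input ``x`` followed directly by the end marker is flagged
as ``SemiMissing`` and the marker stays in the stream, while ``f();`` is
parsed as a normal statement and the semicolon is consumed.
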
566 **Note:** This logic is only available for C++ for now, since part of the
567 implementation itself requires C++ features. Future versions may support more
570 .. code-block:: console
572 Token *CurTok = nullptr;
573 // If the semicolon is missing at the end of REPL input, consider if
574 // we want to do value printing. Note this is only enabled in C++ mode
575 // since part of the implementation requires C++ language features.
576 // Note we shouldn't eat the token since the callback needs it.
   if (Tok.is(tok::annot_repl_input_end) && Actions.getLangOpts().CPlusPlus)
     CurTok = &Tok;
   else
     // Otherwise, eat the semicolon.
     ExpectAndConsumeSemi(diag::err_expected_semi_after_expr);
583 StmtResult R = handleExprStmt(Expr, StmtCtx);
584 if (CurTok && !R.isInvalid())
585 CurTok->setAnnotationValue(R.get());
593 When Sema encounters the ``annot_repl_input_end`` token, it knows to transform
594 the AST before the real CodeGen process. It will consume the token and set a
595 'semi missing' bit in the respective decl.
597 .. code-block:: console
   if (Tok.is(tok::annot_repl_input_end) &&
       Tok.getAnnotationValue() != nullptr) {
     ConsumeAnnotationToken();
     cast<TopLevelStmtDecl>(DeclsInGroup.back())->setSemiMissing();
   }

In the AST Consumer, traverse all the top-level decls to look for expressions
to synthesize. If the current decl is a top-level statement decl
(``TopLevelStmtDecl``) and has a semicolon missing, then ask the interpreter
608 to synthesize another expression (an internal function call) to replace this
612 Detailed RFC and Discussion:
613 ----------------------------
615 For more technical details, community discussion and links to patches related
617 Please visit: `RFC on LLVM Discourse <https://discourse.llvm.org/t/rfc-handle-execution-results-in-clang-repl/68493>`_.
619 Some logic presented in the RFC (e.g. ValueGetter()) may be outdated,
620 compared to the final developed solution.
624 `Cling Transitions to LLVM's Clang-Repl <https://root.cern/blog/cling-in-llvm/>`_
626 `Moving (parts of) the Cling REPL in Clang <https://lists.llvm.org/pipermail/llvm-dev/2020-July/143257.html>`_
628 `GPU Accelerated Automatic Differentiation With Clad <https://arxiv.org/pdf/2203.06139.pdf>`_