===============================
ASTImporter: Merging Clang ASTs
===============================

The ``ASTImporter`` class is part of Clang's core library, the AST library.
It imports nodes from one ``ASTContext`` into another ``ASTContext``.

In this document, we assume basic knowledge about the Clang AST. See the :doc:`Introduction
to the Clang AST <IntroductionToTheClangAST>` if you want to learn more
about how the AST is structured.
Familiarity with :doc:`matching the Clang AST <LibASTMatchers>` and with the `reference for the matchers <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_ is also useful.

``ASTContext`` holds long-lived AST nodes (such as types and decls) that can be referred to throughout the semantic analysis of a file.
In some cases it is preferable to work with more than one ``ASTContext``.
For example, we may want to parse several different files inside the same Clang tool.
It would be convenient to view the resulting set of ASTs as if they were a single AST produced by parsing all the files together.
``ASTImporter`` provides a way to copy types or declarations from one ``ASTContext`` to another.
We refer to the context from which we import as the **"from" context** or *source context*; and the context into which we import as the **"to" context** or *destination context*.

Existing clients of the ``ASTImporter`` library are Cross Translation Unit (CTU) static analysis and the LLDB expression parser.
CTU static analysis imports the definition of a function when that definition is found in another translation unit (TU).
This way the analysis can break out of the single-TU limitation.
LLDB's ``expr`` command parses a user-defined expression, creates an ``ASTContext`` for it, and then imports the missing definitions from the AST that was built from the debug information (DWARF, etc.).

Algorithm of the import
-----------------------

Importing one AST node copies that node into the destination ``ASTContext``.
Why do we have to copy the node?
Isn't it enough to insert the pointer to that node into the destination context?
One reason is that the "from" context may outlive the "to" context.
Also, the Clang AST considers nodes (or certain properties of nodes) equivalent if they have the same address!

The import algorithm has to ensure that structurally equivalent nodes from different translation units are not duplicated in the merged AST.
For example, if we include the definition of the vector template (``#include <vector>``) in two translation units, then their merged AST should have only one node which represents the template.
Also, we have to discover *one definition rule* (ODR) violations.
For instance, both translation units may contain a class definition with the same name, but one of the definitions may contain a different number of fields.
So, we look up existing definitions, and then we check the structural equivalency of those nodes.
The following pseudo-code demonstrates the basics of the import mechanism:

.. code-block:: cpp

  // Pseudo-code(!) of import:
  ErrorOrDecl Import(Decl *FromD) {
    Decl *ToDecl = nullptr;
    FoundDeclsList = Look up all Decls in the "to" Ctx with the same name as FromD;
    for (auto FoundDecl : FoundDeclsList) {
      if (StructurallyEquivalentDecls(FoundDecl, FromD)) {
        ToDecl = FoundDecl;
        Mark FromD as imported;
        break;
      } else {
        Report ODR violation;
        return error;
      }
    }
    if (FoundDeclsList is empty) {
      Import dependent declarations and types of FromD;
      ToDecl = create a new AST node in "to" Ctx;
      Mark FromD as imported;
    }
    return ToDecl;
  }
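
To make the control flow of the pseudo-code tangible, here is a minimal, runnable sketch built on a toy model.
It is not Clang's implementation: ``ToyDecl``, ``ToyContext`` and ``ToyImporter`` are hypothetical names, a declaration is reduced to a name plus a list of field types, and structural equivalence is reduced to comparing those two properties.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <memory>
  #include <string>
  #include <vector>

  // Toy stand-in for a declaration: a name plus the types of its fields.
  struct ToyDecl {
    std::string Name;
    std::vector<std::string> FieldTypes;
  };

  // Toy stand-in for an ASTContext: it owns the declarations created in it.
  struct ToyContext {
    std::vector<std::unique_ptr<ToyDecl>> Decls;

    ToyDecl *create(const std::string &Name,
                    const std::vector<std::string> &FieldTypes) {
      Decls.push_back(std::unique_ptr<ToyDecl>(new ToyDecl{Name, FieldTypes}));
      return Decls.back().get();
    }

    // Name lookup; returns nullptr if nothing with that name exists.
    ToyDecl *lookup(const std::string &Name) const {
      for (const std::unique_ptr<ToyDecl> &D : Decls)
        if (D->Name == Name)
          return D.get();
      return nullptr;
    }
  };

  // Drastically simplified structural equivalence check (an assumption of
  // this sketch, not the real algorithm).
  bool structurallyEquivalent(const ToyDecl &A, const ToyDecl &B) {
    return A.Name == B.Name && A.FieldTypes == B.FieldTypes;
  }

  class ToyImporter {
  public:
    explicit ToyImporter(ToyContext &ToCtx) : To(ToCtx) {}

    // Returns the node in the "to" context, or nullptr on an ODR violation.
    ToyDecl *importDecl(const ToyDecl *FromD) {
      assert(FromD && "nothing to import");
      auto Known = Imported.find(FromD);
      if (Known != Imported.end())
        return Known->second; // Already imported earlier.
      ToyDecl *ToD = To.lookup(FromD->Name);
      if (ToD && !structurallyEquivalent(*ToD, *FromD))
        return nullptr; // Same name, different structure: ODR violation.
      if (!ToD)
        ToD = To.create(FromD->Name, FromD->FieldTypes); // Copy the node.
      Imported[FromD] = ToD; // Mark FromD as imported.
      return ToD;
    }

  private:
    ToyContext &To;
    std::map<const ToyDecl *, ToyDecl *> Imported;
  };

In this sketch, importing the same node twice yields the same "to" node, and importing a node whose name matches an existing but structurally different node reports a conflict instead of creating a duplicate.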

Two AST nodes are *structurally equivalent* if they are

- builtin types and refer to the same type, e.g. ``int`` and ``int`` are structurally equivalent,
- function types and all their parameters have structurally equivalent types,
- record types and all their fields in order of their definition have the same identifier names and structurally equivalent types,
- variable or function declarations and they have the same identifier name and their types are structurally equivalent.

We could extend the definition of structural equivalency to templates similarly.
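
The recursive nature of this definition can be sketched with a toy type model.
The names ``ToyType``, ``makeBuiltin``, ``makeFunction`` and ``makeRecord`` are hypothetical and exist only for this illustration; the sketch covers only the builtin, function and record rules listed above and assumes that the types do not refer to themselves cyclically.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <utility>
  #include <vector>

  struct ToyType;
  // A record field: its identifier name and its type.
  typedef std::pair<std::string, const ToyType *> ToyField;

  struct ToyType {
    enum Kind { Builtin, Function, Record };
    Kind K;
    std::string Name;                    // Used by builtin types only.
    std::vector<const ToyType *> Params; // Used by function types only.
    std::vector<ToyField> Fields;        // Used by record types only.
  };

  ToyType makeBuiltin(const std::string &Name) {
    return ToyType{ToyType::Builtin, Name, {}, {}};
  }
  ToyType makeFunction(const std::vector<const ToyType *> &Params) {
    return ToyType{ToyType::Function, "", Params, {}};
  }
  ToyType makeRecord(const std::vector<ToyField> &Fields) {
    return ToyType{ToyType::Record, "", {}, Fields};
  }

  bool structurallyEquivalent(const ToyType &A, const ToyType &B) {
    if (A.K != B.K)
      return false;
    switch (A.K) {
    case ToyType::Builtin:
      // Builtin types must refer to the same type.
      return A.Name == B.Name;
    case ToyType::Function:
      // All parameters must have structurally equivalent types.
      if (A.Params.size() != B.Params.size())
        return false;
      for (std::size_t I = 0; I < A.Params.size(); ++I) {
        assert(A.Params[I] && B.Params[I] && "null parameter type");
        if (!structurallyEquivalent(*A.Params[I], *B.Params[I]))
          return false;
      }
      return true;
    case ToyType::Record:
      // Fields are compared in order: same name, equivalent type.
      if (A.Fields.size() != B.Fields.size())
        return false;
      for (std::size_t I = 0; I < A.Fields.size(); ++I) {
        assert(A.Fields[I].second && B.Fields[I].second && "null field type");
        if (A.Fields[I].first != B.Fields[I].first ||
            !structurallyEquivalent(*A.Fields[I].second, *B.Fields[I].second))
          return false;
      }
      return true;
    }
    return false;
  }

With this model, two records that declare ``m1`` and ``m2`` of type ``int`` in the same order compare as equivalent, while renaming or removing one field makes them differ.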

If A and B are AST nodes and *A depends on B*, then we say that A is a **dependant** of B and B is a **dependency** of A.
The words "dependant" and "dependency" are nouns in British English.
Unfortunately, in American English, the adjective "dependent" is used for both meanings.
In this document, the adjective "dependent" always refers to the dependencies, that is, to the B node in the example.

Let's create a tool which uses the ``ASTImporter`` class.
First, we build two ASTs from virtual files; the contents of the virtual files are synthesized from string literals:

.. code-block:: cpp

  std::unique_ptr<ASTUnit> ToUnit = buildASTFromCode(
      "", "to.cc"); // empty file
  std::unique_ptr<ASTUnit> FromUnit = buildASTFromCode(
      R"(
      class MyClass {
        int m1;
        int m2;
      };
      )",
      "from.cc");

The first AST corresponds to the destination ("to") context, which is empty, and the second to the source ("from") context.
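Next, we define a matcher to match ``MyClass`` in the "from" context.
``getFirstDecl`` is a small helper whose definition appears in the complete listing later in this section.

.. code-block:: cpp

  auto Matcher = cxxRecordDecl(hasName("MyClass"));
  auto *From = getFirstDecl<CXXRecordDecl>(Matcher, FromUnit);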

Now we create the Importer and do the import:

.. code-block:: cpp

  ASTImporter Importer(ToUnit->getASTContext(), ToUnit->getFileManager(),
                       FromUnit->getASTContext(), FromUnit->getFileManager(),
                       /*MinimalImport=*/true);
  llvm::Expected<Decl *> ImportedOrErr = Importer.Import(From);

The ``Import`` call returns an ``llvm::Expected``, so we must check for an error.
Please refer to the `error handling <https://llvm.org/docs/ProgrammersManual.html#recoverable-errors>`_ documentation for details.

.. code-block:: cpp

  if (!ImportedOrErr) {
    llvm::Error Err = ImportedOrErr.takeError();
    llvm::errs() << "ERROR: " << Err << "\n";
    consumeError(std::move(Err));
    return 1;
  }

If there's no error then we can get the underlying value.
In this example we will print the AST of the "to" context.

.. code-block:: cpp

  Decl *Imported = *ImportedOrErr;
  Imported->getTranslationUnitDecl()->dump();

Because we requested **minimal import** in the constructor of the importer, the AST printed by the tool does not contain the declarations of the members:

.. code-block:: text

  TranslationUnitDecl 0x68b9a8 <<invalid sloc>> <invalid sloc>
  `-CXXRecordDecl 0x6c7e30 <line:2:7, col:13> col:13 class MyClass definition
    `-DefinitionData pass_in_registers standard_layout trivially_copyable trivial literal
      |-DefaultConstructor exists trivial needs_implicit
      |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
      |-MoveConstructor exists simple trivial needs_implicit
      |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
      |-MoveAssignment exists simple trivial needs_implicit
      `-Destructor simple irrelevant trivial needs_implicit

We'd like to get the members too, so we use ``ImportDefinition`` to copy the whole definition of ``MyClass`` into the "to" context.
Then we dump the AST again.

.. code-block:: cpp

  if (llvm::Error Err = Importer.ImportDefinition(From)) {
    llvm::errs() << "ERROR: " << Err << "\n";
    consumeError(std::move(Err));
    return 1;
  }
  llvm::errs() << "Imported definition.\n";
  Imported->getTranslationUnitDecl()->dump();

This time the AST is going to contain the members too.

.. code-block:: text

  TranslationUnitDecl 0x68b9a8 <<invalid sloc>> <invalid sloc>
  `-CXXRecordDecl 0x6c7e30 <line:2:7, col:13> col:13 class MyClass definition
    |-DefinitionData pass_in_registers standard_layout trivially_copyable trivial literal
    | |-DefaultConstructor exists trivial needs_implicit
    | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveConstructor exists simple trivial needs_implicit
    | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveAssignment exists simple trivial needs_implicit
    | `-Destructor simple irrelevant trivial needs_implicit
    |-CXXRecordDecl 0x6c7f48 <col:7, col:13> col:13 implicit class MyClass
    |-FieldDecl 0x6c7ff0 <line:3:9, col:13> col:13 m1 'int'
    `-FieldDecl 0x6c8058 <line:4:9, col:13> col:13 m2 'int'

We can omit the call to ``ImportDefinition`` if we set up the importer to do a "normal" (not minimal) import.

.. code-block:: cpp

  ASTImporter Importer( .... /*MinimalImport=*/false);

With **normal import**, all dependent declarations are imported together with their definitions.
With minimal import, however, the dependent Decls are imported without their definitions, and we have to import the definition of each one separately if we need it later.
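
The difference between the two modes can be sketched with a toy model.
This is only an illustration under simplifying assumptions: ``ToyRecord`` and ``ToyImporter`` are hypothetical names, a record is reduced to a name and a list of member names, and the toy ``importDefinition`` requires that the declaration was imported first.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  struct ToyRecord {
    std::string Name;
    std::vector<std::string> Members;
    bool DefinitionImported; // false while only the "shell" exists
  };

  class ToyImporter {
  public:
    explicit ToyImporter(bool MinimalImport) : Minimal(MinimalImport) {}

    // Copies the record; in minimal mode the members are left out.
    ToyRecord &importDecl(const ToyRecord &From) {
      auto It = Imported.find(&From);
      if (It == Imported.end()) {
        ToyRecord Shell{From.Name, {}, false};
        It = Imported.emplace(&From, Shell).first;
      }
      if (!Minimal)
        importDefinition(From); // Normal import pulls in the definition too.
      return It->second;
    }

    // Completes a previously imported record with its members.
    void importDefinition(const ToyRecord &From) {
      auto It = Imported.find(&From);
      assert(It != Imported.end() && "import the declaration first");
      It->second.Members = From.Members;
      It->second.DefinitionImported = true;
    }

  private:
    bool Minimal;
    std::map<const ToyRecord *, ToyRecord> Imported;
  };

A minimal toy importer returns a record without members until ``importDefinition`` is called, whereas a normal toy importer returns the complete record right away.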

Putting this all together, here is what the source of the tool looks like:

.. code-block:: cpp

  #include "clang/AST/ASTImporter.h"
  #include "clang/ASTMatchers/ASTMatchFinder.h"
  #include "clang/ASTMatchers/ASTMatchers.h"
  #include "clang/Tooling/Tooling.h"

  #include <cassert>
  #include <memory>

  using namespace clang;
  using namespace tooling;
  using namespace ast_matchers;

  template <typename Node, typename Matcher>
  Node *getFirstDecl(Matcher M, const std::unique_ptr<ASTUnit> &Unit) {
    auto MB = M.bind("bindStr"); // Bind the to-be-matched node to a string key.
    auto MatchRes = match(MB, Unit->getASTContext());
    // We should have at least one match.
    assert(MatchRes.size() >= 1);
    // Get the first matched and bound node.
    Node *Result =
        const_cast<Node *>(MatchRes[0].template getNodeAs<Node>("bindStr"));
    assert(Result);
    return Result;
  }

  int main() {
    std::unique_ptr<ASTUnit> ToUnit = buildASTFromCode(
        "", "to.cc"); // empty file
    std::unique_ptr<ASTUnit> FromUnit = buildASTFromCode(
        R"(
        class MyClass {
          int m1;
          int m2;
        };
        )",
        "from.cc");
    auto Matcher = cxxRecordDecl(hasName("MyClass"));
    auto *From = getFirstDecl<CXXRecordDecl>(Matcher, FromUnit);

    ASTImporter Importer(ToUnit->getASTContext(), ToUnit->getFileManager(),
                         FromUnit->getASTContext(), FromUnit->getFileManager(),
                         /*MinimalImport=*/true);
    llvm::Expected<Decl *> ImportedOrErr = Importer.Import(From);
    if (!ImportedOrErr) {
      llvm::Error Err = ImportedOrErr.takeError();
      llvm::errs() << "ERROR: " << Err << "\n";
      consumeError(std::move(Err));
      return 1;
    }
    Decl *Imported = *ImportedOrErr;
    Imported->getTranslationUnitDecl()->dump();

    if (llvm::Error Err = Importer.ImportDefinition(From)) {
      llvm::errs() << "ERROR: " << Err << "\n";
      consumeError(std::move(Err));
      return 1;
    }
    llvm::errs() << "Imported definition.\n";
    Imported->getTranslationUnitDecl()->dump();

    return 0;
  }

To build the tool, we can extend a ``CMakeLists.txt``, for example the one under ``clang/tools``, with the build and link instructions:

.. code-block:: cmake

  add_clang_executable(astimporter-demo ASTImporterDemo.cpp)
  clang_target_link_libraries(astimporter-demo
    PRIVATE LLVMSupport clangAST clangASTMatchers clangBasic
    clangFrontend clangSerialization clangTooling)

Then we can build and execute the new tool.

.. code-block:: bash

  $ ninja astimporter-demo && ./bin/astimporter-demo

Errors during the import process
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, either the source or the destination context contains the definition of a declaration.
However, there may be cases when both of the contexts have a definition for a given symbol.
If these definitions differ, then we have a name conflict; in C++ this is known as an ODR (one definition rule) violation.
Let's modify the tool we wrote previously and try to import a ``ClassTemplateSpecializationDecl`` with a conflicting definition:

.. code-block:: cpp

  int main() {
    std::unique_ptr<ASTUnit> ToUnit = buildASTFromCode(
        R"(
        // primary template
        template <typename T>
        struct X {};
        // explicit specialization
        template<>
        struct X<int> { int i; };
        )",
        "to.cc");
    ToUnit->enableSourceFileDiagnostics();
    std::unique_ptr<ASTUnit> FromUnit = buildASTFromCode(
        R"(
        // primary template
        template <typename T>
        struct X {};
        // explicit specialization
        template<>
        struct X<int> { int i2; };
        // field mismatch:  ^^
        )",
        "from.cc");
    FromUnit->enableSourceFileDiagnostics();
    auto Matcher = classTemplateSpecializationDecl(hasName("X"));
    auto *From = getFirstDecl<ClassTemplateSpecializationDecl>(Matcher, FromUnit);
    auto *To = getFirstDecl<ClassTemplateSpecializationDecl>(Matcher, ToUnit);

    ASTImporter Importer(ToUnit->getASTContext(), ToUnit->getFileManager(),
                         FromUnit->getASTContext(), FromUnit->getFileManager(),
                         /*MinimalImport=*/false);
    llvm::Expected<Decl *> ImportedOrErr = Importer.Import(From);
    if (!ImportedOrErr) {
      llvm::Error Err = ImportedOrErr.takeError();
      llvm::errs() << "ERROR: " << Err << "\n";
      consumeError(std::move(Err));
      To->getTranslationUnitDecl()->dump();
      return 1;
    }
    return 0;
  }

When we run the tool, we get the following warning:

.. code-block:: text

  to.cc:7:14: warning: type 'X<int>' has incompatible definitions in different translation units [-Wodr]
        struct X<int> { int i; };
               ^
  to.cc:7:27: note: field has name 'i' here
        struct X<int> { int i; };
                            ^
  from.cc:7:27: note: field has name 'i2' here
        struct X<int> { int i2; };
                            ^

Note that we had to call ``enableSourceFileDiagnostics`` on the ``ASTUnit`` objects in order to get these diagnostics.

Since we could not import the specified declaration (``From``), we get an error in the return value.
The AST does not contain the conflicting definition, so we are left with the original AST.

.. code-block:: text

  TranslationUnitDecl 0xe54a48 <<invalid sloc>> <invalid sloc>
  |-ClassTemplateDecl 0xe91020 <to.cc:3:7, line:4:17> col:14 X
  | |-TemplateTypeParmDecl 0xe90ed0 <line:3:17, col:26> col:26 typename depth 0 index 0 T
  | |-CXXRecordDecl 0xe90f90 <line:4:7, col:17> col:14 struct X definition
  | | |-DefinitionData empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
  | | | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
  | | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
  | | | |-MoveConstructor exists simple trivial needs_implicit
  | | | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
  | | | |-MoveAssignment exists simple trivial needs_implicit
  | | | `-Destructor simple irrelevant trivial needs_implicit
  | | `-CXXRecordDecl 0xe91270 <col:7, col:14> col:14 implicit struct X
  | `-ClassTemplateSpecialization 0xe91340 'X'
  `-ClassTemplateSpecializationDecl 0xe91340 <line:6:7, line:7:30> col:14 struct X definition
    |-DefinitionData pass_in_registers aggregate standard_layout trivially_copyable pod trivial literal
    | |-DefaultConstructor exists trivial needs_implicit
    | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveConstructor exists simple trivial needs_implicit
    | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveAssignment exists simple trivial needs_implicit
    | `-Destructor simple irrelevant trivial needs_implicit
    |-TemplateArgument type 'int'
    |-CXXRecordDecl 0xe91558 <col:7, col:14> col:14 implicit struct X
    `-FieldDecl 0xe91600 <col:23, col:27> col:27 i 'int'

If a dependent node has to be imported before a given node can be imported, then the import error associated with the dependency propagates to the dependant node.
Let's modify the previous example and import a ``FieldDecl`` instead of the ``ClassTemplateSpecializationDecl``.

.. code-block:: cpp

  auto Matcher = fieldDecl(hasName("i2"));
  auto *From = getFirstDecl<FieldDecl>(Matcher, FromUnit);

In this case we can see (via ``getImportDeclErrorIfAny``) that an error is associated with the specialization as well, not just with the field:

.. code-block:: cpp

  llvm::Expected<Decl *> ImportedOrErr = Importer.Import(From);
  if (!ImportedOrErr) {
    llvm::Error Err = ImportedOrErr.takeError();
    consumeError(std::move(Err));

    // Check that the ClassTemplateSpecializationDecl is also marked as
    // erroneous.
    auto *FromSpec = getFirstDecl<ClassTemplateSpecializationDecl>(
        classTemplateSpecializationDecl(hasName("X")), FromUnit);
    assert(Importer.getImportDeclErrorIfAny(FromSpec));
    // Btw, the error is also set for the FieldDecl.
    assert(Importer.getImportDeclErrorIfAny(From));
    return 1;
  }
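
The propagation rule itself can be sketched independently of Clang.
The following toy model is only an illustration: ``ToyNode`` and ``ToyImporter`` are hypothetical names, a conflict is represented by a simple flag, and the dependency graph is assumed to be acyclic.

.. code-block:: cpp

  #include <cassert>
  #include <map>
  #include <string>
  #include <vector>

  struct ToyNode {
    std::string Name;
    bool Conflicts;                    // Pretend the "to" context already holds
                                       // a conflicting definition of this node.
    std::vector<const ToyNode *> Deps; // Dependencies to import first.
  };

  class ToyImporter {
  public:
    // Returns true if N and all of its dependencies were imported.
    bool importNode(const ToyNode &N) {
      auto Known = Succeeded.find(&N);
      if (Known != Succeeded.end())
        return Known->second; // Reuse the earlier result (success or error).
      bool Ok = !N.Conflicts;
      for (const ToyNode *Dep : N.Deps) {
        assert(Dep && "null dependency");
        if (!importNode(*Dep))
          Ok = false; // The error of the dependency propagates to N.
      }
      Succeeded[&N] = Ok;
      return Ok;
    }

    // True if an import of N was attempted and ended with an error.
    bool hasImportError(const ToyNode &N) const {
      auto Known = Succeeded.find(&N);
      return Known != Succeeded.end() && !Known->second;
    }

  private:
    std::map<const ToyNode *, bool> Succeeded;
  };

If a toy field node depends on a conflicting specialization node, importing the field fails and both nodes end up with an error recorded, which mirrors the two assertions in the example above.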

We may recognize an error during the import of a dependent node. However, by that time, we may have already created the dependant.
In these cases we do not remove the existing erroneous node from the "to" context; rather, we associate an error with that node.
Let's extend the previous example with another class ``Y``.
This class has a forward declaration in the "to" context, but its definition is in the "from" context.
We'd like to import the definition, but it contains a member whose type conflicts with the type in the "to" context:

.. code-block:: cpp

  std::unique_ptr<ASTUnit> ToUnit = buildASTFromCode(
      R"(
      // primary template
      template <typename T>
      struct X {};
      // explicit specialization
      template<>
      struct X<int> { int i; };

      class Y;
      )",
      "to.cc");
  ToUnit->enableSourceFileDiagnostics();
  std::unique_ptr<ASTUnit> FromUnit = buildASTFromCode(
      R"(
      // primary template
      template <typename T>
      struct X {};
      // explicit specialization
      template<>
      struct X<int> { int i2; };
      // field mismatch:  ^^

      class Y { void f() { X<int> xi; } };
      )",
      "from.cc");
  FromUnit->enableSourceFileDiagnostics();
  auto Matcher = cxxRecordDecl(hasName("Y"));
  auto *From = getFirstDecl<CXXRecordDecl>(Matcher, FromUnit);
  auto *To = getFirstDecl<CXXRecordDecl>(Matcher, ToUnit);

This time we create a ``shared_ptr`` to an ``ASTImporterSharedState``, which owns the associated errors for the "to" context.
Note that there may be several different ``ASTImporter`` objects which import into the same "to" context but from different "from" contexts; they should share the same ``ASTImporterSharedState``.
(Also note that we have to include the corresponding ``ASTImporterSharedState.h`` header file.)

.. code-block:: cpp

  auto ImporterState = std::make_shared<ASTImporterSharedState>();
  ASTImporter Importer(ToUnit->getASTContext(), ToUnit->getFileManager(),
                       FromUnit->getASTContext(), FromUnit->getFileManager(),
                       /*MinimalImport=*/false, ImporterState);
  llvm::Expected<Decl *> ImportedOrErr = Importer.Import(From);
  if (!ImportedOrErr) {
    llvm::Error Err = ImportedOrErr.takeError();
    consumeError(std::move(Err));

    // ... but the node had been created.
    auto *ToYDef = getFirstDecl<CXXRecordDecl>(
        cxxRecordDecl(hasName("Y"), isDefinition()), ToUnit);
    // An error is set for "ToYDef" in the shared state.
    Optional<ImportError> OptErr =
        ImporterState->getImportDeclErrorIfAny(ToYDef);
    assert(OptErr);

    return 1;
  }

If we take a look at the AST, then we can see that the Decl with the definition is created, but the member is missing.

.. code-block:: text

  |-CXXRecordDecl 0xf66678 <line:9:7, col:13> col:13 class Y
  `-CXXRecordDecl 0xf66730 prev 0xf66678 <:10:7, col:13> col:13 class Y definition
    |-DefinitionData pass_in_registers empty aggregate standard_layout trivially_copyable pod trivial literal has_constexpr_non_copy_move_ctor can_const_default_init
    | |-DefaultConstructor exists trivial constexpr needs_implicit defaulted_is_constexpr
    | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveConstructor exists simple trivial needs_implicit
    | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
    | |-MoveAssignment exists simple trivial needs_implicit
    | `-Destructor simple irrelevant trivial needs_implicit
    `-CXXRecordDecl 0xf66828 <col:7, col:13> col:13 implicit class Y

We do not remove the erroneous nodes because, by the time we recognize the error, it is too late to remove the node: there may already be additional references to it in the AST.
This is aligned with the overall `design principle of the Clang AST <InternalsManual.html#immutability>`_: Clang AST nodes (types, declarations, statements, expressions, and so on) are generally designed to be **immutable once created**.
Thus, clients of the ASTImporter library should always check if there is any associated error for the node which they inspect in the destination context.
We recommend skipping the processing of those nodes which have an error associated with them.
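
The recommendation above, together with the idea of a state shared between several importers, can be sketched as follows.
This is a toy model under simplifying assumptions, not the ``ASTImporterSharedState`` API: ``ToySharedState``, ``ToyImporter`` and ``nodesSafeToProcess`` are hypothetical names, and nodes of the "to" context are identified by plain strings.

.. code-block:: cpp

  #include <cassert>
  #include <memory>
  #include <set>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy stand-in for the shared state: the "to" nodes that carry an error.
  struct ToySharedState {
    std::set<std::string> ErroneousToNodes;
  };

  // Several importers (one per "from" context) share one state object.
  class ToyImporter {
  public:
    explicit ToyImporter(std::shared_ptr<ToySharedState> SharedState)
        : State(std::move(SharedState)) {
      assert(State && "the shared state must not be null");
    }

    void recordError(const std::string &ToNode) {
      State->ErroneousToNodes.insert(ToNode);
    }

    bool hasError(const std::string &ToNode) const {
      return State->ErroneousToNodes.count(ToNode) != 0;
    }

  private:
    std::shared_ptr<ToySharedState> State;
  };

  // Client-side filter: keep only the nodes without an associated error.
  std::vector<std::string>
  nodesSafeToProcess(const std::vector<std::string> &ToNodes,
                     const ToySharedState &State) {
    std::vector<std::string> Safe;
    for (const std::string &Node : ToNodes)
      if (State.ErroneousToNodes.count(Node) == 0)
        Safe.push_back(Node);
    return Safe;
  }

Because both toy importers hold the same state object, an error recorded through one of them is visible through the other, and a client can filter the nodes of the "to" context before processing them.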

Using the ``-ast-merge`` Clang front-end action
-----------------------------------------------

The ``-ast-merge <pch-file>`` command-line switch can be used to merge from the given serialized AST file.
This file represents the source context.
When this switch is present, each top-level AST node of the source context is merged into the destination context.
If the merge is successful, ``ASTConsumer::HandleTopLevelDecl`` is called for the Decl.
As a result, we can execute the original front-end action on the extended AST.

Let's consider the following three files:

Let's generate the AST files for the two source files:

.. code-block:: bash

  $ clang -cc1 -emit-pch -o bar.ast bar.c
  $ clang -cc1 -emit-pch -o main.ast main.c

Then, let's check what the merged AST looks like if we consider only the ``bar()`` function:

.. code-block:: bash

  $ clang -cc1 -ast-merge bar.ast -ast-merge main.ast /dev/null -ast-dump
  TranslationUnitDecl 0x12b0738 <<invalid sloc>> <invalid sloc>
  |-FunctionDecl 0x12b1470 </path/bar.h:4:1, col:9> col:5 used bar 'int ()'
  |-FunctionDecl 0x12b1538 prev 0x12b1470 </path/bar.c:3:1, line:5:1> line:3:5 used bar 'int ()'
  | `-CompoundStmt 0x12b1608 <col:11, line:5:1>
  |   `-ReturnStmt 0x12b15f8 <line:4:3, col:10>
  |     `-IntegerLiteral 0x12b15d8 <col:10> 'int' 41
  |-FunctionDecl 0x12b1648 prev 0x12b1538 </path/bar.h:4:1, col:9> col:5 used bar 'int ()'

We can see that the prototype of the function and its definition are merged into the same redeclaration chain.
What's more, a third prototype declaration is merged into the chain.
The functions are merged in a way that prototypes are added to the redecl chain if they refer to the same type, but we can have only one definition.
The first two declarations are from ``bar.ast``, the third is from ``main.ast``.
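
The merging rule for function declarations can be sketched with a toy redeclaration chain.
This is an illustration only: ``ToyFunctionDecl`` and ``ToyRedeclChain`` are hypothetical names, types are compared as plain strings, and rejecting a second definition is a simplification chosen for this sketch rather than a description of what ``ASTImporter`` does internally.

.. code-block:: cpp

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  struct ToyFunctionDecl {
    std::string Type;  // e.g. "int ()"
    bool IsDefinition; // false for a prototype
  };

  class ToyRedeclChain {
  public:
    // Appends D to the chain; returns false if D cannot be merged.
    bool merge(const ToyFunctionDecl &D) {
      assert(!D.Type.empty() && "a declaration needs a type");
      // Declarations on one chain must refer to the same type.
      if (!Decls.empty() && Decls.front().Type != D.Type)
        return false;
      // The chain may contain only one definition.
      if (D.IsDefinition && hasDefinition())
        return false;
      Decls.push_back(D);
      return true;
    }

    bool hasDefinition() const {
      for (const ToyFunctionDecl &D : Decls)
        if (D.IsDefinition)
          return true;
      return false;
    }

    std::size_t size() const { return Decls.size(); }

  private:
    std::vector<ToyFunctionDecl> Decls;
  };

Merging a prototype, a definition and another prototype of type ``int ()`` produces a chain of three declarations, as in the dump above, while a second definition or a declaration of a different type is not added to the chain.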

Now, let's create an object file from the merged AST:

.. code-block:: bash

  $ clang -cc1 -ast-merge bar.ast -ast-merge main.ast /dev/null -emit-obj -o main.o

Next, we may call the linker and execute the created binary file.

.. code-block:: bash

  $ clang -o a.out main.o
  $ ./a.out

In the case of C++, the generation of the AST files and the way we invoke the front-end are a bit different.
Assuming we have these three files:

We shall generate the AST files, merge them, create the executable and then run it:

.. code-block:: bash

  $ clang++ -x c++-header -o foo.ast foo.cpp
  $ clang++ -x c++-header -o main.ast main.cpp
  $ clang++ -cc1 -x c++ -ast-merge foo.ast -ast-merge main.ast /dev/null -ast-dump
  $ clang++ -cc1 -x c++ -ast-merge foo.ast -ast-merge main.ast /dev/null -emit-obj -o main.o
  $ clang++ -o a.out main.o
  $ ./a.out