==========================
Clang Transformer Tutorial
==========================

A tutorial on how to write a source-to-source translation tool using Clang Transformer.

What is Clang Transformer?
--------------------------

Clang Transformer is a framework for writing C++ diagnostics and program
transformations. It is built on the clang toolchain and the LibTooling library,
but aims to hide much of the complexity of clang's native, low-level libraries.

The core abstraction of Transformer is the *rewrite rule*, which specifies how
to change a given program pattern into a new form. Here are some examples of
tasks you can achieve with Transformer:

* warn against using the name ``MkX`` for a declared function,
* change ``MkX`` to ``MakeX``, where ``MkX`` is the name of a declared function,
* change ``s.size()`` to ``Size(s)``, where ``s`` is a ``string``,
* collapse ``e.child().m()`` to ``e.m()``, for any expression ``e`` and method named
  ``m``.

All of the examples have a common form: they identify a pattern that is the
target of the transformation, they specify an *edit* to the code identified by
the pattern, and their pattern and edit refer to common variables, like ``s``,
``e``, and ``m``, that range over code fragments. Our second and third examples also
specify constraints on the pattern that aren't apparent from the syntax alone,
like "``s`` is a ``string``." Even the first example ("warn ...") shares this form,
even though it doesn't change any of the code -- its "edit" is simply a no-op.
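
To make this shared shape concrete, here is a deliberately naive, text-only toy
in standard C++. It uses ``std::regex``, not Transformer, and every name in it
(``ToyRule``, ``applyToyRule``, ``sizeToFreeFunction``) is hypothetical. The
capture group plays the role of the variable ``s``: the pattern binds it and the
edit reuses it. Because the toy sees only text, it cannot check that ``s`` is a
``string``, which is exactly the kind of constraint AST-based patterns add.

.. code-block:: c++

   #include <regex>
   #include <string>

   // Hypothetical toy rule: a textual pattern plus an edit template.
   // Capture group 1 stands in for the variable `s` shared by both parts.
   struct ToyRule {
     std::regex pattern;
     std::string replacement;  // "$1" refers back to the captured `s`
   };

   // Applies the rule to every non-overlapping match in `code`.
   std::string applyToyRule(const ToyRule &rule, const std::string &code) {
     return std::regex_replace(code, rule.pattern, rule.replacement);
   }

   // "change s.size() to Size(s)", minus the "s is a string" constraint,
   // which plain text cannot express.
   ToyRule sizeToFreeFunction() {
     return ToyRule{std::regex(R"((\w+)\.size\(\))"), "Size($1)"};
   }

For example, the toy rewrites ``"n = name.size();"`` to ``"n = Size(name);"``,
but it also rewrites ``"x.y.size()"`` to ``"x.Size(y)"``, a plausible-looking but
wrong result that a pattern aware of the AST would avoid.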

Transformer helps users succinctly specify rules of this sort and easily execute
them locally over a collection of files, apply them to selected portions of
a codebase, or even bundle them as a clang-tidy check for ongoing application.

Who is Clang Transformer for?
-----------------------------

Clang Transformer is for developers who want to write clang-tidy checks or write
tools to modify a large number of C++ files in (roughly) the same way. What
qualifies as "large" really depends on the nature of the change and your
patience for repetitive editing. In our experience, automated solutions become
worthwhile somewhere between 100 and 500 files.

Getting Started
---------------

Patterns in Transformer are expressed with :doc:`clang's AST matchers <LibASTMatchers>`.
Matchers are a language of combinators for describing portions of a clang
Abstract Syntax Tree (AST). Since clang's AST includes complete type information
(within the limits of a single `Translation Unit (TU)`_),
these patterns can even encode rich constraints on the type properties of AST
nodes.

.. _`Translation Unit (TU)`: https://en.wikipedia.org/wiki/Translation_unit_\(programming\)

We assume familiarity with the clang AST and the corresponding AST matchers
for the purpose of this tutorial. Users who are unfamiliar with either are
encouraged to start with the recommended references in `Related Reading`_.

Example: style-checking names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Assume you have a style-guide rule which forbids functions from being named
"MkX" and you want to write a check that catches any violations of this rule. We
can express this as a Transformer rewrite rule:

.. code-block:: c++

   makeRule(functionDecl(hasName("MkX")).bind("fun"),
            noopEdit(node("fun")),
            cat("The name ``MkX`` is not allowed for functions; please rename"));

``makeRule`` is our go-to function for generating rewrite rules. It takes three
arguments: the pattern, the edit, and (optionally) an explanatory note. In our
example, the pattern (``functionDecl(...)``) identifies the declaration of the
function ``MkX``. Since we're just diagnosing the problem, but not suggesting a
fix, our edit is a no-op. However, it contains an *anchor* for the diagnostic
message: ``node("fun")`` says to associate the message with the source range of
the AST node bound to "fun"; in this case, the ill-named function declaration.
Finally, we use ``cat`` to build a message that explains the change. Regarding the
name ``cat`` -- we'll discuss it in more detail below, but suffice it to say that
it can also take multiple arguments and concatenate their results.

Note that the result of ``makeRule`` is a value of type
``clang::transformer::RewriteRule``, but most users don't need to care about the
details of this type.

Example: renaming a function
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Now, let's extend this example to a *transformation*; specifically, the second
example above:

.. code-block:: c++

   makeRule(declRefExpr(to(functionDecl(hasName("MkX")))),
            changeTo(cat("MakeX")),
            cat("MkX has been renamed MakeX"));

In this example, the pattern (``declRefExpr(...)``) identifies any *reference* to
the function ``MkX``, rather than the declaration itself, as in our previous
example. Our edit (``changeTo(...)``) says to *change* the code matched by the
pattern *to* the text "MakeX". Finally, we use ``cat`` again to build a message
that explains the change.

Here are some example changes that this rule would make:

+--------------------------+----------------------------+
| Original                 | Result                     |
+==========================+============================+
| ``X x = MkX(3);``        | ``X x = MakeX(3);``        |
+--------------------------+----------------------------+
| ``CallFactory(MkX, 3);`` | ``CallFactory(MakeX, 3);`` |
+--------------------------+----------------------------+
| ``auto f = MkX;``        | ``auto f = MakeX;``        |
+--------------------------+----------------------------+

Example: method to function
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Next, let's write a rule to replace a method call with a (free) function call,
applied to the original method call's target object. Specifically, "change
``s.size()`` to ``Size(s)``, where ``s`` is a ``string``." We start with a simpler
change that ignores the type of ``s``. That is, it will modify *any* method call
where the method is named "size":

.. code-block:: c++

   std::string s = "str";
   makeRule(
       cxxMemberCallExpr(
           on(expr().bind(s)),
           callee(cxxMethodDecl(hasName("size")))),
       changeTo(cat("Size(", node(s), ")")),
       cat("Method ``size`` is deprecated in favor of free function ``Size``"));

We express the pattern with the given AST matcher, which binds the method call's
target to ``s`` [#f1]_. For the edit, we again use ``changeTo``, but this
time we construct the term from multiple parts, which we compose with ``cat``. The
second part of our term is ``node(s)``, which selects the source code
corresponding to the AST node ``s`` that was bound when a match was found in the
AST for our rule's pattern. ``node(s)`` constructs a ``RangeSelector``, which, when
used in ``cat``, indicates that the selected source should be inserted in the
output at that point.
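
As a rough mental model of what ``cat`` does with a mix of text and ``node(...)``
selections, here is a standalone toy in plain C++17. It is not the Transformer
API: ``Range``, ``Bindings``, ``Text``, ``Node`` and ``toyCat`` are hypothetical
names, and a match is reduced to a map from bound ids to source offsets.

.. code-block:: c++

   #include <cstddef>
   #include <map>
   #include <string>
   #include <variant>
   #include <vector>

   // Hypothetical toy types: a half-open [begin, end) range of offsets into
   // the source, and the ids bound by one match mapped to their ranges.
   struct Range {
     std::size_t begin, end;
   };
   using Bindings = std::map<std::string, Range>;

   // One argument of a toy `cat`: literal text, or a node(id)-style selection.
   struct Text {
     std::string value;
   };
   struct Node {
     std::string id;
   };
   using Piece = std::variant<Text, Node>;

   // Concatenates the pieces; each Node copies the source text bound to its id.
   // Throws std::out_of_range if the id was not bound by the match.
   std::string toyCat(const std::vector<Piece> &pieces, const std::string &source,
                      const Bindings &bindings) {
     std::string out;
     for (const Piece &piece : pieces) {
       if (const Text *text = std::get_if<Text>(&piece)) {
         out += text->value;
       } else {
         const Range &r = bindings.at(std::get<Node>(piece).id);
         out += source.substr(r.begin, r.end - r.begin);
       }
     }
     return out;
   }

With source ``"n = str.size();"`` and ``"str"`` bound to offsets ``[4, 7)``, the
pieces ``Text{"Size("}``, ``Node{"str"}``, ``Text{")"}`` evaluate to ``"Size(str)"``.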

Now, we probably don't want to rewrite *all* invocations of "size" methods, just
those on ``std::string``\ s. We can achieve this change simply by refining our
matcher. The rest of the rule remains unchanged:

.. code-block:: c++

   std::string s = "str";
   makeRule(
       cxxMemberCallExpr(
           on(expr(hasType(namedDecl(hasName("std::string"))))
                  .bind(s)),
           callee(cxxMethodDecl(hasName("size")))),
       changeTo(cat("Size(", node(s), ")")),
       cat("Method ``size`` is deprecated in favor of free function ``Size``"));

Example: rewriting method calls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this example, we delete an "intermediary" method call in a string of
invocations. This scenario can arise, for example, if you want to collapse a
substructure into its parent.

.. code-block:: c++

   std::string e = "expr", m = "member";
   auto child_call = cxxMemberCallExpr(on(expr().bind(e)),
                                       callee(cxxMethodDecl(hasName("child"))));
   makeRule(cxxMemberCallExpr(on(child_call), callee(memberExpr().bind(m))),
            changeTo(cat(node(e), ".", member(m), "()")),
            cat("``child`` accessor is being removed; call ",
                member(m), " directly on parent"));

This rule isn't quite what we want: it will rewrite ``my_object.child().foo()`` to
``my_object.foo()``, but it will also rewrite ``my_ptr->child().foo()`` to
``my_ptr.foo()``, which is not what we intend. We could fix this by restricting
the pattern with ``unless(isArrow())`` in the definition of ``child_call``. Yet, we
*want* to rewrite calls through pointers.

To capture this idiom, we provide the ``access`` combinator to intelligently
construct a field/method access. In our example, the member access is expressed
as:

.. code-block:: c++

   access(e, cat(member(m)))

The first argument specifies the object being accessed and the second, a
description of the field/method name. In this case, we specify that the method
name should be copied from the source -- specifically, the source range of ``m``'s
member. To construct the method call, we would use this expression in ``cat``:

.. code-block:: c++

   cat(access(e, cat(member(m))), "()")
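
The point of ``access`` is that the separator depends on the object expression.
A toy, string-level sketch of that decision follows. The names ``toyAccess`` and
``collapseChildCall`` are hypothetical, and the toy takes pointer-ness as a flag,
whereas the real combinator works from the bound AST node.

.. code-block:: c++

   #include <string>

   // Hypothetical toy: builds a member access, choosing `->` for pointer-typed
   // objects and `.` otherwise.
   std::string toyAccess(const std::string &object, bool objectIsPointer,
                         const std::string &member) {
     return object + (objectIsPointer ? "->" : ".") + member;
   }

   // Replacement text for `object.child().member()` or
   // `object->child().member()`: drop the `child()` hop, keep the call.
   std::string collapseChildCall(const std::string &object, bool objectIsPointer,
                                 const std::string &member) {
     return toyAccess(object, objectIsPointer, member) + "()";
   }

So ``my_object`` yields ``my_object.foo()``, while the pointer ``my_ptr`` yields
``my_ptr->foo()``, the case the first version of the rule got wrong.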

Reference: ranges, stencils, edits, rules
-----------------------------------------

The above examples demonstrate just the basics of rewrite rules. Every element
we touched on has more available constructors: range selectors, stencils, edits
and rules. In this section, we'll briefly review each in turn, with references
to the source headers for up-to-date information. First, though, we clarify what
rewrite rules are actually rewriting.

Rewriting ASTs to... Text?
^^^^^^^^^^^^^^^^^^^^^^^^^^

The astute reader may have noticed that we've been somewhat vague in our
explanation of what the rewrite rules are actually rewriting. We've referred to
"code", but code can be represented both as raw source text and as an abstract
syntax tree. So, which one is it?

Ideally, we'd be rewriting the input AST to a new AST, but clang's AST is not
terribly amenable to this kind of transformation. So, we compromise: we express
our patterns and the names that they bind in terms of the AST, but our changes
in terms of source code text. We've designed Transformer's language to bridge
the gap between the two representations, in an attempt to minimize the user's
need to reason about source code locations and other low-level syntactic
details.

Range Selectors
^^^^^^^^^^^^^^^

Transformer provides a small API for describing source ranges: the
``RangeSelector`` combinators. These ranges are most commonly used to specify the
source code affected by an edit and to extract source code in constructing new
text.

Roughly, there are two kinds of range combinators: ones that select a source
range based on the AST, and others that combine existing ranges into new ranges.
For example, ``node`` selects the range of source spanned by a particular AST
node, as we've seen, while ``after`` selects the (empty) range located immediately
after its argument range. So, ``after(node("id"))`` is the empty range immediately
following the AST node bound to ``id``.
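
One way to picture these combinators is as functions from a match's bindings to
a range. The following standalone C++17 toy models ranges as half-open
``[begin, end)`` byte offsets; ``toyNode``, ``toyAfter`` and ``toyEnclose`` are
hypothetical stand-ins for an AST-based selector, an ``after``-style selector and
a range-combining selector, not the real ``RangeSelector`` implementations.

.. code-block:: c++

   #include <cstddef>
   #include <functional>
   #include <map>
   #include <string>

   // Hypothetical toy types: half-open [begin, end) offsets, and the ranges of
   // the nodes bound by one match, keyed by id.
   struct Range {
     std::size_t begin, end;
   };
   using Bindings = std::map<std::string, Range>;
   using ToySelector = std::function<Range(const Bindings &)>;

   // AST-based: selects the range of the node bound to `id` (throws if unbound).
   ToySelector toyNode(std::string id) {
     return [id](const Bindings &b) { return b.at(id); };
   }

   // Combinator: the empty range immediately after the argument's range.
   ToySelector toyAfter(ToySelector sel) {
     return [sel](const Bindings &b) {
       Range r = sel(b);
       return Range{r.end, r.end};
     };
   }

   // Combinator: from the start of `first` to the end of `last`.
   ToySelector toyEnclose(ToySelector first, ToySelector last) {
     return [first, last](const Bindings &b) {
       return Range{first(b).begin, last(b).end};
     };
   }

For source ``"f(a, b);"`` with the call bound at ``[0, 7)``,
``toyAfter(toyNode("call"))`` is the empty range at offset 7, the natural place
to insert text right after the call.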

For the full collection of ``RangeSelector``\ s, see the header,
`clang/Tooling/Transformer/RangeSelector.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RangeSelector.h>`_.

Stencils
^^^^^^^^

Transformer offers a large and growing collection of combinators for
constructing output. Above, we demonstrated ``cat``, the core function for
constructing stencils. It takes a series of arguments, of three possible kinds:

#. Raw text, to be copied directly to the output.
#. Selector: specified with a ``RangeSelector``, indicates a range of source text
   to copy to the output.
#. Builder: an operation that constructs a code snippet from its arguments. For
   example, the ``access`` function we saw above.

Data of these different types are all represented (generically) by a ``Stencil``.
``cat`` takes text and ``RangeSelector``\ s directly as arguments, rather than
requiring that they be constructed with a builder; other builders are
constructed explicitly.

In general, ``Stencil``\ s produce text from a match result. So, they are not
limited to generating source code, but can also be used to generate diagnostic
messages that reference (named) elements of the matched code, like we saw in the
example of rewriting method calls.
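
Because every kind of stencil ultimately produces text from a match result, one
can model all three kinds with a single function type. The sketch below is a
standalone C++17 toy, not the ``Stencil`` class from ``Stencil.h``: ``Match``,
``ToyStencil``, ``toyText``, ``toySelect``, ``toyParens`` (a hypothetical builder)
and ``toyCat`` are illustrative names only.

.. code-block:: c++

   #include <cstddef>
   #include <functional>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical toy match: the file's text plus the ranges of bound nodes.
   struct Range {
     std::size_t begin, end;  // half-open [begin, end)
   };
   struct Match {
     std::string source;
     std::map<std::string, Range> bindings;
   };

   // Toy stencil: anything that produces text from a match.
   using ToyStencil = std::function<std::string(const Match &)>;

   // Kind 1: raw text, copied as-is.
   ToyStencil toyText(std::string text) {
     return [text](const Match &) { return text; };
   }

   // Kind 2: a selection, copying the source of the node bound to `id`.
   ToyStencil toySelect(std::string id) {
     return [id](const Match &m) {
       const Range &r = m.bindings.at(id);
       return m.source.substr(r.begin, r.end - r.begin);
     };
   }

   // Kind 3: a builder that computes new code from another stencil; this
   // hypothetical one wraps its argument in parentheses.
   ToyStencil toyParens(ToyStencil inner) {
     return [inner](const Match &m) { return "(" + inner(m) + ")"; };
   }

   // Since all kinds share one type, concatenation treats them uniformly.
   ToyStencil toyCat(std::vector<ToyStencil> parts) {
     return [parts](const Match &m) {
       std::string out;
       for (const ToyStencil &part : parts) out += part(m);
       return out;
     };
   }

With ``"sum"`` bound to ``a + b`` in ``x = a + b;``, combining ``toyText("2 * ")``
with ``toyParens(toySelect("sum"))`` yields ``2 * (a + b)``; the same pieces can
just as well build a diagnostic message such as ``found a + b``.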

Further details of the ``Stencil`` type are documented in the header file
`clang/Tooling/Transformer/Stencil.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/Stencil.h>`_.

Edits
^^^^^

Transformer supports additional forms of edits. First, in a ``changeTo``, we can
specify the particular portion of code to be replaced, using the same
``RangeSelector`` we saw earlier. For example, we could change the function name
in a function declaration with:

.. code-block:: c++

   std::string f = "fun";
   makeRule(functionDecl(hasName("bad")).bind(f),
            changeTo(name(f), cat("good")),
            cat("bad is now good"));

We also provide simpler editing primitives for insertion and deletion:
``insertBefore``, ``insertAfter`` and ``remove``. These can all be found in the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.
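
Conceptually, all of these primitives reduce to "replace this range with that
text": an insertion replaces an empty range, and a removal replaces a range with
nothing. A standalone toy sketch of that reduction follows; ``ToyEdit``,
``toyChangeTo``, ``toyInsertBefore``, ``toyInsertAfter``, ``toyRemove`` and
``applyEdit`` are hypothetical names, and ranges are plain byte offsets rather
than clang source locations.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <utility>

   // Hypothetical toy types: a half-open [begin, end) range of offsets, and an
   // edit that replaces that range with new text.
   struct Range {
     std::size_t begin, end;
   };
   struct ToyEdit {
     Range range;
     std::string text;
   };

   ToyEdit toyChangeTo(Range r, std::string text) { return {r, std::move(text)}; }
   // Insertions replace the empty range at one end of `r`.
   ToyEdit toyInsertBefore(Range r, std::string text) {
     return {Range{r.begin, r.begin}, std::move(text)};
   }
   ToyEdit toyInsertAfter(Range r, std::string text) {
     return {Range{r.end, r.end}, std::move(text)};
   }
   // Removal replaces `r` with nothing.
   ToyEdit toyRemove(Range r) { return {r, ""}; }

   // Applies a single edit to a copy of `source`.
   std::string applyEdit(std::string source, const ToyEdit &edit) {
     assert(edit.range.begin <= edit.range.end && edit.range.end <= source.size());
     source.replace(edit.range.begin, edit.range.end - edit.range.begin, edit.text);
     return source;
   }

With source ``"void bad();"`` and the name at ``[5, 8)``, ``toyChangeTo`` with
``"good"`` yields ``"void good();"`` (compare ``changeTo(name(f), cat("good"))``
above), and ``toyInsertAfter`` with ``"_v2"`` yields ``"void bad_v2();"``.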

We are not limited to one edit per match found. Some situations require making
multiple edits for each match. For example, suppose we wanted to swap two
arguments of a function call.

For this, we provide an overload of ``makeRule`` that takes a list of edits,
rather than just a single one. Our example might look like:

.. code-block:: c++

   makeRule(callExpr(...),
            {changeTo(node(arg0), cat(node(arg2))),
             changeTo(node(arg2), cat(node(arg0)))},
            cat("swap the first and third arguments of the call"));
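
Swapping works because both edits are computed from the same match, so both
ranges refer to the original source. A standalone toy sketch of applying such a
batch (hypothetical names ``ToyEdit`` and ``applyEdits``; byte offsets instead of
clang source locations) sorts the edits and applies them right to left, so that
the offsets of the remaining edits stay valid:

.. code-block:: c++

   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <string>
   #include <vector>

   // Hypothetical toy types: a half-open [begin, end) range of offsets into the
   // original source, and an edit replacing that range with new text.
   struct Range {
     std::size_t begin, end;
   };
   struct ToyEdit {
     Range range;
     std::string text;
   };

   // Applies a batch of edits whose ranges all refer to the *original* source.
   // Working from the rightmost edit leftwards keeps the remaining offsets valid.
   std::string applyEdits(std::string source, std::vector<ToyEdit> edits) {
     std::sort(edits.begin(), edits.end(), [](const ToyEdit &a, const ToyEdit &b) {
       return a.range.begin > b.range.begin;
     });
     for (std::size_t i = 0; i + 1 < edits.size(); ++i)
       assert(edits[i + 1].range.end <= edits[i].range.begin && "edits overlap");
     for (const ToyEdit &e : edits)
       source.replace(e.range.begin, e.range.end - e.range.begin, e.text);
     return source;
   }

For ``"f(x, y, z)"``, replacing the first argument with the original text of the
third, and vice versa, gives ``"f(z, y, x)"``. Had the second replacement text
been read from the already-edited source, the "swap" would instead duplicate one
argument.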

``EditGenerator``\ s (Advanced)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The particular edits we've seen so far are all instances of the ``ASTEdit`` class,
or a list of such. But, not all edits can be expressed as ``ASTEdit``\ s. So, we
also support a very general signature for edit generators:

.. code-block:: c++

   using EditGenerator = MatchConsumer<llvm::SmallVector<Edit, 1>>;

That is, an ``EditGenerator`` is a function that maps a ``MatchResult`` to a set
of edits, or fails. This signature supports a very general form of computation
over match results. Transformer provides a number of functions for working with
``EditGenerator``\ s, most notably
`flatten <https://github.com/llvm/llvm-project/blob/1fabe6e51917bcd7a1242294069c682fe6dffa45/clang/include/clang/Tooling/Transformer/RewriteRule.h#L165-L167>`_,
which combines several ``EditGenerator``\ s into one, like list flattening. For the
full list, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.
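
To illustrate the shape of an ``EditGenerator`` and what flattening means, here
is a standalone C++17 toy. It substitutes ``std::optional`` for ``llvm::Expected``
and plain structs for ``MatchResult`` and ``Edit``; ``ToyMatch``, ``ToyEdit``,
``ToyEditGenerator`` and ``toyFlatten`` are hypothetical names, not the
declarations in ``RewriteRule.h``.

.. code-block:: c++

   #include <functional>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical stand-ins for a match result and an edit.
   struct ToyMatch {
     std::string callee;
   };
   struct ToyEdit {
     std::string description;
   };

   // Toy edit generator: maps a match to a list of edits, or fails (nullopt),
   // mirroring the shape of EditGenerator with optional in place of Expected.
   using ToyEditGenerator =
       std::function<std::optional<std::vector<ToyEdit>>(const ToyMatch &)>;

   // Flattening: runs each generator in order and concatenates their edits;
   // if any generator fails, the combined generator fails too.
   ToyEditGenerator toyFlatten(std::vector<ToyEditGenerator> generators) {
     return [generators](const ToyMatch &m) -> std::optional<std::vector<ToyEdit>> {
       std::vector<ToyEdit> all;
       for (const ToyEditGenerator &g : generators) {
         std::optional<std::vector<ToyEdit>> edits = g(m);
         if (!edits) return std::nullopt;
         all.insert(all.end(), edits->begin(), edits->end());
       }
       return all;
     };
   }

Combining generators that produce one, zero and one edits yields two edits, in
order; adding a generator that fails makes the whole combination fail.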

Rules
^^^^^

We can also compose multiple *rules*, rather than just edits within a rule,
using ``applyFirst``: it composes a list of rules as an ordered choice, where
Transformer applies the first rule whose pattern matches, ignoring others in the
list that follow. If the matchers are independent, then order doesn't matter. In
that case, ``applyFirst`` is simply joining the set of rules into one.

The benefit of ``applyFirst`` is that, for some problems, it allows the user to
more concisely formulate later rules in the list, since their patterns need not
explicitly exclude the earlier patterns of the list. For example, consider a set
of rules that rewrite compound statements, where one rule handles the case of an
empty compound statement and the other handles non-empty compound statements.
With ``applyFirst``, these rules can be expressed compactly as:

.. code-block:: c++

   applyFirst({
       makeRule(compoundStmt(statementCountIs(0)).bind("empty"), ...),
       makeRule(compoundStmt().bind("non-empty"), ...)
   })

The second rule does not need to explicitly specify that the compound statement
is non-empty -- it follows from the rule's position in ``applyFirst``. For more
complicated examples, this can lead to substantially more readable code.
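
The ordered-choice semantics can be modeled with a few lines of standalone
C++17. In this toy, a rule is a predicate plus a rewrite over a statement's text;
``ToyRule``, ``toyApplyFirst`` and ``compoundStmtRules`` are hypothetical names,
and the rewrites (adding a comment) are invented purely for illustration.

.. code-block:: c++

   #include <functional>
   #include <optional>
   #include <string>
   #include <vector>

   // Hypothetical toy rule: a predicate standing in for the pattern, and a
   // rewrite of the matched statement's text.
   struct ToyRule {
     std::function<bool(const std::string &)> matches;
     std::function<std::string(const std::string &)> rewrite;
   };

   // Ordered choice: the first rule whose pattern matches wins; the rules after
   // it are not consulted. Returns nullopt when no rule matches.
   std::optional<std::string> toyApplyFirst(const std::vector<ToyRule> &rules,
                                            const std::string &stmt) {
     for (const ToyRule &rule : rules)
       if (rule.matches(stmt)) return rule.rewrite(stmt);
     return std::nullopt;
   }

   // Mirrors the empty / non-empty example: the second predicate also accepts
   // "{}", but ordering ensures it only ever sees non-empty statements.
   std::vector<ToyRule> compoundStmtRules() {
     return {
         {[](const std::string &s) { return s == "{}"; },
          [](const std::string &) { return std::string("{ /* empty */ }"); }},
         {[](const std::string &s) { return !s.empty() && s.front() == '{'; },
          [](const std::string &s) { return "/* non-empty */ " + s; }},
     };
   }

Here ``"{}"`` is handled by the first rule even though the second rule's
predicate would accept it too, and text that no rule matches is left alone.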

Sometimes, a modification to the code might require the inclusion of a
particular header file. To this end, users can modify rules to specify include
directives with ``addInclude``.
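
An include request differs from the other edits: it concerns the file's include
directives rather than the matched code, and a rule may fire many times in one
file. The following deliberately naive, text-only toy (the function
``ensureInclude`` and the header ``size_utils.h`` are hypothetical, and this is
not how Transformer or clang-tidy actually place includes) shows why such a
request should be idempotent:

.. code-block:: c++

   #include <string>

   // Hypothetical toy: prepends `#include "header"` unless that exact directive
   // is already present, so repeated requests leave the file unchanged.
   std::string ensureInclude(const std::string &source, const std::string &header) {
     const std::string directive = "#include \"" + header + "\"\n";
     if (source.find(directive) != std::string::npos) return source;
     return directive + source;
   }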

For additional documentation on these functions, see the header file
`clang/Tooling/Transformer/RewriteRule.h <https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Transformer/RewriteRule.h>`_.

Using a RewriteRule as a clang-tidy check
-----------------------------------------

Transformer supports executing a rewrite rule as a
`clang-tidy <https://clang.llvm.org/extra/clang-tidy/>`_ check, with the class
``clang::tidy::utils::TransformerClangTidyCheck``. It is designed to require
minimal code in the definition. For example, given a rule
``MyCheckAsRewriteRule``, one can define a tidy check as follows:

.. code-block:: c++

   class MyCheck : public TransformerClangTidyCheck {
   public:
     MyCheck(StringRef Name, ClangTidyContext *Context)
         : TransformerClangTidyCheck(MyCheckAsRewriteRule, Name, Context) {}
   };

``TransformerClangTidyCheck`` implements the virtual ``registerMatchers`` and
``check`` methods based on your rule specification, so you don't need to implement
them yourself. If the rule needs to be configured based on the language options
and/or the clang-tidy configuration, it can be expressed as a function taking
these as parameters and (optionally) returning a ``RewriteRule``. This would be
useful, for example, for our function-renaming rule, which is parameterized by the
original name and the target. For details, see
`clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h <https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/utils/TransformerClangTidyCheck.h>`_.
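
The idea of a rule computed from configuration, possibly with no rule at all, can
be sketched without clang-tidy. In this standalone toy, options are a plain
string map and a "rule" is just the rename it performs; ``ToyOptions``,
``ToyRenameRule``, ``makeRenameRule`` and the option keys ``OldName`` and
``NewName`` are all hypothetical, and returning ``std::nullopt`` stands for
producing no rule when the configuration is incomplete.

.. code-block:: c++

   #include <map>
   #include <optional>
   #include <string>

   // Hypothetical toy stand-ins for check options and for a parameterized rule.
   using ToyOptions = std::map<std::string, std::string>;
   struct ToyRenameRule {
     std::string from, to;
   };

   // Builds the rule from configuration, or returns nullopt when either name
   // is missing, so that no rule is produced.
   std::optional<ToyRenameRule> makeRenameRule(const ToyOptions &opts) {
     auto from = opts.find("OldName");
     auto to = opts.find("NewName");
     if (from == opts.end() || to == opts.end()) return std::nullopt;
     return ToyRenameRule{from->second, to->second};
   }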

Related Reading
---------------

A good place to start understanding the clang AST and its matchers is with the
introductions on clang's site:

* :doc:`Introduction to the Clang AST <IntroductionToTheClangAST>`
* :doc:`Matching the Clang AST <LibASTMatchers>`
* `AST Matcher Reference <https://clang.llvm.org/docs/LibASTMatchersReference.html>`_

.. rubric:: Footnotes

.. [#f1] Technically, it binds it to the string "str", to which our
   variable ``s`` is bound. But, the choice of that id string is
   irrelevant, so we elide the difference.