<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Clang - Features and Goals</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<style type="text/css"></style>
<!--#include virtual="menu.html.incl"-->
<!--*************************************************************************-->
<h1>Clang - Features and Goals</h1>
<!--*************************************************************************-->
<p>This page describes the <a href="index.html#goals">features and goals</a> of
Clang in more detail and gives a broader explanation of what we mean.</p>
<p>End-User Features:</p>
<ul>
<li><a href="#performance">Fast compiles and low memory use</a></li>
<li><a href="#expressivediags">Expressive diagnostics</a></li>
<li><a href="#gcccompat">GCC compatibility</a></li>
</ul>
<p>Utility and Applications:</p>
<ul>
<li><a href="#libraryarch">Library based architecture</a></li>
<li><a href="#diverseclients">Support diverse clients</a></li>
<li><a href="#ideintegration">Integration with IDEs</a></li>
<li><a href="#license">Use the LLVM 'Apache 2' License</a></li>
</ul>
<p>Internal Design and Implementation:</p>
<ul>
<li><a href="#real">A real-world, production quality compiler</a></li>
<li><a href="#simplecode">A simple and hackable code base</a></li>
<li><a href="#unifiedparser">A single unified parser for C, Objective C, C++,
    and Objective C++</a></li>
<li><a href="#conformance">Conformance with C/C++/ObjC and their
    variants</a></li>
</ul>
<!--*************************************************************************-->
<h2><a name="enduser">End-User Features</a></h2>
<!--*************************************************************************-->
<!--=======================================================================-->
<h3><a name="performance">Fast compiles and Low Memory Use</a></h3>
<!--=======================================================================-->
<p>A major focus of our work on clang is to make it fast, light and scalable.
The library-based architecture of clang makes it straightforward to time and
profile the cost of each layer of the stack, and the driver has a number of
options for performance analysis. Many detailed benchmarks can be found online.</p>
<p>Compile time performance is important, but when using clang as an API,
memory use is often even more so: the less memory the code takes, the more code
you can fit into memory at a time (useful for whole program analysis tools, for
example).</p>
<p>In addition to being efficient when pitted head-to-head against GCC in batch
mode, clang is built with a <a href="#libraryarch">library based
architecture</a> that makes it relatively easy to adapt it and build new tools
with it. This means that it is often possible to apply out-of-the-box thinking
and novel techniques to improve compilation in various ways.</p>
<!--=======================================================================-->
<h3><a name="expressivediags">Expressive Diagnostics</a></h3>
<!--=======================================================================-->
<p>In addition to being fast and functional, we aim to make Clang extremely user
friendly. For a command-line compiler, this basically boils down to
making the diagnostics (error and warning messages) generated by the compiler
as useful as possible. There are several ways that we do this, but the
most important are pinpointing exactly what is wrong in the program,
highlighting related information so that it is easy to understand at a glance,
and making the wording as clear as possible.</p>
<p>Here is one simple example that illustrates the quality of Clang diagnostics:</p>
<pre>
$ <b>clang -fsyntax-only t.c</b>
t.c:7:39: error: invalid operands to binary expression ('int' and 'struct A')
<span style="color:darkgreen">  return y + func(y ? ((SomeA.X + 40) + SomeA) / 42 + SomeA.X : SomeA.X);</span>
<span style="color:blue">                       ~~~~~~~~~~~~~~ ^ ~~~~~</span>
</pre>
<p>Here you can see that you don't even need to see the original source code to
understand what is wrong based on the Clang error: because Clang prints a
caret, you know exactly <em>which</em> plus it is complaining about. The range
information highlights the left and right sides of the plus, which makes it
immediately obvious what the compiler is talking about. This is very useful for
cases involving precedence issues and many other situations.</p>
<p>Clang diagnostics are very polished and have many features. For more
information and examples, please see the <a href="diagnostics.html">Expressive
Diagnostics</a> page.</p>
<!--=======================================================================-->
<h3><a name="gcccompat">GCC Compatibility</a></h3>
<!--=======================================================================-->
<p>GCC is currently the de facto standard open source compiler, and it
routinely compiles a huge volume of code. GCC supports a huge number of
extensions and features (many of which are undocumented), and a lot of
code and header files depend on these features in order to build.</p>
<p>While it would be nice to be able to ignore these extensions and focus on
implementing the language standards to the letter, pragmatics force us to
support the GCC extensions that see the most use. Many users just want their
code to compile; they don't care to argue about whether it is pedantically C99.</p>
<p>All extensions are explicitly recognized as such and marked with extension
diagnostics, which can be mapped to warnings, errors, or just ignored.</p>
<!--*************************************************************************-->
<h2><a name="applications">Utility and Applications</a></h2>
<!--*************************************************************************-->
<!--=======================================================================-->
<h3><a name="libraryarch">Library Based Architecture</a></h3>
<!--=======================================================================-->
<p>A major design concept for clang is its use of a library-based
architecture. In this design, various parts of the front-end can be cleanly
divided into separate libraries which can then be mixed and matched for
different needs and uses. In addition, the library-based approach encourages
good interfaces and makes it easier for new developers to get involved (because
they only need to understand small pieces of the big picture).</p>
<blockquote><p>
"The world needs better compiler tools, tools which are built as libraries.
This design point allows reuse of the tools in new and novel ways. However,
building the tools as libraries isn't enough: they must have clean APIs, be as
decoupled from each other as possible, and be easy to modify/extend. This
requires clean layering, decent design, and keeping the libraries independent of
any specific client."</p></blockquote>
<p>Currently, clang is divided into the following libraries and tool:</p>
<ul>
<li><b>libsupport</b> - Basic support library, from LLVM.</li>
<li><b>libsystem</b> - System abstraction library, from LLVM.</li>
<li><b>libbasic</b> - Diagnostics, SourceLocations, SourceBuffer abstraction,
    file system caching for input source files.</li>
<li><b>libast</b> - Provides classes to represent the C AST, the C type system,
    builtin functions, and various helpers for analyzing and manipulating the
    AST (visitors, pretty printers, etc).</li>
<li><b>liblex</b> - Lexing and preprocessing, identifier hash table, pragma
    handling, tokens, and macro expansion.</li>
<li><b>libparse</b> - Parsing. This library invokes coarse-grained 'Actions'
    provided by the client (e.g. libsema builds ASTs) but knows nothing about
    ASTs or other client-specific data structures.</li>
<li><b>libsema</b> - Semantic Analysis. This provides a set of parser actions
    to build a standardized AST for programs.</li>
<li><b>libcodegen</b> - Lower the AST to LLVM IR for optimization &amp; code
    generation.</li>
<li><b>librewrite</b> - Editing of text buffers (important for code rewriting
    transformation, like refactoring).</li>
<li><b>libanalysis</b> - Static analysis support.</li>
<li><b>clang</b> - A driver program, client of the libraries at various
    levels.</li>
</ul>
<p>As an example of the power of this library based design: if you wanted to
build a preprocessor, you would take the Basic and Lexer libraries. If you want
an indexer, you would take the previous two and add the Parser library and
some actions for indexing. If you want a refactoring, static analysis, or
source-to-source compiler tool, you would then add the AST building and
semantic analyzer libraries.</p>
<p>For more information about the low-level implementation details of the
various clang libraries, please see the <a href="docs/InternalsManual.html">
clang Internals Manual</a>.</p>
<!--=======================================================================-->
<h3><a name="diverseclients">Support Diverse Clients</a></h3>
<!--=======================================================================-->
<p>Clang is designed and built with many grand plans for how we can use it. The
driving force is the fact that we use C and C++ daily, and have to suffer due to
a lack of good tools available for them. We believe that the C and C++ tools
ecosystem has been significantly limited by how difficult it is to parse and
represent the source code for these languages, and we aim to rectify this
problem in clang.</p>
<p>The problem with this goal is that different clients have very different
requirements. Consider code generation, for example: a simple front-end that
parses for code generation must analyze the code for validity and emit code
in some intermediate form to pass off to an optimizer or backend. Because
validity analysis and code generation can largely be done on the fly, there is
no hard requirement that the front-end actually build up a full AST for all
the expressions and statements in the code. TCC and GCC are examples of
compilers that either build no real AST (in the former case) or build a stripped
down and simplified AST (in the latter case) because they focus primarily on
code generation.</p>
<p>On the opposite side of the spectrum, some clients (like refactoring) want
highly detailed information about the original source code and want a complete
AST to describe it with. Refactoring wants to have information about macro
expansions, the location of every paren expression '(((x)))' vs 'x', full
position information, and much more. Further, refactoring wants to look
<em>across the whole program</em> to ensure that it is making transformations
that are safe. Making this efficient and getting this right requires a
significant amount of engineering and algorithmic work that is simply
unnecessary for a simple static compiler.</p>
<p>The beauty of the clang approach is that it does not restrict how you use it.
In particular, it is possible to use the clang preprocessor and parser to build
an extremely quick and light-weight on-the-fly code generator (similar to TCC)
that does not build an AST at all. As an intermediate step, clang supports
using the current AST generation and semantic analysis code and having a code
generation client free the AST for each function after code generation. Finally,
clang provides support for building and retaining fully-fledged ASTs, and even
supports writing them out to disk.</p>
<p>Designing the libraries with clean and simple APIs allows these high-level
policy decisions to be determined in the client, instead of forcing a "one true
way" in the implementation of any of these libraries. Getting this right is
hard, and we don't always get it right the first time, but we fix any problems
when we realize we made a mistake.</p>
<!--=======================================================================-->
<h3 id="ideintegration">Integration with IDEs</h3>
<!--=======================================================================-->
<p>We believe that Integrated Development Environments (IDEs) are a great way
to pull together various pieces of the development puzzle, and aim to make clang
work well in such an environment. The chief advantage of an IDE is that it
typically has visibility across your entire project and is a long-lived
process, whereas stand-alone compiler tools are typically invoked on each
individual file in the project, and thus have limited scope.</p>
<p>There are many implications of this difference, but a significant one has to
do with efficiency and caching: sharing an address space across different files
in a project means that you can use intelligent caching and other techniques to
dramatically reduce analysis/compilation time.</p>
<p>A further difference between IDEs and batch compilers is that they often
impose very different requirements on the front-end: IDEs depend on high
performance in order to provide a "snappy" experience, and thus really want
techniques like "incremental compilation", "fuzzy parsing", etc. Finally, IDEs
often have very different requirements than code generation, often requiring
information that a codegen-only frontend can throw away. Clang is
specifically designed and built to capture this information.</p>
<!--=======================================================================-->
<h3><a name="license">Use the LLVM 'Apache 2' License</a></h3>
<!--=======================================================================-->
<p>We actively intend for clang (and LLVM as a whole) to be used for
commercial projects, not only as a stand-alone compiler but also as a library
embedded inside a proprietary application. We feel that the license encourages
contributors to pick up the source and work with it, and believe that those
individuals and organizations will contribute back their work if they do not
want to have to maintain a fork forever (which is time consuming and expensive
when merges are involved). Further, nobody makes money on compilers these days,
but many people need them to get bigger goals accomplished: it makes sense for
everyone to work together.</p>
<p>For more information about the LLVM/clang license, please see the <a
href="https://llvm.org/docs/DeveloperPolicy.html#copyright-license-and-patents">LLVM License
Description</a>.</p>
<!--*************************************************************************-->
<h2><a name="design">Internal Design and Implementation</a></h2>
<!--*************************************************************************-->
<!--=======================================================================-->
<h3><a name="real">A real-world, production quality compiler</a></h3>
<!--=======================================================================-->
<p>Clang is designed and built by experienced compiler developers who are
increasingly frustrated with the problems that existing open source
compilers have. Clang is carefully and thoughtfully designed and
built to provide the foundation of a whole new generation of
C/C++/Objective C development tools, and we intend for it to be
production quality.</p>
<p>Being a production quality compiler means many things: it means being high
performance, being solid and (relatively) bug free, and it means eventually
being used and depended on by a broad range of people. While we are still in
the early development stages, we strongly believe that this will become a
<!--=======================================================================-->
<h3><a name="simplecode">A simple and hackable code base</a></h3>
<!--=======================================================================-->
<p>Our goal is to make it possible for anyone with a basic understanding
of compilers and working knowledge of the C/C++/ObjC languages to understand and
extend the clang source base. A large part of this falls out of our decision to
make the AST mirror the languages as closely as possible: you have your friendly
if statement, for statement, parenthesis expression, structs, unions, etc, all
represented in a simple and explicit way.</p>
<p>In addition to a simple design, we work to make the source base approachable
by commenting it well, including citations of the language standards where
appropriate, and designing the code for simplicity. Beyond that, clang offers
a set of AST dumpers, printers, and visualizers that make it easy to put code in
and see how it is represented.</p>
<!--=======================================================================-->
<h3><a name="unifiedparser">A single unified parser for C, Objective C, C++,
and Objective C++</a></h3>
<!--=======================================================================-->
<p>Clang is the "C Language Family Front-end", which means we intend to support
the most popular members of the C family. We are convinced that the right
parsing technology for this class of languages is a hand-built recursive-descent
parser. Because it is plain C++ code, recursive descent makes it very easy for
new developers to understand the code, easily supports the ad-hoc rules and other
strange hacks required by C/C++, and makes it straightforward to implement
excellent diagnostics and error recovery.</p>
<p>We believe that implementing C/C++/ObjC in a single unified parser makes the
end result easier to maintain and evolve than maintaining separate C and C++
parsers, which must be bug-fixed and maintained independently of each other.</p>
<!--=======================================================================-->
<h3><a name="conformance">Conformance with C/C++/ObjC and their
variants</a></h3>
<!--=======================================================================-->
<p>When you start work on implementing a language, you find out that there is a
huge gap between how the language works and how most people understand it to
work. This gap is the difference between a normal programmer and a (scary?
super-natural?) "language lawyer", who knows the ins and outs of the language
and can grok standardese with ease.</p>
<p>In practice, being conformant with the languages means that we aim to support
the full language, including the dark and dusty corners (like trigraphs,
preprocessor arcana, C99 VLAs, etc). Where we support extensions above and
beyond what the standard officially allows, we make an effort to explicitly call
this out in the code and emit warnings about it (which are disabled by default,
but can optionally be mapped to either warnings or errors), allowing you to use
clang in "strict" mode if you desire.</p>