============================
PNaCl C/C++ Language Support
============================

Source language support
=======================

The currently supported languages are C and C++. The PNaCl toolchain is
based on a recent version of Clang, which fully supports C++11 and most
of C11. A detailed status of the language support is available `here
<http://clang.llvm.org/cxx_status.html>`_.

For information on using languages other than C/C++, see the :ref:`FAQ
section on other languages <other_languages>`.

As for the standard libraries, the PNaCl toolchain is currently based on
the ``libc++`` C++ standard library and the ``newlib`` C standard
library. ``libstdc++`` is also supported but its use is discouraged; see
:ref:`building_cpp_libraries` for details.

Version information can be obtained as follows:

* Clang/LLVM: run ``pnacl-clang -v``.
* ``newlib``: use the ``_NEWLIB_VERSION`` macro.
* ``libc++``: use the ``_LIBCPP_VERSION`` macro.
* ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.

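The macros listed above can be queried from code. The following is a minimal sketch, not part of the toolchain: the function name ``library_versions`` is hypothetical, and it assumes that including a standard header is enough to make each library's version macro visible. Macros that are not defined are simply skipped.

```cpp
#include <cassert>
#include <cstdlib>   // assumption: a standard header exposes the library's macros
#include <sstream>
#include <string>

// Hypothetical helper: summarizes whichever version macros are defined.
std::string library_versions() {
  std::ostringstream out;
#if defined(_NEWLIB_VERSION)
  out << "newlib " << _NEWLIB_VERSION << ' ';
#endif
#if defined(_LIBCPP_VERSION)
  out << "libc++ " << _LIBCPP_VERSION << ' ';
#endif
#if defined(_GLIBCXX_VERSION)
  out << "libstdc++ " << _GLIBCXX_VERSION << ' ';
#endif
  const std::string summary = out.str();
  return summary.empty() ? "no known version macro defined" : summary;
}
```

Because every macro is guarded, the same source compiles with any toolchain and only reports what is actually available.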
Preprocessor definitions
------------------------

When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
macro. In addition, ``__native_client__`` is defined for compatibility
with other NaCl toolchains.

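As an illustration of how these two macros might be used to select code paths, here is a small sketch. The function name ``toolchain_family`` and the returned strings are hypothetical. ``__pnacl__`` is tested first because, per the text above, a PNaCl build defines both macros.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports which toolchain family built this file.
std::string toolchain_family() {
#if defined(__pnacl__)
  return "pnacl";   // PNaCl defines __pnacl__ and __native_client__
#elif defined(__native_client__)
  return "nacl";    // other NaCl toolchains define only __native_client__
#else
  return "other";   // neither macro is defined
#endif
}
```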
.. _memory_model_and_atomics:

Memory Model and Atomics
========================

Memory Model for Concurrent Operations
--------------------------------------

The memory model offered by PNaCl relies on the same coding guidelines
as the C11/C++11 one: concurrent accesses must always occur through
atomic primitives (offered by `atomic intrinsics
<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
occur with the same size for the same memory location. Visibility of
stores is provided on a happens-before basis that relates memory
locations to each other as the C11/C++11 standards do.

Non-atomic memory accesses may be reordered, separated, elided or fused
according to C and C++'s memory model before the pexe is created as well
as after its creation. Accessing an atomic memory location through
non-atomic primitives is :ref:`Undefined Behavior <undefined_behavior>`.

As in C11/C++11, some atomic accesses may be implemented with locks on
certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros are always ``1``,
signifying that all types are sometimes lock-free. The ``is_lock_free``
methods and ``atomic_is_lock_free`` report whether the current
platform's implementation is lock-free, as determined at translation
time. These macros, methods and functions are in the C11 header
``<stdatomic.h>`` and the C++11 header ``<atomic>``.

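The difference between the compile-time macro and the run-time query can be sketched as follows. The names ``kIntLockFreeMacro`` and ``int_is_lock_free_here`` are hypothetical. On a PNaCl build the macro is expected to be ``1`` as described above; other toolchains may report ``0`` or ``2``.

```cpp
#include <atomic>
#include <cassert>

// Compile-time answer: 0 = never, 1 = sometimes, 2 = always lock-free.
constexpr int kIntLockFreeMacro = ATOMIC_INT_LOCK_FREE;

// Run-time answer for the platform the code actually executes on.
bool int_is_lock_free_here() {
  std::atomic<int> probe(0);
  return probe.is_lock_free();
}
```

Portable code that needs a definite answer should rely on the run-time query, since a macro value of ``1`` only says that the answer depends on the platform.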
The PNaCl toolchain supports concurrent memory accesses through legacy
GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
primitives and the underlying `GCCMM
<http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
primitives. ``volatile`` memory accesses can also be used, though these
are discouraged. See `Volatile Memory Accesses`_.

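For comparison, the three families of primitives named above can express the same fetch-and-add. This is only a sketch; the counter and function names are hypothetical. Each counter is accessed through a single family, in keeping with the rule that an atomic location is always accessed through atomic primitives of the same size.

```cpp
#include <atomic>
#include <cassert>

int legacy_counter = 0;           // only touched through __sync_* builtins
int gccmm_counter = 0;            // only touched through __atomic_* builtins
std::atomic<int> cxx_counter(0);  // C++11 atomic object

// Each function adds one and returns the value held before the addition.
int bump_sync() { return __sync_fetch_and_add(&legacy_counter, 1); }
int bump_gccmm() {
  return __atomic_fetch_add(&gccmm_counter, 1, __ATOMIC_SEQ_CST);
}
int bump_cxx11() {
  return cxx_counter.fetch_add(1, std::memory_order_seq_cst);
}
```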
PNaCl supports concurrency and parallelism with some restrictions:

* Threading is explicitly supported and has no restrictions beyond
  those of prevalent implementations. See `Threading`_.

* ``volatile`` and atomic operations are address-free (operations on the
  same memory location via two different addresses work atomically), as
  intended by the C11/C++11 standards. This is critical in supporting
  synchronous "external modifications" such as mapping underlying memory
  at multiple locations.

* Inter-process communication through shared memory is currently not
  supported. See `Future Directions`_.

* Signal handling isn't supported, so PNaCl promotes all primitives to
  cross-thread (instead of single-thread). This may change at a later
  date. Note that using atomic operations which aren't lock-free may
  lead to deadlocks when handling asynchronous signals. See `Future
  Directions`_.

* Direct interaction with device memory isn't supported, and there is no
  intent to support it. The embedding sandbox's runtime can offer APIs
  to indirectly access devices.

Setting up the above mechanisms requires assistance from the embedding
sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up
can be done through regular C/C++ code.

Atomic Memory Ordering Constraints
----------------------------------

Atomics follow the same ordering constraints as in regular C11/C++11,
but all accesses are promoted to sequential consistency (the strongest
memory ordering) at pexe creation time. We plan to support more of the
C11/C++11 memory orderings in the future.

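To make the promotion concrete, consider code that asks for weaker orderings. The sketch below uses release and acquire; the names ``payload``, ``ready``, ``publish`` and ``try_consume`` are hypothetical. Per the paragraph above, a pexe would treat both accesses as sequentially consistent, which is at least as strong as what the code requests, so the code stays correct.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // ordinary data, published via `ready`
std::atomic<bool> ready(false);  // flag that guards `payload`

// Writer: fill in the payload, then publish it with a release store.
void publish(int value) {
  payload = value;
  ready.store(true, std::memory_order_release);
}

// Reader: an acquire load that sees `true` also sees the payload write.
bool try_consume(int* out) {
  if (!ready.load(std::memory_order_acquire)) return false;
  *out = payload;
  return true;
}
```

In a real program ``publish`` and ``try_consume`` would run on different threads; they are shown as plain functions to keep the sketch short.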
Some additional restrictions, following the C11/C++11 standards:

- Atomic accesses must at least be naturally aligned.
- Some accesses may not actually be atomic on certain platforms,
  requiring an implementation that uses global locks.
- An atomic memory location must always be accessed with atomic
  primitives, and these primitives must always be of the same bit size
  for that location.
- Not all memory orderings are valid for all atomic operations.

Volatile Memory Accesses
------------------------

The C11/C++11 standards mandate that ``volatile`` accesses execute in
program order (but are not fences, so other memory operations can
reorder around them), are not necessarily atomic, and can’t be
elided. They can be separated into smaller width accesses.

Before any optimizations occur, the PNaCl toolchain transforms
``volatile`` loads and stores into sequentially consistent ``volatile``
atomic loads and stores, and applies regular compiler optimizations
along the above guidelines. This orders ``volatile`` accesses according
to the atomic rules, and means that fences (including
``__sync_synchronize``) act in a better-defined manner. Regular memory
accesses still do not have ordering guarantees with ``volatile`` and
atomic accesses, though the internal representation of
``__sync_synchronize`` attempts to prevent reordering of memory accesses
to objects which may escape.

Relaxed ordering could be used instead, but for the first release it is
more conservative to apply sequential consistency. Future releases may
change what happens at compile-time, but already-released pexes will
continue using sequential consistency.

The PNaCl toolchain also requires that ``volatile`` accesses be at least
naturally aligned, and tries to guarantee this alignment.

The above guarantees ease the support of legacy (i.e. non-C11/C++11)
code, and combined with builtin fences these programs can do meaningful
cross-thread communication without changing code. They also better
reflect the original code's intent and guarantee better portability.

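The kind of legacy code these guarantees are meant to accommodate might look like the sketch below, which pairs ``volatile`` variables with the ``__sync_synchronize`` builtin. All names are hypothetical. The pattern relies on the PNaCl-specific treatment of ``volatile`` described above; standard C++ gives no cross-thread guarantee for it, so new code should prefer atomics.

```cpp
#include <cassert>

volatile int legacy_data = 0;  // value handed from one thread to another
volatile int legacy_flag = 0;  // set to 1 once legacy_data is valid

// Sender side of the legacy handshake.
void legacy_send(int value) {
  legacy_data = value;
  __sync_synchronize();        // full fence between the two volatile stores
  legacy_flag = 1;
}

// Receiver side: returns false until the sender has raised the flag.
bool legacy_receive(int* out) {
  if (legacy_flag == 0) return false;
  __sync_synchronize();        // full fence between the two volatile loads
  *out = legacy_data;
  return true;
}
```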
.. _language_support_threading:

Threading
=========

Threading is explicitly supported through C11/C++11's threading
libraries as well as POSIX threads.

Communication between threads should use atomic primitives as described
in `Memory Model and Atomics`_.

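A minimal sketch of threads communicating through an atomic, using the C++11 threading library. The function name ``count_with_threads`` and its parameters are hypothetical, and the host toolchain may need its usual threading flag (for example ``-pthread``) to build it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `num_threads` workers that each add to one shared atomic counter,
// waits for all of them, and returns the final count.
int count_with_threads(int num_threads, int increments_per_thread) {
  std::atomic<int> total(0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&total, increments_per_thread] {
      for (int i = 0; i < increments_per_thread; ++i)
        total.fetch_add(1);  // atomic read-modify-write, no lost updates
    });
  }
  for (std::thread& worker : workers) worker.join();
  return total.load();
}
```

With a plain ``int`` in place of ``std::atomic<int>`` the increments would race, which is undefined behavior.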
``setjmp`` and ``longjmp``
==========================

PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
restrictions beyond C's.

C++ Exception Handling
======================

PNaCl currently supports C++ exception handling through ``setjmp()`` and
``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj``
linker flag. Exceptions are disabled by default so that faster and
smaller code is generated, and ``throw`` statements are replaced with
calls to ``abort()``. The usual ``-fno-exceptions`` flag is also
supported. PNaCl will support full zero-cost exception handling in the
future.

NaCl supports full zero-cost C++ exception handling.

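The following sketch shows ordinary exception-based code that depends on this setting. The function names are hypothetical. Built for PNaCl without ``--pnacl-exceptions=sjlj``, the ``throw`` would become a call to ``abort()`` as described above, so the ``catch`` block would never run.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Throws when the input is not strictly positive.
int parse_positive(int value) {
  if (value <= 0) throw std::invalid_argument("value must be positive");
  return value;
}

// Converts the exception into an error string for the caller.
std::string describe(int value) {
  try {
    return "ok: " + std::to_string(parse_positive(value));
  } catch (const std::invalid_argument& e) {
    return std::string("error: ") + e.what();
  }
}
```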
Inline assembly isn't supported by PNaCl because it isn't portable. The
one current exception is the common compiler barrier idiom
``asm("":::"memory")``, which gets transformed to a sequentially
consistent memory barrier (equivalent to ``__sync_synchronize()``). In
PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
memory accesses, though in practice the implementation attempts to also
prevent reordering of memory accesses to objects which may escape.

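The barrier idiom could appear in code like the sketch below, where it separates two ``volatile`` accesses. The variable and function names are hypothetical.

```cpp
#include <cassert>

volatile int step_a = 0;
volatile int step_b = 0;

// Writes step_a, then step_b, with the barrier idiom between the two.
int ordered_steps() {
  step_a = 1;
  asm volatile("" ::: "memory");  // empty asm with a "memory" clobber
  step_b = step_a + 1;
  return step_b;
}
```

On other toolchains this statement is only a compiler barrier; the stronger sequentially consistent behavior is what the text above describes for PNaCl.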
NaCl supports a fairly wide subset of inline assembly through GCC's
inline assembly syntax, with the restriction that the sandboxing model
for the target architecture has to be respected.

.. _portable_simd_vectors:

Portable SIMD Vectors
=====================

SIMD vectors aren't part of the C/C++ standards and are traditionally
very hardware-specific. Portable Native Client offers a portable version
of SIMD vector datatypes and operations which map well to modern
architectures and offer performance which matches or approaches
hardware-specific uses.

SIMD vector support was added to Portable Native Client for version 36
of Chrome, and more features are expected to be added in subsequent
releases.

Hand-Coding Vector Extensions
-----------------------------

The initial vector support in Portable Native Client adds `LLVM vectors
<http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
and `GCC vectors
<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
are well supported by different hardware platforms and don't require any
new compiler intrinsics.

Vector types can be used through the ``vector_size`` attribute::

  typedef int v4s __attribute__((vector_size(16)));
  v4s b = {1,2,3,4};  /* illustrative input, not from the original text */
  v4s c, d;
  c = b + 1;  /* c = b + {1,1,1,1}; */
  d = 2 * b;  /* d = {2,2,2,2} * b; */

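Vector comparisons produce a mask vector: each result element is as wide
as the compared elements and is either all ``0`` bits or all ``1``
bits::

  typedef int v4s __attribute__((vector_size(16)));
  v4s in = {16,64,192,512};  /* illustrative input, not from the original text */
  v4s limit = {32,64,128,256};
  v4s mask = in > limit;  /* mask = {0,0,-1,-1} */
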
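A typical use of such a mask is to select lanes with a bitwise AND. The following sketch wraps that in a function; the name ``keep_above`` is hypothetical, and the code relies on the GCC/Clang vector extensions discussed above.

```cpp
#include <cassert>

typedef int v4s __attribute__((vector_size(16)));

// Keeps the lanes of `in` that exceed the matching lane of `limit`
// and zeroes the other lanes.
v4s keep_above(v4s in, v4s limit) {
  v4s mask = in > limit;  // all-ones where true, all-zeros where false
  return in & mask;       // all-ones keeps the lane, all-zeros clears it
}
```

For example, with ``in = {16,100,128,300}`` and ``limit = {32,64,128,256}`` the result is ``{0,100,0,300}``.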
Vector datatypes are currently expected to be 128-bit wide with one of
the following element types:

============ ============ ================
Type         Num Elements Vector Bit Width
============ ============ ================
============ ============ ================

64-bit integers and double-precision floating point will be supported in
a future release, as will 256-bit and 512-bit vectors.

The following operators are supported on vectors:

+----------------------------------------------+
| unary ``+``, ``-``                           |
+----------------------------------------------+
| ``+``, ``-``, ``*``, ``/``, ``%``            |
+----------------------------------------------+
| ``&``, ``|``, ``^``, ``~``                   |
+----------------------------------------------+
| ``!``, ``&&``, ``||``                        |
+----------------------------------------------+
| ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
+----------------------------------------------+

C-style casts can be used to convert one vector type to another without
modifying the underlying bits. ``__builtin_convertvector`` can be used
to convert from one type to another provided both types have the same
number of elements, truncating when converting from floating-point to
integer.

::

  typedef unsigned v4u __attribute__((vector_size(16)));
  typedef float v4f __attribute__((vector_size(16)));
  v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
  v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23} */
  v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,0} */

It is also possible to use array-style indexing into vectors to extract
individual elements using ``[]``.

::

  #include <cstddef>
  #include <iostream>

  typedef unsigned v4u __attribute__((vector_size(16)));

  /* Prints every element of a vector, separated by spaces. */
  template <typename T>
  void print(const T v) {
    for (size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
      std::cout << v[i] << ' ';
    std::cout << std::endl;
  }

Vector shuffles are currently unsupported but will be added soon.

Auto-vectorization is currently not enabled for Portable Native Client,
but will be in a future release.

The C and C++ languages expose some undefined behavior which is
discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

PNaCl exposes 32-bit and 64-bit floating point operations which are
mostly IEEE-754 compliant. There are a few caveats:

* Some :ref:`floating-point behavior is currently left as undefined
  <undefined_behavior_fp>`.
* The default rounding mode is round-to-nearest and other rounding modes
  are currently not usable, which isn't IEEE-754 compliant. PNaCl could
  support switching modes (the 4 modes exposed by C99 ``FLT_ROUNDS``
  macros).
* Signaling ``NaN`` values never fault.
* Fast-math optimizations are currently supported before *pexe* creation
  time. A *pexe* loses all fast-math information when it is
  created. Fast-math translation could be enabled at a later date,
  potentially at a per-function granularity. This wouldn't affect
  already-existing *pexe* files; it would be an opt-in feature.
* Fused multiply-add operations have higher precision and often execute
  faster; PNaCl currently disallows them in the *pexe* because they
  aren't supported on all platforms and can't realistically be
  emulated. PNaCl could (but currently doesn't) only generate them in
  the backend if fast-math were specified and the hardware supports
  it.
* Transcendentals aren't exposed by PNaCl's ABI; they are part of the
  math library that is included in the *pexe*. PNaCl could, but
  currently doesn't, use hardware support if fast-math were provided.

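Two of the caveats above can be observed from code. The sketch below checks the current rounding mode and contrasts a separately rounded multiply-then-add with ``std::fma``, which rounds only once. The function names are hypothetical, and the comparison assumes IEEE-754 double-precision arithmetic on the machine running it.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// True when the floating-point environment rounds to nearest.
bool rounds_to_nearest() { return std::fegetround() == FE_TONEAREST; }

// Multiply, round the product to double, then add (two roundings).
double mul_add_unfused(double a, double b, double c) {
  volatile double product = a * b;  // volatile forces the product to round
  return product + c;
}

// Fused multiply-add: the exact product is added, then rounded once.
double mul_add_fused(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

With ``a = 1 + 2^-27``, ``b = 1 - 2^-27`` and ``c = -1``, the exact product is ``1 - 2^-54``, which rounds to ``1`` as a double. The unfused version therefore returns ``0`` while the fused version returns ``-2^-54``, which is why allowing fused operations on only some platforms would change results.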
PNaCl supports computed ``goto``, a non-standard GCC extension to C used
by some interpreters, by lowering it to ``switch`` statements. The
resulting use of ``switch`` might not be as fast as the original
indirect branches. If you are compiling a program that has a
compile-time option for using computed ``goto``, it's possible that the
program will run faster with the option turned off (e.g., if the program
does extra work to take advantage of computed ``goto``).

NaCl supports computed ``goto`` without any transformation.

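For readers unfamiliar with the extension, here is a sketch of the interpreter pattern the text refers to: a table of label addresses and an indirect ``goto`` after each instruction. The opcode set and the function name ``run`` are hypothetical, and the code needs a compiler that implements the GCC labels-as-values extension.

```cpp
#include <cassert>
#include <cstddef>

enum Op { OP_INC = 0, OP_DOUBLE = 1, OP_HALT = 2 };

// Runs a tiny bytecode program on one accumulator. The program must
// contain only valid opcodes and end with OP_HALT (no bounds checks).
int run(const unsigned char* program) {
  // Label addresses, indexed by opcode (GCC "labels as values").
  static void* const dispatch[] = { &&do_inc, &&do_double, &&do_halt };
  int acc = 0;
  std::size_t pc = 0;
  goto *dispatch[program[pc++]];
do_inc:
  acc += 1;
  goto *dispatch[program[pc++]];
do_double:
  acc *= 2;
  goto *dispatch[program[pc++]];
do_halt:
  return acc;
}
```

A portable alternative is a ``for`` loop around a ``switch`` on the opcode, which is close to what the lowering described above produces.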
Future Directions
=================

Inter-Process Communication
---------------------------

Inter-process communication through shared memory is currently not
supported by PNaCl/NaCl. When implemented, it may be limited to
operations which are lock-free on the current platform (``is_lock_free``
methods). It will rely on the address-free property discussed in `Memory
Model for Concurrent Operations`_.

POSIX-style Signal Handling
---------------------------

POSIX-style signal handling really consists of two different features:

* **Hardware exception handling** (synchronous signals): The ability
  to catch hardware exceptions (such as memory access faults and
  division by zero) using a signal handler.

  PNaCl currently doesn't support hardware exception handling.

  NaCl supports hardware exception handling via the
  ``<nacl/nacl_exception.h>`` interface.

* **Asynchronous interruption of threads** (asynchronous signals): The
  ability to asynchronously interrupt the execution of a thread,
  forcing the thread to run a signal handler.

  A similar feature is **thread suspension**: The ability to
  asynchronously suspend and resume a thread and inspect or modify its
  execution state (such as register state).

  Neither PNaCl nor NaCl currently supports asynchronous interruption
  or suspension of threads.

If PNaCl were to support either of these, the interaction of
``volatile`` and atomics with same-thread signal handling would need
to be carefully detailed.