============================
PNaCl C/C++ Language Support
============================

.. contents::
   :local:
   :backlinks: none
   :depth: 3

Source language support
=======================

The currently supported languages are C and C++. The PNaCl toolchain is
based on recent Clang, which fully supports C++11 and most of C11. A
detailed status of the language support is available `here
<http://clang.llvm.org/cxx_status.html>`_.

For information on using languages other than C/C++, see the :ref:`FAQ
section on other languages <other_languages>`.

As for the standard libraries, the PNaCl toolchain currently uses
``libc++`` as its C++ standard library and ``newlib`` as its C standard
library. ``libstdc++`` is also supported, but its use is discouraged; see
:ref:`building_cpp_libraries` for more details.

Versions
--------

Version information can be obtained as follows:

* Clang/LLVM: run ``pnacl-clang -v``.
* ``newlib``: use the ``_NEWLIB_VERSION`` macro.
* ``libc++``: use the ``_LIBCPP_VERSION`` macro.
* ``libstdc++``: use the ``_GLIBCXX_VERSION`` macro.
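
As a sketch, the version macros can be read at compile time. The function name ``toolchain_versions`` is hypothetical. Each macro is guarded with ``#ifdef`` so the code also builds with toolchains that don't define it. ``__clang_version__`` is a general Clang macro, not a PNaCl-specific one. The code also assumes that ``<cstdio>`` pulls in ``_NEWLIB_VERSION`` (through ``<newlib.h>``); that holds for newlib but is not guaranteed elsewhere. The ``libstdc++`` case is left out.

.. naclcode::

  #include <cstdio>  // on newlib, indirectly includes <newlib.h> (_NEWLIB_VERSION)
  #include <string>  // any libc++ header defines _LIBCPP_VERSION

  // Hypothetical helper: summarizes the compiler and library versions
  // visible at compile time. Every macro is checked with #ifdef because
  // only some toolchains define it.
  std::string toolchain_versions() {
    std::string summary;
  #ifdef __clang_version__
    summary += "clang " __clang_version__ "; ";
  #endif
  #ifdef _NEWLIB_VERSION
    summary += "newlib " _NEWLIB_VERSION "; ";  // a string literal
  #endif
  #ifdef _LIBCPP_VERSION
    summary += "libc++ " + std::to_string(_LIBCPP_VERSION) + "; ";  // an integer
  #endif
    if (summary.empty()) summary = "unknown";
    return summary;
  }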

Preprocessor definitions
------------------------

When compiling C/C++ code, the PNaCl toolchain defines the ``__pnacl__``
macro. In addition, ``__native_client__`` is defined for compatibility
with other NaCl toolchains.
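
This minimal sketch shows how code can branch on these macros. The name ``nacl_toolchain_kind`` is hypothetical. PNaCl defines both macros, so ``__pnacl__`` has to be tested first.

.. naclcode::

  // Hypothetical helper: reports which kind of toolchain compiled this file.
  const char *nacl_toolchain_kind() {
  #if defined(__pnacl__)
    return "PNaCl";  // PNaCl also defines __native_client__, so test this first
  #elif defined(__native_client__)
    return "NaCl";   // a non-PNaCl Native Client toolchain
  #else
    return "other";  // not a Native Client build
  #endif
  }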

.. _memory_model_and_atomics:

Memory Model and Atomics
========================

Memory Model for Concurrent Operations
--------------------------------------

The memory model offered by PNaCl relies on the same coding guidelines
as the C11/C++11 one: concurrent accesses must always occur through
atomic primitives (offered by :ref:`atomic intrinsics
<bitcode_atomicintrinsics>`), and these accesses must always
occur with the same size for the same memory location. Visibility of
stores is provided on a happens-before basis that relates memory
locations to each other as the C11/C++11 standards do.

Non-atomic memory accesses may be reordered, separated, elided or fused
according to C and C++'s memory model before the pexe is created as well
as after its creation. Accessing an atomic memory location through
non-atomic primitives is :ref:`Undefined Behavior <undefined_behavior>`.
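
The guidelines above can be illustrated with a minimal C++11 sketch. The function name ``handoff`` is hypothetical. A producer thread publishes a plain ``int`` through a ``std::atomic<bool>`` flag, and the flag is only ever accessed atomically and always with the same size. The payload is read only after the flag has been observed, so the write happens-before the read. On PNaCl, the release and acquire orderings written here are promoted to sequential consistency when the pexe is created (see `Atomic Memory Ordering Constraints`_).

.. naclcode::

  #include <atomic>
  #include <thread>

  // Hypothetical example: hands `value` from a producer thread to the
  // calling thread and returns what the caller observed.
  int handoff(int value) {
    int payload = 0;                 // plain data; accesses are ordered by `ready`
    std::atomic<bool> ready(false);  // only ever accessed atomically

    std::thread producer([&] {
      payload = value;                               // 1. write the data
      ready.store(true, std::memory_order_release);  // 2. publish it
    });

    // The acquire load synchronizes with the release store, so once it
    // returns true the write to payload is visible to this thread.
    while (!ready.load(std::memory_order_acquire)) {
    }
    int observed = payload;
    producer.join();
    return observed;
  }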

As in C11/C++11, some atomic accesses may be implemented with locks on
certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
``1``, signifying that all types are sometimes lock-free. The
``is_lock_free`` methods and ``atomic_is_lock_free`` report whether
accesses are lock-free on the current platform, as determined at
translation time. These macros, methods and functions are in the C11
header ``<stdatomic.h>`` and the C++11 header ``<atomic>``.
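
For example, a program can read the compile-time macro and also make the run-time query. This is a sketch with hypothetical names. Under PNaCl the macro is always ``1``. A native compiler for a common 64-bit platform typically reports ``2`` (always lock-free) for ``int``.

.. naclcode::

  #include <atomic>

  // 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
  // PNaCl always defines these macros as 1.
  constexpr int kIntLockFreeMacro = ATOMIC_INT_LOCK_FREE;

  // Hypothetical helper: asks the implementation whether atomic int
  // operations are lock-free on the platform the code is running on.
  bool int_atomics_are_lock_free() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
  }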

The PNaCl toolchain supports concurrent memory accesses through legacy
GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
primitives and the underlying `GCCMM
<http://gcc.gnu.org/wiki/Atomic/GCCMM>`_ ``__atomic_*``
primitives. ``volatile`` memory accesses can also be used, though these
are discouraged. See `Volatile Memory Accesses`_.
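
The following sketch increments three counters, one with each family of primitives listed above. The names are hypothetical, and C++11 threads are assumed to be available. Each counter is only ever touched atomically, including the final reads, because reading an atomic location with a plain load would be undefined behavior.

.. naclcode::

  #include <atomic>
  #include <thread>
  #include <vector>

  // Hypothetical example: increments three counters from several threads,
  // one per family of atomic primitives, and checks that none lost updates.
  bool all_counters_agree(int threads, int iterations) {
    long sync_count = 0;             // legacy __sync_* builtins
    long gccmm_count = 0;            // GCCMM __atomic_* builtins
    std::atomic<long> cxx_count(0);  // C++11 std::atomic

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
      workers.emplace_back([&] {
        for (int i = 0; i < iterations; ++i) {
          __sync_fetch_and_add(&sync_count, 1);
          __atomic_fetch_add(&gccmm_count, 1, __ATOMIC_SEQ_CST);
          cxx_count.fetch_add(1);  // defaults to memory_order_seq_cst
        }
      });
    }
    for (std::thread &w : workers) w.join();

    // Read back atomically too: these are atomic memory locations.
    long expected = static_cast<long>(threads) * iterations;
    return __sync_fetch_and_add(&sync_count, 0) == expected &&
           __atomic_load_n(&gccmm_count, __ATOMIC_SEQ_CST) == expected &&
           cxx_count.load() == expected;
  }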

PNaCl supports concurrency and parallelism with some restrictions:

* Threading is explicitly supported and has no restrictions over what
  prevalent implementations offer. See `Threading`_.

* ``volatile`` and atomic operations are address-free (operations on the
  same memory location via two different addresses work atomically), as
  intended by the C11/C++11 standards. This is critical in supporting
  synchronous "external modifications" such as mapping underlying memory
  at multiple locations.

* Inter-process communication through shared memory is currently not
  supported. See `Future Directions`_.

* Signal handling isn't supported; PNaCl therefore promotes all
  primitives to cross-thread (instead of single-thread). This may change
  at a later date. Note that using atomic operations which aren't
  lock-free may lead to deadlocks when handling asynchronous
  signals. See `Future Directions`_.

* Direct interaction with device memory isn't supported, and there is no
  intent to support it. The embedding sandbox's runtime can offer APIs
  to indirectly access devices.

Setting up the above mechanisms requires assistance from the embedding
sandbox's runtime (e.g. NaCl's Pepper APIs), but once set up they can be
used through regular C/C++ code.

Atomic Memory Ordering Constraints
----------------------------------

Atomics follow the same ordering constraints as in regular C11/C++11,
but all accesses are promoted to sequential consistency (the strongest
memory ordering) at pexe creation time. We plan to support more of the
C11/C++11 memory orderings in the future.

Some additional restrictions, following the C11/C++11 standards:

- Atomic accesses must at least be naturally aligned.
- Some accesses may not actually be atomic on certain platforms,
  requiring an implementation that uses global locks.
- An atomic memory location must always be accessed with atomic
  primitives, and these primitives must always be of the same bit size
  for that location.
- Not all memory orderings are valid for all atomic operations.
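
The last restriction can be illustrated with a minimal test-and-set spinlock. This is a sketch, and the ``SpinLock`` class is hypothetical. ``exchange`` is a read-modify-write operation and may use acquire ordering, whereas a plain ``store`` may not use ``memory_order_acquire`` or ``memory_order_acq_rel``. The orderings are written as a native C++11 compiler needs them; PNaCl promotes both to sequential consistency when the pexe is created.

.. naclcode::

  #include <atomic>

  // Hypothetical minimal spinlock built on a single atomic flag.
  class SpinLock {
   public:
    void lock() {
      // exchange is a read-modify-write, so acquire ordering is valid.
      while (locked_.exchange(true, std::memory_order_acquire)) {
        // Spin until the previous owner stores false.
      }
    }
    void unlock() {
      // A store may use relaxed, release or seq_cst ordering, but not
      // acquire or acq_rel.
      locked_.store(false, std::memory_order_release);
    }

   private:
    std::atomic<bool> locked_{false};  // always accessed atomically, same size
  };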

Volatile Memory Accesses
------------------------

The C11/C++11 standards mandate that ``volatile`` accesses execute in
program order (but are not fences, so other memory operations can
reorder around them), are not necessarily atomic, and can't be
elided. They can be separated into smaller-width accesses.

Before any optimizations occur, the PNaCl toolchain transforms
``volatile`` loads and stores into sequentially consistent ``volatile``
atomic loads and stores, and applies regular compiler optimizations
along the above guidelines. This orders ``volatile`` accesses according
to the atomic rules, and means that fences (including
``__sync_synchronize``) act in a better-defined manner. Regular memory
accesses still do not have ordering guarantees with ``volatile`` and
atomic accesses, though the internal representation of
``__sync_synchronize`` attempts to prevent reordering of memory accesses
to objects which may escape.

Relaxed ordering could be used instead, but for the first release it is
more conservative to apply sequential consistency. Future releases may
change what happens at compile-time, but already-released pexes will
continue using sequential consistency.

The PNaCl toolchain also requires that ``volatile`` accesses be at least
naturally aligned, and tries to guarantee this alignment.

The above guarantees ease support for legacy (i.e. non-C11/C++11)
code, and combined with builtin fences these programs can do meaningful
cross-thread communication without changing code. They also better
reflect the original code's intent and improve portability.
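
As an illustration, the following sketch shows the kind of legacy message-passing code these guarantees are meant to support. The names are hypothetical. Under PNaCl, the ``volatile`` accesses become sequentially consistent atomics, so the producer and consumer may run on different threads. With other compilers this pattern is not guaranteed to be race-free, so new code should prefer C11/C++11 atomics.

.. naclcode::

  // Hypothetical legacy-style mailbox shared by a producer and a consumer.
  // Cross-thread use relies on PNaCl's volatile-to-atomic promotion.
  static volatile int g_payload = 0;
  static volatile int g_ready = 0;

  // Producer: write the payload, fence, then raise the flag.
  void legacy_publish(int value) {
    g_payload = value;
    __sync_synchronize();  // full barrier between the two volatile stores
    g_ready = 1;
  }

  // Consumer: returns 1 and copies the payload into *out once the flag
  // is raised, or returns 0 if nothing has been published yet.
  int legacy_try_consume(int *out) {
    if (!g_ready) return 0;
    __sync_synchronize();  // full barrier between the flag and payload loads
    *out = g_payload;
    return 1;
  }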

.. _language_support_threading:

Threading
=========

Threading is explicitly supported through C11/C++11's threading
libraries as well as POSIX threads.

Communication between threads should use atomic primitives as described
in `Memory Model and Atomics`_.
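
This minimal POSIX threads sketch uses hypothetical names. The work is split into two halves: one is summed on a new thread and the other on the calling thread. Each thread writes only its own result field, and ``pthread_join`` synchronizes memory with the joined thread. As a result, the results themselves need no atomics.

.. naclcode::

  #include <pthread.h>

  // Hypothetical work item: sums data[begin, end) into result.
  struct SumTask {
    const int *data;
    int begin;
    int end;
    long result;
  };

  static void *sum_range(void *arg) {
    SumTask *task = static_cast<SumTask *>(arg);
    long sum = 0;
    for (int i = task->begin; i < task->end; ++i) sum += task->data[i];
    task->result = sum;
    return nullptr;
  }

  // Hypothetical helper: sums data[0, n) using one extra POSIX thread.
  long parallel_sum(const int *data, int n) {
    SumTask first = {data, 0, n / 2, 0};
    SumTask second = {data, n / 2, n, 0};
    pthread_t worker;
    bool threaded = pthread_create(&worker, nullptr, sum_range, &first) == 0;
    if (!threaded) sum_range(&first);  // fall back to the calling thread
    sum_range(&second);                // the calling thread sums the second half
    if (threaded) pthread_join(worker, nullptr);  // makes first.result visible
    return first.result + second.result;
  }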

``setjmp`` and ``longjmp``
==========================

PNaCl and NaCl support ``setjmp`` and ``longjmp`` without any
restrictions beyond C's.
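
For example, the usual error-escape pattern works unchanged. This is a sketch with hypothetical names. ``setjmp`` is called only in one of the contexts C permits (here, a comparison in an ``if`` condition). The frames that ``longjmp`` unwinds through contain no objects with destructors, because skipping non-trivial destructors is undefined in C++.

.. naclcode::

  #include <csetjmp>

  static std::jmp_buf g_recover;  // where to resume after an error
  static int g_error = 0;         // error code set just before longjmp

  // Hypothetical error path: records the code and jumps back to setjmp.
  [[noreturn]] static void fail_with(int code) {
    g_error = code;
    std::longjmp(g_recover, 1);
  }

  // Hypothetical checked operation: returns input * 2, or -1 via the
  // longjmp path when input is negative.
  int run_checked(int input) {
    if (setjmp(g_recover) != 0) {
      return -g_error;  // reached via longjmp
    }
    if (input < 0) fail_with(1);
    return input * 2;
  }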

.. _exception_handling:

C++ Exception Handling
======================

PNaCl currently supports C++ exception handling through ``setjmp()`` and
``longjmp()``, which can be enabled with the ``--pnacl-exceptions=sjlj`` linker
flag (set with ``LDFLAGS`` when using Make). Exceptions are disabled by default
so that faster and smaller code is generated, and ``throw`` statements are
replaced with calls to ``abort()``. The usual ``-fno-exceptions`` flag is also
supported, though the default is ``-fexceptions``. PNaCl will support full
zero-cost exception handling in the future.
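
The following sketch uses hypothetical function names. It only reaches its ``catch`` block if the pexe was linked with ``--pnacl-exceptions=sjlj``, for example by adding that flag to ``LDFLAGS``. With the default setting, the ``throw`` becomes a call to ``abort()``.

.. naclcode::

  #include <stdexcept>
  #include <string>

  // Hypothetical parser: returns the value of a decimal digit or throws.
  int parse_digit(char c) {
    if (c < '0' || c > '9')
      throw std::invalid_argument(std::string("not a digit: ") + c);
    return c - '0';
  }

  // Returns the digit's value, or `fallback` if parse_digit threw.
  int parse_digit_or(char c, int fallback) {
    try {
      return parse_digit(c);
    } catch (const std::invalid_argument &) {
      return fallback;
    }
  }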

.. note:: When using naclports_ or other prebuilt static libraries, you don't
          need to recompile because the exception handling support is
          implemented at link time (when all the static libraries are put
          together with your application).

.. _naclports: https://code.google.com/p/naclports

NaCl supports full zero-cost C++ exception handling.

Inline Assembly
===============

Inline assembly isn't supported by PNaCl because it isn't portable. The
one current exception is the common compiler barrier idiom
``asm("":::"memory")``, which gets transformed to a sequentially
consistent memory barrier (equivalent to ``__sync_synchronize()``). In
PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
memory accesses, though in practice the implementation attempts to also
prevent reordering of memory accesses to objects which may escape.
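
For reference, here is the idiom next to two spellings that request a sequentially consistent fence directly. This is a sketch, and the wrapper names are hypothetical. Only the first wrapper relies on PNaCl's special treatment: with a native compiler, ``asm("":::"memory")`` is merely a compiler barrier and emits no fence instruction.

.. naclcode::

  #include <atomic>

  // Hypothetical wrappers around three barrier spellings.
  // PNaCl turns this idiom into a sequentially consistent fence; a native
  // compiler treats it only as a compiler barrier.
  inline void barrier_asm_idiom() { asm("" ::: "memory"); }

  // Legacy GCC builtin: a full (sequentially consistent) fence.
  inline void barrier_sync_builtin() { __sync_synchronize(); }

  // C++11 spelling of the same fence.
  inline void barrier_cxx11() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }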

PNaCl supports :ref:`Portable SIMD Vectors <portable_simd_vectors>`,
which are traditionally expressed through target-specific intrinsics or
inline assembly.

NaCl supports a fairly wide subset of inline assembly through GCC's
inline assembly syntax, with the restriction that the sandboxing model
for the target architecture has to be respected.

.. _portable_simd_vectors:

Portable SIMD Vectors
=====================

SIMD vectors aren't part of the C/C++ standards and are traditionally
very hardware-specific. Portable Native Client offers a portable version
of SIMD vector datatypes and operations which map well to modern
architectures and offer performance which matches or approaches
hardware-specific uses.

SIMD vector support was added to Portable Native Client for version 37 of
Chrome, and more features, including performance enhancements, have been
added in subsequent releases. See the :ref:`Release Notes
<sdk-release-notes>` for more details.

Hand-Coding Vector Extensions
-----------------------------

The initial vector support in Portable Native Client adds `LLVM vectors
<http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>`_
and `GCC vectors
<http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html>`_ since these
are well supported by different hardware platforms and don't require any
new compiler intrinsics.

Vector types can be used through the ``vector_size`` attribute:

.. naclcode::

  #define VECTOR_BYTES 16
  typedef int v4s __attribute__((vector_size(VECTOR_BYTES)));

  void example(void) {
    v4s a = {1,2,3,4};
    v4s b = {5,6,7,8};
    v4s c = a + b;  /* c = {6,8,10,12} */
    v4s d = b >> a; /* d = {2,1,0,0} */
  }

Vector comparisons produce a bitmask vector whose elements are as wide as
the compared elements, with each element set to all ``0`` bits (false) or
all ``1`` bits (true):

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  /* Keep the elements of `in` that exceed `limit`; zero the others. */
  v4s snip(v4s in) {
    v4s limit = {32,64,128,256};
    v4s mask = in > limit; /* all ones where in > limit, else all zeros */
    v4s ret = in & mask;
    return ret;
  }

Vector datatypes are currently expected to be 128 bits wide with one of the
following element types, and they're expected to be aligned to the underlying
element's bit width (loads and stores will otherwise be broken up into scalar
accesses to prevent faults):

============  ============  ================ ======================
Type          Num Elements  Vector Bit Width Expected Bit Alignment
============  ============  ================ ======================
``uint8_t``   16            128              8
``int8_t``    16            128              8
``uint16_t``  8             128              16
``int16_t``   8             128              16
``uint32_t``  4             128              32
``int32_t``   4             128              32
``float``     4             128              32
============  ============  ================ ======================

64-bit integers and double-precision floating point will be supported in
a future release, as will 256-bit and 512-bit vectors.

Vector element bit width alignment can be stated explicitly (this is assumed by
PNaCl, but not necessarily by other compilers), and smaller alignments can also
be specified:

.. naclcode::

  typedef int v4s_element   __attribute__((vector_size(16), aligned(4)));
  typedef int v4s_unaligned __attribute__((vector_size(16), aligned(1)));

The following operators are supported on vectors:

+----------------------------------------------+
| unary ``+``, ``-``                           |
+----------------------------------------------+
| ``++``, ``--``                               |
+----------------------------------------------+
| ``+``, ``-``, ``*``, ``/``, ``%``            |
+----------------------------------------------+
| ``&``, ``|``, ``^``, ``~``                   |
+----------------------------------------------+
| ``>>``, ``<<``                               |
+----------------------------------------------+
| ``!``, ``&&``, ``||``                        |
+----------------------------------------------+
| ``==``, ``!=``, ``>``, ``<``, ``>=``, ``<=`` |
+----------------------------------------------+
| ``=``                                        |
+----------------------------------------------+
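
As a sketch of how these operators combine, the hypothetical ``vmax`` below computes an element-wise maximum without branches. The comparison yields an all-ones or all-zeros mask per element. ``&``, ``|`` and ``~`` then use that mask to select between the two inputs.

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));

  // Hypothetical helper: element-wise maximum of two vectors.
  v4s vmax(v4s a, v4s b) {
    v4s take_a = a > b;                   // -1 (all ones) where a > b, else 0
    return (a & take_a) | (b & ~take_a);  // pick a or b per element
  }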

C-style casts can be used to convert one vector type to another without
modifying the underlying bits. ``__builtin_convertvector`` can be used
to convert from one type to another provided both types have the same
number of elements, truncating when converting from floating-point to
integer.

.. naclcode::

  typedef unsigned v4u __attribute__((vector_size(16)));
  typedef float v4f __attribute__((vector_size(16)));
  v4u a = {0x3f19999a,0x40000000,0x40490fdb,0x66ff0c30};
  v4f b = (v4f) a; /* b = {0.6,2,3.14159,6.02214e+23}  */
  v4u c = __builtin_convertvector(b, v4u); /* c = {0,2,3,?} */
  /* 6.02214e+23 is out of range for unsigned, so c[3] is undefined. */

It is also possible to use array-style indexing into vectors to extract
individual elements using ``[]``.

.. naclcode::

  #include <cstddef>
  #include <iostream>

  typedef unsigned v4u __attribute__((vector_size(16)));

  // Print every element of the vector v, separated by spaces.
  template<typename T>
  void print(const T v) {
    for (std::size_t i = 0; i != sizeof(v) / sizeof(v[0]); ++i)
      std::cout << v[i] << ' ';
    std::cout << std::endl;
  }

Vector shuffle operations (often called permutations or swizzles) are
supported through ``__builtin_shufflevector``. The builtin takes two
vector arguments, ``vec1`` and ``vec2``, of the same element type,
followed by a list of constant integers that specify the element
indices of the two vectors that should be extracted and returned in a
new vector. These element indices are numbered sequentially starting
with the first vector, continuing into the second vector. Thus, if
``vec1`` is a 4-element vector, index ``5`` would refer to the second
element of ``vec2``. An index of ``-1`` can be used to indicate that the
corresponding element in the returned vector is a "don't care" and can
be optimized by the backend.

The result of ``__builtin_shufflevector`` is a vector with the same
element type as ``vec1`` / ``vec2`` but that has an element count equal
to the number of indices specified.

.. naclcode::

  // identity operation - return 4-element vector v1.
  __builtin_shufflevector(v1, v1, 0, 1, 2, 3)

  // "Splat" element 0 of v1 into a 4-element result.
  __builtin_shufflevector(v1, v1, 0, 0, 0, 0)

  // Reverse 4-element vector v1.
  __builtin_shufflevector(v1, v1, 3, 2, 1, 0)

  // Concatenate every other element of 4-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6)

  // Concatenate every other element of 8-element vectors v1 and v2.
  __builtin_shufflevector(v1, v2, 0, 2, 4, 6, 8, 10, 12, 14)

  // Shuffle v1 with some elements being undefined
  __builtin_shufflevector(v1, v1, 3, -1, 1, -1)

One common use of ``__builtin_shufflevector`` is to perform
vector-scalar operations:

.. naclcode::

  typedef int v4s __attribute__((vector_size(16)));
  v4s shift_right_by(v4s shift_me, int shift_amount) {
    v4s tmp = {shift_amount};  // tmp = {shift_amount, 0, 0, 0}
    // Splat element 0 of tmp into all four lanes, then shift each lane.
    return shift_me >> __builtin_shufflevector(tmp, tmp, 0, 0, 0, 0);
  }

Auto-Vectorization
------------------

Auto-vectorization is currently not enabled for Portable Native Client,
but will be in a future release.

Undefined Behavior
==================

The C and C++ languages expose some undefined behavior which is
discussed in :ref:`PNaCl Undefined Behavior <undefined_behavior>`.

.. _c_cpp_floating_point:

Floating-Point
==============

PNaCl exposes 32-bit and 64-bit floating point operations which are
mostly IEEE-754 compliant. There are a few caveats:

* Some :ref:`floating-point behavior is currently left as undefined
  <undefined_behavior_fp>`.
* The default rounding mode is round-to-nearest and other rounding modes
  are currently not usable, which isn't IEEE-754 compliant. PNaCl could
  support switching modes (the four rounding modes that C99's
  ``FLT_ROUNDS`` macro can report).
* Signaling ``NaN`` values never fault.
* Fast-math optimizations are currently supported before *pexe* creation
  time. A *pexe* loses all fast-math information when it is
  created. Fast-math translation could be enabled at a later date,
  potentially at a per-function granularity. This wouldn't affect
  already-existing *pexes*; it would be an opt-in feature.

  * Fused multiply-add operations have higher precision and often execute
    faster; PNaCl currently disallows them in the *pexe* because they
    aren't supported on all platforms and can't realistically be
    emulated. PNaCl could (but currently doesn't) only generate them in
    the backend if fast-math were specified and the hardware supports
    the operation.
  * Transcendentals aren't exposed by PNaCl's ABI; they are part of the
    math library that is included in the *pexe*. PNaCl could, but
    currently doesn't, use hardware support if fast-math were provided
    in the *pexe*.
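
This small sketch uses hypothetical function names to show how a program can observe two of these caveats at run time. First, the rounding mode reported by ``FLT_ROUNDS`` and ``fegetround`` is expected to be round-to-nearest. Second, arithmetic on a signaling NaN is expected to produce a NaN quietly rather than fault.

.. naclcode::

  #include <cfenv>
  #include <cfloat>
  #include <cmath>
  #include <limits>

  // Hypothetical check: true if the program runs in round-to-nearest mode,
  // the only rounding mode PNaCl currently supports.
  bool uses_round_to_nearest() {
    return FLT_ROUNDS == 1 && std::fegetround() == FE_TONEAREST;
  }

  // Hypothetical check: arithmetic on a signaling NaN yields a NaN
  // instead of raising a fault.
  bool snan_arithmetic_is_quiet() {
    volatile float snan = std::numeric_limits<float>::signaling_NaN();
    float result = snan + 1.0f;  // the volatile read keeps this from being folded
    return std::isnan(result);
  }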

Computed ``goto``
=================

PNaCl supports computed ``goto``, a non-standard GCC extension to C used
by some interpreters, by lowering it to ``switch`` statements. The
resulting use of ``switch`` might not be as fast as the original
indirect branches. If you are compiling a program that has a
compile-time option for using computed ``goto``, it's possible that the
program will run faster with the option turned off (e.g., if the program
does extra work to take advantage of computed ``goto``).
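
To make the trade-off concrete, here is a toy interpreter written both ways. This is a sketch: the opcodes and function names are hypothetical, and the ``switch`` version only illustrates the general shape of such code, not PNaCl's exact lowering. The computed-``goto`` version uses the GCC labels-as-values extension, which Clang also accepts.

.. naclcode::

  // Hypothetical opcodes for a toy accumulator machine.
  enum Op : unsigned char { OP_INC, OP_DEC, OP_HALT };

  // Computed-goto dispatch: each handler jumps directly to the next one.
  int run_computed_goto(const unsigned char *code) {
    static void *const handlers[] = {&&do_inc, &&do_dec, &&do_halt};
    int acc = 0;
    const unsigned char *pc = code;
    goto *handlers[*pc++];
  do_inc:
    ++acc;
    goto *handlers[*pc++];
  do_dec:
    --acc;
    goto *handlers[*pc++];
  do_halt:
    return acc;
  }

  // The same interpreter dispatching through a switch in a loop.
  int run_switch(const unsigned char *code) {
    int acc = 0;
    for (const unsigned char *pc = code;; ++pc) {
      switch (*pc) {
        case OP_INC: ++acc; break;
        case OP_DEC: --acc; break;
        case OP_HALT: return acc;
      }
    }
  }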

NaCl supports computed ``goto`` without any transformation.

Future Directions
=================

Inter-Process Communication
---------------------------

Inter-process communication through shared memory is currently not
supported by PNaCl/NaCl. When implemented, it may be limited to
operations which are lock-free on the current platform (``is_lock_free``
methods). It will rely on the address-free property discussed in `Memory
Model for Concurrent Operations`_.

POSIX-style Signal Handling
---------------------------

POSIX-style signal handling really consists of two different features:

* **Hardware exception handling** (synchronous signals): The ability
  to catch hardware exceptions (such as memory access faults and
  division by zero) using a signal handler.

  PNaCl currently doesn't support hardware exception handling.

  NaCl supports hardware exception handling via the
  ``<nacl/nacl_exception.h>`` interface.

* **Asynchronous interruption of threads** (asynchronous signals): The
  ability to asynchronously interrupt the execution of a thread,
  forcing the thread to run a signal handler.

  A similar feature is **thread suspension**: The ability to
  asynchronously suspend and resume a thread and inspect or modify its
  execution state (such as register state).

  Neither PNaCl nor NaCl currently supports asynchronous interruption
  or suspension of threads.

If PNaCl were to support either of these, the interaction of
``volatile`` and atomics with same-thread signal handling would need
to be carefully detailed.