There were originally 3 designs under consideration. They differ in where most
of the implementation work is done. The functionality exposed to the customer
should be identical (and conforming) for all three designs.
Design A: Minimal work for the library
======================================
The compiler supplies all of the intrinsics described below. This list of
intrinsics roughly parallels the requirements of the C and C++ atomics proposals.
The C and C++ library implementations simply drop through to these intrinsics.
For anything the platform does not support in hardware, the compiler
arranges for a (compiler-rt) library call to be made which does the job with
a mutex, ignoring the memory ordering parameter in that case (effectively
implementing ``memory_order_seq_cst``).

Ultimate efficiency is preferred over run time error checking. Undefined
behavior is acceptable when the inputs do not conform as defined below.
.. code-block:: cpp

    // In every intrinsic signature below, type* atomic_obj may be a pointer to a
    // volatile-qualified type. Memory ordering values map to the following meanings:
    //   memory_order_relaxed == 0
    //   memory_order_consume == 1
    //   memory_order_acquire == 2
    //   memory_order_release == 3
    //   memory_order_acq_rel == 4
    //   memory_order_seq_cst == 5

    // type must be trivially copyable
    // type represents a "type argument"
    bool __atomic_is_lock_free(type);

    // type must be trivially copyable
    // Behavior is defined for mem_ord = 0, 1, 2, 5
    type __atomic_load(const type* atomic_obj, int mem_ord);

    // type must be trivially copyable
    // Behavior is defined for mem_ord = 0, 3, 5
    void __atomic_store(type* atomic_obj, type desired, int mem_ord);

    // type must be trivially copyable
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_exchange(type* atomic_obj, type desired, int mem_ord);

    // type must be trivially copyable
    // Behavior is defined for mem_success = [0 ... 5],
    //   mem_failure <= mem_success
    //   mem_failure != 3 (memory_order_release)
    //   mem_failure != 4 (memory_order_acq_rel)
    bool __atomic_compare_exchange_strong(type* atomic_obj,
                                          type* expected, type desired,
                                          int mem_success, int mem_failure);

    // type must be trivially copyable
    // Behavior is defined for mem_success = [0 ... 5],
    //   mem_failure <= mem_success
    //   mem_failure != 3 (memory_order_release)
    //   mem_failure != 4 (memory_order_acq_rel)
    bool __atomic_compare_exchange_weak(type* atomic_obj,
                                        type* expected, type desired,
                                        int mem_success, int mem_failure);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_fetch_add(type* atomic_obj, type operand, int mem_ord);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_fetch_sub(type* atomic_obj, type operand, int mem_ord);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_fetch_and(type* atomic_obj, type operand, int mem_ord);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_fetch_or(type* atomic_obj, type operand, int mem_ord);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    // Behavior is defined for mem_ord = [0 ... 5]
    type __atomic_fetch_xor(type* atomic_obj, type operand, int mem_ord);

    // Behavior is defined for mem_ord = [0 ... 5]
    void* __atomic_fetch_add(void** atomic_obj, ptrdiff_t operand, int mem_ord);
    void* __atomic_fetch_sub(void** atomic_obj, ptrdiff_t operand, int mem_ord);

    // Behavior is defined for mem_ord = [0 ... 5]
    void __atomic_thread_fence(int mem_ord);
    void __atomic_signal_fence(int mem_ord);
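For illustration, under this design the library's job is nothing more than
forwarding. The sketch below is not the actual libc++ source; it assumes only
the ``__atomic_load`` intrinsic declared above and the 0-5 ordering mapping:

.. code-block:: cpp

    // Sketch: the library passes the run time ordering value straight through
    // to the compiler; it performs no dispatch or validation of its own.
    enum memory_order {   // values chosen to match the 0-5 mapping above
        memory_order_relaxed, memory_order_consume, memory_order_acquire,
        memory_order_release, memory_order_acq_rel, memory_order_seq_cst
    };

    template <class T>
    struct atomic {
        T __value_;

        T load(memory_order order = memory_order_seq_cst) const volatile {
            return __atomic_load(&__value_, static_cast<int>(order));
        }
    };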
If desired, the intrinsics taking a single ``mem_ord`` parameter can default
that argument to 5 (``memory_order_seq_cst``).

If desired, the intrinsics taking two ordering parameters can default ``mem_success``
to 5, and ``mem_failure`` to ``translate_memory_order(mem_success)``, where
``translate_memory_order(mem_success)`` is defined as:

.. code-block:: cpp

    int translate_memory_order(int o) {
        switch (o) {
        case 4:  // acq_rel maps to acquire on failure
            return 2;
        case 3:  // release maps to relaxed on failure
            return 0;
        }
        return o;
    }

Below are representative C++ implementations of all of the operations. Their
purpose is to document the desired semantics of each operation, assuming
``memory_order_seq_cst``. This is essentially the code that will be called
if the front end calls out to compiler-rt.
.. code-block:: cpp

    template <class T>
    T __atomic_load(T const volatile* obj) {
        unique_lock<mutex> _(some_mutex);
        return *obj;
    }

    template <class T>
    void __atomic_store(T volatile* obj, T desr) {
        unique_lock<mutex> _(some_mutex);
        *obj = desr;
    }

    template <class T>
    T __atomic_exchange(T volatile* obj, T desr) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj = desr;
        return r;
    }

    template <class T>
    bool __atomic_compare_exchange_strong(T volatile* obj, T* exp, T desr) {
        unique_lock<mutex> _(some_mutex);
        if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
        {
            std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));    // *obj = desr;
            return true;
        }
        std::memcpy(exp, const_cast<T*>(obj), sizeof(T));          // *exp = *obj;
        return false;
    }

    // May spuriously return false (even if *obj == *exp)
    template <class T>
    bool __atomic_compare_exchange_weak(T volatile* obj, T* exp, T desr) {
        unique_lock<mutex> _(some_mutex);
        if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) // if (*obj == *exp)
        {
            std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));    // *obj = desr;
            return true;
        }
        std::memcpy(exp, const_cast<T*>(obj), sizeof(T));          // *exp = *obj;
        return false;
    }

    template <class T>
    T __atomic_fetch_add(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj += operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_sub(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj -= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_and(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj &= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_or(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj |= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_xor(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj ^= operand;
        return r;
    }

    void* __atomic_fetch_add(void* volatile* obj, ptrdiff_t operand) {
        unique_lock<mutex> _(some_mutex);
        void* r = *obj;
        (char*&)(*obj) += operand;
        return r;
    }

    void* __atomic_fetch_sub(void* volatile* obj, ptrdiff_t operand) {
        unique_lock<mutex> _(some_mutex);
        void* r = *obj;
        (char*&)(*obj) -= operand;
        return r;
    }

    void __atomic_thread_fence() {
        unique_lock<mutex> _(some_mutex);
    }

    void __atomic_signal_fence() {
        unique_lock<mutex> _(some_mutex);
    }
Design B: Something in between
==============================
This is a variation of design A which puts the burden on the library to arrange
for the correct manipulation of the run time memory ordering arguments, and to
call the compiler only with well-defined memory orderings. I think of this design
as the worst of A and C rather than the best of both, but I offer it as an option
in the spirit of completeness. A sketch of the library-side dispatch this design
requires follows the listing below.
.. code-block:: cpp

    // type must be trivially copyable
    bool __atomic_is_lock_free(const type* atomic_obj);

    // type must be trivially copyable
    type __atomic_load_relaxed(const volatile type* atomic_obj);
    type __atomic_load_consume(const volatile type* atomic_obj);
    type __atomic_load_acquire(const volatile type* atomic_obj);
    type __atomic_load_seq_cst(const volatile type* atomic_obj);

    // type must be trivially copyable
    type __atomic_store_relaxed(volatile type* atomic_obj, type desired);
    type __atomic_store_release(volatile type* atomic_obj, type desired);
    type __atomic_store_seq_cst(volatile type* atomic_obj, type desired);

    // type must be trivially copyable
    type __atomic_exchange_relaxed(volatile type* atomic_obj, type desired);
    type __atomic_exchange_consume(volatile type* atomic_obj, type desired);
    type __atomic_exchange_acquire(volatile type* atomic_obj, type desired);
    type __atomic_exchange_release(volatile type* atomic_obj, type desired);
    type __atomic_exchange_acq_rel(volatile type* atomic_obj, type desired);
    type __atomic_exchange_seq_cst(volatile type* atomic_obj, type desired);

    // type must be trivially copyable
    bool __atomic_compare_exchange_strong_relaxed_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_consume_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_consume_consume(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acquire_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acquire_consume(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acquire_acquire(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_release_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_release_consume(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_release_acquire(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acq_rel_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acq_rel_consume(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_acq_rel_acquire(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_seq_cst_relaxed(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_seq_cst_consume(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_seq_cst_acquire(volatile type* atomic_obj,
                                                          type* expected, type desired);
    bool __atomic_compare_exchange_strong_seq_cst_seq_cst(volatile type* atomic_obj,
                                                          type* expected, type desired);

    // type must be trivially copyable
    bool __atomic_compare_exchange_weak_relaxed_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_consume_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_consume_consume(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acquire_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acquire_consume(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acquire_acquire(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_release_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_release_consume(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_release_acquire(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acq_rel_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acq_rel_consume(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_acq_rel_acquire(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_seq_cst_relaxed(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_seq_cst_consume(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_seq_cst_acquire(volatile type* atomic_obj,
                                                        type* expected, type desired);
    bool __atomic_compare_exchange_weak_seq_cst_seq_cst(volatile type* atomic_obj,
                                                        type* expected, type desired);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    type __atomic_fetch_add_relaxed(volatile type* atomic_obj, type operand);
    type __atomic_fetch_add_consume(volatile type* atomic_obj, type operand);
    type __atomic_fetch_add_acquire(volatile type* atomic_obj, type operand);
    type __atomic_fetch_add_release(volatile type* atomic_obj, type operand);
    type __atomic_fetch_add_acq_rel(volatile type* atomic_obj, type operand);
    type __atomic_fetch_add_seq_cst(volatile type* atomic_obj, type operand);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    type __atomic_fetch_sub_relaxed(volatile type* atomic_obj, type operand);
    type __atomic_fetch_sub_consume(volatile type* atomic_obj, type operand);
    type __atomic_fetch_sub_acquire(volatile type* atomic_obj, type operand);
    type __atomic_fetch_sub_release(volatile type* atomic_obj, type operand);
    type __atomic_fetch_sub_acq_rel(volatile type* atomic_obj, type operand);
    type __atomic_fetch_sub_seq_cst(volatile type* atomic_obj, type operand);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    type __atomic_fetch_and_relaxed(volatile type* atomic_obj, type operand);
    type __atomic_fetch_and_consume(volatile type* atomic_obj, type operand);
    type __atomic_fetch_and_acquire(volatile type* atomic_obj, type operand);
    type __atomic_fetch_and_release(volatile type* atomic_obj, type operand);
    type __atomic_fetch_and_acq_rel(volatile type* atomic_obj, type operand);
    type __atomic_fetch_and_seq_cst(volatile type* atomic_obj, type operand);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    type __atomic_fetch_or_relaxed(volatile type* atomic_obj, type operand);
    type __atomic_fetch_or_consume(volatile type* atomic_obj, type operand);
    type __atomic_fetch_or_acquire(volatile type* atomic_obj, type operand);
    type __atomic_fetch_or_release(volatile type* atomic_obj, type operand);
    type __atomic_fetch_or_acq_rel(volatile type* atomic_obj, type operand);
    type __atomic_fetch_or_seq_cst(volatile type* atomic_obj, type operand);

    // type is one of: char, signed char, unsigned char, short, unsigned short, int,
    //   unsigned int, long, unsigned long, long long, unsigned long long,
    //   char16_t, char32_t, wchar_t
    type __atomic_fetch_xor_relaxed(volatile type* atomic_obj, type operand);
    type __atomic_fetch_xor_consume(volatile type* atomic_obj, type operand);
    type __atomic_fetch_xor_acquire(volatile type* atomic_obj, type operand);
    type __atomic_fetch_xor_release(volatile type* atomic_obj, type operand);
    type __atomic_fetch_xor_acq_rel(volatile type* atomic_obj, type operand);
    type __atomic_fetch_xor_seq_cst(volatile type* atomic_obj, type operand);

    void* __atomic_fetch_add_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_add_consume(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_add_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_add_release(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_add_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_add_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

    void* __atomic_fetch_sub_relaxed(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_sub_consume(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_sub_acquire(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_sub_release(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_sub_acq_rel(void* volatile* atomic_obj, ptrdiff_t operand);
    void* __atomic_fetch_sub_seq_cst(void* volatile* atomic_obj, ptrdiff_t operand);

    void __atomic_thread_fence_relaxed();
    void __atomic_thread_fence_consume();
    void __atomic_thread_fence_acquire();
    void __atomic_thread_fence_release();
    void __atomic_thread_fence_acq_rel();
    void __atomic_thread_fence_seq_cst();

    void __atomic_signal_fence_relaxed();
    void __atomic_signal_fence_consume();
    void __atomic_signal_fence_acquire();
    void __atomic_signal_fence_release();
    void __atomic_signal_fence_acq_rel();
    void __atomic_signal_fence_seq_cst();
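To make "burden on the library" concrete, here is a sketch (not the actual
libc++ source) of a Design B library ``load``: the library switches on the run
time ordering and calls only intrinsics whose ordering is baked into the name,
reusing the ``memory_order`` enumeration from the earlier sketch:

.. code-block:: cpp

    // Sketch: under Design B the run time ordering argument never reaches the
    // compiler; the library maps it onto a fixed-ordering intrinsic.
    template <class T>
    T __library_load(const volatile T* obj, memory_order order) {
        switch (order) {
        case memory_order_relaxed: return __atomic_load_relaxed(obj);
        case memory_order_consume: return __atomic_load_consume(obj);
        case memory_order_acquire: return __atomic_load_acquire(obj);
        default:                   return __atomic_load_seq_cst(obj); // seq_cst, and anything ill-formed
        }
    }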
Design C: Minimal work for the front end
========================================
The ``<atomic>`` header is one of the headers most closely coupled to the compiler.
Ideally, invoking any function from ``<atomic>`` should result in highly
optimized assembly being inserted directly into your application -- assembly that
is not otherwise representable by higher level C or C++ expressions. The design of
the libc++ ``<atomic>`` header started with this goal in mind. A secondary, but
still very important, goal is that the compiler should have to do minimal work to
facilitate the implementation of ``<atomic>``. Without this second goal, the
libc++ ``<atomic>`` header would, practically speaking, be doomed to be a
barely supported, second class citizen on almost every platform.
Goals:

- Optimal code generation for atomic operations
- Minimal effort for the compiler to achieve goal 1 on any given platform
- Conformance to the C++0X draft standard

The purpose of this document is to inform compiler writers what they need to do
to enable a high performance libc++ ``<atomic>`` with minimal effort.
The minimal work that must be done for a conforming ``<atomic>``
----------------------------------------------------------------
The only "atomic" operations that must actually be lock free in
``<atomic>`` are represented by the following compiler intrinsics:

.. code-block:: cpp

    __atomic_flag__ __atomic_exchange_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
        unique_lock<mutex> _(some_mutex);
        __atomic_flag__ result = *obj;
        *obj = desr;
        return result;
    }

    void __atomic_store_seq_cst(__atomic_flag__ volatile* obj, __atomic_flag__ desr) {
        unique_lock<mutex> _(some_mutex);
        *obj = desr;
    }

Where:

- If ``__has_feature(__atomic_flag)`` evaluates to 1 in the preprocessor then
  the compiler must define ``__atomic_flag__`` (e.g. as a typedef to ``int``).
- If ``__has_feature(__atomic_flag)`` evaluates to 0 in the preprocessor then
  the library defines ``__atomic_flag__`` as a typedef to ``bool``.
- To communicate that the above intrinsics are available, the compiler must
  arrange for ``__has_feature`` to return 1 when fed the intrinsic name
  appended with an '_' and the mangled type name of ``__atomic_flag__``.

For example if ``__atomic_flag__`` is ``unsigned int``:

.. code-block:: cpp

    // __has_feature(__atomic_flag) == 1
    // __has_feature(__atomic_exchange_seq_cst_j) == 1
    // __has_feature(__atomic_store_seq_cst_j) == 1

    typedef unsigned int __atomic_flag__;

    unsigned int __atomic_exchange_seq_cst(unsigned int volatile*, unsigned int) {
        // ...
    }

    void __atomic_store_seq_cst(unsigned int volatile*, unsigned int) {
        // ...
    }

That's it! Compiler writers do the above and you've got a fully conforming
(though sub-par performance) ``<atomic>`` header!
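To see why these two intrinsics are enough, here is a rough sketch (not the
actual libc++ source) of how the library could build a conforming
``atomic_flag`` on top of them; every other atomic type can then fall back to
locking:

.. code-block:: cpp

    // Sketch: a sequentially consistent atomic_flag built on the two required
    // intrinsics. Assumes __atomic_flag__, __atomic_exchange_seq_cst and
    // __atomic_store_seq_cst as described above.
    struct atomic_flag {
        __atomic_flag__ __flag_;

        bool test_and_set() volatile {
            // Setting the flag is an exchange; the previous value is returned.
            return __atomic_exchange_seq_cst(&__flag_, __atomic_flag__(1)) != 0;
        }

        void clear() volatile {
            __atomic_store_seq_cst(&__flag_, __atomic_flag__(0));
        }
    };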
Recommended work for a higher performance ``<atomic>``
------------------------------------------------------
It would be good if the above intrinsics worked with all integral types plus
``void*``. Because this may not be possible to do in a lock-free manner for
all integral types on all platforms, a compiler must communicate each type that
an intrinsic works with. For example, if ``__atomic_exchange_seq_cst`` works
for all types except for ``long long`` and ``unsigned long long`` then:

.. code-block:: cpp

    __has_feature(__atomic_exchange_seq_cst_b) == 1  // bool
    __has_feature(__atomic_exchange_seq_cst_c) == 1  // char
    __has_feature(__atomic_exchange_seq_cst_a) == 1  // signed char
    __has_feature(__atomic_exchange_seq_cst_h) == 1  // unsigned char
    __has_feature(__atomic_exchange_seq_cst_Ds) == 1 // char16_t
    __has_feature(__atomic_exchange_seq_cst_Di) == 1 // char32_t
    __has_feature(__atomic_exchange_seq_cst_w) == 1  // wchar_t
    __has_feature(__atomic_exchange_seq_cst_s) == 1  // short
    __has_feature(__atomic_exchange_seq_cst_t) == 1  // unsigned short
    __has_feature(__atomic_exchange_seq_cst_i) == 1  // int
    __has_feature(__atomic_exchange_seq_cst_j) == 1  // unsigned int
    __has_feature(__atomic_exchange_seq_cst_l) == 1  // long
    __has_feature(__atomic_exchange_seq_cst_m) == 1  // unsigned long
    __has_feature(__atomic_exchange_seq_cst_Pv) == 1 // void*

Note that only the ``__has_feature`` flag is decorated with the argument
type. The name of the compiler intrinsic is not decorated, but instead works
like a C++ overloaded function.
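For illustration only, the library can key off these feature macros at compile
time to choose, per type, between the intrinsic and its own mutex-based code.
The sketch below is not the actual libc++ source; ``__lock_based_exchange`` is
a hypothetical helper:

.. code-block:: cpp

    // Sketch: per-type, compile-time selection between the compiler intrinsic
    // and a mutex-based fallback ('j' is the Itanium mangling for unsigned int).
    inline unsigned int __choose_exchange_seq_cst(unsigned int volatile* obj, unsigned int desr) {
    #if __has_feature(__atomic_exchange_seq_cst_j)
        return __atomic_exchange_seq_cst(obj, desr);  // lock-free intrinsic
    #else
        return __lock_based_exchange(obj, desr);      // hypothetical mutex fallback
    #endif
    }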
Additionally, there are other intrinsics besides ``__atomic_exchange_seq_cst``
and ``__atomic_store_seq_cst``. They are optional. But if the compiler can
generate faster code than provided by the library, then clients will benefit
from the compiler writer's expertise and knowledge of the targeted platform.

Below is the complete list of *sequentially consistent* intrinsics, and
their library implementations. Template syntax is used to indicate the desired
overloading for integral and ``void*`` types. The template does not represent a
requirement that the intrinsic operate on **any** type!
.. code-block:: cpp

    // T is one of:
    // bool, char, signed char, unsigned char, short, unsigned short,
    // int, unsigned int, long, unsigned long,
    // long long, unsigned long long, char16_t, char32_t, wchar_t, void*

    template <class T>
    T __atomic_load_seq_cst(T const volatile* obj) {
        unique_lock<mutex> _(some_mutex);
        return *obj;
    }

    template <class T>
    void __atomic_store_seq_cst(T volatile* obj, T desr) {
        unique_lock<mutex> _(some_mutex);
        *obj = desr;
    }

    template <class T>
    T __atomic_exchange_seq_cst(T volatile* obj, T desr) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj = desr;
        return r;
    }

    template <class T>
    bool __atomic_compare_exchange_strong_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
        unique_lock<mutex> _(some_mutex);
        if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
            std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
            return true;
        }
        std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
        return false;
    }

    // May spuriously return false (even if *obj == *exp)
    template <class T>
    bool __atomic_compare_exchange_weak_seq_cst_seq_cst(T volatile* obj, T* exp, T desr) {
        unique_lock<mutex> _(some_mutex);
        if (std::memcmp(const_cast<T*>(obj), exp, sizeof(T)) == 0) {
            std::memcpy(const_cast<T*>(obj), &desr, sizeof(T));
            return true;
        }
        std::memcpy(exp, const_cast<T*>(obj), sizeof(T));
        return false;
    }

    // T is one of:
    // char, signed char, unsigned char, short, unsigned short,
    // int, unsigned int, long, unsigned long,
    // long long, unsigned long long, char16_t, char32_t, wchar_t

    template <class T>
    T __atomic_fetch_add_seq_cst(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj += operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_sub_seq_cst(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj -= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_and_seq_cst(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj &= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_or_seq_cst(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj |= operand;
        return r;
    }

    template <class T>
    T __atomic_fetch_xor_seq_cst(T volatile* obj, T operand) {
        unique_lock<mutex> _(some_mutex);
        T r = *obj;
        *obj ^= operand;
        return r;
    }

    void* __atomic_fetch_add_seq_cst(void* volatile* obj, ptrdiff_t operand) {
        unique_lock<mutex> _(some_mutex);
        void* r = *obj;
        (char*&)(*obj) += operand;
        return r;
    }

    void* __atomic_fetch_sub_seq_cst(void* volatile* obj, ptrdiff_t operand) {
        unique_lock<mutex> _(some_mutex);
        void* r = *obj;
        (char*&)(*obj) -= operand;
        return r;
    }

    void __atomic_thread_fence_seq_cst() {
        unique_lock<mutex> _(some_mutex);
    }

    void __atomic_signal_fence_seq_cst() {
        unique_lock<mutex> _(some_mutex);
    }
One should consult the (currently draft) `C++ Standard <https://wg21.link/n3126>`_
for the details of the definitions for these operations. For example,
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` is allowed to fail
spuriously while ``__atomic_compare_exchange_strong_seq_cst_seq_cst`` is not.

If on your platform the lock-free definition of ``__atomic_compare_exchange_weak_seq_cst_seq_cst``
would be the same as ``__atomic_compare_exchange_strong_seq_cst_seq_cst``, you may omit the
``__atomic_compare_exchange_weak_seq_cst_seq_cst`` intrinsic without a performance cost. The
library will prefer your implementation of ``__atomic_compare_exchange_strong_seq_cst_seq_cst``
over its own definition for implementing ``__atomic_compare_exchange_weak_seq_cst_seq_cst``.
That is, the library will arrange for ``__atomic_compare_exchange_weak_seq_cst_seq_cst`` to call
``__atomic_compare_exchange_strong_seq_cst_seq_cst`` if you supply an intrinsic for the strong
version but not the weak.
Taking advantage of weaker memory synchronization
-------------------------------------------------
So far, all of the intrinsics presented require a **sequentially consistent** memory ordering.
That is, no loads or stores can move across the operation (just as if the library had locked
that internal mutex). But ``<atomic>`` supports weaker memory ordering operations. In all,
there are six memory orderings (listed here from strongest to weakest):

.. code-block:: cpp

    memory_order_seq_cst
    memory_order_acq_rel
    memory_order_release
    memory_order_acquire
    memory_order_consume
    memory_order_relaxed

(See the `C++ Standard <https://wg21.link/n3126>`_ for the detailed definitions of each of these orderings).
On some platforms, the compiler vendor can offer some or even all of the above
intrinsics at one or more weaker levels of memory synchronization. This might
lead, for example, to not issuing an ``mfence`` instruction on x86.

If the compiler does not offer any given operation, at any given memory ordering
level, the library will automatically attempt to call the next highest memory
ordering operation. This continues up to ``seq_cst``, and if that doesn't
exist, then the library takes over and does the job with a ``mutex``. This
is a compile-time search and selection operation. At run time, the application
will only see the few inlined assembly instructions for the selected intrinsic.
Each intrinsic is appended with the 7-letter name of the memory ordering it
addresses. For example a ``load`` with ``relaxed`` ordering is defined by:

.. code-block:: cpp

    T __atomic_load_relaxed(const volatile T* obj);

and its availability is again communicated, per type, with ``__has_feature``:

.. code-block:: cpp

    __has_feature(__atomic_load_relaxed_b) == 1  // bool
    __has_feature(__atomic_load_relaxed_c) == 1  // char
    __has_feature(__atomic_load_relaxed_a) == 1  // signed char
    ...
The ``__atomic_compare_exchange_strong(weak)`` intrinsics are parameterized
on two memory orderings. The first ordering applies when the operation returns
``true`` and the second ordering applies when the operation returns ``false``.

Not every memory ordering is appropriate for every operation. ``exchange``
and the ``fetch_XXX`` operations support all 6. But ``load`` only supports
``relaxed``, ``consume``, ``acquire`` and ``seq_cst``. ``store`` only supports
``relaxed``, ``release``, and ``seq_cst``. The ``compare_exchange`` operations
support the following 16 combinations out of the possible 36:

.. code-block:: cpp

    relaxed_relaxed
    consume_relaxed
    consume_consume
    acquire_relaxed
    acquire_consume
    acquire_acquire
    release_relaxed
    release_consume
    release_acquire
    acq_rel_relaxed
    acq_rel_consume
    acq_rel_acquire
    seq_cst_relaxed
    seq_cst_consume
    seq_cst_acquire
    seq_cst_seq_cst
Again, the compiler supplies intrinsics only for the strongest orderings where
it can make a difference. The library takes care of calling the weakest
supplied intrinsic that is as strong or stronger than the customer asked for.
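For illustration only (a sketch, not the actual libc++ source), that
compile-time search for a ``relaxed`` load of ``int`` could look like this,
assuming the feature-test naming above and a hypothetical mutex-based helper
``__lock_based_load``:

.. code-block:: cpp

    // Sketch: walk up the ordering ladder at compile time. Only one branch
    // survives preprocessing, so the application sees a single inlined call.
    inline int __choose_load_relaxed(int const volatile* obj) {
    #if __has_feature(__atomic_load_relaxed_i)
        return __atomic_load_relaxed(obj);   // exact ordering requested
    #elif __has_feature(__atomic_load_consume_i)
        return __atomic_load_consume(obj);   // next strongest available ordering
    #elif __has_feature(__atomic_load_acquire_i)
        return __atomic_load_acquire(obj);
    #elif __has_feature(__atomic_load_seq_cst_i)
        return __atomic_load_seq_cst(obj);   // strongest ordering
    #else
        return __lock_based_load(obj);       // hypothetical mutex fallback
    #endif
    }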
With any design, the (back end) compiler writer should note that the decision to
implement lock-free operations on any given type (or not) is an ABI-binding decision.
One cannot change from treating a type as not lock free to lock free (or vice-versa)
without breaking your ABI.

For example, consider two translation units that operate on the same atomic object:

TU1.cpp

.. code-block:: cpp

    extern atomic<long long> A;
    int foo() { return A.compare_exchange_strong(w, x); }

TU2.cpp

.. code-block:: cpp

    extern atomic<long long> A;
    bool bar() { return A.compare_exchange_strong(y, z); }

If only **one** of these calls to ``compare_exchange_strong`` is implemented with
mutex-locked code, then that mutex-locked code will not be executed mutually
exclusively of the one implemented in a lock-free manner.