openmp/docs/SupportAndFAQ.rst

   1 Support, Getting Involved, and FAQ
   2 ==================================
   3
   4 Please do not hesitate to reach out to us on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`_ or join
   5 one of our :ref:`regular calls <calls>`. Some common questions are answered in
   6 the :ref:`faq`.
   7
   8 .. _calls:
   9
  10 Calls
  11 -----
  12
  13 OpenMP in LLVM Technical Call
  14 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  15
  16 -   Development updates on OpenMP (and OpenACC) in the LLVM Project, including Clang, optimization, and runtime work.
  17 -   Join `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
  18 -   Time: Weekly, every Wednesday at 7:00 AM Pacific time.
  19 -   Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
  20 -   Status tracking `page <https://openmp.llvm.org/docs>`__.
  21
  22
  23 OpenMP in Flang Technical Call
  24 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  25 -   Development updates on OpenMP and OpenACC in the Flang Project.
  26 -   Join `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
  27 -   Time: Weekly, every Thursday at 8:00 AM Pacific time.
  28 -   Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
  29 -   Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.
  30
  31
  32 .. _faq:
  33
  34 FAQ
  35 ---
  36
  37 .. note::
  38    The FAQ is a work in progress and most of the expected content is not
  39    yet available. While you can expect changes, we always welcome feedback and
  40    additions. Please post on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`__.
  41
  42
  43 Q: How to contribute a patch to the webpage or any other part?
  44 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  45
  46 All patches go through the regular `LLVM review process
  47 <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.
  48
  49
  50 .. _build_offload_capable_compiler:
  51
  52 Q: How to build an OpenMP GPU offload capable compiler?
  53 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  54
  55 The easiest way to create an offload capable compiler is to use the provided
  56 CMake cache file. This will enable the projects and runtimes necessary for
  57 offloading as well as some extra options.
  58
  59 .. code-block:: sh
  60
  61   $> cd llvm-project  # The llvm-project checkout
  62   $> mkdir build && cd build
  63   # -C loads the preset cache file; the -D options set build type and install prefix.
  64   $> cmake ../llvm -G Ninja \
  65        -C ../offload/cmake/caches/Offload.cmake \
  66        -DCMAKE_BUILD_TYPE=<Debug|Release> \
  67        -DCMAKE_INSTALL_PREFIX=<PATH>
  68   $> ninja install
  69
  70 To manually build an *effective* OpenMP offload capable compiler, only one extra CMake
  71 option, ``LLVM_ENABLE_RUNTIMES="openmp;offload"``, is needed when building LLVM (generic
  72 information about building LLVM is available `here
  73 <https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
  74 are targeted by OpenMP are enabled. That can be done by adjusting the CMake
  75 option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
  76 and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
  77 Clang will be built with all backends enabled. When building with
  78 ``LLVM_ENABLE_RUNTIMES="openmp"``, OpenMP should not also be listed in
  79 ``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
  80
  81 For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
  82 For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
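
As a rough sketch of the manual configuration described above, the following
script configures a build with the ``openmp`` and ``offload`` runtimes and both
GPU backends enabled. The paths, the ``X86`` host backend, and the ``lld``
project are assumptions made for illustration (``lld`` is commonly needed to
link AMDGPU device code); adjust them to your system.

.. code-block:: shell

  # Assumed locations (hypothetical); adjust to your checkout and install prefix.
  LLVM_SRC="${LLVM_SRC:-$HOME/llvm-project}"
  LLVM_INSTALL="${LLVM_INSTALL:-$HOME/llvm-install}"

  # Host backend plus both GPU backends; trim the list to the targets you need.
  TARGETS="X86;AMDGPU;NVPTX"
  RUNTIMES="openmp;offload"

  # Configure only when the checkout and cmake are actually available.
  if [ -d "$LLVM_SRC/llvm" ] && command -v cmake >/dev/null 2>&1; then
    cmake -S "$LLVM_SRC/llvm" -B "$LLVM_SRC/build" -G Ninja \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX="$LLVM_INSTALL" \
      -DLLVM_ENABLE_PROJECTS="clang;lld" \
      -DLLVM_ENABLE_RUNTIMES="$RUNTIMES" \
      -DLLVM_TARGETS_TO_BUILD="$TARGETS"
  else
    echo "llvm-project checkout or cmake not found; nothing configured" >&2
  fi

After a successful configure step, ``ninja -C "$LLVM_SRC/build" install`` builds
and installs the toolchain into the chosen prefix.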
  83
  84 .. note::
  85   The compiler that generates the offload code should be the same (version) as
  86   the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
  87   can be built by a different compiler.
  88
  89 .. _advanced_builds: https://llvm.org/docs/AdvancedBuilds.html
  90
  91 .. _build_nvidia_offload_capable_compiler:
  92
  93 Q: How to build an OpenMP Nvidia offload capable compiler?
  94 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  95 The CUDA SDK is required on the machine that will execute the OpenMP application.
  96
  97 If your build machine is not the target machine or automatic detection of the
  98 available GPUs failed, you should also set:
  99
 100 - ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_<xy>;...'`` where ``<xy>`` is the numeric
 101   compute capability of your GPU. For instance, set
 102   ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80'`` to target the Nvidia Volta
 103   and Ampere architectures.
 104
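The mapping from compute capability to architecture name can be sketched in
shell as follows. The compute capabilities ``7.0`` and ``8.0`` are example
values chosen to match the Volta and Ampere example above; substitute those of
your own GPUs.

.. code-block:: shell

  # Example compute capabilities (assumed for illustration): 7.0 and 8.0.
  caps="7.0 8.0"

  archs=""
  for cap in $caps; do
    # "7.0" becomes "sm_70": drop the dot and prepend "sm_".
    sm="sm_$(printf '%s' "$cap" | tr -d '.')"
    # Join the entries with ';' without a leading separator.
    archs="${archs:+$archs;}$sm"
  done

  # Option to pass to cmake when configuring LLVM.
  echo "-DLIBOMPTARGET_DEVICE_ARCHITECTURES='$archs'"

With the values above, the script prints
``-DLIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80'``.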
 105
 106 .. _build_amdgpu_offload_capable_compiler:
 107
 108 Q: How to build an OpenMP AMDGPU offload capable compiler?
 109 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 110 A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
 111 required to build the LLVM toolchain and to execute the OpenMP application.
 112 Either install ROCm somewhere that CMake's ``find_package`` can locate it, or
 113 build the required subcomponents ROCt and ROCr from source.
 114
 115 The two components used are ROCT-Thunk-Interface (ROCt) and ROCR-Runtime (ROCr).
 116 ROCt is the userspace part of the Linux driver. It calls into the driver, which
 117 ships with the Linux kernel. It is an implementation detail of ROCr from
 118 OpenMP's perspective. ROCr is an implementation of `HSA
 119 <http://www.hsafoundation.com>`_.
 120
 121 .. code-block:: text
 122
 123   SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
 124   BUILD_DIR=somewhere
 125   INSTALL_PREFIX=same-as-llvm-install
 126
 127   cd $SOURCE_DIR
 128   git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
 129     --single-branch
 130   git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
 131     --single-branch
 132
 133   cd $BUILD_DIR && mkdir roct && cd roct
 134   cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
 135     -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
 136   make && make install
 137
 138   cd $BUILD_DIR && mkdir rocr && cd rocr
 139   cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
 140     -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
 141     -DBUILD_SHARED_LIBS=ON
 142   make && make install
 143
 144 ``IMAGE_SUPPORT`` requires building ROCr with Clang and is not used by OpenMP.
 145
 146 Provided CMake's ``find_package`` can find the ROCR-Runtime package, LLVM will
 147 build a tool, ``bin/amdgpu-arch``, which prints a string like ``gfx906`` when
 148 run if it recognises a GPU on the local system. LLVM will also build a shared
 149 library, ``libomptarget.rtl.amdgpu.so``, which is linked against ROCr.
 150
 151 With those libraries installed, and LLVM then built and installed, try:
 152
 153 .. code-block:: shell
 154
 155     clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
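
The command above assumes an ``example.c`` that contains a target region. A
minimal sketch (not part of the LLVM sources) that reports whether the region
ran on the device could be created and built like this; the temporary
directory and the file contents are illustrative assumptions.

.. code-block:: shell

  # Write a small test program into a scratch directory.
  workdir="$(mktemp -d)"
  cat > "$workdir/example.c" <<'EOF'
  #include <omp.h>
  #include <stdio.h>

  int main(void) {
    int on_host = 1;
    /* The region runs on the default device; on_host is copied back. */
    #pragma omp target map(from: on_host)
    {
      on_host = omp_is_initial_device();
    }
    printf("target region ran on the %s\n", on_host ? "host" : "device");
    return 0;
  }
  EOF

  # Build and run only when clang is available on this machine.
  if command -v clang >/dev/null 2>&1; then
    clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa \
      "$workdir/example.c" -o "$workdir/example" && "$workdir/example"
  fi

If offloading works, the program reports that the region ran on the device. A
report of ``host`` typically indicates that execution fell back to the CPU.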
 156
 157 If your build machine is not the target machine or automatic detection of the
 158 available GPUs failed, you should also set:
 159
 160 - ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx<xyz>;...'`` where ``<xyz>`` is the
 161   shader core instruction set architecture. For instance, set
 162   ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx906;gfx90a'`` to target AMD GCN5
 163   and CDNA2 devices.
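
A hedged sketch of assembling this option from the ``amdgpu-arch`` tool
mentioned above. It assumes the tool is on ``PATH`` and prints one architecture
name per line, and it falls back to the example values from the text when no
GPU is detected.

.. code-block:: shell

  archs=""
  if command -v amdgpu-arch >/dev/null 2>&1; then
    # One line per detected GPU (assumed); drop duplicates and join with ';'.
    archs="$(amdgpu-arch 2>/dev/null | sort -u | paste -sd ';' -)"
  fi

  # Fall back to the example architectures when nothing was detected.
  archs="${archs:-gfx906;gfx90a}"

  # Option to pass to cmake when configuring LLVM.
  echo "-DLIBOMPTARGET_DEVICE_ARCHITECTURES='$archs'"

On a build machine without an AMD GPU the fallback list is used, so replace it
with the architectures of the machines that will run the application.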
 164
 165 Q: What are the known limitations of OpenMP AMDGPU offload?
 166 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 167 ``LD_LIBRARY_PATH`` or rpath/runpath is required to find ``libomp.so`` and ``libomptarget.so``.
 168
 169 There is no libc. That is, ``malloc`` and ``printf`` do not exist. Libm is implemented in terms
 170 of the ROCm device library, which will be searched for if linking with ``-lm``.
 171
 172 Some versions of the driver for the Radeon VII (gfx906) will error unless the
 173 environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is exported.
 174
 175 AMDGPU offloading is a recent addition to LLVM and the implementation differs from that which
 176 has been shipping in ROCm and AOMP for some time. Early adopters will encounter
 177 bugs.
 178
 179 Q: What are the LLVM components used in offloading and how are they found?
 180 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 181 The libraries used by an executable compiled for target offloading are:
 182
 183 - ``libomp.so`` (or similar), the host OpenMP runtime
 184 - ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
 185 - plugins loaded by libomptarget.so:
 186
 187   - ``libomptarget.rtl.amdgpu.so``
 188   - ``libomptarget.rtl.cuda.so``
 189   - ``libomptarget.rtl.x86_64.so``
 190   - ``libomptarget.rtl.ve.so``
 191   - and others
 192
 193 - dependencies of those plugins, e.g. cuda/rocr for nvptx/amdgpu
 194
 195 The compiled executable is dynamically linked against a host runtime, e.g.
 196 ``libomp.so``, and against the target offloading runtime, ``libomptarget.so``. These
 197 are found like any other dynamic library, by setting rpath or runpath on the
 198 executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system search.
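
For the ``LD_LIBRARY_PATH`` route, a minimal sketch looks like the following.
The install prefix is a hypothetical location; depending on the build
configuration, the runtimes may instead live in a target-specific subdirectory
of ``lib``.

.. code-block:: shell

  # Hypothetical install prefix of the LLVM build that provides
  # libomp.so and libomptarget.so.
  LLVM_INSTALL="${LLVM_INSTALL:-$HOME/llvm-install}"

  # Prepend the library directory, keeping any existing entries.
  export LD_LIBRARY_PATH="$LLVM_INSTALL/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  echo "$LD_LIBRARY_PATH"

Programs started from the same shell afterwards will search that directory
first when resolving the OpenMP runtime libraries.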
 199
 200 ``libomptarget.so`` is only supported with its associated ``clang``
 201 compiler. On systems with a globally installed ``libomptarget.so`` this can be
 202 problematic. For this reason it is recommended to use a `Clang configuration
 203 file <https://clang.llvm.org/docs/UsersManual.html#configuration-files>`__ to
 204 automatically configure the environment. For example, store the following file
 205 as ``openmp.cfg`` next to your ``clang`` executable.
 206
 207 .. code-block:: text
 208
 209   # Library paths for OpenMP offloading.
 210   -L '<CFGDIR>/../lib'
 211   -Wl,-rpath='<CFGDIR>/../lib'
 212
 213 The plugins will try to find their dependencies in a plugin-dependent fashion.
 214
 215 The CUDA plugin is dynamically linked against ``libcuda`` if CMake found it at
 216 compiler build time. Otherwise it will attempt to dlopen ``libcuda.so``. It does
 217 not have rpath set.
 218
 219 The AMDGPU plugin is linked against ROCr if CMake found it at compiler build
 220 time. Otherwise it will attempt to dlopen ``libhsa-runtime64.so``. It has rpath
 221 set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same directory is a
 222 way to locate it without environment variables.
 223
 224 In addition to those, there is a compiler runtime library called deviceRTL.
 225 This is compiled from mostly common code into an architecture specific
 226 bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.
 227
 228 Clang and the deviceRTL need to match closely as the interface between them
 229 changes frequently. Using both from the same monorepo checkout is strongly
 230 recommended.
 231
 232 Unlike the host side, which lets environment variables select components, the
 233 deviceRTL that is located in the clang lib directory is preferred. Only if
 234 it is absent is the ``LIBRARY_PATH`` environment variable searched to find a
 235 bitcode file with the right name. This can be overridden by passing a clang
 236 flag, ``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``, which
 237 can specify a directory or an exact bitcode file to use.
 238
 239
 240 Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
 241 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 242 For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.
 243
 244 Q: Does OpenMP offloading support work in packages distributed as part of my OS?
 245 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 246 For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.
 247
 248
 249 .. _math_and_complex_in_target_regions:
 250
 251 Q: Does Clang support `<math.h>` and `<complex.h>` operations in OpenMP target on GPUs?
 252 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 253
 254 Yes, LLVM/Clang allows math functions and complex arithmetic inside of OpenMP
 255 target regions that are compiled for GPUs.
 256
 257 Clang provides a set of wrapper headers that are found first when ``math.h`` and
 258 ``complex.h``, for C, ``cmath`` and ``complex``, for C++, or similar headers are
 259 included by the application. These wrappers will eventually include the system
 260 version of the corresponding header file after setting up a target device
 261 specific environment. Including the system header is important because
 262 system headers differ based on the architecture and operating system and may
 263 contain preprocessor, variable, and function definitions that need to be
 264 available in the target region regardless of the targeted device architecture.
 265 However, various functions may require specialized device versions, e.g.,
 266 ``sin``, and others are only available on certain devices, e.g., ``__umul64hi``. To
 267 provide "native" support for math and complex on the respective architecture,
 268 Clang will wrap the "native" math functions, e.g., as provided by the device
 269 vendor, in an OpenMP begin/end declare variant. These functions will then be
 270 picked up instead of the host versions while host only variables and function
 271 definitions are still available. Complex arithmetic and functions are supported
 272 through a similar mechanism. It is worth noting that this support requires
 273 `extensions to the OpenMP begin/end declare variant context selector
 274 <https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
 275 that are exposed through LLVM/Clang to the user as well.
 276
 277 Q: What is a way to debug errors from mapping memory to a target device?
 278 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 279
 280 An experimental way to debug these errors is to use :ref:`remote process
 281 offloading <remote_offloading_plugin>`.
 282 By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
 283 possible to explicitly perform memory transfers between processes on the host
 284 CPU and run sanitizers while doing so in order to catch these errors.
 285
 286 Q: Can I use dynamically linked libraries with OpenMP offloading?
 287 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 288
 289 Dynamically linked libraries can only be used if there is no device code split
 290 between the library and application. Anything declared on the device inside the
 291 shared library will not be visible to the application when it's linked.
 292
 293 Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
 294 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 295
 296 Enabling the OpenMP runtime will perform a two-stage build for you.
 297 If your host compiler is different from your system-wide compiler, you may need
 298 to add a flag such as
 299 ``--gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12`` to ``CMAKE_{C,CXX}_FLAGS`` so that clang will be
 300 able to find the correct GCC toolchain in the second stage of the build.
 301
 302 For example, if your system-wide GCC installation is too old to build LLVM and
 303 you would like to use a newer GCC, set ``--gcc-install-dir=``
 304 to inform clang of the GCC installation you would like to use in the second stage.
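
As an illustration, the extra configure options could be assembled as shown
below. The GCC path mirrors the example above and is only an assumption about
where a newer GCC is installed on your system.

.. code-block:: shell

  # Assumed GCC installation; the path mirrors the example in the text.
  GCC_INSTALL_DIR="${GCC_INSTALL_DIR:-/usr/lib/gcc/x86_64-linux-gnu/12}"
  TOOLCHAIN_FLAG="--gcc-install-dir=$GCC_INSTALL_DIR"

  # Warn early if the directory is missing on this machine.
  if [ ! -d "$GCC_INSTALL_DIR" ]; then
    echo "warning: $GCC_INSTALL_DIR does not exist on this machine" >&2
  fi

  # Extra options to append to the cmake command line that configures LLVM.
  echo "-DCMAKE_C_FLAGS=$TOOLCHAIN_FLAG -DCMAKE_CXX_FLAGS=$TOOLCHAIN_FLAG"

The printed options are meant to be added to whichever ``cmake`` invocation you
already use to configure the build.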
 305
 306 Q: How can I include OpenMP offloading support in my CMake project?
 307 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 308
 309 Currently, there is an experimental CMake find module for OpenMP target
 310 offloading provided by LLVM. It will attempt to find OpenMP target offloading
 311 support for your compiler. The flags necessary for OpenMP target offloading will
 312 be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
 313 ``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
 314 devices are ``AMDGPU`` and ``NVPTX``.
 315
 316 To use this module, simply add the path to CMake's current module path and call
 317 ``find_package``. The module will be installed with your OpenMP installation by
 318 default. Including OpenMP offloading support in an application should now only
 319 require a few additions.
 320
 321 .. code-block:: cmake
 322
 323   cmake_minimum_required(VERSION 3.20.0)
 324   project(offloadTest VERSION 1.0 LANGUAGES CXX)
 325
 326   list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")
 327
 328   find_package(OpenMPTarget REQUIRED NVPTX)
 329
 330   add_executable(offload)
 331   target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
 332   target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)
 333
 334 Using this module requires at least CMake version 3.20.0. Supported languages
 335 are C and C++ with Fortran support planned in the future. Compiler support is
 336 best for Clang but this module should work for other compiler vendors such as
 337 IBM and GNU.
 338
 339 Q: What does 'Stack size for entry function cannot be statically determined' mean?
 340 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 341
 342 This is a warning that the Nvidia tools will sometimes emit if the offloading
 343 region is too complex. Normally, the CUDA tools attempt to statically determine
 344 how much stack memory each thread needs. This way when the kernel is launched each
 345 thread will have as much memory as it needs. If the control flow of the kernel
 346 is too complex, containing recursive calls or nested parallelism, this analysis
 347 can fail. If this warning is triggered it means that the kernel may run out of
 348 stack memory during execution and crash. The environment variable
 349 ``LIBOMPTARGET_STACK_SIZE`` can be used to increase the stack size if this
 350 occurs.
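
A minimal sketch of raising the limit before running an application. The value
``8192`` is an arbitrary example, and it is assumed here to be interpreted as a
per-thread size in bytes; check the runtime documentation for the exact
semantics on your platform.

.. code-block:: shell

  # 8192 is an arbitrary example value; choose one that fits your kernels.
  export LIBOMPTARGET_STACK_SIZE=8192
  echo "LIBOMPTARGET_STACK_SIZE=$LIBOMPTARGET_STACK_SIZE"
  # Then launch the offloading application from the same shell, e.g. ./example

Because the variable is exported, every program started from that shell
inherits the setting.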
 351
 352 Q: Can OpenMP offloading compile for multiple architectures?
 353 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 354
 355 Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
 356 architectures at once. This allows for executables to be run on different
 357 targets, such as offloading to AMD and NVIDIA GPUs simultaneously, as well as
 358 multiple sub-architectures for the same target. Additionally, static libraries
 359 will only extract archive members if an architecture is used, allowing users to
 360 create generic libraries.
 361
 362 The architecture can be specified manually using ``--offload-arch=``. If
 363 ``--offload-arch=`` is present and no ``-fopenmp-targets=`` flag is present, then the
 364 targets will be inferred from the architectures. Conversely, if
 365 ``-fopenmp-targets=`` is present with no ``--offload-arch=``, then the target
 366 architecture will be set to a default value, usually the architecture supported
 367 by the system LLVM was built on.
 368
 369 For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
 370 given that the necessary build tools are installed for both.
 371
 372 .. code-block:: shell
 373
 374    clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80
 375
 376 Given only the architectures, Clang should be able to infer the triples;
 377 otherwise they can be specified manually.
 378
 379 .. code-block:: shell
 380
 381    clang example.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
 382       -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
 383       -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80
 384
 385 When linking against a static library that contains device code for multiple
 386 architectures, only the images used by the executable will be extracted.
 387
 388 .. code-block:: shell
 389
 390    clang example.c -fopenmp --offload-arch=gfx90a,sm_70,sm_80 -c
 391    llvm-ar rcs libexample.a example.o
 392    clang app.c libexample.a -fopenmp --offload-arch=gfx90a -o app
 393
 394 The supported device images can be viewed using the ``--offloading`` option with
 395 ``llvm-objdump``.
 396
 397 .. code-block:: shell
 398
 399    clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80 -o example
 400    llvm-objdump --offloading example
 401
 402    example:  file format elf64-x86-64
 403
 404    OFFLOADING IMAGE [0]:
 405    kind            elf
 406    arch            gfx90a
 407    triple          amdgcn-amd-amdhsa
 408    producer        openmp
 409
 410    OFFLOADING IMAGE [1]:
 411    kind            elf
 412    arch            sm_80
 413    triple          nvptx64-nvidia-cuda
 414    producer        openmp
 415
 416 Q: Can I link OpenMP offloading with CUDA or HIP?
 417 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 418
 419 OpenMP offloading files can currently be experimentally linked with CUDA and HIP
 420 files. This will allow OpenMP to call a CUDA device function or vice-versa.
 421 However, the global state will be distinct between the two images at runtime.
 422 This means any global variables will potentially have different values when
 423 queried from OpenMP or CUDA.
 424
 425 Linking CUDA and HIP currently requires enabling a different compilation mode
 426 for CUDA / HIP with ``--offload-new-driver`` and linking with
 427 ``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
 428 linkable device image.
 429
 430 .. code-block:: shell
 431
 432    clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
 433    clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
 434    clang++ openmp.o cuda.o --offload-link -o app
 435
 436 Q: Are libomptarget and plugins backward compatible?
 437 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 438
 439 No. libomptarget and plugins are now built as LLVM libraries starting from LLVM
 440 15. Because LLVM libraries are not backward compatible, libomptarget and plugins
 441 are not either. Given that fact, the interfaces between 1) the Clang compiler
 442 and libomptarget, 2) the Clang compiler and device runtime library, and
 443 3) libomptarget and plugins are not guaranteed to be compatible with an earlier
 444 version. Users are responsible for ensuring compatibility when not using the
 445 Clang compiler and runtime libraries from the same build. Nevertheless, in order
 446 to better support third-party libraries and toolchains that depend on existing
 447 libomptarget entry points, contributors are discouraged from making
 448 modifications to them.
 449
 450 Q: Can I use libc functions on the GPU?
 451 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 452
 453 LLVM provides basic ``libc`` functionality through the LLVM C Library. For
 454 building instructions, refer to the associated `LLVM libc documentation
 455 <https://libc.llvm.org/gpu/using.html#building-the-gpu-library>`_. Once built,
 456 this provides a static library called ``libcgpu.a``. See the documentation for a
 457 list of `supported functions <https://libc.llvm.org/gpu/support.html>`_ as well.
 458 To utilize these functions, simply link this library as any other when building
 459 with OpenMP.
 460
 461 .. code-block:: shell
 462
 463    clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu
 464
 465 For more information on how this is implemented in LLVM/OpenMP's offloading
 466 runtime, refer to the :ref:`runtime documentation <libomptarget_libc>`.
 467
 468 Q: What command line options can I use for OpenMP?
 469 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 470 We recommend taking a look at the OpenMP
 471 :doc:`command line argument reference <CommandLineArgumentReference>` page.
 472
 473 Q: Can I build the offloading runtimes without CUDA or HSA?
 474 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 475 By default, the offloading runtime will load the associated vendor runtime
 476 during initialization rather than directly linking against it. This allows the
 477 program to be built and run on many machines. If you wish to directly link
 478 against these libraries, use the ``LIBOMPTARGET_DLOPEN_PLUGINS=""`` option to
 479 suppress dynamic loading for each plugin. The default value is every plugin enabled with
 480 ``LIBOMPTARGET_PLUGINS_TO_BUILD``.
 481
 482 Q: Why is my build taking a long time?
 483 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 484 When installing OpenMP and other LLVM components, the build time on multicore
 485 systems can be significantly reduced with parallel build jobs. As suggested in
 486 *LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the
 487 generator. This can be done by passing ``-G Ninja`` to ``cmake``. Afterward,
 488 use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build
 489 time can also be reduced by setting the build type to ``Release`` with the
 490 ``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous
 491 compilations. Consider enabling ``ccache`` with
 492 ``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
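
The suggestions above can be combined roughly as follows. This is a sketch
under stated assumptions: the checkout location is hypothetical, and the other
configure options (projects, runtimes, install prefix) are omitted for brevity.

.. code-block:: shell

  # Use every available core; fall back to 4 jobs if nproc is unavailable.
  JOBS="$(nproc 2>/dev/null || echo 4)"

  # Only request ccache when it is actually installed.
  LAUNCHER_ARGS=""
  if command -v ccache >/dev/null 2>&1; then
    LAUNCHER_ARGS="-DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
  fi

  # Hypothetical checkout location; configure and build only if it exists.
  LLVM_SRC="${LLVM_SRC:-$HOME/llvm-project}"
  if [ -d "$LLVM_SRC/llvm" ] && command -v cmake >/dev/null 2>&1; then
    # LAUNCHER_ARGS is intentionally unquoted so it splits into separate options.
    cmake -S "$LLVM_SRC/llvm" -B "$LLVM_SRC/build" -G Ninja \
      -DCMAKE_BUILD_TYPE=Release $LAUNCHER_ARGS &&
      ninja -C "$LLVM_SRC/build" -j "$JOBS"
  else
    echo "checkout or cmake not found; would build with $JOBS parallel jobs" >&2
  fi

Setting the C launcher in addition to the C++ launcher is a choice made in this
sketch so that C sources are cached as well.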
 493
 494 Q: Did this FAQ not answer your question?
 495 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 496 Feel free to post questions or browse old threads at
 497 `LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.