Support, Getting Involved, and FAQ
==================================

Please do not hesitate to reach out to us on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`_ or join
one of our :ref:`regular calls <calls>`. Some common questions are answered in
the FAQ below.

.. _calls:

OpenMP in LLVM Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP (and OpenACC) in the LLVM Project, including
  Clang, optimization, and runtime work.
- Join the `OpenMP in LLVM Technical Call <https://bluejeans.com/544112769//webrtc>`__.
- Time: Weekly, every Wednesday at 7:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1Tz8WFN13n7yJ-SCE0Qjqf9LmjGUw0dWO9Ts1ss4YOdg/edit>`__.
- Status tracking `page <https://openmp.llvm.org/docs>`__.

OpenMP in Flang Technical Call
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Development updates on OpenMP and OpenACC in the Flang Project.
- Join the `OpenMP in Flang Technical Call <https://bit.ly/39eQW3o>`_.
- Time: Weekly, every Thursday at 8:00 AM Pacific time.
- Meeting minutes are `here <https://docs.google.com/document/d/1yA-MeJf6RYY-ZXpdol0t7YoDoqtwAyBhFLr5thu5pFI>`__.
- Status tracking `page <https://docs.google.com/spreadsheets/d/1FvHPuSkGbl4mQZRAwCIndvQx9dQboffiD-xD0oqxgU0/edit#gid=0>`__.

The FAQ is a work in progress and most of the expected content is not
yet available. While you can expect changes, we always welcome feedback and
additions. Please post on the `Discourse forums (Runtimes - OpenMP) <https://discourse.llvm.org/c/runtimes/openmp/35>`__.

Q: How to contribute a patch to the webpage or any other part?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All patches go through the regular `LLVM review process
<https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.

.. _build_offload_capable_compiler:

Q: How to build an OpenMP GPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To build an *effective* OpenMP offload capable compiler, only one extra CMake
option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (generic
information about building LLVM is available `here
<https://llvm.org/docs/GettingStarted.html>`__). Make sure all backends that
are targeted by OpenMP are enabled. That can be done by adjusting the CMake
option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to
AMD and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
Clang will be built with all backends enabled. When building with
``LLVM_ENABLE_RUNTIMES="openmp"``, OpenMP should not be enabled in
``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
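
For illustration, a configure invocation following this advice might look like
the following; the source layout and install prefix are placeholders, not
required locations:

```shell
# Configure LLVM with clang and the OpenMP runtimes enabled. The NVPTX and
# AMDGPU backends are part of the default LLVM_TARGETS_TO_BUILD, so they do
# not need to be listed explicitly here.
cmake -G Ninja ../llvm-project/llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_ENABLE_RUNTIMES=openmp \
  -DCMAKE_INSTALL_PREFIX=$HOME/llvm-install
```
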

For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.

.. note::
   The compiler that generates the offload code should be the same (version) as
   the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
   can be built by a different compiler.

.. _advanced_builds: https://llvm.org/docs/AdvancedBuilds.html

.. _build_nvidia_offload_capable_compiler:

Q: How to build an OpenMP Nvidia offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The CUDA SDK is required on the machine that will execute the OpenMP application.

If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_<xy>,...`` where ``<xy>`` is the numeric
  compute capability of your GPU. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_70,sm_80`` to target the Nvidia Volta
  and Ampere architectures.
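
If you are unsure of your GPU's compute capability, recent NVIDIA drivers can
report it directly; note this query is an assumption about your driver version
and may be unavailable on older installations:

```shell
# Print the compute capability, e.g. "8.0" for an Ampere A100, which maps to
# sm_80 in LIBOMPTARGET_DEVICE_ARCHITECTURES. Requires a reasonably new driver.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```
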

.. _build_amdgpu_offload_capable_compiler:

Q: How to build an OpenMP AMDGPU offload capable compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
required to build the LLVM toolchain and to execute the OpenMP application.
Either install ROCm somewhere that CMake's ``find_package`` can locate it, or
build the required subcomponents ROCt and ROCr from source.

The two components used are ROCT-Thunk-Interface (ROCt) and ROCR-Runtime
(ROCr). ROCt is the userspace part of the Linux driver. It calls into the
driver which ships with the Linux kernel. It is an implementation detail of
ROCr from OpenMP's perspective. ROCr is an implementation of `HSA
<http://www.hsafoundation.com>`_.

.. code-block:: shell

   SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
   BUILD_DIR=somewhere            # scratch build directory for roct and rocr
   INSTALL_PREFIX=same-as-llvm-install

   cd $SOURCE_DIR # clone next to the llvm-project checkout
   git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x
   git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x

   cd $BUILD_DIR && mkdir roct && cd roct
   cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
     -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
   make && make install

   cd $BUILD_DIR && mkdir rocr && cd rocr
   cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
     -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
     -DBUILD_SHARED_LIBS=ON
   make && make install

``IMAGE_SUPPORT`` requires building ROCr with Clang and is not used by OpenMP.

Provided CMake's ``find_package`` can find the ROCR-Runtime package, LLVM will
build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
run if it recognises a GPU on the local system. LLVM will also build a shared
library, ``libomptarget.rtl.amdgpu.so``, which is linked against ROCr.

With those libraries installed and LLVM built and installed, try:

.. code-block:: shell

   clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example


If your build machine is not the target machine or automatic detection of the
available GPUs failed, you should also set:

- ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx<xyz>,...`` where ``<xyz>`` is the
  shader core instruction set architecture. For instance, set
  ``LIBOMPTARGET_DEVICE_ARCHITECTURES=gfx906,gfx90a`` to target the AMD GCN5
  and CDNA2 architectures.

Q: What are the known limitations of OpenMP AMDGPU offload?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``LD_LIBRARY_PATH`` or rpath/runpath are required to find ``libomp.so`` and
``libomptarget.so``.

There is no libc. That is, ``malloc`` and ``printf`` do not exist. Libm is
implemented in terms of the ROCm device library, which will be searched for
when linking with ``-lm``.

Some versions of the driver for the Radeon VII (gfx906) will error unless the
environment variable ``HSA_IGNORE_SRAMECC_MISREPORT=1`` is set.
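
As a sketch, an environment for running such binaries might be set up like
this; ``LLVM_INSTALL`` is a placeholder for your actual install prefix, not a
variable the runtime reads:

```shell
# Make libomp.so and libomptarget.so visible to the dynamic linker.
export LLVM_INSTALL=$HOME/llvm-install
export LD_LIBRARY_PATH=$LLVM_INSTALL/lib:$LD_LIBRARY_PATH
# Work around the gfx906 driver misreport mentioned above.
export HSA_IGNORE_SRAMECC_MISREPORT=1
```
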

OpenMP offload for AMDGPU is a recent addition to LLVM, and the implementation
differs from that which has been shipping in ROCm and AOMP for some time. Early
adopters will encounter bugs.

Q: What are the LLVM components used in offloading and how are they found?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The libraries used by an executable compiled for target offloading are:

- ``libomp.so`` (or similar), the host OpenMP runtime
- ``libomptarget.so``, the target-agnostic target offloading OpenMP runtime
- plugins loaded by ``libomptarget.so``:

  - ``libomptarget.rtl.amdgpu.so``
  - ``libomptarget.rtl.cuda.so``
  - ``libomptarget.rtl.x86_64.so``
  - ``libomptarget.rtl.ve.so``

- dependencies of those plugins, e.g. cuda/rocr for nvptx/amdgpu

The compiled executable is dynamically linked against a host runtime, e.g.
``libomp.so``, and against the target offloading runtime, ``libomptarget.so``.
These are found like any other dynamic library, by setting rpath or runpath on
the executable, by setting ``LD_LIBRARY_PATH``, or by adding them to the system
search path.

``libomptarget.so`` is only supported to work with the associated ``clang``
compiler. On systems with a globally installed ``libomptarget.so`` this can be
problematic. For this reason it is recommended to use a `Clang configuration
file <https://clang.llvm.org/docs/UsersManual.html#configuration-files>`__ to
automatically configure the environment. For example, store the following file
as ``openmp.cfg`` next to your ``clang`` executable.

.. code-block:: none

   # Library paths for OpenMP offloading.
   -Wl,-rpath='<CFGDIR>/../lib'


The plugins will try to find their dependencies in a plugin-dependent fashion.

The cuda plugin is dynamically linked against libcuda if cmake found it at
compiler build time. Otherwise it will attempt to dlopen ``libcuda.so``. It does
not have rpath set.

The amdgpu plugin is linked against ROCr if cmake found it at compiler build
time. Otherwise it will attempt to dlopen ``libhsa-runtime64.so``. It has rpath
set to ``$ORIGIN``, so installing ``libhsa-runtime64.so`` in the same directory
is a way to locate it without environment variables.

In addition to those, there is a compiler runtime library called deviceRTL.
This is compiled from mostly common code into an architecture specific
bitcode library, e.g. ``libomptarget-nvptx-sm_70.bc``.

Clang and the deviceRTL need to match closely as the interface between them
changes frequently. Using both from the same monorepo checkout is strongly
recommended.

Unlike the host side, where environment variables can select components, the
deviceRTL located in the clang lib directory is preferred. Only if it is
absent is the ``LIBRARY_PATH`` environment variable searched for a bitcode
file with the right name. This can be overridden by passing a clang flag,
``--libomptarget-nvptx-bc-path`` or ``--libomptarget-amdgcn-bc-path``, which
can specify a directory or an exact bitcode file to use.
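
As an illustration, the override might be passed like this; the bitcode path
shown is a placeholder, not a location shipped by any toolchain:

```shell
# Point clang at an explicit deviceRTL bitcode file instead of relying on the
# default lookup in the clang lib directory or LIBRARY_PATH.
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
  --libomptarget-nvptx-bc-path=/opt/llvm/lib/libomptarget-nvptx-sm_70.bc \
  example.c -o example
```
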

Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

Q: Does OpenMP offloading support work in packages distributed as part of my OS?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For now, the answer is most likely *no*. Please see :ref:`build_offload_capable_compiler`.

.. _math_and_complex_in_target_regions:

Q: Does Clang support ``<math.h>`` and ``<complex.h>`` operations in OpenMP target on GPUs?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Yes, LLVM/Clang allows math functions and complex arithmetic inside OpenMP
target regions that are compiled for GPUs.

Clang provides a set of wrapper headers that are found first when ``math.h`` and
``complex.h``, for C, or ``cmath`` and ``complex``, for C++, or similar headers
are included by the application. These wrappers will eventually include the
system version of the corresponding header file after setting up a target
device specific environment. The fact that the system header is included is
important because system headers differ based on the architecture and operating
system and may contain preprocessor, variable, and function definitions that
need to be available in the target region regardless of the targeted device
architecture. However, various functions may require specialized device
versions, e.g., ``sin``, and others are only available on certain devices,
e.g., ``__umul64hi``. To provide "native" support for math and complex on the
respective architecture, Clang will wrap the "native" math functions, e.g., as
provided by the device vendor, in an OpenMP begin/end declare variant. These
functions will then be picked up instead of the host versions while host-only
variable and function definitions are still available. Complex arithmetic and
functions are supported through a similar mechanism. It is worth noting that
this support requires `extensions to the OpenMP begin/end declare variant
context selector
<https://clang.llvm.org/docs/AttributeReference.html#pragma-omp-declare-variant>`__
that are exposed through LLVM/Clang to the user as well.

Q: What is a way to debug errors from mapping memory to a target device?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An experimental way to debug these errors is to use :ref:`remote process
offloading <remote_offloading_plugin>`.
By using ``libomptarget.rtl.rpc.so`` and ``openmp-offloading-server``, it is
possible to explicitly perform memory transfers between processes on the host
CPU and run sanitizers while doing so in order to catch these errors.

Q: Can I use dynamically linked libraries with OpenMP offloading?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dynamically linked libraries can only be used if there is no device code split
between the library and the application. Anything declared on the device inside
the shared library will not be visible to the application when it is linked.
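
One workable pattern is to keep every target region inside the shared library
and expose only host-side entry points to the application. A hedged sketch,
with hypothetical file names:

```shell
# All target regions live in kernels.c; the application only calls host
# functions exported by the library, so no device code crosses the boundary.
clang -fopenmp --offload-arch=gfx90a -fPIC -shared kernels.c -o libkernels.so
clang -fopenmp app.c -L. -lkernels -Wl,-rpath,'$ORIGIN' -o app
```
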

Q: How to build an OpenMP offload capable compiler with an outdated host compiler?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Enabling the OpenMP runtime will perform a two-stage build for you.
If your host compiler is different from your system-wide compiler, you may need
to set the CMake variable ``GCC_INSTALL_PREFIX`` so clang will be able to find
the correct GCC toolchain in the second stage of the build.

For example, if your system-wide GCC installation is too old to build LLVM and
you would like to use a newer GCC, set the CMake variable ``GCC_INSTALL_PREFIX``
to inform clang of the GCC installation you would like to use in the second stage.
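
A configure sketch, assuming a newer GCC lives under ``/opt/gcc-12`` (the path
is a placeholder for your actual installation):

```shell
# Build clang and the OpenMP runtimes, telling the second stage where to find
# a modern GCC toolchain instead of the outdated system-wide one.
cmake -G Ninja ../llvm-project/llvm \
  -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_ENABLE_RUNTIMES=openmp \
  -DGCC_INSTALL_PREFIX=/opt/gcc-12
```
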

Q: How can I include OpenMP offloading support in my CMake project?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Currently, there is an experimental CMake find module for OpenMP target
offloading provided by LLVM. It will attempt to find OpenMP target offloading
support for your compiler. The flags necessary for OpenMP target offloading
will be loaded into the ``OpenMPTarget::OpenMPTarget_<device>`` target or the
``OpenMPTarget_<device>_FLAGS`` variable if successful. Currently supported
devices are ``AMDGPU`` and ``NVPTX``.

To use this module, simply add the path to CMake's current module path and call
``find_package``. The module will be installed with your OpenMP installation by
default. Including OpenMP offloading support in an application should now only
require a few additions.

.. code-block:: cmake

   cmake_minimum_required(VERSION 3.20.0)
   project(offloadTest VERSION 1.0 LANGUAGES CXX)

   list(APPEND CMAKE_MODULE_PATH "${PATH_TO_OPENMP_INSTALL}/lib/cmake/openmp")

   find_package(OpenMPTarget REQUIRED NVPTX)

   add_executable(offload)
   target_link_libraries(offload PRIVATE OpenMPTarget::OpenMPTarget_NVPTX)
   target_sources(offload PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/src/Main.cpp)


Using this module requires at least CMake version 3.20.0. Supported languages
are C and C++, with Fortran support planned for the future. Compiler support is
best for Clang, but this module should work for other compiler vendors as well.

Q: What does 'Stack size for entry function cannot be statically determined' mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a warning that the Nvidia tools will sometimes emit if the offloading
region is too complex. Normally, the CUDA tools attempt to statically determine
how much stack memory each thread requires. This way, when the kernel is
launched, each thread will have as much memory as it needs. If the control flow
of the kernel is too complex, containing recursive calls or nested parallelism,
this analysis can fail. If this warning is triggered it means that the kernel
may run out of stack memory during execution and crash. The environment
variable ``LIBOMPTARGET_STACK_SIZE`` can be used to increase the stack size if
this is the case.
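
For example, the variable can be exported before launching the application;
the value and executable name below are only illustrative:

```shell
# Request an 8 KiB per-thread device stack; "example" stands in for your
# offloading executable.
export LIBOMPTARGET_STACK_SIZE=8192
# ./example
```
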

Q: Can OpenMP offloading compile for multiple architectures?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since LLVM version 15.0, OpenMP offloading supports offloading to multiple
architectures at once. This allows for executables to be run on different
targets, such as offloading to AMD and NVIDIA GPUs simultaneously, as well as
multiple sub-architectures for the same target. Additionally, static libraries
will only extract archive members if an architecture is used, allowing users
to create generic libraries.

The architectures can be specified manually using ``--offload-arch=``. If
``--offload-arch=`` is present but no ``-fopenmp-targets=`` flag is given, the
targets will be inferred from the architectures. Conversely, if
``-fopenmp-targets=`` is present with no ``--offload-arch``, the target
architecture will be set to a default value, usually the architecture supported
by the system LLVM was built on.

For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
given that the necessary build tools are installed for both.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80


If just given the architectures we should be able to infer the triples,
otherwise we can specify them manually.

.. code-block:: shell

   clang example.c -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,nvptx64-nvidia-cuda \
     -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx90a \
     -Xopenmp-target=nvptx64-nvidia-cuda --offload-arch=sm_80


When linking against a static library that contains device code for multiple
architectures, only the images used by the executable will be extracted.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx906,gfx90a,sm_70,sm_80 -c
   llvm-ar rcs libexample.a example.o
   clang app.c -fopenmp --offload-arch=gfx90a -o app


The supported device images can be viewed using the ``--offloading`` option
with ``llvm-objdump``.

.. code-block:: shell

   clang example.c -fopenmp --offload-arch=gfx90a --offload-arch=sm_80 -o example
   llvm-objdump --offloading example

   a.out: file format elf64-x86-64

   OFFLOADING IMAGE [0]:
   triple          amdgcn-amd-amdhsa

   OFFLOADING IMAGE [1]:
   triple          nvptx64-nvidia-cuda


Q: Can I link OpenMP offloading with CUDA or HIP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

OpenMP offloading files can currently be experimentally linked with CUDA and
HIP files. This will allow OpenMP to call a CUDA device function or vice versa.
However, the global state will be distinct between the two images at runtime.
This means any global variables will potentially have different values when
queried from OpenMP or CUDA.

Linking CUDA and HIP currently requires enabling a different compilation mode
for CUDA / HIP with ``--offload-new-driver`` and linking with
``--offload-link``. Additionally, ``-fgpu-rdc`` must be used to create a
linkable device image.

.. code-block:: shell

   clang++ openmp.cpp -fopenmp --offload-arch=sm_80 -c
   clang++ cuda.cu --offload-new-driver --offload-arch=sm_80 -fgpu-rdc -c
   clang++ openmp.o cuda.o --offload-link -o app


Q: Are libomptarget and plugins backward compatible?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

No. libomptarget and plugins are now built as LLVM libraries starting from LLVM
15. Because LLVM libraries are not backward compatible, neither are libomptarget
and the plugins. Given that fact, the interfaces between 1) the Clang compiler
and libomptarget, 2) the Clang compiler and the device runtime library, and
3) libomptarget and the plugins are not guaranteed to be compatible with an
earlier version. Users are responsible for ensuring compatibility when not
using the Clang compiler and runtime libraries from the same build.
Nevertheless, in order to better support third-party libraries and toolchains
that depend on existing libomptarget entry points, contributors are discouraged
from making modifications to them.

Q: Can I use libc functions on the GPU?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM provides basic ``libc`` functionality through the LLVM C Library. For
building instructions, refer to the associated `LLVM libc documentation
<https://libc.llvm.org/gpu/using.html#building-the-gpu-library>`_. Once built,
this provides a static library called ``libcgpu.a``. See the documentation for
a list of `supported functions <https://libc.llvm.org/gpu/support.html>`_ as
well. To utilize these functions, simply link this library as any other when
building with offloading.

.. code-block:: shell

   clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu

For more information on how this is implemented in LLVM/OpenMP's offloading
runtime, refer to the :ref:`runtime documentation <libomptarget_libc>`.

Q: What command line options can I use for OpenMP?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We recommend taking a look at the OpenMP
:doc:`command line argument reference <CommandLineArgumentReference>` page.

Q: Why is my build taking a long time?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When installing OpenMP and other LLVM components, the build time on multicore
systems can be significantly reduced with parallel build jobs. As suggested in
*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja``
as the generator. This can be done with the CMake option ``cmake -G Ninja``.
Afterward, use ``ninja install`` and specify the number of parallel jobs with
``-j``. The build time can also be reduced by setting the build type to
``Release`` with the ``CMAKE_BUILD_TYPE`` option. Recompilation can also be
sped up by caching previous compilations. Consider enabling ``Ccache`` with
``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
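
Putting those suggestions together, a build might be configured like this; the
source path and job count are placeholders to adjust for your machine:

```shell
# Release build with the Ninja generator and ccache-backed recompilation.
cmake -G Ninja ../llvm-project/llvm \
  -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_ENABLE_RUNTIMES=openmp \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
ninja install -j 8
```
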

Q: Did this FAQ not answer your question?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Feel free to post questions or browse old threads at
`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.