OpenMP Command-Line Argument Reference
======================================
Welcome to the OpenMP in LLVM command-line argument reference. It is not a
complete list of arguments, but it covers the essential command-line
arguments you may need when compiling and linking OpenMP programs.
Section :ref:`general_command_line_arguments` lists OpenMP command-line options
for multicore programming, while :ref:`offload_command_line_arguments` lists
options relevant to OpenMP target offloading.

.. _general_command_line_arguments:

OpenMP Command-Line Arguments
-----------------------------

``-fopenmp``
^^^^^^^^^^^^
Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
compiler directives and generate parallel code.
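
As a minimal sketch of the kind of code ``-fopenmp`` acts on, the C function
below uses a ``parallel for`` directive with a reduction. The function name
``parallel_sum`` and the file name ``sum.c`` mentioned afterwards are
hypothetical and chosen only for illustration.

```c
/* Sum the integers 0 .. n-1.
 * With -fopenmp, the loop iterations are distributed across threads and the
 * partial sums are combined by the reduction clause. Without -fopenmp, the
 * pragma is ignored and the loop runs serially with the same result. */
long parallel_sum(long n) {
    long sum = 0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += i;
    }
    return sum;
}
```

Assuming the function is saved in ``sum.c``, it could be compiled with
``clang -fopenmp -c sum.c``.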

``-fopenmp-extensions``
^^^^^^^^^^^^^^^^^^^^^^^
Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
current extensions and their implementation status can be found on the
`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
page.

``-fopenmp-simd``
^^^^^^^^^^^^^^^^^
This option enables OpenMP only for single instruction, multiple data
(SIMD) constructs.
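
For illustration, a loop annotated with the ``simd`` construct might look like
the sketch below. The function name ``saxpy`` is an assumption made for this
example. Under ``-fopenmp-simd``, the ``simd`` directive is honored while
threading directives such as ``parallel`` are not acted upon.

```c
/* Compute y[i] = a * x[i] + y[i] for i in [0, n).
 * The simd directive asks the compiler to vectorize the loop; the result is
 * the same whether or not the directive is honored. */
void saxpy(float a, const float *x, float *y, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```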

``-static-openmp``
^^^^^^^^^^^^^^^^^^
Use the static OpenMP host runtime while linking.

``-fopenmp-version=<arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^
Select version ``<arg>`` of the OpenMP standard.
For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
the OpenMP standard. The default value is ``-fopenmp-version=51`` for ``Clang``.
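
One way to check which version is in effect is the ``_OPENMP`` macro, which
the OpenMP specification defines as a ``yyyymm`` date. The sketch below uses a
hypothetical helper name, ``openmp_version_macro``. The values in the comment
are the dates the specification assigns to versions 4.5 and 5.1.

```c
/* Return the value of the _OPENMP macro, or 0 when OpenMP is disabled.
 * Per the OpenMP specification, 201511 corresponds to version 4.5 and
 * 202011 to version 5.1. */
long openmp_version_macro(void) {
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}
```

When compiled with ``-fopenmp -fopenmp-version=45``, the function is expected
to return ``201511``.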

.. _offload_command_line_arguments:

Offloading Specific Command-Line Arguments
------------------------------------------

.. _fopenmp-targets:

``-fopenmp-targets``
^^^^^^^^^^^^^^^^^^^^
| Specify which OpenMP offloading targets should be supported. For example, you
  may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option can
  often be omitted when :ref:`offload_arch` is provided.
| It is also possible to offload to CPU architectures, for instance with
  ``-fopenmp-targets=x86_64-pc-linux-gnu``.

.. _offload_arch:

``--offload-arch``
^^^^^^^^^^^^^^^^^^
| Specify the device architecture for OpenMP offloading. For instance, use
  ``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
  ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
  ``--offload-arch=sm_80,gfx90a`` to target both.
| It is also possible to specify :ref:`fopenmp-targets` without specifying
  ``--offload-arch``. In that case, the compiler driver runs the
  ``amdgpu-arch`` or ``nvptx-arch`` executable to detect the device
  architecture automatically.
| Finally, the device architecture is also inferred automatically with
  ``--offload-arch=native``.
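
To make the offloading options more concrete, the sketch below shows a target
region that these flags would compile for a device. The function name
``vector_add`` and the file name ``vector_add.c`` are hypothetical. If no
device is available, or the file is built without offloading flags, the loop
runs on the host instead.

```c
/* Element-wise vector addition c = a + b over n elements.
 * The target construct requests offloading to the device; the map clauses
 * copy a and b to the device and copy c back to the host. */
void vector_add(const float *a, const float *b, float *c, int n) {
    #pragma omp target teams distribute parallel for \
        map(to : a[0:n], b[0:n]) map(from : c[0:n])
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Assuming an Nvidia A100, this could be compiled with
``clang -fopenmp --offload-arch=sm_80 -c vector_add.c``.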

``--offload-device-only``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the device. This option is mainly intended
for debugging, primarily for inspecting the intermediate representation (IR)
output when compiling for the device. It may also be used when creating
device-only runtimes.

``--offload-host-only``
^^^^^^^^^^^^^^^^^^^^^^^
Compile only the code that goes on the host. With this option enabled, the
``.llvm.offloading`` section with embedded device code will not be included in
the intermediate representation.

``--offload-host-device``
^^^^^^^^^^^^^^^^^^^^^^^^^
Compile the target regions for both the host and the device. This is the
default option.
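
As a rough illustration of what "device code" and "host code" refer to in the
three options above, consider the sketch below. The names ``square`` and
``sum_of_squares`` are hypothetical. The function inside the
``declare target`` block and the body of the target region are compiled for
the device; the rest of ``sum_of_squares``, along with a host version of the
target region by default, is compiled for the host.

```c
/* square() is marked so that a device version is generated in addition to
 * the host version. */
#pragma omp declare target
int square(int x) { return x * x; }
#pragma omp end declare target

/* Sum of i*i for i in [0, n). The loop is the target region; the reduction
 * variable is mapped to the device and back. */
int sum_of_squares(int n) {
    int total = 0;
    #pragma omp target teams distribute parallel for \
        reduction(+ : total) map(tofrom : total)
    for (int i = 0; i < n; ++i) {
        total += square(i);
    }
    return total;
}
```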

``-Xopenmp-target <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain, for instance
``-Xopenmp-target -march=sm_80``.

``-Xopenmp-target=<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading toolchain for the target
``<triple>``. This is especially useful when an argument must differ for each
triple. For instance, use ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
architecture for each target. Alternatively, :ref:`Xarch_host` and
:ref:`Xarch_device` can pass an argument to the host and device compilation
toolchains, respectively.

``-Xoffload-linker<triple> <arg>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the offloading linker for the target specified in
``<triple>``.

.. _Xarch_device:

``-Xarch_device <arg>``
^^^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the device compilation toolchain.

.. _Xarch_host:

``-Xarch_host <arg>``
^^^^^^^^^^^^^^^^^^^^^
Pass an argument ``<arg>`` to the host compilation toolchain.

``-foffload-lto[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^
Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
less time while still achieving some performance gains. If no argument is set,
this option defaults to ``-foffload-lto=full``.

``-fopenmp-offload-mandatory``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| This option avoids generating the host fallback code that is
  executed when offloading to the device fails. This is
  helpful when the target region contains code that cannot be compiled for the
  host, for instance, if it contains unguarded device intrinsics.
| This option can also be used to reduce compile time.
| This option should not be used when one wants to verify that the code is being
  offloaded to the device. Instead, set the environment variable
  ``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to
  the device.
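
In addition to the environment variable, a program can check where a target
region ran by calling the OpenMP runtime routine ``omp_is_initial_device()``
inside the region. The sketch below is only illustrative, and the function
name ``target_region_ran_on_host`` is hypothetical. The ``_OPENMP`` guards
keep the file compilable when OpenMP is disabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return 1 if the target region executed on the host (fallback or OpenMP
 * disabled) and 0 if it executed on an offload device. */
int target_region_ran_on_host(void) {
    int on_host = 1;
    #pragma omp target map(from : on_host)
    {
#ifdef _OPENMP
        /* Nonzero on the host, zero on a device. */
        on_host = omp_is_initial_device() ? 1 : 0;
#endif
    }
    return on_host;
}
```

With ``OMP_TARGET_OFFLOAD`` set to ``MANDATORY``, the OpenMP specification
says the program terminates if the region cannot be offloaded, so a return
value of ``1`` would not be expected in that configuration.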

``-fopenmp-target-debug[=<arg>]``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Enable debugging in the device runtime library (RTL). Note that it is
necessary both to configure debugging in the device runtime at compile time
with ``-fopenmp-target-debug=<arg>`` and to enable debugging at runtime with
the environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. As of July
2023, this is only supported for Nvidia targets. Alternatively, the
environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
AMD GPU targets. For more information, see the
`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_,
which list the supported debugging arguments.

``-fopenmp-target-jit``
^^^^^^^^^^^^^^^^^^^^^^^
| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed
  LLVM-IR for the device code in the object files rather than binary code for the
  respective target. At runtime, the LLVM-IR is optimized again and compiled for
  the target device. The optimization level can be set at runtime with
  ``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
  ``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimization level ``-O3``.
  See the
  `OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
  for instructions on extracting the embedded device code before or after the
  JIT and more.
| JIT for OpenMP offloading is particularly useful for debugging, as
  the target IR can be extracted, modified, and injected at runtime.

``--offload-new-driver``
^^^^^^^^^^^^^^^^^^^^^^^^
In upstream LLVM, OpenMP only uses the new driver. However, this option must
be enabled for experimental linking with CUDA or HIP files.

``--offload-link``
^^^^^^^^^^^^^^^^^^
Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option
enables the new offloading linker in toolchains that do not use it
automatically. It is necessary to enable this option when linking with CUDA or
HIP files.

``-nogpulib``
^^^^^^^^^^^^^
Do not link the device library for CUDA or HIP device compilation.

``-nogpuinc``
^^^^^^^^^^^^^
Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
include paths.