clang/docs/HIPSupport.rst

.. raw:: html

  <style type="text/css">
    .none { background-color: #FFCCCC }
    .part { background-color: #FFFF99 }
    .good { background-color: #CCFF99 }
  </style>

.. role:: none
.. role:: part
.. role:: good

.. contents::
   :local:

=============
HIP Support
=============

HIP (Heterogeneous-Compute Interface for Portability) `<https://github.com/ROCm-Developer-Tools/HIP>`_ is
a C++ Runtime API and Kernel Language. It enables developers to create portable applications for
offloading computation to different hardware platforms from a single source code base.

AMD GPU Support
===============

Clang provides HIP support on AMD GPUs via the ROCm platform `<https://rocm.docs.amd.com/en/latest/#>`_.
The ROCm runtime forms the base for HIP host APIs, while HIP device APIs are realized through HIP header
files and the ROCm device library. The Clang driver uses the HIPAMD toolchain to compile HIP device code
to AMDGPU ISA via the AMDGPU backend. The compiled code is then bundled and embedded in the host executables.

Intel GPU Support
=================

Clang provides partial HIP support on Intel GPUs using the CHIP-Star project `<https://github.com/CHIP-SPV/chipStar>`_.
CHIP-Star implements the HIP runtime on top of the oneAPI Level Zero or OpenCL runtime. The Clang driver uses the HIPSPV
toolchain to compile HIP device code into LLVM IR, which is subsequently translated to SPIR-V via the SPIR-V
backend or the out-of-tree LLVM-SPIRV translator. The SPIR-V is then bundled and embedded into the host executables.

.. note::
   While Clang does not directly provide HIP support for NVIDIA GPUs and CPUs, these platforms are supported via other means:

   - NVIDIA GPUs: HIP support is offered through the HIP project `<https://github.com/ROCm-Developer-Tools/HIP>`_, which provides a header-only library for translating HIP runtime APIs into CUDA runtime APIs. The code is subsequently compiled using NVIDIA's ``nvcc``.

   - CPUs: HIP support is available through the HIP-CPU runtime library `<https://github.com/ROCm-Developer-Tools/HIP-CPU>`_. This header-only library enables CPUs to execute unmodified HIP code.


Example Usage
=============

To compile a HIP program, use the following command:

.. code-block:: shell

   clang++ -c --offload-arch=gfx906 -xhip sample.cpp -o sample.o

The ``-xhip`` option indicates that the source is a HIP program. If the file has a ``.hip`` extension,
Clang will automatically recognize it as a HIP program:

.. code-block:: shell

   clang++ -c --offload-arch=gfx906 sample.hip -o sample.o

To link a HIP program, use this command:

.. code-block:: shell

   clang++ --hip-link --offload-arch=gfx906 sample.o -o sample

In the above command, the ``--hip-link`` flag instructs Clang to link the HIP runtime library. However,
this flag is unnecessary when the command line already contains a HIP input file.

For convenience, Clang also supports compiling and linking in a single step:

.. code-block:: shell

   clang++ --offload-arch=gfx906 -xhip sample.cpp -o sample

In the above commands, ``gfx906`` is the GPU architecture that the code is being compiled for. The supported GPU
architectures can be found in the `AMDGPU Processor Table <https://llvm.org/docs/AMDGPUUsage.html#processors>`_.
Alternatively, you can use the ``amdgpu-arch`` tool that comes with Clang to list the GPU architectures on your system:

.. code-block:: shell

   amdgpu-arch

You can use ``--offload-arch=native`` to automatically detect the GPU architectures on your system:

.. code-block:: shell

   clang++ --offload-arch=native -xhip sample.cpp -o sample


Path Setting for Dependencies
=============================

Compiling a HIP program depends on the HIP runtime and device library. The paths to the HIP runtime and device libraries
can be specified either using compiler options or environment variables. The paths can also be set through the ROCm path
if they follow the ROCm installation directory structure.

Order of Precedence for HIP Path
--------------------------------

1. ``--hip-path`` compiler option
2. ``HIP_PATH`` environment variable *(use with caution)*
3. ``--rocm-path`` compiler option
4. ``ROCM_PATH`` environment variable *(use with caution)*
5. Default automatic detection (relative to Clang or at the default ROCm installation location)

Order of Precedence for Device Library Path
-------------------------------------------

1. ``--hip-device-lib-path`` compiler option
2. ``HIP_DEVICE_LIB_PATH`` environment variable *(use with caution)*
3. ``--rocm-path`` compiler option
4. ``ROCM_PATH`` environment variable *(use with caution)*
5. Default automatic detection (relative to Clang or at the default ROCm installation location)
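
Both precedence lists have the same shape: a dedicated option wins over its environment variable, which in turn wins over the ROCm-level settings. The sketch below models that lookup order in plain C++17. It is an illustration only and not the Clang driver's implementation: ``PathSource``, ``Resolved``, ``resolve``, and the example paths are hypothetical, and the sketch reports which source wins without deriving any directory layout.

.. code-block:: c++

   #include <cassert>
   #include <optional>
   #include <string>

   // Hypothetical model of the documented lookup order; not the driver's code.
   enum class PathSource { Option, Env, RocmOption, RocmEnv, AutoDetect };

   struct Resolved {
     PathSource source;
     std::string value;  // left empty when automatic detection is used
   };

   // Each argument is std::nullopt when the user did not provide it.
   Resolved resolve(const std::optional<std::string> &option,       // e.g. --hip-path
                    const std::optional<std::string> &env,          // e.g. HIP_PATH
                    const std::optional<std::string> &rocm_option,  // --rocm-path
                    const std::optional<std::string> &rocm_env) {   // ROCM_PATH
     if (option) return {PathSource::Option, *option};
     if (env) return {PathSource::Env, *env};
     if (rocm_option) return {PathSource::RocmOption, *rocm_option};
     if (rocm_env) return {PathSource::RocmEnv, *rocm_env};
     return {PathSource::AutoDetect, ""};
   }

   int main() {
     // Both the option and the environment variable are set: the option wins.
     Resolved r = resolve(std::string("/opt/hip-a"), std::string("/opt/hip-b"),
                          std::nullopt, std::nullopt);
     assert(r.source == PathSource::Option && r.value == "/opt/hip-a");
     return 0;
   }

The same function covers the device library list by passing the values of ``--hip-device-lib-path`` and ``HIP_DEVICE_LIB_PATH`` as the first two arguments.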

.. list-table::
   :header-rows: 1

   * - Compiler Option
     - Environment Variable
     - Description
     - Default Value
   * - ``--rocm-path=<path>``
     - ``ROCM_PATH``
     - Specifies the ROCm installation path.
     - Automatic detection
   * - ``--hip-path=<path>``
     - ``HIP_PATH``
     - Specifies the HIP runtime installation path.
     - Determined by ROCm directory structure
   * - ``--hip-device-lib-path=<path>``
     - ``HIP_DEVICE_LIB_PATH``
     - Specifies the HIP device library installation path.
     - Determined by ROCm directory structure

.. note::

   We recommend using the compiler options as the primary method for specifying these paths. While the environment variables ``ROCM_PATH``, ``HIP_PATH``, and ``HIP_DEVICE_LIB_PATH`` are supported, their use can lead to implicit dependencies that might cause issues in the long run. Use them with caution.


Predefined Macros
=================

.. list-table::
   :header-rows: 1

   * - Macro
     - Description
   * - ``__CLANG_RDC__``
     - Defined when Clang is compiling code in Relocatable Device Code (RDC) mode. RDC, enabled with the ``-fgpu-rdc`` compiler option, is necessary for linking device code across translation units.
   * - ``__HIP__``
     - Defined when compiling with HIP language support, indicating that the code targets the HIP environment.
   * - ``__HIPCC__``
     - Alias to ``__HIP__``.
   * - ``__HIP_DEVICE_COMPILE__``
     - Defined during device code compilation in Clang's separate compilation process for the host and each offloading GPU architecture.
   * - ``__HIP_MEMORY_SCOPE_SINGLETHREAD``
     - Represents single-thread memory scope in HIP (value is 1).
   * - ``__HIP_MEMORY_SCOPE_WAVEFRONT``
     - Represents wavefront memory scope in HIP (value is 2).
   * - ``__HIP_MEMORY_SCOPE_WORKGROUP``
     - Represents workgroup memory scope in HIP (value is 3).
   * - ``__HIP_MEMORY_SCOPE_AGENT``
     - Represents agent memory scope in HIP (value is 4).
   * - ``__HIP_MEMORY_SCOPE_SYSTEM``
     - Represents system-wide memory scope in HIP (value is 5).
   * - ``__HIP_NO_IMAGE_SUPPORT__``
     - Defined with a value of 1 when the target device lacks support for HIP image functions.
   * - ``__HIP_NO_IMAGE_SUPPORT``
     - Alias to ``__HIP_NO_IMAGE_SUPPORT__``. Deprecated.
   * - ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``
     - Defined when the GPU default stream is set to per-thread mode.
   * - ``HIP_API_PER_THREAD_DEFAULT_STREAM``
     - Alias to ``__HIP_API_PER_THREAD_DEFAULT_STREAM__``. Deprecated.

Note that some architecture-specific AMDGPU macros have default values when
used in HIP host compilation. For example, :doc:`AMDGPU macros <AMDGPUSupport>`
such as ``__AMDGCN_WAVEFRONT_SIZE__`` (deprecated) default to 64.
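
As a minimal sketch of how the macros in the table might be consumed, the following program turns three of them into compile-time constants and prints them. The constant names ``kHip``, ``kRdc``, and ``kNoImageSupport`` are made up for this example. The program also builds as plain C++, in which case none of the macros is expected to be defined.

.. code-block:: c++

   #include <cstdio>

   // Each constant mirrors whether one macro from the table is defined in the
   // current compilation pass. The names are hypothetical.
   #ifdef __HIP__
   constexpr bool kHip = true;
   #else
   constexpr bool kHip = false;
   #endif

   #ifdef __CLANG_RDC__
   constexpr bool kRdc = true;  // compiled with -fgpu-rdc
   #else
   constexpr bool kRdc = false;
   #endif

   #ifdef __HIP_NO_IMAGE_SUPPORT__
   constexpr bool kNoImageSupport = true;
   #else
   constexpr bool kNoImageSupport = false;
   #endif

   int main() {
     // Prints the values seen by the host compilation pass.
     std::printf("HIP=%d RDC=%d no-image-support=%d\n", kHip, kRdc,
                 kNoImageSupport);
     return 0;
   }

Because the host and each device architecture are compiled separately, the same constant can take different values in different passes of one build.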

Compilation Modes
=================

Each HIP source file contains intertwined device and host code. Depending on the compilation mode selected with the compiler options ``-fno-gpu-rdc`` and ``-fgpu-rdc``, these portions of code are compiled differently.

Device Code Compilation
-----------------------

``-fno-gpu-rdc`` **Mode (default)**:

- Compiles to a self-contained, fully linked offloading device binary for each offloading device architecture.
- Device code within a Translation Unit (TU) cannot call functions located in another TU.

``-fgpu-rdc`` **Mode**:

- Compiles to bitcode for each GPU architecture.
- For each offloading device architecture, the bitcode from different TUs is linked together to create a single offloading device binary.
- Device code in one TU can call functions located in another TU.
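
The difference between the two modes can be illustrated with one source file that is built either as a single TU or as two TUs. This is a sketch under assumptions: the file name ``rdc_demo.cpp`` and the ``HIP_DEMO_PART`` switch are invented for the example, the qualifier fallbacks exist only so that the file also builds as plain C++, and the kernel is never launched so that no HIP runtime API is needed.

.. code-block:: c++

   // rdc_demo.cpp (hypothetical file name)
   //
   // Assumed two-TU build, reusing flags shown elsewhere in this document:
   //   clang++ -c -fgpu-rdc --offload-arch=gfx906 -xhip rdc_demo.cpp -DHIP_DEMO_PART=1 -o part1.o
   //   clang++ -c -fgpu-rdc --offload-arch=gfx906 -xhip rdc_demo.cpp -DHIP_DEMO_PART=2 -o part2.o
   //   clang++ --hip-link -fgpu-rdc --offload-arch=gfx906 part1.o part2.o -o rdc_demo
   #include <cassert>

   // Fallback for plain C++ builds; under -xhip these qualifiers are assumed
   // to be defined already.
   #ifndef __HIP__
   #define __host__
   #define __device__
   #define __global__
   #endif

   // 0: both parts in one TU, 1: definition only, 2: use only.
   #ifndef HIP_DEMO_PART
   #define HIP_DEMO_PART 0
   #endif

   // Declaration visible to every part.
   __host__ __device__ int scale(int x);

   #if HIP_DEMO_PART != 2
   // Part 1: the definition.
   __host__ __device__ int scale(int x) { return 2 * x; }
   #endif

   #if HIP_DEMO_PART != 1
   // Part 2: device code calling scale(). When scale() is defined in another
   // TU, this call can only be resolved in -fgpu-rdc mode.
   __global__ void kernel(int *out) { *out = scale(21); }

   int main() {
     // Host code may call across TUs in both modes.
     assert(scale(21) == 42);
     return 0;
   }
   #endif

Built without ``HIP_DEMO_PART``, the file is a single TU and should work in either mode. Split into two TUs, the device-side call in ``kernel`` crosses a TU boundary, which per the lists above requires ``-fgpu-rdc``.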

Host Code Compilation
---------------------

**Both Modes**:

- Compiles to a relocatable object for each TU.
- These relocatable objects are then linked together.
- Host code within a TU can call host functions and launch kernels from another TU.

Syntax Difference with CUDA
===========================

Clang's front end, used for both CUDA and HIP programming models, shares the same parsing and semantic analysis mechanisms. This includes the resolution of overloads concerning device and host functions. The syntax differences between Clang and NVCC for CUDA are documented in `Dialect Differences Between Clang and NVCC <https://llvm.org/docs/CompileCudaWithLLVM.html#dialect-differences-between-clang-and-nvcc>`_; these differences also apply to HIP code compilation.

Predefined Macros for Differentiation
-------------------------------------

To facilitate differentiation between HIP and CUDA code, as well as between device and host compilations within HIP, Clang defines specific macros:

- ``__HIP__`` : This macro is defined only when compiling HIP code. It can be used to conditionally compile code specific to HIP, enabling developers to write portable code that can be compiled for both CUDA and HIP.

- ``__HIP_DEVICE_COMPILE__`` : Defined exclusively during HIP device compilation, this macro allows for conditional compilation of device-specific code. It provides a mechanism to segregate device and host code, ensuring that each can be optimized for its respective execution environment.
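
A small sketch of how the two macros might be combined is shown below. The function name ``compilation_pass`` and the returned strings are invented for the example, and the qualifier fallbacks are an assumption that lets the file build as plain C++ as well.

.. code-block:: c++

   #include <cassert>
   #include <cstdio>
   #include <cstring>

   // Fallback for plain C++ builds; under -xhip these qualifiers are assumed
   // to be defined already.
   #ifndef __HIP__
   #define __host__
   #define __device__
   #endif

   // Names the compilation pass that produced this copy of the function.
   __host__ __device__ inline const char *compilation_pass() {
   #if defined(__HIP_DEVICE_COMPILE__)
     return "hip-device";
   #elif defined(__HIP__)
     return "hip-host";
   #else
     return "plain-c++";
   #endif
   }

   int main() {
     // main() is host code, so the device branch is never the one running here.
     assert(std::strcmp(compilation_pass(), "hip-device") != 0);
     std::printf("%s\n", compilation_pass());
     return 0;
   }

The order of the checks matters: ``__HIP__`` is defined in both the host and the device pass, so ``__HIP_DEVICE_COMPILE__`` has to be tested first.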

Function Pointers Support
=========================

Support for function pointers in Clang with HIP varies with the compilation mode. The following table provides an overview of the support status across different use cases and modes.

.. list-table:: Function Pointers Support Overview
   :widths: 25 25 25
   :header-rows: 1

   * - Use Case
     - ``-fno-gpu-rdc`` Mode (default)
     - ``-fgpu-rdc`` Mode
   * - Defined and used in the same TU
     - Supported
     - Supported
   * - Defined in one TU and used in another TU
     - Not Supported
     - Supported

In the ``-fno-gpu-rdc`` mode, the compiler calculates the resource usage of kernels based only on functions present within the same TU. Function pointers to functions defined in a different TU are not supported in this mode, because the resource usage calculation could be incorrect and lead to undefined behavior.

On the other hand, the ``-fgpu-rdc`` mode allows the definition and use of function pointers across different TUs, as resource usage calculations can accommodate functions from disparate TUs.
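
The first row of the table, a function pointer defined and used in the same TU, can be sketched as follows. All names (``BinaryOp``, ``add``, ``mul``, ``apply``, ``kernel``) are hypothetical, the qualifier fallbacks are an assumption that lets the file build as plain C++, and the kernel is not launched so that the example needs no HIP runtime API.

.. code-block:: c++

   #include <cassert>

   // Fallback for plain C++ builds; under -xhip these qualifiers are assumed
   // to be defined already.
   #ifndef __HIP__
   #define __host__
   #define __device__
   #define __global__
   #endif

   using BinaryOp = int (*)(int, int);

   __host__ __device__ int add(int a, int b) { return a + b; }
   __host__ __device__ int mul(int a, int b) { return a * b; }

   // Calls through a function pointer whose targets are defined in this TU.
   __host__ __device__ int apply(BinaryOp op, int a, int b) { return op(a, b); }

   // Device-side use: the pointer is taken and called within the same TU,
   // which the table lists as supported in both modes.
   __global__ void kernel(int *out, bool use_add) {
     BinaryOp op = use_add ? add : mul;
     *out = apply(op, 6, 7);
   }

   int main() {
     // The same helpers exercised on the host.
     assert(apply(add, 6, 7) == 13);
     assert(apply(mul, 6, 7) == 42);
     return 0;
   }

Moving ``add`` and ``mul`` into a separate TU would turn this into the second row of the table, which is only supported with ``-fgpu-rdc``.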

Virtual Function Support
========================

In Clang with HIP, support for calling virtual functions of an object in device or host code is contingent on where the object is constructed.

- **Constructed in Device Code**: Virtual functions of an object can be called in device code on a specific offloading device if the object is constructed in device code on an offloading device with the same architecture.
- **Constructed in Host Code**: Virtual functions of an object can be called in host code if the object is constructed in host code.

In other scenarios, calling virtual functions is not allowed.

Explanation
-----------

An object constructed on the device side contains a pointer to the virtual function table on the device side, which is not accessible in host code, and vice versa. Thus, invoking virtual functions from a context different from where the object was constructed is disallowed because the appropriate virtual table cannot be accessed. The virtual function tables for offloading devices with different architectures are different, so invoking virtual functions from an offloading device with a different architecture than the one where the object was constructed is also disallowed.

Example Usage
-------------

.. code-block:: c++

   class Base {
   public:
      __device__ virtual void virtualFunction() {
         // Base virtual function implementation
      }
   };

   class Derived : public Base {
   public:
      __device__ void virtualFunction() override {
         // Derived virtual function implementation
      }
   };

   __global__ void kernel() {
      Derived obj;
      Base* basePtr = &obj;
      basePtr->virtualFunction(); // Allowed since obj is constructed in device code
   }
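
For contrast with the device-side example above, the host-side rule can be sketched in the same way. The ``Shape`` and ``Triangle`` types are hypothetical, and the qualifier fallbacks are an assumption that lets the file build as plain C++.

.. code-block:: c++

   #include <cassert>

   // Fallback for plain C++ builds; under -xhip these qualifiers are assumed
   // to be defined already.
   #ifndef __HIP__
   #define __host__
   #define __device__
   #endif

   struct Shape {
     __host__ __device__ virtual int sides() const { return 0; }
   };

   struct Triangle : Shape {
     __host__ __device__ int sides() const override { return 3; }
   };

   int main() {
     Triangle t;               // constructed in host code
     const Shape *s = &t;
     assert(s->sides() == 3);  // allowed: the call also happens in host code
     // Copying `t` to device memory and calling sides() on it inside a kernel
     // would not be allowed: its vtable pointer refers to the host-side table.
     return 0;
   }
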

SPIR-V Support on HIPAMD ToolChain
==================================

The HIPAMD ToolChain supports targeting
`AMDGCN Flavoured SPIR-V <https://llvm.org/docs/SPIRVUsage.html#target-triples>`_.
The support for SPIR-V in the ROCm and HIPAMD ToolChain is under active
development.

Compilation Process
-------------------

When compiling HIP programs with the intent of utilizing SPIR-V, the process
diverges from the traditional compilation flow:

Using ``--offload-arch=amdgcnspirv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Target Triple**: The ``--offload-arch=amdgcnspirv`` flag instructs the
  compiler to use the target triple ``spirv64-amd-amdhsa``. This approach
  generates generic AMDGCN SPIR-V which retains architecture-specific elements
  without hardcoding them, thus allowing for optimal target-specific code to be
  generated at run time, when the concrete target is known.

- **LLVM IR Translation**: The program is compiled to LLVM Intermediate
  Representation (IR), which is subsequently translated into SPIR-V. In the
  future, this translation step will be replaced by direct SPIR-V emission via
  the SPIR-V Back-end.

- **Clang Offload Bundler**: The resulting SPIR-V is embedded in the Clang
  offload bundler with the bundle ID ``hip-spirv64-amd-amdhsa--amdgcnspirv``.

Mixed with Normal ``--offload-arch``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mixing ``amdgcnspirv`` and concrete ``gfx###`` targets via ``--offload-arch``
is not currently supported; this limitation is temporary and will be removed in
a future release.

Architecture Specific Macros
----------------------------

None of the architecture-specific :doc:`AMDGPU macros <AMDGPUSupport>` are
defined when targeting SPIR-V. A more flexible mechanism for per-target and
per-feature code selection will be added in the future.
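
Until such a mechanism exists, code that reads an architecture-specific macro has to cope with the macro being absent. The sketch below shows one defensive pattern, using the deprecated ``__AMDGCN_WAVEFRONT_SIZE__`` macro mentioned earlier in this document. The helper name ``compile_time_wavefront_size`` and the convention that 0 means "unknown" are assumptions of this example, not part of Clang.

.. code-block:: c++

   #include <cassert>

   // Hypothetical helper: returns the wavefront size when the compiler exposes
   // it as a macro, or 0 when it is not known at compile time (for example
   // when targeting SPIR-V, or in a plain C++ build).
   constexpr int compile_time_wavefront_size() {
   #ifdef __AMDGCN_WAVEFRONT_SIZE__
     return __AMDGCN_WAVEFRONT_SIZE__;
   #else
     return 0;
   #endif
   }

   int main() {
     constexpr int size = compile_time_wavefront_size();
     // Callers must treat 0 as "decide at run time" instead of assuming a value.
     assert(size >= 0);
     return 0;
   }
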