libc/docs/gpu/using.rst

.. _libc_gpu_usage:


===================
Using libc for GPUs
===================

.. contents:: Table of Contents
  :depth: 4
  :local:

Building the GPU library
========================

LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
because it relies heavily on ``clang``'s GPU support. This can be done
automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. To enable libc for
the GPU, enable the ``LIBC_GPU_BUILD`` option. By default, ``libcgpu.a`` will be
built for every supported GPU architecture. To restrict the set of architectures
built, either set ``LIBC_GPU_ARCHITECTURES`` to the list of desired
architectures manually or use ``native`` to detect the GPUs on your system. A
typical ``cmake`` configuration will look like this:

.. code-block:: sh

  $> cd llvm-project  # The llvm-project checkout
  $> mkdir build
  $> cd build
  # CMAKE_BUILD_TYPE:       select the build type
  # LIBC_GPU_BUILD:         build in GPU mode
  # LIBC_GPU_ARCHITECTURES: build all supported architectures
  # CMAKE_INSTALL_PREFIX:   where 'libcgpu.a' will live
  $> cmake ../llvm -G Ninja                                \
     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
     -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
     -DCMAKE_BUILD_TYPE=<Debug|Release>                    \
     -DLIBC_GPU_BUILD=ON                                   \
     -DLIBC_GPU_ARCHITECTURES=all                          \
     -DCMAKE_INSTALL_PREFIX=<PATH>
  $> ninja install

Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
using a compatible compiler and to support ``openmp`` offloading, we list
``libc`` and ``openmp`` in ``LLVM_ENABLE_RUNTIMES`` so that they are built after
the enabled projects using the newly built compiler. ``CMAKE_INSTALL_PREFIX``
specifies the directory in which to install the ``libcgpu.a`` library and
headers along with LLVM. The generated headers will be placed in
``include/gpu-none-llvm``.
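
As a hedged sketch of restricting the build, an existing build directory can be
reconfigured with ``native`` in place of ``all``. The explicit list in the
second command is only an illustration: ``gfx90a`` is the architecture used in
the examples on this page, ``sm_70`` is an assumed NVIDIA architecture name, and
the semicolon separator is the usual CMake list syntax rather than something
this page specifies.

.. code-block:: sh

  $> cd llvm-project/build
  # Build only for the GPUs detected on this machine.
  $> cmake ../llvm -DLIBC_GPU_ARCHITECTURES=native
  # Or name the architectures explicitly (assumed example values).
  $> cmake ../llvm -DLIBC_GPU_ARCHITECTURES="gfx90a;sm_70"
  $> ninja install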

Usage
=====

Once the ``libcgpu.a`` static archive has been built, it can be linked directly
with offloading applications as a standard library. This process is described in
the `clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
This linking mode is used by the OpenMP toolchain, but is currently opt-in for
the CUDA and HIP toolchains through the ``--offload-new-driver`` and
``-fgpu-rdc`` flags. A typical usage will look like this:

.. code-block:: sh

  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
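
For illustration, the snippet below writes a hypothetical ``foo.c`` that the
command above could build. It is a minimal sketch, assuming the OpenMP toolchain
resolves the ``strcmp`` call inside the target region against ``libcgpu.a``; the
program is not taken from the libc sources.

.. code-block:: sh

  # Write a small OpenMP offloading program to foo.c (hypothetical example).
  cat > foo.c <<'EOF'
  #include <stdio.h>
  #include <string.h>

  int main(void) {
    int equal = 0;
    /* When offloading is enabled, the target region runs on the device, so
       this strcmp call is the one expected to come from libcgpu.a. The
       map clause copies 'equal' back to the host. */
  #pragma omp target map(from : equal)
    { equal = strcmp("gpu", "gpu") == 0; }
    printf("strings equal: %d\n", equal);
    return 0;
  }
  EOF
  # Build it with the command shown above:
  #   clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu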

The ``libcgpu.a`` static archive is a fat binary containing LLVM-IR for each
supported target device. The supported architectures can be seen using LLVM's
``llvm-objdump`` with the ``--offloading`` flag:

.. code-block:: sh

  $> llvm-objdump --offloading libcgpu.a
  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64

  OFFLOADING IMAGE [0]:
  kind            llvm ir
  arch            gfx90a
  triple          amdgcn-amd-amdhsa
  producer        none
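
Since the archive holds one image per object and architecture, the full listing
can be long. The helper below is a small sketch for summarizing it;
``list_offload_archs`` is a hypothetical name, and the filter assumes the
``arch`` lines keep the layout shown above.

.. code-block:: sh

  # Read 'llvm-objdump --offloading' output on stdin and print each distinct
  # architecture once. Relies on the "arch <name>" line layout shown above.
  list_offload_archs() {
    awk '$1 == "arch" { print $2 }' | sort -u
  }

For example, ``llvm-objdump --offloading libcgpu.a | list_offload_archs`` would
print one line per architecture contained in the archive.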

Because the device code is stored inside a fat binary, it can be difficult to
inspect the resulting code. The following utilities extract an object file from
the archive, unpack the device image for one architecture, and print it as
LLVM-IR:

.. code-block:: sh

   $> llvm-ar x libcgpu.a strcmp.cpp.o
   $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
   $> opt -S gfx90a.bc
   ...

Please note that this fat binary format is provided for compatibility with
existing offloading toolchains. The implementation in ``libc`` does not depend
on any existing offloading languages and is completely freestanding.