.. _clang-linker-wrapper:

``clang-linker-wrapper`` wraps the normal host linking job. It creates linked
device images for offloading and emits the runtime calls necessary to register
them. It first scans the linker's input for embedded device offloading data
stored in the ``.llvm.offloading`` section, which contains binary data created
by the :doc:`ClangOffloadPackager`. The extracted device files are then linked,
and the linked modules are wrapped into a new object file containing the code
necessary to register them with the offloading runtime.

The tool accepts the following options. Any argument that is not specific to
the linker wrapper is forwarded to the wrapped linker job.

.. code-block:: console

  USAGE: clang-linker-wrapper [options] -- <options to passed to the linker>

    --cuda-path=<dir>       Set the system CUDA path
    --device-debug          Use debugging
    --device-linker=<value> or <triple>=<value>
                            Arguments to pass to the device linker invocation
    --dry-run               Print program arguments without running
    --help-hidden           Display all available options
    --help                  Display available options (--help-hidden for more)
    --host-triple=<triple>  Triple to use for the host compilation
    --linker-path=<path>    The linker executable to invoke
    -L <dir>                Add <dir> to the library search path
    -l <libname>            Search for library <libname>
    --opt-level=<O0, O1, O2, or O3>
                            Optimization level for LTO
    --override-image=<kind=file>
                            Uses the provided file as if it were the output of the device link step
    -o <path>               Path to file to write output
    --pass-remarks-analysis=<value>
    --pass-remarks-missed=<value>
    --pass-remarks=<value>  Pass remarks for LTO
    --print-wrapped-module  Print the wrapped module's IR for testing
    --ptxas-arg=<value>     Argument to pass to the 'ptxas' invocation
    --relocatable           Link device code to create a relocatable offloading application
    --save-temps            Save intermediate results
    --sysroot<value>        Set the system root
    --verbose               Verbose output from tools
    --v                     Display the version number and exit
    --                      The separator for the wrapped linker arguments

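
The role of the ``--`` separator in the usage line can be illustrated with a
small Python sketch. It models only the separator: the function name
``split_wrapper_args`` is hypothetical, and the sketch does not reproduce the
wrapper's actual option parsing or the way it forwards options it does not
consume itself.

```python
def split_wrapper_args(argv):
    """Split an argument list at the first "--" separator.

    Hypothetical helper for illustration only. Returns a pair
    (arguments before the separator, arguments after the separator).
    """
    if "--" in argv:
        idx = argv.index("--")
        return argv[:idx], argv[idx + 1:]
    # No separator: nothing is explicitly marked for the wrapped linker.
    return list(argv), []


if __name__ == "__main__":
    before, after = split_wrapper_args(
        ["--host-triple=x86_64", "--linker-path=/usr/bin/ld", "--", "a.o", "-o", "a.out"]
    )
    print("wrapper options:", before)
    print("linker arguments:", after)
```

Everything after the first ``--`` is treated as belonging to the wrapped linker
job; a later ``--`` would be passed through unchanged.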
The ``clang-linker-wrapper`` links embedded device code and then registers it
with the appropriate runtime. Normally, this is done only when the executable
is created, so that other files containing device code can be linked together.
This can be problematic for users who wish to ship static libraries that
contain offloading code to users without a compatible offloading toolchain.

When performing a relocatable link with ``-r``, the ``clang-linker-wrapper``
performs the device linking and registration eagerly. This removes the embedded
device code and registers it correctly with the runtime. Semantically, this is
similar to creating a shared library object. If standard relocatable linking is
desired, do not run the binaries through the ``clang-linker-wrapper``. In that
case the embedded device code is simply appended so that it can be linked
later.

The linker wrapper links extracted device code only with other device code
that is compatible with it. Generally, this requires that the target triple and
architecture match. An exception is made when the architecture is listed as
``generic``, which causes the code to be linked with any other device code that
has the same target triple.

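
The matching rule described above can be written as a short Python sketch.
The ``DeviceImage`` type and the ``are_compatible`` function are hypothetical
names used for illustration; the sketch encodes only the rule stated in this
section and is not the wrapper's implementation, which may apply additional
target-specific checks.

```python
from collections import namedtuple

# Hypothetical record for an extracted device image: only the two
# properties that the matching rule in this section depends on.
DeviceImage = namedtuple("DeviceImage", ["triple", "arch"])


def are_compatible(lhs, rhs):
    """Return True if two device images may be linked together."""
    # Images for different target triples are never linked together.
    if lhs.triple != rhs.triple:
        return False
    # A "generic" architecture links with any image of the same triple.
    if lhs.arch == "generic" or rhs.arch == "generic":
        return True
    # Otherwise the architectures must match exactly.
    return lhs.arch == rhs.arch


if __name__ == "__main__":
    a = DeviceImage("amdgcn-amd-amdhsa", "gfx90a")
    b = DeviceImage("amdgcn-amd-amdhsa", "generic")
    print(are_compatible(a, b))
```

Under this sketch, a ``generic`` image for ``amdgcn-amd-amdhsa`` is compatible
with a ``gfx90a`` image for the same triple, but never with an image for a
different triple.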
The linker wrapper performs many steps internally, such as input matching,
symbol resolution, and image registration, which can make it difficult to
debug. Its behavior is controlled mostly through metadata, described in the
`clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
Intermediate output can be obtained from the linker wrapper using the
``--save-temps`` flag. These files can then be modified.

.. code-block:: console

  $> clang openmp.c -fopenmp --offload-arch=gfx90a -c
  $> clang openmp.o -fopenmp --offload-arch=gfx90a -Wl,--save-temps
  $> ; Modify temp files.
  $> llvm-objcopy --update-section=.llvm.offloading=out.bc openmp.o

Doing this overrides one of the input files by replacing its embedded
offloading metadata with a user-modified version. This becomes more difficult
when there are multiple input files. For a very large hammer, the
``--override-image=<kind>=<file>`` flag can be used.

In the following example, we use ``--save-temps`` to obtain the LLVM-IR just
before running the backend. We modify it to test altered behavior and compile
it to a binary. That binary is then passed to the linker wrapper, which ignores
all embedded metadata and uses the provided image as if it were the result of
the device linking phase.

.. code-block:: console

  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--save-temps
  $> ; Modify temp files.
  $> clang --target=amdgcn-amd-amdhsa -mcpu=gfx90a -nogpulib out.bc -o a.out
  $> clang openmp.c -fopenmp --offload-arch=gfx90a -Wl,--override-image=openmp=a.out

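
The ``<kind>=<file>`` value given to ``--override-image`` in the last command
has a simple shape that can be sketched in Python. The function name
``parse_override_image`` is hypothetical, and splitting at the first ``=`` is
an assumption made for this illustration rather than a description of how the
wrapper parses the flag.

```python
def parse_override_image(value):
    """Split a "<kind>=<file>" string into a (kind, file) pair.

    Hypothetical helper: assumes the kind never contains "=", so the
    string is split at the first "=" only.
    """
    kind, sep, path = value.partition("=")
    if not sep or not kind or not path:
        raise ValueError("expected <kind>=<file>, got %r" % value)
    return kind, path


if __name__ == "__main__":
    # Value taken from the example command above.
    print(parse_override_image("openmp=a.out"))
```

Splitting only at the first ``=`` keeps file names that themselves contain
``=`` intact.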
This tool links object files that have offloading images embedded within them
using the ``-fembed-offload-object`` flag in Clang. Given an input file
containing the magic section, we can pass it to this tool to extract the data
contained in that section and run a device linking job on it.

.. code-block:: console

  clang-linker-wrapper --host-triple=x86_64 --linker-path=/usr/bin/ld -- <Args>