1 \input texinfo @c -*-texinfo-*-
4 @setfilename libgomp.info
10 Copyright @copyright{} 2006-2025 Free Software Foundation, Inc.
12 Permission is granted to copy, distribute and/or modify this document
13 under the terms of the GNU Free Documentation License, Version 1.3 or
14 any later version published by the Free Software Foundation; with the
15 Invariant Sections being ``Funding Free Software'', the Front-Cover
16 texts being (a) (see below), and with the Back-Cover Texts being (b)
17 (see below). A copy of the license is included in the section entitled
18 ``GNU Free Documentation License''.
20 (a) The FSF's Front-Cover Text is:
24 (b) The FSF's Back-Cover Text is:
26 You have freedom to copy and modify this GNU Manual, like GNU
27 software. Copies published by the Free Software Foundation raise
28 funds for GNU development.
32 @dircategory GNU Libraries
34 * libgomp: (libgomp). GNU Offloading and Multi Processing Runtime Library.
37 This manual documents libgomp, the GNU Offloading and Multi Processing
38 Runtime library. This is the GNU implementation of the OpenMP and
39 OpenACC APIs for parallel and accelerator programming in C/C++ and
42 Published by the Free Software Foundation
43 51 Franklin Street, Fifth Floor
44 Boston, MA 02110-1301 USA
50 @setchapternewpage odd
53 @title GNU Offloading and Multi Processing Runtime Library
54 @subtitle The GNU OpenMP and OpenACC Implementation
56 @vskip 0pt plus 1filll
57 @comment For the @value{version-GCC} Version*
59 Published by the Free Software Foundation @*
60 51 Franklin Street, Fifth Floor@*
61 Boston, MA 02110-1301, USA@*
71 @node Top, Enabling OpenMP
75 This manual documents the usage of libgomp, the GNU Offloading and
76 Multi Processing Runtime Library. This includes the GNU
77 implementation of the @uref{https://www.openmp.org, OpenMP} Application
78 Programming Interface (API) for multi-platform shared-memory parallel
79 programming in C/C++ and Fortran, and the GNU implementation of the
80 @uref{https://www.openacc.org, OpenACC} Application Programming
81 Interface (API) for offloading of code to accelerator devices in C/C++
Originally, libgomp implemented the GNU OpenMP Runtime Library.
Support for OpenACC and for offloading (both OpenACC and OpenMP 4's
target construct) was added later, and the library was renamed to the
GNU Offloading and Multi Processing Runtime Library.
92 @comment When you add a new menu item, please keep the right hand
93 @comment aligned to the same column. Do not use tabs. This provides
94 @comment better formatting.
97 * Enabling OpenMP:: How to enable OpenMP for your applications.
98 * OpenMP Implementation Status:: List of implemented features by OpenMP version
99 * OpenMP Runtime Library Routines: Runtime Library Routines.
100 The OpenMP runtime application programming
102 * OpenMP Environment Variables: Environment Variables.
103 Influencing OpenMP runtime behavior with
104 environment variables.
105 * Enabling OpenACC:: How to enable OpenACC for your
107 * OpenACC Runtime Library Routines:: The OpenACC runtime application
108 programming interface.
109 * OpenACC Environment Variables:: Influencing OpenACC runtime behavior with
110 environment variables.
111 * CUDA Streams Usage:: Notes on the implementation of
112 asynchronous operations.
113 * OpenACC Library Interoperability:: OpenACC library interoperability with the
114 NVIDIA CUBLAS library.
115 * OpenACC Profiling Interface::
* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
118 * Offload-Target Specifics:: Notes on offload-target specific internals
119 * The libgomp ABI:: Notes on the external ABI presented by libgomp.
120 * Reporting Bugs:: How to report bugs in the GNU Offloading and
121 Multi Processing Runtime Library.
* Copying:: The GNU General Public License says
how you can copy and share libgomp.
124 * GNU Free Documentation License::
125 How you can copy and share this manual.
126 * Funding:: How to help assure continued work for free
128 * Library Index:: Index of this documentation.
132 @c ---------------------------------------------------------------------
134 @c ---------------------------------------------------------------------
136 @node Enabling OpenMP
137 @chapter Enabling OpenMP
139 To activate the OpenMP extensions for C/C++ and Fortran, the compile-time
140 flag @option{-fopenmp} must be specified. For C and C++, this enables
141 the handling of the OpenMP directives using @code{#pragma omp} and the
142 @code{[[omp::directive(...)]]}, @code{[[omp::sequence(...)]]} and
@code{[[omp::decl(...)]]} attributes.  For Fortran, it enables, in
free source form, the @code{!$omp} sentinel for directives and the
@code{!$} conditional compilation sentinel and, in fixed source form, the
@code{c$omp}, @code{*$omp} and @code{!$omp} sentinels for directives and
the @code{c$}, @code{*$} and @code{!$} conditional compilation sentinels.
148 The flag also arranges for automatic linking of the OpenMP runtime library
149 (@ref{Runtime Library Routines}).
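As an illustration of the @code{#pragma omp} form, the following C sketch
sums the integers below @var{n} with a worksharing loop.  The function name
@code{sum_below} is hypothetical and chosen only for this example; it is not
part of libgomp.  When the file is compiled without @option{-fopenmp}, the
pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical helper (not part of libgomp): sum the integers 0..n-1.
   With -fopenmp, the iterations are divided among the threads of a team
   and the per-thread partial sums are combined by the reduction clause.
   Without the flag, the pragma is ignored and the loop runs serially.  */
long sum_below(long n)
{
  assert(n >= 0);
  long sum = 0;
#pragma omp parallel for reduction(+ : sum)
  for (long i = 0; i < n; i++)
    sum += i;
  return sum;
}
```

Assuming the code is stored in a file named @file{example.c} together with a
@code{main} function, it can be built with @code{gcc -fopenmp example.c}; the
flag both enables the directive and links the runtime library.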
151 The @option{-fopenmp-simd} flag can be used to enable a subset of
152 OpenMP directives that do not require the linking of either the
153 OpenMP runtime library or the POSIX threads library.
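A minimal sketch of a loop that only needs the SIMD subset is shown below.
The function name @code{dot} is hypothetical.  The @code{simd} construct
asks the compiler to vectorize the loop and creates no threads, so no
runtime library call is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libgomp): dot product of two arrays
   of length n.  The simd directive permits vectorization of the loop;
   the reduction clause makes the accumulation into acc well defined.  */
double dot(const double *a, const double *b, size_t n)
{
  assert(a != NULL && b != NULL);
  double acc = 0.0;
#pragma omp simd reduction(+ : acc)
  for (size_t i = 0; i < n; i++)
    acc += a[i] * b[i];
  return acc;
}
```

The same source also compiles with @option{-fopenmp}, or with neither flag,
in which case the pragma is ignored.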
155 A complete description of all OpenMP directives may be found in the
156 @uref{https://www.openmp.org, OpenMP Application Program Interface} manuals.
157 See also @ref{OpenMP Implementation Status}.
160 @c ---------------------------------------------------------------------
161 @c OpenMP Implementation Status
162 @c ---------------------------------------------------------------------
164 @node OpenMP Implementation Status
165 @chapter OpenMP Implementation Status
168 * OpenMP 4.5:: Feature completion status to 4.5 specification
169 * OpenMP 5.0:: Feature completion status to 5.0 specification
170 * OpenMP 5.1:: Feature completion status to 5.1 specification
171 * OpenMP 5.2:: Feature completion status to 5.2 specification
172 * OpenMP 6.0:: Feature completion status to 6.0 specification
175 The @code{_OPENMP} preprocessor macro and Fortran's @code{openmp_version}
176 parameter, provided by @code{omp_lib.h} and the @code{omp_lib} module, have
177 the value @code{201511} (i.e. OpenMP 4.5).
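The macro can be tested at compile time, for example to guard code that
relies on OpenMP 4.5 features.  The following C sketch uses two hypothetical
helper functions; it assumes only that @code{_OPENMP}, when defined, expands
to an integer of the form @var{yyyymm}.

```c
#include <assert.h>

/* Hypothetical helper: return the value of _OPENMP, or 0 if the file
   was compiled without OpenMP support (macro not defined).  */
long openmp_version_macro(void)
{
#ifdef _OPENMP
  return _OPENMP;
#else
  return 0;
#endif
}

/* Hypothetical helper: nonzero if the compiler reports at least
   OpenMP 4.5, that is, a macro value of 201511 or later.  */
int has_openmp_45(void)
{
  long v = openmp_version_macro();
  assert(v >= 0);  /* The value is a yyyymm date, never negative.  */
  return v >= 201511L;
}
```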
182 The OpenMP 4.5 specification is fully supported.
187 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
188 @c This list is sorted as in OpenMP 5.1's B.3 not as in OpenMP 5.0's B.2
190 @multitable @columnfractions .60 .10 .25
191 @headitem Description @tab Status @tab Comments
192 @item Array shaping @tab N @tab
193 @item Array sections with non-unit strides in C and C++ @tab N @tab
194 @item Iterators @tab Y @tab
195 @item @code{metadirective} directive @tab Y @tab
196 @item @code{declare variant} directive @tab Y @tab
@item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
environment variable @tab Y @tab
199 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
200 @item @code{requires} directive @tab Y
201 @tab See also @ref{Offload-Target Specifics}
202 @item @code{teams} construct outside an enclosing target region @tab Y @tab
203 @item Non-rectangular loop nests @tab P
204 @tab Full support for C/C++, partial for Fortran
205 (@uref{https://gcc.gnu.org/PR110735,PR110735})
206 @item @code{!=} as relational-op in canonical loop form for C/C++ @tab Y @tab
207 @item @code{nonmonotonic} as default loop schedule modifier for worksharing-loop
208 constructs @tab Y @tab
209 @item Collapse of associated loops that are imperfectly nested loops @tab Y @tab
210 @item Clauses @code{if}, @code{nontemporal} and @code{order(concurrent)} in
211 @code{simd} construct @tab Y @tab
212 @item @code{atomic} constructs in @code{simd} @tab Y @tab
213 @item @code{loop} construct @tab Y @tab
214 @item @code{order(concurrent)} clause @tab Y @tab
215 @item @code{scan} directive and @code{in_scan} modifier for the
216 @code{reduction} clause @tab Y @tab
217 @item @code{in_reduction} clause on @code{task} constructs @tab Y @tab
218 @item @code{in_reduction} clause on @code{target} constructs @tab P
@tab @code{nowait} is only a stub
220 @item @code{task_reduction} clause with @code{taskgroup} @tab Y @tab
221 @item @code{task} modifier to @code{reduction} clause @tab Y @tab
222 @item @code{affinity} clause to @code{task} construct @tab Y @tab Stub only
223 @item @code{detach} clause to @code{task} construct @tab Y @tab
224 @item @code{omp_fulfill_event} runtime routine @tab Y @tab
225 @item @code{reduction} and @code{in_reduction} clauses on @code{taskloop}
226 and @code{taskloop simd} constructs @tab Y @tab
227 @item @code{taskloop} construct cancelable by @code{cancel} construct
229 @item @code{mutexinoutset} @emph{dependence-type} for @code{depend} clause
231 @item Predefined memory spaces, memory allocators, allocator traits
232 @tab Y @tab See also @ref{Memory allocation}
233 @item Memory management routines @tab Y @tab
234 @item @code{allocate} directive @tab P
235 @tab C++ unsupported; see also @ref{Memory allocation}
236 @item @code{allocate} clause @tab P @tab Clause has no effect on @code{target}
237 (@uref{https://gcc.gnu.org/PR113436,PR113436})
238 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
239 @item @code{ancestor} modifier on @code{device} clause @tab Y @tab
240 @item Implicit declare target directive @tab Y @tab
241 @item Discontiguous array section with @code{target update} construct
243 @item C/C++'s lvalue expressions in @code{to}, @code{from}
244 and @code{map} clauses @tab Y @tab
245 @item C/C++'s lvalue expressions in @code{depend} clauses @tab Y @tab
246 @item Nested @code{declare target} directive @tab Y @tab
247 @item Combined @code{master} constructs @tab Y @tab
248 @item @code{depend} clause on @code{taskwait} @tab Y @tab
249 @item Weak memory ordering clauses on @code{atomic} and @code{flush} construct
251 @item @code{hint} clause on the @code{atomic} construct @tab Y @tab Stub only
252 @item @code{depobj} construct and depend objects @tab Y @tab
253 @item Lock hints were renamed to synchronization hints @tab Y @tab
254 @item @code{conditional} modifier to @code{lastprivate} clause @tab Y @tab
255 @item Map-order clarifications @tab P @tab
256 @item @code{close} @emph{map-type-modifier} @tab Y @tab
@item Mapping C/C++ pointer variables and assigning the address of
device memory mapped by an array section @tab P @tab
259 @item Mapping of Fortran pointer and allocatable variables, including pointer
260 and allocatable components of variables
@tab P @tab Mapping of variables with allocatable components unsupported
262 @item @code{defaultmap} extensions @tab Y @tab
263 @item @code{declare mapper} directive @tab N @tab
264 @item @code{omp_get_supported_active_levels} routine @tab Y @tab
265 @item Runtime routines and environment variables to display runtime thread
266 affinity information @tab Y @tab
267 @item @code{omp_pause_resource} and @code{omp_pause_resource_all} runtime
269 @item @code{omp_get_device_num} runtime routine @tab Y @tab
270 @item OMPT interface @tab N @tab
271 @item OMPD interface @tab N @tab
274 @unnumberedsubsec Other new OpenMP 5.0 features
276 @multitable @columnfractions .60 .10 .25
277 @headitem Description @tab Status @tab Comments
278 @item Supporting C++'s range-based for loop @tab Y @tab
285 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
287 @multitable @columnfractions .60 .10 .25
288 @headitem Description @tab Status @tab Comments
289 @item OpenMP directive as C++ attribute specifiers @tab Y @tab
290 @item @code{omp_all_memory} reserved locator @tab Y @tab
@item @emph{target_device trait} in OpenMP Context @tab Y @tab
292 @item @code{target_device} selector set in context selectors @tab Y @tab
293 @item C/C++'s @code{declare variant} directive: elision support of
294 preprocessed code @tab N @tab
295 @item @code{declare variant}: new clauses @code{adjust_args} and
296 @code{append_args} @tab P @tab For @code{append_args}, all interop objects
297 must be specified in the @code{interop} clause of @code{dispatch}
298 @item @code{dispatch} construct @tab Y @tab
@item Device-specific ICV settings with environment variables @tab Y @tab
300 @item @code{assume} and @code{assumes} directives @tab Y @tab
301 @item @code{nothing} directive @tab Y @tab
302 @item @code{error} directive @tab Y @tab
303 @item @code{masked} construct @tab Y @tab
304 @item @code{scope} directive @tab Y @tab
305 @item Loop transformation constructs @tab Y @tab
306 @item @code{strict} modifier in the @code{grainsize} and @code{num_tasks}
307 clauses of the @code{taskloop} construct @tab Y @tab
308 @item @code{align} clause in @code{allocate} directive @tab P
309 @tab Only C and Fortran
310 @item @code{align} modifier in @code{allocate} clause @tab Y @tab
311 @item @code{thread_limit} clause to @code{target} construct @tab Y @tab
312 @item @code{has_device_addr} clause to @code{target} construct @tab Y @tab
313 @item Iterators in @code{target update} motion clauses and @code{map}
315 @item Indirect calls to the device version of a procedure or function in
316 @code{target} regions @tab Y @tab
317 @item @code{interop} directive @tab N @tab
318 @item @code{omp_interop_t} object support in runtime routines @tab Y @tab
319 @item @code{nowait} clause in @code{taskwait} directive @tab Y @tab
320 @item Extensions to the @code{atomic} directive @tab Y @tab
321 @item @code{seq_cst} clause on a @code{flush} construct @tab Y @tab
322 @item @code{inoutset} argument to the @code{depend} clause @tab Y @tab
323 @item @code{private} and @code{firstprivate} argument to @code{default}
324 clause in C and C++ @tab Y @tab
325 @item @code{present} argument to @code{defaultmap} clause @tab Y @tab
326 @item @code{omp_set_num_teams}, @code{omp_set_teams_thread_limit},
327 @code{omp_get_max_teams}, @code{omp_get_teams_thread_limit} runtime
329 @item @code{omp_target_is_accessible} runtime routine @tab Y @tab
330 @item @code{omp_target_memcpy_async} and @code{omp_target_memcpy_rect_async}
331 runtime routines @tab Y @tab
332 @item @code{omp_get_mapped_ptr} runtime routine @tab Y @tab
333 @item @code{omp_calloc}, @code{omp_realloc}, @code{omp_aligned_alloc} and
334 @code{omp_aligned_calloc} runtime routines @tab Y @tab
335 @item @code{omp_alloctrait_key_t} enum: @code{omp_atv_serialized} added,
336 @code{omp_atv_default} changed @tab Y @tab
337 @item @code{omp_display_env} runtime routine @tab Y @tab
338 @item @code{ompt_scope_endpoint_t} enum: @code{ompt_scope_beginend} @tab N @tab
339 @item @code{ompt_sync_region_t} enum additions @tab N @tab
340 @item @code{ompt_state_t} enum: @code{ompt_state_wait_barrier_implementation}
341 and @code{ompt_state_wait_barrier_teams} @tab N @tab
342 @item @code{ompt_callback_target_data_op_emi_t},
343 @code{ompt_callback_target_emi_t}, @code{ompt_callback_target_map_emi_t}
344 and @code{ompt_callback_target_submit_emi_t} @tab N @tab
345 @item @code{ompt_callback_error_t} type @tab N @tab
346 @item @code{OMP_PLACES} syntax extensions @tab Y @tab
347 @item @code{OMP_NUM_TEAMS} and @code{OMP_TEAMS_THREAD_LIMIT} environment
348 variables @tab Y @tab
351 @unnumberedsubsec Other new OpenMP 5.1 features
353 @multitable @columnfractions .60 .10 .25
354 @headitem Description @tab Status @tab Comments
355 @item Support of strictly structured blocks in Fortran @tab Y @tab
356 @item Support of structured block sequences in C/C++ @tab Y @tab
357 @item @code{unconstrained} and @code{reproducible} modifiers on @code{order}
359 @item Support @code{begin/end declare target} syntax in C/C++ @tab Y @tab
360 @item Pointer predetermined firstprivate getting initialized
361 to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
@item For Fortran, diagnose placing declarative directives before/between
@code{USE}, @code{IMPORT}, and @code{IMPLICIT} as invalid @tab N @tab
364 @item Optional comma between directive and clause in the @code{#pragma} form @tab Y @tab
365 @item @code{indirect} clause in @code{declare target} @tab Y @tab
366 @item @code{device_type(nohost)}/@code{device_type(host)} for variables @tab N @tab
367 @item @code{present} modifier to the @code{map}, @code{to} and @code{from}
369 @item Changed interaction between @code{declare target} and OpenMP context
371 @item Dynamic selector support in @code{metadirective} @tab Y @tab
372 @item Dynamic selector support in @code{declare variant} @tab P
373 @tab Fortran rejects non-constant expressions in dynamic selectors;
374 C/C++ reject expressions using argument variables.
375 (@uref{https://gcc.gnu.org/PR113904,PR113904})
382 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
384 @multitable @columnfractions .60 .10 .25
385 @headitem Description @tab Status @tab Comments
386 @item @code{omp_in_explicit_task} routine and @var{explicit-task-var} ICV
388 @item @code{omp}/@code{ompx}/@code{omx} sentinels and @code{omp_}/@code{ompx_}
@tab Warning for @code{ompx/omx} sentinels@footnote{For the @code{ompx}
sentinel, C/C++ pragmas and C++ attributes are diagnosed by
@code{-Wunknown-pragmas} (implied by @code{-Wall}) and @code{-Wattributes}
(enabled by default), respectively; for Fortran free-source code, a
warning is enabled by default and, for fixed-source code, the @code{omx}
sentinel is diagnosed by @code{-Wsurprising} (enabled by
@code{-Wall}).  Unknown clauses are always rejected with an error.}
397 @item Clauses on @code{end} directive can be on directive @tab Y @tab
398 @item @code{destroy} clause with destroy-var argument on @code{depobj}
400 @item Deprecation of no-argument @code{destroy} clause on @code{depobj}
401 @tab N/A @tab undeprecated in OpenMP 6
402 @item @code{linear} clause syntax changes and @code{step} modifier @tab Y @tab
403 @item Deprecation of minus operator for reductions @tab N @tab
404 @item Deprecation of separating @code{map} modifiers without comma @tab N @tab
405 @item @code{declare mapper} with iterator and @code{present} modifiers
407 @item If a matching mapped list item is not found in the data environment, the
408 pointer retains its original value @tab Y @tab
409 @item New @code{enter} clause as alias for @code{to} on declare target directive
411 @item Deprecation of @code{to} clause on declare target directive @tab N @tab
412 @item Extended list of directives permitted in Fortran pure procedures
414 @item New @code{allocators} directive for Fortran @tab Y @tab
415 @item Deprecation of @code{allocate} directive for Fortran
416 allocatables/pointers @tab N @tab
417 @item Optional paired @code{end} directive with @code{dispatch} @tab Y @tab
418 @item New @code{memspace} and @code{traits} modifiers for @code{uses_allocators}
420 @item Deprecation of traits array following the allocator_handle expression in
421 @code{uses_allocators} @tab N @tab
422 @item New @code{otherwise} clause as alias for @code{default} on metadirectives
424 @item Deprecation of @code{default} clause on metadirectives @tab N
425 @tab Both @code{otherwise} and @code{default} are accepted
427 @item Deprecation of delimited form of @code{declare target} @tab N @tab
428 @item Reproducible semantics changed for @code{order(concurrent)} @tab N @tab
429 @item @code{allocate} and @code{firstprivate} clauses on @code{scope}
431 @item @code{ompt_callback_work} @tab N @tab
432 @item Default map-type for the @code{map} clause in @code{target enter/exit data}
434 @item New @code{doacross} clause as alias for @code{depend} with
435 @code{source}/@code{sink} modifier @tab Y @tab
436 @item Deprecation of @code{depend} with @code{source}/@code{sink} modifier
438 @item @code{omp_cur_iteration} keyword @tab Y @tab
441 @unnumberedsubsec Other new OpenMP 5.2 features
443 @multitable @columnfractions .60 .10 .25
444 @headitem Description @tab Status @tab Comments
445 @item For Fortran, optional comma between directive and clause @tab N @tab
446 @item Conforming device numbers and @code{omp_initial_device} and
447 @code{omp_invalid_device} enum/PARAMETER @tab Y @tab
448 @item Initial value of @var{default-device-var} ICV with
449 @code{OMP_TARGET_OFFLOAD=mandatory} @tab Y @tab
450 @item @code{all} as @emph{implicit-behavior} for @code{defaultmap} @tab Y @tab
451 @item @emph{interop_types} in any position of the modifier list for the @code{init} clause
452 of the @code{interop} construct @tab Y @tab
453 @item Invoke virtual member functions of C++ objects created on the host device
454 on other devices @tab N @tab
455 @item @code{mapper} as map-type modifier in @code{declare mapper} @tab N @tab
462 @unnumberedsubsec New features listed in Appendix B of the OpenMP specification
463 @multitable @columnfractions .60 .10 .25
464 @item Features deprecated in versions 5.0, 5.1 and 5.2 were removed
465 @tab N/A @tab Backward compatibility
466 @item Full support for C23 was added @tab P @tab
467 @item Full support for C++23 was added @tab P @tab
468 @item Full support for Fortran 2023 was added @tab P @tab
469 @item @code{_ALL} suffix to the device-scope environment variables
470 @tab P @tab Host device number wrongly accepted
471 @item @code{num_threads} clause now accepts a list @tab N @tab
472 @item Abstract names added for @code{OMP_NUM_THREADS},
473 @code{OMP_THREAD_LIMIT} and @code{OMP_TEAMS_THREAD_LIMIT}
475 @item Supporting increments with abstract names in @code{OMP_PLACES} @tab N @tab
@item Extension of @code{OMP_DEFAULT_DEVICE} and new
@code{OMP_AVAILABLE_DEVICES} environment variables @tab N @tab
478 @item New @code{uid} trait for target devices and for
479 @code{OMP_AVAILABLE_DEVICES} and @code{OMP_DEFAULT_DEVICE} @tab N @tab
480 @item New @code{OMP_THREADS_RESERVE} environment variable @tab N @tab
481 @item The @code{decl} attribute was added to the C++ attribute syntax
483 @item The OpenMP directive syntax was extended to include C23 attribute
484 specifiers @tab Y @tab
485 @item Support for pure directives in Fortran's @code{do concurrent} @tab N @tab
@item All inarguable clauses now take an optional Boolean argument @tab N @tab
487 @item The @code{adjust_args} clause was extended to specify the argument by position
488 and supports variadic arguments @tab N @tab
@item For Fortran, a @emph{locator list} item can also be a function
reference with a data pointer result @tab N @tab
491 @item Concept of @emph{assumed-size arrays} in C and C++
493 @item @emph{directive-name-modifier} accepted in all clauses @tab N @tab
494 @item Extension of @code{interop} operation of @code{append_args}, allowing
495 all modifiers of the @code{init} clause @tab Y @tab
496 @item New argument-free version of @code{depobj} with repeatable clauses and
497 the @code{init} clause @tab N @tab
@item Undeprecate omitting the argument to the @code{destroy} clause of
the argument version of the @code{depobj} construct @tab Y @tab
500 @item For Fortran, atomic with BLOCK construct and, for C/C++, with
501 unlimited curly braces supported @tab N @tab
502 @item For Fortran, atomic with pointer comparison @tab N @tab
503 @item For Fortran, atomic with enum and enumeration types @tab N @tab
504 @item For Fortran, atomic compare with storing the comparison result
506 @item Canonical loop sequences and new @code{looprange} clause @tab N @tab
507 @item For Fortran, handling polymorphic types in data-sharing-attribute
508 clauses @tab P @tab @code{private} not supported
509 @item For Fortran, rejecting polymorphic types in data-mapping clauses
510 @tab N @tab not diagnosed (and mostly unsupported)
511 @item New @code{taskgraph} construct including @code{saved} modifier and
512 @code{replayable} clause @tab N @tab
513 @item @code{default} clause on the @code{target} directive and accepting
514 variable categories @tab N @tab
515 @item Semantic change regarding the reference count update with
516 @code{use_device_ptr} and @code{use_device_addr} @tab N @tab
517 @item Support for inductions @tab N @tab
518 @item Reduction over private variables with @code{reduction} clause
520 @item Implicit reduction identifiers of C++ classes
522 @item New @code{init_complete} clause to the @code{scan} directive
524 @item @code{ref} modifier to the @code{map} clause @tab N @tab
525 @item New @code{storage} map-type modifier; context-dependent @code{alloc} and
526 @code{release} are aliases @tab N @tab
527 @item Change of the @emph{map-type} property from @emph{ultimate} to
528 @emph{default} @tab N @tab
529 @item @code{self} modifier to @code{map} and @code{self} as
530 @code{defaultmap} argument @tab N @tab
531 @item Mapping of @emph{assumed-size arrays} in C, C++ and Fortran
@item @code{delete} as delete-modifier, not as map type @tab N @tab
534 @item For Fortran, the @code{automap} modifier to the @code{enter} clause
535 of @code{declare_target} @tab N @tab
536 @item @code{groupprivate} directive @tab N @tab
537 @item @code{local} clause to @code{declare_target} directive @tab N @tab
538 @item @code{part_size} allocator trait for @code{interleaved} allocator
539 partitions @tab N @tab
540 @item @code{pin_device}, @code{preferred_device} and @code{target_access}
543 @item @code{access} allocator trait changes @tab N @tab
544 @item New @code{partitioner} value to @code{partition} allocator trait
546 @item Semicolon-separated list to @code{uses_allocators} @tab N @tab
547 @item New @code{need_device_addr} modifier to @code{adjust_args} clause @tab N @tab
548 @item @code{interop} clause to @code{dispatch} @tab N @tab
549 @item Scope requirement changes for @code{declare_target} @tab N @tab
550 @item @code{message} and @code{severity} clauses to @code{parallel} directive
552 @item @code{self_maps} clause to @code{requires} directive @tab Y @tab
553 @item @code{no_openmp_constructs} assumptions clause @tab N @tab
554 @item Restriction for @code{ordered} regarding loop-transforming directives
556 @item @code{apply} clause to loop-transforming constructs @tab N @tab
557 @item Non-constant values in the @code{sizes} clause @tab N @tab
558 @item @code{fuse} loop-transformation construct @tab N @tab
559 @item @code{interchange} loop-transformation construct @tab N @tab
560 @item @code{reverse} loop-transformation construct @tab N @tab
561 @item @code{split} loop-transformation construct @tab N @tab
562 @item @code{stripe} loop-transformation construct @tab N @tab
563 @item @code{tile} permitting association of grid and inter-tile loops @tab N @tab
564 @item @code{strict} modifier keyword to @code{num_threads} @tab N @tab
565 @item @code{safesync} clause to the @code{parallel} construct @tab N @tab
566 @item @code{omp_curr_progress_width} identifier @tab N @tab
567 @item @code{omp_get_max_progress_width} runtime routine @tab N @tab
568 @item Lifted restrictions on @code{order(concurrent)} and, hence, the
569 @code{loop} construct @tab N @tab
570 @item @code{atomic} permitted in a construct with @code{order(concurrent)}
572 @item Lifted restrictions on not-strictly-nested regions with
573 @code{order(concurrent)} @tab N @tab
574 @item @code{workdistribute} directive for Fortran @tab N @tab
575 @item Fortran @code{DO CONCURRENT} as associated loop in a @code{loop} construct
577 @item New @code{task_iteration} directive inside @code{taskloop} @tab N @tab
578 @item @code{threadset} clause in task-generating constructs @tab N @tab
579 @item New @code{priority} clause to @code{target}, @code{target_enter_data},
580 @code{target_data}, @code{target_exit_data} and @code{target_update}
582 @item New @code{device_type} clause to the @code{target} directive
584 @item @code{target_data} as composite construct @tab N @tab
585 @item @code{nowait} clause with reverse-offload @code{target} directives
587 @item Extended @emph{prefer-type} modifier to @code{init} clause @tab Y @tab
588 @item Boolean argument to @code{nowait} and @code{nogroup} may be non constant
590 @item @code{memscope} clause to @code{atomic} and @code{flush} @tab N @tab
591 @item New @code{transparent} clause for multi-generational task-dependence graphs
593 @item The @code{cancel} construct now completes tasks with unfulfilled events
595 @item @code{omp_fulfill_event} routine was restricted regarding fulfillment of
596 event variables @tab N @tab
597 @item Added rule for compound-directive names, permitting many more combinations
599 @item @code{omp_is_free_agent} and @code{omp_ancestor_is_free_agent} routines
601 @item @code{omp_get_device_from_uid} and @code{omp_get_uid_from_device} routines
603 @item @code{omp_get_device_num_teams}, @code{omp_set_device_num_teams},
604 @code{omp_get_device_teams_thread_limit}, and
605 @code{omp_set_device_teams_thread_limit} routines @tab N @tab
606 @item @code{omp_target_memset} and @code{omp_target_memset_async} routines
608 @item Fortran version of the interop runtime routines @tab Y @tab
609 @item Routines for obtaining memory spaces/allocators for shared/device memory
611 @item @code{omp_get_memspace_num_resources} routine @tab N @tab
612 @item @code{omp_get_memspace_pagesize} routine @tab N @tab
613 @item @code{omp_get_submemspace} routine @tab N @tab
614 @item @code{omp_init_mempartitioner}, @code{omp_destroy_mempartitioner},
615 @code{omp_init_mempartition}, @code{omp_destroy_mempartition},
616 @code{omp_mempartition_set_part}, @code{omp_mempartition_get_user_data}
618 @item Deprecation of the @code{target_data_op}, @code{target},
619 @code{target_map} and @code{target_submit} callbacks and as values that
620 @code{set_callback} must return @tab N @tab
621 @item @code{ompt_target_data_transfer} and @code{ompt_target_data_transfer_async}
622 values in @code{ompt_target_data_op_t} enum @tab N @tab
623 @item The values @code{ompt_target_data_transfer_to_device},
624 @code{ompt_target_data_transfer_from_device},
625 @code{ompt_target_data_transfer_to_device_async} and
626 @code{ompt_target_data_transfer_from_device_async} of the @code{target_data_op}
627 OMPT type were deprecated @tab N @tab
628 @item @code{ompt_get_buffer_limits} OMPT routine @tab N @tab
631 @unnumberedsubsec Deprecated features, unless listed above
632 @multitable @columnfractions .60 .10 .25
633 @item Deprecation of omitting the optional white space to separate adjacent
634 keywords in the directive-name in Fortran (fixed and free source form)
636 @item Deprecation of the combiner expression in the @code{declare_reduction}
638 @item Deprecation of the Fortran include file @code{omp_lib.h}
642 @unnumberedsubsec Other new OpenMP 6.0 features
643 @multitable @columnfractions .60 .10 .25
644 @item Multi-word directives now use underscore by default @tab N @tab
645 @item Relaxed Fortran restrictions to the @code{aligned} clause @tab N @tab
646 @item Mapping lambda captures @tab N @tab
@item New @code{omp_pause_stop_tool} constant for @code{omp_pause_resource} @tab N @tab
648 @item In Fortran (fixed and free source form), spaces between directive names are mandatory
650 @item Update of the map-type decay for mapping and @code{declare_mapper}
656 @c ---------------------------------------------------------------------
657 @c OpenMP Runtime Library Routines
658 @c ---------------------------------------------------------------------
660 @node Runtime Library Routines
661 @chapter OpenMP Runtime Library Routines
The runtime routines described here are defined in Section 18 of
version 5.2 of the OpenMP specification.
667 * Thread Team Routines::
668 * Thread Affinity Routines::
669 * Teams Region Routines::
671 * Resource Relinquishing Routines::
672 * Device Information Routines::
673 * Device Memory Routines::
677 * Interoperability Routines::
678 * Memory Management Routines::
679 @c * Tool Control Routine::
680 * Environment Display Routine::
685 @node Thread Team Routines
686 @section Thread Team Routines
688 Routines controlling threads in the current contention group.
689 They have C linkage and do not throw exceptions.
692 * omp_set_num_threads:: Set upper team size limit
693 * omp_get_num_threads:: Size of the active team
694 * omp_get_max_threads:: Maximum number of threads of parallel region
695 * omp_get_thread_num:: Current thread ID
696 * omp_in_parallel:: Whether a parallel region is active
697 * omp_set_dynamic:: Enable/disable dynamic teams
698 * omp_get_dynamic:: Dynamic teams setting
699 * omp_get_cancellation:: Whether cancellation support is enabled
700 * omp_set_nested:: Enable/disable nested parallel regions
701 * omp_get_nested:: Nested parallel regions
702 * omp_set_schedule:: Set the runtime scheduling method
703 * omp_get_schedule:: Obtain the runtime scheduling method
704 * omp_get_teams_thread_limit:: Maximum number of threads imposed by teams
705 * omp_get_supported_active_levels:: Maximum number of active regions supported
706 * omp_set_max_active_levels:: Limits the number of active parallel regions
707 * omp_get_max_active_levels:: Current maximum number of active regions
708 * omp_get_level:: Number of parallel regions
709 * omp_get_ancestor_thread_num:: Ancestor thread ID
710 * omp_get_team_size:: Number of threads in a team
711 * omp_get_active_level:: Number of active parallel regions
716 @node omp_set_num_threads
717 @subsection @code{omp_set_num_threads} -- Set upper team size limit
719 @item @emph{Description}:
720 Specifies the number of threads used by default in subsequent parallel
721 regions, if those do not specify a @code{num_threads} clause. The
722 argument of @code{omp_set_num_threads} shall be a positive integer.
725 @multitable @columnfractions .20 .80
726 @item @emph{Prototype}: @tab @code{void omp_set_num_threads(int num_threads);}
729 @item @emph{Fortran}:
730 @multitable @columnfractions .20 .80
731 @item @emph{Interface}: @tab @code{subroutine omp_set_num_threads(num_threads)}
732 @item @tab @code{integer, intent(in) :: num_threads}
735 @item @emph{See also}:
736 @ref{OMP_NUM_THREADS}, @ref{omp_get_num_threads}, @ref{omp_get_max_threads}
738 @item @emph{Reference}:
739 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.1.
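The following is a minimal sketch, not part of libgomp, of how a request made
with @code{omp_set_num_threads} relates to the team that is actually formed.
The helper name @code{team_size_after_request} is hypothetical, and the
@code{#else} branch is an assumed serial fallback so that the sketch also
builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp.  */
static void omp_set_num_threads(int n) { (void) n; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Request a default team size, then report the size of the team that
   the next parallel region actually gets.  */
int team_size_after_request(int requested)
{
  int observed = 0;

  omp_set_num_threads(requested);
#pragma omp parallel
  {
    /* One thread records the team size; the others wait at the
       implicit barrier that ends the single construct.  */
#pragma omp single
    observed = omp_get_num_threads();
  }
  return observed;
}
```

When compiled with @option{-fopenmp}, the returned value is at most the
requested size; it may be smaller, for example if a thread limit applies.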
744 @node omp_get_num_threads
745 @subsection @code{omp_get_num_threads} -- Size of the active team
747 @item @emph{Description}:
748 Returns the number of threads in the current team. In a sequential section of
749 the program @code{omp_get_num_threads} returns 1.
751 The default team size may be initialized at startup by the
752 @env{OMP_NUM_THREADS} environment variable. At runtime, the size
753 of the current team may be set either by the @code{NUM_THREADS}
754 clause or by @code{omp_set_num_threads}. If none of the above were
755 used to define a specific value and @env{OMP_DYNAMIC} is disabled,
756 one thread per CPU online is used.
759 @multitable @columnfractions .20 .80
760 @item @emph{Prototype}: @tab @code{int omp_get_num_threads(void);}
763 @item @emph{Fortran}:
764 @multitable @columnfractions .20 .80
765 @item @emph{Interface}: @tab @code{integer function omp_get_num_threads()}
768 @item @emph{See also}:
769 @ref{omp_get_max_threads}, @ref{omp_set_num_threads}, @ref{OMP_NUM_THREADS}
771 @item @emph{Reference}:
772 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.2.
777 @node omp_get_max_threads
778 @subsection @code{omp_get_max_threads} -- Maximum number of threads of parallel region
780 @item @emph{Description}:
781 Returns an upper bound on the number of threads that would be used to form
782 the team of a subsequent parallel region that does not use the clause @code{num_threads}.
785 @multitable @columnfractions .20 .80
786 @item @emph{Prototype}: @tab @code{int omp_get_max_threads(void);}
789 @item @emph{Fortran}:
790 @multitable @columnfractions .20 .80
791 @item @emph{Interface}: @tab @code{integer function omp_get_max_threads()}
794 @item @emph{See also}:
795 @ref{omp_set_num_threads}, @ref{omp_set_dynamic}, @ref{omp_get_thread_limit}
797 @item @emph{Reference}:
798 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.3.
803 @node omp_get_thread_num
804 @subsection @code{omp_get_thread_num} -- Current thread ID
806 @item @emph{Description}:
807 Returns a unique thread identification number within the current team.
808 In sequential parts of the program, @code{omp_get_thread_num}
809 always returns 0. In parallel regions the return value varies
810 from 0 to @code{omp_get_num_threads}-1 inclusive. The return
811 value of the primary thread of a team is always 0.
814 @multitable @columnfractions .20 .80
815 @item @emph{Prototype}: @tab @code{int omp_get_thread_num(void);}
818 @item @emph{Fortran}:
819 @multitable @columnfractions .20 .80
820 @item @emph{Interface}: @tab @code{integer function omp_get_thread_num()}
823 @item @emph{See also}:
824 @ref{omp_get_num_threads}, @ref{omp_get_ancestor_thread_num}
826 @item @emph{Reference}:
827 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.4.
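As an illustration of the numbering described above, the sketch below checks
that the threads of one team report the IDs 0 to
@code{omp_get_num_threads}-1. The helper name @code{thread_ids_are_dense} is
hypothetical, and the @code{#else} branch is an assumed serial fallback for
builds without OpenMP.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp.  */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
#endif

/* Return 1 if every ID from 0 to team size - 1 was reported by some
   thread of the team, and 0 otherwise.  */
int thread_ids_are_dense(void)
{
  int max = omp_get_max_threads();
  int *seen = calloc((size_t) max, sizeof *seen);
  int team = 0;
  int ok = 1;
  int i;

  if (seen == NULL)
    return 0;

#pragma omp parallel
  {
    int id = omp_get_thread_num();

    /* Each thread marks its own slot, so no two threads write the
       same array element.  */
    if (id >= 0 && id < max)
      seen[id] = 1;
#pragma omp single
    team = omp_get_num_threads();
  }

  for (i = 0; i < team; i++)
    if (!seen[i])
      ok = 0;
  free(seen);
  return ok && team >= 1 && team <= max;
}
```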
832 @node omp_in_parallel
833 @subsection @code{omp_in_parallel} -- Whether a parallel region is active
835 @item @emph{Description}:
836 This function returns @code{true} if currently running in parallel,
837 @code{false} otherwise. Here, @code{true} and @code{false} represent
838 their language-specific counterparts.
841 @multitable @columnfractions .20 .80
842 @item @emph{Prototype}: @tab @code{int omp_in_parallel(void);}
845 @item @emph{Fortran}:
846 @multitable @columnfractions .20 .80
847 @item @emph{Interface}: @tab @code{logical function omp_in_parallel()}
850 @item @emph{Reference}:
851 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.6.
855 @node omp_set_dynamic
856 @subsection @code{omp_set_dynamic} -- Enable/disable dynamic teams
858 @item @emph{Description}:
859 Enable or disable the dynamic adjustment of the number of threads
860 within a team. The function takes the language-specific equivalent
861 of @code{true} and @code{false}, where @code{true} enables dynamic
862 adjustment of team sizes and @code{false} disables it.
865 @multitable @columnfractions .20 .80
866 @item @emph{Prototype}: @tab @code{void omp_set_dynamic(int dynamic_threads);}
869 @item @emph{Fortran}:
870 @multitable @columnfractions .20 .80
871 @item @emph{Interface}: @tab @code{subroutine omp_set_dynamic(dynamic_threads)}
872 @item @tab @code{logical, intent(in) :: dynamic_threads}
875 @item @emph{See also}:
876 @ref{OMP_DYNAMIC}, @ref{omp_get_dynamic}
878 @item @emph{Reference}:
879 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.7.
884 @node omp_get_dynamic
885 @subsection @code{omp_get_dynamic} -- Dynamic teams setting
887 @item @emph{Description}:
888 This function returns @code{true} if dynamic adjustment of team sizes is enabled, @code{false} otherwise.
889 Here, @code{true} and @code{false} represent their language-specific counterparts.
892 The dynamic team setting may be initialized at startup by the
893 @env{OMP_DYNAMIC} environment variable or at runtime using
894 @code{omp_set_dynamic}. If undefined, dynamic adjustment is disabled by default.
898 @multitable @columnfractions .20 .80
899 @item @emph{Prototype}: @tab @code{int omp_get_dynamic(void);}
902 @item @emph{Fortran}:
903 @multitable @columnfractions .20 .80
904 @item @emph{Interface}: @tab @code{logical function omp_get_dynamic()}
907 @item @emph{See also}:
908 @ref{omp_set_dynamic}, @ref{OMP_DYNAMIC}
910 @item @emph{Reference}:
911 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.8.
916 @node omp_get_cancellation
917 @subsection @code{omp_get_cancellation} -- Whether cancellation support is enabled
919 @item @emph{Description}:
920 This function returns @code{true} if cancellation is activated, @code{false}
921 otherwise. Here, @code{true} and @code{false} represent their language-specific
922 counterparts. Unless @env{OMP_CANCELLATION} is set to true, cancellation is deactivated.
926 @multitable @columnfractions .20 .80
927 @item @emph{Prototype}: @tab @code{int omp_get_cancellation(void);}
930 @item @emph{Fortran}:
931 @multitable @columnfractions .20 .80
932 @item @emph{Interface}: @tab @code{logical function omp_get_cancellation()}
935 @item @emph{See also}:
936 @ref{OMP_CANCELLATION}
938 @item @emph{Reference}:
939 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.9.
944 @node omp_set_nested
945 @subsection @code{omp_set_nested} -- Enable/disable nested parallel regions
947 @item @emph{Description}:
948 Enable or disable nested parallel regions, i.e., whether team members
949 are allowed to create new teams. The function takes the language-specific
950 equivalent of @code{true} and @code{false}, where @code{true} enables
951 nested parallel regions and @code{false} disables them.
953 Enabling nested parallel regions also sets the maximum number of
954 active nested regions to the maximum supported. Disabling nested parallel
955 regions sets the maximum number of active nested regions to one.
957 Note that the @code{omp_set_nested} API routine was deprecated
958 in the OpenMP specification 5.0 in favor of @code{omp_set_max_active_levels}.
961 @multitable @columnfractions .20 .80
962 @item @emph{Prototype}: @tab @code{void omp_set_nested(int nested);}
965 @item @emph{Fortran}:
966 @multitable @columnfractions .20 .80
967 @item @emph{Interface}: @tab @code{subroutine omp_set_nested(nested)}
968 @item @tab @code{logical, intent(in) :: nested}
971 @item @emph{See also}:
972 @ref{omp_get_nested}, @ref{omp_set_max_active_levels},
973 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
975 @item @emph{Reference}:
976 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.10.
981 @node omp_get_nested
982 @subsection @code{omp_get_nested} -- Nested parallel regions
984 @item @emph{Description}:
985 This function returns @code{true} if nested parallel regions are
986 enabled, @code{false} otherwise. Here, @code{true} and @code{false}
987 represent their language-specific counterparts.
989 The state of nested parallel regions at startup depends on several
990 environment variables. If @env{OMP_MAX_ACTIVE_LEVELS} is defined
991 and is set to a value greater than one, then nested parallel regions are
992 enabled. If it is not defined, then the value of the @env{OMP_NESTED}
993 environment variable is followed if defined. If neither is
994 defined, then if either @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND}
995 is defined with a list of more than one value, then nested parallel
996 regions are enabled. If none of these are defined, then nested parallel
997 regions are disabled by default.
999 Nested parallel regions can be enabled or disabled at runtime using
1000 @code{omp_set_nested}, or by setting the maximum number of nested
1001 regions with @code{omp_set_max_active_levels} to one to disable, or
1002 above one to enable.
1004 Note that the @code{omp_get_nested} API routine was deprecated
1005 in the OpenMP specification 5.0 in favor of @code{omp_get_max_active_levels}.
1008 @multitable @columnfractions .20 .80
1009 @item @emph{Prototype}: @tab @code{int omp_get_nested(void);}
1012 @item @emph{Fortran}:
1013 @multitable @columnfractions .20 .80
1014 @item @emph{Interface}: @tab @code{logical function omp_get_nested()}
1017 @item @emph{See also}:
1018 @ref{omp_get_max_active_levels}, @ref{omp_set_nested},
1019 @ref{OMP_MAX_ACTIVE_LEVELS}, @ref{OMP_NESTED}
1021 @item @emph{Reference}:
1022 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.11.
1027 @node omp_set_schedule
1028 @subsection @code{omp_set_schedule} -- Set the runtime scheduling method
1030 @item @emph{Description}:
1031 Sets the runtime scheduling method. The @var{kind} argument can have the
1032 value @code{omp_sched_static}, @code{omp_sched_dynamic},
1033 @code{omp_sched_guided} or @code{omp_sched_auto}. Except for
1034 @code{omp_sched_auto}, the chunk size is set to the value of
1035 @var{chunk_size} if positive, or to the default value if zero or negative.
1036 For @code{omp_sched_auto} the @var{chunk_size} argument is ignored.
1039 @multitable @columnfractions .20 .80
1040 @item @emph{Prototype}: @tab @code{void omp_set_schedule(omp_sched_t kind, int chunk_size);}
1043 @item @emph{Fortran}:
1044 @multitable @columnfractions .20 .80
1045 @item @emph{Interface}: @tab @code{subroutine omp_set_schedule(kind, chunk_size)}
1046 @item @tab @code{integer(kind=omp_sched_kind) kind}
1047 @item @tab @code{integer chunk_size}
1050 @item @emph{See also}:
1051 @ref{omp_get_schedule}
1054 @item @emph{Reference}:
1055 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.12.
1060 @node omp_get_schedule
1061 @subsection @code{omp_get_schedule} -- Obtain the runtime scheduling method
1063 @item @emph{Description}:
1064 Obtain the runtime scheduling method. The @var{kind} argument is set to
1065 @code{omp_sched_static}, @code{omp_sched_dynamic},
1066 @code{omp_sched_guided} or @code{omp_sched_auto}. The second argument,
1067 @var{chunk_size}, is set to the chunk size.
1070 @multitable @columnfractions .20 .80
1071 @item @emph{Prototype}: @tab @code{void omp_get_schedule(omp_sched_t *kind, int *chunk_size);}
1074 @item @emph{Fortran}:
1075 @multitable @columnfractions .20 .80
1076 @item @emph{Interface}: @tab @code{subroutine omp_get_schedule(kind, chunk_size)}
1077 @item @tab @code{integer(kind=omp_sched_kind) kind}
1078 @item @tab @code{integer chunk_size}
1081 @item @emph{See also}:
1082 @ref{omp_set_schedule}, @ref{OMP_SCHEDULE}
1084 @item @emph{Reference}:
1085 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.13.
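The interplay of @code{omp_set_schedule} and @code{omp_get_schedule} can be
sketched as follows. The helper name @code{run_with_dynamic_chunks} is
hypothetical; the @code{#else} branch is an assumed serial fallback that only
remembers the last setting, so that the sketch also builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp;
   it only remembers the last setting.  */
typedef enum { omp_sched_static = 1, omp_sched_dynamic = 2 } omp_sched_t;
static omp_sched_t fallback_kind = omp_sched_static;
static int fallback_chunk = 0;
static void omp_set_schedule(omp_sched_t kind, int chunk_size)
{
  fallback_kind = kind;
  fallback_chunk = chunk_size;
}
static void omp_get_schedule(omp_sched_t *kind, int *chunk_size)
{
  *kind = fallback_kind;
  *chunk_size = fallback_chunk;
}
#endif

/* Select dynamic scheduling with the given chunk size for loops that use
   schedule(runtime), run such a loop, and return the chunk size that
   the runtime reports afterwards (or -1 if the kind is unexpected).  */
int run_with_dynamic_chunks(int chunk, const int *in, int *out, int n)
{
  omp_sched_t kind;
  int reported = 0;
  int i;

  omp_set_schedule(omp_sched_dynamic, chunk);

  /* The loop picks up the schedule selected above.  */
#pragma omp parallel for schedule(runtime)
  for (i = 0; i < n; i++)
    out[i] = 2 * in[i];

  omp_get_schedule(&kind, &reported);
  return kind == omp_sched_dynamic ? reported : -1;
}
```

Only loops that specify @code{schedule(runtime)} are affected by the setting;
loops with an explicit schedule kind keep it.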
1089 @node omp_get_teams_thread_limit
1090 @subsection @code{omp_get_teams_thread_limit} -- Maximum number of threads imposed by teams
1092 @item @emph{Description}:
1093 Return the maximum number of threads that are able to participate in
1094 each team created by a teams construct.
1097 @multitable @columnfractions .20 .80
1098 @item @emph{Prototype}: @tab @code{int omp_get_teams_thread_limit(void);}
1101 @item @emph{Fortran}:
1102 @multitable @columnfractions .20 .80
1103 @item @emph{Interface}: @tab @code{integer function omp_get_teams_thread_limit()}
1106 @item @emph{See also}:
1107 @ref{omp_set_teams_thread_limit}, @ref{OMP_TEAMS_THREAD_LIMIT}
1109 @item @emph{Reference}:
1110 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.6.
1115 @node omp_get_supported_active_levels
1116 @subsection @code{omp_get_supported_active_levels} -- Maximum number of active regions supported
1118 @item @emph{Description}:
1119 This function returns the maximum number of nested, active parallel regions
1120 supported by this implementation.
1123 @multitable @columnfractions .20 .80
1124 @item @emph{Prototype}: @tab @code{int omp_get_supported_active_levels(void);}
1127 @item @emph{Fortran}:
1128 @multitable @columnfractions .20 .80
1129 @item @emph{Interface}: @tab @code{integer function omp_get_supported_active_levels()}
1132 @item @emph{See also}:
1133 @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
1135 @item @emph{Reference}:
1136 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.15.
1141 @node omp_set_max_active_levels
1142 @subsection @code{omp_set_max_active_levels} -- Limits the number of active parallel regions
1144 @item @emph{Description}:
1145 This function limits the maximum allowed number of nested, active
1146 parallel regions. @var{max_levels} must be less than or equal to
1147 the value returned by @code{omp_get_supported_active_levels}.
1150 @multitable @columnfractions .20 .80
1151 @item @emph{Prototype}: @tab @code{void omp_set_max_active_levels(int max_levels);}
1154 @item @emph{Fortran}:
1155 @multitable @columnfractions .20 .80
1156 @item @emph{Interface}: @tab @code{subroutine omp_set_max_active_levels(max_levels)}
1157 @item @tab @code{integer max_levels}
1160 @item @emph{See also}:
1161 @ref{omp_get_max_active_levels}, @ref{omp_get_active_level},
1162 @ref{omp_get_supported_active_levels}
1164 @item @emph{Reference}:
1165 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.15.
1170 @node omp_get_max_active_levels
1171 @subsection @code{omp_get_max_active_levels} -- Current maximum number of active regions
1173 @item @emph{Description}:
1174 This function obtains the maximum allowed number of nested, active parallel regions.
1177 @multitable @columnfractions .20 .80
1178 @item @emph{Prototype}: @tab @code{int omp_get_max_active_levels(void);}
1181 @item @emph{Fortran}:
1182 @multitable @columnfractions .20 .80
1183 @item @emph{Interface}: @tab @code{integer function omp_get_max_active_levels()}
1186 @item @emph{See also}:
1187 @ref{omp_set_max_active_levels}, @ref{omp_get_active_level}
1189 @item @emph{Reference}:
1190 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.16.
1194 @node omp_get_level
1195 @subsection @code{omp_get_level} -- Obtain the current nesting level
1197 @item @emph{Description}:
1198 This function returns the number of nested parallel regions, active or
1199 inactive, that enclose the call.
1202 @multitable @columnfractions .20 .80
1203 @item @emph{Prototype}: @tab @code{int omp_get_level(void);}
1206 @item @emph{Fortran}:
1207 @multitable @columnfractions .20 .80
1208 @item @emph{Interface}: @tab @code{integer function omp_get_level()}
1211 @item @emph{See also}:
1212 @ref{omp_get_active_level}
1214 @item @emph{Reference}:
1215 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.17.
1220 @node omp_get_ancestor_thread_num
1221 @subsection @code{omp_get_ancestor_thread_num} -- Ancestor thread ID
1223 @item @emph{Description}:
1224 This function returns the thread identification number for the given
1225 nesting level of the current thread. For values of @var{level} outside
1226 the range zero to @code{omp_get_level}, -1 is returned; if @var{level} is
1227 @code{omp_get_level}, the result is identical to @code{omp_get_thread_num}.
1230 @multitable @columnfractions .20 .80
1231 @item @emph{Prototype}: @tab @code{int omp_get_ancestor_thread_num(int level);}
1234 @item @emph{Fortran}:
1235 @multitable @columnfractions .20 .80
1236 @item @emph{Interface}: @tab @code{integer function omp_get_ancestor_thread_num(level)}
1237 @item @tab @code{integer level}
1240 @item @emph{See also}:
1241 @ref{omp_get_level}, @ref{omp_get_thread_num}, @ref{omp_get_team_size}
1243 @item @emph{Reference}:
1244 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.18.
1249 @node omp_get_team_size
1250 @subsection @code{omp_get_team_size} -- Number of threads in a team
1252 @item @emph{Description}:
1253 This function returns the number of threads in a thread team to which
1254 either the current thread or its ancestor belongs. For values of @var{level}
1255 outside zero to @code{omp_get_level}, -1 is returned; if @var{level} is zero,
1256 1 is returned, and for @code{omp_get_level}, the result is identical
1257 to @code{omp_get_num_threads}.
1260 @multitable @columnfractions .20 .80
1261 @item @emph{Prototype}: @tab @code{int omp_get_team_size(int level);}
1264 @item @emph{Fortran}:
1265 @multitable @columnfractions .20 .80
1266 @item @emph{Interface}: @tab @code{integer function omp_get_team_size(level)}
1267 @item @tab @code{integer level}
1270 @item @emph{See also}:
1271 @ref{omp_get_num_threads}, @ref{omp_get_level}, @ref{omp_get_ancestor_thread_num}
1273 @item @emph{Reference}:
1274 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.19.
1279 @node omp_get_active_level
1280 @subsection @code{omp_get_active_level} -- Number of active parallel regions
1282 @item @emph{Description}:
1283 This function returns the number of nested, active parallel regions
1284 that enclose the call.
1287 @multitable @columnfractions .20 .80
1288 @item @emph{Prototype}: @tab @code{int omp_get_active_level(void);}
1291 @item @emph{Fortran}:
1292 @multitable @columnfractions .20 .80
1293 @item @emph{Interface}: @tab @code{integer function omp_get_active_level()}
1296 @item @emph{See also}:
1297 @ref{omp_get_level}, @ref{omp_get_max_active_levels}, @ref{omp_set_max_active_levels}
1299 @item @emph{Reference}:
1300 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.20.
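The difference between @code{omp_get_level} and @code{omp_get_active_level}
can be sketched with two nested parallel regions that are each limited to one
thread: both regions count towards the nesting level, but neither is active,
because an active region is one executed by a team of more than one thread.
The structure @code{nesting_info} and the helper @code{probe_nesting} are
hypothetical names, and the @code{#else} branch is an assumed serial fallback
for builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp.  */
static int omp_get_level(void) { return 0; }
static int omp_get_active_level(void) { return 0; }
static int omp_get_team_size(int level) { return level == 0 ? 1 : -1; }
static int omp_get_ancestor_thread_num(int level)
{
  return level == 0 ? 0 : -1;
}
#endif

struct nesting_info
{
  int level;         /* Result of omp_get_level.  */
  int active_level;  /* Result of omp_get_active_level.  */
  int outer_team;    /* Result of omp_get_team_size (1).  */
  int outer_id;      /* Result of omp_get_ancestor_thread_num (1).  */
};

/* Query the nesting routines from inside two nested parallel regions
   that each run with a single thread.  */
struct nesting_info probe_nesting(void)
{
  struct nesting_info info = { 0, 0, 0, 0 };

#pragma omp parallel num_threads(1)
  {
#pragma omp parallel num_threads(1)
    {
      info.level = omp_get_level();
      info.active_level = omp_get_active_level();
      info.outer_team = omp_get_team_size(1);
      info.outer_id = omp_get_ancestor_thread_num(1);
    }
  }
  return info;
}
```

When compiled with @option{-fopenmp}, the expected result is a level of 2, an
active level of 0, an outer team size of 1 and an outer thread ID of 0.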
1305 @node Thread Affinity Routines
1306 @section Thread Affinity Routines
1308 Routines controlling and accessing thread-affinity policies.
1309 They have C linkage and do not throw exceptions.
1312 * omp_get_proc_bind:: Whether threads may be moved between CPUs
1313 @c * omp_get_num_places:: Get the number of places available
1314 @c * omp_get_place_num_procs:: Get the number of processes associated with a place
1315 @c * omp_get_place_proc_ids:: Get number of processes associated with a place
1316 @c * omp_get_place_num:: Get place number of the associated task
1317 @c * omp_get_partition_num_places:: Get number of places of innermost task
1318 @c * omp_get_partition_place_nums:: <fixme>
1319 @c * omp_set_affinity_format:: <fixme>
1320 @c * omp_get_affinity_format:: <fixme>
1321 @c * omp_display_affinity:: <fixme>
1322 @c * omp_capture_affinity:: <fixme>
1327 @node omp_get_proc_bind
1328 @subsection @code{omp_get_proc_bind} -- Whether threads may be moved between CPUs
1330 @item @emph{Description}:
1331 This function returns the currently active thread affinity policy, which is
1332 set via @env{OMP_PROC_BIND}. Possible values are @code{omp_proc_bind_false},
1333 @code{omp_proc_bind_true}, @code{omp_proc_bind_primary},
1334 @code{omp_proc_bind_master}, @code{omp_proc_bind_close} and @code{omp_proc_bind_spread},
1335 where @code{omp_proc_bind_master} is an alias for @code{omp_proc_bind_primary}.
1338 @multitable @columnfractions .20 .80
1339 @item @emph{Prototype}: @tab @code{omp_proc_bind_t omp_get_proc_bind(void);}
1342 @item @emph{Fortran}:
1343 @multitable @columnfractions .20 .80
1344 @item @emph{Interface}: @tab @code{integer(kind=omp_proc_bind_kind) function omp_get_proc_bind()}
1347 @item @emph{See also}:
1348 @ref{OMP_PROC_BIND}, @ref{OMP_PLACES}, @ref{GOMP_CPU_AFFINITY}
1350 @item @emph{Reference}:
1351 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.22.
1356 @node Teams Region Routines
1357 @section Teams Region Routines
1359 Routines controlling the league of teams that are executed in a @code{teams}
1360 region. They have C linkage and do not throw exceptions.
1363 * omp_get_num_teams:: Number of teams
1364 * omp_get_team_num:: Get team number
1365 * omp_set_num_teams:: Set upper teams limit for teams region
1366 * omp_get_max_teams:: Maximum number of teams for teams region
1367 * omp_set_teams_thread_limit:: Set upper thread limit for teams construct
1368 * omp_get_thread_limit:: Maximum number of threads
1373 @node omp_get_num_teams
1374 @subsection @code{omp_get_num_teams} -- Number of teams
1376 @item @emph{Description}:
1377 Returns the number of teams in the current teams region.
1380 @multitable @columnfractions .20 .80
1381 @item @emph{Prototype}: @tab @code{int omp_get_num_teams(void);}
1384 @item @emph{Fortran}:
1385 @multitable @columnfractions .20 .80
1386 @item @emph{Interface}: @tab @code{integer function omp_get_num_teams()}
1389 @item @emph{Reference}:
1390 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.32.
1395 @node omp_get_team_num
1396 @subsection @code{omp_get_team_num} -- Get team number
1398 @item @emph{Description}:
1399 Returns the team number of the calling thread.
1402 @multitable @columnfractions .20 .80
1403 @item @emph{Prototype}: @tab @code{int omp_get_team_num(void);}
1406 @item @emph{Fortran}:
1407 @multitable @columnfractions .20 .80
1408 @item @emph{Interface}: @tab @code{integer function omp_get_team_num()}
1411 @item @emph{Reference}:
1412 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.33.
1417 @node omp_set_num_teams
1418 @subsection @code{omp_set_num_teams} -- Set upper teams limit for teams construct
1420 @item @emph{Description}:
1421 Specifies the upper bound for the number of teams created by a teams construct
1422 that does not specify a @code{num_teams} clause. The
1423 argument of @code{omp_set_num_teams} shall be a positive integer.
1426 @multitable @columnfractions .20 .80
1427 @item @emph{Prototype}: @tab @code{void omp_set_num_teams(int num_teams);}
1430 @item @emph{Fortran}:
1431 @multitable @columnfractions .20 .80
1432 @item @emph{Interface}: @tab @code{subroutine omp_set_num_teams(num_teams)}
1433 @item @tab @code{integer, intent(in) :: num_teams}
1436 @item @emph{See also}:
1437 @ref{OMP_NUM_TEAMS}, @ref{omp_get_num_teams}, @ref{omp_get_max_teams}
1439 @item @emph{Reference}:
1440 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.3.
1445 @node omp_get_max_teams
1446 @subsection @code{omp_get_max_teams} -- Maximum number of teams of teams region
1448 @item @emph{Description}:
1449 Return the maximum number of teams used for the teams region
1450 that does not use the clause @code{num_teams}.
1453 @multitable @columnfractions .20 .80
1454 @item @emph{Prototype}: @tab @code{int omp_get_max_teams(void);}
1457 @item @emph{Fortran}:
1458 @multitable @columnfractions .20 .80
1459 @item @emph{Interface}: @tab @code{integer function omp_get_max_teams()}
1462 @item @emph{See also}:
1463 @ref{omp_set_num_teams}, @ref{omp_get_num_teams}
1465 @item @emph{Reference}:
1466 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.4.
1471 @node omp_set_teams_thread_limit
1472 @subsection @code{omp_set_teams_thread_limit} -- Set upper thread limit for teams construct
1474 @item @emph{Description}:
1475 Specifies the upper bound for the number of threads that are available
1476 for each team created by a teams construct that does not specify a
1477 @code{thread_limit} clause. The argument of
1478 @code{omp_set_teams_thread_limit} shall be a positive integer.
1481 @multitable @columnfractions .20 .80
1482 @item @emph{Prototype}: @tab @code{void omp_set_teams_thread_limit(int thread_limit);}
1485 @item @emph{Fortran}:
1486 @multitable @columnfractions .20 .80
1487 @item @emph{Interface}: @tab @code{subroutine omp_set_teams_thread_limit(thread_limit)}
1488 @item @tab @code{integer, intent(in) :: thread_limit}
1491 @item @emph{See also}:
1492 @ref{OMP_TEAMS_THREAD_LIMIT}, @ref{omp_get_teams_thread_limit}, @ref{omp_get_thread_limit}
1494 @item @emph{Reference}:
1495 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.4.5.
1500 @node omp_get_thread_limit
1501 @subsection @code{omp_get_thread_limit} -- Maximum number of threads
1503 @item @emph{Description}:
1504 Return the maximum number of threads of the program.
1507 @multitable @columnfractions .20 .80
1508 @item @emph{Prototype}: @tab @code{int omp_get_thread_limit(void);}
1511 @item @emph{Fortran}:
1512 @multitable @columnfractions .20 .80
1513 @item @emph{Interface}: @tab @code{integer function omp_get_thread_limit()}
1516 @item @emph{See also}:
1517 @ref{omp_get_max_threads}, @ref{OMP_THREAD_LIMIT}
1519 @item @emph{Reference}:
1520 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.14.
1525 @node Tasking Routines
1526 @section Tasking Routines
1528 Routines relating to explicit tasks.
1529 They have C linkage and do not throw exceptions.
1532 * omp_get_max_task_priority:: Maximum task priority value that can be set
1533 * omp_in_explicit_task:: Whether a given task is an explicit task
1534 * omp_in_final:: Whether in final or included task region
1535 @c * omp_is_free_agent:: <fixme>/TR12
1536 @c * omp_ancestor_is_free_agent:: <fixme>/TR12
1541 @node omp_get_max_task_priority
1542 @subsection @code{omp_get_max_task_priority} -- Maximum task priority value that can be set
1545 @item @emph{Description}:
1546 This function obtains the maximum allowed priority number for tasks.
1549 @multitable @columnfractions .20 .80
1550 @item @emph{Prototype}: @tab @code{int omp_get_max_task_priority(void);}
1553 @item @emph{Fortran}:
1554 @multitable @columnfractions .20 .80
1555 @item @emph{Interface}: @tab @code{integer function omp_get_max_task_priority()}
1558 @item @emph{Reference}:
1559 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
1564 @node omp_in_explicit_task
1565 @subsection @code{omp_in_explicit_task} -- Whether a given task is an explicit task
1567 @item @emph{Description}:
1568 The function returns the @var{explicit-task-var} ICV; it returns true when the
1569 encountering task was generated by a task-generating construct such as
1570 @code{target}, @code{task} or @code{taskloop}. Otherwise, the encountering task
1571 is in an implicit task region such as generated by the implicit or explicit
1572 @code{parallel} region and @code{omp_in_explicit_task} returns false.
1575 @multitable @columnfractions .20 .80
1576 @item @emph{Prototype}: @tab @code{int omp_in_explicit_task(void);}
1579 @item @emph{Fortran}:
1580 @multitable @columnfractions .20 .80
1581 @item @emph{Interface}: @tab @code{logical function omp_in_explicit_task()}
1584 @item @emph{Reference}:
1585 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 18.5.2.
1590 @node omp_in_final
1591 @subsection @code{omp_in_final} -- Whether in final or included task region
1593 @item @emph{Description}:
1594 This function returns @code{true} if currently running in a final
1595 or included task region, @code{false} otherwise. Here, @code{true}
1596 and @code{false} represent their language-specific counterparts.
1599 @multitable @columnfractions .20 .80
1600 @item @emph{Prototype}: @tab @code{int omp_in_final(void);}
1603 @item @emph{Fortran}:
1604 @multitable @columnfractions .20 .80
1605 @item @emph{Interface}: @tab @code{logical function omp_in_final()}
1608 @item @emph{Reference}:
1609 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.21.
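A minimal sketch of querying @code{omp_in_final} from inside a task whose
@code{final} clause is controlled by an argument is shown below. The helper
name @code{in_final_inside_task} is hypothetical, and the @code{#else} branch
is an assumed serial fallback for builds without OpenMP.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch also builds without -fopenmp.  */
static int omp_in_final(void) { return 0; }
#endif

/* Return nonzero if omp_in_final reports true inside a task whose
   final clause evaluates to IS_FINAL.  */
int in_final_inside_task(int is_final)
{
  int result = 0;

  /* RESULT must be shared explicitly; otherwise the task would work
     on its own firstprivate copy.  */
#pragma omp task final(is_final) shared(result)
  result = omp_in_final();

  /* Wait for the task before reading RESULT.  */
#pragma omp taskwait
  return result != 0;
}
```

When compiled with @option{-fopenmp}, the call
@code{in_final_inside_task(1)} is expected to return 1.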
1614 @node Resource Relinquishing Routines
1615 @section Resource Relinquishing Routines
1617 Routines releasing resources used by the OpenMP runtime.
1618 They have C linkage and do not throw exceptions.
1621 * omp_pause_resource:: Release OpenMP resources on a device
1622 * omp_pause_resource_all:: Release OpenMP resources on all devices
1627 @node omp_pause_resource
1628 @subsection @code{omp_pause_resource} -- Release OpenMP resources on a device
1630 @item @emph{Description}:
1631 Free resources used by the OpenMP program and the runtime library on and for the
1632 device specified by @var{device_num}; on success, zero is returned and non-zero otherwise.
1635 The value of @var{device_num} must be a conforming device number. The routine
1636 may not be called from within any explicit region, and all explicit threads that
1637 do not bind to the implicit parallel region must have finalized execution.
1640 @multitable @columnfractions .20 .80
1641 @item @emph{Prototype}: @tab @code{int omp_pause_resource(omp_pause_resource_t kind, int device_num);}
1644 @item @emph{Fortran}:
1645 @multitable @columnfractions .20 .80
1646 @item @emph{Interface}: @tab @code{integer function omp_pause_resource(kind, device_num)}
1647 @item @tab @code{integer (kind=omp_pause_resource_kind) kind}
1648 @item @tab @code{integer device_num}
1651 @item @emph{Reference}:
1652 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.43.
1657 @node omp_pause_resource_all
1658 @subsection @code{omp_pause_resource_all} -- Release OpenMP resources on all devices
1660 @item @emph{Description}:
1661 Free resources used by the OpenMP program and the runtime library on all devices,
1662 including the host. On success, zero is returned and non-zero otherwise.
1664 The routine may not be called from within any explicit region, and all explicit
1665 threads that do not bind to the implicit parallel region must have finalized execution.
1668 @multitable @columnfractions .20 .80
1669 @item @emph{Prototype}: @tab @code{int omp_pause_resource_all(omp_pause_resource_t kind);}
1672 @item @emph{Fortran}:
1673 @multitable @columnfractions .20 .80
1674 @item @emph{Interface}: @tab @code{integer function omp_pause_resource_all(kind)}
1675 @item @tab @code{integer (kind=omp_pause_resource_kind) kind}
1678 @item @emph{See also}:
1679 @ref{omp_pause_resource}
1681 @item @emph{Reference}:
1682 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.44.
1687 @node Device Information Routines
1688 @section Device Information Routines
1690 Routines related to devices available to an OpenMP program.
1691 They have C linkage and do not throw exceptions.
1694 * omp_get_num_procs:: Number of processors online
1695 @c * omp_get_max_progress_width:: <fixme>/TR11
1696 * omp_set_default_device:: Set the default device for target regions
1697 * omp_get_default_device:: Get the default device for target regions
1698 * omp_get_num_devices:: Number of target devices
1699 * omp_get_device_num:: Get device that current thread is running on
1700 * omp_get_device_from_uid:: Obtain the device number for a unique id
1701 * omp_get_uid_from_device:: Obtain the unique id of a device
1702 * omp_is_initial_device:: Whether executing on the host device
1703 * omp_get_initial_device:: Device number of host device
1704 @c * omp_get_device_num_teams:: <fixme>/TR13
1705 @c * omp_set_device_num_teams:: <fixme>/TR13
1706 @c * omp_get_device_teams_thread_limit:: <fixme>/TR13
1707 @c * omp_set_device_teams_thread_limit:: <fixme>/TR13
1712 @node omp_get_num_procs
1713 @subsection @code{omp_get_num_procs} -- Number of processors online
1715 @item @emph{Description}:
1716 Returns the number of processors online on the current device.
1719 @multitable @columnfractions .20 .80
1720 @item @emph{Prototype}: @tab @code{int omp_get_num_procs(void);}
1723 @item @emph{Fortran}:
1724 @multitable @columnfractions .20 .80
1725 @item @emph{Interface}: @tab @code{integer function omp_get_num_procs()}
1728 @item @emph{Reference}:
1729 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.5.
1734 @node omp_set_default_device
1735 @subsection @code{omp_set_default_device} -- Set the default device for target regions
1737 @item @emph{Description}:
1738 Set the value of the @emph{default-device-var} ICV, which is used
1739 for target regions without a device clause. The argument
1740 shall be a nonnegative device number, @code{omp_initial_device},
1741 or @code{omp_invalid_device}.
1743 The effect of running this routine in a @code{target} region is unspecified.
1746 @multitable @columnfractions .20 .80
1747 @item @emph{Prototype}: @tab @code{void omp_set_default_device(int device_num);}
1750 @item @emph{Fortran}:
1751 @multitable @columnfractions .20 .80
1752 @item @emph{Interface}: @tab @code{subroutine omp_set_default_device(device_num)}
1753 @item @tab @code{integer device_num}
1756 @item @emph{See also}:
1757 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_get_default_device}
1759 @item @emph{Reference}:
1760 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.29.
1765 @node omp_get_default_device
1766 @subsection @code{omp_get_default_device} -- Get the default device for target regions
1768 @item @emph{Description}:
1769 Get the value of the @emph{default-device-var} ICV, which is used
1770 for target regions without a device clause. The value is either a
1771 nonnegative device number, @code{omp_initial_device} or
1772 @code{omp_invalid_device}. Note that for the host, the ICV can have two values:
1773 either the value of the named constant @code{omp_initial_device} or the value
1774 returned by the @code{omp_get_num_devices} routine.
1776 The effect of running this routine in a @code{target} region is unspecified.
1779 @multitable @columnfractions .20 .80
1780 @item @emph{Prototype}: @tab @code{int omp_get_default_device(void);}
1783 @item @emph{Fortran}:
1784 @multitable @columnfractions .20 .80
1785 @item @emph{Interface}: @tab @code{integer function omp_get_default_device()}
1788 @item @emph{See also}:
1789 @ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device},
1790 @ref{omp_get_initial_device}
1792 @item @emph{Reference}:
1793 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
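As a rough illustration of how @code{omp_set_default_device} and
@code{omp_get_default_device} interact, the following C sketch switches the
default device temporarily and restores it afterwards. The names
@code{host_device_number}, @code{current_default_device} and
@code{run_with_default_device} are hypothetical helpers and not part of
libgomp. The stand-in definitions in the @code{#else} branch are an assumption
of this sketch so that it also builds without @option{-fopenmp}; they only
model a single host device.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Assumption of this sketch: host-only stand-ins so that the code also
   builds without -fopenmp.  They only model a single host device.  */
static int stub_default_device = 0;
static int omp_get_num_devices(void) { return 0; }
static int omp_get_default_device(void) { return stub_default_device; }
static void omp_set_default_device(int device_num)
{
  stub_default_device = device_num;
}
#endif

/* Hypothetical helper: the host can be addressed by the value returned
   by omp_get_num_devices, as described above.  */
int host_device_number(void)
{
  return omp_get_num_devices();
}

/* Hypothetical callback: reports the current default-device-var ICV.  */
int current_default_device(void)
{
  return omp_get_default_device();
}

/* Hypothetical helper: run FN with DEVICE_NUM as the default device and
   restore the previous default device afterwards.  */
int run_with_default_device(int device_num, int (*fn)(void))
{
  assert(fn != NULL);
  int saved = omp_get_default_device();
  omp_set_default_device(device_num);
  int result = fn();
  omp_set_default_device(saved);
  return result;
}
```

Calling @code{run_with_default_device (host_device_number (), current_default_device)}
from host code is expected to return the host device number and to leave the
previous default device in place afterwards.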
1798 @node omp_get_num_devices
1799 @subsection @code{omp_get_num_devices} -- Number of target devices
1801 @item @emph{Description}:
1802 Returns the number of available non-host devices.
1804 The effect of running this routine in a @code{target} region is unspecified.
1807 @multitable @columnfractions .20 .80
1808 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
1811 @item @emph{Fortran}:
1812 @multitable @columnfractions .20 .80
1813 @item @emph{Interface}: @tab @code{integer function omp_get_num_devices()}
1816 @item @emph{Reference}:
1817 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.31.
1822 @node omp_get_device_num
1823 @subsection @code{omp_get_device_num} -- Return device number of current device
1825 @item @emph{Description}:
1826 This function returns a device number that represents the device that the
1827 current thread is executing on. When called on the host, it returns
1828 the same value as returned by the @code{omp_get_initial_device} function
1829 as required since OpenMP 5.0.
1832 @multitable @columnfractions .20 .80
1833 @item @emph{Prototype}: @tab @code{int omp_get_device_num(void);}
1836 @item @emph{Fortran}:
1837 @multitable @columnfractions .20 .80
1838 @item @emph{Interface}: @tab @code{integer function omp_get_device_num()}
1841 @item @emph{See also}:
1842 @ref{omp_get_initial_device}
1844 @item @emph{Reference}:
1845 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.2.37.
1850 @node omp_get_device_from_uid
1851 @subsection @code{omp_get_device_from_uid} -- Obtain the device number for a unique id
1853 @item @emph{Description}:
1854 This function returns the device number associated with the passed
1855 unique-identifier (UID) string. If no device with this UID is available, the value
1856 @code{omp_invalid_device} is returned. The effect of running this routine in a
1857 @code{target} region is unspecified.
1859 GCC treats the UID string as case sensitive; for the initial device, GCC currently
1860 only accepts the value @code{OMP_INITIAL_DEVICE}, for which it returns the value
1861 of @code{omp_initial_device}.
1864 @multitable @columnfractions .20 .80
1865 @item @emph{Prototype}: @tab @code{int omp_get_device_from_uid(const char *uid);}
1868 @item @emph{Fortran}:
1869 @multitable @columnfractions .20 .80
1870 @item @emph{Interface}: @tab @code{integer function omp_get_device_from_uid(uid)}
1871 @item @tab @code{character(len=*), intent(in) :: uid}
1874 @item @emph{See also}:
1875 @ref{omp_get_uid_from_device}, @ref{Offload-Target Specifics}
1877 @item @emph{Reference}:
1878 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.7
1883 @node omp_get_uid_from_device
1884 @subsection @code{omp_get_uid_from_device} -- Obtain the unique id of a device
1886 @item @emph{Description}:
1887 This function returns a pointer to a string that represents a unique identifier
1888 (UID) for the device specified by @var{device_num}. It returns @code{NULL} (C/C++)
1889 or a disassociated pointer (Fortran) for @code{omp_invalid_device}. The effect of
1890 running this routine in a @code{target} region is unspecified.
1892 For the initial device, GCC currently returns the value @code{OMP_INITIAL_DEVICE}.
1895 @multitable @columnfractions .20 .80
1896 @item @emph{Prototype}: @tab @code{const char *omp_get_uid_from_device(int device_num);}
1899 @item @emph{Fortran}:
1900 @multitable @columnfractions .20 .80
1901 @item @emph{Interface}: @tab @code{character(:) function omp_get_uid_from_device(device_num)}
1902 @item @tab @code{pointer :: omp_get_uid_from_device}
1903 @item @tab @code{integer, intent(in) :: device_num}
1906 @item @emph{See also}:
1907 @ref{omp_get_device_from_uid}, @ref{Offload-Target Specifics}
1909 @item @emph{Reference}:
1910 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 24.8
1915 @node omp_is_initial_device
1916 @subsection @code{omp_is_initial_device} -- Whether executing on the host device
1918 @item @emph{Description}:
1919 This function returns @code{true} if currently running on the host device,
1920 @code{false} otherwise. Here, @code{true} and @code{false} represent
1921 their language-specific counterparts.
1923 Note that in GCC this function call is already folded to a constant in the
1924 compiler; compile with @option{-fno-builtin-omp_is_initial_device} if a
1925 run-time function is desired.
1928 @multitable @columnfractions .20 .80
1929 @item @emph{Prototype}: @tab @code{int omp_is_initial_device(void);}
1932 @item @emph{Fortran}:
1933 @multitable @columnfractions .20 .80
1934 @item @emph{Interface}: @tab @code{logical function omp_is_initial_device()}
1937 @item @emph{Reference}:
1938 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.34.
1943 @node omp_get_initial_device
1944 @subsection @code{omp_get_initial_device} -- Return device number of initial device
1946 @item @emph{Description}:
1947 This function returns a device number that represents the host device.
1948 Since OpenMP 5.1, this is equal to the value returned by the
1949 @code{omp_get_num_devices} function; since OpenMP 6.0 it may also return
1950 the value of @code{omp_initial_device}.
1952 The effect of running this routine in a @code{target} region is unspecified.
1955 @multitable @columnfractions .20 .80
1956 @item @emph{Prototype}: @tab @code{int omp_get_initial_device(void);}
1959 @item @emph{Fortran}:
1960 @multitable @columnfractions .20 .80
1961 @item @emph{Interface}: @tab @code{integer function omp_get_initial_device()}
1964 @item @emph{See also}:
1965 @ref{omp_get_num_devices}
1967 @item @emph{Reference}:
1968 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.35.
1973 @node Device Memory Routines
1974 @section Device Memory Routines
1976 Routines related to memory allocation and managing corresponding
1977 pointers on devices. They have C linkage and do not throw exceptions.
1980 * omp_target_alloc:: Allocate device memory
1981 * omp_target_free:: Free device memory
1982 * omp_target_is_present:: Check whether storage is mapped
1983 * omp_target_is_accessible:: Check whether memory is device accessible
1984 * omp_target_memcpy:: Copy data between devices
1985 * omp_target_memcpy_async:: Copy data between devices asynchronously
1986 * omp_target_memcpy_rect:: Copy a subvolume of data between devices
1987 * omp_target_memcpy_rect_async:: Copy a subvolume of data between devices asynchronously
1988 @c * omp_target_memset:: <fixme>/TR12
1989 @c * omp_target_memset_async:: <fixme>/TR12
1990 * omp_target_associate_ptr:: Associate a device pointer with a host pointer
1991 * omp_target_disassociate_ptr:: Remove device--host pointer association
1992 * omp_get_mapped_ptr:: Return device pointer to a host pointer
1997 @node omp_target_alloc
1998 @subsection @code{omp_target_alloc} -- Allocate device memory
2000 @item @emph{Description}:
2001 This routine allocates @var{size} bytes of memory in the device environment
2002 associated with the device number @var{device_num}. If successful, a device
2003 pointer is returned, otherwise a null pointer.
2005 In GCC, when the device is the host or the device shares memory with the host,
2006 the memory is allocated on the host; in that case, when @var{size} is zero,
2007 either NULL or a unique pointer value that can later be successfully passed to
2008 @code{omp_target_free} is returned. When the allocation is not performed on
2009 the host, a null pointer is returned when @var{size} is zero; in that case,
2010 additionally a diagnostic might be printed to standard error (stderr).
2012 Running this routine in a @code{target} region except on the initial device is not supported.
2016 @multitable @columnfractions .20 .80
2017 @item @emph{Prototype}: @tab @code{void *omp_target_alloc(size_t size, int device_num)}
2020 @item @emph{Fortran}:
2021 @multitable @columnfractions .20 .80
2022 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_target_alloc(size, device_num) bind(C)}
2023 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
2024 @item @tab @code{integer(c_size_t), value :: size}
2025 @item @tab @code{integer(c_int), value :: device_num}
2028 @item @emph{See also}:
2029 @ref{omp_target_free}, @ref{omp_target_associate_ptr}
2031 @item @emph{Reference}:
2032 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.1
2037 @node omp_target_free
2038 @subsection @code{omp_target_free} -- Free device memory
2040 @item @emph{Description}:
2041 This routine frees memory allocated by the @code{omp_target_alloc} routine.
2042 The @var{device_ptr} argument must be either a null pointer or a device pointer
2043 returned by @code{omp_target_alloc} for the specified @var{device_num}. The
2044 device number @var{device_num} must be a conforming device number.
2046 Running this routine in a @code{target} region except on the initial device is not supported.
2050 @multitable @columnfractions .20 .80
2051 @item @emph{Prototype}: @tab @code{void omp_target_free(void *device_ptr, int device_num)}
2054 @item @emph{Fortran}:
2055 @multitable @columnfractions .20 .80
2056 @item @emph{Interface}: @tab @code{subroutine omp_target_free(device_ptr, device_num) bind(C)}
2057 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
2058 @item @tab @code{type(c_ptr), value :: device_ptr}
2059 @item @tab @code{integer(c_int), value :: device_num}
2062 @item @emph{See also}:
2063 @ref{omp_target_alloc}, @ref{omp_target_disassociate_ptr}
2065 @item @emph{Reference}:
2066 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.2
2071 @node omp_target_is_present
2072 @subsection @code{omp_target_is_present} -- Check whether storage is mapped
2074 @item @emph{Description}:
2075 This routine tests whether storage, identified by the host pointer @var{ptr},
2076 is mapped to the device specified by @var{device_num}. If so, it returns
2077 a nonzero value and otherwise zero.
2079 In GCC, this includes self mapping such that @code{omp_target_is_present}
2080 returns @emph{true} when @var{device_num} specifies the host or when the host
2081 and the device share memory. If @var{ptr} is a null pointer, @emph{true} is
2082 returned and if @var{device_num} is an invalid device number, @emph{false} is returned.
2085 If those conditions do not apply, @emph{true} is returned if the association has
2086 been established by an explicit or implicit @code{map} clause, the
2087 @code{declare target} directive or a call to the @code{omp_target_associate_ptr} routine.
2090 Running this routine in a @code{target} region except on the initial device is not supported.
2094 @multitable @columnfractions .20 .80
2095 @item @emph{Prototype}: @tab @code{int omp_target_is_present(const void *ptr,}
2096 @item @tab @code{ int device_num)}
2099 @item @emph{Fortran}:
2100 @multitable @columnfractions .20 .80
2101 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_is_present(ptr, &}
2102 @item @tab @code{ device_num) bind(C)}
2103 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
2104 @item @tab @code{type(c_ptr), value :: ptr}
2105 @item @tab @code{integer(c_int), value :: device_num}
2108 @item @emph{See also}:
2109 @ref{omp_target_associate_ptr}
2111 @item @emph{Reference}:
2112 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.3
2117 @node omp_target_is_accessible
2118 @subsection @code{omp_target_is_accessible} -- Check whether memory is device accessible
2120 @item @emph{Description}:
2121 This routine tests whether memory, starting at the address given by @var{ptr}
2122 and extending @var{size} bytes, is accessible on the device specified by
2123 @var{device_num}. If so, it returns a nonzero value and otherwise zero.
2125 The address given by @var{ptr} is interpreted to be in the address space of
2126 the device and @var{size} must be positive.
2128 Note that GCC's current implementation assumes that @var{ptr} is a valid host
2129 pointer. Therefore, all addresses given by @var{ptr} are assumed to be
2130 accessible on the initial device. To err on the safe side, the memory is
2131 reported as accessible on a non-host device only if that device can access all host memory
2132 ([uniform] shared memory access).
2134 Running this routine in a @code{target} region except on the initial device is not supported.
2138 @multitable @columnfractions .20 .80
2139 @item @emph{Prototype}: @tab @code{int omp_target_is_accessible(const void *ptr,}
2140 @item @tab @code{ size_t size,}
2141 @item @tab @code{ int device_num)}
2144 @item @emph{Fortran}:
2145 @multitable @columnfractions .20 .80
2146 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_is_accessible(ptr, &}
2147 @item @tab @code{ size, device_num) bind(C)}
2148 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
2149 @item @tab @code{type(c_ptr), value :: ptr}
2150 @item @tab @code{integer(c_size_t), value :: size}
2151 @item @tab @code{integer(c_int), value :: device_num}
2154 @item @emph{See also}:
2155 @ref{omp_target_associate_ptr}
2157 @item @emph{Reference}:
2158 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.4
2163 @node omp_target_memcpy
2164 @subsection @code{omp_target_memcpy} -- Copy data between devices
2166 @item @emph{Description}:
2167 This routine copies @var{length} bytes of data from the device
2168 identified by device number @var{src_device_num} to device @var{dst_device_num}.
2169 The data is copied from the source device from the address provided by
2170 @var{src}, shifted by the offset of @var{src_offset} bytes, to the destination
2171 device's @var{dst} address shifted by @var{dst_offset}. The routine returns
2172 zero on success and non-zero otherwise.
2174 Running this routine in a @code{target} region except on the initial device is not supported.
2178 @multitable @columnfractions .20 .80
2179 @item @emph{Prototype}: @tab @code{int omp_target_memcpy(void *dst,}
2180 @item @tab @code{ const void *src,}
2181 @item @tab @code{ size_t length,}
2182 @item @tab @code{ size_t dst_offset,}
2183 @item @tab @code{ size_t src_offset,}
2184 @item @tab @code{ int dst_device_num,}
2185 @item @tab @code{ int src_device_num)}
2188 @item @emph{Fortran}:
2189 @multitable @columnfractions .20 .80
2190 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy( &}
2191 @item @tab @code{ dst, src, length, dst_offset, src_offset, &}
2192 @item @tab @code{ dst_device_num, src_device_num) bind(C)}
2193 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
2194 @item @tab @code{type(c_ptr), value :: dst, src}
2195 @item @tab @code{integer(c_size_t), value :: length, dst_offset, src_offset}
2196 @item @tab @code{integer(c_int), value :: dst_device_num, src_device_num}
2199 @item @emph{See also}:
2200 @ref{omp_target_memcpy_async}, @ref{omp_target_memcpy_rect}
2202 @item @emph{Reference}:
2203 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.5
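The following C sketch combines @code{omp_target_alloc},
@code{omp_target_memcpy} and @code{omp_target_free} into a round trip: data is
copied from a host buffer into freshly allocated device memory and back. The
function name @code{round_trip} is a hypothetical helper and not part of
libgomp. The stand-in definitions in the @code{#else} branch are an assumption
of this sketch so that it also builds without @option{-fopenmp}; they treat
every device as the host.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
#include <stdlib.h>
#include <string.h>
/* Assumption of this sketch: host-only stand-ins so that the code also
   builds without -fopenmp.  Every device number is treated as the host.  */
static int omp_get_initial_device(void) { return 0; }
static void *omp_target_alloc(size_t size, int device_num)
{
  (void) device_num;
  return malloc(size);
}
static void omp_target_free(void *device_ptr, int device_num)
{
  (void) device_num;
  free(device_ptr);
}
static int omp_target_memcpy(void *dst, const void *src, size_t length,
                             size_t dst_offset, size_t src_offset,
                             int dst_device_num, int src_device_num)
{
  (void) dst_device_num;
  (void) src_device_num;
  memcpy((char *) dst + dst_offset, (const char *) src + src_offset, length);
  return 0;
}
#endif

/* Hypothetical helper: copy COUNT doubles from IN to device DEVICE_NUM and
   back into OUT.  Returns zero on success and nonzero otherwise.  */
int round_trip(const double *in, double *out, size_t count, int device_num)
{
  /* A zero-sized allocation may yield a null pointer, so require data.  */
  assert(count > 0);
  size_t bytes = count * sizeof *in;
  int host = omp_get_initial_device();

  void *device_buf = omp_target_alloc(bytes, device_num);
  if (device_buf == NULL)
    return 1;

  /* Host to device; destination arguments come before source arguments.  */
  int err = omp_target_memcpy(device_buf, in, bytes, 0, 0, device_num, host);
  /* Device back to host, only if the first copy succeeded.  */
  if (err == 0)
    err = omp_target_memcpy(out, device_buf, bytes, 0, 0, host, device_num);

  omp_target_free(device_buf, device_num);
  return err;
}
```

Passing the value of @code{omp_get_initial_device} as @var{device_num} keeps
all storage on the host, which is a convenient way to try the sketch on a
system without offloading devices.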
2208 @node omp_target_memcpy_async
2209 @subsection @code{omp_target_memcpy_async} -- Copy data between devices asynchronously
2211 @item @emph{Description}:
2212 This routine asynchronously copies @var{length} bytes of data from the
2213 device identified by device number @var{src_device_num} to device
2214 @var{dst_device_num}. The data is copied from the source device from the
2215 address provided by @var{src}, shifted by the offset of @var{src_offset} bytes,
2216 to the destination device's @var{dst} address shifted by @var{dst_offset}.
2217 Task dependence is expressed by passing an array of depend objects to
2218 @var{depobj_list}, where the number of array elements is passed as
2219 @var{depobj_count}; if the count is zero, the @var{depobj_list} argument is
2220 ignored. In C++ and Fortran, the @var{depobj_list} argument can also be
2221 omitted in that case. The routine returns zero if the copying process has
2222 successfully been started and non-zero otherwise.
2224 Running this routine in a @code{target} region except on the initial device is not supported.
2228 @multitable @columnfractions .20 .80
2229 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_async(void *dst,}
2230 @item @tab @code{ const void *src,}
2231 @item @tab @code{ size_t length,}
2232 @item @tab @code{ size_t dst_offset,}
2233 @item @tab @code{ size_t src_offset,}
2234 @item @tab @code{ int dst_device_num,}
2235 @item @tab @code{ int src_device_num,}
2236 @item @tab @code{ int depobj_count,}
2237 @item @tab @code{ omp_depend_t *depobj_list)}
2240 @item @emph{Fortran}:
2241 @multitable @columnfractions .20 .80
2242 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_async( &}
2243 @item @tab @code{ dst, src, length, dst_offset, src_offset, &}
2244 @item @tab @code{ dst_device_num, src_device_num, &}
2245 @item @tab @code{ depobj_count, depobj_list) bind(C)}
2246 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
2247 @item @tab @code{type(c_ptr), value :: dst, src}
2248 @item @tab @code{integer(c_size_t), value :: length, dst_offset, src_offset}
2249 @item @tab @code{integer(c_int), value :: dst_device_num, src_device_num, depobj_count}
2250 @item @tab @code{integer(omp_depend_kind), optional :: depobj_list(*)}
2253 @item @emph{See also}:
2254 @ref{omp_target_memcpy}, @ref{omp_target_memcpy_rect_async}
2256 @item @emph{Reference}:
2257 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.7
2262 @node omp_target_memcpy_rect
2263 @subsection @code{omp_target_memcpy_rect} -- Copy a subvolume of data between devices
2265 @item @emph{Description}:
2266 This routine copies a subvolume of data from the device identified by
2267 device number @var{src_device_num} to device @var{dst_device_num}.
2268 The array has @var{num_dims} dimensions and each array element has a size of
2269 @var{element_size} bytes. The @var{volume} array specifies how many elements
2270 per dimension are copied. The full sizes of the destination and source arrays
2271 are given by the @var{dst_dimensions} and @var{src_dimensions} arguments,
2272 respectively. The offset per dimension to the first element to be copied is
2273 given by the @var{dst_offset} and @var{src_offset} arguments. The routine
2274 returns zero on success and non-zero otherwise.
2276 The OpenMP specification only requires support for @var{num_dims} up to three.
2277 To find the implementation-specific maximum number of supported
2278 dimensions, invoke the routine with a null pointer for both the @var{dst} and
2279 @var{src} arguments; it then returns this value. As GCC supports arbitrary
2280 dimensions, it returns @code{INT_MAX}.
2282 The device-number arguments must be conforming device numbers. The @var{src} and
2283 @var{dst} arguments must either both be null pointers, or all of the following must be
2284 fulfilled: @var{element_size} and @var{num_dims} must be positive and the
2285 @var{volume}, offset and dimension arrays must have at least @var{num_dims} elements.
2288 Running this routine in a @code{target} region is not supported except on the initial device.
2292 @multitable @columnfractions .20 .80
2293 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect(void *dst,}
2294 @item @tab @code{ const void *src,}
2295 @item @tab @code{ size_t element_size,}
2296 @item @tab @code{ int num_dims,}
2297 @item @tab @code{ const size_t *volume,}
2298 @item @tab @code{ const size_t *dst_offset,}
2299 @item @tab @code{ const size_t *src_offset,}
2300 @item @tab @code{ const size_t *dst_dimensions,}
2301 @item @tab @code{ const size_t *src_dimensions,}
2302 @item @tab @code{ int dst_device_num,}
2303 @item @tab @code{ int src_device_num)}
2306 @item @emph{Fortran}:
2307 @multitable @columnfractions .20 .80
2308 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_rect( &}
2309 @item @tab @code{ dst, src, element_size, num_dims, volume, &}
2310 @item @tab @code{ dst_offset, src_offset, dst_dimensions, &}
2311 @item @tab @code{ src_dimensions, dst_device_num, src_device_num) bind(C)}
2312 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
2313 @item @tab @code{type(c_ptr), value :: dst, src}
2314 @item @tab @code{integer(c_size_t), value :: element_size}
2315 @item @tab @code{integer(c_size_t), intent(in) :: volume(*), dst_offset(*), src_offset(*), dst_dimensions(*), src_dimensions(*)}
2316 @item @tab @code{integer(c_int), value :: num_dims, dst_device_num, src_device_num}
2319 @item @emph{See also}:
2320 @ref{omp_target_memcpy_rect_async}, @ref{omp_target_memcpy}
2322 @item @emph{Reference}:
2323 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.6
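To make the meaning of the @var{volume}, offset and dimension arguments more
concrete, the following host-only C sketch performs the same kind of copy for
two dimensions in plain memory. The function @code{copy_rect_2d} is a
hypothetical helper and not libgomp's implementation; it assumes row-major (C)
array layout and, as a choice of this sketch only, rejects subvolumes that do
not fit into either array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only helper that mirrors the argument meaning of a
   two-dimensional omp_target_memcpy_rect call.  All offsets, volumes and
   dimensions are counted in elements, not in bytes; index 0 is the row
   (slowest-varying) dimension.  Returns zero on success.  */
int copy_rect_2d(void *dst, const void *src, size_t element_size,
                 const size_t volume[2],
                 const size_t dst_offset[2], const size_t src_offset[2],
                 const size_t dst_dimensions[2],
                 const size_t src_dimensions[2])
{
  assert(element_size > 0);

  /* Choice of this sketch: reject subvolumes that do not fit.  */
  for (int d = 0; d < 2; d++)
    if (dst_offset[d] + volume[d] > dst_dimensions[d]
        || src_offset[d] + volume[d] > src_dimensions[d])
      return 1;

  char *dst_bytes = dst;
  const char *src_bytes = src;
  for (size_t row = 0; row < volume[0]; row++)
    {
      /* Linear element index of the first element copied in this row.  */
      size_t dst_index = (dst_offset[0] + row) * dst_dimensions[1]
                         + dst_offset[1];
      size_t src_index = (src_offset[0] + row) * src_dimensions[1]
                         + src_offset[1];
      /* Each row of the subvolume is contiguous, so copy it in one go.  */
      memcpy(dst_bytes + dst_index * element_size,
             src_bytes + src_index * element_size,
             volume[1] * element_size);
    }
  return 0;
}
```

For example, copying a 2x2 block that starts at row 1, column 1 of a 3x4 source
array to row 0, column 1 of a 2x3 destination array leaves column 0 of the
destination untouched.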
2328 @node omp_target_memcpy_rect_async
2329 @subsection @code{omp_target_memcpy_rect_async} -- Copy a subvolume of data between devices asynchronously
2331 @item @emph{Description}:
2332 This routine copies asynchronously a subvolume of data from the device
2333 identified by device number @var{src_device_num} to device @var{dst_device_num}.
2334 The array has @var{num_dims} dimensions and each array element has a size of
2335 @var{element_size} bytes. The @var{volume} array specifies how many elements
2336 per dimension are copied. The full sizes of the destination and source arrays
2337 are given by the @var{dst_dimensions} and @var{src_dimensions} arguments,
2338 respectively. The offset per dimension to the first element to be copied is
2339 given by the @var{dst_offset} and @var{src_offset} arguments. Task dependence
2340 is expressed by passing an array of depend objects to @var{depobj_list}, where
2341 the number of array elements is passed as @var{depobj_count}; if the count is
2342 zero, the @var{depobj_list} argument is ignored. In C++ and Fortran, the
2343 @var{depobj_list} argument can also be omitted in that case. The routine
2344 returns zero on success and non-zero otherwise.
2346 The OpenMP specification only requires support for @var{num_dims} up to three.
2347 To find the implementation-specific maximum number of supported
2348 dimensions, invoke the routine with a null pointer for both the @var{dst} and
2349 @var{src} arguments; it then returns this value. As GCC supports arbitrary
2350 dimensions, it returns @code{INT_MAX}.
2352 The device-number arguments must be conforming device numbers. The @var{src} and
2353 @var{dst} arguments must either both be null pointers, or all of the following must be
2354 fulfilled: @var{element_size} and @var{num_dims} must be positive and the
2355 @var{volume}, offset and dimension arrays must have at least @var{num_dims} elements.
2358 Running this routine in a @code{target} region is not supported except on the initial device.
2362 @multitable @columnfractions .20 .80
2363 @item @emph{Prototype}: @tab @code{int omp_target_memcpy_rect_async(void *dst,}
2364 @item @tab @code{ const void *src,}
2365 @item @tab @code{ size_t element_size,}
2366 @item @tab @code{ int num_dims,}
2367 @item @tab @code{ const size_t *volume,}
2368 @item @tab @code{ const size_t *dst_offset,}
2369 @item @tab @code{ const size_t *src_offset,}
2370 @item @tab @code{ const size_t *dst_dimensions,}
2371 @item @tab @code{ const size_t *src_dimensions,}
2372 @item @tab @code{ int dst_device_num,}
2373 @item @tab @code{ int src_device_num,}
2374 @item @tab @code{ int depobj_count,}
2375 @item @tab @code{ omp_depend_t *depobj_list)}
2378 @item @emph{Fortran}:
2379 @multitable @columnfractions .20 .80
2380 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_memcpy_rect_async( &}
2381 @item @tab @code{ dst, src, element_size, num_dims, volume, &}
2382 @item @tab @code{ dst_offset, src_offset, dst_dimensions, &}
2383 @item @tab @code{ src_dimensions, dst_device_num, src_device_num, &}
2384 @item @tab @code{ depobj_count, depobj_list) bind(C)}
2385 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_size_t, c_int}
2386 @item @tab @code{type(c_ptr), value :: dst, src}
2387 @item @tab @code{integer(c_size_t), value :: element_size}
2388 @item @tab @code{integer(c_size_t), intent(in) :: volume(*), dst_offset(*), src_offset(*), dst_dimensions(*), src_dimensions(*)}
2389 @item @tab @code{integer(c_int), value :: num_dims, dst_device_num, src_device_num}
2390 @item @tab @code{integer(c_int), value :: depobj_count}
2391 @item @tab @code{integer(omp_depend_kind), optional :: depobj_list(*)}
2394 @item @emph{See also}:
2395 @ref{omp_target_memcpy_rect}, @ref{omp_target_memcpy_async}
2397 @item @emph{Reference}:
2398 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.8
2403 @node omp_target_associate_ptr
2404 @subsection @code{omp_target_associate_ptr} -- Associate a device pointer with a host pointer
2406 @item @emph{Description}:
2407 This routine associates storage on the host with storage on a device identified
2408 by @var{device_num}. The device pointer is usually obtained by calling
2409 @code{omp_target_alloc} or by other means (but not by using the @code{map}
2410 clauses or the @code{declare target} directive). The host pointer should point
2411 to memory that has a storage size of at least @var{size}.
2413 The @var{device_offset} parameter specifies the offset into @var{device_ptr}
2414 that is used as the base address for the device side of the mapping; the
2415 storage size should be at least @var{device_offset} plus @var{size}.
2417 After the association, the host pointer can be used in a @code{map} clause and
2418 in the @code{to} and @code{from} clauses of the @code{target update} directive
2419 to transfer data between the associated pointers. The reference count of such
2420 associated storage is infinite. The association can be removed by calling
2421 @code{omp_target_disassociate_ptr} which should be done before the lifetime
2422 of either storage ends.
2424 The routine returns nonzero (@code{EINVAL}) when @var{device_num} is invalid,
2425 or when it denotes the initial device or a device that shares memory with the
2426 host. @code{omp_target_associate_ptr} returns zero if @var{host_ptr} points
2427 into storage that lies fully inside previously associated
2428 memory. Otherwise, zero is returned if the association was successful; if none
2429 of the cases above apply, nonzero (@code{EINVAL}) is returned.
2431 The @code{omp_target_is_present} routine can be used to test whether
2432 associated device storage for a host pointer exists.
2434 Running this routine in a @code{target} region except on the initial device is not supported.
2438 @multitable @columnfractions .20 .80
2439 @item @emph{Prototype}: @tab @code{int omp_target_associate_ptr(const void *host_ptr,}
2440 @item @tab @code{ const void *device_ptr,}
2441 @item @tab @code{ size_t size,}
2442 @item @tab @code{ size_t device_offset,}
2443 @item @tab @code{ int device_num)}
2446 @item @emph{Fortran}:
2447 @multitable @columnfractions .20 .80
2448 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_associate_ptr(host_ptr, &}
2449 @item @tab @code{ device_ptr, size, device_offset, device_num) bind(C)}
2450 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int, c_size_t}
2451 @item @tab @code{type(c_ptr), value :: host_ptr, device_ptr}
2452 @item @tab @code{integer(c_size_t), value :: size, device_offset}
2453 @item @tab @code{integer(c_int), value :: device_num}
2456 @item @emph{See also}:
2457 @ref{omp_target_disassociate_ptr}, @ref{omp_target_is_present},
2458 @ref{omp_target_alloc}
2460 @item @emph{Reference}:
2461 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.9
2466 @node omp_target_disassociate_ptr
2467 @subsection @code{omp_target_disassociate_ptr} -- Remove device--host pointer association
2469 @item @emph{Description}:
2470 This routine removes the storage association established by calling
2471 @code{omp_target_associate_ptr} and sets the reference count to zero,
2472 even if @code{omp_target_associate_ptr} was invoked multiple times for
2473 host pointer @var{ptr}. If applicable, the device memory needs
2474 to be freed by the user.
2476 If an associated device storage location for the @var{device_num} was
2477 found and has infinite reference count, the association is removed and
2478 zero is returned. In all other cases, nonzero (@code{EINVAL}) is returned
2479 and no other action is taken.
2481 Note that passing a host pointer where the association to the device pointer
2482 was established with the @code{declare target} directive yields undefined
Running this routine in a @code{target} region except on the initial device
is not supported.
2489 @multitable @columnfractions .20 .80
2490 @item @emph{Prototype}: @tab @code{int omp_target_disassociate_ptr(const void *ptr,}
2491 @item @tab @code{ int device_num)}
2494 @item @emph{Fortran}:
2495 @multitable @columnfractions .20 .80
2496 @item @emph{Interface}: @tab @code{integer(c_int) function omp_target_disassociate_ptr(ptr, &}
2497 @item @tab @code{ device_num) bind(C)}
2498 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
2499 @item @tab @code{type(c_ptr), value :: ptr}
2500 @item @tab @code{integer(c_int), value :: device_num}
2503 @item @emph{See also}:
2504 @ref{omp_target_associate_ptr}
2506 @item @emph{Reference}:
2507 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.10
2512 @node omp_get_mapped_ptr
2513 @subsection @code{omp_get_mapped_ptr} -- Return device pointer to a host pointer
2515 @item @emph{Description}:
If the device number refers to the initial device or to a device with
memory accessible from the host (shared memory), the @code{omp_get_mapped_ptr}
routine returns the value of the passed @var{ptr}. Otherwise, if associated
storage to the passed host pointer @var{ptr} exists on the device associated
with @var{device_num}, it returns that pointer. In all other cases and in
cases of an error, a null pointer is returned.
2523 The association of storage location is established either via an explicit or
2524 implicit @code{map} clause, the @code{declare target} directive or the
2525 @code{omp_target_associate_ptr} routine.
Running this routine in a @code{target} region except on the initial device
is not supported.
2531 @multitable @columnfractions .20 .80
2532 @item @emph{Prototype}: @tab @code{void *omp_get_mapped_ptr(const void *ptr, int device_num);}
2535 @item @emph{Fortran}:
2536 @multitable @columnfractions .20 .80
2537 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_get_mapped_ptr(ptr, device_num) bind(C)}
2538 @item @tab @code{use, intrinsic :: iso_c_binding, only: c_ptr, c_int}
2539 @item @tab @code{type(c_ptr), value :: ptr}
2540 @item @tab @code{integer(c_int), value :: device_num}
2543 @item @emph{See also}:
2544 @ref{omp_target_associate_ptr}
2546 @item @emph{Reference}:
2547 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.8.11
2553 @section Lock Routines
2555 Initialize, set, test, unset and destroy simple and nested locks.
2556 The routines have C linkage and do not throw exceptions.
2559 * omp_init_lock:: Initialize simple lock
2560 * omp_init_nest_lock:: Initialize nested lock
2561 @c PR libgomp/109452
2562 @c * omp_init_lock_with_hint:: Initialize simple lock with sync hint
2563 @c * omp_init_nest_lock_with_hint:: Initialize nested lock with sync hint
2564 * omp_destroy_lock:: Destroy simple lock
2565 * omp_destroy_nest_lock:: Destroy nested lock
2566 * omp_set_lock:: Wait for and set simple lock
* omp_set_nest_lock:: Wait for and set nested lock
2568 * omp_unset_lock:: Unset simple lock
2569 * omp_unset_nest_lock:: Unset nested lock
2570 * omp_test_lock:: Test and set simple lock if available
2571 * omp_test_nest_lock:: Test and set nested lock if available
2577 @subsection @code{omp_init_lock} -- Initialize simple lock
2579 @item @emph{Description}:
Initialize a simple lock. After initialization, the lock is in
an unlocked state.
2584 @multitable @columnfractions .20 .80
2585 @item @emph{Prototype}: @tab @code{void omp_init_lock(omp_lock_t *lock);}
2588 @item @emph{Fortran}:
2589 @multitable @columnfractions .20 .80
2590 @item @emph{Interface}: @tab @code{subroutine omp_init_lock(svar)}
2591 @item @tab @code{integer(omp_lock_kind), intent(out) :: svar}
2594 @item @emph{See also}:
2595 @ref{omp_destroy_lock}
2597 @item @emph{Reference}:
2598 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
2603 @node omp_init_nest_lock
2604 @subsection @code{omp_init_nest_lock} -- Initialize nested lock
2606 @item @emph{Description}:
2607 Initialize a nested lock. After initialization, the lock is in
2608 an unlocked state and the nesting count is set to zero.
2611 @multitable @columnfractions .20 .80
2612 @item @emph{Prototype}: @tab @code{void omp_init_nest_lock(omp_nest_lock_t *lock);}
2615 @item @emph{Fortran}:
2616 @multitable @columnfractions .20 .80
2617 @item @emph{Interface}: @tab @code{subroutine omp_init_nest_lock(nvar)}
2618 @item @tab @code{integer(omp_nest_lock_kind), intent(out) :: nvar}
2621 @item @emph{See also}:
2622 @ref{omp_destroy_nest_lock}
2624 @item @emph{Reference}:
2625 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.1.
2630 @node omp_destroy_lock
2631 @subsection @code{omp_destroy_lock} -- Destroy simple lock
2633 @item @emph{Description}:
2634 Destroy a simple lock. In order to be destroyed, a simple lock must be
2635 in the unlocked state.
2638 @multitable @columnfractions .20 .80
2639 @item @emph{Prototype}: @tab @code{void omp_destroy_lock(omp_lock_t *lock);}
2642 @item @emph{Fortran}:
2643 @multitable @columnfractions .20 .80
2644 @item @emph{Interface}: @tab @code{subroutine omp_destroy_lock(svar)}
2645 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2648 @item @emph{See also}:
2651 @item @emph{Reference}:
2652 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
2657 @node omp_destroy_nest_lock
2658 @subsection @code{omp_destroy_nest_lock} -- Destroy nested lock
2660 @item @emph{Description}:
2661 Destroy a nested lock. In order to be destroyed, a nested lock must be
2662 in the unlocked state and its nesting count must equal zero.
2665 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void omp_destroy_nest_lock(omp_nest_lock_t *lock);}
2669 @item @emph{Fortran}:
2670 @multitable @columnfractions .20 .80
2671 @item @emph{Interface}: @tab @code{subroutine omp_destroy_nest_lock(nvar)}
2672 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2675 @item @emph{See also}:
2678 @item @emph{Reference}:
2679 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.3.
2685 @subsection @code{omp_set_lock} -- Wait for and set simple lock
2687 @item @emph{Description}:
Before setting a simple lock, the lock variable must be initialized by
@code{omp_init_lock}. The calling thread is blocked until the lock
is available. If the lock is already held by the current thread,
a deadlock occurs.
2694 @multitable @columnfractions .20 .80
2695 @item @emph{Prototype}: @tab @code{void omp_set_lock(omp_lock_t *lock);}
2698 @item @emph{Fortran}:
2699 @multitable @columnfractions .20 .80
2700 @item @emph{Interface}: @tab @code{subroutine omp_set_lock(svar)}
2701 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2704 @item @emph{See also}:
2705 @ref{omp_init_lock}, @ref{omp_test_lock}, @ref{omp_unset_lock}
2707 @item @emph{Reference}:
2708 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
2713 @node omp_set_nest_lock
2714 @subsection @code{omp_set_nest_lock} -- Wait for and set nested lock
2716 @item @emph{Description}:
2717 Before setting a nested lock, the lock variable must be initialized by
2718 @code{omp_init_nest_lock}. The calling thread is blocked until the lock
2719 is available. If the lock is already held by the current thread, the
2720 nesting count for the lock is incremented.
2723 @multitable @columnfractions .20 .80
2724 @item @emph{Prototype}: @tab @code{void omp_set_nest_lock(omp_nest_lock_t *lock);}
2727 @item @emph{Fortran}:
2728 @multitable @columnfractions .20 .80
2729 @item @emph{Interface}: @tab @code{subroutine omp_set_nest_lock(nvar)}
2730 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2733 @item @emph{See also}:
2734 @ref{omp_init_nest_lock}, @ref{omp_unset_nest_lock}
2736 @item @emph{Reference}:
2737 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.4.
2742 @node omp_unset_lock
2743 @subsection @code{omp_unset_lock} -- Unset simple lock
2745 @item @emph{Description}:
A simple lock about to be unset must have been locked by @code{omp_set_lock}
or @code{omp_test_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_lock}. Then, the lock becomes unlocked. If one
or more threads attempted to set the lock before, one of them is chosen to
acquire it.
2753 @multitable @columnfractions .20 .80
2754 @item @emph{Prototype}: @tab @code{void omp_unset_lock(omp_lock_t *lock);}
2757 @item @emph{Fortran}:
2758 @multitable @columnfractions .20 .80
2759 @item @emph{Interface}: @tab @code{subroutine omp_unset_lock(svar)}
2760 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2763 @item @emph{See also}:
2764 @ref{omp_set_lock}, @ref{omp_test_lock}
2766 @item @emph{Reference}:
2767 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
2772 @node omp_unset_nest_lock
2773 @subsection @code{omp_unset_nest_lock} -- Unset nested lock
2775 @item @emph{Description}:
A nested lock about to be unset must have been locked by @code{omp_set_nest_lock}
or @code{omp_test_nest_lock} before. In addition, the lock must be held by the
thread calling @code{omp_unset_nest_lock}. If the nesting count drops to zero, the
lock becomes unlocked. If one or more threads attempted to set the lock before,
one of them is chosen to acquire it.
2783 @multitable @columnfractions .20 .80
2784 @item @emph{Prototype}: @tab @code{void omp_unset_nest_lock(omp_nest_lock_t *lock);}
2787 @item @emph{Fortran}:
2788 @multitable @columnfractions .20 .80
2789 @item @emph{Interface}: @tab @code{subroutine omp_unset_nest_lock(nvar)}
2790 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2793 @item @emph{See also}:
2794 @ref{omp_set_nest_lock}
2796 @item @emph{Reference}:
2797 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.5.
2803 @subsection @code{omp_test_lock} -- Test and set simple lock if available
2805 @item @emph{Description}:
2806 Before setting a simple lock, the lock variable must be initialized by
2807 @code{omp_init_lock}. Contrary to @code{omp_set_lock}, @code{omp_test_lock}
2808 does not block if the lock is not available. This function returns
2809 @code{true} upon success, @code{false} otherwise. Here, @code{true} and
2810 @code{false} represent their language-specific counterparts.
2813 @multitable @columnfractions .20 .80
2814 @item @emph{Prototype}: @tab @code{int omp_test_lock(omp_lock_t *lock);}
2817 @item @emph{Fortran}:
2818 @multitable @columnfractions .20 .80
2819 @item @emph{Interface}: @tab @code{logical function omp_test_lock(svar)}
2820 @item @tab @code{integer(omp_lock_kind), intent(inout) :: svar}
2823 @item @emph{See also}:
@ref{omp_init_lock}, @ref{omp_set_lock}, @ref{omp_unset_lock}
2826 @item @emph{Reference}:
2827 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
2832 @node omp_test_nest_lock
2833 @subsection @code{omp_test_nest_lock} -- Test and set nested lock if available
2835 @item @emph{Description}:
2836 Before setting a nested lock, the lock variable must be initialized by
2837 @code{omp_init_nest_lock}. Contrary to @code{omp_set_nest_lock},
2838 @code{omp_test_nest_lock} does not block if the lock is not available.
If the lock is set successfully, which includes the case that it is already
held by the current thread, the new nesting count is returned. Otherwise,
the return value equals zero.
2843 @multitable @columnfractions .20 .80
2844 @item @emph{Prototype}: @tab @code{int omp_test_nest_lock(omp_nest_lock_t *lock);}
2847 @item @emph{Fortran}:
2848 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{integer function omp_test_nest_lock(nvar)}
2850 @item @tab @code{integer(omp_nest_lock_kind), intent(inout) :: nvar}
2854 @item @emph{See also}:
@ref{omp_init_nest_lock}, @ref{omp_set_nest_lock}, @ref{omp_unset_nest_lock}
2857 @item @emph{Reference}:
2858 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.3.6.
2863 @node Timing Routines
2864 @section Timing Routines
2866 Portable, thread-based, wall clock timer.
2867 The routines have C linkage and do not throw exceptions.
2870 * omp_get_wtick:: Get timer precision.
2871 * omp_get_wtime:: Elapsed wall clock time.
2877 @subsection @code{omp_get_wtick} -- Get timer precision
2879 @item @emph{Description}:
2880 Gets the timer precision, i.e., the number of seconds between two
2881 successive clock ticks.
2884 @multitable @columnfractions .20 .80
2885 @item @emph{Prototype}: @tab @code{double omp_get_wtick(void);}
2888 @item @emph{Fortran}:
2889 @multitable @columnfractions .20 .80
2890 @item @emph{Interface}: @tab @code{double precision function omp_get_wtick()}
2893 @item @emph{See also}:
2896 @item @emph{Reference}:
2897 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.2.
2903 @subsection @code{omp_get_wtime} -- Elapsed wall clock time
2905 @item @emph{Description}:
Elapsed wall clock time in seconds. The time is measured per thread; no
guarantee can be made that two distinct threads measure the same time.
Time is measured from some ``time in the past'', which is an arbitrary time
guaranteed not to change during the execution of the program.
2912 @multitable @columnfractions .20 .80
2913 @item @emph{Prototype}: @tab @code{double omp_get_wtime(void);}
2916 @item @emph{Fortran}:
2917 @multitable @columnfractions .20 .80
2918 @item @emph{Interface}: @tab @code{double precision function omp_get_wtime()}
2921 @item @emph{See also}:
2924 @item @emph{Reference}:
2925 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.4.1.
2931 @section Event Routine
2933 Support for event objects.
The routine has C linkage and does not throw exceptions.
2937 * omp_fulfill_event:: Fulfill and destroy an OpenMP event.
2942 @node omp_fulfill_event
2943 @subsection @code{omp_fulfill_event} -- Fulfill and destroy an OpenMP event
2945 @item @emph{Description}:
Fulfill the event associated with the event handle argument. Currently, it
is only used to fulfill events generated by @code{detach} clauses on task
constructs; the effect of fulfilling the event is to allow the task to
complete.
2951 The result of calling @code{omp_fulfill_event} with an event handle other
2952 than that generated by a detach clause is undefined. Calling it with an
2953 event handle that has already been fulfilled is also undefined.
2956 @multitable @columnfractions .20 .80
2957 @item @emph{Prototype}: @tab @code{void omp_fulfill_event(omp_event_handle_t event);}
2960 @item @emph{Fortran}:
2961 @multitable @columnfractions .20 .80
2962 @item @emph{Interface}: @tab @code{subroutine omp_fulfill_event(event)}
2963 @item @tab @code{integer (kind=omp_event_handle_kind) :: event}
2966 @item @emph{Reference}:
2967 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.5.1.
2972 @node Interoperability Routines
2973 @section Interoperability Routines
2975 Routines to obtain properties from an object of OpenMP interop type.
2976 They have C linkage and do not throw exceptions.
2979 * omp_get_num_interop_properties:: Get the number of implementation-specific properties
2980 * omp_get_interop_int:: Obtain integer-valued interoperability property
2981 * omp_get_interop_ptr:: Obtain pointer-valued interoperability property
2982 * omp_get_interop_str:: Obtain string-valued interoperability property
2983 * omp_get_interop_name:: Obtain the name of an interop_property value as string
2984 * omp_get_interop_type_desc:: Obtain type and description to an interop_property
2985 * omp_get_interop_rc_desc:: Obtain error string to an interop_rc error code
2990 @node omp_get_num_interop_properties
2991 @subsection @code{omp_get_num_interop_properties} -- Get the number of implementation-specific properties
2993 @item @emph{Description}:
2994 The @code{omp_get_num_interop_properties} function returns the number of
2995 implementation-defined interoperability properties available for the passed
2996 @var{interop}, extending the OpenMP-defined properties. The available OpenMP
2997 interop_property-type values range from @code{omp_ipr_first} to the value
2998 returned by @code{omp_get_num_interop_properties} minus one.
3000 No implementation-defined properties are currently defined in GCC.
3002 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3003 @c below: the function has C binding, @var{interop} is passed by value and an
3004 @c integer of @code{c_int} kind is returned, which permits use of the same ABI as
3005 @c the C function. This does not affect the usage of the function when GCC's
3006 @c @code{omp_lib} module or @code{omp_lib.h} header is used.
3009 @multitable @columnfractions .20 .80
3010 @item @emph{Prototype}: @tab @code{int omp_get_num_interop_properties(const omp_interop_t interop)}
3013 @item @emph{Fortran}:
3014 @multitable @columnfractions .20 .80
3015 @item @emph{Interface}: @tab @code{integer function omp_get_num_interop_properties(interop)}
3016 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3019 @item @emph{See also}:
3020 @ref{omp_get_interop_name}, @ref{omp_get_interop_type_desc}
3022 @item @emph{Reference}:
3023 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.1,
3024 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.1
3029 @node omp_get_interop_int
3030 @subsection @code{omp_get_interop_int} -- Obtain integer-valued interoperability property
3032 @item @emph{Description}:
3033 The @code{omp_get_interop_int} function returns the integer value associated
3034 with the @var{property_id} interoperability property of the passed @var{interop}
3035 object. The @var{ret_code} argument is optional, i.e. it can be omitted in C++
3036 and Fortran or used with @code{NULL} as argument in C and C++. If successful,
3037 @var{ret_code} (if present) is set to @code{omp_irc_success}.
3039 In GCC, the effect of running this routine in a @code{target} region that is not
3040 the initial device is unspecified.
3042 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3043 @c below: the function has C binding and @var{interop} and @var{property_id} are
3044 @c passed by value, which permits use of the same ABI as the C function. This does
3045 @c not affect the usage of the function when GCC's @code{omp_lib} module or
3046 @c @code{omp_lib.h} header is used.
3049 @multitable @columnfractions .20 .80
3050 @item @emph{Prototype}: @tab @code{omp_intptr_t omp_get_interop_int(const omp_interop_t interop,
3051 omp_interop_property_t property_id, int *ret_code)}
3054 @item @emph{Fortran}:
3055 @multitable @columnfractions .20 .80
3056 @item @emph{Interface}: @tab @code{integer(c_intptr_t) function omp_get_interop_int(interop,
3057 property_id, ret_code)}
3058 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_intptr_t}
3059 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3060 @item @tab @code{integer(omp_interop_property_kind) property_id}
3061 @item @tab @code{integer(omp_interop_rc_kind), optional, intent(out) :: ret_code}
3064 @item @emph{See also}:
3065 @ref{omp_get_interop_ptr}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
3067 @item @emph{Reference}:
3068 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.2,
3069 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.2
3074 @node omp_get_interop_ptr
3075 @subsection @code{omp_get_interop_ptr} -- Obtain pointer-valued interoperability property
3077 @item @emph{Description}:
The @code{omp_get_interop_ptr} function returns the pointer value associated with
3079 the @var{property_id} interoperability property of the passed @var{interop}
3080 object. The @var{ret_code} argument is optional, i.e. it can be omitted in C++
3081 and Fortran or used with @code{NULL} as argument in C and C++. If successful,
3082 @var{ret_code} (if present) is set to @code{omp_irc_success}.
3084 In GCC, the effect of running this routine in a @code{target} region that is not
3085 the initial device is unspecified.
3087 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3088 @c below: the function has C binding and @var{interop} and @var{property_id} are
3089 @c passed by value, which permits use of the same ABI as the C function. This does
3090 @c not affect the usage of the function when GCC's @code{omp_lib} module or
3091 @c @code{omp_lib.h} header is used.
3094 @multitable @columnfractions .20 .80
3095 @item @emph{Prototype}: @tab @code{void *omp_get_interop_ptr(const omp_interop_t interop,
3096 omp_interop_property_t property_id, int *ret_code)}
3099 @item @emph{Fortran}:
3100 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{type(c_ptr) function omp_get_interop_ptr(interop,
property_id, ret_code)}
3103 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr}
3104 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3105 @item @tab @code{integer(omp_interop_property_kind) property_id}
3106 @item @tab @code{integer(omp_interop_rc_kind), optional, intent(out) :: ret_code}
3109 @item @emph{See also}:
3110 @ref{omp_get_interop_int}, @ref{omp_get_interop_str}, @ref{omp_get_interop_rc_desc}
3112 @item @emph{Reference}:
3113 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.3,
3114 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.3
3119 @node omp_get_interop_str
3120 @subsection @code{omp_get_interop_str} -- Obtain string-valued interoperability property
3122 @item @emph{Description}:
3123 The @code{omp_get_interop_str} function returns the string value associated with
3124 the @var{property_id} interoperability property of the passed @var{interop}
3125 object. The @var{ret_code} argument is optional, i.e. it can be omitted in C++
3126 and Fortran or used with @code{NULL} as argument in C and C++. If successful,
3127 @var{ret_code} (if present) is set to @code{omp_irc_success}.
3129 In GCC, the effect of running this routine in a @code{target} region that is not
3130 the initial device is unspecified.
3132 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3133 @c below: @var{interop} and @var{property_id} are passed by value. This does not
3134 @c affect the usage of the function when GCC's @code{omp_lib} module or
3135 @c @code{omp_lib.h} header is used.
3138 @multitable @columnfractions .20 .80
3139 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_str(const omp_interop_t interop,
3140 omp_interop_property_t property_id, int *ret_code)}
3143 @item @emph{Fortran}:
3144 @multitable @columnfractions .20 .80
3145 @item @emph{Interface}: @tab @code{character(:) function omp_get_interop_str(interop,
3146 property_id, ret_code)}
3147 @item @tab @code{pointer :: omp_get_interop_str}
3148 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3149 @item @tab @code{integer(omp_interop_property_kind) property_id}
3150 @item @tab @code{integer(omp_interop_rc_kind), optional, intent(out) :: ret_code}
3153 @item @emph{See also}:
3154 @ref{omp_get_interop_int}, @ref{omp_get_interop_ptr}, @ref{omp_get_interop_rc_desc}
3156 @item @emph{Reference}:
3157 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.4,
3158 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.4
3163 @node omp_get_interop_name
3164 @subsection @code{omp_get_interop_name} -- Obtain the name of an @code{interop_property} value as string
3166 @item @emph{Description}:
The @code{omp_get_interop_name} function returns the name of the property
itself as string; for the properties specified by the OpenMP specification,
the name matches the name of the named constant with the @samp{omp_ipr_}
prefix removed.
3172 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3173 @c below: @var{interop} and @var{property_id} are passed by value. This does not
3174 @c affect the usage of the function when GCC's @code{omp_lib} module or
3175 @c @code{omp_lib.h} header is used.
3178 @multitable @columnfractions .20 .80
3179 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_name(const omp_interop_t interop,
3180 omp_interop_property_t property_id)}
3183 @item @emph{Fortran}:
3184 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{character(:) function omp_get_interop_name(interop,
property_id)}
3187 @item @tab @code{pointer :: omp_get_interop_name}
3188 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3189 @item @tab @code{integer(omp_interop_property_kind) property_id}
3192 @item @emph{See also}:
3193 @ref{omp_get_num_interop_properties}, @ref{omp_get_interop_type_desc}
3195 @item @emph{Reference}:
3196 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.5,
3197 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.5
3202 @node omp_get_interop_type_desc
3203 @subsection @code{omp_get_interop_type_desc} -- Obtain type and description to an @code{interop_property}
3205 @item @emph{Description}:
3206 The @code{omp_get_interop_type_desc} function returns a string that describes in
3207 human-readable form the data type associated with the @var{property_id}
3208 interoperability property of the passed @var{interop} object.
3210 In GCC, this function returns the name of the C/C++ data type for this property
3211 or @samp{N/A} if this property is not available for the given foreign runtime.
3212 If @var{interop} is @code{omp_interop_none} or for invalid property values,
3213 a null pointer is returned. The effect of running this routine in a
3214 @code{target} region that is not the initial device is unspecified.
3216 @c Implementation remark: In GCC, the Fortran interface differs from the one shown
3217 @c below: @var{interop} and @var{property_id} are passed by value. This does not
3218 @c affect the usage of the function when GCC's @code{omp_lib} module or
3219 @c @code{omp_lib.h} header is used.
3222 @multitable @columnfractions .20 .80
3223 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_type_desc(const omp_interop_t interop,
3224 omp_interop_property_t property_id)}
3227 @item @emph{Fortran}:
3228 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{character(:) function omp_get_interop_type_desc(interop,
property_id)}
3231 @item @tab @code{pointer :: omp_get_interop_type_desc}
3232 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3233 @item @tab @code{integer(omp_interop_property_kind) property_id}
3236 @item @emph{See also}:
3237 @ref{omp_get_num_interop_properties}, @ref{omp_get_interop_name}
3239 @item @emph{Reference}:
3240 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.6,
3241 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.6
3246 @node omp_get_interop_rc_desc
3247 @subsection @code{omp_get_interop_rc_desc} -- Obtain error string to an @code{interop_rc} error code
3249 @item @emph{Description}:
3250 The @code{omp_get_interop_rc_desc} function returns a string value describing
3251 the @var{ret_code} in human-readable form.
The behavior is unspecified if the value of @var{ret_code} was not set by an
interoperability routine invoked for @var{interop}.
3257 @multitable @columnfractions .20 .80
3258 @item @emph{Prototype}: @tab @code{const char *omp_get_interop_rc_desc(const omp_interop_t interop,
3259 omp_interop_rc_t ret_code)}
3262 @item @emph{Fortran}:
3263 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{character(:) function omp_get_interop_rc_desc(interop,
ret_code)}
3266 @item @tab @code{pointer :: omp_get_interop_rc_desc}
3267 @item @tab @code{integer(omp_interop_kind), intent(in) :: interop}
3268 @item @tab @code{integer (omp_interop_rc_kind) ret_code}
3271 @item @emph{Reference}:
3272 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.12.7,
3273 @uref{https://www.openmp.org, OpenMP specification v6.0}, Section 26.7
3278 @node Memory Management Routines
3279 @section Memory Management Routines
3281 Routines to manage and allocate memory on the current device.
3282 They have C linkage and do not throw exceptions.
3285 @c * omp_get_devices_memspace:: <fixme>/TR13
3286 @c * omp_get_device_memspace:: <fixme>/TR13
3287 @c * omp_get_devices_and_host_memspace:: <fixme>/TR13
3288 @c * omp_get_device_and_host_memspace:: <fixme>/TR13
3289 @c * omp_get_devices_all_memspace:: <fixme>/TR13
3290 @c * omp_get_memspace_num_resources:: <fixme>/TR11
3291 @c * omp_get_memspace_pagesize:: <fixme>/TR13
3292 @c * omp_get_submemspace:: <fixme>/TR11
3293 @c * omp_init_mempartitioner:: <fixme>/TR13
3294 @c * omp_destroy_mempartitioner:: <fixme>/TR13
3295 @c * omp_init_mempartition:: <fixme>/TR13
3296 @c * omp_destroy_mempartition:: <fixme>/TR13
3297 @c * omp_mempartition_set_part:: <fixme>/TR13
3298 @c * omp_mempartition_get_user_data:: <fixme>/TR13
3299 * omp_init_allocator:: Create an allocator
3300 * omp_destroy_allocator:: Destroy an allocator
3301 @c * omp_get_devices_allocator:: <fixme>/TR13
3302 @c * omp_get_device_allocator:: <fixme>/TR13
3303 @c * omp_get_devices_and_host_allocator:: <fixme>/TR13
3304 @c * omp_get_device_and_host_allocator:: <fixme>/TR13
3305 @c * omp_get_devices_all_allocator:: <fixme>/TR13
3306 * omp_set_default_allocator:: Set the default allocator
3307 * omp_get_default_allocator:: Get the default allocator
3308 * omp_alloc:: Memory allocation with an allocator
3309 * omp_aligned_alloc:: Memory allocation with an allocator and alignment
3310 * omp_free:: Freeing memory allocated with OpenMP routines
3311 * omp_calloc:: Allocate nullified memory with an allocator
3312 * omp_aligned_calloc:: Allocate nullified aligned memory with an allocator
3313 * omp_realloc:: Reallocate memory allocated with OpenMP routines
3318 @node omp_init_allocator
3319 @subsection @code{omp_init_allocator} -- Create an allocator
3321 @item @emph{Description}:
3322 Create an allocator that uses the specified memory space and has the specified
3323 traits; if an allocator that fulfills the requirements cannot be created,
3324 @code{omp_null_allocator} is returned.
3326 The predefined memory spaces and available traits can be found at
3327 @ref{OMP_ALLOCATOR}, where the trait names have to be prefixed by
3328 @code{omp_atk_} (e.g. @code{omp_atk_pinned}) and the named trait values by
3329 @code{omp_atv_} (e.g. @code{omp_atv_true}); additionally, @code{omp_atv_default}
3330 may be used as trait value to specify that the default value should be used.
3333 @multitable @columnfractions .20 .80
3334 @item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_init_allocator(}
3335 @item @tab @code{ omp_memspace_handle_t memspace,}
3336 @item @tab @code{ int ntraits,}
3337 @item @tab @code{ const omp_alloctrait_t traits[]);}
3340 @item @emph{Fortran}:
3341 @multitable @columnfractions .20 .80
3342 @item @emph{Interface}: @tab @code{function omp_init_allocator(memspace, ntraits, traits)}
3343 @item @tab @code{integer (omp_allocator_handle_kind) :: omp_init_allocator}
3344 @item @tab @code{integer (omp_memspace_handle_kind), intent(in) :: memspace}
3345 @item @tab @code{integer, intent(in) :: ntraits}
3346 @item @tab @code{type (omp_alloctrait), intent(in) :: traits(*)}
3349 @item @emph{See also}:
3350 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_destroy_allocator}
3352 @item @emph{Reference}:
3353 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.2
3358 @node omp_destroy_allocator
3359 @subsection @code{omp_destroy_allocator} -- Destroy an allocator
3361 @item @emph{Description}:
3362 Releases all resources used by a memory allocator, which must not represent
3363 a predefined memory allocator. Accessing memory after its allocator has been
3364 destroyed has unspecified behavior. Passing @code{omp_null_allocator} to the
3365 routine is permitted but has no effect.
3369 @multitable @columnfractions .20 .80
3370 @item @emph{Prototype}: @tab @code{void omp_destroy_allocator (omp_allocator_handle_t allocator);}
3373 @item @emph{Fortran}:
3374 @multitable @columnfractions .20 .80
3375 @item @emph{Interface}: @tab @code{subroutine omp_destroy_allocator(allocator)}
3376 @item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
3379 @item @emph{See also}:
3380 @ref{omp_init_allocator}
3382 @item @emph{Reference}:
3383 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.3
3388 @node omp_set_default_allocator
3389 @subsection @code{omp_set_default_allocator} -- Set the default allocator
3391 @item @emph{Description}:
3392 Sets the default allocator that is used when no allocator has been specified
3393 in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
3394 routine is invoked with the @code{omp_null_allocator} allocator.
3397 @multitable @columnfractions .20 .80
3398 @item @emph{Prototype}: @tab @code{void omp_set_default_allocator(omp_allocator_handle_t allocator);}
3401 @item @emph{Fortran}:
3402 @multitable @columnfractions .20 .80
3403 @item @emph{Interface}: @tab @code{subroutine omp_set_default_allocator(allocator)}
3404 @item @tab @code{integer (omp_allocator_handle_kind), intent(in) :: allocator}
3407 @item @emph{See also}:
3408 @ref{omp_get_default_allocator}, @ref{omp_init_allocator}, @ref{OMP_ALLOCATOR},
3409 @ref{Memory allocation}
3411 @item @emph{Reference}:
3412 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.4
3417 @node omp_get_default_allocator
3418 @subsection @code{omp_get_default_allocator} -- Get the default allocator
3420 @item @emph{Description}:
3421 The routine returns the default allocator that is used when no allocator has
3422 been specified in the @code{allocate} or @code{allocator} clause or if an
3423 OpenMP memory routine is invoked with the @code{omp_null_allocator} allocator.
3426 @multitable @columnfractions .20 .80
3427 @item @emph{Prototype}: @tab @code{omp_allocator_handle_t omp_get_default_allocator();}
3430 @item @emph{Fortran}:
3431 @multitable @columnfractions .20 .80
3432 @item @emph{Interface}: @tab @code{function omp_get_default_allocator()}
3433 @item @tab @code{integer (omp_allocator_handle_kind) :: omp_get_default_allocator}
3436 @item @emph{See also}:
3437 @ref{omp_set_default_allocator}, @ref{OMP_ALLOCATOR}
3439 @item @emph{Reference}:
3440 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.5
3446 @subsection @code{omp_alloc} -- Memory allocation with an allocator
3448 @item @emph{Description}:
3449 Allocate memory with the specified allocator, which can either be a predefined
3450 allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
3451 is @code{omp_null_allocator}, the allocator specified by the
3452 @var{def-allocator-var} ICV is used. @var{size} must be a nonnegative number
3453 denoting the number of bytes to be allocated; if @var{size} is zero,
3454 @code{omp_alloc} will return a null pointer. If successful, a pointer to the
3455 allocated memory is returned, otherwise the @code{fallback} trait of the
3456 allocator determines the behavior. The content of the allocated memory is unspecified.
3459 In @code{target} regions, either the @code{dynamic_allocators} clause must
3460 appear on a @code{requires} directive in the same compilation unit -- or the
3461 @var{allocator} argument may only be a constant expression with the value of
3462 one of the predefined allocators and may not be @code{omp_null_allocator}.
3464 Memory allocated by @code{omp_alloc} must be freed using @code{omp_free}.
3467 @multitable @columnfractions .20 .80
3468 @item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
3469 @item @tab @code{ omp_allocator_handle_t allocator)}
3473 @multitable @columnfractions .20 .80
3474 @item @emph{Prototype}: @tab @code{void* omp_alloc(size_t size,}
3475 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
3478 @item @emph{Fortran}:
3479 @multitable @columnfractions .20 .80
3480 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_alloc(size, allocator) bind(C)}
3481 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
3482 @item @tab @code{integer (c_size_t), value :: size}
3483 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
3486 @item @emph{See also}:
3487 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
3488 @ref{omp_free}, @ref{omp_init_allocator}
3490 @item @emph{Reference}:
3491 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.6
3496 @node omp_aligned_alloc
3497 @subsection @code{omp_aligned_alloc} -- Memory allocation with an allocator and alignment
3499 @item @emph{Description}:
3500 Allocate memory with the specified allocator, which can either be a predefined
3501 allocator, an allocator handle or @code{omp_null_allocator}. If the allocator
3502 is @code{omp_null_allocator}, the allocator specified by the
3503 @var{def-allocator-var} ICV is used. @var{alignment} must be a positive power
3504 of two and @var{size} must be a nonnegative number that is a multiple of the
3505 alignment and denotes the number of bytes to be allocated; if @var{size} is
3506 zero, @code{omp_aligned_alloc} will return a null pointer. The returned memory
3507 is aligned to at least the larger of the allocator's @code{alignment} trait
3508 and the passed @var{alignment} argument. If successful,
3509 a pointer to the allocated memory is returned, otherwise the @code{fallback}
3510 trait of the allocator determines the behavior. The content of the allocated
3511 memory is unspecified.
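
Because @var{size} has to be a multiple of @var{alignment}, callers often
round the requested byte count up first. The following sketch shows one way
to do that; the helper names @code{is_power_of_two} and
@code{round_up_to_alignment} are hypothetical and not part of the OpenMP API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1 if x is a positive power of two, else 0.  */
int
is_power_of_two (size_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: smallest multiple of alignment that is not less
   than size.  Overflow for sizes close to SIZE_MAX is not handled.  */
size_t
round_up_to_alignment (size_t size, size_t alignment)
{
  assert (is_power_of_two (alignment));
  /* For a power of two, masking off the low bits rounds down, so adding
     alignment - 1 beforehand rounds up.  */
  return (size + alignment - 1) & ~(alignment - 1);
}
```

A call such as
@code{omp_aligned_alloc (64, round_up_to_alignment (n, 64), allocator)}
then satisfies both requirements on the arguments.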
3513 In @code{target} regions, either the @code{dynamic_allocators} clause must
3514 appear on a @code{requires} directive in the same compilation unit -- or the
3515 @var{allocator} argument may only be a constant expression with the value of
3516 one of the predefined allocators and may not be @code{omp_null_allocator}.
3518 Memory allocated by @code{omp_aligned_alloc} must be freed using @code{omp_free}.
3522 @multitable @columnfractions .20 .80
3523 @item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
3524 @item @tab @code{ size_t size,}
3525 @item @tab @code{ omp_allocator_handle_t allocator)}
3529 @multitable @columnfractions .20 .80
3530 @item @emph{Prototype}: @tab @code{void* omp_aligned_alloc(size_t alignment,}
3531 @item @tab @code{ size_t size,}
3532 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
3535 @item @emph{Fortran}:
3536 @multitable @columnfractions .20 .80
3537 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_alloc(alignment, size, allocator) bind(C)}
3538 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
3539 @item @tab @code{integer (c_size_t), value :: alignment, size}
3540 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
3543 @item @emph{See also}:
3544 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
3545 @ref{omp_free}, @ref{omp_init_allocator}
3547 @item @emph{Reference}:
3548 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.6
3554 @subsection @code{omp_free} -- Freeing memory allocated with OpenMP routines
3556 @item @emph{Description}:
3557 The @code{omp_free} routine deallocates memory previously allocated by an
3558 OpenMP memory-management routine. The @var{ptr} argument must point to such
3559 memory or be a null pointer; if it is a null pointer, no operation is
3560 performed. If specified, the @var{allocator} argument must be either the
3561 memory allocator that was used for the allocation or @code{omp_null_allocator};
3562 if it is @code{omp_null_allocator}, the implementation determines the allocator automatically.
3565 Calling @code{omp_free} invokes undefined behavior if the memory
3566 was already deallocated or when the used allocator has already been destroyed.
3569 @multitable @columnfractions .20 .80
3570 @item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
3571 @item @tab @code{ omp_allocator_handle_t allocator)}
3575 @multitable @columnfractions .20 .80
3576 @item @emph{Prototype}: @tab @code{void omp_free(void *ptr,}
3577 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
3580 @item @emph{Fortran}:
3581 @multitable @columnfractions .20 .80
3582 @item @emph{Interface}: @tab @code{subroutine omp_free(ptr, allocator) bind(C)}
3583 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr}
3584 @item @tab @code{type (c_ptr), value :: ptr}
3585 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
3588 @item @emph{See also}:
3589 @ref{omp_alloc}, @ref{omp_aligned_alloc}, @ref{omp_calloc},
3590 @ref{omp_aligned_calloc}, @ref{omp_realloc}
3592 @item @emph{Reference}:
3593 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.7
3599 @subsection @code{omp_calloc} -- Allocate nullified memory with an allocator
3601 @item @emph{Description}:
3602 Allocate zero-initialized memory with the specified allocator, which can either
3603 be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
3604 the allocator is @code{omp_null_allocator}, the allocator specified by the
3605 @var{def-allocator-var} ICV is used. The to-be allocated memory is for an
3606 array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
3607 @var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
3608 zero, @code{omp_calloc} will return a null pointer. If successful, a pointer to
3609 the zero-initialized allocated memory is returned, otherwise the @code{fallback}
3610 trait of the allocator determines the behavior.
3612 In @code{target} regions, either the @code{dynamic_allocators} clause must
3613 appear on a @code{requires} directive in the same compilation unit -- or the
3614 @var{allocator} argument may only be a constant expression with the value of
3615 one of the predefined allocators and may not be @code{omp_null_allocator}.
3617 Memory allocated by @code{omp_calloc} must be freed using @code{omp_free}.
3620 @multitable @columnfractions .20 .80
3621 @item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
3622 @item @tab @code{ omp_allocator_handle_t allocator)}
3626 @multitable @columnfractions .20 .80
3627 @item @emph{Prototype}: @tab @code{void* omp_calloc(size_t nmemb, size_t size,}
3628 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator)}
3631 @item @emph{Fortran}:
3632 @multitable @columnfractions .20 .80
3633 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_calloc(nmemb, size, allocator) bind(C)}
3634 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
3635 @item @tab @code{integer (c_size_t), value :: nmemb, size}
3636 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
3639 @item @emph{See also}:
3640 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
3641 @ref{omp_free}, @ref{omp_init_allocator}
3643 @item @emph{Reference}:
3644 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
3649 @node omp_aligned_calloc
3650 @subsection @code{omp_aligned_calloc} -- Allocate aligned nullified memory with an allocator
3652 @item @emph{Description}:
3653 Allocate zero-initialized memory with the specified allocator, which can either
3654 be a predefined allocator, an allocator handle or @code{omp_null_allocator}. If
3655 the allocator is @code{omp_null_allocator}, the allocator specified by the
3656 @var{def-allocator-var} ICV is used. The to-be allocated memory is for an
3657 array with @var{nmemb} elements, each having a size of @var{size} bytes. Both
3658 @var{nmemb} and @var{size} must be nonnegative numbers; if either of them is
3659 zero, @code{omp_aligned_calloc} will return a null pointer. @var{alignment}
3660 must be a positive power of two and @var{size} must be a multiple of the
3661 alignment; the returned memory is aligned to at least the larger of the
3662 allocator's @code{alignment} trait and the passed
3663 @var{alignment} argument. If successful, a pointer to the zero-initialized
3664 allocated memory is returned, otherwise the @code{fallback} trait of the
3665 allocator determines the behavior.
3667 In @code{target} regions, either the @code{dynamic_allocators} clause must
3668 appear on a @code{requires} directive in the same compilation unit -- or the
3669 @var{allocator} argument may only be a constant expression with the value of
3670 one of the predefined allocators and may not be @code{omp_null_allocator}.
3672 Memory allocated by @code{omp_aligned_calloc} must be freed using @code{omp_free}.
3676 @multitable @columnfractions .20 .80
3677 @item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment, size_t nmemb,}
3678 @item @tab @code{ size_t size, omp_allocator_handle_t allocator)}
3682 @multitable @columnfractions .20 .80
3683 @item @emph{Prototype}: @tab @code{void* omp_aligned_calloc(size_t alignment, size_t nmemb,}
3684 @item @tab @code{ size_t size, omp_allocator_handle_t allocator=omp_null_allocator)}
3687 @item @emph{Fortran}:
3688 @multitable @columnfractions .20 .80
3689 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_aligned_calloc(alignment, nmemb, size, allocator) bind(C)}
3690 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
3691 @item @tab @code{integer (c_size_t), value :: alignment, nmemb, size}
3692 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator}
3695 @item @emph{See also}:
3696 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
3697 @ref{omp_free}, @ref{omp_init_allocator}
3699 @item @emph{Reference}:
3700 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.13.8
3706 @subsection @code{omp_realloc} -- Reallocate memory allocated with OpenMP routines
3708 @item @emph{Description}:
3709 The @code{omp_realloc} routine deallocates the memory to which @var{ptr} points
3710 and allocates new memory with the specified @var{allocator} argument; the
3711 new memory will have the content of the old memory up to the minimum of the
3712 old size and the new @var{size}, otherwise the content of the returned memory
3713 is unspecified. If the new allocator is the same as the old one, the routine
3714 tries to resize the existing memory allocation, returning the same address as
3715 @var{ptr} if successful. @var{ptr} must point to memory allocated by an OpenMP
3716 memory-management routine.
3718 The @var{allocator} and @var{free_allocator} arguments must be a predefined
3719 allocator, an allocator handle or @code{omp_null_allocator}. If
3720 @var{free_allocator} is @code{omp_null_allocator}, the implementation
3721 automatically determines the allocator used for the allocation of @var{ptr}.
3722 If @var{allocator} is @code{omp_null_allocator} and @var{ptr} is not a
3723 null pointer, the same allocator as @var{free_allocator} is used;
3724 if @var{ptr} is a null pointer, the allocator specified by the
3725 @var{def-allocator-var} ICV is used.
3727 The @var{size} must be a nonnegative number denoting the number of bytes to be
3728 allocated; if @var{size} is zero, @code{omp_realloc} will free the
3729 memory and return a null pointer. When @var{size} is nonzero: if successful,
3730 a pointer to the allocated memory is returned, otherwise the @code{fallback}
3731 trait of the allocator determines the behavior.
3733 In @code{target} regions, either the @code{dynamic_allocators} clause must
3734 appear on a @code{requires} directive in the same compilation unit -- or the
3735 @var{free_allocator} and @var{allocator} arguments may only be a constant
3736 expression with the value of one of the predefined allocators and may not be
3737 @code{omp_null_allocator}.
3739 Memory allocated by @code{omp_realloc} must be freed using @code{omp_free}.
3740 Calling @code{omp_free} invokes undefined behavior if the memory
3741 was already deallocated or when the used allocator has already been destroyed.
3744 @multitable @columnfractions .20 .80
3745 @item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
3746 @item @tab @code{ omp_allocator_handle_t allocator,}
3747 @item @tab @code{ omp_allocator_handle_t free_allocator)}
3751 @multitable @columnfractions .20 .80
3752 @item @emph{Prototype}: @tab @code{void* omp_realloc(void *ptr, size_t size,}
3753 @item @tab @code{ omp_allocator_handle_t allocator=omp_null_allocator,}
3754 @item @tab @code{ omp_allocator_handle_t free_allocator=omp_null_allocator)}
3757 @item @emph{Fortran}:
3758 @multitable @columnfractions .20 .80
3759 @item @emph{Interface}: @tab @code{type(c_ptr) function omp_realloc(ptr, size, allocator, free_allocator) bind(C)}
3760 @item @tab @code{use, intrinsic :: iso_c_binding, only : c_ptr, c_size_t}
3761 @item @tab @code{type(C_ptr), value :: ptr}
3762 @item @tab @code{integer (c_size_t), value :: size}
3763 @item @tab @code{integer (omp_allocator_handle_kind), value :: allocator, free_allocator}
3766 @item @emph{See also}:
3767 @ref{OMP_ALLOCATOR}, @ref{Memory allocation}, @ref{omp_set_default_allocator},
3768 @ref{omp_free}, @ref{omp_init_allocator}
3770 @item @emph{Reference}:
3771 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 3.7.9
3776 @c @node Tool Control Routine
3777 @c @section Tool Control Routine
3781 @node Environment Display Routine
3782 @section Environment Display Routine
3784 Routine to display the OpenMP version number and the initial value of ICVs.
3785 It has C linkage and does not throw exceptions.
3788 * omp_display_env:: print the initial ICV values
3791 @node omp_display_env
3792 @subsection @code{omp_display_env} -- print the initial ICV values
3794 @item @emph{Description}:
3795 Each time this routine is invoked, the OpenMP version number and the initial values
3796 of the internal control variables (ICVs) are printed on @code{stderr}. The displayed
3797 values are those at startup after evaluating the environment variables; later
3798 calls to API routines or clauses used in enclosing constructs do not affect the output.
3801 If the @var{verbose} argument is @code{false}, only the OpenMP version and
3802 standard OpenMP ICVs are shown; if it is @code{true}, additionally, the
3803 GCC-specific ICVs are shown.
3805 The output consists of multiple lines and starts with
3806 @samp{OPENMP DISPLAY ENVIRONMENT BEGIN} followed by the name-value lines and
3807 ends with @samp{OPENMP DISPLAY ENVIRONMENT END}. The @var{name} is followed by
3808 an equal sign and the @var{value} is enclosed in single quotes.
3810 The first line has as @var{name} either @samp{_OPENMP} or @samp{openmp_version}
3811 and shows as value the supported OpenMP version number (4-digit year, 2-digit
3812 month) of the implementation, matching the value of the @code{_OPENMP} macro
3813 and, in Fortran, the named constant @code{openmp_version}.
3815 In each of the succeeding lines, the @var{name} matches the environment-variable
3816 name of an ICV and shows its value. Those lines might be prefixed by a pair of
3817 brackets and a space, where the brackets enclose a comma-separated list of
3818 devices to which the ICV-value combination applies; each list entry is either a
3819 numeric device number or an abstract name denoting all devices (@code{all}), the
3820 initial host device (@code{host}) or all devices but the host (@code{device}).
3821 Note that the same ICV might be printed multiple times for multiple devices,
3822 even if all have the same value.
3824 The effect when invoked from within a @code{target} region is unspecified.
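
To illustrate the line format described above, the following sketch splits one
name-value line into its optional device list, the name, and the value. It is
not part of libgomp; the type @code{struct env_line}, the function
@code{parse_env_line}, and the buffer sizes are assumptions made for this
example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one name-value line; buffer sizes are arbitrary.  */
struct env_line
{
  char devices[32];   /* Text between the brackets, empty if no prefix.  */
  char name[64];
  char value[128];
};

/* Copy the len bytes at src into dst (capacity cap) and NUL-terminate.
   Returns 0 on success and -1 if the text does not fit.  */
static int
copy_span (char *dst, size_t cap, const char *src, size_t len)
{
  if (len >= cap)
    return -1;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return 0;
}

/* Parse "[devs] NAME = 'VALUE'", where the bracket prefix is optional.
   Returns 0 on success and -1 if the line does not have that shape.  */
int
parse_env_line (const char *line, struct env_line *out)
{
  assert (line != NULL && out != NULL);
  const char *p = line;
  while (*p == ' ')
    p++;

  out->devices[0] = '\0';
  if (*p == '[')
    {
      const char *close = strchr (p, ']');
      if (close == NULL
          || copy_span (out->devices, sizeof out->devices, p + 1,
                        (size_t) (close - p - 1)) != 0)
        return -1;
      p = close + 1;
      while (*p == ' ')
        p++;
    }

  /* The name ends at the first equal sign, ignoring trailing spaces.  */
  const char *eq = strchr (p, '=');
  if (eq == NULL)
    return -1;
  const char *name_end = eq;
  while (name_end > p && name_end[-1] == ' ')
    name_end--;
  if (name_end == p
      || copy_span (out->name, sizeof out->name, p,
                    (size_t) (name_end - p)) != 0)
    return -1;

  /* The value sits between the first and the last single quote.  */
  const char *open = strchr (eq + 1, '\'');
  if (open == NULL)
    return -1;
  const char *end = strrchr (open + 1, '\'');
  if (end == NULL)
    return -1;
  return copy_span (out->value, sizeof out->value, open + 1,
                    (size_t) (end - open - 1));
}
```

The @samp{BEGIN} and @samp{END} marker lines contain no equal sign and are
therefore rejected with -1, which lets a caller skip them.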
3827 @multitable @columnfractions .20 .80
3828 @item @emph{Prototype}: @tab @code{void omp_display_env(int verbose)}
3831 @item @emph{Fortran}:
3832 @multitable @columnfractions .20 .80
3833 @item @emph{Interface}: @tab @code{subroutine omp_display_env(verbose)}
3834 @item @tab @code{logical, intent(in) :: verbose}
3837 @item @emph{Example}:
3838 Note that the GCC-specific ICVs, such as the shown @code{GOMP_SPINCOUNT},
3839 are only printed when @var{verbose} is set to @code{true}.
3842 OPENMP DISPLAY ENVIRONMENT BEGIN
3844 [host] OMP_DYNAMIC = 'FALSE'
3845 [host] OMP_NESTED = 'FALSE'
3846 [all] OMP_CANCELLATION = 'FALSE'
3848 [host] GOMP_SPINCOUNT = '300000'
3849 OPENMP DISPLAY ENVIRONMENT END
3853 @item @emph{See also}:
3854 @ref{OMP_DISPLAY_ENV}, @ref{Environment Variables},
3855 @ref{Implementation-defined ICV Initialization}
3857 @item @emph{Reference}:
3858 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 3.15
3862 @c ---------------------------------------------------------------------
3863 @c OpenMP Environment Variables
3864 @c ---------------------------------------------------------------------
3866 @node Environment Variables
3867 @chapter OpenMP Environment Variables
3869 The environment variables beginning with @env{OMP_} are defined by
3870 section 4 of the OpenMP specification in version 4.5 or in a later version
3871 of the specification, while those beginning with @env{GOMP_} are GNU extensions.
3872 Most @env{OMP_} environment variables have an associated internal control variable (ICV).
3875 For any OpenMP environment variable that sets an ICV and is neither
3876 @code{OMP_DEFAULT_DEVICE} nor has global ICV scope, associated
3877 device-specific environment variables exist. For them, the environment
3878 variable without suffix affects the host. The suffix @code{_DEV_} followed
3879 by a non-negative device number less than the number of available devices sets
3880 the ICV for the corresponding device. The suffix @code{_DEV} sets the ICV
3881 of all non-host devices for which a device-specific corresponding environment
3882 variable has not been set while the @code{_ALL} suffix sets the ICV of all
3883 host and non-host devices for which a more specific corresponding environment
3884 variable is not set.
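
The precedence described above can be sketched as a list of environment
variable names to consult, from most to least specific. The function
@code{icv_env_candidates} below is a hypothetical illustration and not how
libgomp is implemented; it uses a negative @var{device} argument to denote the
host and does not check the device number against the number of available
devices.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: fill names with the environment variables that
   apply to the given device, most specific first, and return how many
   were written.  A negative device denotes the host.  */
int
icv_env_candidates (const char *base, int device, char names[3][64])
{
  assert (base != NULL && strlen (base) <= 40);
  int n = 0;
  if (device < 0)
    /* The variable without suffix affects the host.  */
    snprintf (names[n++], 64, "%s", base);
  else
    {
      /* A numbered suffix wins over the generic non-host suffix.  */
      snprintf (names[n++], 64, "%s_DEV_%d", base, device);
      snprintf (names[n++], 64, "%s_DEV", base);
    }
  /* _ALL applies to host and non-host devices as the last resort.  */
  snprintf (names[n++], 64, "%s_ALL", base);
  return n;
}
```

For device 1 and the base name @env{OMP_NUM_THREADS}, the candidates are
@env{OMP_NUM_THREADS_DEV_1}, @env{OMP_NUM_THREADS_DEV}, and
@env{OMP_NUM_THREADS_ALL}, in that order; the first one that is set
determines the ICV.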
3887 * OMP_ALLOCATOR:: Set the default allocator
3888 * OMP_AFFINITY_FORMAT:: Set the format string used for affinity display
3889 * OMP_CANCELLATION:: Set whether cancellation is activated
3890 * OMP_DISPLAY_AFFINITY:: Display thread affinity information
3891 * OMP_DISPLAY_ENV:: Show OpenMP version and environment variables
3892 * OMP_DEFAULT_DEVICE:: Set the device used in target regions
3893 * OMP_DYNAMIC:: Dynamic adjustment of threads
3894 * OMP_MAX_ACTIVE_LEVELS:: Set the maximum number of nested parallel regions
3895 * OMP_MAX_TASK_PRIORITY:: Set the maximum task priority value
3896 * OMP_NESTED:: Nested parallel regions
3897 * OMP_NUM_TEAMS:: Specifies the number of teams to use by teams region
3898 * OMP_NUM_THREADS:: Specifies the number of threads to use
3899 * OMP_PROC_BIND:: Whether threads may be moved between CPUs
3900 * OMP_PLACES:: Specifies on which CPUs the threads should be placed
3901 * OMP_STACKSIZE:: Set default thread stack size
3902 * OMP_SCHEDULE:: How threads are scheduled
3903 * OMP_TARGET_OFFLOAD:: Controls offloading behavior
3904 * OMP_TEAMS_THREAD_LIMIT:: Set the maximum number of threads imposed by teams
3905 * OMP_THREAD_LIMIT:: Set the maximum number of threads
3906 * OMP_WAIT_POLICY:: How waiting threads are handled
3907 * GOMP_CPU_AFFINITY:: Bind threads to specific CPUs
3908 * GOMP_DEBUG:: Enable debugging output
3909 * GOMP_STACKSIZE:: Set default thread stack size
3910 * GOMP_SPINCOUNT:: Set the busy-wait spin count
3911 * GOMP_RTEMS_THREAD_POOLS:: Set the RTEMS specific thread pools
3916 @section @env{OMP_ALLOCATOR} -- Set the default allocator
3917 @cindex Environment Variable
3919 @item @emph{ICV:} @var{def-allocator-var}
3920 @item @emph{Scope:} data environment
3921 @item @emph{Description}:
3922 Sets the default allocator that is used when no allocator has been specified
3923 in the @code{allocate} or @code{allocator} clause or if an OpenMP memory
3924 routine is invoked with the @code{omp_null_allocator} allocator.
3925 If unset, @code{omp_default_mem_alloc} is used.
3927 The value can either be a predefined allocator or a predefined memory space
3928 or a predefined memory space followed by a colon and a comma-separated list
3929 of memory trait and value pairs, separated by @code{=}.
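
The value syntax above can be illustrated with a small parser that splits the
string into the leading allocator or memory-space name and the trait and value
pairs. This is only a sketch and not the parser used by libgomp: the names
@code{struct allocator_spec}, @code{parse_allocator_spec}, and
@code{MAX_TRAITS} as well as the buffer sizes are assumptions, and the names
are not validated against the predefined sets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRAITS 8   /* Arbitrary limit for this sketch.  */

struct allocator_spec
{
  char head[48];       /* Predefined allocator or memory space name.  */
  int ntraits;
  char trait[MAX_TRAITS][24];
  char value[MAX_TRAITS][24];
};

/* Split "HEAD" or "HEAD:trait=value,trait=value" into its parts.
   Returns 0 on success and -1 if the text is malformed or too long.  */
int
parse_allocator_spec (const char *text, struct allocator_spec *out)
{
  assert (text != NULL && out != NULL);
  out->ntraits = 0;

  size_t head_len = strcspn (text, ":");
  if (head_len == 0 || head_len >= sizeof out->head)
    return -1;
  memcpy (out->head, text, head_len);
  out->head[head_len] = '\0';
  if (text[head_len] == '\0')
    return 0;          /* No trait list follows.  */

  const char *p = text + head_len + 1;
  for (;;)
    {
      /* One item runs up to the next comma or the end of the string.  */
      size_t item_len = strcspn (p, ",");
      const char *eq = memchr (p, '=', item_len);
      if (eq == NULL || out->ntraits == MAX_TRAITS)
        return -1;
      size_t klen = (size_t) (eq - p);
      size_t vlen = item_len - klen - 1;
      if (klen == 0 || vlen == 0 || klen >= 24 || vlen >= 24)
        return -1;
      memcpy (out->trait[out->ntraits], p, klen);
      out->trait[out->ntraits][klen] = '\0';
      memcpy (out->value[out->ntraits], eq + 1, vlen);
      out->value[out->ntraits][vlen] = '\0';
      out->ntraits++;
      if (p[item_len] == '\0')
        return 0;
      p += item_len + 1;
    }
}
```

For @samp{omp_low_lat_mem_space:pinned=true,partition=nearest}, the sketch
yields the memory space name and the two pairs @code{pinned}/@code{true} and
@code{partition}/@code{nearest}.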
3931 Note: The corresponding device environment variables are currently not
3932 supported. Therefore, the non-host @var{def-allocator-var} ICVs are always
3933 initialized to @code{omp_default_mem_alloc}. However, on all devices,
3934 the @code{omp_set_default_allocator} API routine can be used to change the value.
3937 @multitable @columnfractions .45 .45
3938 @headitem Predefined allocators @tab Associated predefined memory spaces
3939 @item omp_default_mem_alloc @tab omp_default_mem_space
3940 @item omp_large_cap_mem_alloc @tab omp_large_cap_mem_space
3941 @item omp_const_mem_alloc @tab omp_const_mem_space
3942 @item omp_high_bw_mem_alloc @tab omp_high_bw_mem_space
3943 @item omp_low_lat_mem_alloc @tab omp_low_lat_mem_space
3944 @item omp_cgroup_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
3945 @item omp_pteam_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
3946 @item omp_thread_mem_alloc @tab omp_low_lat_mem_space (implementation defined)
3947 @item ompx_gnu_pinned_mem_alloc @tab omp_default_mem_space (GNU extension)
3950 The predefined allocators use the default values for the traits,
3951 as listed below, except that @code{omp_cgroup_mem_alloc},
3952 @code{omp_pteam_mem_alloc}, and @code{omp_thread_mem_alloc} have the @code{access}
3953 trait set to @code{cgroup}, @code{pteam}, and @code{thread}, respectively.
3955 @multitable @columnfractions .25 .40 .25
3956 @headitem Trait @tab Allowed values @tab Default value
3957 @item @code{sync_hint} @tab @code{contended}, @code{uncontended},
3958 @code{serialized}, @code{private}
3959 @tab @code{contended}
3960 @item @code{alignment} @tab Positive integer being a power of two
3962 @item @code{access} @tab @code{all}, @code{cgroup},
3963 @code{pteam}, @code{thread}
3965 @item @code{pool_size} @tab Positive integer
3966 @tab See @ref{Memory allocation}
3967 @item @code{fallback} @tab @code{default_mem_fb}, @code{null_fb},
3968 @code{abort_fb}, @code{allocator_fb}
3970 @item @code{fb_data} @tab @emph{unsupported as it needs an allocator handle}
3972 @item @code{pinned} @tab @code{true}, @code{false}
3974 @item @code{partition} @tab @code{environment}, @code{nearest},
3975 @code{blocked}, @code{interleaved}
3976 @tab @code{environment}
3979 For the @code{fallback} trait, the default value is @code{null_fb} for the
3980 @code{omp_default_mem_alloc} allocator and any allocator that is associated
3981 with device memory; for all other allocators, it is @code{default_mem_fb} by default.
3984 For the @code{pinned} trait, the default value is @code{true} for the
3985 predefined allocator @code{ompx_gnu_pinned_mem_alloc} (a GNU extension), and
3986 @code{false} for all others.
3990 OMP_ALLOCATOR=omp_high_bw_mem_alloc
3991 OMP_ALLOCATOR=omp_large_cap_mem_space
3992 OMP_ALLOCATOR=omp_low_lat_mem_space:pinned=true,partition=nearest
3995 @item @emph{See also}:
3996 @ref{Memory allocation}, @ref{omp_get_default_allocator},
3997 @ref{omp_set_default_allocator}, @ref{Offload-Target Specifics}
3999 @item @emph{Reference}:
4000 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.21
4005 @node OMP_AFFINITY_FORMAT
4006 @section @env{OMP_AFFINITY_FORMAT} -- Set the format string used for affinity display
4007 @cindex Environment Variable
4009 @item @emph{ICV:} @var{affinity-format-var}
4010 @item @emph{Scope:} device
4011 @item @emph{Description}:
4012 Sets the format string used when displaying OpenMP thread affinity information.
4013 Special values are output using @code{%} followed by an optional size
4014 specification and then either the single-character field type or its long
4015 name enclosed in curly braces; using @code{%%} displays a literal percent.
4016 The size specification consists of an optional @code{0.} or @code{.} followed
4017 by a positive integer, specifying the minimal width of the output. With
4018 @code{0.} and numerical values, the output is padded with zeros on the left;
4019 with @code{.}, the output is padded by spaces on the left; otherwise, the
4020 output is padded by spaces on the right. If unset, the value is
4021 ``@code{level %L thread %i affinity %A}''.
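
The three padding modes of the size specification can be sketched as follows.
The code is an illustration only and not taken from libgomp; the names
@code{enum pad_mode} and @code{pad_field} are assumptions, and the sketch
treats every value as text, so it does not special-case negative numbers for
zero padding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pad_mode
{
  PAD_RIGHT,        /* No prefix before the width: spaces on the right.  */
  PAD_LEFT_SPACE,   /* "." before the width: spaces on the left.  */
  PAD_LEFT_ZERO     /* "0." before the width: zeros on the left.  */
};

/* Hypothetical helper: write text into out (capacity cap), padded to at
   least width characters.  Returns 0 on success and -1 if the result
   does not fit.  */
int
pad_field (char *out, size_t cap, const char *text, size_t width,
           enum pad_mode mode)
{
  assert (out != NULL && text != NULL);
  size_t len = strlen (text);
  /* The width is a minimum: longer values are never truncated.  */
  size_t fill = width > len ? width - len : 0;
  if (len + fill + 1 > cap)
    return -1;

  char *p = out;
  if (mode != PAD_RIGHT)
    {
      memset (p, mode == PAD_LEFT_ZERO ? '0' : ' ', fill);
      p += fill;
    }
  memcpy (p, text, len);
  p += len;
  if (mode == PAD_RIGHT)
    {
      memset (p, ' ', fill);
      p += fill;
    }
  *p = '\0';
  return 0;
}
```

With a width of 2 and the value @samp{0}, @code{PAD_LEFT_ZERO} gives
@samp{00}, which corresponds to a field such as @code{%0.2a}, while
@code{PAD_LEFT_SPACE} gives a space followed by @samp{0}, as for
@code{%.2t}.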
4023 Supported field types are:
4025 @multitable @columnfractions .10 .25 .60
4026 @item t @tab team_num @tab value returned by @code{omp_get_team_num}
4027 @item T @tab num_teams @tab value returned by @code{omp_get_num_teams}
4028 @item L @tab nesting_level @tab value returned by @code{omp_get_level}
4029 @item n @tab thread_num @tab value returned by @code{omp_get_thread_num}
4030 @item N @tab num_threads @tab value returned by @code{omp_get_num_threads}
4031 @item a @tab ancestor_tnum
4032 @tab value returned by
4033 @code{omp_get_ancestor_thread_num(omp_get_level()-1)}
4034 @item H @tab host @tab name of the host that executes the thread
4035 @item P @tab process_id @tab process identifier
4036 @item i @tab native_thread_id @tab native thread identifier
4037 @item A @tab thread_affinity
4038 @tab comma separated list of integer values or ranges, representing the
4039 processors on which a process might execute, subject to affinity mechanisms
4043 For instance, after setting
4046 OMP_AFFINITY_FORMAT="%0.2a!%n!%.4L!%N;%.2t;%0.2T;%@{team_num@};%@{num_teams@};%A"
4049 with either @code{OMP_DISPLAY_AFFINITY} being set or when calling
4050 @code{omp_display_affinity} with @code{NULL} or an empty string, the program
4051 might display the following:
4054 00!0!   1!4; 0;01;0;1;0-11
4055 00!3!   1!4; 0;01;0;1;0-11
4056 00!2!   1!4; 0;01;0;1;0-11
4057 00!1!   1!4; 0;01;0;1;0-11
4060 @item @emph{See also}:
4061 @ref{OMP_DISPLAY_AFFINITY}
4063 @item @emph{Reference}:
4064 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.14
4069 @node OMP_CANCELLATION
4070 @section @env{OMP_CANCELLATION} -- Set whether cancellation is activated
4071 @cindex Environment Variable
4073 @item @emph{ICV:} @var{cancel-var}
4074 @item @emph{Scope:} global
4075 @item @emph{Description}:
4076 If set to @code{TRUE}, cancellation is activated. If set to @code{FALSE} or
4077 if unset, cancellation is disabled and the @code{cancel} construct is ignored.
4079 @item @emph{See also}:
4080 @ref{omp_get_cancellation}
4082 @item @emph{Reference}:
4083 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.11
4088 @node OMP_DISPLAY_AFFINITY
4089 @section @env{OMP_DISPLAY_AFFINITY} -- Display thread affinity information
4090 @cindex Environment Variable
4092 @item @emph{ICV:} @var{display-affinity-var}
4093 @item @emph{Scope:} global
4094 @item @emph{Description}:
4095 If set to @code{FALSE} or if unset, affinity displaying is disabled.
4096 If set to @code{TRUE}, the runtime displays affinity information about
4097 OpenMP threads in a parallel region upon entering the region and every time any change occurs.
4100 @item @emph{See also}:
4101 @ref{OMP_AFFINITY_FORMAT}
4103 @item @emph{Reference}:
4104 @uref{https://www.openmp.org, OpenMP specification v5.0}, Section 6.13
4110 @node OMP_DISPLAY_ENV
4111 @section @env{OMP_DISPLAY_ENV} -- Show OpenMP version and environment variables
4112 @cindex Environment Variable
4114 @item @emph{ICV:} none
4115 @item @emph{Scope:} not applicable
4116 @item @emph{Description}:
4117 If set to @code{TRUE}, the runtime displays the same information to
4118 @code{stderr} as shown by the @code{omp_display_env} routine invoked with
4119 @var{verbose} argument set to @code{false}. If set to @code{VERBOSE}, the same
4120 information is shown as invoking the routine with @var{verbose} set to
4121 @code{true}. If unset or set to @code{FALSE}, this information is not shown.
4122 The result for any other value is unspecified.
4124 @item @emph{See also}:
4125 @ref{omp_display_env}
4127 @item @emph{Reference}:
4128 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.12
4133 @node OMP_DEFAULT_DEVICE
4134 @section @env{OMP_DEFAULT_DEVICE} -- Set the device used in target regions
4135 @cindex Environment Variable
4137 @item @emph{ICV:} @var{default-device-var}
4138 @item @emph{Scope:} data environment
4139 @item @emph{Description}:
4140 Set to choose the device which is used in a @code{target} region, unless the
4141 value is overridden by @code{omp_set_default_device} or by a @code{device}
4142 clause. The value shall be the nonnegative device number. If no device with
4143 the given device number exists, the code is executed on the host. If unset,
4144 @env{OMP_TARGET_OFFLOAD} is @code{mandatory} and no non-host devices are
4145 available, it is set to @code{omp_invalid_device}. Otherwise, if unset,
4146 device number 0 is used.
4149 @item @emph{See also}:
4150 @ref{omp_get_default_device}, @ref{omp_set_default_device},
4151 @ref{OMP_TARGET_OFFLOAD}
4153 @item @emph{Reference}:
4154 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.7
4160 @section @env{OMP_DYNAMIC} -- Dynamic adjustment of threads
4161 @cindex Environment Variable
4163 @item @emph{ICV:} @var{dyn-var}
4164 @item @emph{Scope:} global
4165 @item @emph{Description}:
4166 Enable or disable the dynamic adjustment of the number of threads
4167 within a team. The value of this environment variable shall be
4168 @code{TRUE} or @code{FALSE}. If undefined, dynamic adjustment is
4169 disabled by default.
4171 @item @emph{See also}:
4172 @ref{omp_set_dynamic}
4174 @item @emph{Reference}:
4175 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.3
4180 @node OMP_MAX_ACTIVE_LEVELS
4181 @section @env{OMP_MAX_ACTIVE_LEVELS} -- Set the maximum number of nested parallel regions
4182 @cindex Environment Variable
4184 @item @emph{ICV:} @var{max-active-levels-var}
4185 @item @emph{Scope:} data environment
4186 @item @emph{Description}:
4187 Specifies the initial value for the maximum number of nested parallel
4188 regions. The value of this variable shall be a positive integer.
4189 If undefined, then if @env{OMP_NESTED} is defined and set to true, or
4190 if @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined and set to
4191 a list with more than one item, the maximum number of nested parallel
4192 regions is initialized to the largest number supported, otherwise
4195 @item @emph{See also}:
4196 @ref{omp_set_max_active_levels}, @ref{OMP_NESTED}, @ref{OMP_PROC_BIND},
4197 @ref{OMP_NUM_THREADS}
4200 @item @emph{Reference}:
4201 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.9
4206 @node OMP_MAX_TASK_PRIORITY
4207 @section @env{OMP_MAX_TASK_PRIORITY} -- Set the maximum priority number that can be set for a task
4209 @cindex Environment Variable
4211 @item @emph{ICV:} @var{max-task-priority-var}
4212 @item @emph{Scope:} global
4213 @item @emph{Description}:
4214 Specifies the initial value for the maximum priority value that can be
4215 set for a task. The value of this variable shall be a non-negative
4216 integer. If undefined, the default priority is
4219 @item @emph{See also}:
4220 @ref{omp_get_max_task_priority}
4222 @item @emph{Reference}:
4223 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.14
4229 @section @env{OMP_NESTED} -- Nested parallel regions
4230 @cindex Environment Variable
4231 @cindex Implementation specific setting
4233 @item @emph{ICV:} @var{max-active-levels-var}
4234 @item @emph{Scope:} data environment
4235 @item @emph{Description}:
4236 Enable or disable nested parallel regions, i.e., whether team members
4237 are allowed to create new teams. The value of this environment variable
4238 shall be @code{TRUE} or @code{FALSE}. If set to @code{TRUE}, the number
4239 of maximum active nested regions supported is by default set to the
4240 maximum supported, otherwise it is set to one. If
4241 @env{OMP_MAX_ACTIVE_LEVELS} is defined, its setting overrides this
4242 setting. If both are undefined, nested parallel regions are enabled if
4243 @env{OMP_NUM_THREADS} or @env{OMP_PROC_BIND} are defined to a list with
4244 more than one item, otherwise they are disabled by default.
4246 Note that the @code{OMP_NESTED} environment variable was deprecated in
4247 the OpenMP specification 5.0 in favor of @code{OMP_MAX_ACTIVE_LEVELS}.
4249 @item @emph{See also}:
4250 @ref{omp_set_max_active_levels}, @ref{omp_set_nested},
4251 @ref{OMP_MAX_ACTIVE_LEVELS}
4253 @item @emph{Reference}:
4254 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.6
4260 @section @env{OMP_NUM_TEAMS} -- Specifies the number of teams to use by teams region
4261 @cindex Environment Variable
4263 @item @emph{ICV:} @var{nteams-var}
4264 @item @emph{Scope:} device
4265 @item @emph{Description}:
4266 Specifies the upper bound for the number of teams to use in teams regions
4267 without an explicit @code{num_teams} clause. The value of this variable shall
4268 be a positive integer. If undefined, it defaults to 0, which means an
4269 implementation-defined upper bound.
4271 @item @emph{See also}:
4272 @ref{omp_set_num_teams}
4274 @item @emph{Reference}:
4275 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.23
4280 @node OMP_NUM_THREADS
4281 @section @env{OMP_NUM_THREADS} -- Specifies the number of threads to use
4282 @cindex Environment Variable
4283 @cindex Implementation specific setting
4285 @item @emph{ICV:} @var{nthreads-var}
4286 @item @emph{Scope:} data environment
4287 @item @emph{Description}:
4288 Specifies the default number of threads to use in parallel regions. The
4289 value of this variable shall be a comma-separated list of positive integers;
4290 the value specifies the number of threads to use for the corresponding nested
4291 level. Specifying more than one item in the list automatically enables
4292 nesting by default. If undefined, one thread per CPU is used.
4294 When a list with more than one value is specified, it also affects the
4295 @var{max-active-levels-var} ICV as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
4297 @item @emph{See also}:
4298 @ref{omp_set_num_threads}, @ref{OMP_MAX_ACTIVE_LEVELS}
4300 @item @emph{Reference}:
4301 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.2
4307 @section @env{OMP_PROC_BIND} -- Whether threads may be moved between CPUs
4308 @cindex Environment Variable
4310 @item @emph{ICV:} @var{bind-var}
4311 @item @emph{Scope:} data environment
4312 @item @emph{Description}:
4313 Specifies whether threads may be moved between processors. If set to
4314 @code{TRUE}, OpenMP threads should not be moved; if set to @code{FALSE}
4315 they may be moved. Alternatively, a comma-separated list with the
4316 values @code{PRIMARY}, @code{MASTER}, @code{CLOSE} and @code{SPREAD} can
4317 be used to specify the thread affinity policy for the corresponding nesting
4318 level. With @code{PRIMARY} and @code{MASTER} the worker threads are in the
4319 same place partition as the primary thread. With @code{CLOSE} those are
4320 kept close to the primary thread in contiguous place partitions. With
4321 @code{SPREAD}, a sparse distribution
4322 across the place partitions is used. Specifying more than one item in the
4323 list automatically enables nesting by default.
4325 When a list is specified, it also affects the @var{max-active-levels-var} ICV
4326 as described in @ref{OMP_MAX_ACTIVE_LEVELS}.
4328 When undefined, @env{OMP_PROC_BIND} defaults to @code{TRUE} when
4329 @env{OMP_PLACES} or @env{GOMP_CPU_AFFINITY} is set and @code{FALSE} otherwise.
4331 @item @emph{See also}:
4332 @ref{omp_get_proc_bind}, @ref{GOMP_CPU_AFFINITY}, @ref{OMP_PLACES},
4333 @ref{OMP_MAX_ACTIVE_LEVELS}
4335 @item @emph{Reference}:
4336 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.4
4342 @section @env{OMP_PLACES} -- Specifies on which CPUs the threads should be placed
4343 @cindex Environment Variable
4345 @item @emph{ICV:} @var{place-partition-var}
4346 @item @emph{Scope:} implicit tasks
4347 @item @emph{Description}:
4348 The thread placement can be either specified using an abstract name or by an
4349 explicit list of the places. The abstract names @code{threads}, @code{cores},
4350 @code{sockets}, @code{ll_caches} and @code{numa_domains} can be optionally
4351 followed by a positive number in parentheses, which denotes how many places
4352 shall be created. With @code{threads} each place corresponds to a single
4353 hardware thread; @code{cores} to a single core with the corresponding number of
4354 hardware threads; with @code{sockets} the place corresponds to a single
4355 socket; with @code{ll_caches} to a set of cores that shares the last level
4356 cache on the device; and @code{numa_domains} to a set of cores for which their
4357 closest memory on the device is the same memory and at a similar distance from
4358 the cores. The resulting placement can be shown by setting the
4359 @env{OMP_DISPLAY_ENV} environment variable.
4361 Alternatively, the placement can be specified explicitly as a comma-separated
4362 list of places. A place is specified by a set of nonnegative numbers in curly
4363 braces, denoting the hardware threads. The curly braces can be omitted
4364 when only a single number has been specified. The hardware threads
4365 belonging to a place can be specified either as a comma-separated list of
4366 nonnegative thread numbers or using an interval. Multiple places can likewise
4367 be specified either by a comma-separated list of places or by an interval. To
4368 specify an interval, a colon followed by the count is placed after
4369 the hardware thread number or the place. Optionally, the count can be
4370 followed by a colon and the stride number -- otherwise a unit stride is
4371 assumed. Placing an exclamation mark (@code{!}) directly before a curly
4372 brace or numbers inside the curly braces (excluding intervals)
4373 excludes those hardware threads.
4375 For instance, the following all specify the same places list:
4376 @code{"@{0,1,2@}, @{3,4,5@}, @{6,7,8@}, @{9,10,11@}"};
4377 @code{"@{0:3@}, @{3:3@}, @{6:3@}, @{9:3@}"}; and @code{"@{0:3@}:4:3"}.
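The interval notation can be made concrete with a small expansion routine.
The following is a minimal sketch of how a place interval of the form
@code{@{first:len@}:count:stride} unfolds into hardware-thread numbers; it
does not parse the environment variable and is not how libgomp implements it.
The function name and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Expand the place interval "{first:len}:count:stride" into a flat list
   of hardware-thread numbers, one place after the other.  Returns how
   many numbers were written; at most CAP entries of OUT are filled.
   Hypothetical helper, not part of libgomp.  */
size_t
expand_place_interval (int first, int len, int count, int stride,
                       int *out, size_t cap)
{
  size_t n = 0;

  assert (out != NULL && len > 0 && count > 0);
  for (int p = 0; p < count; p++)     /* one iteration per place */
    for (int t = 0; t < len; t++)     /* consecutive threads in a place */
      {
        if (n == cap)
          return n;
        /* Each place starts STRIDE hardware threads after the previous.  */
        out[n++] = first + p * stride + t;
      }
  return n;
}
```

Calling @code{expand_place_interval (0, 3, 4, 3, out, 12)} fills @code{out}
with the numbers 0 through 11, that is, four places of three consecutive
hardware threads each.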
4379 If @env{OMP_PLACES} and @env{GOMP_CPU_AFFINITY} are unset and
4380 @env{OMP_PROC_BIND} is either unset or @code{false}, threads may be moved
4381 between CPUs following no placement policy.
4383 @item @emph{See also}:
4384 @ref{OMP_PROC_BIND}, @ref{GOMP_CPU_AFFINITY}, @ref{omp_get_proc_bind},
4385 @ref{OMP_DISPLAY_ENV}
4387 @item @emph{Reference}:
4388 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.5
4394 @section @env{OMP_STACKSIZE} -- Set default thread stack size
4395 @cindex Environment Variable
4397 @item @emph{ICV:} @var{stacksize-var}
4398 @item @emph{Scope:} device
4399 @item @emph{Description}:
4400 Set the default thread stack size in kilobytes, unless the number
4401 is suffixed by @code{B}, @code{K}, @code{M} or @code{G}, in which
4402 case the size is, respectively, in bytes, kilobytes, megabytes
4403 or gigabytes. This is different from @code{pthread_attr_setstacksize}
4404 which gets the number of bytes as an argument. If the stack size cannot
4405 be set due to system constraints, an error is reported and the initial
4406 stack size is left unchanged. If undefined, the stack size is system
4409 @item @emph{See also}:
4410 @ref{GOMP_STACKSIZE}
4412 @item @emph{Reference}:
4413 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.7
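The unit rules for @env{OMP_STACKSIZE} can be illustrated with a short
conversion routine. This is a minimal sketch under stated assumptions, not
libgomp's parser: the function name is hypothetical, only the uppercase
suffixes listed above are handled, and overflow of very large values is not
checked.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Convert an OMP_STACKSIZE-style string to a size in bytes.  A plain
   number is taken as kilobytes; the suffixes B, K, M and G select bytes,
   kilobytes, megabytes and gigabytes.  Returns 0 on a syntax error.
   Hypothetical helper; overflow is not checked.  */
unsigned long long
stacksize_to_bytes (const char *s)
{
  char *end;
  unsigned long long value;
  unsigned shift = 10;          /* no suffix: kilobytes */

  assert (s != NULL);
  if (!isdigit ((unsigned char) *s))
    return 0;
  value = strtoull (s, &end, 10);

  switch (*end)
    {
    case 'B': shift = 0;  end++; break;
    case 'K': shift = 10; end++; break;
    case 'M': shift = 20; end++; break;
    case 'G': shift = 30; end++; break;
    case '\0': break;
    default: return 0;
    }
  if (*end != '\0')             /* trailing characters after the suffix */
    return 0;
  return value << shift;
}
```

With this sketch, @code{"512"} and @code{"512K"} both yield 524288 bytes,
whereas @code{"4096B"} yields 4096 bytes.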
4419 @section @env{OMP_SCHEDULE} -- How threads are scheduled
4420 @cindex Environment Variable
4421 @cindex Implementation specific setting
4423 @item @emph{ICV:} @var{run-sched-var}
4424 @item @emph{Scope:} data environment
4425 @item @emph{Description}:
4426 Specifies the schedule type and chunk size.
4427 The value of the variable shall have the form @code{type[,chunk]}, where
4428 @code{type} is one of @code{static}, @code{dynamic}, @code{guided} or @code{auto}.
4429 The optional @code{chunk} size shall be a positive integer. If undefined,
4430 dynamic scheduling and a chunk size of 1 are used.
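To illustrate the @code{type[,chunk]} form, the following minimal sketch
splits such a value into its two parts. It is not libgomp's parser: the
enumeration, the function name and the convention of reporting a missing
chunk size as 0 are assumptions made for this example, and only the lowercase
spellings shown above are recognized.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this example.  */
enum schedule_kind
{
  KIND_STATIC, KIND_DYNAMIC, KIND_GUIDED, KIND_AUTO, KIND_INVALID
};

/* Parse "type[,chunk]".  On success, return the schedule kind and store
   the chunk size in *CHUNK (0 if none was given).  Return KIND_INVALID
   for an unknown type or a chunk size that is not a positive integer.  */
enum schedule_kind
parse_schedule (const char *s, long *chunk)
{
  static const char *const names[] = { "static", "dynamic", "guided", "auto" };

  assert (s != NULL && chunk != NULL);
  *chunk = 0;
  for (int k = 0; k < 4; k++)
    {
      size_t len = strlen (names[k]);
      char *end;
      long value;

      if (strncmp (s, names[k], len) != 0)
        continue;
      if (s[len] == '\0')               /* type only */
        return (enum schedule_kind) k;
      if (s[len] != ',')
        return KIND_INVALID;
      value = strtol (s + len + 1, &end, 10);
      if (end == s + len + 1 || *end != '\0' || value <= 0)
        return KIND_INVALID;
      *chunk = value;
      return (enum schedule_kind) k;
    }
  return KIND_INVALID;
}
```

For instance, @code{"dynamic,4"} yields @code{KIND_DYNAMIC} with a chunk size
of 4, and @code{"guided"} yields @code{KIND_GUIDED} with no chunk size.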
4432 @item @emph{See also}:
4433 @ref{omp_set_schedule}
4435 @item @emph{Reference}:
4436 @uref{https://www.openmp.org, OpenMP specification v4.5}, Sections 2.7.1.1 and 4.1
4441 @node OMP_TARGET_OFFLOAD
4442 @section @env{OMP_TARGET_OFFLOAD} -- Controls offloading behavior
4443 @cindex Environment Variable
4444 @cindex Implementation specific setting
4446 @item @emph{ICV:} @var{target-offload-var}
4447 @item @emph{Scope:} global
4448 @item @emph{Description}:
4449 Specifies the behavior with regard to offloading code to a device. This
4450 variable can be set to one of three values - @code{MANDATORY}, @code{DISABLED}
4453 If set to @code{MANDATORY}, the program terminates with an error if
4454 any device construct or device memory routine uses a device that is unavailable
4455 or not supported by the implementation, or uses a non-conforming device number.
4456 If set to @code{DISABLED}, then offloading is disabled and all code runs on
4457 the host. If set to @code{DEFAULT}, the program tries offloading to the
4458 device first, then falls back to running code on the host if it cannot.
4460 If undefined, the program behaves as if @code{DEFAULT} were set.
4462 Note: Even with @code{MANDATORY}, no run-time termination is performed when
4463 the device number in a @code{device} clause or argument to a device memory
4464 routine refers to the host, which includes using the device number in the
4465 @var{default-device-var} ICV. However, the initial value of
4466 the @var{default-device-var} ICV is affected by @code{MANDATORY}.
4468 @item @emph{See also}:
4469 @ref{OMP_DEFAULT_DEVICE}
4471 @item @emph{Reference}:
4472 @uref{https://www.openmp.org, OpenMP specification v5.2}, Section 21.2.8
4477 @node OMP_TEAMS_THREAD_LIMIT
4478 @section @env{OMP_TEAMS_THREAD_LIMIT} -- Set the maximum number of threads imposed by teams
4479 @cindex Environment Variable
4481 @item @emph{ICV:} @var{teams-thread-limit-var}
4482 @item @emph{Scope:} device
4483 @item @emph{Description}:
4484 Specifies an upper bound for the number of threads to use by each contention
4485 group created by a teams construct without explicit @code{thread_limit}
4486 clause. The value of this variable shall be a positive integer. If undefined,
4487 the value of 0 is used which stands for an implementation defined upper
4490 @item @emph{See also}:
4491 @ref{OMP_THREAD_LIMIT}, @ref{omp_set_teams_thread_limit}
4493 @item @emph{Reference}:
4494 @uref{https://www.openmp.org, OpenMP specification v5.1}, Section 6.24
4499 @node OMP_THREAD_LIMIT
4500 @section @env{OMP_THREAD_LIMIT} -- Set the maximum number of threads
4501 @cindex Environment Variable
4503 @item @emph{ICV:} @var{thread-limit-var}
4504 @item @emph{Scope:} data environment
4505 @item @emph{Description}:
4506 Specifies the maximum number of threads to use for the whole program. The
4507 value of this variable shall be a positive integer. If undefined,
4508 the number of threads is not limited.
4510 @item @emph{See also}:
4511 @ref{OMP_NUM_THREADS}, @ref{omp_get_thread_limit}
4513 @item @emph{Reference}:
4514 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.10
4519 @node OMP_WAIT_POLICY
4520 @section @env{OMP_WAIT_POLICY} -- How waiting threads are handled
4521 @cindex Environment Variable
4523 @item @emph{Description}:
4524 Specifies whether waiting threads should be active or passive. If
4525 the value is @code{PASSIVE}, waiting threads should not consume CPU
4526 power while waiting, whereas the value @code{ACTIVE} specifies that
4527 they should. If undefined, threads wait actively for a short time
4528 before waiting passively.
4530 @item @emph{See also}:
4531 @ref{GOMP_SPINCOUNT}
4533 @item @emph{Reference}:
4534 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 4.8
4539 @node GOMP_CPU_AFFINITY
4540 @section @env{GOMP_CPU_AFFINITY} -- Bind threads to specific CPUs
4541 @cindex Environment Variable
4543 @item @emph{Description}:
4544 Binds threads to specific CPUs. The variable should contain a space-separated
4545 or comma-separated list of CPUs. This list may contain different kinds of
4546 entries: either single CPU numbers in any order, a range of CPUs (M-N)
4547 or a range with some stride (M-N:S). CPU numbers are zero based. For example,
4548 @code{GOMP_CPU_AFFINITY="0 3 1-2 4-15:2"} binds the initial thread
4549 to CPU 0, the second to CPU 3, the third to CPU 1, the fourth to
4550 CPU 2, the fifth to CPU 4, the sixth through tenth to CPUs 6, 8, 10, 12,
4551 and 14 respectively and then starts assigning back from the beginning of
4552 the list. @code{GOMP_CPU_AFFINITY=0} binds all threads to CPU 0.
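The list syntax can be illustrated by expanding it into the sequence of CPU
numbers that successive threads are bound to. The following is a minimal
sketch, not libgomp's actual parser; the function name and its error
convention are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Expand a GOMP_CPU_AFFINITY-style list such as "0 3 1-2 4-15:2" into
   CPU numbers, in binding order.  Returns the number of entries written
   (at most CAP), or (size_t) -1 on a syntax error.  Hypothetical helper,
   not part of libgomp.  */
size_t
expand_cpu_list (const char *s, int *out, size_t cap)
{
  size_t n = 0;

  assert (s != NULL && out != NULL);
  while (*s != '\0')
    {
      char *end;
      long first, last, stride = 1;

      /* Entries are separated by spaces or commas.  */
      if (isspace ((unsigned char) *s) || *s == ',')
        {
          s++;
          continue;
        }
      if (!isdigit ((unsigned char) *s))
        return (size_t) -1;
      first = last = strtol (s, &end, 10);
      s = end;

      if (*s == '-')                    /* range M-N */
        {
          if (!isdigit ((unsigned char) s[1]))
            return (size_t) -1;
          last = strtol (s + 1, &end, 10);
          s = end;
          if (*s == ':')                /* range with stride M-N:S */
            {
              if (!isdigit ((unsigned char) s[1]))
                return (size_t) -1;
              stride = strtol (s + 1, &end, 10);
              s = end;
            }
        }
      if (last < first || stride <= 0)
        return (size_t) -1;

      for (long cpu = first; cpu <= last; cpu += stride)
        {
          if (n == cap)
            return n;
          out[n++] = (int) cpu;
        }
    }
  return n;
}
```

Applied to @code{"0 3 1-2 4-15:2"}, the sketch produces the ten CPU numbers
0, 3, 1, 2, 4, 6, 8, 10, 12 and 14, matching the binding order described
above.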
4554 There is no libgomp library routine to determine whether a CPU affinity
4555 specification is in effect. As a workaround, language-specific library
4556 functions, e.g., @code{getenv} in C or @code{GET_ENVIRONMENT_VARIABLE} in
4557 Fortran, may be used to query the setting of the @code{GOMP_CPU_AFFINITY}
4558 environment variable. A defined CPU affinity on startup cannot be changed
4559 or disabled during the runtime of the application.
4561 If both @env{GOMP_CPU_AFFINITY} and @env{OMP_PROC_BIND} are set,
4562 @env{OMP_PROC_BIND} has a higher precedence. If neither is set, or when
4563 @env{OMP_PROC_BIND} is set to
4564 @code{FALSE}, the host system handles the assignment of threads to CPUs.
4566 @item @emph{See also}:
4567 @ref{OMP_PLACES}, @ref{OMP_PROC_BIND}
4573 @section @env{GOMP_DEBUG} -- Enable debugging output
4574 @cindex Environment Variable
4576 @item @emph{Description}:
4577 Enable debugging output. The variable should be set to @code{0}
4578 (disabled, also the default if not set), or @code{1} (enabled).
4580 If enabled, some debugging output is printed during execution.
4581 This is currently not specified in more detail, and subject to change.
4586 @node GOMP_STACKSIZE
4587 @section @env{GOMP_STACKSIZE} -- Set default thread stack size
4588 @cindex Environment Variable
4589 @cindex Implementation specific setting
4591 @item @emph{Description}:
4592 Set the default thread stack size in kilobytes. This is different from
4593 @code{pthread_attr_setstacksize} which gets the number of bytes as an
4594 argument. If the stack size cannot be set due to system constraints, an
4595 error is reported and the initial stack size is left unchanged. If undefined,
4596 the stack size is system dependent.
4598 @item @emph{See also}:
4601 @item @emph{Reference}:
4602 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00493.html,
4603 GCC Patches mailing list},
4604 @uref{https://gcc.gnu.org/ml/gcc-patches/2006-06/msg00496.html,
4605 GCC Patches mailing list}
4610 @node GOMP_SPINCOUNT
4611 @section @env{GOMP_SPINCOUNT} -- Set the busy-wait spin count
4612 @cindex Environment Variable
4613 @cindex Implementation specific setting
4615 @item @emph{Description}:
4616 Determines how long a thread waits actively, consuming CPU power,
4617 before waiting passively without consuming CPU power. The value may be
4618 either @code{INFINITE} or @code{INFINITY} to always wait actively, or an
4619 integer which gives the number of spins of the busy-wait loop. The
4620 integer may optionally be followed by the following suffixes acting
4621 as multiplication factors: @code{k} (kilo, thousand), @code{M} (mega,
4622 million), @code{G} (giga, billion), or @code{T} (tera, trillion).
4623 If undefined, 0 is used when @env{OMP_WAIT_POLICY} is @code{PASSIVE},
4624 300,000 is used when @env{OMP_WAIT_POLICY} is undefined and
4625 30 billion is used when @env{OMP_WAIT_POLICY} is @code{ACTIVE}.
4626 If there are more OpenMP threads than available CPUs, 1000 and 100
4627 spins are used for @env{OMP_WAIT_POLICY} being @code{ACTIVE} or
4628 undefined, respectively; unless the @env{GOMP_SPINCOUNT} is lower
4629 or @env{OMP_WAIT_POLICY} is @code{PASSIVE}.
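The defaults and the oversubscription rule above can be written down as two
small functions. This is a minimal sketch of the rules as stated in this
section, not libgomp's implementation; the enumeration and both function
names are hypothetical.

```c
#include <assert.h>

/* Hypothetical encoding of the OMP_WAIT_POLICY setting.  */
enum wait_policy { WAIT_UNDEFINED, WAIT_PASSIVE, WAIT_ACTIVE };

/* Spin count used when GOMP_SPINCOUNT is undefined.  */
unsigned long long
default_spincount (enum wait_policy policy)
{
  assert (policy == WAIT_UNDEFINED || policy == WAIT_PASSIVE
          || policy == WAIT_ACTIVE);
  switch (policy)
    {
    case WAIT_PASSIVE:
      return 0ULL;
    case WAIT_ACTIVE:
      return 30000000000ULL;    /* 30 billion */
    default:
      return 300000ULL;
    }
}

/* Spin count used when there are more OpenMP threads than available
   CPUs (OVERSUBSCRIBED is nonzero): 1000 for ACTIVE, 100 for an
   undefined policy, unless SPINCOUNT is already lower or the policy is
   PASSIVE.  */
unsigned long long
throttled_spincount (unsigned long long spincount, enum wait_policy policy,
                     int oversubscribed)
{
  unsigned long long limit;

  if (!oversubscribed || policy == WAIT_PASSIVE)
    return spincount;
  limit = policy == WAIT_ACTIVE ? 1000ULL : 100ULL;
  return spincount < limit ? spincount : limit;
}
```

For example, with both variables undefined and more threads than CPUs, the
default of 300,000 spins is reduced to 100.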
4631 @item @emph{See also}:
4632 @ref{OMP_WAIT_POLICY}
4637 @node GOMP_RTEMS_THREAD_POOLS
4638 @section @env{GOMP_RTEMS_THREAD_POOLS} -- Set the RTEMS specific thread pools
4639 @cindex Environment Variable
4640 @cindex Implementation specific setting
4642 @item @emph{Description}:
4643 This environment variable is only used on the RTEMS real-time operating system.
4644 It determines the scheduler instance specific thread pools. The format for
4645 @env{GOMP_RTEMS_THREAD_POOLS} is a list of optional
4646 @code{<thread-pool-count>[$<priority>]@@<scheduler-name>} configurations
4647 separated by @code{:} where:
4649 @item @code{<thread-pool-count>} is the thread pool count for this scheduler
4651 @item @code{$<priority>} is an optional priority for the worker threads of a
4652 thread pool according to @code{pthread_setschedparam}. In case a priority
4653 value is omitted, then a worker thread inherits the priority of the OpenMP
4654 primary thread that created it. The priority of the worker thread is not
4655 changed after creation, even if a new OpenMP primary thread using the worker has
4656 a different priority.
4657 @item @code{@@<scheduler-name>} is the scheduler instance name according to the
4658 RTEMS application configuration.
4660 In case no thread pool configuration is specified for a scheduler instance,
4661 then each OpenMP primary thread of this scheduler instance uses its own
4662 dynamically allocated thread pool. To limit the worker thread count of the
4663 thread pools, each OpenMP primary thread must call @code{omp_set_num_threads}.
4664 @item @emph{Example}:
4665 Suppose we have three scheduler instances @code{IO}, @code{WRK0}, and
4666 @code{WRK1} with @env{GOMP_RTEMS_THREAD_POOLS} set to
4667 @code{"1@@WRK0:3$4@@WRK1"}. Then there are no thread pool restrictions for
4668 scheduler instance @code{IO}. In the scheduler instance @code{WRK0} there is
4669 one thread pool available. Since no priority is specified for this scheduler
4670 instance, the worker thread inherits the priority of the OpenMP primary thread
4671 that created it. In the scheduler instance @code{WRK1} there are three thread
4672 pools available and their worker threads run at priority four.
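The configuration format can be illustrated by a routine that reads one
@code{<thread-pool-count>[$<priority>]@@<scheduler-name>} entry. The following
is a minimal sketch, not the parser used by libgomp on RTEMS: the structure,
the function name, the 15-character limit on scheduler names and the use of
@minus{}1 for an omitted priority are assumptions made for this example, and
the sketch requires the thread pool count to be present.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical representation of one configuration entry.  */
struct pool_config
{
  long count;           /* <thread-pool-count> */
  long priority;        /* $<priority>, or -1 if omitted */
  char scheduler[16];   /* <scheduler-name>, NUL-terminated */
};

/* Parse one entry starting at S.  Returns a pointer to the character
   that ends the entry (':' or the terminating NUL), or NULL on a syntax
   error.  */
const char *
parse_pool_config (const char *s, struct pool_config *cfg)
{
  char *end;
  size_t i = 0;

  assert (s != NULL && cfg != NULL);
  if (!isdigit ((unsigned char) *s))
    return NULL;
  cfg->count = strtol (s, &end, 10);
  s = end;

  cfg->priority = -1;                   /* inherit from the primary thread */
  if (*s == '$')
    {
      if (!isdigit ((unsigned char) s[1]))
        return NULL;
      cfg->priority = strtol (s + 1, &end, 10);
      s = end;
    }

  if (*s != '@')
    return NULL;
  s++;
  while (*s != '\0' && *s != ':')
    {
      if (i + 1 == sizeof cfg->scheduler)
        return NULL;                    /* name too long for this sketch */
      cfg->scheduler[i++] = *s++;
    }
  cfg->scheduler[i] = '\0';
  return i > 0 ? s : NULL;
}
```

Applied twice to @code{"1@@WRK0:3$4@@WRK1"} (skipping the @code{:} between the
calls), the sketch yields one pool without an explicit priority for
@code{WRK0} and three pools at priority four for @code{WRK1}.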
4677 @c ---------------------------------------------------------------------
4679 @c ---------------------------------------------------------------------
4681 @node Enabling OpenACC
4682 @chapter Enabling OpenACC
4684 To activate the OpenACC extensions for C/C++ and Fortran, the compile-time
4685 flag @option{-fopenacc} must be specified. This enables the OpenACC directive
4686 @samp{#pragma acc} in C/C++ and, in Fortran, the @samp{!$acc} sentinel in free
4687 source form and the @samp{c$acc}, @samp{*$acc} and @samp{!$acc} sentinels in
4688 fixed source form. The flag also arranges for automatic linking of the OpenACC
4689 runtime library (@ref{OpenACC Runtime Library Routines}).
4691 See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
4693 A complete description of all OpenACC directives accepted may be found in
4694 the @uref{https://www.openacc.org, OpenACC} Application Programming
4695 Interface manual, version 2.6.
4699 @c ---------------------------------------------------------------------
4700 @c OpenACC Runtime Library Routines
4701 @c ---------------------------------------------------------------------
4703 @node OpenACC Runtime Library Routines
4704 @chapter OpenACC Runtime Library Routines
4706 The runtime routines described here are defined by section 3 of the OpenACC
4707 specifications in version 2.6.
4708 They have C linkage, and do not throw exceptions.
4709 Generally, they are available only for the host, with the exception of
4710 @code{acc_on_device}, which is available for both the host and the
4711 acceleration device.
4714 * acc_get_num_devices:: Get number of devices for the given device
4716 * acc_set_device_type:: Set type of device accelerator to use.
4717 * acc_get_device_type:: Get type of device accelerator to be used.
4718 * acc_set_device_num:: Set device number to use.
4719 * acc_get_device_num:: Get device number to be used.
4720 * acc_get_property:: Get device property.
4721 * acc_async_test:: Tests for completion of a specific asynchronous
4723 * acc_async_test_all:: Tests for completion of all asynchronous
4725 * acc_wait:: Wait for completion of a specific asynchronous
4727 * acc_wait_all:: Waits for completion of all asynchronous
4729 * acc_wait_all_async:: Wait for completion of all asynchronous
4731 * acc_wait_async:: Wait for completion of asynchronous operations.
4732 * acc_init:: Initialize runtime for a specific device type.
4733 * acc_shutdown:: Shuts down the runtime for a specific device
4735 * acc_on_device:: Whether executing on a particular device
4736 * acc_malloc:: Allocate device memory.
4737 * acc_free:: Free device memory.
4738 * acc_copyin:: Allocate device memory and copy host memory to
4740 * acc_present_or_copyin:: If the data is not present on the device,
4741 allocate device memory and copy from host
4743 * acc_create:: Allocate device memory and map it to host
4745 * acc_present_or_create:: If the data is not present on the device,
4746 allocate device memory and map it to host
4748 * acc_copyout:: Copy device memory to host memory.
4749 * acc_delete:: Free device memory.
4750 * acc_update_device:: Update device memory from mapped host memory.
4751 * acc_update_self:: Update host memory from mapped device memory.
4752 * acc_map_data:: Map previously allocated device memory to host
4754 * acc_unmap_data:: Unmap device memory from host memory.
4755 * acc_deviceptr:: Get device pointer associated with specific
4757 * acc_hostptr:: Get host pointer associated with specific
4759 * acc_is_present:: Indicate whether host variable / array is
4761 * acc_memcpy_to_device:: Copy host memory to device memory.
4762 * acc_memcpy_from_device:: Copy device memory to host memory.
4763 * acc_attach:: Let device pointer point to device-pointer target.
4764 * acc_detach:: Let device pointer point to host-pointer target.
4766 API routines for target platforms.
4768 * acc_get_current_cuda_device:: Get CUDA device handle.
4769 * acc_get_current_cuda_context::Get CUDA context handle.
4770 * acc_get_cuda_stream:: Get CUDA stream handle.
4771 * acc_set_cuda_stream:: Set CUDA stream handle.
4773 API routines for the OpenACC Profiling Interface.
4775 * acc_prof_register:: Register callbacks.
4776 * acc_prof_unregister:: Unregister callbacks.
4777 * acc_prof_lookup:: Obtain inquiry functions.
4778 * acc_register_library:: Library registration.
4783 @node acc_get_num_devices
4784 @section @code{acc_get_num_devices} -- Get number of devices for given device type
4786 @item @emph{Description}
4787 This function returns a value indicating the number of devices available
4788 for the device type specified in @var{devicetype}.
4791 @multitable @columnfractions .20 .80
4792 @item @emph{Prototype}: @tab @code{int acc_get_num_devices(acc_device_t devicetype);}
4795 @item @emph{Fortran}:
4796 @multitable @columnfractions .20 .80
4797 @item @emph{Interface}: @tab @code{integer function acc_get_num_devices(devicetype)}
4798 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4801 @item @emph{Reference}:
4802 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4808 @node acc_set_device_type
4809 @section @code{acc_set_device_type} -- Set type of device accelerator to use.
4811 @item @emph{Description}
4812 This function indicates to the runtime library which device type, specified
4813 in @var{devicetype}, to use when executing a parallel or kernels region.
4816 @multitable @columnfractions .20 .80
4817 @item @emph{Prototype}: @tab @code{void acc_set_device_type(acc_device_t devicetype);}
4820 @item @emph{Fortran}:
4821 @multitable @columnfractions .20 .80
4822 @item @emph{Interface}: @tab @code{subroutine acc_set_device_type(devicetype)}
4823 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4826 @item @emph{Reference}:
4827 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4833 @node acc_get_device_type
4834 @section @code{acc_get_device_type} -- Get type of device accelerator to be used.
4836 @item @emph{Description}
4837 This function returns what device type will be used when executing a
4838 parallel or kernels region.
4840 This function returns @code{acc_device_none} if
4841 @code{acc_get_device_type} is called from the
4842 @code{acc_ev_device_init_start} or @code{acc_ev_device_init_end}
4843 callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
4844 Interface}), that is, if the device is currently being initialized.
4847 @multitable @columnfractions .20 .80
4848 @item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
4851 @item @emph{Fortran}:
4852 @multitable @columnfractions .20 .80
4853 @item @emph{Interface}: @tab @code{function acc_get_device_type()}
4854 @item @tab @code{integer(kind=acc_device_kind) acc_get_device_type}
4857 @item @emph{Reference}:
4858 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4864 @node acc_set_device_num
4865 @section @code{acc_set_device_num} -- Set device number to use.
4867 @item @emph{Description}
4868 This function indicates to the runtime which device number,
4869 specified by @var{devicenum}, to use for the specified device
4870 type @var{devicetype}.
4873 @multitable @columnfractions .20 .80
4874 @item @emph{Prototype}: @tab @code{void acc_set_device_num(int devicenum, acc_device_t devicetype);}
4877 @item @emph{Fortran}:
4878 @multitable @columnfractions .20 .80
4879 @item @emph{Interface}: @tab @code{subroutine acc_set_device_num(devicenum, devicetype)}
4880 @item @tab @code{integer devicenum}
4881 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4884 @item @emph{Reference}:
4885 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4891 @node acc_get_device_num
4892 @section @code{acc_get_device_num} -- Get device number to be used.
4894 @item @emph{Description}
4895 This function returns which device number, associated with the specified device
4896 type @var{devicetype}, will be used when executing a parallel or kernels
4900 @multitable @columnfractions .20 .80
4901 @item @emph{Prototype}: @tab @code{int acc_get_device_num(acc_device_t devicetype);}
4904 @item @emph{Fortran}:
4905 @multitable @columnfractions .20 .80
4906 @item @emph{Interface}: @tab @code{function acc_get_device_num(devicetype)}
4907 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4908 @item @tab @code{integer acc_get_device_num}
4911 @item @emph{Reference}:
4912 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4918 @node acc_get_property
4919 @section @code{acc_get_property} -- Get device property.
4920 @cindex acc_get_property
4921 @cindex acc_get_property_string
4923 @item @emph{Description}
4924 These routines return the value of the specified @var{property} for the
4925 device being queried according to @var{devicenum} and @var{devicetype}.
4926 Integer-valued and string-valued properties are returned by
4927 @code{acc_get_property} and @code{acc_get_property_string} respectively.
4928 The Fortran @code{acc_get_property_string} subroutine returns the string
4929 retrieved in its fourth argument while the remaining entry points are
4930 functions, which pass the return value as their result.
4932 Note for Fortran only: the OpenACC technical committee corrected and, hence,
4933 modified the interface introduced in OpenACC 2.6. The kind-value parameter
4934 @code{acc_device_property} has been renamed to @code{acc_device_property_kind}
4935 for consistency and the return type of the @code{acc_get_property} function is
4936 now a @code{c_size_t} integer instead of an @code{acc_device_property} integer.
4937 The parameter @code{acc_device_property} is still provided,
4938 but might be removed in a future version of GCC.
4941 @multitable @columnfractions .20 .80
4942 @item @emph{Prototype}: @tab @code{size_t acc_get_property(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4943 @item @emph{Prototype}: @tab @code{const char *acc_get_property_string(int devicenum, acc_device_t devicetype, acc_device_property_t property);}
4946 @item @emph{Fortran}:
4947 @multitable @columnfractions .20 .80
4948 @item @emph{Interface}: @tab @code{function acc_get_property(devicenum, devicetype, property)}
4949 @item @emph{Interface}: @tab @code{subroutine acc_get_property_string(devicenum, devicetype, property, string)}
4950 @item @tab @code{use ISO_C_Binding, only: c_size_t}
4951 @item @tab @code{integer devicenum}
4952 @item @tab @code{integer(kind=acc_device_kind) devicetype}
4953 @item @tab @code{integer(kind=acc_device_property_kind) property}
4954 @item @tab @code{integer(kind=c_size_t) acc_get_property}
4955 @item @tab @code{character(*) string}
4958 @item @emph{Reference}:
4959 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4965 @node acc_async_test
4966 @section @code{acc_async_test} -- Test for completion of a specific asynchronous operation.
4968 @item @emph{Description}
4969 This function tests for completion of the asynchronous operation specified
4970 in @var{arg}. In C/C++, a non-zero value is returned to indicate
4971 the specified asynchronous operation has completed while Fortran returns
4972 @code{true}. If the asynchronous operation has not completed, C/C++ returns
4973 zero and Fortran returns @code{false}.
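As a rough illustration of the behavior described above, the following C
sketch polls a queue a bounded number of times with @code{acc_async_test}
and then blocks with @code{acc_wait}. The helper name
@code{poll_then_wait} and the host-only stand-ins in the @code{#else}
branch are assumptions made for this example, so that it also builds
without @option{-fopenacc}; they are not part of libgomp. The sketch
further assumes that the runtime has already been initialized for the
current device.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#else
/* Hypothetical host-only stand-ins (not libgomp code): without a
   device, no asynchronous operation is ever pending.  */
static int acc_async_test(int arg) { (void) arg; return 1; }
static void acc_wait(int arg) { (void) arg; }
#endif

/* Poll queue QUEUE up to MAX_POLLS times, then block until it has
   drained.  Returns the number of polls that found the queue busy.  */
static int
poll_then_wait(int queue, int max_polls)
{
  int busy = 0;
  assert(max_polls >= 0);
  /* acc_async_test returns zero while work on QUEUE is incomplete.  */
  while (busy < max_polls && acc_async_test(queue) == 0)
    busy++;            /* Other host work could be done here.  */
  acc_wait(queue);     /* Completion is guaranteed after this call.  */
  return busy;
}
```

A caller might use @code{poll_then_wait(1, 8)} after launching work on
queue 1; a return value of zero means the first test already reported
completion.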
4976 @multitable @columnfractions .20 .80
4977 @item @emph{Prototype}: @tab @code{int acc_async_test(int arg);}
4980 @item @emph{Fortran}:
4981 @multitable @columnfractions .20 .80
4982 @item @emph{Interface}: @tab @code{function acc_async_test(arg)}
4983 @item @tab @code{integer(kind=acc_handle_kind) arg}
4984 @item @tab @code{logical acc_async_test}
4987 @item @emph{Reference}:
4988 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
4994 @node acc_async_test_all
4995 @section @code{acc_async_test_all} -- Tests for completion of all asynchronous operations.
4997 @item @emph{Description}
4998 This function tests for completion of all asynchronous operations.
4999 In C/C++, a non-zero value is returned to indicate all asynchronous
5000 operations have completed while Fortran returns @code{true}. If
5001 any asynchronous operation has not completed, C/C++ returns zero and
5002 Fortran returns @code{false}.
5005 @multitable @columnfractions .20 .80
5006 @item @emph{Prototype}: @tab @code{int acc_async_test_all(void);}
5009 @item @emph{Fortran}:
5010 @multitable @columnfractions .20 .80
@item @emph{Interface}: @tab @code{function acc_async_test_all()}
@item @tab @code{logical acc_async_test_all}
5015 @item @emph{Reference}:
5016 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5023 @section @code{acc_wait} -- Wait for completion of a specific asynchronous operation.
5025 @item @emph{Description}
5026 This function waits for completion of the asynchronous operation
5027 specified in @var{arg}.
5030 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait(int arg);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait(int arg);}
5035 @item @emph{Fortran}:
5036 @multitable @columnfractions .20 .80
5037 @item @emph{Interface}: @tab @code{subroutine acc_wait(arg)}
5038 @item @tab @code{integer(acc_handle_kind) arg}
5039 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait(arg)}
5040 @item @tab @code{integer(acc_handle_kind) arg}
5043 @item @emph{Reference}:
5044 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5051 @section @code{acc_wait_all} -- Waits for completion of all asynchronous operations.
5053 @item @emph{Description}
5054 This function waits for the completion of all asynchronous operations.
5057 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_all(void);}
@item @emph{Prototype (OpenACC 1.0 compatibility)}: @tab @code{void acc_async_wait_all(void);}
5062 @item @emph{Fortran}:
5063 @multitable @columnfractions .20 .80
5064 @item @emph{Interface}: @tab @code{subroutine acc_wait_all()}
5065 @item @emph{Interface (OpenACC 1.0 compatibility)}: @tab @code{subroutine acc_async_wait_all()}
5068 @item @emph{Reference}:
5069 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5075 @node acc_wait_all_async
5076 @section @code{acc_wait_all_async} -- Wait for completion of all asynchronous operations.
5078 @item @emph{Description}
This function enqueues a wait operation on the queue @var{async} for any
and all asynchronous operations that have been previously enqueued on
any queue.
5084 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_all_async(int async);}
5088 @item @emph{Fortran}:
5089 @multitable @columnfractions .20 .80
5090 @item @emph{Interface}: @tab @code{subroutine acc_wait_all_async(async)}
5091 @item @tab @code{integer(acc_handle_kind) async}
5094 @item @emph{Reference}:
5095 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5101 @node acc_wait_async
5102 @section @code{acc_wait_async} -- Wait for completion of asynchronous operations.
5104 @item @emph{Description}
5105 This function enqueues a wait operation on queue @var{async} for any and all
5106 asynchronous operations enqueued on queue @var{arg}.
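The following C sketch shows one way the dependency described above might
be used: queue @var{consumer} is made to wait for queue @var{producer}
without blocking the host, and only then does the host wait on
@var{consumer}. The helper name @code{chain_queues} and the host-only
stand-ins in the @code{#else} branch are assumptions for illustration and
are not part of libgomp; the sketch also assumes an already initialized
device.

```c
#ifdef _OPENACC
#include <openacc.h>
#else
/* Hypothetical host-only stand-ins (not libgomp code) so the sketch
   also builds without -fopenacc; with no device, queues are empty.  */
static void acc_wait_async(int arg, int async) { (void) arg; (void) async; }
static void acc_wait(int arg) { (void) arg; }
static int acc_async_test(int arg) { (void) arg; return 1; }
#endif

/* Make queue CONSUMER wait for everything enqueued so far on queue
   PRODUCER, then block the host until CONSUMER has drained.
   Returns non-zero if CONSUMER reports completion afterwards.  */
static int
chain_queues(int producer, int consumer)
{
  acc_wait_async(producer, consumer);  /* Enqueued; host continues.  */
  acc_wait(consumer);                  /* Host blocks here.  */
  return acc_async_test(consumer);
}
```

Because the wait is enqueued on @var{consumer} rather than executed by
the host, the host thread stays free to do other work between the two
calls.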
5109 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_wait_async(int arg, int async);}
5113 @item @emph{Fortran}:
5114 @multitable @columnfractions .20 .80
5115 @item @emph{Interface}: @tab @code{subroutine acc_wait_async(arg, async)}
5116 @item @tab @code{integer(acc_handle_kind) arg, async}
5119 @item @emph{Reference}:
5120 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5127 @section @code{acc_init} -- Initialize runtime for a specific device type.
5129 @item @emph{Description}
This function initializes the runtime for the device type specified in
@var{devicetype}.
5134 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_init(acc_device_t devicetype);}
5138 @item @emph{Fortran}:
5139 @multitable @columnfractions .20 .80
5140 @item @emph{Interface}: @tab @code{subroutine acc_init(devicetype)}
5141 @item @tab @code{integer(acc_device_kind) devicetype}
5144 @item @emph{Reference}:
5145 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5152 @section @code{acc_shutdown} -- Shuts down the runtime for a specific device type.
5154 @item @emph{Description}
This function shuts down the runtime for the device type specified in
@var{devicetype}.
5159 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_shutdown(acc_device_t devicetype);}
5163 @item @emph{Fortran}:
5164 @multitable @columnfractions .20 .80
5165 @item @emph{Interface}: @tab @code{subroutine acc_shutdown(devicetype)}
5166 @item @tab @code{integer(acc_device_kind) devicetype}
5169 @item @emph{Reference}:
5170 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5177 @section @code{acc_on_device} -- Whether executing on a particular device
5179 @item @emph{Description}:
This function returns whether the program is executing on the device
type specified in @var{devicetype}. In C/C++, a non-zero value is
returned to indicate that the program is executing on the specified
device type; in Fortran, @code{true} is returned. If the program is not
executing on the specified device type, C/C++ returns zero, while Fortran
returns @code{false}.
5187 Note that in GCC, depending on @var{devicetype}, the function call might
5188 be folded to a constant in the compiler; compile with
5189 @option{-fno-builtin-acc_on_device} if a run-time function is desired.
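A minimal C sketch of such a query follows. The helper name
@code{running_on_host} is hypothetical, and the @code{#else} branch holds
host-only stand-ins (including an illustrative enumerator value) that are
assumptions made so the sketch also builds without @option{-fopenacc};
they are not libgomp's definitions.

```c
#ifdef _OPENACC
#include <openacc.h>
#else
/* Hypothetical host-only stand-ins (not libgomp code); the enumerator
   value is illustrative only.  */
typedef enum { acc_device_host = 2 } acc_device_t;
static int
acc_on_device(acc_device_t devicetype)
{
  return devicetype == acc_device_host;
}
#endif

/* Returns 1 when the calling code executes on the host, 0 otherwise.
   The result of acc_on_device is normalized because only "non-zero"
   is promised for the true case.  */
static int
running_on_host(void)
{
  return acc_on_device(acc_device_host) != 0;
}
```

Called from ordinary host code, @code{running_on_host} is expected to
return 1; inside an offloaded compute region the same test would be
expected to yield 0.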
5192 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{int acc_on_device(acc_device_t devicetype);}
5196 @item @emph{Fortran}:
5197 @multitable @columnfractions .20 .80
5198 @item @emph{Interface}: @tab @code{function acc_on_device(devicetype)}
5199 @item @tab @code{integer(acc_device_kind) devicetype}
5200 @item @tab @code{logical acc_on_device}
5203 @item @emph{Reference}:
5204 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5211 @section @code{acc_malloc} -- Allocate device memory.
5213 @item @emph{Description}
5214 This function allocates @var{bytes} bytes of device memory. It returns
5215 the device address of the allocated memory.
5218 @multitable @columnfractions .20 .80
5219 @item @emph{Prototype}: @tab @code{d_void* acc_malloc(size_t bytes);}
5222 @item @emph{Fortran}:
5223 @multitable @columnfractions .20 .80
5224 @item @emph{Interface}: @tab @code{type(c_ptr) function acc_malloc(bytes)}
5225 @item @tab @code{integer(c_size_t), value :: bytes}
5228 @item @emph{Reference}:
5229 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.18. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5237 @section @code{acc_free} -- Free device memory.
5239 @item @emph{Description}
This function frees previously allocated device memory at the device
address @var{data_dev}.
5243 @multitable @columnfractions .20 .80
5244 @item @emph{Prototype}: @tab @code{void acc_free(d_void *data_dev);}
5247 @item @emph{Fortran}:
5248 @multitable @columnfractions .20 .80
5249 @item @emph{Interface}: @tab @code{subroutine acc_free(data_dev)}
5250 @item @tab @code{type(c_ptr), value :: data_dev}
5253 @item @emph{Reference}:
5254 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.19. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5262 @section @code{acc_copyin} -- Allocate device memory and copy host memory to it.
5264 @item @emph{Description}
In C/C++, this function allocates @var{len} bytes of device memory,
maps it to the host address specified in @var{a}, and copies the host
data to it. The device address of the newly allocated device memory is
returned.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a
variable or array element and @var{len} specifies the length in bytes.
5274 @multitable @columnfractions .20 .80
5275 @item @emph{Prototype}: @tab @code{void *acc_copyin(h_void *a, size_t len);}
5276 @item @emph{Prototype}: @tab @code{void *acc_copyin_async(h_void *a, size_t len, int async);}
5279 @item @emph{Fortran}:
5280 @multitable @columnfractions .20 .80
5281 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a)}
5282 @item @tab @code{type, dimension(:[,:]...) :: a}
5283 @item @emph{Interface}: @tab @code{subroutine acc_copyin(a, len)}
5284 @item @tab @code{type, dimension(:[,:]...) :: a}
5285 @item @tab @code{integer len}
5286 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, async)}
5287 @item @tab @code{type, dimension(:[,:]...) :: a}
5288 @item @tab @code{integer(acc_handle_kind) :: async}
5289 @item @emph{Interface}: @tab @code{subroutine acc_copyin_async(a, len, async)}
5290 @item @tab @code{type, dimension(:[,:]...) :: a}
5291 @item @tab @code{integer len}
5292 @item @tab @code{integer(acc_handle_kind) :: async}
5295 @item @emph{Reference}:
5296 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5302 @node acc_present_or_copyin
5303 @section @code{acc_present_or_copyin} -- If the data is not present on the device, allocate device memory and copy from host memory.
5305 @item @emph{Description}
This function tests whether the host data specified by @var{a} with a
length of @var{len} bytes is present on the device. If it is not present,
device memory is allocated and the host memory is copied to it. The device
address of the newly allocated device memory is returned.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5315 Note that @code{acc_present_or_copyin} and @code{acc_pcopyin} exist for
5316 backward compatibility with OpenACC 2.0; use @ref{acc_copyin} instead.
5319 @multitable @columnfractions .20 .80
5320 @item @emph{Prototype}: @tab @code{void *acc_present_or_copyin(h_void *a, size_t len);}
5321 @item @emph{Prototype}: @tab @code{void *acc_pcopyin(h_void *a, size_t len);}
5324 @item @emph{Fortran}:
5325 @multitable @columnfractions .20 .80
5326 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a)}
5327 @item @tab @code{type, dimension(:[,:]...) :: a}
5328 @item @emph{Interface}: @tab @code{subroutine acc_present_or_copyin(a, len)}
5329 @item @tab @code{type, dimension(:[,:]...) :: a}
5330 @item @tab @code{integer len}
5331 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a)}
5332 @item @tab @code{type, dimension(:[,:]...) :: a}
5333 @item @emph{Interface}: @tab @code{subroutine acc_pcopyin(a, len)}
5334 @item @tab @code{type, dimension(:[,:]...) :: a}
5335 @item @tab @code{integer len}
5338 @item @emph{Reference}:
5339 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5346 @section @code{acc_create} -- Allocate device memory and map it to host memory.
5348 @item @emph{Description}
5349 This function allocates device memory and maps it to host memory specified
5350 by the host address @var{a} with a length of @var{len} bytes. In C/C++,
5351 the function returns the device address of the allocated device memory.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5358 @multitable @columnfractions .20 .80
5359 @item @emph{Prototype}: @tab @code{void *acc_create(h_void *a, size_t len);}
5360 @item @emph{Prototype}: @tab @code{void *acc_create_async(h_void *a, size_t len, int async);}
5363 @item @emph{Fortran}:
5364 @multitable @columnfractions .20 .80
5365 @item @emph{Interface}: @tab @code{subroutine acc_create(a)}
5366 @item @tab @code{type, dimension(:[,:]...) :: a}
5367 @item @emph{Interface}: @tab @code{subroutine acc_create(a, len)}
5368 @item @tab @code{type, dimension(:[,:]...) :: a}
5369 @item @tab @code{integer len}
5370 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, async)}
5371 @item @tab @code{type, dimension(:[,:]...) :: a}
5372 @item @tab @code{integer(acc_handle_kind) :: async}
5373 @item @emph{Interface}: @tab @code{subroutine acc_create_async(a, len, async)}
5374 @item @tab @code{type, dimension(:[,:]...) :: a}
5375 @item @tab @code{integer len}
5376 @item @tab @code{integer(acc_handle_kind) :: async}
5379 @item @emph{Reference}:
5380 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5386 @node acc_present_or_create
5387 @section @code{acc_present_or_create} -- If the data is not present on the device, allocate device memory and map it to host memory.
5389 @item @emph{Description}
This function tests whether the host data specified by @var{a} with a
length of @var{len} bytes is present on the device. If it is not present,
device memory is allocated and mapped to host memory. In C/C++, the device
address of the newly allocated device memory is returned.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5399 Note that @code{acc_present_or_create} and @code{acc_pcreate} exist for
5400 backward compatibility with OpenACC 2.0; use @ref{acc_create} instead.
5403 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void *acc_present_or_create(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void *acc_pcreate(h_void *a, size_t len);}
5408 @item @emph{Fortran}:
5409 @multitable @columnfractions .20 .80
5410 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a)}
5411 @item @tab @code{type, dimension(:[,:]...) :: a}
5412 @item @emph{Interface}: @tab @code{subroutine acc_present_or_create(a, len)}
5413 @item @tab @code{type, dimension(:[,:]...) :: a}
5414 @item @tab @code{integer len}
5415 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a)}
5416 @item @tab @code{type, dimension(:[,:]...) :: a}
5417 @item @emph{Interface}: @tab @code{subroutine acc_pcreate(a, len)}
5418 @item @tab @code{type, dimension(:[,:]...) :: a}
5419 @item @tab @code{integer len}
5422 @item @emph{Reference}:
5423 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5430 @section @code{acc_copyout} -- Copy device memory to host memory.
5432 @item @emph{Description}
In C/C++, this function copies mapped device memory to the host memory
specified by the host address @var{a} with a length of @var{len} bytes.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5441 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_copyout(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_copyout_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{void acc_copyout_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_copyout_finalize_async(h_void *a, size_t len, int async);}
5448 @item @emph{Fortran}:
5449 @multitable @columnfractions .20 .80
5450 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a)}
5451 @item @tab @code{type, dimension(:[,:]...) :: a}
5452 @item @emph{Interface}: @tab @code{subroutine acc_copyout(a, len)}
5453 @item @tab @code{type, dimension(:[,:]...) :: a}
5454 @item @tab @code{integer len}
5455 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, async)}
5456 @item @tab @code{type, dimension(:[,:]...) :: a}
5457 @item @tab @code{integer(acc_handle_kind) :: async}
5458 @item @emph{Interface}: @tab @code{subroutine acc_copyout_async(a, len, async)}
5459 @item @tab @code{type, dimension(:[,:]...) :: a}
5460 @item @tab @code{integer len}
5461 @item @tab @code{integer(acc_handle_kind) :: async}
5462 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a)}
5463 @item @tab @code{type, dimension(:[,:]...) :: a}
5464 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize(a, len)}
5465 @item @tab @code{type, dimension(:[,:]...) :: a}
5466 @item @tab @code{integer len}
5467 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, async)}
5468 @item @tab @code{type, dimension(:[,:]...) :: a}
5469 @item @tab @code{integer(acc_handle_kind) :: async}
5470 @item @emph{Interface}: @tab @code{subroutine acc_copyout_finalize_async(a, len, async)}
5471 @item @tab @code{type, dimension(:[,:]...) :: a}
5472 @item @tab @code{integer len}
5473 @item @tab @code{integer(acc_handle_kind) :: async}
5476 @item @emph{Reference}:
5477 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5484 @section @code{acc_delete} -- Free device memory.
5486 @item @emph{Description}
This function frees the previously allocated device memory that is mapped
to the host address @var{a} with a length of @var{len} bytes.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5495 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_delete(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_delete_async(h_void *a, size_t len, int async);}
@item @emph{Prototype}: @tab @code{void acc_delete_finalize(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_delete_finalize_async(h_void *a, size_t len, int async);}
5502 @item @emph{Fortran}:
5503 @multitable @columnfractions .20 .80
5504 @item @emph{Interface}: @tab @code{subroutine acc_delete(a)}
5505 @item @tab @code{type, dimension(:[,:]...) :: a}
5506 @item @emph{Interface}: @tab @code{subroutine acc_delete(a, len)}
5507 @item @tab @code{type, dimension(:[,:]...) :: a}
5508 @item @tab @code{integer len}
5509 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, async)}
5510 @item @tab @code{type, dimension(:[,:]...) :: a}
5511 @item @tab @code{integer(acc_handle_kind) :: async}
5512 @item @emph{Interface}: @tab @code{subroutine acc_delete_async(a, len, async)}
5513 @item @tab @code{type, dimension(:[,:]...) :: a}
5514 @item @tab @code{integer len}
5515 @item @tab @code{integer(acc_handle_kind) :: async}
5516 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a)}
5517 @item @tab @code{type, dimension(:[,:]...) :: a}
5518 @item @emph{Interface}: @tab @code{subroutine acc_delete_finalize(a, len)}
5519 @item @tab @code{type, dimension(:[,:]...) :: a}
5520 @item @tab @code{integer len}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, async)}
5522 @item @tab @code{type, dimension(:[,:]...) :: a}
5523 @item @tab @code{integer(acc_handle_kind) :: async}
@item @emph{Interface}: @tab @code{subroutine acc_delete_finalize_async(a, len, async)}
5525 @item @tab @code{type, dimension(:[,:]...) :: a}
5526 @item @tab @code{integer len}
5527 @item @tab @code{integer(acc_handle_kind) :: async}
5530 @item @emph{Reference}:
5531 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5537 @node acc_update_device
5538 @section @code{acc_update_device} -- Update device memory from mapped host memory.
5540 @item @emph{Description}
This function updates the device copy from the previously mapped host memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5550 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_update_device(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_update_device_async(h_void *a, size_t len, int async);}
5555 @item @emph{Fortran}:
5556 @multitable @columnfractions .20 .80
5557 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a)}
5558 @item @tab @code{type, dimension(:[,:]...) :: a}
5559 @item @emph{Interface}: @tab @code{subroutine acc_update_device(a, len)}
5560 @item @tab @code{type, dimension(:[,:]...) :: a}
5561 @item @tab @code{integer len}
5562 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, async)}
5563 @item @tab @code{type, dimension(:[,:]...) :: a}
5564 @item @tab @code{integer(acc_handle_kind) :: async}
5565 @item @emph{Interface}: @tab @code{subroutine acc_update_device_async(a, len, async)}
5566 @item @tab @code{type, dimension(:[,:]...) :: a}
5567 @item @tab @code{integer len}
5568 @item @tab @code{integer(acc_handle_kind) :: async}
5571 @item @emph{Reference}:
5572 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5578 @node acc_update_self
5579 @section @code{acc_update_self} -- Update host memory from mapped device memory.
5581 @item @emph{Description}
This function updates the host copy from the previously mapped device memory.
The host memory is specified with the host address @var{a} and a length of
@var{len} bytes.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes.
5591 @multitable @columnfractions .20 .80
@item @emph{Prototype}: @tab @code{void acc_update_self(h_void *a, size_t len);}
@item @emph{Prototype}: @tab @code{void acc_update_self_async(h_void *a, size_t len, int async);}
5596 @item @emph{Fortran}:
5597 @multitable @columnfractions .20 .80
5598 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a)}
5599 @item @tab @code{type, dimension(:[,:]...) :: a}
5600 @item @emph{Interface}: @tab @code{subroutine acc_update_self(a, len)}
5601 @item @tab @code{type, dimension(:[,:]...) :: a}
5602 @item @tab @code{integer len}
5603 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, async)}
5604 @item @tab @code{type, dimension(:[,:]...) :: a}
5605 @item @tab @code{integer(acc_handle_kind) :: async}
5606 @item @emph{Interface}: @tab @code{subroutine acc_update_self_async(a, len, async)}
5607 @item @tab @code{type, dimension(:[,:]...) :: a}
5608 @item @tab @code{integer len}
5609 @item @tab @code{integer(acc_handle_kind) :: async}
5612 @item @emph{Reference}:
5613 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5620 @section @code{acc_map_data} -- Map previously allocated device memory to host memory.
5622 @item @emph{Description}
5623 This function maps previously allocated device and host memory. The device
5624 memory is specified with the device address @var{data_dev}. The host memory is
5625 specified with the host address @var{data_arg} and a length of @var{bytes}.
5628 @multitable @columnfractions .20 .80
5629 @item @emph{Prototype}: @tab @code{void acc_map_data(h_void *data_arg, d_void *data_dev, size_t bytes);}
5632 @item @emph{Fortran}:
5633 @multitable @columnfractions .20 .80
5634 @item @emph{Interface}: @tab @code{subroutine acc_map_data(data_arg, data_dev, bytes)}
5635 @item @tab @code{type(*), dimension(*) :: data_arg}
5636 @item @tab @code{type(c_ptr), value :: data_dev}
5637 @item @tab @code{integer(c_size_t), value :: bytes}
5640 @item @emph{Reference}:
5641 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5642 3.2.26. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5648 @node acc_unmap_data
5649 @section @code{acc_unmap_data} -- Unmap device memory from host memory.
5651 @item @emph{Description}
This function unmaps previously mapped device and host memory. The latter
is specified by @var{data_arg}.
5656 @multitable @columnfractions .20 .80
5657 @item @emph{Prototype}: @tab @code{void acc_unmap_data(h_void *data_arg);}
5660 @item @emph{Fortran}:
5661 @multitable @columnfractions .20 .80
5662 @item @emph{Interface}: @tab @code{subroutine acc_unmap_data(data_arg)}
5663 @item @tab @code{type(*), dimension(*) :: data_arg}
5666 @item @emph{Reference}:
5667 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5668 3.2.27. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5675 @section @code{acc_deviceptr} -- Get device pointer associated with specific host address.
5677 @item @emph{Description}
5678 This function returns the device address that has been mapped to the
5679 host address specified by @var{data_arg}.
5682 @multitable @columnfractions .20 .80
5683 @item @emph{Prototype}: @tab @code{void *acc_deviceptr(h_void *data_arg);}
5686 @item @emph{Fortran}:
5687 @multitable @columnfractions .20 .80
5688 @item @emph{Interface}: @tab @code{type(c_ptr) function acc_deviceptr(data_arg)}
5689 @item @tab @code{type(*), dimension(*) :: data_arg}
5692 @item @emph{Reference}:
5693 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5694 3.2.28. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5701 @section @code{acc_hostptr} -- Get host pointer associated with specific device address.
5703 @item @emph{Description}
5704 This function returns the host address that has been mapped to the
5705 device address specified by @var{data_dev}.
5708 @multitable @columnfractions .20 .80
5709 @item @emph{Prototype}: @tab @code{void *acc_hostptr(d_void *data_dev);}
5712 @item @emph{Fortran}:
5713 @multitable @columnfractions .20 .80
5714 @item @emph{Interface}: @tab @code{type(c_ptr) function acc_hostptr(data_dev)}
5715 @item @tab @code{type(c_ptr), value :: data_dev}
5718 @item @emph{Reference}:
5719 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5720 3.2.29. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5726 @node acc_is_present
5727 @section @code{acc_is_present} -- Indicate whether host variable / array is present on device.
5729 @item @emph{Description}
This function indicates whether the host memory specified by the address
@var{a} with a length of @var{len} bytes is present on the device. In C/C++,
a non-zero value is returned to indicate the presence of the mapped memory
on the device. A zero is returned to indicate the memory is not mapped on
the device.
In Fortran, two (2) forms are supported. In the first form, @var{a} specifies
a contiguous array section. In the second form, @var{a} specifies a variable
or array element and @var{len} specifies the length in bytes. If the host
memory is mapped to device memory, then @code{true} is returned. Otherwise,
@code{false} is returned to indicate the mapped memory is not present.
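As one possible use of this query, the C sketch below copies data to the
device only when it is not already present, and tells the caller whether
this call created the mapping (and so whether the caller owes a matching
@code{acc_delete}). The helper name @code{ensure_present} is hypothetical,
and the @code{#else} branch contains host-only stand-ins that track a
single mapping; these are assumptions for illustration, not libgomp code.

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#else
/* Hypothetical host-only stand-ins (not libgomp code) that remember
   one mapping, so the sketch also builds without -fopenacc.  */
static void *mapped_addr;
static size_t mapped_len;
static int
acc_is_present(void *a, size_t len)
{
  return a != NULL && a == mapped_addr && len <= mapped_len;
}
static void *
acc_copyin(void *a, size_t len)
{
  mapped_addr = a;
  mapped_len = len;
  return a;
}
static void
acc_delete(void *a, size_t len)
{
  (void) len;
  if (a == mapped_addr)
    {
      mapped_addr = NULL;
      mapped_len = 0;
    }
}
#endif

/* Copy LEN bytes at host address A to the device unless they are
   already present.  Returns 1 if this call created the mapping and
   0 if the data was present beforehand.  */
static int
ensure_present(void *a, size_t len)
{
  if (acc_is_present(a, len))
    return 0;
  acc_copyin(a, len);
  return 1;
}
```

On a shared-memory host fallback the data may be reported as present from
the start, in which case @code{ensure_present} returns 0 and no copy is
made.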
5743 @multitable @columnfractions .20 .80
5744 @item @emph{Prototype}: @tab @code{int acc_is_present(h_void *a, size_t len);}
5747 @item @emph{Fortran}:
5748 @multitable @columnfractions .20 .80
5749 @item @emph{Interface}: @tab @code{function acc_is_present(a)}
5750 @item @tab @code{type, dimension(:[,:]...) :: a}
5751 @item @tab @code{logical acc_is_present}
5752 @item @emph{Interface}: @tab @code{function acc_is_present(a, len)}
5753 @item @tab @code{type, dimension(:[,:]...) :: a}
5754 @item @tab @code{integer len}
5755 @item @tab @code{logical acc_is_present}
5758 @item @emph{Reference}:
5759 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5765 @node acc_memcpy_to_device
5766 @section @code{acc_memcpy_to_device} -- Copy host memory to device memory.
5768 @item @emph{Description}
This function copies @var{bytes} bytes of host memory, starting at the
host address @var{data_host_src}, to device memory at the device address
@var{data_dev_dest}.
5774 @multitable @columnfractions .20 .80
5775 @item @emph{Prototype}: @tab @code{void acc_memcpy_to_device(d_void* data_dev_dest,}
5776 @item @tab @code{h_void* data_host_src, size_t bytes);}
5777 @item @emph{Prototype}: @tab @code{void acc_memcpy_to_device_async(d_void* data_dev_dest,}
5778 @item @tab @code{h_void* data_host_src, size_t bytes, int async_arg);}
5781 @item @emph{Fortran}:
5782 @multitable @columnfractions .20 .80
5783 @item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device(data_dev_dest, &}
5784 @item @tab @code{data_host_src, bytes)}
5785 @item @emph{Interface}: @tab @code{subroutine acc_memcpy_to_device_async(data_dev_dest, &}
5786 @item @tab @code{data_host_src, bytes, async_arg)}
5787 @item @tab @code{type(c_ptr), value :: data_dev_dest}
5788 @item @tab @code{type(*), dimension(*) :: data_host_src}
5789 @item @tab @code{integer(c_size_t), value :: bytes}
5790 @item @tab @code{integer(acc_handle_kind), value :: async_arg}
5793 @item @emph{Reference}:
5794 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
3.2.31. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5801 @node acc_memcpy_from_device
5802 @section @code{acc_memcpy_from_device} -- Copy device memory to host memory.
5804 @item @emph{Description}
This function copies @var{bytes} bytes of device memory, starting at the
device address @var{data_dev_src}, to host memory at the host address
@var{data_host_dest}.
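The C sketch below combines this routine with @code{acc_malloc},
@code{acc_memcpy_to_device} and @code{acc_free} to move a buffer to the
device and back. The helper name @code{roundtrip_through_device} is
hypothetical, and the @code{#else} branch holds host-only stand-ins that
treat device memory as ordinary heap memory; both are assumptions made so
the sketch also builds without @option{-fopenacc}.

```c
#include <stddef.h>
#ifdef _OPENACC
#include <openacc.h>
#else
#include <stdlib.h>
#include <string.h>
/* Hypothetical host-only stand-ins (not libgomp code): "device"
   memory is plain heap memory here.  */
static void *acc_malloc(size_t bytes) { return malloc(bytes); }
static void acc_free(void *data_dev) { free(data_dev); }
static void
acc_memcpy_to_device(void *data_dev_dest, void *data_host_src, size_t bytes)
{
  memcpy(data_dev_dest, data_host_src, bytes);
}
static void
acc_memcpy_from_device(void *data_host_dest, void *data_dev_src, size_t bytes)
{
  memcpy(data_host_dest, data_dev_src, bytes);
}
#endif

/* Copy BYTES bytes from SRC through a temporary device buffer into
   DEST.  Returns 0 on success, -1 if the device allocation failed.  */
static int
roundtrip_through_device(void *dest, void *src, size_t bytes)
{
  void *dev = acc_malloc(bytes);
  if (dev == NULL)
    return -1;
  acc_memcpy_to_device(dev, src, bytes);     /* Host to device.  */
  acc_memcpy_from_device(dest, dev, bytes);  /* Device to host.  */
  acc_free(dev);
  return 0;
}
```

Unlike @code{acc_copyin} and @code{acc_copyout}, these copies use explicit
device addresses and do not create a host-to-device mapping, so the caller
remains responsible for freeing the device buffer.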
5810 @multitable @columnfractions .20 .80
5811 @item @emph{Prototype}: @tab @code{void acc_memcpy_from_device(h_void* data_host_dest,}
5812 @item @tab @code{d_void* data_dev_src, size_t bytes);}
5813 @item @emph{Prototype}: @tab @code{void acc_memcpy_from_device_async(h_void* data_host_dest,}
5814 @item @tab @code{d_void* data_dev_src, size_t bytes, int async_arg);}
5817 @item @emph{Fortran}:
5818 @multitable @columnfractions .20 .80
5819 @item @emph{Interface}: @tab @code{subroutine acc_memcpy_from_device(data_host_dest, &}
5820 @item @tab @code{data_dev_src, bytes)}
5821 @item @emph{Interface}: @tab @code{subroutine acc_memcpy_from_device_async(data_host_dest, &}
5822 @item @tab @code{data_dev_src, bytes, async_arg)}
5823 @item @tab @code{type(*), dimension(*) :: data_host_dest}
5824 @item @tab @code{type(c_ptr), value :: data_dev_src}
5825 @item @tab @code{integer(c_size_t), value :: bytes}
5826 @item @tab @code{integer(acc_handle_kind), value :: async_arg}
5829 @item @emph{Reference}:
5830 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5831 3.2.32. @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5838 @section @code{acc_attach} -- Let device pointer point to device-pointer target.
5840 @item @emph{Description}
5841 This function updates a pointer on the device from pointing to a host-pointer
5842 address to pointing to the corresponding device data.
5845 @multitable @columnfractions .20 .80
5846 @item @emph{Prototype}: @tab @code{void acc_attach(h_void **ptr_addr);}
5847 @item @emph{Prototype}: @tab @code{void acc_attach_async(h_void **ptr_addr, int async);}
5850 @c @item @emph{Fortran}:
5851 @c @multitable @columnfractions .20 .80
5852 @c @item @emph{Interface}: @tab @code{subroutine acc_attach(ptr_addr)}
5853 @c @item @emph{Interface}: @tab @code{subroutine acc_attach_async(ptr_addr, async_arg)}
5854 @c @item @tab @code{type(*), dimension(..) :: ptr_addr}
5855 @c @item @tab @code{integer(acc_handle_kind), value :: async_arg}
5858 @item @emph{Reference}:
5859 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5861 @c @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5868 @section @code{acc_detach} -- Let device pointer point to host-pointer target.
5870 @item @emph{Description}
5871 This function updates a pointer on the device from pointing to a device-pointer
5872 address to pointing to the corresponding host data.
5875 @multitable @columnfractions .20 .80
5876 @item @emph{Prototype}: @tab @code{void acc_detach(h_void **ptr_addr);}
5877 @item @emph{Prototype}: @tab @code{void acc_detach_async(h_void **ptr_addr, int async);}
5878 @item @emph{Prototype}: @tab @code{void acc_detach_finalize(h_void **ptr_addr);}
5879 @item @emph{Prototype}: @tab @code{void acc_detach_finalize_async(h_void **ptr_addr, int async);}
5882 @c @item @emph{Fortran}:
5883 @c @multitable @columnfractions .20 .80
5884 @c @item @emph{Interface}: @tab @code{subroutine acc_detach(ptr_addr)}
5885 @c @item @emph{Interface}: @tab @code{subroutine acc_detach_async(ptr_addr, async_arg)}
5886 @c @item @emph{Interface}: @tab @code{subroutine acc_detach_finalize(ptr_addr)}
5887 @c @item @emph{Interface}: @tab @code{subroutine acc_detach_finalize_async(ptr_addr, async_arg)}
5888 @c @item @tab @code{type(*), dimension(..) :: ptr_addr}
5889 @c @item @tab @code{integer(acc_handle_kind), value :: async_arg}
5892 @item @emph{Reference}:
5893 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5895 @c @uref{https://www.openacc.org, OpenACC specification v3.3}, section
5901 @node acc_get_current_cuda_device
5902 @section @code{acc_get_current_cuda_device} -- Get CUDA device handle.
5904 @item @emph{Description}
5905 This function returns the CUDA device handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
5909 @multitable @columnfractions .20 .80
5910 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_device(void);}
5913 @item @emph{Reference}:
5914 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5920 @node acc_get_current_cuda_context
5921 @section @code{acc_get_current_cuda_context} -- Get CUDA context handle.
5923 @item @emph{Description}
5924 This function returns the CUDA context handle. This handle is the same
as used by the CUDA Runtime or Driver APIs.
5928 @multitable @columnfractions .20 .80
5929 @item @emph{Prototype}: @tab @code{void *acc_get_current_cuda_context(void);}
5932 @item @emph{Reference}:
5933 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5939 @node acc_get_cuda_stream
5940 @section @code{acc_get_cuda_stream} -- Get CUDA stream handle.
5942 @item @emph{Description}
5943 This function returns the CUDA stream handle for the queue @var{async}.
This handle is the same as used by the CUDA Runtime or Driver APIs.
5947 @multitable @columnfractions .20 .80
5948 @item @emph{Prototype}: @tab @code{void *acc_get_cuda_stream(int async);}
5951 @item @emph{Reference}:
5952 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5958 @node acc_set_cuda_stream
5959 @section @code{acc_set_cuda_stream} -- Set CUDA stream handle.
5961 @item @emph{Description}
5962 This function associates the stream handle specified by @var{stream} with
5963 the queue @var{async}.
5965 This cannot be used to change the stream handle associated with
5966 @code{acc_async_sync}.
5968 The return value is not specified.
5971 @multitable @columnfractions .20 .80
5972 @item @emph{Prototype}: @tab @code{int acc_set_cuda_stream(int async, void *stream);}
5975 @item @emph{Reference}:
5976 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
5982 @node acc_prof_register
5983 @section @code{acc_prof_register} -- Register callbacks.
5985 @item @emph{Description}:
5986 This function registers callbacks.
5989 @multitable @columnfractions .20 .80
5990 @item @emph{Prototype}: @tab @code{void acc_prof_register (acc_event_t, acc_prof_callback, acc_register_t);}
5993 @item @emph{See also}:
5994 @ref{OpenACC Profiling Interface}
5996 @item @emph{Reference}:
5997 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
6003 @node acc_prof_unregister
6004 @section @code{acc_prof_unregister} -- Unregister callbacks.
6006 @item @emph{Description}:
6007 This function unregisters callbacks.
6010 @multitable @columnfractions .20 .80
6011 @item @emph{Prototype}: @tab @code{void acc_prof_unregister (acc_event_t, acc_prof_callback, acc_register_t);}
6014 @item @emph{See also}:
6015 @ref{OpenACC Profiling Interface}
6017 @item @emph{Reference}:
6018 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
6024 @node acc_prof_lookup
6025 @section @code{acc_prof_lookup} -- Obtain inquiry functions.
6027 @item @emph{Description}:
6028 Function to obtain inquiry functions.
6031 @multitable @columnfractions .20 .80
6032 @item @emph{Prototype}: @tab @code{acc_query_fn acc_prof_lookup (const char *);}
6035 @item @emph{See also}:
6036 @ref{OpenACC Profiling Interface}
6038 @item @emph{Reference}:
6039 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
6045 @node acc_register_library
6046 @section @code{acc_register_library} -- Library registration.
6048 @item @emph{Description}:
6049 Function for library registration.
6052 @multitable @columnfractions .20 .80
6053 @item @emph{Prototype}: @tab @code{void acc_register_library (acc_prof_reg, acc_prof_reg, acc_prof_lookup_func);}
6056 @item @emph{See also}:
6057 @ref{OpenACC Profiling Interface}, @ref{ACC_PROFLIB}
6059 @item @emph{Reference}:
6060 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
6066 @c ---------------------------------------------------------------------
6067 @c OpenACC Environment Variables
6068 @c ---------------------------------------------------------------------
6070 @node OpenACC Environment Variables
6071 @chapter OpenACC Environment Variables
6073 The variables @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}
6074 are defined by section 4 of the OpenACC specification in version 2.0.
6075 The variable @env{ACC_PROFLIB}
6076 is defined by section 4 of the OpenACC specification in version 2.6.
6086 @node ACC_DEVICE_TYPE
6087 @section @code{ACC_DEVICE_TYPE}
6089 @item @emph{Description}:
6090 Control the default device type to use when executing compute regions.
6091 If unset, the code can be run on any device type, favoring a non-host
6094 Supported values in GCC (if compiled in) are
6100 @item @emph{Reference}:
6101 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
6107 @node ACC_DEVICE_NUM
6108 @section @code{ACC_DEVICE_NUM}
6110 @item @emph{Description}:
6111 Control which device, identified by device number, is the default device.
6112 The value must be a nonnegative integer less than the number of devices.
6113 If unset, device number zero is used.
6114 @item @emph{Reference}:
6115 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
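The rule stated in the description above can be sketched in C as follows.
This is only an illustration under stated assumptions, not the parsing code
used by libgomp: the helper name @code{device_num_from_env_value} and its
error convention (returning @code{-1}) are made up for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: interpret an ACC_DEVICE_NUM-style string.
   VALUE is the text of the variable (NULL if unset) and NUM_DEVICES
   the number of available devices.  Returns the selected device
   number, or -1 if VALUE is not a nonnegative integer that is less
   than NUM_DEVICES.  */
int
device_num_from_env_value (const char *value, int num_devices)
{
  char *end;
  long n;

  if (value == NULL)
    return 0;                   /* Unset: device number zero is used.  */

  errno = 0;
  n = strtol (value, &end, 10);
  if (end == value || *end != '\0' || errno == ERANGE)
    return -1;                  /* Not a representable integer.  */
  if (n < 0 || n >= num_devices)
    return -1;                  /* Outside the valid device range.  */
  return (int) n;
}
```

A caller might pass the result of @code{getenv ("ACC_DEVICE_NUM")} together
with a device count obtained from @code{acc_get_num_devices}; how an invalid
value is actually reported is up to the runtime and is not modeled here.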
6122 @section @code{ACC_PROFLIB}
6124 @item @emph{Description}:
6125 Semicolon-separated list of dynamic libraries that are loaded as profiling
6126 libraries. Each library must provide at least the @code{acc_register_library}
6127 routine. Each library file is found as described by the documentation of
6128 @code{dlopen} of your operating system.
6129 @item @emph{See also}:
6130 @ref{acc_register_library}, @ref{OpenACC Profiling Interface}
6132 @item @emph{Reference}:
6133 @uref{https://www.openacc.org, OpenACC specification v2.6}, section
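As a rough sketch of what a library listed in @env{ACC_PROFLIB} might look
like, the following file provides @code{acc_register_library} and registers
a single callback. It assumes the @file{acc_prof.h} header shipped with GCC;
the names @code{count_compute_start} and @code{compute_constructs_started}
are hypothetical and chosen only for this example.

```c
#include <acc_prof.h>

/* Number of compute constructs observed so far (illustrative only).  */
unsigned long compute_constructs_started;

/* Callback that the runtime invokes for each event it was registered
   for; only the event type is inspected here.  */
void
count_compute_start (acc_prof_info *prof_info, acc_event_info *event_info,
                     acc_api_info *api_info)
{
  (void) event_info;
  (void) api_info;
  if (prof_info->event_type == acc_ev_compute_construct_start)
    compute_constructs_started++;
}

/* Routine that every profiling library must provide.  REG and UNREG
   are the registration routines handed in by the runtime; LOOKUP is
   unused in this sketch.  */
void
acc_register_library (acc_prof_reg reg, acc_prof_reg unreg,
                      acc_prof_lookup_func lookup)
{
  (void) unreg;
  (void) lookup;
  reg (acc_ev_compute_construct_start, count_compute_start, acc_reg);
}
```

Such a file could be built as a shared object, for example with
@command{gcc -shared -fPIC}, and the resulting file name listed in
@env{ACC_PROFLIB}.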
6139 @c ---------------------------------------------------------------------
6140 @c CUDA Streams Usage
6141 @c ---------------------------------------------------------------------
6143 @node CUDA Streams Usage
6144 @chapter CUDA Streams Usage
6146 This applies to the @code{nvptx} plugin only.
6148 The library provides elements that perform asynchronous movement of
6149 data and asynchronous operation of computing constructs. This
6150 asynchronous functionality is implemented by making use of CUDA
6151 streams@footnote{See "Stream Management" in "CUDA Driver API",
6152 TRM-06703-001, Version 5.5, for additional information}.
The primary means by which the asynchronous functionality is accessed
6155 is through the use of those OpenACC directives which make use of the
6156 @code{async} and @code{wait} clauses. When the @code{async} clause is
6157 first used with a directive, it creates a CUDA stream. If an
6158 @code{async-argument} is used with the @code{async} clause, then the
6159 stream is associated with the specified @code{async-argument}.
6161 Following the creation of an association between a CUDA stream and the
6162 @code{async-argument} of an @code{async} clause, both the @code{wait}
6163 clause and the @code{wait} directive can be used. When either the
6164 clause or directive is used after stream creation, it creates a
rendezvous point whereby execution waits until all operations
associated with the @code{async-argument}, that is, the stream, have
completed.
Normally, the streams that are created as a result of using the
@code{async} clause are managed without any intervention by the
caller. This implies the association between the @code{async-argument}
6172 and the CUDA stream is maintained for the lifetime of the program.
6173 However, this association can be changed through the use of the library
6174 function @code{acc_set_cuda_stream}. When the function
6175 @code{acc_set_cuda_stream} is called, the CUDA stream that was
6176 originally associated with the @code{async} clause is destroyed.
Caution should be taken when changing the association as subsequent
references to the @code{async-argument} refer to a different
CUDA stream.
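The interplay of the @code{async} clause and the @code{wait} directive
described in this chapter can be illustrated with a small sketch. The
function name @code{scale_on_queue_1} and the choice of @code{1} as
@code{async-argument} are assumptions made for this example. When the file
is compiled without @option{-fopenacc}, the pragmas are ignored and the
loop simply runs sequentially on the host.

```c
#include <stddef.h>

/* Scale X[0..N-1] by A, using the queue identified by
   async-argument 1.  */
void
scale_on_queue_1 (float *x, size_t n, float a)
{
  /* The async clause enqueues the data movement and the loop on
     queue 1 instead of running them synchronously.  */
#pragma acc parallel loop copy(x[0:n]) async(1)
  for (size_t i = 0; i < n; i++)
    x[i] *= a;

  /* Rendezvous point: wait until all operations that were enqueued
     on queue 1 have completed.  */
#pragma acc wait(1)
}
```

After the @code{wait} directive, the host copy of @code{x} can be read
safely, because the copy back to the host was part of the work enqueued on
that queue.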
6183 @c ---------------------------------------------------------------------
6184 @c OpenACC Library Interoperability
6185 @c ---------------------------------------------------------------------
6187 @node OpenACC Library Interoperability
6188 @chapter OpenACC Library Interoperability
6190 @section Introduction
6192 The OpenACC library uses the CUDA Driver API, and may interact with
6193 programs that use the Runtime library directly, or another library
6194 based on the Runtime library, e.g., CUBLAS@footnote{See section 2.26,
6195 "Interactions with the CUDA Driver API" in
6196 "CUDA Runtime API", Version 5.5, and section 2.27, "VDPAU
6197 Interoperability", in "CUDA Driver API", TRM-06703-001, Version 5.5,
6198 for additional information on library interoperability.}.
6199 This chapter describes the use cases and what changes are
6200 required in order to use both the OpenACC library and the CUBLAS and Runtime
6201 libraries within a program.
6203 @section First invocation: NVIDIA CUBLAS library API
In this first use case (see below), a function in the CUBLAS library, namely
@code{cublasCreate()}, is called prior to any of the functions in the OpenACC
library.
6209 When invoked, the function initializes the library and allocates the
6210 hardware resources on the host and the device on behalf of the caller. Once
the initialization and allocation have completed, a handle is returned to the
6212 caller. The OpenACC library also requires initialization and allocation of
6213 hardware resources. Since the CUBLAS library has already allocated the
6214 hardware resources for the device, all that is left to do is to initialize
6215 the OpenACC library and acquire the hardware resources on the host.
Prior to calling the OpenACC function that initializes the library and
allocates the host hardware resources, you need to acquire the device number
that was allocated during the call to @code{cublasCreate()}. Calling the
CUDA runtime library function @code{cudaGetDevice()} accomplishes this. Once
6221 acquired, the device number is passed along with the device type as
6222 parameters to the OpenACC library function @code{acc_set_device_num()}.
6224 Once the call to @code{acc_set_device_num()} has completed, the OpenACC
6225 library uses the context that was created during the call to
@code{cublasCreate()}. In other words, both libraries share the
same context.
@smallexample
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}
    /* Get the device number */
    e = cudaGetDevice(&dev);
    if (e != cudaSuccess)
    @{
        fprintf(stderr, "cudaGetDevice failed %d\n", e);
        exit(EXIT_FAILURE);
    @}
    /* Initialize OpenACC library and use device 'dev' */
    acc_set_device_num(dev, acc_device_nvidia);
@end smallexample
6252 @section First invocation: OpenACC library API
In this second use case (see below), a function in the OpenACC library, namely
@code{acc_set_device_num()}, is called prior to any of the functions in the
CUBLAS library.
6258 In the use case presented here, the function @code{acc_set_device_num()}
6259 is used to both initialize the OpenACC library and allocate the hardware
6260 resources on the host and the device. In the call to the function, the
6261 call parameters specify which device to use and what device
type to use, i.e., @code{acc_device_nvidia}. Note that this
is only one way to initialize the OpenACC library and allocate the
appropriate hardware resources. Other methods are available through the
use of environment variables; these are discussed in the next section.
6267 Once the call to @code{acc_set_device_num()} has completed, other OpenACC
6268 functions can be called as seen with multiple calls being made to
6269 @code{acc_copyin()}. In addition, calls can be made to functions in the
6270 CUBLAS library. In the use case a call to @code{cublasCreate()} is made
6271 subsequent to the calls to @code{acc_copyin()}.
6272 As seen in the previous use case, a call to @code{cublasCreate()}
6273 initializes the CUBLAS library and allocates the hardware resources on the
6274 host and the device. However, since the device has already been allocated,
6275 @code{cublasCreate()} only initializes the CUBLAS library and allocates
6276 the appropriate hardware resources on the host. The context that was created
6277 as part of the OpenACC initialization is shared with the CUBLAS library,
6278 similarly to the first use case.
@smallexample
    acc_set_device_num(dev, acc_device_nvidia);
    /* Copy the first set to the device */
    d_X = acc_copyin(&h_X[0], N * sizeof (float));
    if (d_X == NULL)
    @{
        fprintf(stderr, "copyin error h_X\n");
        exit(EXIT_FAILURE);
    @}
    /* Copy the second set to the device */
    d_Y = acc_copyin(&h_Y1[0], N * sizeof (float));
    if (d_Y == NULL)
    @{
        fprintf(stderr, "copyin error h_Y1\n");
        exit(EXIT_FAILURE);
    @}
    /* Create the handle */
    s = cublasCreate(&h);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasCreate failed %d\n", s);
        exit(EXIT_FAILURE);
    @}
    /* Perform saxpy using CUBLAS library function */
    s = cublasSaxpy(h, N, &alpha, d_X, 1, d_Y, 1);
    if (s != CUBLAS_STATUS_SUCCESS)
    @{
        fprintf(stderr, "cublasSaxpy failed %d\n", s);
        exit(EXIT_FAILURE);
    @}
    /* Copy the results from the device */
    acc_memcpy_from_device(&h_Y1[0], d_Y, N * sizeof (float));
@end smallexample
6323 @section OpenACC library and environment variables
6325 There are two environment variables associated with the OpenACC library
6326 that may be used to control the device type and device number:
6327 @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM}, respectively. These two
6328 environment variables can be used as an alternative to calling
6329 @code{acc_set_device_num()}. As seen in the second use case, the device
6330 type and device number were specified using @code{acc_set_device_num()}.
If, however, the aforementioned environment variables were set, then the
6332 call to @code{acc_set_device_num()} would not be required.
The use of the environment variables is only relevant when an OpenACC function
is called prior to a call to @code{cublasCreate()}. If @code{cublasCreate()}
is called prior to a call to an OpenACC function, then you must call
@code{acc_set_device_num()}@footnote{More complete information
about @env{ACC_DEVICE_TYPE} and @env{ACC_DEVICE_NUM} can be found in
sections 4.1 and 4.2 of the ``@uref{https://www.openacc.org, OpenACC}
Application Programming Interface'', Version 2.6.}.
6345 @c ---------------------------------------------------------------------
6346 @c OpenACC Profiling Interface
6347 @c ---------------------------------------------------------------------
6349 @node OpenACC Profiling Interface
6350 @chapter OpenACC Profiling Interface
6352 @section Implementation Status and Implementation-Defined Behavior
6354 We're implementing the OpenACC Profiling Interface as defined by the
6355 OpenACC 2.6 specification. We're clarifying some aspects here as
6356 @emph{implementation-defined behavior}, while they're still under
6357 discussion within the OpenACC Technical Committee.
6359 This implementation is tuned to keep the performance impact as low as
6360 possible for the (very common) case that the Profiling Interface is
6361 not enabled. This is relevant, as the Profiling Interface affects all
6362 the @emph{hot} code paths (in the target code, not in the offloaded
6363 code). Users of the OpenACC Profiling Interface can be expected to
6364 understand that performance is impacted to some degree once the
6365 Profiling Interface is enabled: for example, because of the
6366 @emph{runtime} (libgomp) calling into a third-party @emph{library} for
6367 every event that has been registered.
6369 We're not yet accounting for the fact that @cite{OpenACC events may
6370 occur during event processing}.
6371 We just handle one case specially, as required by CUDA 9.0
6372 @command{nvprof}, that @code{acc_get_device_type}
(@ref{acc_get_device_type}) may be called from
6374 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
6377 We're not yet implementing initialization via a
6378 @code{acc_register_library} function that is either statically linked
6379 in, or dynamically via @env{LD_PRELOAD}.
6380 Initialization via @code{acc_register_library} functions dynamically
6381 loaded via the @env{ACC_PROFLIB} environment variable does work, as
6382 does directly calling @code{acc_prof_register},
6383 @code{acc_prof_unregister}, @code{acc_prof_lookup}.
Since no inquiry functions are currently defined, calls to
@code{acc_prof_lookup} always return @code{NULL}.
6388 There aren't separate @emph{start}, @emph{stop} events defined for the
6389 event types @code{acc_ev_create}, @code{acc_ev_delete},
6390 @code{acc_ev_alloc}, @code{acc_ev_free}. It's not clear if these
6391 should be triggered before or after the actual device-specific call is
6392 made. We trigger them after.
6394 Remarks about data provided to callbacks:
6398 @item @code{acc_prof_info.event_type}
6399 It's not clear if for @emph{nested} event callbacks (for example,
6400 @code{acc_ev_enqueue_launch_start} as part of a parent compute
6401 construct), this should be set for the nested event
6402 (@code{acc_ev_enqueue_launch_start}), or if the value of the parent
6403 construct should remain (@code{acc_ev_compute_construct_start}). In
6404 this implementation, the value generally corresponds to the
6405 innermost nested event type.
6407 @item @code{acc_prof_info.device_type}
6411 For @code{acc_ev_compute_construct_start}, and in presence of an
6412 @code{if} clause with @emph{false} argument, this still refers to
6413 the offloading device type.
6414 It's not clear if that's the expected behavior.
6417 Complementary to the item before, for
6418 @code{acc_ev_compute_construct_end}, this is set to
6419 @code{acc_device_host} in presence of an @code{if} clause with
6420 @emph{false} argument.
6421 It's not clear if that's the expected behavior.
6425 @item @code{acc_prof_info.thread_id}
6426 Always @code{-1}; not yet implemented.
6428 @item @code{acc_prof_info.async}
6432 Not yet implemented correctly for
6433 @code{acc_ev_compute_construct_start}.
6436 In a compute construct, for host-fallback
6437 execution/@code{acc_device_host} it always is
6438 @code{acc_async_sync}.
6439 It is unclear if that is the expected behavior.
6442 For @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end},
6443 it will always be @code{acc_async_sync}.
6444 It is unclear if that is the expected behavior.
6448 @item @code{acc_prof_info.async_queue}
6449 There is no @cite{limited number of asynchronous queues} in libgomp.
6450 This always has the same value as @code{acc_prof_info.async}.
6452 @item @code{acc_prof_info.src_file}
6453 Always @code{NULL}; not yet implemented.
6455 @item @code{acc_prof_info.func_name}
6456 Always @code{NULL}; not yet implemented.
6458 @item @code{acc_prof_info.line_no}
6459 Always @code{-1}; not yet implemented.
6461 @item @code{acc_prof_info.end_line_no}
6462 Always @code{-1}; not yet implemented.
6464 @item @code{acc_prof_info.func_line_no}
6465 Always @code{-1}; not yet implemented.
6467 @item @code{acc_prof_info.func_end_line_no}
6468 Always @code{-1}; not yet implemented.
6470 @item @code{acc_event_info.event_type}, @code{acc_event_info.*.event_type}
6471 Relating to @code{acc_prof_info.event_type} discussed above, in this
6472 implementation, this will always be the same value as
6473 @code{acc_prof_info.event_type}.
6475 @item @code{acc_event_info.*.parent_construct}
6479 Will be @code{acc_construct_parallel} for all OpenACC compute
6480 constructs as well as many OpenACC Runtime API calls; should be the
6481 one matching the actual construct, or
6482 @code{acc_construct_runtime_api}, respectively.
6485 Will be @code{acc_construct_enter_data} or
6486 @code{acc_construct_exit_data} when processing variable mappings
6487 specified in OpenACC @emph{declare} directives; should be
6488 @code{acc_construct_declare}.
6491 For implicit @code{acc_ev_device_init_start},
6492 @code{acc_ev_device_init_end}, and explicit as well as implicit
6493 @code{acc_ev_alloc}, @code{acc_ev_free},
6494 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
6495 @code{acc_ev_enqueue_download_start}, and
6496 @code{acc_ev_enqueue_download_end}, will be
6497 @code{acc_construct_parallel}; should reflect the real parent
6502 @item @code{acc_event_info.*.implicit}
6503 For @code{acc_ev_alloc}, @code{acc_ev_free},
6504 @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end},
6505 @code{acc_ev_enqueue_download_start}, and
6506 @code{acc_ev_enqueue_download_end}, this currently will be @code{1}
6507 also for explicit usage.
6509 @item @code{acc_event_info.data_event.var_name}
6510 Always @code{NULL}; not yet implemented.
6512 @item @code{acc_event_info.data_event.host_ptr}
6513 For @code{acc_ev_alloc}, and @code{acc_ev_free}, this is always
6516 @item @code{typedef union acc_api_info}
6517 @dots{} as printed in @cite{5.2.3. Third Argument: API-Specific
6518 Information}. This should obviously be @code{typedef @emph{struct}
6521 @item @code{acc_api_info.device_api}
6522 Possibly not yet implemented correctly for
6523 @code{acc_ev_compute_construct_start},
6524 @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}:
6525 will always be @code{acc_device_api_none} for these event types.
6526 For @code{acc_ev_enter_data_start}, it will be
6527 @code{acc_device_api_none} in some cases.
6529 @item @code{acc_api_info.device_type}
6530 Always the same as @code{acc_prof_info.device_type}.
6532 @item @code{acc_api_info.vendor}
6533 Always @code{-1}; not yet implemented.
6535 @item @code{acc_api_info.device_handle}
6536 Always @code{NULL}; not yet implemented.
6538 @item @code{acc_api_info.context_handle}
6539 Always @code{NULL}; not yet implemented.
6541 @item @code{acc_api_info.async_handle}
6542 Always @code{NULL}; not yet implemented.
6546 Remarks about certain event types:
6550 @item @code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
6554 @c See 'DEVICE_INIT_INSIDE_COMPUTE_CONSTRUCT' in
6555 @c 'libgomp.oacc-c-c++-common/acc_prof-kernels-1.c',
6556 @c 'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
6557 When a compute construct triggers implicit
6558 @code{acc_ev_device_init_start} and @code{acc_ev_device_init_end}
6559 events, they currently aren't @emph{nested within} the corresponding
6560 @code{acc_ev_compute_construct_start} and
6561 @code{acc_ev_compute_construct_end}, but they're currently observed
6562 @emph{before} @code{acc_ev_compute_construct_start}.
It's not clear what to do: the standard asks us to provide a lot of
6564 details to the @code{acc_ev_compute_construct_start} callback, without
6565 (implicitly) initializing a device before?
6568 Callbacks for these event types will not be invoked for calls to the
6569 @code{acc_set_device_type} and @code{acc_set_device_num} functions.
6570 It's not clear if they should be.
6574 @item @code{acc_ev_enter_data_start}, @code{acc_ev_enter_data_end}, @code{acc_ev_exit_data_start}, @code{acc_ev_exit_data_end}
6578 Callbacks for these event types will also be invoked for OpenACC
6579 @emph{host_data} constructs.
6580 It's not clear if they should be.
6583 Callbacks for these event types will also be invoked when processing
6584 variable mappings specified in OpenACC @emph{declare} directives.
6585 It's not clear if they should be.
6591 Callbacks for the following event types will be invoked, but dispatch
6592 and information provided therein has not yet been thoroughly reviewed:
6595 @item @code{acc_ev_alloc}
6596 @item @code{acc_ev_free}
6597 @item @code{acc_ev_update_start}, @code{acc_ev_update_end}
6598 @item @code{acc_ev_enqueue_upload_start}, @code{acc_ev_enqueue_upload_end}
6599 @item @code{acc_ev_enqueue_download_start}, @code{acc_ev_enqueue_download_end}
6602 During device initialization, and finalization, respectively,
6603 callbacks for the following event types will not yet be invoked:
6606 @item @code{acc_ev_alloc}
6607 @item @code{acc_ev_free}
6610 Callbacks for the following event types have not yet been implemented,
6611 so currently won't be invoked:
6614 @item @code{acc_ev_device_shutdown_start}, @code{acc_ev_device_shutdown_end}
6615 @item @code{acc_ev_runtime_shutdown}
6616 @item @code{acc_ev_create}, @code{acc_ev_delete}
6617 @item @code{acc_ev_wait_start}, @code{acc_ev_wait_end}
6620 For the following runtime library functions, not all expected
6621 callbacks will be invoked (mostly concerning implicit device
6625 @item @code{acc_get_num_devices}
6626 @item @code{acc_set_device_type}
6627 @item @code{acc_get_device_type}
6628 @item @code{acc_set_device_num}
6629 @item @code{acc_get_device_num}
6630 @item @code{acc_init}
6631 @item @code{acc_shutdown}
6634 Aside from implicit device initialization, for the following runtime
6635 library functions, no callbacks will be invoked for shared-memory
6636 offloading devices (it's not clear if they should be):
6639 @item @code{acc_malloc}
6640 @item @code{acc_free}
6641 @item @code{acc_copyin}, @code{acc_present_or_copyin}, @code{acc_copyin_async}
6642 @item @code{acc_create}, @code{acc_present_or_create}, @code{acc_create_async}
6643 @item @code{acc_copyout}, @code{acc_copyout_async}, @code{acc_copyout_finalize}, @code{acc_copyout_finalize_async}
6644 @item @code{acc_delete}, @code{acc_delete_async}, @code{acc_delete_finalize}, @code{acc_delete_finalize_async}
6645 @item @code{acc_update_device}, @code{acc_update_device_async}
6646 @item @code{acc_update_self}, @code{acc_update_self_async}
6647 @item @code{acc_map_data}, @code{acc_unmap_data}
6648 @item @code{acc_memcpy_to_device}, @code{acc_memcpy_to_device_async}
6649 @item @code{acc_memcpy_from_device}, @code{acc_memcpy_from_device_async}
6652 @c ---------------------------------------------------------------------
6653 @c OpenMP-Implementation Specifics
6654 @c ---------------------------------------------------------------------
6656 @node OpenMP-Implementation Specifics
6657 @chapter OpenMP-Implementation Specifics
6660 * Implementation-defined ICV Initialization::
6661 * OpenMP Context Selectors::
6662 * Memory allocation::
6665 @node Implementation-defined ICV Initialization
6666 @section Implementation-defined ICV Initialization
6667 @cindex Implementation specific setting
6669 @multitable @columnfractions .30 .70
6670 @item @var{affinity-format-var} @tab See @ref{OMP_AFFINITY_FORMAT}.
6671 @item @var{def-allocator-var} @tab See @ref{OMP_ALLOCATOR}.
6672 @item @var{max-active-levels-var} @tab See @ref{OMP_MAX_ACTIVE_LEVELS}.
6673 @item @var{dyn-var} @tab See @ref{OMP_DYNAMIC}.
6674 @item @var{nthreads-var} @tab See @ref{OMP_NUM_THREADS}.
6675 @item @var{num-devices-var} @tab Number of non-host devices found
6676 by GCC's run-time library
6677 @item @var{num-procs-var} @tab The number of CPU cores on the
6678 initial device, except that affinity settings might lead to a
6679 smaller number. On non-host devices, the value of the
6680 @var{nthreads-var} ICV.
6681 @item @var{place-partition-var} @tab See @ref{OMP_PLACES}.
6682 @item @var{run-sched-var} @tab See @ref{OMP_SCHEDULE}.
6683 @item @var{stacksize-var} @tab See @ref{OMP_STACKSIZE}.
6684 @item @var{thread-limit-var} @tab See @ref{OMP_TEAMS_THREAD_LIMIT}
6685 @item @var{wait-policy-var} @tab See @ref{OMP_WAIT_POLICY} and
6686 @ref{GOMP_SPINCOUNT}
6689 @node OpenMP Context Selectors
6690 @section OpenMP Context Selectors
6692 @code{vendor} is always @code{gnu}. References are to the GCC manual.
6694 @c NOTE: Only the following selectors have been implemented. To add
6695 @c additional traits for target architecture, TARGET_OMP_DEVICE_KIND_ARCH_ISA
6696 @c has to be implemented; cf. also PR target/105640.
6697 @c For offload devices, add *additionally* gcc/config/*/t-omp-device.
6699 For the host compiler, @code{kind} always matches @code{host}, @code{cpu}
6700 and @code{any}; for the offloading architectures AMD GCN and Nvidia PTX,
6701 @code{kind} always matches @code{nohost}, @code{gpu} and @code{any}.
6702 For the x86 family of computers, AMD GCN and Nvidia PTX
6703 the following traits are supported in addition; while OpenMP is supported
6704 on more architectures, GCC currently does not match any @code{arch} or
6705 @code{isa} traits for those.
6707 @multitable @columnfractions .65 .30
6708 @headitem @code{arch} @tab @code{isa}
6709 @item @code{x86}, @code{x86_64}, @code{i386}, @code{i486},
6710 @code{i586}, @code{i686}, @code{ia32}
6711 @tab See @code{-m...} flags in ``x86 Options'' (without @code{-m})
6712 @item @code{amdgcn}, @code{gcn}
6713 @tab See @code{-march=} in ``AMD GCN Options''
6714 @item @code{nvptx}, @code{nvptx64}
6715 @tab See @code{-march=} in ``Nvidia PTX Options''
6718 @node Memory allocation
6719 @section Memory allocation
6721 The description below applies to:
6724 @item Explicit use of the OpenMP API routines, see
6725 @ref{Memory Management Routines}.
6726 @item The @code{allocate} clause, except when the @code{allocator} modifier is a
6727 constant expression with value @code{omp_default_mem_alloc} and no
6728 @code{align} modifier has been specified. (In that case, the normal
6729 @code{malloc} allocation is used.)
6730 @item The @code{allocate} directive for variables in static memory; while
6731 the alignment is honored, the normal static memory is used.
6732 @item Using the @code{allocate} directive for automatic/stack variables, except
6733 when the @code{allocator} clause is a constant expression with value
6734 @code{omp_default_mem_alloc} and no @code{align} clause has been
6735 specified. (In that case, the normal allocation is used: stack allocation
6736 and, sometimes for Fortran, also @code{malloc} [depending on flags such as
6737 @option{-fstack-arrays}].)
6738 @item In Fortran, the @code{allocators} directive and the executable
@code{allocate} directive for Fortran pointers and allocatables are
supported, but files containing those directives have to be
6741 compiled with @option{-fopenmp-allocators}. Additionally, all files that
6742 might explicitly or implicitly deallocate memory allocated that way must
6743 also be compiled with that option.
@item The used alignment is the maximum of the value of the @code{align} clause
6745 and the alignment of the type after honoring, if present, the
6746 @code{aligned} (@code{GNU::aligned}) attribute and C's @code{_Alignas}
6747 and C++'s @code{alignas}. However, the @code{align} clause of the
6748 @code{allocate} directive has no effect on the value of C's
6749 @code{_Alignof} and C++'s @code{alignof}.
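The alignment rule in the last item can be sketched as follows. This is only
an illustration: @code{effective_alignment} is a hypothetical helper, not a
libgomp function, and it assumes that both inputs are powers of two, with 0
meaning that no @code{align} clause was given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libgomp): return the alignment used
   for an allocation, i.e. the maximum of the 'align' clause value and
   the alignment of the type.  ALIGN_CLAUSE is 0 when no 'align' clause
   was specified; TYPE_ALIGN already reflects any 'aligned' attribute,
   _Alignas or alignas.  */
static size_t
effective_alignment (size_t align_clause, size_t type_align)
{
  /* Alignments are expected to be powers of two.  */
  assert (type_align != 0 && (type_align & (type_align - 1)) == 0);
  assert ((align_clause & (align_clause - 1)) == 0);
  return align_clause > type_align ? align_clause : type_align;
}
```

For instance, a type aligned to 32 bytes combined with @code{align(8)} would
still be allocated with 32-byte alignment under this reading.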
6752 For the available predefined allocators and, as applicable, their associated
6753 predefined memory spaces and for the available traits and their default values,
6754 see @ref{OMP_ALLOCATOR}. Predefined allocators without an associated memory
6755 space use the @code{omp_default_mem_space} memory space. See additionally
6756 @ref{Offload-Target Specifics}.
6758 For the memory spaces, the following applies:
6760 @item @code{omp_default_mem_space} is supported
6761 @item @code{omp_const_mem_space} maps to @code{omp_default_mem_space}
6762 @item @code{omp_low_lat_mem_space} is only available on supported devices,
6763 and maps to @code{omp_default_mem_space} otherwise.
6764 @item @code{omp_large_cap_mem_space} maps to @code{omp_default_mem_space},
6765 unless the memkind library is available
6766 @item @code{omp_high_bw_mem_space} maps to @code{omp_default_mem_space},
6767 unless the memkind library is available
6770 On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
6771 library} (@code{libmemkind.so.0}) is available at runtime, it is used when
6772 creating memory allocators requesting
6775 @item the memory space @code{omp_high_bw_mem_space}
6776 @item the memory space @code{omp_large_cap_mem_space}
6777 @item the @code{partition} trait @code{interleaved}; note that for
6778 @code{omp_large_cap_mem_space} the allocation will not be interleaved
6781 On Linux systems, where the @uref{https://github.com/numactl/numactl, numa
library} (@code{libnuma.so.1}) is available at runtime, it is used when creating
6783 memory allocators requesting
6786 @item the @code{partition} trait @code{nearest}, except when both the
6787 libmemkind library is available and the memory space is either
6788 @code{omp_large_cap_mem_space} or @code{omp_high_bw_mem_space}
6791 Note that the numa library will round up the allocation size to a multiple of
6792 the system page size; therefore, consider using it only with large data or
6793 by sharing allocations via the @code{pool_size} trait. Furthermore, the Linux
6794 kernel does not guarantee that an allocation will always be on the nearest NUMA
6795 node nor that after reallocation the same node will be used. Note additionally
6796 that, on Linux, the default setting of the memory placement policy is to use the
6797 current node; therefore, unless the memory placement policy has been overridden,
6798 the @code{partition} trait @code{environment} (the default) will be effectively
6799 a @code{nearest} allocation.
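To illustrate why page rounding matters for small allocations, the following
sketch computes the rounded size and the unused remainder. The helper names
are hypothetical, and the page size is passed in as a parameter; the value
4096 used in the remark below is only an assumption about a typical system.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to the next multiple of PAGE_SIZE, which must be a
   power of two.  */
static size_t
round_up_to_page (size_t size, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (size + page_size - 1) & ~(page_size - 1);
}

/* Bytes that are reserved by the rounding but were not requested.  */
static size_t
page_rounding_waste (size_t size, size_t page_size)
{
  return round_up_to_page (size, page_size) - size;
}
```

With a 4096-byte page, a 100-byte request would occupy a whole page and leave
3996 bytes unused, which is why large data or a shared pool is the better fit.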
6801 Additional notes regarding the traits:
6803 @item The @code{pinned} trait is supported on Linux hosts, but is subject to
6804 the OS @code{ulimit}/@code{rlimit} locked memory settings.
6805 @item The default for the @code{pool_size} trait is no pool and for every
6806 (re)allocation the associated library routine is called, which might
6807 internally use a memory pool.
6808 @item For the @code{partition} trait, the partition part size will be the same
6809 as the requested size (i.e. @code{interleaved} or @code{blocked} has no
6810 effect), except for @code{interleaved} when the memkind library is
6811 available. Furthermore, for @code{nearest} and unless the numa library
is available, the memory might not be on the same NUMA node as the thread
6813 that allocated the memory; on Linux, this is in particular the case when
6814 the memory placement policy is set to preferred.
6815 @item The @code{access} trait has no effect such that memory is always
6816 accessible by all threads.
6817 @item The @code{sync_hint} trait has no effect.
6821 @ref{Offload-Target Specifics}
6823 @c ---------------------------------------------------------------------
6824 @c Offload-Target Specifics
6825 @c ---------------------------------------------------------------------
6827 @node Offload-Target Specifics
6828 @chapter Offload-Target Specifics
6830 The following sections present notes on the offload-target specifics
6838 @section AMD Radeon (GCN)
6840 On the hardware side, there is the hierarchy (fine to coarse):
6842 @item work item (thread)
6845 @item compute unit (CU)
6848 All OpenMP and OpenACC levels are used, i.e.
6850 @item OpenMP's simd and OpenACC's vector map to work items (thread)
6851 @item OpenMP's threads (``parallel'') and OpenACC's workers map
@item OpenMP's teams and OpenACC's gangs use a threadpool with the
6854 size of the number of teams or gangs, respectively.
6859 @item Number of teams is the specified @code{num_teams} (OpenMP) or
6860 @code{num_gangs} (OpenACC) or otherwise the number of CU. It is limited
6861 by two times the number of CU.
6862 @item Number of wavefronts is 4 for gfx900 and 16 otherwise;
6863 @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
override this if smaller.
6865 @item The wavefront has 102 scalars and 64 vectors
6866 @item Number of workitems is always 64
6867 @item The hardware permits maximally 40 workgroups/CU and
6868 16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
@item 80 scalar registers and 24 vector registers in non-kernel functions
6870 (the chosen procedure-calling API).
6871 @item For the kernel itself: as many as register pressure demands (number of
6872 teams and number of threads, scaled down if registers are exhausted)
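The defaults and limits listed above can be modelled with a few lines of
arithmetic. The sketch below is one reading of that list, not code taken from
libgomp: all function names are hypothetical, a request of 0 stands for
``not specified'', the cap of two times the number of CUs is assumed to apply
to explicit requests as well, and register pressure is ignored.

```c
#include <assert.h>

/* Number of teams: the requested value, or the CU count if nothing was
   requested (0), capped at two times the CU count.  */
static unsigned
gcn_num_teams (unsigned requested, unsigned num_cu)
{
  unsigned limit = 2 * num_cu;
  unsigned teams = requested != 0 ? requested : num_cu;
  return teams < limit ? teams : limit;
}

/* Wavefronts per team: 4 on gfx900 and 16 otherwise; a smaller
   num_threads/num_workers request overrides the default.  */
static unsigned
gcn_num_wavefronts (unsigned requested, int is_gfx900)
{
  unsigned dflt = is_gfx900 ? 4 : 16;
  return (requested != 0 && requested < dflt) ? requested : dflt;
}

/* Workgroups that fit on one CU: at most 40 workgroups and at most
   40 wavefronts in total per CU.  */
static unsigned
gcn_max_workgroups_per_cu (unsigned wavefronts_per_workgroup)
{
  assert (wavefronts_per_workgroup >= 1 && wavefronts_per_workgroup <= 16);
  unsigned by_wavefronts = 40 / wavefronts_per_workgroup;
  return by_wavefronts < 40 ? by_wavefronts : 40;
}
```

Under these assumptions, a workgroup of 16 wavefronts leaves room for only two
workgroups per CU, whereas single-wavefront workgroups reach the limit of 40.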
Implementation remarks:
6877 @item I/O within OpenMP target regions and OpenACC compute regions is supported
6878 using the C library @code{printf} functions and the Fortran
6879 @code{print}/@code{write} statements.
6880 @item Reverse offload regions (i.e. @code{target} regions with
6881 @code{device(ancestor:1)}) are processed serially per @code{target} region
6882 such that the next reverse offload region is only executed after the previous
6884 @item OpenMP code that has a @code{requires} directive with
6885 @code{unified_shared_memory} is only supported if all AMD GPUs have the
6886 @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
6887 discrete GPUs, this may require setting the @code{HSA_XNACK} environment
6888 variable to @samp{1}; for systems with both an APU and a discrete GPU that
6889 does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
6890 enable only the APU. If not supported, all AMD GPU devices are removed
6891 from the list of available devices (``host fallback'').
6892 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
6893 environment variable; the default is 32 kiB per thread.
6894 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
@code{access} trait is set to @code{cgroup}. The default pool size
6896 is automatically scaled to share the 64 kiB LDS memory between the number
6897 of teams configured to run on each compute-unit, but may be adjusted at
6898 runtime by setting environment variable
6899 @code{GOMP_GCN_LOWLAT_POOL=@var{bytes}}.
6900 @item @code{omp_low_lat_mem_alloc} cannot be used with true low-latency memory
6901 because the definition implies the @code{omp_atv_all} trait; main
6902 graphics memory is used instead.
6903 @item @code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
6904 @code{omp_thread_mem_alloc}, all use low-latency memory as first
6905 preference, and fall back to main graphics memory when the low-latency
6907 @item The unique identifier (UID), used with OpenMP's API UID routines, is the
6908 value returned by the HSA runtime library for @code{HSA_AMD_AGENT_INFO_UUID}.
6909 For GPUs, it is currently @samp{GPU-} followed by 16 lower-case hex digits,
6910 yielding a string like @code{GPU-f914a2142fc3413a}. The output matches
6911 the one used by @code{rocminfo}.
6919 On the hardware side, there is the hierarchy (fine to coarse):
6924 @item streaming multiprocessor
6927 All OpenMP and OpenACC levels are used, i.e.
6929 @item OpenMP's simd and OpenACC's vector map to threads
6930 @item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
@item OpenMP's teams and OpenACC's gangs use a threadpool with the
6932 size of the number of teams or gangs, respectively.
6937 @item The @code{warp_size} is always 32
6938 @item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
6939 @item The number of teams is limited by the number of blocks the device can
6940 host simultaneously.
Additional information can be obtained by setting the environment variable
6944 @code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
6947 GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
6948 which caches the JIT in the user's directory (see CUDA documentation; can be
tuned by the environment variables @code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).
6951 Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} commandline
6952 options still affect the used PTX ISA code and, thus, the requirements on
6953 CUDA version and hardware.
Implementation remarks:
6957 @item I/O within OpenMP target regions and OpenACC compute regions is supported
6958 using the C library @code{printf} functions.
6959 Additionally, the Fortran @code{print}/@code{write} statements are
6960 supported within OpenMP target regions, but not yet within OpenACC compute
6961 regions. @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.
@item Compilation of OpenMP code that contains @code{requires reverse_offload}
6963 requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}
6965 @item For code containing reverse offload (i.e. @code{target} regions with
6966 @code{device(ancestor:1)}), there is a slight performance penalty
for @emph{all} target regions, consisting mostly of shutdown delay.
6968 Per device, reverse offload regions are processed serially such that
6969 the next reverse offload region is only executed after the previous
6971 @item OpenMP code that has a @code{requires} directive with
6972 @code{unified_shared_memory} runs on nvptx devices if and only if
6973 all of those support the @code{pageableMemoryAccess} property;@footnote{
6974 @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
otherwise, all nvptx devices are removed from the list of available
6976 devices (``host fallback'').
6977 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
6979 @item The OpenMP routines @code{omp_target_memcpy_rect} and
6980 @code{omp_target_memcpy_rect_async} and the @code{target update}
6981 directive for non-contiguous list items will use the 2D and 3D
6982 memory-copy functions of the CUDA library. Higher dimensions will
6983 call those functions in a loop and are therefore supported.
6984 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
@code{access} trait is set to @code{cgroup}, and libgomp has
6986 been built for PTX ISA version 4.1 or higher (such as in GCC's
6987 default configuration). @c -mptx=4.1
6988 The default pool size
6989 is 8 kiB per team, but may be adjusted at runtime by setting environment
6990 variable @code{GOMP_NVPTX_LOWLAT_POOL=@var{bytes}}. The maximum value is
6991 limited by the available hardware, and care should be taken that the
6992 selected pool size does not unduly limit the number of teams that can
6994 @item @code{omp_low_lat_mem_alloc} cannot be used with true low-latency memory
6995 because the definition implies the @code{omp_atv_all} trait; main
6996 graphics memory is used instead.
6997 @item @code{omp_cgroup_mem_alloc}, @code{omp_pteam_mem_alloc}, and
6998 @code{omp_thread_mem_alloc}, all use low-latency memory as first
6999 preference, and fall back to main graphics memory when the low-latency
7001 @item The unique identifier (UID), used with OpenMP's API UID routines, consists
of the @samp{GPU-} prefix followed by the 16-byte UUID as returned by
the CUDA runtime library. This UUID is output in grouped lower-case
hex digits; the grouping of those 32 digits is: 8 digits, hyphen,
4 digits, hyphen, 4 digits, hyphen, 4 digits, hyphen, 12 digits. This leads to a string
7006 like @code{GPU-a8081c9e-f03e-18eb-1827-bf5ba95afa5d}. The output
7007 matches the format used by @code{nvidia-smi}.
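As an illustration of the UID layout, the following sketch turns 16 raw UUID
bytes into a string shaped like the example
@code{GPU-a8081c9e-f03e-18eb-1827-bf5ba95afa5d}. The function
@code{format_gpu_uid} is hypothetical and not part of libgomp or CUDA; it
also assumes that the bytes are printed in the order in which they are stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of "GPU-" plus 32 hex digits plus 4 hyphens, without the NUL.  */
#define UID_STRLEN (4 + 32 + 4)

/* Hypothetical helper: write "GPU-" followed by the UUID as lower-case
   hex digits grouped 8-4-4-4-12 into OUT.  Returns 0 on success and -1
   if OUT_SIZE is too small.  */
static int
format_gpu_uid (const unsigned char uuid[16], char *out, size_t out_size)
{
  if (out_size < UID_STRLEN + 1)
    return -1;

  char *p = out;
  memcpy (p, "GPU-", 4);
  p += 4;
  for (int i = 0; i < 16; i++)
    {
      /* A hyphen precedes bytes 4, 6, 8 and 10.  */
      if (i == 4 || i == 6 || i == 8 || i == 10)
        *p++ = '-';
      /* Writes two hex digits plus a terminating NUL.  */
      snprintf (p, 3, "%02x", (unsigned) uuid[i]);
      p += 2;
    }
  *p = '\0';
  return 0;
}
```

A buffer of at least 41 bytes is needed for the result; smaller buffers are
rejected rather than truncated.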
7011 @c ---------------------------------------------------------------------
7013 @c ---------------------------------------------------------------------
7015 @node The libgomp ABI
7016 @chapter The libgomp ABI
7018 The following sections present notes on the external ABI as
7019 presented by libgomp. Only maintainers should need them.
7022 * Implementing MASTER construct::
7023 * Implementing CRITICAL construct::
7024 * Implementing ATOMIC construct::
7025 * Implementing FLUSH construct::
7026 * Implementing BARRIER construct::
7027 * Implementing THREADPRIVATE construct::
7028 * Implementing PRIVATE clause::
7029 * Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses::
7030 * Implementing REDUCTION clause::
7031 * Implementing PARALLEL construct::
7032 * Implementing FOR construct::
7033 * Implementing ORDERED construct::
7034 * Implementing SECTIONS construct::
7035 * Implementing SINGLE construct::
7036 * Implementing OpenACC's PARALLEL construct::
7040 @node Implementing MASTER construct
7041 @section Implementing MASTER construct
7044 if (omp_get_thread_num () == 0)
Alternatively, we generate two copies of the parallel subfunction
7049 and only include this in the version run by the primary thread.
7050 Surely this is not worthwhile though...
7054 @node Implementing CRITICAL construct
7055 @section Implementing CRITICAL construct
7057 Without a specified name,
7060 void GOMP_critical_start (void);
7061 void GOMP_critical_end (void);
7064 so that we don't get COPY relocations from libgomp to the main
7067 With a specified name, use omp_set_lock and omp_unset_lock with
7068 name being transformed into a variable declared like
7071 omp_lock_t gomp_critical_user_<name> __attribute__((common))
7074 Ideally the ABI would specify that all zero is a valid unlocked
7075 state, and so we wouldn't need to initialize this at
7080 @node Implementing ATOMIC construct
7081 @section Implementing ATOMIC construct
7083 The target should implement the @code{__sync} builtins.
7085 Failing that we could add
7088 void GOMP_atomic_enter (void)
7089 void GOMP_atomic_exit (void)
7092 which reuses the regular lock code, but with yet another lock
7093 object private to the library.
7097 @node Implementing FLUSH construct
7098 @section Implementing FLUSH construct
7100 Expands to the @code{__sync_synchronize} builtin.
7104 @node Implementing BARRIER construct
7105 @section Implementing BARRIER construct
7108 void GOMP_barrier (void)
7112 @node Implementing THREADPRIVATE construct
7113 @section Implementing THREADPRIVATE construct
In @emph{most} cases we can map this directly to @code{__thread}. Except
7116 that OMP allows constructors for C++ objects. We can either
7117 refuse to support this (how often is it used?) or we can
7118 implement something akin to .ctors.
7120 Even more ideally, this ctor feature is handled by extensions
7121 to the main pthreads library. Failing that, we can have a set
7122 of entry points to register ctor functions to be called.
7126 @node Implementing PRIVATE clause
7127 @section Implementing PRIVATE clause
7129 In association with a PARALLEL, or within the lexical extent
7130 of a PARALLEL block, the variable becomes a local variable in
7131 the parallel subfunction.
7133 In association with FOR or SECTIONS blocks, create a new
7134 automatic variable within the current function. This preserves
7135 the semantic of new variable creation.
7139 @node Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
7140 @section Implementing FIRSTPRIVATE LASTPRIVATE COPYIN and COPYPRIVATE clauses
7142 This seems simple enough for PARALLEL blocks. Create a private
7143 struct for communicating between the parent and subfunction.
7144 In the parent, copy in values for scalar and "small" structs;
copy in addresses for other TREE_ADDRESSABLE types. In the
7146 subfunction, copy the value into the local variable.
7148 It is not clear what to do with bare FOR or SECTION blocks.
7149 The only thing I can figure is that we do something like:
7152 #pragma omp for firstprivate(x) lastprivate(y)
7153 for (int i = 0; i < n; ++i)
7170 where the "x=x" and "y=y" assignments actually have different
7171 uids for the two variables, i.e. not something you could write
7172 directly in C. Presumably this only makes sense if the "outer"
7173 x and y are global variables.
7175 COPYPRIVATE would work the same way, except the structure
7176 broadcast would have to happen via SINGLE machinery instead.
7180 @node Implementing REDUCTION clause
7181 @section Implementing REDUCTION clause
7183 The private struct mentioned in the previous section should have
7184 a pointer to an array of the type of the variable, indexed by the
7185 thread's @var{team_id}. The thread stores its final value into the
7186 array, and after the barrier, the primary thread iterates over the
7187 array to collect the values.
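The scheme described above might look roughly like the following sketch for a
@code{reduction(+:sum)} on a @code{long}. It models only the data flow: the
struct and function names are hypothetical, and thread creation and the
barrier are left out, so the caller is assumed to invoke
@code{combine_partials} only after every thread has stored its value.

```c
#include <assert.h>

/* Stand-in for the private struct shared by parent and subfunction.  */
struct reduction_data
{
  long *partial;     /* One slot per thread, indexed by team id.  */
  int num_threads;
};

/* Called by each thread once it has computed its final private value.  */
static void
store_partial (struct reduction_data *d, int team_id, long value)
{
  assert (team_id >= 0 && team_id < d->num_threads);
  d->partial[team_id] = value;
}

/* Called by the primary thread after the barrier: walk the array and
   combine the per-thread values.  */
static long
combine_partials (const struct reduction_data *d)
{
  long sum = 0;
  for (int i = 0; i < d->num_threads; i++)
    sum += d->partial[i];
  return sum;
}
```

Because every thread writes only its own slot, no locking is needed for the
stores; the barrier alone orders them before the final loop.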
7190 @node Implementing PARALLEL construct
7191 @section Implementing PARALLEL construct
7194 #pragma omp parallel
7203 void subfunction (void *data)
7210 GOMP_parallel_start (subfunction, &data, num_threads);
7211 subfunction (&data);
7212 GOMP_parallel_end ();
7216 void GOMP_parallel_start (void (*fn)(void *), void *data, unsigned num_threads)
7219 The @var{FN} argument is the subfunction to be run in parallel.
7221 The @var{DATA} argument is a pointer to a structure used to
7222 communicate data in and out of the subfunction, as discussed
7223 above with respect to FIRSTPRIVATE et al.
7225 The @var{NUM_THREADS} argument is 1 if an IF clause is present
7226 and false, or the value of the NUM_THREADS clause, if
7229 The function needs to create the appropriate number of
7230 threads and/or launch them from the dock. It needs to
7231 create the team structure and assign team ids.
7234 void GOMP_parallel_end (void)
7237 Tears down the team and returns us to the previous @code{omp_in_parallel()} state.
7241 @node Implementing FOR construct
7242 @section Implementing FOR construct
7245 #pragma omp parallel for
7246 for (i = lb; i <= ub; i++)
7253 void subfunction (void *data)
7256 while (GOMP_loop_static_next (&_s0, &_e0))
7259 for (i = _s0; i < _e1; i++)
7262 GOMP_loop_end_nowait ();
7265 GOMP_parallel_loop_static (subfunction, NULL, 0, lb, ub+1, 1, 0);
7267 GOMP_parallel_end ();
7271 #pragma omp for schedule(runtime)
7272 for (i = 0; i < n; i++)
7281 if (GOMP_loop_runtime_start (0, n, 1, &_s0, &_e0))
for (i = _s0; i < _e0; i++)
@} while (GOMP_loop_runtime_next (&_s0, &_e0));
7291 Note that while it looks like there is trickiness to propagating
7292 a non-constant STEP, there isn't really. We're explicitly allowed
7293 to evaluate it as many times as we want, and any variables involved
7294 should automatically be handled as PRIVATE or SHARED like any other
7295 variables. So the expression should remain evaluable in the
7296 subfunction. We can also pull it into a local variable if we like,
but since it's supposed to remain unchanged, we can also not if we like.
7299 If we have SCHEDULE(STATIC), and no ORDERED, then we ought to be
7300 able to get away with no work-sharing context at all, since we can
7301 simply perform the arithmetic directly in each thread to divide up
7302 the iterations. Which would mean that we wouldn't need to call any
7305 There are separate routines for handling loops with an ORDERED
7306 clause. Bookkeeping for that is non-trivial...
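For the SCHEDULE(STATIC) case without ORDERED discussed above, the per-thread
arithmetic could look like the sketch below. It shows one common way to split
the iteration space into contiguous blocks and is not necessarily the
partition that libgomp or the compiler actually uses; @code{static_chunk} is
a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: compute the half-open range [*START, *END) of
   iterations that thread TID out of NTHREADS executes for a loop with
   N iterations.  The first N % NTHREADS threads get one extra
   iteration, so the block sizes differ by at most one.  */
static void
static_chunk (long n, int nthreads, int tid, long *start, long *end)
{
  assert (n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);
  long q = n / nthreads;             /* Base block size.  */
  long r = n % nthreads;             /* Threads with one extra iteration.  */
  long extra = tid < r ? tid : r;    /* Extra iterations before this block.  */
  *start = tid * q + extra;
  *end = *start + q + (tid < r ? 1 : 0);
}
```

Each thread can evaluate this from its own thread number and the team size
alone, which is why no shared work-sharing state is required in this case.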
7310 @node Implementing ORDERED construct
7311 @section Implementing ORDERED construct
7314 void GOMP_ordered_start (void)
7315 void GOMP_ordered_end (void)
7320 @node Implementing SECTIONS construct
7321 @section Implementing SECTIONS construct
7326 #pragma omp sections
7340 for (i = GOMP_sections_start (3); i != 0; i = GOMP_sections_next ())
7357 @node Implementing SINGLE construct
7358 @section Implementing SINGLE construct
7372 if (GOMP_single_start ())
7380 #pragma omp single copyprivate(x)
7387 datap = GOMP_single_copy_start ();
7392 GOMP_single_copy_end (&data);
7401 @node Implementing OpenACC's PARALLEL construct
7402 @section Implementing OpenACC's PARALLEL construct
7405 void GOACC_parallel ()
7410 @c ---------------------------------------------------------------------
7412 @c ---------------------------------------------------------------------
7414 @node Reporting Bugs
7415 @chapter Reporting Bugs
7417 Bugs in the GNU Offloading and Multi Processing Runtime Library should
7418 be reported via @uref{https://gcc.gnu.org/bugzilla/, Bugzilla}. Please add
7419 "openacc", or "openmp", or both to the keywords field in the bug
7420 report, as appropriate.
7424 @c ---------------------------------------------------------------------
7425 @c GNU General Public License
7426 @c ---------------------------------------------------------------------
7428 @include gpl_v3.texi
7432 @c ---------------------------------------------------------------------
7433 @c GNU Free Documentation License
7434 @c ---------------------------------------------------------------------
7440 @c ---------------------------------------------------------------------
7441 @c Funding Free Software
7442 @c ---------------------------------------------------------------------
7444 @include funding.texi
7446 @c ---------------------------------------------------------------------
7448 @c ---------------------------------------------------------------------
7451 @unnumbered Library Index