docs/AMDGPUModifierSyntax.rst

   1 ======================================
   2 Syntax of AMDGPU Instruction Modifiers
   3 ======================================
   4
   5 .. contents::
   6    :local:
   7
   8 Conventions
   9 ===========
  10
  11 The following notation is used throughout this document:
  12
  13     =================== =============================================================
  14     Notation            Description
  15     =================== =============================================================
  16     {0..N}              Any integer value in the range from 0 to N (inclusive).
  17     <x>                 Syntax and meaning of *x* are explained elsewhere.
  18     =================== =============================================================
  19
  20 .. _amdgpu_syn_modifiers:
  21
  22 Modifiers
  23 =========
  24
  25 DS Modifiers
  26 ------------
  27
  28 .. _amdgpu_synid_ds_offset8:
  29
  30 offset8
  31 ~~~~~~~
  32
  33 Specifies an immediate unsigned 8-bit offset, in bytes. The default value is 0.
  34
  35 Used with DS instructions which have 2 addresses.
  36
  37     =================== =====================================================
  38     Syntax              Description
  39     =================== =====================================================
  40     offset:{0..0xFF}    Specifies an unsigned 8-bit offset as a positive
  41                         :ref:`integer number <amdgpu_synid_integer_number>`.
  42     =================== =====================================================
  43
  44 Examples:
  45
  46 .. parsed-literal::
  47
  48   offset:255
  49   offset:0xff
  50
  51 .. _amdgpu_synid_ds_offset16:
  52
  53 offset16
  54 ~~~~~~~~
  55
  56 Specifies an immediate unsigned 16-bit offset, in bytes. The default value is 0.
  57
  58 Used with DS instructions which have 1 address.
  59
  60     ==================== ======================================================
  61     Syntax               Description
  62     ==================== ======================================================
  63     offset:{0..0xFFFF}   Specifies an unsigned 16-bit offset as a positive
  64                          :ref:`integer number <amdgpu_synid_integer_number>`.
  65     ==================== ======================================================
  66
  67 Examples:
  68
  69 .. parsed-literal::
  70
  71   offset:65535
  72   offset:0xffff
  73
  74 .. _amdgpu_synid_sw_offset16:
  75
  76 pattern
  77 ~~~~~~~
  78
  79 This is a special modifier which may be used with the *ds_swizzle_b32* instruction only.
  80 It specifies a swizzle pattern in numeric or symbolic form. The default value is 0.
  81
  82 See AMD documentation for more information.
  83
  84     ======================================================= ===========================================================
  85     Syntax                                                  Description
  86     ======================================================= ===========================================================
  87     offset:{0..0xFFFF}                                      Specifies a 16-bit swizzle pattern.
  88     offset:swizzle(QUAD_PERM,{0..3},{0..3},{0..3},{0..3})   Specifies a quad permute mode pattern.
  89
  90                                                             Each number is a lane *id*.
  91     offset:swizzle(BITMASK_PERM, "<mask>")                  Specifies a bitmask permute mode pattern.
  92
  93                                                             The pattern converts a 5-bit lane *id* to another
  94                                                             lane *id* with which the lane interacts.
  95
  96                                                             *mask* is a 5 character sequence which
  97                                                             specifies how to transform the bits of the
  98                                                             lane *id*.
  99
 100                                                             The following characters are allowed:
 101
 102                                                             * "0" - set bit to 0.
 103
 104                                                             * "1" - set bit to 1.
 105
 106                                                             * "p" - preserve bit.
 107
 108                                                             * "i" - invert bit.
 109
 110     offset:swizzle(BROADCAST,{2..32},{0..N})                Specifies a broadcast mode.
 111
 112                                                             Broadcasts the value of any particular lane to
 113                                                             all lanes in its group.
 114
 115                                                             The first numeric parameter is a group
 116                                                             size and must be equal to 2, 4, 8, 16 or 32.
 117
 118                                                             The second numeric parameter is the index of the
 119                                                             lane being broadcast.
 120
 121                                                             The index must be less than the group size.
 122     offset:swizzle(SWAP,{1..16})                            Specifies a swap mode.
 123
 124                                                             Swaps the neighboring groups of
 125                                                             1, 2, 4, 8 or 16 lanes.
 126     offset:swizzle(REVERSE,{2..32})                         Specifies a reverse mode.
 127
 128                                                             Reverses the lanes for groups of 2, 4, 8, 16 or 32 lanes.
 129     ======================================================= ===========================================================
 130
 131 Numeric parameters may be specified as either :ref:`integer numbers<amdgpu_synid_integer_number>` or
 132 :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 133
 134 Examples:
 135
 136 .. parsed-literal::
 137
 138   offset:255
 139   offset:0xffff
 140   offset:swizzle(QUAD_PERM, 0, 1, 2, 3)
 141   offset:swizzle(BITMASK_PERM, "01pi0")
 142   offset:swizzle(BROADCAST, 2, 0)
 143   offset:swizzle(SWAP, 8)
 144   offset:swizzle(REVERSE, 30 + 2)
 145
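The effect of a ``BITMASK_PERM`` mask on a lane *id* can be modelled with a short Python sketch. This is an illustration only, not the assembler's implementation: the function name ``apply_bitmask_perm`` is hypothetical, and the sketch assumes that the first character of *mask* controls the most significant bit (bit 4) of the lane *id*.

```python
def apply_bitmask_perm(lane_id: int, mask: str) -> int:
    """Transform a 5-bit lane id according to a BITMASK_PERM mask string."""
    if len(mask) != 5:
        raise ValueError("mask must contain exactly 5 characters")
    if not 0 <= lane_id < 32:
        raise ValueError("lane id must fit in 5 bits")

    result = 0
    for position, action in enumerate(mask):
        # Assumption: the first mask character controls bit 4 (the MSB).
        bit_index = 4 - position
        bit = (lane_id >> bit_index) & 1
        if action == "0":
            bit = 0          # set bit to 0
        elif action == "1":
            bit = 1          # set bit to 1
        elif action == "p":
            pass             # preserve bit
        elif action == "i":
            bit ^= 1         # invert bit
        else:
            raise ValueError(f"unsupported mask character: {action!r}")
        result |= bit << bit_index
    return result


if __name__ == "__main__":
    # Show the partner lane for every lane under the mask "01pi0".
    for lane in range(32):
        print(lane, "->", apply_bitmask_perm(lane, "01pi0"))
```

Under the stated bit-order assumption, the mask ``"01pi0"`` forces bit 4 to 0, forces bit 3 to 1, keeps bit 2, inverts bit 1 and clears bit 0.
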
 146 .. _amdgpu_synid_gds:
 147
 148 gds
 149 ~~~
 150
 151 Specifies whether to use GDS or LDS memory (LDS is the default).
 152
 153     ======================================== ================================================
 154     Syntax                                   Description
 155     ======================================== ================================================
 156     gds                                      Use GDS memory.
 157     ======================================== ================================================
 158
 159
 160 EXP Modifiers
 161 -------------
 162
 163 .. _amdgpu_synid_done:
 164
 165 done
 166 ~~~~
 167
 168 Specifies if this is the last export from the shader to the target. By default, the current
 169 instruction does not finish an export sequence.
 170
 171     ======================================== ================================================
 172     Syntax                                   Description
 173     ======================================== ================================================
 174     done                                     Indicates the last export operation.
 175     ======================================== ================================================
 176
 177 .. _amdgpu_synid_compr:
 178
 179 compr
 180 ~~~~~
 181
 182 Indicates if the data are compressed (data are not compressed by default).
 183
 184     ======================================== ================================================
 185     Syntax                                   Description
 186     ======================================== ================================================
 187     compr                                    Data are compressed.
 188     ======================================== ================================================
 189
 190 .. _amdgpu_synid_vm:
 191
 192 vm
 193 ~~
 194
 195 Specifies valid mask flag state (off by default).
 196
 197     ======================================== ================================================
 198     Syntax                                   Description
 199     ======================================== ================================================
 200     vm                                       Set valid mask flag.
 201     ======================================== ================================================
 202
 203 FLAT Modifiers
 204 --------------
 205
 206 .. _amdgpu_synid_flat_offset12:
 207
 208 offset12
 209 ~~~~~~~~
 210
 211 Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
 212
 213 Cannot be used with *global/scratch* opcodes. GFX9 only.
 214
 215     ================= ======================================================
 216     Syntax            Description
 217     ================= ======================================================
 218     offset:{0..4095}  Specifies a 12-bit unsigned offset as a positive
 219                       :ref:`integer number <amdgpu_synid_integer_number>`.
 220     ================= ======================================================
 221
 222 Examples:
 223
 224 .. parsed-literal::
 225
 226   offset:4095
 227   offset:0xff
 228
 229 .. _amdgpu_synid_flat_offset13s:
 230
 231 offset13s
 232 ~~~~~~~~~
 233
 234 Specifies an immediate signed 13-bit offset, in bytes. The default value is 0.
 235
 236 Can be used with *global/scratch* opcodes only. GFX9 only.
 237
 238     ============================ =======================================================
 239     Syntax                       Description
 240     ============================ =======================================================
 241     offset:{-4096..4095}         Specifies a 13-bit signed offset as an
 242                                  :ref:`integer number <amdgpu_synid_integer_number>`.
 243     ============================ =======================================================
 244
 245 Examples:
 246
 247 .. parsed-literal::
 248
 249   offset:-4000
 250   offset:0x10
 251
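The range of a signed 13-bit offset can be checked with a small Python sketch. The helper name ``encode_signed_offset`` is hypothetical, and the two's complement field layout is an assumption made for illustration; this document only specifies the accepted range.

```python
def encode_signed_offset(value: int, bits: int = 13) -> int:
    """Range-check a signed offset and return its two's complement field."""
    lowest = -(1 << (bits - 1))        # -4096 for 13 bits
    highest = (1 << (bits - 1)) - 1    # 4095 for 13 bits
    if not lowest <= value <= highest:
        raise ValueError(
            f"offset {value} is outside the range {lowest}..{highest}"
        )
    # Assumption: the field stores the value in two's complement form.
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    for offset in (-4000, 0x10):
        print(offset, "->", hex(encode_signed_offset(offset)))
```
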
 252 glc
 253 ~~~
 254
 255 See a description :ref:`here<amdgpu_synid_glc>`.
 256
 257 slc
 258 ~~~
 259
 260 See a description :ref:`here<amdgpu_synid_slc>`.
 261
 262 tfe
 263 ~~~
 264
 265 See a description :ref:`here<amdgpu_synid_tfe>`.
 266
 267 nv
 268 ~~
 269
 270 See a description :ref:`here<amdgpu_synid_nv>`.
 271
 272 MIMG Modifiers
 273 --------------
 274
 275 .. _amdgpu_synid_dmask:
 276
 277 dmask
 278 ~~~~~
 279
 280 Specifies which channels (image components) are used by the operation. By default, no channels
 281 are used.
 282
 283     =============== =====================================================
 284     Syntax          Description
 285     =============== =====================================================
 286     dmask:{0..15}   Specifies image channels as a positive
 287                     :ref:`integer number <amdgpu_synid_integer_number>`.
 288
 289                     Each bit corresponds to one of 4 image
 290                     components (RGBA).
 291
 292                     If the specified bit value
 293                     is 0, the component is not used, value 1 means
 294                     that the component is used.
 295     =============== =====================================================
 296
 297 This modifier has some limitations depending on instruction kind:
 298
 299     =================================================== ========================
 300     Instruction Kind                                    Valid dmask Values
 301     =================================================== ========================
 302     32-bit atomic *cmpswap*                             0x3
 303     32-bit atomic instructions except for *cmpswap*     0x1
 304     64-bit atomic *cmpswap*                             0xF
 305     64-bit atomic instructions except for *cmpswap*     0x3
 306     *gather4*                                           0x1, 0x2, 0x4, 0x8
 307     Other instructions                                  any value
 308     =================================================== ========================
 309
 310 Examples:
 311
 312 .. parsed-literal::
 313
 314   dmask:0xf
 315   dmask:0b1111
 316   dmask:3
 317
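The bit-to-component mapping and the per-instruction limits above can be sketched in Python. This is a sketch under assumptions: the mapping of bit 0 to R (then G, B, A) is assumed, and the instruction-kind labels used as dictionary keys are hypothetical names invented for this example.

```python
# Assumption: bit 0 selects R, bit 1 selects G, bit 2 selects B, bit 3 selects A.
COMPONENTS = ("R", "G", "B", "A")

# Valid dmask values per instruction kind, taken from the table above.
# The keys are hypothetical labels, not assembler identifiers.
VALID_DMASK = {
    "atomic32_cmpswap": {0x3},
    "atomic32": {0x1},
    "atomic64_cmpswap": {0xF},
    "atomic64": {0x3},
    "gather4": {0x1, 0x2, 0x4, 0x8},
}


def dmask_components(dmask: int) -> list:
    """Return the image components selected by a dmask value."""
    if not 0 <= dmask <= 15:
        raise ValueError("dmask must be in the range 0..15")
    return [name for bit, name in enumerate(COMPONENTS) if dmask & (1 << bit)]


def is_valid_dmask(kind: str, dmask: int) -> bool:
    """Check a dmask value against the limits for an instruction kind."""
    allowed = VALID_DMASK.get(kind)
    if allowed is None:
        # Other instructions accept any value in the 4-bit range.
        return 0 <= dmask <= 15
    return dmask in allowed


if __name__ == "__main__":
    print(dmask_components(0b0011))
    print(is_valid_dmask("gather4", 0x3))
```
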
 318 .. _amdgpu_synid_unorm:
 319
 320 unorm
 321 ~~~~~
 322
 323 Specifies whether the address is normalized or not (the address is normalized by default).
 324
 325     ======================== ========================================
 326     Syntax                   Description
 327     ======================== ========================================
 328     unorm                    Force the address to be unnormalized.
 329     ======================== ========================================
 330
 331 glc
 332 ~~~
 333
 334 See a description :ref:`here<amdgpu_synid_glc>`.
 335
 336 slc
 337 ~~~
 338
 339 See a description :ref:`here<amdgpu_synid_slc>`.
 340
 341 .. _amdgpu_synid_r128:
 342
 343 r128
 344 ~~~~
 345
 346 Specifies texture resource size. The default size is 256 bits.
 347
 348 GFX7 and GFX8 only.
 349
 350     =================== ================================================
 351     Syntax              Description
 352     =================== ================================================
 353     r128                Specifies a 128-bit texture resource size.
 354     =================== ================================================
 355
 356 .. WARNING:: Using this modifier should decrease the *rsrc* operand size from 8 to 4 dwords, but the assembler does not currently support this feature.
 357
 358 tfe
 359 ~~~
 360
 361 See a description :ref:`here<amdgpu_synid_tfe>`.
 362
 363 .. _amdgpu_synid_lwe:
 364
 365 lwe
 366 ~~~
 367
 368 Specifies LOD warning status (LOD warning is disabled by default).
 369
 370     ======================================== ================================================
 371     Syntax                                   Description
 372     ======================================== ================================================
 373     lwe                                      Enables LOD warning.
 374     ======================================== ================================================
 375
 376 .. _amdgpu_synid_da:
 377
 378 da
 379 ~~
 380
 381 Specifies if an array index must be sent to TA. By default, the array index is not sent.
 382
 383     ======================================== ================================================
 384     Syntax                                   Description
 385     ======================================== ================================================
 386     da                                       Send an array index to TA.
 387     ======================================== ================================================
 388
 389 .. _amdgpu_synid_d16:
 390
 391 d16
 392 ~~~
 393
 394 Specifies data size: 16 or 32 bits (32 bits by default). Not supported by GFX7.
 395
 396     ======================================== ================================================
 397     Syntax                                   Description
 398     ======================================== ================================================
 399     d16                                      Enables 16-bit data mode.
 400
 401                                              On loads, convert data in memory to 16-bit
 402                                              format before storing it in VGPRs.
 403
 404                                              For stores, convert 16-bit data in VGPRs to
 405                                              32 bits before going to memory.
 406
 407                                              Note that GFX8.0 does not support data packing.
 408                                              Each 16-bit data element occupies 1 VGPR.
 409
 410                                              GFX8.1 and GFX9 support data packing.
 411                                              Each pair of 16-bit data elements
 412                                              occupies 1 VGPR.
 413     ======================================== ================================================
 414
 415 .. _amdgpu_synid_a16:
 416
 417 a16
 418 ~~~
 419
 420 Specifies the size of image address components: 16 or 32 bits (32 bits by default). GFX9 only.
 421
 422     ======================================== ================================================
 423     Syntax                                   Description
 424     ======================================== ================================================
 425     a16                                      Enables 16-bit image address components.
 426     ======================================== ================================================
 427
 428 Miscellaneous Modifiers
 429 -----------------------
 430
 431 .. _amdgpu_synid_glc:
 432
 433 glc
 434 ~~~
 435
 436 This modifier has a different meaning for loads, stores, and atomic operations.
 437 The default value is off (0).
 438
 439 See AMD documentation for details.
 440
 441     ======================================== ================================================
 442     Syntax                                   Description
 443     ======================================== ================================================
 444     glc                                      Set glc bit to 1.
 445     ======================================== ================================================
 446
 447 .. _amdgpu_synid_slc:
 448
 449 slc
 450 ~~~
 451
 452 Specifies cache policy. The default value is off (0).
 453
 454 See AMD documentation for details.
 455
 456     ======================================== ================================================
 457     Syntax                                   Description
 458     ======================================== ================================================
 459     slc                                      Set slc bit to 1.
 460     ======================================== ================================================
 461
 462 .. _amdgpu_synid_tfe:
 463
 464 tfe
 465 ~~~
 466
 467 Controls access to partially resident textures. The default value is off (0).
 468
 469 See AMD documentation for details.
 470
 471     ======================================== ================================================
 472     Syntax                                   Description
 473     ======================================== ================================================
 474     tfe                                      Set tfe bit to 1.
 475     ======================================== ================================================
 476
 477 .. _amdgpu_synid_nv:
 478
 479 nv
 480 ~~
 481
 482 Specifies whether the instruction operates on non-volatile memory. By default, memory is volatile.
 483
 484 GFX9 only.
 485
 486     ======================================== ================================================
 487     Syntax                                   Description
 488     ======================================== ================================================
 489     nv                                       Indicates that the instruction operates on
 490                                              non-volatile memory.
 491     ======================================== ================================================
 492
 493 MUBUF/MTBUF Modifiers
 494 ---------------------
 495
 496 .. _amdgpu_synid_idxen:
 497
 498 idxen
 499 ~~~~~
 500
 501 Specifies whether address components include an index. By default, no components are used.
 502
 503 Can be used together with :ref:`offen<amdgpu_synid_offen>`.
 504
 505 Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
 506
 507     ======================================== ================================================
 508     Syntax                                   Description
 509     ======================================== ================================================
 510     idxen                                    Address components include an index.
 511     ======================================== ================================================
 512
 513 .. _amdgpu_synid_offen:
 514
 515 offen
 516 ~~~~~
 517
 518 Specifies whether address components include an offset. By default, no components are used.
 519
 520 Can be used together with :ref:`idxen<amdgpu_synid_idxen>`.
 521
 522 Cannot be used with :ref:`addr64<amdgpu_synid_addr64>`.
 523
 524     ======================================== ================================================
 525     Syntax                                   Description
 526     ======================================== ================================================
 527     offen                                    Address components include an offset.
 528     ======================================== ================================================
 529
 530 .. _amdgpu_synid_addr64:
 531
 532 addr64
 533 ~~~~~~
 534
 535 Specifies whether a 64-bit address is used. By default, a 64-bit address is not used.
 536
 537 GFX7 only. Cannot be used with :ref:`offen<amdgpu_synid_offen>` and
 538 :ref:`idxen<amdgpu_synid_idxen>` modifiers.
 539
 540     ======================================== ================================================
 541     Syntax                                   Description
 542     ======================================== ================================================
 543     addr64                                   A 64-bit address is used.
 544     ======================================== ================================================
 545
 546 .. _amdgpu_synid_buf_offset12:
 547
 548 offset12
 549 ~~~~~~~~
 550
 551 Specifies an immediate unsigned 12-bit offset, in bytes. The default value is 0.
 552
 553     =============================== ======================================================
 554     Syntax                          Description
 555     =============================== ======================================================
 556     offset:{0..0xFFF}               Specifies a 12-bit unsigned offset as a positive
 557                                     :ref:`integer number <amdgpu_synid_integer_number>`.
 558     =============================== ======================================================
 559
 560 Examples:
 561
 562 .. parsed-literal::
 563
 564   offset:0
 565   offset:0x10
 566
 567 glc
 568 ~~~
 569
 570 See a description :ref:`here<amdgpu_synid_glc>`.
 571
 572 slc
 573 ~~~
 574
 575 See a description :ref:`here<amdgpu_synid_slc>`.
 576
 577 .. _amdgpu_synid_lds:
 578
 579 lds
 580 ~~~
 581
 582 Specifies where to store the result: VGPRs or LDS (VGPRs by default).
 583
 584     ======================================== ===========================
 585     Syntax                                   Description
 586     ======================================== ===========================
 587     lds                                      Store result in LDS.
 588     ======================================== ===========================
 589
 590 tfe
 591 ~~~
 592
 593 See a description :ref:`here<amdgpu_synid_tfe>`.
 594
 595 .. _amdgpu_synid_dfmt:
 596
 597 dfmt
 598 ~~~~
 599
 600 TBD
 601
 602 .. _amdgpu_synid_nfmt:
 603
 604 nfmt
 605 ~~~~
 606
 607 TBD
 608
 609 SMRD/SMEM Modifiers
 610 -------------------
 611
 612 glc
 613 ~~~
 614
 615 See a description :ref:`here<amdgpu_synid_glc>`.
 616
 617 nv
 618 ~~
 619
 620 See a description :ref:`here<amdgpu_synid_nv>`.
 621
 622 VINTRP Modifiers
 623 ----------------
 624
 625 .. _amdgpu_synid_high:
 626
 627 high
 628 ~~~~
 629
 630 Specifies which half of the LDS word to use. The low half of the LDS word is used by default.
 631 GFX9 only.
 632
 633     ======================================== ================================
 634     Syntax                                   Description
 635     ======================================== ================================
 636     high                                     Use high half of LDS word.
 637     ======================================== ================================
 638
 639 VOP1/VOP2 DPP Modifiers
 640 -----------------------
 641
 642 GFX8 and GFX9 only.
 643
 644 .. _amdgpu_synid_dpp_ctrl:
 645
 646 dpp_ctrl
 647 ~~~~~~~~
 648
 649 Specifies how data are shared between threads. This is a mandatory modifier.
 650 There is no default value.
 651
 652 Note: The lanes of a wavefront are organized in four banks and four rows.
 653
 654     ======================================== ================================================
 655     Syntax                                   Description
 656     ======================================== ================================================
 657     quad_perm:[{0..3},{0..3},{0..3},{0..3}]  Full permute of 4 threads.
 658     row_mirror                               Mirror threads within row.
 659     row_half_mirror                          Mirror threads within 1/2 row (8 threads).
 660     row_bcast:15                             Broadcast 15th thread of each row to next row.
 661     row_bcast:31                             Broadcast thread 31 to rows 2 and 3.
 662     wave_shl:1                               Wavefront left shift by 1 thread.
 663     wave_rol:1                               Wavefront left rotate by 1 thread.
 664     wave_shr:1                               Wavefront right shift by 1 thread.
 665     wave_ror:1                               Wavefront right rotate by 1 thread.
 666     row_shl:{1..15}                          Row shift left by 1-15 threads.
 667     row_shr:{1..15}                          Row shift right by 1-15 threads.
 668     row_ror:{1..15}                          Row rotate right by 1-15 threads.
 669     ======================================== ================================================
 670
 671 Note: Numeric parameters may be specified as either
 672 :ref:`integer numbers<amdgpu_synid_integer_number>` or
 673 :ref:`absolute expressions<amdgpu_synid_absolute_expression>`.
 674
 675 Examples:
 676
 677 .. parsed-literal::
 678
 679   quad_perm:[0, 1, 2, 3]
 680   row_shl:3
 681
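The ``quad_perm`` control can be modelled on a list of per-lane values with a short Python sketch. The function name ``apply_quad_perm`` is hypothetical, and the sketch assumes that lane *i* of each group of 4 lanes reads its value from the lane of the same group named by the *i*-th selector.

```python
def apply_quad_perm(values: list, selectors: list) -> list:
    """Permute per-lane values within each group of 4 lanes."""
    if len(selectors) != 4 or any(not 0 <= s <= 3 for s in selectors):
        raise ValueError("quad_perm needs 4 selectors in the range 0..3")
    if len(values) % 4 != 0:
        raise ValueError("the number of lanes must be a multiple of 4")

    result = []
    for lane in range(len(values)):
        quad_base = lane - lane % 4
        # Assumption: lane i of a quad reads from lane selectors[i] of that quad.
        result.append(values[quad_base + selectors[lane % 4]])
    return result


if __name__ == "__main__":
    lanes = list(range(8))
    print(apply_quad_perm(lanes, [0, 1, 2, 3]))  # identity permutation
    print(apply_quad_perm(lanes, [3, 2, 1, 0]))  # reverse each quad
```
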
 682 .. _amdgpu_synid_row_mask:
 683
 684 row_mask
 685 ~~~~~~~~
 686
 687 Controls which rows are enabled for data sharing. By default, all rows are enabled.
 688
 689 Note: The lanes of a wavefront are organized in four banks and four rows.
 690
 691     ======================================== =====================================================
 692     Syntax                                   Description
 693     ======================================== =====================================================
 694     row_mask:{0..15}                         Specifies a *row mask* as a positive
 695                                              :ref:`integer number <amdgpu_synid_integer_number>`.
 696
 697                                              Each of 4 bits in the mask controls one
 698                                              row (0 - disabled, 1 - enabled).
 699     ======================================== =====================================================
 700
 701 Examples:
 702
 703 .. parsed-literal::
 704
 705   row_mask:0xf
 706   row_mask:0b1010
 707   row_mask:0b1111
 708
 709 .. _amdgpu_synid_bank_mask:
 710
 711 bank_mask
 712 ~~~~~~~~~
 713
 714 Controls which banks are enabled for data sharing. By default, all banks are enabled.
 715
 716 Note: The lanes of a wavefront are organized in four banks and four rows.
 717
 718     ======================================== =======================================================
 719     Syntax                                   Description
 720     ======================================== =======================================================
 721     bank_mask:{0..15}                        Specifies a *bank mask* as a positive
 722                                              :ref:`integer number <amdgpu_synid_integer_number>`.
 723
 724                                              Each of 4 bits in the mask controls one
 725                                              bank (0 - disabled, 1 - enabled).
 726     ======================================== =======================================================
 727
 728 Examples:
 729
 730 .. parsed-literal::
 731
 732   bank_mask:0x3
 733   bank_mask:0b0011
 734   bank_mask:0b1111
 735
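Both ``row_mask`` and ``bank_mask`` are 4-bit masks, so the same Python sketch can list which rows or banks a given value enables. The helper name ``enabled_units`` is hypothetical, and the sketch assumes that bit *i* of the mask controls row (or bank) *i*.

```python
def enabled_units(mask: int) -> list:
    """Return the indices of the rows (or banks) enabled by a 4-bit mask."""
    if not 0 <= mask <= 15:
        raise ValueError("mask must be in the range 0..15")
    # Assumption: bit i controls row i (or bank i); 1 means enabled.
    return [index for index in range(4) if mask & (1 << index)]


if __name__ == "__main__":
    print(enabled_units(0b1010))  # a row_mask example from above
    print(enabled_units(0x3))     # a bank_mask example from above
```
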
 736 .. _amdgpu_synid_bound_ctrl:
 737
 738 bound_ctrl
 739 ~~~~~~~~~~
 740
 741 Controls data sharing when accessing an invalid lane. By default, data sharing with
 742 invalid lanes is disabled.
 743
 744     ======================================== ================================================
 745     Syntax                                   Description
 746     ======================================== ================================================
 747     bound_ctrl:0                             Enables data sharing with invalid lanes.
 748
 749                                              Accessing data from an invalid lane will
 750                                              return zero.
 751     ======================================== ================================================
 752
 753 VOP1/VOP2/VOPC SDWA Modifiers
 754 -----------------------------
 755
 756 GFX8 and GFX9 only.
 757
 758 clamp
 759 ~~~~~
 760
 761 See a description :ref:`here<amdgpu_synid_clamp>`.
 762
 763 omod
 764 ~~~~
 765
 766 See a description :ref:`here<amdgpu_synid_omod>`.
 767
 768 GFX9 only.
 769
 770 .. _amdgpu_synid_dst_sel:
 771
 772 dst_sel
 773 ~~~~~~~
 774
 775 Selects which bits in the destination are affected. By default, all bits are affected.
 776
 777     ======================================== ================================================
 778     Syntax                                   Description
 779     ======================================== ================================================
 780     dst_sel:DWORD                            Use bits 31:0.
 781     dst_sel:BYTE_0                           Use bits 7:0.
 782     dst_sel:BYTE_1                           Use bits 15:8.
 783     dst_sel:BYTE_2                           Use bits 23:16.
 784     dst_sel:BYTE_3                           Use bits 31:24.
 785     dst_sel:WORD_0                           Use bits 15:0.
 786     dst_sel:WORD_1                           Use bits 31:16.
 787     ======================================== ================================================
 788
 789
 790 .. _amdgpu_synid_dst_unused:
 791
 792 dst_unused
 793 ~~~~~~~~~~
 794
 795 Controls what to do with the bits in the destination which are not selected
 796 by :ref:`dst_sel<amdgpu_synid_dst_sel>`.
 797 By default, unused bits are preserved.
 798
 799     ======================================== ================================================
 800     Syntax                                   Description
 801     ======================================== ================================================
 802     dst_unused:UNUSED_PAD                    Pad with zeros.
 803     dst_unused:UNUSED_SEXT                   Sign-extend upper bits, zero lower bits.
 804     dst_unused:UNUSED_PRESERVE               Preserve bits.
 805     ======================================== ================================================
 806
 807 .. _amdgpu_synid_src0_sel:
 808
 809 src0_sel
 810 ~~~~~~~~
 811
 812 Controls which bits in the src0 are used. By default, all bits are used.
 813
 814     ======================================== ================================================
 815     Syntax                                   Description
 816     ======================================== ================================================
 817     src0_sel:DWORD                           Use bits 31:0.
 818     src0_sel:BYTE_0                          Use bits 7:0.
 819     src0_sel:BYTE_1                          Use bits 15:8.
 820     src0_sel:BYTE_2                          Use bits 23:16.
 821     src0_sel:BYTE_3                          Use bits 31:24.
 822     src0_sel:WORD_0                          Use bits 15:0.
 823     src0_sel:WORD_1                          Use bits 31:16.
 824     ======================================== ================================================
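
A few instances of this syntax, chosen only for illustration:

.. parsed-literal::

  src0_sel:DWORD
  src0_sel:BYTE_2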
 825
 826 .. _amdgpu_synid_src1_sel:
 827
 828 src1_sel
 829 ~~~~~~~~
 830
 831 Controls which bits of src1 are used. By default, all bits are used.
 832
 833     ======================================== ================================================
 834     Syntax                                   Description
 835     ======================================== ================================================
 836     src1_sel:DWORD                           Use bits 31:0.
 837     src1_sel:BYTE_0                          Use bits 7:0.
 838     src1_sel:BYTE_1                          Use bits 15:8.
 839     src1_sel:BYTE_2                          Use bits 23:16.
 840     src1_sel:BYTE_3                          Use bits 31:24.
 841     src1_sel:WORD_0                          Use bits 15:0.
 842     src1_sel:WORD_1                          Use bits 31:16.
 843     ======================================== ================================================
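
A few instances of this syntax, chosen only for illustration:

.. parsed-literal::

  src1_sel:WORD_0
  src1_sel:BYTE_3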
 844
 845 .. _amdgpu_synid_sdwa_operand_modifiers:
 846
 847 VOP1/VOP2/VOPC SDWA Operand Modifiers
 848 -------------------------------------
 849
 850 Operand modifiers are not used separately. They are applied to source operands.
 851
 852 GFX8 and GFX9 only.
 853
 854 abs
 855 ~~~
 856
 857 See a description :ref:`here<amdgpu_synid_abs>`.
 858
 859 neg
 860 ~~~
 861
 862 See a description :ref:`here<amdgpu_synid_neg>`.
 863
 864 .. _amdgpu_synid_sext:
 865
 866 sext
 867 ~~~~
 868
 869 Sign-extends the value of a sub-dword operand to fill all 32 bits.
 870 Has no effect for 32-bit operands.
 871
 872 Valid for integer operands only.
 873
 874     ======================================== ================================================
 875     Syntax                                   Description
 876     ======================================== ================================================
 877     sext(<operand>)                          Sign-extend operand value.
 878     ======================================== ================================================
 879
 880 Examples:
 881
 882 .. parsed-literal::
 883
 884   sext(v4)
 885   sext(v255)
 886
 887 VOP3 Modifiers
 888 --------------
 889
 890 .. _amdgpu_synid_vop3_op_sel:
 891
 892 op_sel
 893 ~~~~~~
 894
 895 Selects the low [15:0] or high [31:16] operand bits for source and destination operands.
 896 By default, low bits are used for all operands.
 897
 898 The number of values specified with the op_sel modifier must match the number of instruction
 899 operands (both source and destination). The first value controls src0, the second value controls
 900 src1 and so on, except that the last value controls the destination.
 901 The value 0 selects the low bits, while 1 selects the high bits.
 902
 903 Note. op_sel modifier affects 16-bit operands only. For 32-bit operands the value specified
 904 by op_sel must be 0.
 905
 906 GFX9 only.
 907
 908     ======================================== ============================================================
 909     Syntax                                   Description
 910     ======================================== ============================================================
 911     op_sel:[{0..1},{0..1}]                   Select operand bits for instructions with 1 source operand.
 912     op_sel:[{0..1},{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
 913     op_sel:[{0..1},{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
 914     ======================================== ============================================================
 915
 916 Examples:
 917
 918 .. parsed-literal::
 919
 920   op_sel:[0,0]
 921   op_sel:[0,1]
 922
 923 .. _amdgpu_synid_clamp:
 924
 925 clamp
 926 ~~~~~
 927
 928 The meaning of the clamp modifier depends on the instruction.
 929
 930 For *v_cmp* instructions, clamp modifier indicates that the compare signals
 931 if a floating point exception occurs. By default, signaling is disabled.
 932 Not supported by GFX7.
 933
 934 For integer operations, clamp modifier indicates that the result must be clamped
 935 to the range between the smallest and largest representable values. By default, there is no clamping.
 936 Integer clamping is not supported by GFX7.
 937
 938 For floating point operations, clamp modifier indicates that the result must be clamped
 939 to the range [0.0, 1.0]. By default, there is no clamping.
 940
 941 Note. Clamp modifier is applied after :ref:`output modifiers<amdgpu_synid_omod>` (if any).
 942
 943     ======================================== ================================================
 944     Syntax                                   Description
 945     ======================================== ================================================
 946     clamp                                    Enables clamping (or signaling).
 947     ======================================== ================================================
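
As a worked illustration of the ordering noted above (the input value is hypothetical):
if an f32 operation produces 0.75 and both *mul:2* and *clamp* are specified,
the output modifier is applied first and the clamp second:

.. parsed-literal::

  0.75 * 2 = 1.5            (output modifier mul:2)
  clamp(1.5) = 1.0          (clamping to the range [0.0, 1.0])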
 948
 949 .. _amdgpu_synid_omod:
 950
 951 omod
 952 ~~~~
 953
 954 Specifies whether an output modifier must be applied to the result.
 955 By default, no output modifiers are applied.
 956
 957 Note. Output modifiers are applied before :ref:`clamping<amdgpu_synid_clamp>` (if any).
 958
 959 Output modifiers are valid for f32 and f64 floating point results only.
 960 They must not be used with f16.
 961
 962 Note. *v_cvt_f16_f32* is an exception. This instruction produces f16 result
 963 but accepts output modifiers.
 964
 965     ======================================== ================================================
 966     Syntax                                   Description
 967     ======================================== ================================================
 968     mul:2                                    Multiply the result by 2.
 969     mul:4                                    Multiply the result by 4.
 970     div:2                                    Multiply the result by 0.5.
 971     ======================================== ================================================
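
A few instances of this syntax, chosen only for illustration:

.. parsed-literal::

  mul:2
  div:2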
 972
 973 .. _amdgpu_synid_vop3_operand_modifiers:
 974
 975 VOP3 Operand Modifiers
 976 ----------------------
 977
 978 Operand modifiers are not used separately. They are applied to source operands.
 979
 980 .. _amdgpu_synid_abs:
 981
 982 abs
 983 ~~~
 984
 985 Computes the absolute value of its operand. Applied before :ref:`neg<amdgpu_synid_neg>` (if any).
 986 Valid for floating point operands only.
 987
 988     ======================================== ================================================
 989     Syntax                                   Description
 990     ======================================== ================================================
 991     abs(<operand>)                           Get absolute value of operand.
 992     \|<operand>|                             The same as above.
 993     ======================================== ================================================
 994
 995 Examples:
 996
 997 .. parsed-literal::
 998
 999   abs(v36)
1000   \|v36|
1001
1002 .. _amdgpu_synid_neg:
1003
1004 neg
1005 ~~~
1006
1007 Computes the negated value of its operand. Applied after :ref:`abs<amdgpu_synid_abs>` (if any).
1008 Valid for floating point operands only.
1009
1010     ======================================== ================================================
1011     Syntax                                   Description
1012     ======================================== ================================================
1013     neg(<operand>)                           Get negative value of operand.
1014     -<operand>                               The same as above.
1015     ======================================== ================================================
1016
1017 Examples:
1018
1019 .. parsed-literal::
1020
1021   neg(v[0])
1022   -v4
1023
1024 VOP3P Modifiers
1025 ---------------
1026
1027 This section describes modifiers of *regular* VOP3P instructions.
1028
1029 *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16*
1030 instructions use these modifiers :ref:`in a special manner<amdgpu_synid_mad_mix>`.
1031
1032 GFX9 only.
1033
1034 .. _amdgpu_synid_op_sel:
1035
1036 op_sel
1037 ~~~~~~
1038
1039 Selects the low [15:0] or high [31:16] operand bits as input to the operation
1040 which results in the lower-half of the destination.
1041 By default, low bits are used for all operands.
1042
1043 The number of values specified by the *op_sel* modifier must match the number of source
1044 operands. First value controls src0, second value controls src1 and so on.
1045
1046 The value 0 selects the low bits, while 1 selects the high bits.
1047
1048     ================================= =============================================================
1049     Syntax                            Description
1050     ================================= =============================================================
1051     op_sel:[{0..1}]                   Select operand bits for instructions with 1 source operand.
1052     op_sel:[{0..1},{0..1}]            Select operand bits for instructions with 2 source operands.
1053     op_sel:[{0..1},{0..1},{0..1}]     Select operand bits for instructions with 3 source operands.
1054     ================================= =============================================================
1055
1056 Examples:
1057
1058 .. parsed-literal::
1059
1060   op_sel:[0,0]
1061   op_sel:[0,1,0]
1062
1063 .. _amdgpu_synid_op_sel_hi:
1064
1065 op_sel_hi
1066 ~~~~~~~~~
1067
1068 Selects the low [15:0] or high [31:16] operand bits as input to the operation
1069 which results in the upper-half of the destination.
1070 By default, high bits are used for all operands.
1071
1072 The number of values specified by the *op_sel_hi* modifier must match the number of source
1073 operands. First value controls src0, second value controls src1 and so on.
1074
1075 The value 0 selects the low bits, while 1 selects the high bits.
1076
1077     =================================== =============================================================
1078     Syntax                              Description
1079     =================================== =============================================================
1080     op_sel_hi:[{0..1}]                  Select operand bits for instructions with 1 source operand.
1081     op_sel_hi:[{0..1},{0..1}]           Select operand bits for instructions with 2 source operands.
1082     op_sel_hi:[{0..1},{0..1},{0..1}]    Select operand bits for instructions with 3 source operands.
1083     =================================== =============================================================
1084
1085 Examples:
1086
1087 .. parsed-literal::
1088
1089   op_sel_hi:[0,0]
1090   op_sel_hi:[0,0,1]
1091
1092 .. _amdgpu_synid_neg_lo:
1093
1094 neg_lo
1095 ~~~~~~
1096
1097 Specifies whether to change sign of operand values selected by
1098 :ref:`op_sel<amdgpu_synid_op_sel>`. These values are then used
1099 as input to the operation which results in the lower-half of the destination.
1100
1101 The number of values specified by this modifier must match the number of source
1102 operands. First value controls src0, second value controls src1 and so on.
1103
1104 The value 0 indicates that the corresponding operand value is used unmodified;
1105 the value 1 indicates that the negated value of the operand is used.
1106
1107 By default, operand values are used unmodified.
1108
1109 This modifier is valid for floating point operands only.
1110
1111     ================================ ==================================================================
1112     Syntax                           Description
1113     ================================ ==================================================================
1114     neg_lo:[{0..1}]                  Select affected operands for instructions with 1 source operand.
1115     neg_lo:[{0..1},{0..1}]           Select affected operands for instructions with 2 source operands.
1116     neg_lo:[{0..1},{0..1},{0..1}]    Select affected operands for instructions with 3 source operands.
1117     ================================ ==================================================================
1118
1119 Examples:
1120
1121 .. parsed-literal::
1122
1123   neg_lo:[0]
1124   neg_lo:[0,1]
1125
1126 .. _amdgpu_synid_neg_hi:
1127
1128 neg_hi
1129 ~~~~~~
1130
1131 Specifies whether to change sign of operand values selected by
1132 :ref:`op_sel_hi<amdgpu_synid_op_sel_hi>`. These values are then used
1133 as input to the operation which results in the upper-half of the destination.
1134
1135 The number of values specified by this modifier must match the number of source
1136 operands. First value controls src0, second value controls src1 and so on.
1137
1138 The value 0 indicates that the corresponding operand value is used unmodified;
1139 the value 1 indicates that the negated value of the operand is used.
1140
1141 By default, operand values are used unmodified.
1142
1143 This modifier is valid for floating point operands only.
1144
1145     =============================== ==================================================================
1146     Syntax                          Description
1147     =============================== ==================================================================
1148     neg_hi:[{0..1}]                 Select affected operands for instructions with 1 source operand.
1149     neg_hi:[{0..1},{0..1}]          Select affected operands for instructions with 2 source operands.
1150     neg_hi:[{0..1},{0..1},{0..1}]   Select affected operands for instructions with 3 source operands.
1151     =============================== ==================================================================
1152
1153 Examples:
1154
1155 .. parsed-literal::
1156
1157   neg_hi:[1,0]
1158   neg_hi:[0,1,1]
1159
1160 clamp
1161 ~~~~~
1162
1163 See a description :ref:`here<amdgpu_synid_clamp>`.
1164
1165 .. _amdgpu_synid_mad_mix:
1166
1167 VOP3P V_MAD_MIX Modifiers
1168 -------------------------
1169
1170 *v_mad_mix_f32*, *v_mad_mixhi_f16* and *v_mad_mixlo_f16* instructions
1171 use *op_sel* and *op_sel_hi* modifiers
1172 in a manner different from *regular* VOP3P instructions.
1173
1174 See a description below.
1175
1176 GFX9 only.
1177
1178 .. _amdgpu_synid_mad_mix_op_sel:
1179
1180 m_op_sel
1181 ~~~~~~~~
1182
1183 This modifier has meaning only for 16-bit source operands, as indicated by
1184 :ref:`m_op_sel_hi<amdgpu_synid_mad_mix_op_sel_hi>`.
1185 It selects either the low [15:0] or high [31:16] operand bits
1186 as input to the operation.
1187
1188 The number of values specified by the *op_sel* modifier must match the number of source
1189 operands. First value controls src0, second value controls src1 and so on.
1190
1191 The value 0 selects the low 16 bits, while 1 selects the high 16 bits.
1192
1193 By default, low bits are used for all operands.
1194
1195     =============================== ================================================
1196     Syntax                          Description
1197     =============================== ================================================
1198     op_sel:[{0..1},{0..1},{0..1}]   Select location of each 16-bit source operand.
1199     =============================== ================================================
1200
1201 Examples:
1202
1203 .. parsed-literal::
1204
1205   op_sel:[0,1,0]
1206
1207 .. _amdgpu_synid_mad_mix_op_sel_hi:
1208
1209 m_op_sel_hi
1210 ~~~~~~~~~~~
1211
1212 Selects the size of source operands: either 32 bits or 16 bits.
1213 By default, 32 bits are used for all source operands.
1214
1215 The number of values specified by the *op_sel_hi* modifier must match the number of source
1216 operands. First value controls src0, second value controls src1 and so on.
1217
1218 The value 0 indicates 32 bits; the value 1 indicates 16 bits.
1219
1220 The location of 16 bits in the operand may be specified by
1221 :ref:`m_op_sel<amdgpu_synid_mad_mix_op_sel>`.
1222
1223     ======================================== ====================================
1224     Syntax                                   Description
1225     ======================================== ====================================
1226     op_sel_hi:[{0..1},{0..1},{0..1}]         Select size of each source operand.
1227     ======================================== ====================================
1228
1229 Examples:
1230
1231 .. parsed-literal::
1232
1233   op_sel_hi:[1,1,1]
1234
1235 abs
1236 ~~~
1237
1238 See a description :ref:`here<amdgpu_synid_abs>`.
1239
1240 neg
1241 ~~~
1242
1243 See a description :ref:`here<amdgpu_synid_neg>`.
1244
1245 clamp
1246 ~~~~~
1247
1248 See a description :ref:`here<amdgpu_synid_clamp>`.