The Intel Query Processing Library (Intel ``QPL``) is an open-source library
that provides compression and decompression features based on the deflate
compression algorithm (RFC 1951).

The ``QPL`` compression relies on the Intel In-Memory Analytics Accelerator
(``IAA``) and Shared Virtual Memory (``SVM``) technology. These are new
features supported from the Intel 4th Gen Xeon Scalable processors,
codenamed Sapphire Rapids (``SPR``).

For more information about ``QPL``, please refer to `QPL Introduction
<https://intel.github.io/qpl/documentation/introduction_docs/introduction.html>`_

QPL Compression Framework
=========================

::

  +----------------+       +------------------+
  | MultiFD Thread |       |accel-config tool |
  +-------+--------+       +--------+---------+
          |                         |
          |                         |
          |compress/decompress      |
  +-------+--------+                | Setup IAA
  |  QPL library   |                | Resources
  +-------+---+----+                |
          |   |                     |
          |   +-------------+-------+
          |   Open IAA      |
          |   Devices +-----+-----+
          |           |idxd driver|
          |           +-----+-----+
          |                 |
          |                 |
          |           +-----+-----+
          +-----------+IAA Devices|
          Submit jobs +-----------+

QPL Build And Installation
--------------------------

.. code-block:: shell

  $git clone --recursive https://github.com/intel/qpl.git qpl
  $mkdir qpl/build
  $cd qpl/build
  $cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DQPL_LIBRARY_TYPE=SHARED ..
  $sudo cmake --build . --target install

55 For more details about ``QPL`` installation, please refer to `QPL Installation
56 <https://intel.github.io/qpl/documentation/get_started_docs/installation.html>`_
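As an optional sanity check after installation, you can confirm that the
library is visible on the system. This is only a sketch and assumes the
``/usr`` install prefix used above, with ``qpl/qpl.h`` as the library's
public C header.

.. code-block:: shell

  $ldconfig -p | grep qpl
  $ls /usr/include/qpl/qpl.h
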
IAA Device Management
---------------------

The number of ``IAA`` devices will vary depending on the Xeon product model.
On a ``SPR`` server, there can be a maximum of 8 ``IAA`` devices, with up to
4 devices per socket.

By default, all ``IAA`` devices are disabled and need to be configured and
enabled by users manually.
Check the number of devices through the following command

.. code-block:: shell

  #lspci -d 8086:0cfe
  6a:02.0 System peripheral: Intel Corporation Device 0cfe
  6f:02.0 System peripheral: Intel Corporation Device 0cfe
  74:02.0 System peripheral: Intel Corporation Device 0cfe
  79:02.0 System peripheral: Intel Corporation Device 0cfe
  e7:02.0 System peripheral: Intel Corporation Device 0cfe
  ec:02.0 System peripheral: Intel Corporation Device 0cfe
  f1:02.0 System peripheral: Intel Corporation Device 0cfe
  f6:02.0 System peripheral: Intel Corporation Device 0cfe

82 IAA Device Configuration And Enabling
83 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``accel-config`` tool is used to enable ``IAA`` devices and configure
``IAA`` hardware resources (work queues and engines). One ``IAA`` device
has 8 work queues and 8 processing engines; multiple engines can be assigned
to a work queue via the ``group`` attribute.
90 For ``accel-config`` installation, please refer to `accel-config installation
91 <https://github.com/intel/idxd-config>`_
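As an alternative to building ``accel-config`` from source, many
distributions ship it as a package; a hedged example for a dnf-based
distribution, assuming the package is named ``accel-config``:

.. code-block:: shell

  $sudo dnf install accel-config
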
One example of configuring and enabling an ``IAA`` device.

.. code-block:: shell

  #accel-config config-engine iax1/engine1.0 -g 0
  #accel-config config-engine iax1/engine1.1 -g 0
  #accel-config config-engine iax1/engine1.2 -g 0
  #accel-config config-engine iax1/engine1.3 -g 0
  #accel-config config-engine iax1/engine1.4 -g 0
  #accel-config config-engine iax1/engine1.5 -g 0
  #accel-config config-engine iax1/engine1.6 -g 0
  #accel-config config-engine iax1/engine1.7 -g 0
  #accel-config config-wq iax1/wq1.0 -g 0 -s 128 -p 10 -b 1 -t 128 -m shared -y user -n app1 -d user
  #accel-config enable-device iax1
  #accel-config enable-wq iax1/wq1.0

.. note::
   IAX is an early name for IAA

- The ``IAA`` device index is 1; use the ``ls -lh /sys/bus/dsa/devices/iax*``
  command to query the ``IAA`` device index.
115 - 8 engines and 1 work queue are configured in group 0, so all compression jobs
116 submitted to this work queue can be processed by all engines at the same time.
118 - Set work queue attributes including the work mode, work queue size and so on.
120 - Enable the ``IAA1`` device and work queue 1.0
.. note::

   Set the work queue mode to shared mode, since the ``QPL`` library only
   supports shared mode.

127 For more detailed configuration, please refer to `IAA Configuration Samples
128 <https://github.com/intel/idxd-config/tree/stable/Documentation/accfg>`_
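After the configuration above is applied, it can be double-checked with the
same tool; ``accel-config list`` prints the enabled devices and work queues
in JSON form. The sysfs path below is an assumption based on the idxd driver
layout and may differ between kernel versions.

.. code-block:: shell

  #accel-config list
  #cat /sys/bus/dsa/devices/iax1/wq1.0/mode
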
IAA Unit Test
^^^^^^^^^^^^^

- For enabling ``IAA`` devices on the Xeon platform, please refer to the `IAA User Guide
  <https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html>`_
- The ``IAA`` device driver is the Intel Data Accelerator Driver (idxd); a
  minimum Linux kernel version of 5.18 is recommended.
- Add the ``"intel_iommu=on,sm_on"`` parameter to the kernel command line
  to enable the ``SVM`` feature.
Here is an easy way to verify the ``IAA`` device driver and ``SVM`` with `iaa_test
<https://github.com/intel/idxd-config/tree/stable/test>`_

.. code-block:: shell

  #./test/iaa_test
   [ info] alloc wq 0 shared size 128 addr 0x7f26cebe5000 batch sz 0xfffffffe xfer sz 0x80000000
   [ info] test noop: tflags 0x1 num_desc 1
   [ info] preparing descriptor for noop
   [ info] Submitted all noop jobs
   [ info] verifying task result for 0x16f7e20
   [ info] test with op 0 passed

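Complementary to the test above, the kernel side can also be inspected
directly, e.g. whether the idxd driver is present and whether the
``intel_iommu=on,sm_on`` parameter from the earlier bullet is active on the
running kernel. This is only a sketch; on some kernels idxd is built in
rather than loaded as a module.

.. code-block:: shell

  #lsmod | grep idxd
  #grep -o "intel_iommu=[^ ]*" /proc/cmdline
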
156 IAA Resources Allocation For Migration
157 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are no ``IAA`` resource configuration parameters for migration, and
the ``accel-config`` tool configuration cannot directly specify the ``IAA``
resources used for migration.

The multifd migration with the ``QPL`` compression method will use all work
queues that are enabled and in shared mode.
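Because every enabled shared work queue is picked up automatically, it can be
useful to list them before starting a migration. The sketch below assumes the
idxd sysfs layout, where each work queue directory exposes ``mode`` and
``state`` attributes.

.. code-block:: shell

  #grep -H . /sys/bus/dsa/devices/iax*/wq*/mode
  #grep -H . /sys/bus/dsa/devices/iax*/wq*/state
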
Accessing ``IAA`` resources requires the ``sudo`` command or ``root``
privileges by default. Administrators can modify the ``IAA`` device node
ownership so that QEMU can use ``IAA`` with specified user permissions.
For example

.. code-block:: shell

  #chown -R qemu /dev/iax

Shared Virtual Memory (SVM) Introduction
========================================
``SVM`` is the ability for an accelerator I/O device to operate in the same
virtual memory space as applications on host processors. It also implies the
ability to operate from pageable memory, avoiding functional requirements to
pin memory for DMA operations.
When using ``SVM`` technology, users do not need to reserve memory for the
``IAA`` device or pin that memory for DMA. The ``IAA`` device can directly
access data using the virtual addresses of the process.
For more information on ``SVM`` technology, please refer to
189 `Shared Virtual Addressing (SVA) with ENQCMD
190 <https://docs.kernel.org/next/x86/sva.html>`_
193 How To Use QPL Compression In Migration
194 =======================================
1 - Install the ``QPL`` library, and the ``accel-config`` library if using ``IAA``
198 2 - Configure and enable ``IAA`` devices and work queues via ``accel-config``
200 3 - Build ``QEMU`` with ``--enable-qpl`` parameter
202 E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qpl``
4 - Enable ``QPL`` compression during migration

Set ``migrate_set_parameter multifd-compression qpl`` when migrating. The
``QPL`` compression does not support configuring the compression level; it
only supports one compression level. A complete command sequence is sketched
below.
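A minimal end-to-end sketch using the HMP monitor; the channel count,
destination IP and port are illustrative only and must be adapted to the
actual setup:

.. code-block:: shell

  # On the destination, start QEMU with -incoming and set matching parameters
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-compression qpl

  # On the source
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-channels 4
  (qemu) migrate_set_parameter multifd-compression qpl
  (qemu) migrate -d tcp:<dest-ip>:<port>
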
210 The Difference Between QPL And ZLIB
211 ===================================
Although both ``QPL`` and ``ZLIB`` are based on the deflate compression
algorithm, and ``QPL`` can handle the ``ZLIB`` header and trailer, ``QPL``
is still not fully compatible with ``ZLIB`` compression during migration.
``QPL`` only supports a 4K history buffer, while ``ZLIB`` uses 32K by default.
``ZLIB`` may compress data that ``QPL`` cannot decompress correctly, and
vice versa.
``QPL`` does not support the ``Z_SYNC_FLUSH`` operation used in ``ZLIB``
streaming compression. The current ``ZLIB`` implementation uses
``Z_SYNC_FLUSH``, so each ``multifd`` thread has a ``ZLIB`` streaming context,
and all page compression and decompression are based on this stream. ``QPL``
cannot decompress such data, and vice versa.
For an introduction to ``Z_SYNC_FLUSH``, please refer to the `Zlib Manual
228 <https://www.zlib.net/manual.html>`_
Best Practices
==============

When the user enables the ``IAA`` device for ``QPL`` compression, it is
recommended to add the ``-mem-prealloc`` parameter to the destination boot
parameters. This parameter can avoid the occurrence of I/O page faults and
reduce the overhead of ``IAA`` compression and decompression.
An example of booting with the ``-mem-prealloc`` parameter
.. code-block:: shell

  $qemu-system-x86_64 --enable-kvm -cpu host --mem-prealloc ...

An example of I/O page fault measurement on the destination without
``-mem-prealloc``; the ``svm_prq`` row indicates the number of I/O page fault
occurrences and the processing time.
.. code-block:: shell

  #echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 2 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 3 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 4 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
  IOMMU: dmar18 Register Base Address: c87fc000
                  <0.1us   0.1us-1us    1us-10us  10us-100us   100us-1ms    1ms-10ms      >=10ms     min(us)     max(us) average(us)
   inv_iotlb           0         286         123           0           0           0           0           0           1           0
  inv_devtlb           0         276         133           0           0           0           0           0           2           0
     inv_iec           0           0           0           0           0           0           0           0           0           0
     svm_prq           0           0       25206         364         395           0           0           1         556           9