The Intel Query Processing Library (Intel ``QPL``) is an open-source library
that provides compression and decompression features based on the deflate
compression algorithm (RFC 1951).

The ``QPL`` compression relies on the Intel In-Memory Analytics Accelerator
(``IAA``) and Shared Virtual Memory (``SVM``) technology. These are new
features supported from the Intel 4th Gen Xeon Scalable processors,
codenamed Sapphire Rapids (``SPR``).

For more information about ``QPL``, please refer to `QPL Introduction
<https://intel.github.io/qpl/documentation/introduction_docs/introduction.html>`_

QPL Compression Framework
=========================

::

  +----------------+       +------------------+
  | MultiFD Thread |       |accel-config tool |
  +-------+--------+       +--------+---------+
          |                         |
          |                         |
          |compress/decompress      |
  +-------+--------+                | Setup IAA
  |  QPL library   |                | Resources
  +-------+---+----+                |
          |   |                     |
          |   +-------------+-------+
          |   Open IAA      |
          |   Devices +-----+-----+
          |           |idxd driver|
          |           +-----+-----+
          |                 |
          |                 |
          |           +-----+-----+
          +-----------+IAA Devices|
          Submit jobs +-----------+

QPL Build And Installation
--------------------------

.. code-block:: shell

  $git clone --recursive https://github.com/intel/qpl.git qpl
  $mkdir qpl/build
  $cd qpl/build
  $cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DQPL_LIBRARY_TYPE=SHARED ..
  $sudo cmake --build . --target install

55 For more details about ``QPL`` installation, please refer to `QPL Installation
56 <https://intel.github.io/qpl/documentation/get_started_docs/installation.html>`_
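As an optional sanity check after installation, you can confirm that the
library is visible on the system. This is only a sketch and assumes the
``/usr`` install prefix used above, with ``qpl/qpl.h`` as the library's
public C header.

.. code-block:: shell

  $ldconfig -p | grep qpl
  $ls /usr/include/qpl/qpl.h
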
IAA Device Management
---------------------

The number of ``IAA`` devices will vary depending on the Xeon product model.
On a ``SPR`` server, there can be a maximum of 8 ``IAA`` devices, with up to
4 devices per socket.

By default, all ``IAA`` devices are disabled and need to be configured and
enabled by users manually.
Check the number of devices through the following command

.. code-block:: shell

  #lspci -d 8086:0cfe
  6a:02.0 System peripheral: Intel Corporation Device 0cfe
  6f:02.0 System peripheral: Intel Corporation Device 0cfe
  74:02.0 System peripheral: Intel Corporation Device 0cfe
  79:02.0 System peripheral: Intel Corporation Device 0cfe
  e7:02.0 System peripheral: Intel Corporation Device 0cfe
  ec:02.0 System peripheral: Intel Corporation Device 0cfe
  f1:02.0 System peripheral: Intel Corporation Device 0cfe
  f6:02.0 System peripheral: Intel Corporation Device 0cfe

82 IAA Device Configuration And Enabling
83 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``accel-config`` tool is used to enable ``IAA`` devices and configure
``IAA`` hardware resources (work queues and engines). One ``IAA`` device
has 8 work queues and 8 processing engines; multiple engines can be assigned
to a work queue via the ``group`` attribute.
90 For ``accel-config`` installation, please refer to `accel-config installation
91 <https://github.com/intel/idxd-config>`_
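As an alternative to building ``accel-config`` from source, many
distributions ship it as a package; a hedged example for a dnf-based
distribution, assuming the package is named ``accel-config``:

.. code-block:: shell

  $sudo dnf install accel-config
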
One example of configuring and enabling an ``IAA`` device.

.. code-block:: shell

  #accel-config config-engine iax1/engine1.0 -g 0
  #accel-config config-engine iax1/engine1.1 -g 0
  #accel-config config-engine iax1/engine1.2 -g 0
  #accel-config config-engine iax1/engine1.3 -g 0
  #accel-config config-engine iax1/engine1.4 -g 0
  #accel-config config-engine iax1/engine1.5 -g 0
  #accel-config config-engine iax1/engine1.6 -g 0
  #accel-config config-engine iax1/engine1.7 -g 0
  #accel-config config-wq iax1/wq1.0 -g 0 -s 128 -p 10 -b 1 -t 128 -m shared -y user -n app1 -d user
  #accel-config enable-device iax1
  #accel-config enable-wq iax1/wq1.0

.. note::
   IAX is an early name for IAA

- The ``IAA`` device index is 1; use the ``ls -lh /sys/bus/dsa/devices/iax*``
  command to query the ``IAA`` device index.
115 - 8 engines and 1 work queue are configured in group 0, so all compression jobs
116 submitted to this work queue can be processed by all engines at the same time.
118 - Set work queue attributes including the work mode, work queue size and so on.
120 - Enable the ``IAA1`` device and work queue 1.0
.. note::

   Set the work queue mode to shared mode, since the ``QPL`` library only
   supports shared mode.

127 For more detailed configuration, please refer to `IAA Configuration Samples
128 <https://github.com/intel/idxd-config/tree/stable/Documentation/accfg>`_
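After the configuration above is applied, it can be double-checked with the
same tool; ``accel-config list`` prints the enabled devices and work queues
in JSON form. The sysfs path below is an assumption based on the idxd driver
layout and may differ between kernel versions.

.. code-block:: shell

  #accel-config list
  #cat /sys/bus/dsa/devices/iax1/wq1.0/mode
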
IAA Unit Test
^^^^^^^^^^^^^

- For enabling ``IAA`` devices on the Xeon platform, please refer to the `IAA User Guide
  <https://www.intel.com/content/www/us/en/content-details/780887/intel-in-memory-analytics-accelerator-intel-iaa.html>`_
- The ``IAA`` device driver is the Intel Data Accelerator Driver (idxd); a
  minimum Linux kernel version of 5.18 is recommended.
- Add the ``"intel_iommu=on,sm_on"`` parameter to the kernel command line
  to enable the ``SVM`` feature.
Here is an easy way to verify the ``IAA`` device driver and ``SVM`` with `iaa_test
<https://github.com/intel/idxd-config/tree/stable/test>`_

.. code-block:: shell

  #./test/iaa_test
   [ info] alloc wq 0 shared size 128 addr 0x7f26cebe5000 batch sz 0xfffffffe xfer sz 0x80000000
   [ info] test noop: tflags 0x1 num_desc 1
   [ info] preparing descriptor for noop
   [ info] Submitted all noop jobs
   [ info] verifying task result for 0x16f7e20
   [ info] test with op 0 passed

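Complementary to the test above, the kernel side can also be inspected
directly, e.g. whether the idxd driver is present and whether the
``intel_iommu=on,sm_on`` parameter from the earlier bullet is active on the
running kernel. This is only a sketch; on some kernels idxd is built in
rather than loaded as a module.

.. code-block:: shell

  #lsmod | grep idxd
  #grep -o "intel_iommu=[^ ]*" /proc/cmdline
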
156 IAA Resources Allocation For Migration
157 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There are no ``IAA`` resource configuration parameters for migration, and
the ``accel-config`` tool configuration cannot directly specify the ``IAA``
resources used for migration.

The multifd migration with the ``QPL`` compression method will use all work
queues that are enabled and in shared mode.
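Because every enabled shared work queue is picked up automatically, it can be
useful to list them before starting a migration. The sketch below assumes the
idxd sysfs layout, where each work queue directory exposes ``mode`` and
``state`` attributes.

.. code-block:: shell

  #grep -H . /sys/bus/dsa/devices/iax*/wq*/mode
  #grep -H . /sys/bus/dsa/devices/iax*/wq*/state
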
Accessing ``IAA`` resources requires the ``sudo`` command or ``root``
privileges by default. Administrators can modify the ``IAA`` device node
ownership so that QEMU can use ``IAA`` with specified user permissions.
For example

.. code-block:: shell

  #chown -R qemu /dev/iax

Shared Virtual Memory (SVM) Introduction
========================================
``SVM`` is the ability for an accelerator I/O device to operate in the same
virtual memory space as applications on host processors. It also implies the
ability to operate from pageable memory, avoiding functional requirements to
pin memory for DMA operations.
When using ``SVM`` technology, users do not need to reserve memory for the
``IAA`` device or pin that memory for DMA. The ``IAA`` device can directly
access data using the virtual addresses of the process.
For more information on ``SVM`` technology, please refer to
189 `Shared Virtual Addressing (SVA) with ENQCMD
190 <https://docs.kernel.org/next/x86/sva.html>`_
193 How To Use QPL Compression In Migration
194 =======================================
1 - Install the ``QPL`` library, and the ``accel-config`` library if using ``IAA``
198 2 - Configure and enable ``IAA`` devices and work queues via ``accel-config``
200 3 - Build ``QEMU`` with ``--enable-qpl`` parameter
202 E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qpl``
4 - Enable ``QPL`` compression during migration

Set ``migrate_set_parameter multifd-compression qpl`` when migrating. The
``QPL`` compression does not support configuring the compression level; it
only supports one compression level. A complete command sequence is sketched
below.
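A minimal end-to-end sketch using the HMP monitor; the channel count,
destination IP and port are illustrative only and must be adapted to the
actual setup:

.. code-block:: shell

  # On the destination, start QEMU with -incoming and set matching parameters
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-compression qpl

  # On the source
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-channels 4
  (qemu) migrate_set_parameter multifd-compression qpl
  (qemu) migrate -d tcp:<dest-ip>:<port>
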
210 The Difference Between QPL And ZLIB
211 ===================================
Although both ``QPL`` and ``ZLIB`` are based on the deflate compression
algorithm, and ``QPL`` can handle the ``ZLIB`` header and trailer, ``QPL``
is still not fully compatible with ``ZLIB`` compression during migration.
``QPL`` only supports a 4K history buffer, while ``ZLIB`` uses 32K by default.
``ZLIB`` may compress data that ``QPL`` cannot decompress correctly, and
vice versa.
``QPL`` does not support the ``Z_SYNC_FLUSH`` operation used in ``ZLIB``
streaming compression. The current ``ZLIB`` implementation uses
``Z_SYNC_FLUSH``, so each ``multifd`` thread has a ``ZLIB`` streaming context,
and all page compression and decompression are based on this stream. ``QPL``
cannot decompress such data, and vice versa.
For an introduction to ``Z_SYNC_FLUSH``, please refer to the `Zlib Manual
228 <https://www.zlib.net/manual.html>`_
Best Practices
==============

When the user enables the ``IAA`` device for ``QPL`` compression, it is
recommended to add the ``-mem-prealloc`` parameter to the destination boot
parameters. This parameter can avoid the occurrence of I/O page faults and
reduce the overhead of ``IAA`` compression and decompression.
An example of booting with the ``-mem-prealloc`` parameter
.. code-block:: shell

  $qemu-system-x86_64 --enable-kvm -cpu host --mem-prealloc ...

An example of I/O page fault measurement on the destination without
``-mem-prealloc``; the ``svm_prq`` row indicates the number of I/O page fault
occurrences and the processing time.
.. code-block:: shell

  #echo 1 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 2 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 3 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #echo 4 > /sys/kernel/debug/iommu/intel/dmar_perf_latency
  #cat /sys/kernel/debug/iommu/intel/dmar_perf_latency
  IOMMU: dmar18 Register Base Address: c87fc000
                  <0.1us   0.1us-1us    1us-10us  10us-100us   100us-1ms    1ms-10ms      >=10ms     min(us)     max(us) average(us)
   inv_iotlb           0         286         123           0           0           0           0           0           1           0
  inv_devtlb           0         276         133           0           0           0           0           0           2           0
     inv_iec           0           0           0           0           0           0           0           0           0           0
     svm_prq           0           0       25206         364         395           0           0           1         556           9