.xz, .lzma, and .lz Test Files
------------------------------

This directory contains a bunch of files to test handling of .xz,
.lzma (LZMA_Alone), and .lz (lzip) files in decoder implementations.
Many of the files have been created by hand with a hex editor, thus
there is no better "source code" than the files themselves. All the
test files and this README may be distributed under the terms of
the BSD Zero Clause License (0BSD).

Good files (good-*) must decode successfully without requiring
a lot of CPU time or RAM.

Unsupported files (unsupported-*) are good files, but headers
indicate features not supported by the current file format
specification.

Bad files (bad-*) must cause the decoder to give an error. Like
with the good files, these files must not require a lot of CPU
time or RAM before they are detected as broken.

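The naming convention above lends itself to a small test driver. The
following is only a sketch, assuming Python's standard lzma module is
used as the decoder under test; the function names are made up for
this example. It looks at the first Stream only, so it cannot catch
problems in Stream Padding or in later Streams, it doesn't handle .lz
files, and it doesn't measure CPU time or RAM.

```python
import lzma


def expected_category(filename: str) -> str:
    """Map a test file name to the category implied by its prefix."""
    for prefix in ("good", "unsupported", "bad"):
        if filename.startswith(prefix + "-"):
            return prefix
    raise ValueError(f"unrecognized test file name: {filename!r}")


def first_stream_decodes(data: bytes) -> bool:
    """Return True if the first .xz or .lzma stream decodes completely."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_AUTO)
    try:
        decoder.decompress(data)
    except lzma.LZMAError:
        return False
    # eof stays False if the input ended before the end of the stream.
    return decoder.eof
```

A driver could then require first_stream_decodes() to be true for
every good-* file and false for the bad-* files whose damage is in
the first Stream.
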
2. Descriptions of Individual .xz Files

2.1. Good Files

good-0-empty.xz has one Stream with no Blocks.

good-0pad-empty.xz has one Stream with no Blocks followed by
four-byte Stream Padding.

good-0cat-empty.xz has two zero-Block Streams concatenated without
Stream Padding.

good-0catpad-empty.xz has two zero-Block Streams concatenated with
four-byte Stream Padding between the Streams.

good-1-check-none.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and no integrity check.

good-1-check-crc32.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and CRC32 check.

good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.

good-1-check-sha256.xz is like good-1-check-crc32.xz but with
SHA-256.

good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
LZMA2 chunk in each Block.

good-1-block_header-1.xz has both Compressed Size and Uncompressed
Size in the Block Header. This file also has four extra bytes of
Header Padding.

good-1-block_header-2.xz has known Compressed Size.

good-1-block_header-3.xz has known Uncompressed Size.

good-1-delta-lzma2.tiff.xz is an image file that compresses
better with Delta+LZMA2 than with plain LZMA2.

good-1-arm64-lzma2-1.xz uses the ARM64 filter and LZMA2. The
uncompressed data is constructed so that it tests integer
wrap around and sign extension. To recreate the file, compress
using XZ Utils 5.4.x (newer may or may not work too):

    ./debug/testfilegen-arm64 \
        | xz -T1 -Ccrc32 --arm64 \
            --lzma2=dict=64KiB,lp=2,lc=2 \
        > good-1-arm64-lzma2-1.xz

good-1-arm64-lzma2-2.xz is like good-1-arm64-lzma2-1.xz but with
non-zero start offset. XZ Embedded doesn't support this file.
To recreate the file, compress using XZ Utils 5.4.x (newer may or
may not work too):

    ./debug/testfilegen-arm64 \
        | xz -T1 -Ccrc32 --arm64=start=4294963200 \
            --lzma2=dict=64KiB,lp=2,lc=2 \
        > good-1-arm64-lzma2-2.xz

good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
new properties.

good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
the state without specifying new properties.

good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
and the second sets new properties.

good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
uncompressed with dictionary reset, and third is LZMA with new
properties but without dictionary reset.

good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
payload marker. XZ Utils 5.0.1 and older incorrectly see this file
as corrupt.

good-1-3delta-lzma2.xz has three Delta filters and LZMA2.

good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
return LZMA_BUF_ERROR in some cases. See commit message
d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.

2.2. Unsupported Files

unsupported-check.xz uses Check ID 0x02 which isn't supported by
the current version of the file format. It is implementation-defined
how this file is handled (a decoder may reject it, or decode it,
possibly with a warning).

unsupported-block_header.xz has a non-null byte in Header Padding,
which may indicate presence of a new unsupported field.

unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.

unsupported-filter_flags-2.xz specifies only Delta filter in the
List of Filter Flags, but Delta isn't allowed as the last filter in
the chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.

unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
List of Filter Flags. LZMA2 is allowed only as the last filter in the
chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.

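A decoder can skip over the Check field of unsupported-check.xz
because, as far as the .xz file format specification describes it,
the size of the Check field is determined by the Check ID even for
reserved IDs. The sketch below encodes that table as recalled from
the specification; the names are made up for this example, so verify
the values against the specification before relying on them.

```python
# Check IDs that have an assigned meaning in the current format.
KNOWN_CHECKS = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}

# Size of the Check field in bytes, indexed by Check ID (0x00 to 0x0F).
CHECK_SIZES = (0, 4, 4, 4, 8, 8, 8, 16, 16, 16, 32, 32, 32, 64, 64, 64)


def check_field_size(check_id: int) -> int:
    """Return the size of the Check field for the given Check ID."""
    if not 0 <= check_id <= 0x0F:
        raise ValueError("Check ID must be in the range 0x00-0x0F")
    return CHECK_SIZES[check_id]


def check_is_known(check_id: int) -> bool:
    """Return True if the Check ID has an assigned meaning."""
    return check_id in KNOWN_CHECKS
```

With this table, Check ID 0x02 is unknown but still has a four-byte
Check field, which is what lets a decoder continue without verifying
the integrity of the data.
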
2.3. Bad Files

bad-0pad-empty.xz has one Stream with no Blocks followed by
five-byte Stream Padding. Stream Padding must be a multiple of four
bytes, thus this file is corrupt.

bad-0catpad-empty.xz has two zero-Block Streams concatenated with
five-byte Stream Padding between the Streams.

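The Stream Padding rule that the two files above violate can be
expressed in a few lines. This is only a sketch with made-up function
names; it assumes the caller has already located the end of the
preceding Stream.

```python
def split_stream_padding(data: bytes) -> tuple[bytes, bytes]:
    """Split the bytes that follow a Stream into (padding, rest).

    The next Stream Header cannot start with a null byte, so every
    leading null byte is treated as Stream Padding.
    """
    padding_size = len(data) - len(data.lstrip(b"\x00"))
    return data[:padding_size], data[padding_size:]


def stream_padding_is_valid(padding: bytes) -> bool:
    """Stream Padding must be null bytes and a multiple of four bytes."""
    return len(padding) % 4 == 0 and not any(padding)
```

Four null bytes pass the check while five null bytes do not, which
matches the difference between good-0pad-empty.xz and
bad-0pad-empty.xz.
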
bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
LZMA_Alone file.

bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
wrong in the Header Magic Bytes field of the second Stream. liblzma
gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
the first Stream of a file has invalid Header Magic Bytes.)

bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
this.

bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
this.

bad-0-empty-truncated.xz is good-0-empty.xz without the last byte
of the file.

bad-0-nonempty_index.xz has no Blocks but Index claims that there is
one Block.

bad-0-backward_size.xz has wrong Backward Size in Stream Footer.

bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
and Stream Footer.

bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.

bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.

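Several of the files above damage the Stream Header or the Stream
Footer. The sketch below shows the kind of checks that catch them.
It is based on the field layout as described in the .xz file format
specification (12-byte header and footer, little-endian CRC32), the
function names are made up for this example, and it is not how
liblzma is implemented.

```python
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"
FOOTER_MAGIC = b"YZ"


def parse_stream_header(header: bytes) -> bytes:
    """Validate a 12-byte Stream Header and return its Stream Flags."""
    if len(header) != 12:
        raise ValueError("Stream Header must be 12 bytes")
    if header[:6] != HEADER_MAGIC:
        raise ValueError("wrong Header Magic Bytes")
    flags = header[6:8]
    (stored_crc,) = struct.unpack("<I", header[8:12])
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("wrong CRC32 in Stream Header")
    return flags


def parse_stream_footer(footer: bytes) -> tuple[int, bytes]:
    """Validate a 12-byte Stream Footer.

    Returns (size of the Index field in bytes, Stream Flags).
    """
    if len(footer) != 12:
        raise ValueError("Stream Footer must be 12 bytes")
    if footer[10:12] != FOOTER_MAGIC:
        raise ValueError("wrong Footer Magic Bytes")
    stored_crc, stored_backward_size = struct.unpack("<II", footer[:8])
    # The CRC32 covers Backward Size and Stream Flags.
    if zlib.crc32(footer[4:10]) != stored_crc:
        raise ValueError("wrong CRC32 in Stream Footer")
    # The stored value is the real size divided by four, minus one.
    return (stored_backward_size + 1) * 4, footer[8:10]
```

A caller would additionally compare the two Stream Flags values
(bad-1-stream_flags-1.xz) and compare the decoded Backward Size with
the actual size of the Index (bad-0-backward_size.xz).
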
bad-1-vli-1.xz has a two-byte variable-length integer in the
Uncompressed Size field in Block Header while one byte would be enough
for that value. It's important that the file gets rejected due to too
big integer encoding instead of due to Uncompressed Size not matching
the value stored in the Block Header. That is, the decoder must not
try to decode the Compressed Data field.

bad-1-vli-2.xz has a ten-byte variable-length integer as Uncompressed
Size in Block Header. It's important that the file gets rejected due
to too big integer encoding instead of due to Uncompressed Size not
matching the value stored in the Block Header. That is, the decoder
must not try to decode the Compressed Data field.

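Both rejections above happen while decoding the integer itself. The
following sketch of a variable-length integer decoder follows the
encoding as described in the .xz file format specification: seven
bits per byte, least significant group first, at most nine bytes,
and no zero byte at the end of a multi-byte encoding. The function
name is made up for this example.

```python
def decode_vli(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one variable-length integer starting at buf[pos].

    Returns (value, position of the first byte after the integer).
    """
    value = 0
    for i in range(9):  # nine bytes of 7 bits each cover 63 bits
        if pos + i >= len(buf):
            raise ValueError("truncated variable-length integer")
        byte = buf[pos + i]
        if byte == 0x00 and i > 0:
            # A shorter encoding would have been enough.
            raise ValueError("variable-length integer is not minimal")
        value |= (byte & 0x7F) << (7 * i)
        if not (byte & 0x80):
            return value, pos + i + 1
    raise ValueError("variable-length integer is longer than nine bytes")
```

A decoder built this way fails on the Block Header of both files
before it ever looks at the Compressed Data field.
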
bad-1-block_header-1.xz has Block Header that ends in the middle of
the Filter Flags field.

bad-1-block_header-2.xz has Block Header that has Compressed Size and
Uncompressed Size but no List of Filter Flags field.

bad-1-block_header-3.xz has wrong CRC32 in Block Header.

bad-1-block_header-4.xz has too big Compressed Size in Block Header
(2^63 - 1 bytes while maximum is a little less, because the whole
Block must stay smaller than 2^63). It's important that the file
gets rejected due to invalid Compressed Size value; the decoder
must not try decoding the Compressed Data field.

bad-1-block_header-5.xz has zero as Compressed Size in Block Header.

bad-1-block_header-6.xz has corrupt Block Header which may crash
xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
c0297445064951807803457dca1611b3c47e7f0f.

bad-2-index-1.xz has wrong Unpadded Sizes in Index.

bad-2-index-2.xz has wrong Uncompressed Sizes in Index.

bad-2-index-3.xz has non-null byte in Index Padding.

bad-2-index-4.xz has wrong CRC32 in Index.

bad-2-index-5.xz has zero as Unpadded Size. It is important that the
file gets rejected specifically due to Unpadded Size having an invalid
value.

bad-3-index-uncomp-overflow.xz has Index whose Uncompressed Size
fields have huge values whose sum exceeds the maximum allowed size
of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
check for the uncompressed size and thus doesn't catch the error
when decoding the Index field in this file. As a result, "xz -l"
doesn't detect the error and displays 0 as the uncompressed size.
Note that regular decompression isn't affected by this bug because
it uses lzma_index_hash_append() instead.

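The arithmetic behind this file can be illustrated as follows. This
is only a sketch: the three sizes are hypothetical values chosen so
that their sum is 2^64, not the values stored in the file, and the
functions model the arithmetic rather than the liblzma code.

```python
VLI_MAX = 2**63 - 1      # largest size the format allows
UINT64_MASK = 2**64 - 1  # models unsigned 64-bit wraparound


def wrapping_total(sizes: list[int]) -> int:
    """Sum the sizes the way unchecked 64-bit arithmetic would."""
    total = 0
    for size in sizes:
        total = (total + size) & UINT64_MASK
    return total


def checked_total(sizes: list[int]) -> int:
    """Sum the sizes, refusing totals larger than VLI_MAX."""
    total = 0
    for size in sizes:
        if size > VLI_MAX - total:
            raise OverflowError("total uncompressed size exceeds 2^63 - 1")
        total += size
    return total


# Hypothetical Uncompressed Size values whose sum is exactly 2^64.
example_sizes = [2**63 - 1, 2**63 - 1, 2]
```

With these values the unchecked sum wraps around to 0, which matches
the size that "xz -l" displays when the overflow check is missing,
while the checked sum reports an error at the second value.
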
bad-2-compressed_data_padding.xz has non-null byte in the padding of
the Compressed Data field of the first Block.

bad-1-check-crc32.xz has wrong Check (CRC32).

bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
Block Header but wrong Check (CRC32) in the actual data. This file
differs by one byte from good-1-block_header-1.xz: the last byte of
the Check field is wrong. This file is useful for testing error
detection in the threaded decoder when a worker thread is configured
to pass input one byte at a time to the Block decoder.

bad-1-check-crc64.xz has wrong Check (CRC64).

bad-1-check-sha256.xz has wrong Check (SHA-256).

bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
doesn't reset the dictionary.

bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
indicates dictionary reset, but the LZMA compressed data tries to
repeat data from the previous chunk.

bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0) in
the first chunk.

bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
as it should, but the second chunk tries to reset state without
specifying properties for LZMA.

bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
anything in the header of the second chunk.

bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).

bad-1-lzma2-7.xz has EOPM at LZMA level.

bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
properties in the third LZMA2 chunk.

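Many of the LZMA2 files above differ only in the sequence of chunk
control bytes. The sketch below models the reset rules as they are
commonly understood: 0x00 ends the stream, 0x01 and 0x02 are
uncompressed chunks with and without dictionary reset, 0x03-0x7F are
reserved, and in LZMA chunks (0x80-0xFF) bits 5-6 select state reset,
new properties, and dictionary reset. It validates control bytes
only, the function name is made up, and the byte sequences in the
usage note are illustrative rather than copied from the test files.

```python
def validate_lzma2_controls(controls: list[int]) -> None:
    """Raise ValueError if the control byte sequence breaks the rules."""
    need_dict_reset = True  # the first chunk must reset the dictionary
    need_props = True       # the first LZMA chunk must set properties
    for control in controls:
        if control == 0x00:
            return  # end marker
        if 0x03 <= control <= 0x7F:
            raise ValueError("reserved control byte")
        if control == 0x01 or control >= 0xE0:
            # A dictionary reset requires the next LZMA chunk to set
            # new properties.
            need_dict_reset = False
            need_props = True
        elif need_dict_reset:
            raise ValueError("first chunk doesn't reset the dictionary")
        if control >= 0xC0:
            need_props = False  # this LZMA chunk sets new properties
        elif control >= 0x80 and need_props:
            raise ValueError("LZMA chunk lacks the required properties")
    raise ValueError("missing end marker")
```

For example, [0x01, 0xC0, 0x00] is accepted (uncompressed chunk with
dictionary reset, then LZMA with new properties), while
[0x01, 0xA0, 0x00] is rejected because the LZMA chunk resets the
state without specifying properties.
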
bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
an LZMA2 chunk (no end marker). The uncompressed size of the partial
LZMA2 stream exceeds the value stored in the Block Header.

bad-1-lzma2-10.xz has LZMA2 stream that, from the point of view of an
LZMA2 decoder, extends past the end of Block (and even the end of
the file). Uncompressed Size in Block Header is bigger than the
invalid LZMA2 stream may produce (even if a decoder reads until
the end of the file). The Check type is None to nullify certain
simple size-based sanity checks in a Block decoder.

bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
payload marker. When Compressed Size bytes have been decoded,
Uncompressed Size bytes of output will have been produced but
the LZMA2 decoder doesn't indicate end of stream.

3. Descriptions of Individual .lzma Files

3.1. Good Files

good-unknown_size-with_eopm.lzma has unknown size in the header
and end of payload marker at the end.

good-known_size-without_eopm.lzma has a known size in the header
and no end of payload marker at the end.

good-known_size-with_eopm.lzma has a known size in the header
and end of payload marker at the end. XZ Utils 5.2.5 and older
will give an error at the end of the file after producing the
correct uncompressed output.

3.2. Bad Files

bad-unknown_size-without_eopm.lzma has unknown size in the header
but no end of payload marker at the end. This file might be seen
by a decoder as if it were truncated.

bad-too_big_size-with_eopm.lzma has too big uncompressed size in
the header and the end of payload marker will be detected before
the specified number of bytes have been decoded.

bad-too_small_size-without_eopm-1.lzma has too small uncompressed
size in the header. The decoder will look for end of payload marker
but instead find a literal that would produce more output.

bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
of a literal the problem occurs with a short repeated match.

bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
of a literal the problem occurs in the middle of a match.

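All the .lzma files above differ in how the uncompressed size field
of the header relates to the compressed data. The following sketch
reads that field from the 13-byte LZMA_Alone header, assuming the
commonly documented layout: one properties byte, a 32-bit
little-endian dictionary size, and a 64-bit little-endian
uncompressed size where all bits set means "unknown". The function
name is made up for this example.

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_alone_header(header: bytes) -> dict:
    """Decode the fields of a 13-byte LZMA_Alone (.lzma) header."""
    if len(header) < 13:
        raise ValueError("LZMA_Alone header is 13 bytes")
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        # None means that the size is unknown and an end of payload
        # marker is required.
        "uncompressed_size": None if size == UNKNOWN_SIZE else size,
    }
```

A test driver could use the decoded size to tell the "known_size" and
"unknown_size" files apart before feeding them to the decoder under
test.
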
4. Descriptions of Individual .lz (lzip) Files

4.1. Good Files

good-1-v0.lz contains a single version 0 member. lzip 1.17 and
*older* can decompress this; support for version 0 was removed
in lzip 1.18.

good-1-v0-trailing-1.lz is like good-1-v0.lz but contains
trailing data that the decompressor must ignore.

good-1-v1.lz contains a single version 1 member. lzip 1.3 and
newer can decompress this.

good-1-v1-trailing-1.lz is like good-1-v1.lz but contains
trailing data that the decompressor must ignore.

good-1-v1-trailing-2.lz is like good-1-v1.lz but contains
trailing data whose first three bytes match the .lz magic bytes.
With lzip >= 1.20 this file results in an error unless one uses
the command line option --loose-trailing. lzip 1.3 to 1.19 decode
this file successfully by default. XZ Utils uses the old behavior
because it allows lzma_code() to stop at the first byte of the
trailing data as long as the first byte isn't 0x4C (L in US-ASCII);
otherwise the first 1-3 bytes that are equal to the magic bytes are
consumed and lost in lzma_code(), and this is visible in xz too:

    $ ( xz -dc ; cat ) < good-1-v1-trailing-2.lz

    $ ( xz -dc --single-stream ; cat ) < good-1-v1-trailing-2.lz

good-2-v0-v1.lz contains two members of which the first is
version 0 and the second version 1. lzip versions 1.3 to 1.17
(inclusive) can decompress this.

good-2-v1-v0.lz contains two members of which the first is
version 1 and the second version 0. lzip versions 1.3 to 1.17
(inclusive) can decompress this.

good-2-v1-v1.lz contains two version 1 members. lzip versions 1.3
and newer can decompress this.

4.2. Unsupported Files

unsupported-1-v234.lz is like good-1-v1.lz except the version
field has been set to 234 (0xEA) which, as of writing, isn't
defined or supported by any .lz implementation.

4.3. Bad Files

bad-1-v1-magic-1.lz is like good-1-v1.lz but the first magic byte
is wrong.

bad-1-v1-magic-2.lz is like good-1-v1.lz but the last (fourth)
magic byte is wrong.

bad-1-v1-dict-1.lz has too low value in the dictionary size field.

bad-1-v1-dict-2.lz has too high value in the dictionary size field.

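The magic, version, and dictionary size files above all target the
six-byte member header. The sketch below validates that header,
assuming the layout described in the lzip documentation: the magic
bytes "LZIP", a version byte, and a coded dictionary size where
bits 4-0 hold the base-2 logarithm of a base size and bits 7-5 hold
the number of sixteenths of the base size to subtract, with a valid
range of 4 KiB to 512 MiB. The function name is made up, and
accepting both version 0 and version 1 is a choice made for this
example.

```python
LZIP_MAGIC = b"LZIP"


def parse_lzip_header(header: bytes) -> tuple[int, int]:
    """Validate a .lz member header; return (version, dictionary size)."""
    if len(header) < 6:
        raise ValueError("truncated member header")
    if header[:4] != LZIP_MAGIC:
        raise ValueError("wrong magic bytes")
    version = header[4]
    if version > 1:
        raise ValueError(f"unsupported version {version}")
    coded = header[5]
    log2_base = coded & 0x1F
    sixteenths = coded >> 5
    # 2^12 minus any fraction would fall below the 4 KiB minimum.
    if not 12 <= log2_base <= 29 or (log2_base == 12 and sixteenths > 0):
        raise ValueError("dictionary size is out of range")
    dict_size = (1 << log2_base) - sixteenths * (1 << (log2_base - 4))
    return version, dict_size
```

For example, the header bytes "LZIP" followed by 0x01 and 0x0C
describe a version 1 member with a 4 KiB dictionary, while a version
byte of 0xEA, as in unsupported-1-v234.lz, is reported as
unsupported.
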
bad-1-v1-crc32.lz has wrong CRC32 value.

bad-1-v0-uncomp-size.lz is version 0 format with incorrect value
in the uncompressed size field.

bad-1-v1-uncomp-size.lz is version 1 format with incorrect value
in the uncompressed size field.

bad-1-v1-member-size.lz has incorrect value in the member size
field.

bad-1-v1-trailing-magic.lz has the four .lz magic bytes as trailing
data. This should be detected as a truncated file and thus result
in an error. That is, the last four bytes of the file should not be
ignored as trailing garbage. lzip >= 1.18 matches this behavior
while older versions ignore the last four bytes and don't indicate
an error.