.xz and .lzma Test Files
------------------------

This directory contains a bunch of files to test handling of .xz,
.lzma (LZMA_Alone), and .lz (lzip) files in decoder implementations.
Many of the files have been created by hand with a hex editor, thus
there is no better "source code" than the files themselves. All the
test files and this README have been put into the public domain.

Good files (good-*) must decode successfully without requiring
a lot of CPU time or RAM.

Unsupported files (unsupported-*) are good files, but headers
indicate features not supported by the current version of the
file format.

Bad files (bad-*) must cause the decoder to give an error. Like
with the good files, these files must not require a lot of CPU
time or RAM before they get detected to be broken.
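
The naming convention above can drive a test harness directly. The
following is a minimal sketch in Python, not part of XZ Utils; the
names Expectation and expectation_for are hypothetical and exist only
for this illustration.

```python
from enum import Enum


class Expectation(Enum):
    """Outcome promised by the file name prefix (hypothetical helper type)."""
    GOOD = "good"                # must decode successfully
    UNSUPPORTED = "unsupported"  # valid, but uses features a decoder may lack
    BAD = "bad"                  # decoder must report an error


def expectation_for(filename: str) -> Expectation:
    """Map a test file name such as 'good-0-empty.xz' to its expectation."""
    for expectation in Expectation:
        if filename.startswith(expectation.value + "-"):
            return expectation
    raise ValueError(f"not a recognized test file name: {filename!r}")
```

A harness could then decode every file and compare the result with
expectation_for(name), treating UNSUPPORTED as "either a clean
rejection or a successful decode is acceptable".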

2. Descriptions of Individual .xz Files

2.1. Good Files

good-0-empty.xz has one Stream with no Blocks.

good-0pad-empty.xz has one Stream with no Blocks followed by
four-byte Stream Padding.

good-0cat-empty.xz has two zero-Block Streams concatenated without
Stream Padding.

good-0catpad-empty.xz has two zero-Block Streams concatenated with
four-byte Stream Padding between the Streams.

good-1-check-none.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and no integrity check.

good-1-check-crc32.xz has one Stream with one Block with two
uncompressed LZMA2 chunks and CRC32 check.

good-1-check-crc64.xz is like good-1-check-crc32.xz but with CRC64.

good-1-check-sha256.xz is like good-1-check-crc32.xz but with
SHA-256.
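
To illustrate the check types, here is a small sketch that uses the
lzma module from the Python standard library (a liblzma wrapper). It
creates its own Streams instead of reading the test files, the payload
is arbitrary, and the data is LZMA-compressed rather than stored in
uncompressed LZMA2 chunks. SHA-256 is left out because its availability
depends on how liblzma was built. The helper name check_id_of is
hypothetical.

```python
import lzma


def check_id_of(xz_bytes: bytes) -> int:
    """Decode a complete .xz Stream and return the ID of its integrity check."""
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    decoder.decompress(xz_bytes)
    return decoder.check


payload = b"arbitrary example payload\n"
observed = {
    check: check_id_of(lzma.compress(payload, format=lzma.FORMAT_XZ, check=check))
    for check in (lzma.CHECK_NONE, lzma.CHECK_CRC32, lzma.CHECK_CRC64)
}
```

Each value in observed should equal its key, that is, the decoder
reports the same Check ID that the encoder was asked to use.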

good-2-lzma2.xz has one Stream with two Blocks with one uncompressed
LZMA2 chunk in each Block.

good-1-block_header-1.xz has both Compressed Size and Uncompressed
Size in the Block Header. This file also has four extra bytes of
Header Padding.

good-1-block_header-2.xz has known Compressed Size.

good-1-block_header-3.xz has known Uncompressed Size.

good-1-delta-lzma2.tiff.xz is an image file that compresses
better with Delta+LZMA2 than with plain LZMA2.

good-1-x86-lzma2.xz uses the x86 filter (BCJ) and LZMA2. The
uncompressed file is compress_prepared_bcj_x86 found in the tests
directory.

good-1-sparc-lzma2.xz uses the SPARC filter and LZMA2. The
uncompressed file is compress_prepared_bcj_sparc found in the tests
directory.

good-1-arm64-lzma2-1.xz uses the ARM64 filter and LZMA2. The
uncompressed data is constructed so that it tests integer
wrap around and sign extension.

good-1-arm64-lzma2-2.xz is like good-1-arm64-lzma2-1.xz but with
non-zero start offset. XZ Embedded doesn't support this file.

good-1-lzma2-1.xz has two LZMA2 chunks, of which the second sets
new properties.

good-1-lzma2-2.xz has two LZMA2 chunks, of which the second resets
the state without specifying new properties.

good-1-lzma2-3.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
and the second sets new properties.

good-1-lzma2-4.xz has three LZMA2 chunks: First is LZMA, second is
uncompressed with dictionary reset, and third is LZMA with new
properties but without dictionary reset.

good-1-lzma2-5.xz has an empty LZMA2 stream with only the end of
payload marker. XZ Utils 5.0.1 and older incorrectly see this file
as corrupt.

good-1-3delta-lzma2.xz has three Delta filters and LZMA2.

good-1-empty-bcj-lzma2.xz has an empty Block that uses PowerPC BCJ
and LZMA2. liblzma from XZ Utils 5.0.1 and older may incorrectly
return LZMA_BUF_ERROR in some cases. See commit message
d8db706acb8316f9861abd432cfbe001dd6d0c5c for the details.

2.2. Unsupported Files

unsupported-check.xz uses Check ID 0x02 which isn't supported by
the current version of the file format. It is implementation-defined
how this file is handled (a decoder may reject it, or decode it
possibly with a warning).

unsupported-block_header.xz has a non-null byte in Header Padding,
which may indicate presence of a new unsupported field.

unsupported-filter_flags-1.xz has unsupported Filter ID 0x7F.

unsupported-filter_flags-2.xz specifies only Delta filter in the
List of Filter Flags, but Delta isn't allowed as the last filter in
the chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.

unsupported-filter_flags-3.xz specifies two LZMA2 filters in the
List of Filter Flags. LZMA2 is allowed only as the last filter in the
chain. It could be a little more correct to detect this file as
corrupt instead of unsupported, but saying it is unsupported is
simpler in case of liblzma.

2.3. Bad Files

bad-0pad-empty.xz has one Stream with no Blocks followed by
five-byte Stream Padding. Stream Padding must be a multiple of four
bytes, thus this file is corrupt.

bad-0catpad-empty.xz has two zero-Block Streams concatenated with
five-byte Stream Padding between the Streams.
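
The Stream Padding rule can be written as a one-line predicate. This
is a sketch rather than liblzma code; it additionally assumes, based
on the .xz file format specification, that Stream Padding consists
only of null bytes. The function name is hypothetical.

```python
def stream_padding_is_valid(padding: bytes) -> bool:
    """Return True if the bytes are acceptable as Stream Padding.

    The length must be a multiple of four and every byte must be 0x00.
    """
    return len(padding) % 4 == 0 and not any(padding)
```

Four null bytes (as in good-0pad-empty.xz) pass this test while five
null bytes (as in bad-0pad-empty.xz) do not.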

bad-0cat-alone.xz is good-0-empty.xz concatenated with an empty
LZMA_Alone file.

bad-0cat-header_magic.xz is good-0cat-empty.xz but with one byte
wrong in the Header Magic Bytes field of the second Stream. liblzma
gives LZMA_DATA_ERROR for this. (LZMA_FORMAT_ERROR is used only if
the first Stream of a file has invalid Header Magic Bytes.)

bad-0-header_magic.xz is good-0-empty.xz but with one byte wrong
in the Header Magic Bytes field. liblzma gives LZMA_FORMAT_ERROR for
this.

bad-0-footer_magic.xz is good-0-empty.xz but with one byte wrong
in the Footer Magic Bytes field. liblzma gives LZMA_DATA_ERROR for
this.

bad-0-empty-truncated.xz is good-0-empty.xz without the last byte.

bad-0-nonempty_index.xz has no Blocks but Index claims that there is
a Block.

bad-0-backward_size.xz has wrong Backward Size in Stream Footer.

bad-1-stream_flags-1.xz has different Stream Flags in Stream Header
and Stream Footer.

bad-1-stream_flags-2.xz has wrong CRC32 in Stream Header.

bad-1-stream_flags-3.xz has wrong CRC32 in Stream Footer.
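
Several of the files above target the Stream Header and Stream Footer.
The sketch below shows one way to validate these two 12-byte
structures. The field layout follows the .xz file format specification
as understood here, the function names are hypothetical, and the code
is not taken from liblzma. The CRC32 used by .xz is the same one that
zlib.crc32() computes.

```python
import struct
import zlib

HEADER_MAGIC = bytes([0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00])
FOOTER_MAGIC = b"YZ"


def parse_stream_header(header: bytes) -> bytes:
    """Validate a Stream Header and return its two-byte Stream Flags."""
    if len(header) != 12:
        raise ValueError("Stream Header must be 12 bytes")
    if header[:6] != HEADER_MAGIC:
        raise ValueError("wrong Header Magic Bytes")
    flags = header[6:8]
    (stored_crc,) = struct.unpack("<I", header[8:12])
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("wrong CRC32 in Stream Header")
    return flags


def parse_stream_footer(footer: bytes):
    """Validate a Stream Footer; return (Index size in bytes, Stream Flags)."""
    if len(footer) != 12:
        raise ValueError("Stream Footer must be 12 bytes")
    if footer[10:12] != FOOTER_MAGIC:
        raise ValueError("wrong Footer Magic Bytes")
    stored_crc, stored_backward_size = struct.unpack("<II", footer[:8])
    # The CRC32 covers the Backward Size and Stream Flags fields.
    if zlib.crc32(footer[4:10]) != stored_crc:
        raise ValueError("wrong CRC32 in Stream Footer")
    # Backward Size is stored as (size of the Index field / 4) - 1.
    return (stored_backward_size + 1) * 4, footer[8:10]
```

A decoder would also compare the Stream Flags returned by the two
functions and check that the Index really has the size that Backward
Size claims.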

bad-1-vli-1.xz has two-byte variable-length integer in the
Uncompressed Size field in Block Header while one-byte would be enough
for that value. It's important that the file gets rejected due to too
big integer encoding instead of due to Uncompressed Size not matching
the value stored in the Block Header. That is, the decoder must not
try to decode the Compressed Data field.

bad-1-vli-2.xz has ten-byte variable-length integer as Uncompressed
Size in Block Header. It's important that the file gets rejected due
to too big integer encoding instead of due to Uncompressed Size not
matching the value stored in the Block Header. That is, the decoder
must not try to decode the Compressed Data field.
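
Both files are caught by a strict variable-length integer decoder.
The sketch below follows the decoding rules of the .xz file format
specification (seven bits per byte, least significant group first,
at most nine bytes, no unnecessary trailing zero byte). The function
name decode_vli is hypothetical, and the byte strings mentioned after
the code are made-up examples, not bytes copied from the test files.

```python
def decode_vli(buf: bytes):
    """Decode one variable-length integer; return (value, bytes consumed).

    Raises ValueError for encodings that are longer than necessary,
    longer than nine bytes, or truncated.
    """
    value = 0
    for i, byte in enumerate(buf[:9]):
        if byte == 0x00 and i > 0:
            # A zero byte after the first byte adds nothing to the
            # value, so a shorter encoding exists.
            raise ValueError("integer encoding is longer than necessary")
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("integer is truncated or longer than nine bytes")
```

With this decoder b"\x05" decodes to the value 5 in one byte, while
the two-byte form b"\x85\x00" of the same value is rejected, as is
any encoding whose ninth byte still has the continuation bit set.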

bad-1-block_header-1.xz has Block Header that ends in the middle of
the Filter Flags field.

bad-1-block_header-2.xz has Block Header that has Compressed Size and
Uncompressed Size but no List of Filter Flags field.

bad-1-block_header-3.xz has wrong CRC32 in Block Header.

bad-1-block_header-4.xz has too big Compressed Size in Block Header
(2^63 - 1 bytes while maximum is a little less, because the whole
Block must stay smaller than 2^63). It's important that the file
gets rejected due to invalid Compressed Size value; the decoder
must not try decoding the Compressed Data field.

bad-1-block_header-5.xz has zero as Compressed Size in Block Header.

bad-1-block_header-6.xz has corrupt Block Header which may crash
xz -lvv in XZ Utils 5.0.3 and earlier. It was fixed in the commit
c0297445064951807803457dca1611b3c47e7f0f.

bad-2-index-1.xz has wrong Unpadded Sizes in Index.

bad-2-index-2.xz has wrong Uncompressed Sizes in Index.

bad-2-index-3.xz has non-null byte in Index Padding.

bad-2-index-4.xz has wrong CRC32 in Index.

bad-2-index-5.xz has zero as Unpadded Size. It is important that the
file gets rejected specifically due to Unpadded Size having an invalid
value.

bad-3-index-uncomp-overflow.xz has Index whose Uncompressed Size
fields have huge values whose sum exceeds the maximum allowed size
of 2^63 - 1 bytes. In this file the sum is exactly 2^64.
lzma_index_append() in liblzma <= 5.2.6 lacks the integer overflow
check for the uncompressed size and thus doesn't catch the error
when decoding the Index field in this file. As a result, "xz -l"
doesn't detect the error and displays 0 as the uncompressed size.
Note that regular decompression isn't affected by this bug because
it uses lzma_index_hash_append() instead.
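
The effect described for bad-3-index-uncomp-overflow.xz can be
reproduced with plain integer arithmetic. The sketch below is a model
of the problem, not liblzma code: the three sizes are hypothetical
values chosen so that their sum is 2^64, and the function names are
made up for this example.

```python
VLI_MAX = 2**63 - 1      # largest size the format allows
UINT64_MASK = 2**64 - 1  # models wrap-around of a 64-bit unsigned integer


def append_wrapping(total: int, size: int) -> int:
    """Add without an overflow check, like unchecked 64-bit arithmetic."""
    return (total + size) & UINT64_MASK


def append_checked(total: int, size: int) -> int:
    """Add and reject sums that exceed the maximum allowed size."""
    if size > VLI_MAX or total > VLI_MAX - size:
        raise OverflowError("total uncompressed size exceeds 2^63 - 1")
    return total + size


# Hypothetical Uncompressed Size values of three Blocks; the sum is 2^64.
sizes = [2**63 - 1, 2**63 - 1, 2]

wrapped_total = 0
for size in sizes:
    wrapped_total = append_wrapping(wrapped_total, size)
```

After the loop wrapped_total is 0, which matches the size that the
affected versions display, whereas calling append_checked() with the
same values raises OverflowError at the second Block.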

bad-2-compressed_data_padding.xz has non-null byte in the padding of
the Compressed Data field of the first Block.

bad-1-check-crc32.xz has wrong Check (CRC32).

bad-1-check-crc32-2.xz has Compressed Size and Uncompressed Size in
Block Header but wrong Check (CRC32) in the actual data. This file
differs by one byte from good-1-block_header-1.xz: the last byte of
the Check field is wrong. This file is useful for testing error
detection in the threaded decoder when a worker thread is configured
to pass input one byte at a time to the Block decoder.

bad-1-check-crc64.xz has wrong Check (CRC64).

bad-1-check-sha256.xz has wrong Check (SHA-256).

bad-1-lzma2-1.xz has LZMA2 stream whose first chunk (uncompressed)
doesn't reset the dictionary.

bad-1-lzma2-2.xz has two LZMA2 chunks, of which the second chunk
indicates dictionary reset, but the LZMA compressed data tries to
repeat data from the previous chunk.

bad-1-lzma2-3.xz sets new invalid properties (lc=8, lp=0, pb=0).
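
To see why lc=8, lp=0, pb=0 is invalid in LZMA2, it helps to look at
how the three values are packed into one properties byte. The sketch
below assumes the usual LZMA packing, (pb * 5 + lp) * 9 + lc, and the
LZMA2 restriction lc + lp <= 4. The function names are hypothetical.

```python
def decode_lzma_props(props_byte: int):
    """Split an LZMA properties byte into the tuple (lc, lp, pb)."""
    # The largest valid byte encodes lc=8, lp=4, pb=4.
    if not 0 <= props_byte <= (4 * 5 + 4) * 9 + 8:
        raise ValueError("properties byte out of range")
    pb, rest = divmod(props_byte, 9 * 5)
    lp, lc = divmod(rest, 9)
    return lc, lp, pb


def props_valid_for_lzma2(lc: int, lp: int, pb: int) -> bool:
    """LZMA2 additionally requires lc + lp <= 4."""
    return lc + lp <= 4 and pb <= 4
```

The properties byte 0x08 decodes to (lc=8, lp=0, pb=0), which is a
well-formed byte but fails the lc + lp <= 4 test, while the common
byte 0x5D decodes to (lc=3, lp=0, pb=2) and passes.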

bad-1-lzma2-4.xz has two LZMA2 chunks, of which the first is
uncompressed and the second is LZMA. The first chunk resets dictionary
as it should, but the second chunk tries to reset state without
specifying properties for LZMA.

bad-1-lzma2-5.xz is like bad-1-lzma2-4.xz but doesn't try to reset
anything in the header of the second chunk.

bad-1-lzma2-6.xz has reserved LZMA2 control byte value (0x03).

bad-1-lzma2-7.xz has EOPM at LZMA level.

bad-1-lzma2-8.xz is like good-1-lzma2-4.xz but doesn't set new
properties in the third LZMA2 chunk.
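
The reset rules that the good-1-lzma2-* and bad-1-lzma2-* files
exercise can be summarized as a small state machine over the control
byte of each LZMA2 chunk. The sketch below looks only at control bytes
and ignores chunk sizes and payloads. It assumes the usual meaning of
the control values: 0x00 is the end marker, 0x01 and 0x02 are
uncompressed chunks with and without dictionary reset, 0x03-0x7F are
reserved, and 0x80-0xFF are LZMA chunks whose bits 5-6 select what is
reset. The function name is hypothetical.

```python
def validate_control_bytes(controls) -> None:
    """Raise ValueError if a sequence of LZMA2 control bytes breaks the rules."""
    need_dictionary_reset = True  # the first chunk must reset the dictionary
    need_properties = True        # the first LZMA chunk must set properties
    for control in controls:
        if control == 0x00:
            return  # end of payload marker
        if control >= 0xE0 or control == 0x01:
            # A dictionary reset implies that the next LZMA chunk
            # has to set new properties.
            need_properties = True
            need_dictionary_reset = False
        elif need_dictionary_reset:
            raise ValueError("first chunk doesn't reset the dictionary")
        if control >= 0x80:
            if control >= 0xC0:
                need_properties = False  # this chunk carries new properties
            elif need_properties:
                raise ValueError("LZMA chunk without required properties")
        elif control > 0x02:
            raise ValueError("reserved control byte value")
    raise ValueError("end marker is missing")
```

For example, the sequence 0xE0, 0x01, 0xC0, 0x00 has the shape
described for good-1-lzma2-4.xz and is accepted, while 0xE0, 0x01,
0x80, 0x00 has the shape of bad-1-lzma2-8.xz and is rejected. These
byte values are illustrative and were not read from the files.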

bad-1-lzma2-9.xz has LZMA2 stream that is truncated at the end of
an LZMA2 chunk (no end marker). The uncompressed size of the partial
LZMA2 stream exceeds the value stored in the Block Header.

bad-1-lzma2-10.xz has LZMA2 stream that, from the point of view of
an LZMA2 decoder, extends past the end of Block (and even the end of
the file). Uncompressed Size in Block Header is bigger than the
invalid LZMA2 stream may produce (even if a decoder reads until
the end of the file). The Check type is None to nullify certain
simple size-based sanity checks in a Block decoder.

bad-1-lzma2-11.xz has LZMA2 stream that lacks the end of
payload marker. When Compressed Size bytes have been decoded,
Uncompressed Size bytes of output will have been produced but
the LZMA2 decoder doesn't indicate end of stream.

3. Descriptions of Individual .lzma Files

3.1. Good Files

good-unknown_size-with_eopm.lzma has unknown size in the header
and end of payload marker at the end.

good-known_size-without_eopm.lzma has a known size in the header
and no end of payload marker at the end.

good-known_size-with_eopm.lzma has a known size in the header
and end of payload marker at the end. XZ Utils 5.2.5 and older
will give an error at the end of the file after producing the
correct uncompressed output.

3.2. Bad Files

bad-unknown_size-without_eopm.lzma has unknown size in the header
but no end of payload marker at the end. This file might be seen
by a decoder as if it were truncated.

bad-too_big_size-with_eopm.lzma has too big uncompressed size in
the header and the end of payload marker will be detected before
the specified number of bytes have been decoded.

bad-too_small_size-without_eopm-1.lzma has too small uncompressed
size in the header. The decoder will look for end of payload marker
but instead find a literal that would produce more output.

bad-too_small_size-without_eopm-2.lzma is like -1 above but instead
of a literal the problem occurs with a short repeated match.

bad-too_small_size-without_eopm-3.lzma is like -1 above but instead
of a literal the problem occurs in the middle of a match.
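
Whether the size is known or unknown is decided by the 13-byte .lzma
header. The sketch below parses that header, assuming the usual
LZMA_Alone layout: one properties byte, a 32-bit little-endian
dictionary size, and a 64-bit little-endian uncompressed size in which
all bits set means "unknown". The function name is hypothetical.

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF


def parse_lzma_alone_header(header: bytes):
    """Return (properties byte, dictionary size, uncompressed size or None)."""
    if len(header) < 13:
        raise ValueError("LZMA_Alone header is 13 bytes")
    props, dict_size, size = struct.unpack("<BIQ", header[:13])
    return props, dict_size, None if size == UNKNOWN_SIZE else size
```

A size of None means that the decoder has to rely on the end of
payload marker, which is the situation tested by the unknown_size
files. Streams written by lzma.compress(data, format=lzma.FORMAT_ALONE)
in Python should fall into that category.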

4. Descriptions of Individual .lz (lzip) Files

4.1. Good Files

good-1-v0.lz contains a single version 0 member. lzip 1.17 and
*older* can decompress this; support for version 0 was removed
in later versions.

good-1-v0-trailing-1.lz is like good-1-v0.lz but contains
trailing data that the decompressor must ignore.

good-1-v1.lz contains a single version 1 member. lzip 1.3 and
newer can decompress this.

good-1-v1-trailing-1.lz is like good-1-v1.lz but contains
trailing data that the decompressor must ignore.

good-1-v1-trailing-2.lz is like good-1-v1.lz but contains
trailing data whose first three bytes match the .lz magic bytes.
With lzip >= 1.20 this file results in an error unless one uses
the command line option --loose-trailing. lzip 1.3 to 1.19 decode
this file successfully by default. XZ Utils uses the old behavior
because it allows lzma_code() to stop at the first byte of the
trailing data as long as the first byte isn't 0x4C (L in US-ASCII);
otherwise the first 1-3 bytes that equal the magic bytes are
consumed and lost in lzma_code(), and this is visible in xz too:

    $ ( xz -dc ; cat ) < good-1-v1-trailing-2.lz

    $ ( xz -dc --single-stream ; cat ) < good-1-v1-trailing-2.lz

good-2-v0-v1.lz contains two members of which the first is
version 0 and the second version 1. lzip versions 1.3 to 1.17
(inclusive) can decompress this.

good-2-v1-v0.lz contains two members of which the first is
version 1 and the second version 0. lzip versions 1.3 to 1.17
(inclusive) can decompress this.

good-2-v1-v1.lz contains two version 1 members. lzip versions 1.3
and newer can decompress this.

4.2. Unsupported Files

unsupported-1-v234.lz is like good-1-v1.lz except the version
field has been set to 234 (0xEA) which, as of writing, isn't
defined or supported by any .lz implementation.
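
The version and dictionary size checks mentioned in this section both
concern the six-byte member header. The sketch below assumes the
header layout from the lzip documentation: the magic bytes "LZIP", a
version byte, and a coded dictionary size whose bits 4-0 hold the
base-2 logarithm of a base size and whose bits 7-5 hold how many
sixteenths of the base size to subtract, with valid results from
4 KiB to 512 MiB. It also assumes that version 0 and version 1 share
this header layout. The function name is hypothetical.

```python
def parse_lzip_header(header: bytes):
    """Return (version, dictionary size in bytes) of a .lz member header."""
    if len(header) < 6:
        raise ValueError("truncated member header")
    if header[:4] != b"LZIP":
        raise ValueError("wrong magic bytes")
    version = header[4]
    if version not in (0, 1):
        raise NotImplementedError(f"unsupported version {version}")
    coded = header[5]
    base = 1 << (coded & 0x1F)
    dict_size = base - (base // 16) * (coded >> 5)
    if not 4096 <= dict_size <= 512 * 1024 * 1024:
        raise ValueError("dictionary size out of range")
    return version, dict_size
```

A header with version 234 ends up in the NotImplementedError branch,
which corresponds to treating the file as unsupported rather than
corrupt.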

4.3. Bad Files

bad-1-v1-magic-1.lz is like good-1-v1.lz but the first magic byte
is wrong.

bad-1-v1-magic-2.lz is like good-1-v1.lz but the last (fourth)
magic byte is wrong.

bad-1-v1-dict-1.lz has too low value in the dictionary size field.

bad-1-v1-dict-2.lz has too high value in the dictionary size field.

bad-1-v1-crc32.lz has wrong CRC32 value.

bad-1-v0-uncomp-size.lz is version 0 format with incorrect value
in the uncompressed size field.

bad-1-v1-uncomp-size.lz is version 1 format with incorrect value
in the uncompressed size field.

bad-1-v1-member-size.lz has incorrect value in the member size
field.

bad-1-v1-trailing-magic.lz has the four .lz magic bytes as trailing
data. This should be detected as a truncated file and thus result
in an error. That is, the last four bytes of the file should not be
ignored as trailing garbage. lzip >= 1.18 matches this behavior
while older versions ignore the last four bytes and don't indicate
an error.