XZ Utils FAQ
============

Q:  What do the letters XZ mean?

A:  Nothing. They are just two letters, which come from the file format
    suffix .xz. The .xz suffix was chosen because it seemed to be largely
    unused. It has no deeper meaning.


Q:  What are LZMA and LZMA2?

A:  LZMA stands for Lempel-Ziv-Markov chain algorithm. It is the name
    of the compression algorithm designed by Igor Pavlov for 7-Zip.
    LZMA is based on LZ77 and range encoding.

    LZMA2 is an updated version of the original LZMA that fixes a couple
    of practical issues; for example, LZMA2 can store incompressible data
    in uncompressed chunks. In the context of XZ Utils, LZMA is called
    LZMA1 to emphasize that LZMA is not the same thing as LZMA2. LZMA2 is
    the primary compression algorithm in the .xz file format.
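
    The difference between the two container formats can be seen with
    Python's standard-library lzma module, which is a binding to liblzma.
    This is only an illustrative sketch: it assumes a Python 3 build with
    lzma support, and relies on FORMAT_XZ defaulting to a single LZMA2
    filter and on FORMAT_ALONE (the legacy .lzma format) using LZMA1.

```python
import lzma

data = b"XZ Utils FAQ example " * 1000

# .xz container; by default its filter chain is a single LZMA2 filter.
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# Legacy .lzma format (FORMAT_ALONE); it can hold only LZMA1 data.
lzma_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# .xz files begin with the 6-byte magic FD 37 7A 58 5A 00. .lzma files have
# no magic bytes, only a 13-byte header (properties byte, 4-byte dictionary
# size, 8-byte uncompressed size).
XZ_MAGIC = b"\xfd7zXZ\x00"
print(xz_blob[:6] == XZ_MAGIC)    # True
print(lzma_blob[:6] == XZ_MAGIC)  # False

# Each blob decompresses back to the original bytes.
assert lzma.decompress(xz_blob) == data
assert lzma.decompress(lzma_blob, format=lzma.FORMAT_ALONE) == data
```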


Q:  There are many LZMA-related projects. How does XZ Utils relate to them?

A:  7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
    a subset of the 7-Zip source tree.

    p7zip is a port of 7-Zip's command-line tools to POSIX-like systems.

    LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
    LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
    LZMA Utils.

    There are several other projects using LZMA. Most are more or less
    based on LZMA SDK. See <https://7-zip.org/links.html>.


Q:  Why is liblzma named liblzma if its primary file format is .xz?
    Shouldn't it be e.g. libxz?

A:  When the design of the .xz format began, the idea was to replace
    the .lzma format and use the same .lzma suffix. Reusing the suffix
    would have been acceptable while there were very few .lzma files
    around. However, the old .lzma format became popular before the
    new format was finished. The new format was renamed to .xz, but the
    name of liblzma wasn't changed.


Q:  Do XZ Utils support the .7z format?

A:  No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
    files.


Q:  I have many .tar.7z files. Can I convert them to .tar.xz without
    spending hours recompressing the data?

A:  In the "extra" directory, there is a script named 7z2lzma.bash that
    can convert some .7z files to the .lzma format (not .xz). It needs
    the 7za (or 7z) command from p7zip. The script may silently produce
    corrupt output if certain assumptions are not met, so decompress the
    resulting .lzma file and compare it against the original before
    deleting the original file!
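
    One way to do that comparison is to hash both sides. The Python
    sketch below assumes you have first extracted the reference data from
    the .7z file with 7-Zip (for example, foo.tar from foo.tar.7z); those
    file names and the helpers sha256_stream and conversion_matches are
    hypothetical. Both inputs are read in 1 MiB pieces, so large archives
    need not fit in memory.

```python
import hashlib
import lzma
from typing import BinaryIO


def sha256_stream(f: BinaryIO) -> str:
    """Hash a binary stream in 1 MiB chunks."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
    return h.hexdigest()


def conversion_matches(lzma_path: str, reference_path: str) -> bool:
    """True if lzma_path decompresses to exactly the bytes in reference_path.

    reference_path is assumed to hold the data extracted by 7-Zip from the
    original .7z file (hypothetical workflow, not part of XZ Utils).
    """
    with lzma.open(lzma_path, "rb", format=lzma.FORMAT_ALONE) as converted, \
         open(reference_path, "rb") as reference:
        return sha256_stream(converted) == sha256_stream(reference)
```

    Only delete the original .7z file when this returns True.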


Q:  I have many .lzma files. Can I quickly convert them to the .xz format?

A:  For now, no. Since XZ Utils supports the .lzma format, it's usually
    acceptable to keep the old files in the old format. If you want to
    do the conversion anyway, you need to decompress the .lzma files and
    then recompress them to the .xz format.

    Technically, there is a way to make the conversion relatively fast
    (roughly twice the time that normal decompression takes). Writing
    such a tool would take quite a bit of time, though, and it would
    probably be useful to only a few people. If you really want such a
    conversion tool, contact Lasse Collin and offer some money.
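
    The decompress-and-recompress route is easy to script. A minimal
    sketch using Python's standard-library lzma module (a liblzma
    binding); the function name lzma_to_xz is hypothetical, and the
    default preset of 6 mirrors the xz default. It performs the full,
    slow conversion, not the faster technique mentioned above.

```python
import lzma
import shutil


def lzma_to_xz(src_path: str, dst_path: str, preset: int = 6) -> None:
    """Convert a legacy .lzma file to .xz by decompressing and recompressing."""
    with lzma.open(src_path, "rb", format=lzma.FORMAT_ALONE) as src, \
         lzma.open(dst_path, "wb", format=lzma.FORMAT_XZ, preset=preset) as dst:
        # Copy in 1 MiB pieces so the whole file never has to fit in memory.
        shutil.copyfileobj(src, dst, 1 << 20)
```

    With the xz command itself, the same conversion of a single file is
    "xz -dc foo.lzma | xz > foo.xz".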


Q:  I have installed xz, but my tar doesn't recognize .tar.xz files.
    How can I extract .tar.xz files?

A:  Decompress with xz and pipe the result to tar:

        xz -dc foo.tar.xz | tar xf -
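
    If Python is available, its standard tarfile module can read .tar.xz
    archives directly (mode "r:xz"), independently of the installed tar.
    A small sketch; the function name extract_tar_xz is hypothetical, and
    the "data" extraction filter is used only on Python versions that
    provide it.

```python
import tarfile
from typing import List


def extract_tar_xz(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a .tar.xz archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r:xz") as tar:
        names = tar.getnames()
        # Newer Pythons offer extraction filters; "data" rejects unsafe
        # members such as absolute paths or links pointing outside dest_dir.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return names
```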


Q:  Can I recover parts of a broken .xz file (e.g. a corrupted CD-R)?

A:  It may be possible if the file consists of multiple blocks, which is
    typically not the case if the file was created in single-threaded
    mode. There is no recovery program yet.


Q:  Is (some part of) XZ Utils patented?

A:  Lasse Collin is not aware of any patents that could affect XZ Utils.
    However, due to the nature of software patents, it's not possible to
    guarantee that XZ Utils isn't affected by any third-party patent(s).


Q:  Where can I find documentation about the file format and algorithms?

A:  The .xz format is documented in xz-file-format.txt. It is a container
    format only, and doesn't include descriptions of any non-trivial
    filters.

    Documenting LZMA and LZMA2 is planned, but for now, there is no
    documentation other than the source code. Before you begin, you
    should know the basics of LZ77 and range-coding algorithms. LZMA is
    based on LZ77 but is a lot more complex. Range coding is used to
    encode the final bitstream, much like Huffman coding is used in
    Deflate.
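
    As a small taste of the container layer described in
    xz-file-format.txt, the Python sketch below validates the 12-byte
    Stream Header (6 magic bytes, 2 bytes of Stream Flags, and a
    little-endian CRC32 of those flags) and reports the integrity check
    type. It covers only the header, not Blocks, the Index, or the Stream
    Footer. The function and constant names are hypothetical, and
    zlib.crc32 is assumed to match the CRC32 used by .xz (both use the
    standard CRC-32 polynomial).

```python
import lzma
import struct
import zlib

XZ_HEADER_MAGIC = b"\xfd7zXZ\x00"
# Check IDs defined by the .xz format; other values in 0x0-0xF are reserved.
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def parse_stream_header(blob: bytes) -> str:
    """Validate the 12-byte .xz Stream Header and return the check type."""
    if len(blob) < 12 or blob[:6] != XZ_HEADER_MAGIC:
        raise ValueError("not an .xz stream")
    flags = blob[6:8]  # Stream Flags (2 bytes)
    (stored_crc,) = struct.unpack("<I", blob[8:12])
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("Stream Header CRC32 mismatch")
    if flags[0] != 0 or flags[1] & 0xF0:
        raise ValueError("reserved Stream Flags bits are set")
    check_id = flags[1] & 0x0F
    return CHECK_NAMES.get(check_id, f"reserved check ID {check_id}")


print(parse_stream_header(lzma.compress(b"hello")))                          # CRC64
print(parse_stream_header(lzma.compress(b"hello", check=lzma.CHECK_CRC32)))  # CRC32
```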


Q:  I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A:  The BCJ filter is called "x86" in liblzma. BCJ2 is not included
    because it requires using more than one encoded output stream.
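
    For example, in Python's lzma module (a liblzma binding) the x86 BCJ
    filter is FILTER_X86, placed before LZMA2 in the filter chain; the
    corresponding xz command-line option is "--x86". The input below is
    an arbitrary stand-in rather than real machine code, so this sketch
    only shows that the chain round-trips, not the ratio gain BCJ gives
    on x86 executables.

```python
import lzma

# BCJ ("x86") must come before LZMA2: it rewrites relative CALL/JMP targets
# in x86 code to absolute ones so that LZMA2 finds more repeated patterns.
X86_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]

payload = bytes(range(256)) * 64  # arbitrary stand-in for x86 machine code
packed = lzma.compress(payload, format=lzma.FORMAT_XZ, filters=X86_CHAIN)

# The filter chain is recorded in the .xz Block Header, so no filters are
# passed when decompressing.
assert lzma.decompress(packed) == payload
```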


Q:  I need to use a script that runs "xz -9". On a system with 256 MiB
    of RAM, xz says that it cannot allocate memory. Can I make the
    script work without modifying it?

A:  Set a default memory usage limit for compression. You can do this,
    for example, in a shell initialization script such as ~/.bashrc or
    /etc/profile:

        XZ_DEFAULTS=--memlimit-compress=150MiB
        export XZ_DEFAULTS

    xz will then scale the compression settings down so that the given
    memory usage limit is not exceeded. This way xz shouldn't run out
    of memory.

    Also check that memory-related resource limits are high enough.
    On most systems, "ulimit -a" shows the current resource limits.


Q:  How do I create files that can be decompressed with XZ Embedded?

A:  See the documentation in XZ Embedded. In short, something like
    this is a good start:

        xz --check=crc32 --lzma2=preset=6e,dict=64KiB

    Or, if a BCJ filter is also needed (for example, when compressing
    a kernel image for PowerPC):

        xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

    Adjust the dictionary size to get a good compromise between
    compression ratio and decompressor memory usage. Note that in
    the single-call decompression mode of XZ Embedded, a big
    dictionary doesn't increase memory usage.
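
    The same settings can be expressed as a liblzma filter chain. A
    sketch using Python's lzma module, assuming PRESET_EXTREME
    corresponds to the "e" suffix and dict_size to "dict=" (mirroring
    liblzma's LZMA_PRESET_EXTREME and dict_size option names); the
    function and constant names are hypothetical.

```python
import lzma

# Counterpart of: xz --check=crc32 --lzma2=preset=6e,dict=64KiB
EMBEDDED_LZMA2 = {
    "id": lzma.FILTER_LZMA2,
    "preset": 6 | lzma.PRESET_EXTREME,  # "6e": level 6, extreme mode
    "dict_size": 64 * 1024,             # "dict=64KiB" overrides the preset
}


def compress_for_xz_embedded(data: bytes, powerpc_bcj: bool = False) -> bytes:
    """Compress data as .xz with a CRC32 check and a 64 KiB dictionary."""
    filters = [EMBEDDED_LZMA2]
    if powerpc_bcj:
        # Counterpart of adding --powerpc: BCJ filter before LZMA2.
        filters = [{"id": lzma.FILTER_POWERPC}] + filters
    return lzma.compress(data, format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)


blob = compress_for_xz_embedded(b"kernel image placeholder " * 4096)
print(blob[7] & 0x0F)  # 1, the Check ID for CRC32 in the Stream Header
```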


Q:  How is multi-threaded compression implemented in XZ Utils?

A:  The simplest method is splitting the uncompressed data into blocks
    and compressing them in parallel, independently of each other.
    This is currently the only threading method supported in XZ Utils.
    Since the blocks are compressed independently, they can also be
    decompressed independently. Together with the index feature in .xz,
    this allows using threads to create .xz files for random-access
    reading. This also makes threaded decompression possible.

    The independent-blocks method has a couple of disadvantages too. It
    compresses worse than a single-block method. Often the difference
    is small (maybe 1-2 %), but sometimes it can be significant. Also,
    the memory usage of the compressor increases linearly when adding
    threads.
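
    The independent-blocks idea can be sketched in Python with the
    standard lzma module and a thread pool. This is a simplification,
    not how xz does it: xz writes all blocks into one .xz stream with a
    single index, whereas this sketch compresses each chunk as a separate
    .xz stream and concatenates the streams, which is still a valid .xz
    file. The function name and parameters are hypothetical, and the
    sketch assumes CPython's lzma module releases the GIL while
    compressing so that the threads actually run in parallel.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_independent_chunks(data: bytes, chunk_size: int,
                                workers: int = 4, preset: int = 6) -> bytes:
    """Compress fixed-size chunks in parallel, each as its own .xz stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Empty input still produces one valid (empty) stream.
    chunks = [data[i:i + chunk_size]
              for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No chunk sees another chunk's data: each worker needs its own
        # dictionary (memory grows with threads) and cross-chunk matches
        # are lost (the ratio gets somewhat worse).
        parts = list(pool.map(lambda c: lzma.compress(c, preset=preset), chunks))
    # Concatenated .xz streams are themselves a valid .xz file.
    return b"".join(parts)
```

    Both lzma.decompress and "xz -d" accept concatenated streams, so the
    result decompresses back to the original data.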

    At least two other threading methods are possible, but they haven't
    been implemented in XZ Utils:

    Match finder parallelization has been in 7-Zip for a long time. It
    doesn't affect compression ratio or memory usage significantly.
    Among the three threading methods, only this one is useful when
    compressing small files (files that are not significantly bigger
    than the dictionary). Unfortunately, this method scales only to
    about two CPU cores.

    The third method is pigz-style threading (I use that name because
    pigz <https://www.zlib.net/pigz/> uses that method). It doesn't
    affect compression ratio significantly and scales to many cores.
    The memory usage scales linearly when threads are added. This isn't
    significant with pigz, because Deflate uses only a 32 KiB dictionary,
    but with LZMA2 the memory usage will increase dramatically just like
    with the independent-blocks method. There is also a constant
    computational overhead, which may make the pigz method less
    attractive than the parallel match finder method on dual-core
    systems, but with more cores the overhead is not a big deal anymore.

    Combining the threading methods would be possible and also useful.
    For example, combining match finder parallelization with pigz-style
    threading or independent-blocks threading can cut the memory usage
    by 50 %.


Q:  I told xz to use many threads but it is using only one or two
    processor cores. What is wrong?

A:  Multi-threaded compression works by splitting the data into blocks
    that are compressed individually, so if the input file is too small
    for the block size, many threads cannot be used. The default block
    size increases with the compression level. For example, xz -6 uses
    an 8 MiB LZMA2 dictionary and 24 MiB blocks, and xz -9 uses a 64 MiB
    LZMA2 dictionary and 192 MiB blocks. If the input file is 100 MiB,
    xz -6 can use five threads, one of which will finish quickly because
    it has only 4 MiB to compress. However, for the same file, xz -9 can
    use only one thread.
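
    The arithmetic above can be checked with a few lines of Python. The
    default block size rule used here (three times the LZMA2 dictionary
    size, but at least 1 MiB) is an assumption, chosen because it
    reproduces the figures above; the helper names are hypothetical.

```python
from typing import List

MiB = 1024 * 1024


def default_block_size(dict_size: int) -> int:
    # Assumed rule: 3 x dictionary size, never below 1 MiB
    # (gives 24 MiB for xz -6 and 192 MiB for xz -9).
    return max(3 * dict_size, 1 * MiB)


def block_sizes(input_size: int, block_size: int) -> List[int]:
    """Split input_size into blocks; the count is the useful thread limit."""
    full, rest = divmod(input_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


for level, dict_size in (("-6", 8 * MiB), ("-9", 64 * MiB)):
    sizes = block_sizes(100 * MiB, default_block_size(dict_size))
    print(level, len(sizes), [s // MiB for s in sizes])
# Prints:
# -6 5 [24, 24, 24, 24, 4]
# -9 1 [100]
```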

    You can adjust the block size with --block-size=SIZE, but making the
    block size smaller than the LZMA2 dictionary size wastes RAM: using
    xz -9 with 6 MiB blocks isn't any better than using xz -6 with
    6 MiB blocks. The default settings use a block size bigger than
    the LZMA2 dictionary size because this was seen as a reasonable
    compromise between RAM usage and compression ratio.

    When decompressing, the ability to use threads depends on how the
    file was created. If it was created in multi-threaded mode and it
    contains multiple blocks, it can be decompressed in multi-threaded
    mode too.


Q:  How do I build a program that needs liblzmadec (lzmadec.h)?

A:  liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
    liblzmadec. The code using liblzmadec should be ported to use
    liblzma instead. If you cannot or don't want to do that, download
    LZMA Utils from <https://tukaani.org/lzma/>.


Q:  The default build of liblzma is too big. How can I make it smaller?

A:  Pass --enable-small to the configure script. Also use the appropriate
    --enable or --disable options to include only the filter encoders
    and decoders and integrity checks that you actually need. Use
    CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
    for size. See INSTALL for information about configure options.

    If the result is still too big, take a look at XZ Embedded. It is
    a separate project that provides a limited but significantly
    smaller XZ decoder implementation than XZ Utils. You can find it
    at <https://tukaani.org/xz/embedded.html>.