<html>
<head>
<title>Salza2 - Create compressed data from Common Lisp</title>
<style type="text/css">
a, a:visited { text-decoration: none }
a[href]:hover { text-decoration: underline }
pre { background: #DDD; padding: 0.25em }
p.download { color: red }
</style>
</head>

<body>

<h2>Salza2 - Create compressed data from Common Lisp</h2>

<blockquote class='abstract'>
<h3>Abstract</h3>

<p>Salza2 is a Common Lisp library for creating compressed data in the
ZLIB, DEFLATE, or GZIP data formats, described in
<a href="http://ietf.org/rfc/rfc1950.txt">RFC 1950</a>,
<a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>, and
<a href="http://ietf.org/rfc/rfc1952.txt">RFC 1952</a>, respectively.
It does not use any external libraries for compression. It does not
yet support decompression. Salza2 is available under
a <a href="COPYING.txt">BSD-like license</a>.

<p>The latest version is 2.0.9, released on July 18th, 2013.

<p class='download'>Download shortcut:

<p><a href="http://www.xach.com/lisp/salza2.tgz">http://www.xach.com/lisp/salza2.tgz</a>

</blockquote>
<h3>Contents</h3>

<ol>

<li> <a href='#sect-overview-and-limitations'>Overview and Limitations</a>

<li> <a href='#sect-dictionary'>Dictionary</a>

<ul>
<li> <a href='#sect-standard-compressors'>Standard Compressors</a>

<ul>
<li> <a href='#deflate-compressor'><tt>deflate-compressor</tt></a>
<li> <a href='#zlib-compressor'><tt>zlib-compressor</tt></a>
<li> <a href='#gzip-compressor'><tt>gzip-compressor</tt></a>
<li> <a href='#callback'><tt>callback</tt></a>
<li> <a href='#compress-octet'><tt>compress-octet</tt></a>
<li> <a href='#compress-octet-vector'><tt>compress-octet-vector</tt></a>
<li> <a href='#finish-compression'><tt>finish-compression</tt></a>
<li> <a href='#reset'><tt>reset</tt></a>
<li> <a href='#with-compressor'><tt>with-compressor</tt></a>
</ul>

<li> <a href='#sect-customizing-compressors'>Customizing Compressors</a>

<ul>
<li> <a href='#write-bits'><tt>write-bits</tt></a>
<li> <a href='#write-octet'><tt>write-octet</tt></a>
<li> <a href='#start-data-format'><tt>start-data-format</tt></a>
<li> <a href='#process-input'><tt>process-input</tt></a>
<li> <a href='#finish-data-format'><tt>finish-data-format</tt></a>
</ul>

<li> <a href='#sect-checksums'>Checksums</a>

<ul>
<li> <a href='#adler32-checksum'><tt>adler32-checksum</tt></a>
<li> <a href='#crc32-checksum'><tt>crc32-checksum</tt></a>
<li> <a href='#update'><tt>update</tt></a>
<li> <a href='#result'><tt>result</tt></a>
<li> <a href='#result-octets'><tt>result-octets</tt></a>
<li> <a href='#reset-checksum'><tt>reset</tt></a>
</ul>

<li> <a href='#sect-shortcuts'>Shortcuts</a>

<ul>
<li> <a href='#make-stream-output-callback'><tt>make-stream-output-callback</tt></a>
<li> <a href='#gzip-stream'><tt>gzip-stream</tt></a>
<li> <a href='#gzip-file'><tt>gzip-file</tt></a>
<li> <a href='#compress-data'><tt>compress-data</tt></a>
</ul>
</ul>

<li> <a href='#sect-references'>References</a>

<li> <a href='#sect-acknowledgements'>Acknowledgements</a>

<li> <a href='#sect-feedback'>Feedback</a>

</ol>
<a name='sect-overview-and-limitations'><h3>Overview and Limitations</h3></a>

<p>Salza2 provides an interface for creating a compressor object. This
object acts as a sink for octets (either individual octets or
vectors of octets), and is a source for octets in a compressed data
format. The compressed octet data is provided to a user-defined
callback that can write it to a stream, copy it to another vector,
etc.

<p>Salza2 has built-in compressors that support the ZLIB, DEFLATE, and
GZIP data formats. The classes and generic function protocol are
available to make it easy to support similar formats via subclassing
and new methods. ZLIB and GZIP are extensions to the DEFLATE format
and are implemented as subclasses
of <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
with a few methods implemented for the protocol.

<p>Salza2 is the successor
to <a href="http://cliki.net/Salza">Salza</a>, but it is not
backwards-compatible. Among other changes, Salza2 drops support for
compressing Lisp character data, since the compression formats are
octet-based and obtaining encoded octets from Lisp characters varies
from implementation to implementation.

<p>There are a number of functions that provide a simple interface to
specific tasks such as gzipping a file or compressing a single
vector.

<p>Salza2 does not decode compressed data. There is no support for
dynamically defined Huffman codes. There is currently no interface
for changing the tradeoff between compression speed and compressed
data size.
<a name='sect-dictionary'><h3>Dictionary</h3></a>

<p>The following symbols are exported from the SALZA2 package.

<a name='sect-standard-compressors'><h4>Standard Compressors</h4></a>

<p><a name='deflate-compressor'
><a name='zlib-compressor'><a name='gzip-compressor'>[Classes]</a></a></a><br>
<b>deflate-compressor</b><br>
<b>zlib-compressor</b><br>
<b>gzip-compressor</b>

<blockquote>
Instances of these classes may be created via make-instance. The only
supported initarg is <tt>:CALLBACK</tt>.
See <a href='#callback'><tt>CALLBACK</tt></a> for the expected value.
</blockquote>

<p><a name='callback'>[Accessor]</a><br>
<b>callback</b> <i>compressor</i> => <i>callback</i><br>
(<tt>setf</tt> (<b>callback</b> <i>compressor</i>) <i>new-value</i>)
=> <i>new-value</i>

<blockquote>
Gets or sets the callback function of <i>compressor</i>. The callback
should be a function of two arguments, an octet vector and an end
index, and it should process all octets from the start of the vector
below the end index as the compressed output data stream of the
compressor. See <a href='#make-stream-output-callback'><tt>MAKE-STREAM-OUTPUT-CALLBACK</tt></a>
for an example callback.

</blockquote>
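<blockquote>
<p>As a rough sketch of the callback contract, the following collects
compressed output in memory instead of writing it to a stream. The
helpers <tt>make-collecting-callback</tt>
and <tt>gzip-octets</tt> are hypothetical names, not part of Salza2,
and the example assumes Salza2 is installed where ASDF can find it.

<pre>
;; Assumption: Salza2 is installed where ASDF can find it.
(require :asdf)
(asdf:load-system "salza2")

;; Hypothetical helper, not part of Salza2.
(defun make-collecting-callback (output)
  "Return a callback that appends compressed octets to OUTPUT, an
adjustable octet vector with a fill pointer."
  (lambda (buffer end)
    ;; Only the octets below END are output. They are copied here in
    ;; case the compressor reuses BUFFER for later calls.
    (loop for i from 0 below end
          do (vector-push-extend (aref buffer i) output))))

;; Hypothetical helper: gzip a simple octet vector entirely in memory.
(defun gzip-octets (octets)
  (let* ((output (make-array 0 :element-type '(unsigned-byte 8)
                               :adjustable t :fill-pointer 0))
         (compressor (make-instance 'salza2:gzip-compressor
                                    :callback (make-collecting-callback output))))
    (salza2:compress-octet-vector octets compressor)
    (salza2:finish-compression compressor)
    output))
</pre>

<p>The callback may be invoked several times for one compression task,
so it appends rather than overwrites. A GZIP stream starts with the
octets 31 and 139 (RFC 1952), which gives a quick sanity check on the
collected output.
</blockquote>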
<p><a name='compress-octet'>[Function]</a><br>
<b>compress-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Adds <i>octet</i> to <i>compressor</i> to be compressed.
</blockquote>

<p><a name='compress-octet-vector'>[Function]</a><br>
<b>compress-octet-vector</b> <i>vector</i> <i>compressor</i> <tt>&amp;key</tt>
<i>start</i> <i>end</i> => |

<blockquote>
Adds the octets from <i>vector</i> to <i>compressor</i> to be
compressed, beginning with the octet at <i>start</i> and ending at the
octet at
<i>end</i> - 1. If <i>start</i> is not specified, it defaults to
0. If <i>end</i> is not specified, it defaults to the total length
of <i>vector</i>. Equivalent to (but much more efficient than) the
following:

<pre>
(loop for i from start below end
      do (compress-octet (aref vector i) compressor))
</pre>

</blockquote>
<p><a name='finish-compression'>[Generic function]</a><br>
<b>finish-compression</b> <i>compressor</i> => |

<blockquote>Compresses any pending data, concludes the data format
for <i>compressor</i> with
<a href='#finish-data-format'><tt>FINISH-DATA-FORMAT</tt></a>, and
invokes the user callback for the final octets of the compressed data
format. This function must be called at the end of compression to
ensure the validity of the data format; it is called implicitly
by <a href='#with-compressor'><tt>WITH-COMPRESSOR</tt></a>.

</blockquote>

<p><a name='reset'>[Generic function]</a><br>
<b>reset</b> <i>compressor</i> => |

<blockquote>
The default method
for <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
objects resets the internal state of <i>compressor</i> and
calls <a href='#start-data-format'><tt>START-DATA-FORMAT</tt></a>. This
allows the re-use of a single compressor object for multiple
compression tasks.
</blockquote>
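<blockquote>
<p>A minimal sketch of that re-use, assuming Salza2 is installed where
ASDF can find it. The helper <tt>compress-each</tt> is a hypothetical
name, not part of Salza2. It passes one compressor object
to <a href='#compress-data'><tt>COMPRESS-DATA</tt></a> for each
vector and resets it between tasks.

<pre>
;; Assumption: Salza2 is installed where ASDF can find it.
(require :asdf)
(asdf:load-system "salza2")

;; Hypothetical helper, not part of Salza2.
(defun compress-each (vectors)
  "Compress each octet vector in VECTORS as a separate ZLIB stream,
reusing one compressor, and return the results as a list."
  (let ((compressor (make-instance 'salza2:zlib-compressor)))
    (loop for vector in vectors
          ;; COMPRESS-DATA installs its own callback on the compressor
          ;; and finishes the stream before returning.
          collect (salza2:compress-data vector compressor)
          ;; Start a fresh data stream for the next vector.
          do (salza2:reset compressor))))
</pre>

<p>Reusing one object this way avoids allocating a new compressor for
every task, which may matter when compressing many small vectors.
</blockquote>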
<p><a name='with-compressor'>[Macro]</a><br>
<b>with-compressor</b> (<i>var</i> <i>class</i>
<tt>&amp;rest</tt> <i>initargs</i>
<tt>&amp;key</tt> <tt>&amp;allow-other-keys</tt>)
<tt>&amp;body</tt> <i>body</i> => |

<blockquote>
Evaluates <i>body</i> with <i>var</i> bound to a new compressor
created as
with <tt>(apply&nbsp;#'make-instance&nbsp;class&nbsp;initargs)</tt>.
<a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>
is implicitly called on the compressor at the end of evaluation.
</blockquote>
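<blockquote>
<p>For illustration, here is one way the macro might be used to write
several octet vectors into a single GZIP file. The
helper <tt>write-gzip-chunks</tt> is a hypothetical name, not part of
Salza2, and the example assumes Salza2 is installed where ASDF can
find it.

<pre>
;; Assumption: Salza2 is installed where ASDF can find it.
(require :asdf)
(asdf:load-system "salza2")

;; Hypothetical helper, not part of Salza2.
(defun write-gzip-chunks (chunks pathname)
  "Compress the octet vectors in CHUNKS into one GZIP file at PATHNAME."
  (with-open-file (stream pathname :direction :output
                                   :element-type '(unsigned-byte 8)
                                   :if-exists :supersede)
    ;; The class argument is evaluated, so the class name is quoted.
    (salza2:with-compressor (compressor 'salza2:gzip-compressor
                              :callback (salza2:make-stream-output-callback stream))
      (dolist (chunk chunks)
        (salza2:compress-octet-vector chunk compressor))))
  pathname)
</pre>

<p>The compressor is created inside <tt>with-open-file</tt> so that
the implicit <tt>finish-compression</tt> runs while the output stream
is still open.
</blockquote>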
<a name='sect-customizing-compressors'><h4>Customizing Compressors</h4></a>

<p>Compressor objects follow a protocol that makes it easy to create
specialized data formats. The ZLIB data format is essentially the
same as the DEFLATE format with an additional header and a trailing
checksum; this is implemented by creating a new class and adding a
few new methods to the generic functions below.

<p>For example, consider a new compressed data format FOO that
encapsulates a DEFLATE data stream but adds four signature octets,
F0 0D 00 D1, to the start of the output data stream, and adds a
trailing 32-bit length value, MSB first, after the end. It could be
implemented like this:

<pre>
(defclass foo-compressor (deflate-compressor)
  ((data-length
    :initarg :data-length
    :accessor data-length))
  (:default-initargs
   :data-length 0))

(defmethod <a href='#start-data-format'>start-data-format</a> :before ((compressor foo-compressor))
  (<a href='#write-octet'>write-octet</a> #xF0 compressor)
  (write-octet #x0D compressor)
  (write-octet #x00 compressor)
  (write-octet #xD1 compressor))

(defmethod <a href='#process-input'>process-input</a> :after ((compressor foo-compressor) input start count)
  (declare (ignore input start))
  (incf (data-length compressor) count))

(defmethod <a href='#finish-data-format'>finish-data-format</a> :after ((compressor foo-compressor))
  (let ((length (data-length compressor)))
    (write-octet (ldb (byte 8 24) length) compressor)
    (write-octet (ldb (byte 8 16) length) compressor)
    (write-octet (ldb (byte 8 8) length) compressor)
    (write-octet (ldb (byte 8 0) length) compressor)))

(defmethod <a href='#reset'>reset</a> :after ((compressor foo-compressor))
  (setf (data-length compressor) 0))
</pre>
<p><a name='write-bits'>[Function]</a><br>
<b>write-bits</b> <i>code</i> <i>size</i> <i>compressor</i> => |

<blockquote>
Writes <i>size</i> low bits of the integer <i>code</i> to the output
buffer of <i>compressor</i>. Follows the bit packing layout described
in <a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>. The bits
are not compressed, but become literal parts of the output stream.
</blockquote>

<p><a name='write-octet'>[Function]</a><br>
<b>write-octet</b> <i>octet</i> <i>compressor</i> => |

<blockquote>
Writes <i>octet</i> to the output buffer of <i>compressor</i>. Bits of the
octet are <i>not</i> packed; the octet is added to the output buffer
at the next octet boundary. The octet is not compressed, but becomes a
literal part of the output stream.
</blockquote>

<p><a name='start-data-format'>[Generic function]</a><br>
<b>start-data-format</b> <i>compressor</i> => |

<blockquote>
Outputs any prologue bits or octets needed to produce a valid
compressed data stream for <i>compressor</i>. Called from
initialize-instance and <a href='#reset'><tt>RESET</tt></a> for
subclasses of deflate-compressor. Should not be called directly, but
subclasses may add methods to customize what literal data is added to
the beginning of the output buffer.
</blockquote>

<p><a name='process-input'>[Generic function]</a><br>
<b>process-input</b> <i>compressor</i> <i>input</i>
<i>start</i> <i>count</i> => |

<blockquote>
Called when <i>count</i> octets of the octet vector <i>input</i>,
starting from <i>start</i>, are about to be compressed. This generic
function should not be called directly, but may be specialized.

<p>This is useful for data formats that must maintain information about
the uncompressed contents of a compressed data stream, such as
checksums or total data length.
</blockquote>

<p><a name='finish-data-format'>[Generic function]</a><br>
<b>finish-data-format</b> <i>compressor</i> => |

<blockquote>
Called
by <a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>. Outputs
any epilogue bits or octets needed to produce a valid compressed data
stream for <i>compressor</i>. This generic function should not be called
directly, but may be specialized.
</blockquote>
<a name='sect-checksums'><h4>Checksums</h4></a>

<p>Checksums are used in several data formats to check data
integrity. For example, PNG uses a CRC32 checksum for its chunks of
data. Salza2 exports support for two common checksums.

<p><a name='adler32-checksum'><a name='crc32-checksum'>[Standard classes]</a></a><br>
<b>adler32-checksum</b><br>
<b>crc32-checksum</b>

<blockquote>
Instances of these classes may be created directly with
make-instance.
</blockquote>

<p><a name='update'>[Generic function]</a><br>
<b>update</b> <i>checksum</i> <i>buffer</i> <i>start</i> <i>count</i>
=> |

<blockquote>
Updates <i>checksum</i> with <i>count</i> octets from the octet
vector <i>buffer</i>, starting at <i>start</i>.
</blockquote>

<p><a name='result'>[Generic function]</a><br>
<b>result</b> <i>checksum</i> => <i>result</i>

<blockquote>
Returns the accumulated value of <i>checksum</i> as an integer.
</blockquote>

<p><a name='result-octets'>[Generic function]</a><br>
<b>result-octets</b> <i>checksum</i> => <i>result-list</i>

<blockquote>
Returns the individual octets of <i>checksum</i> as a list of octets,
most significant octet first.
</blockquote>

<p><a name='reset-checksum'>[Generic function]</a><br>
<b>reset</b> <i>checksum</i> => |

<blockquote>
The default method for checksum objects resets the internal state
of <i>checksum</i> so it may be re-used.
</blockquote>
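<blockquote>
<p>The checksum protocol above can be exercised with a short sketch
like the following. The helpers <tt>ascii-octets</tt>
and <tt>checksum-value</tt> are hypothetical names, not part of
Salza2, and the example assumes Salza2 is installed where ASDF can
find it and that <tt>char-code</tt> returns ASCII codes for the
characters used.

<pre>
;; Assumption: Salza2 is installed where ASDF can find it.
(require :asdf)
(asdf:load-system "salza2")

;; Hypothetical helper, not part of Salza2.
;; Assumes CHAR-CODE yields ASCII codes for the characters in STRING.
(defun ascii-octets (string)
  "Return the character codes of STRING as a simple octet vector."
  (map '(simple-array (unsigned-byte 8) (*)) #'char-code string))

;; Hypothetical helper, not part of Salza2.
(defun checksum-value (class octets)
  "Return the integer checksum of OCTETS, using a new instance of CLASS."
  (let ((checksum (make-instance class)))
    (salza2:update checksum octets 0 (length octets))
    (salza2:result checksum)))

(format t "~8,'0X~%"
        (checksum-value 'salza2:crc32-checksum (ascii-octets "123456789")))
(format t "~8,'0X~%"
        (checksum-value 'salza2:adler32-checksum (ascii-octets "Wikipedia")))
</pre>

<p>CBF43926 and 11E60398 are the commonly published CRC-32 and
Adler-32 reference values for these two inputs, so the printed values
can be compared against them.
</blockquote>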
<a name='sect-shortcuts'><h4>Shortcuts</h4></a>

<p>Some shortcuts for common compression tasks are available.

<p><a name='make-stream-output-callback'>[Function]</a><br>
<b>make-stream-output-callback</b> <i>stream</i> => <i>callback</i>

<blockquote>
Creates and returns a callback function that writes all compressed
data to <i>stream</i>. It is defined like this:

<pre>
(defun make-stream-output-callback (stream)
  (lambda (buffer end)
    (write-sequence buffer stream :end end)))
</pre>
</blockquote>

<p><a name='gzip-stream'>[Function]</a><br>
<b>gzip-stream</b> <i>input-stream</i> <i>output-stream</i> => |

<blockquote>
Compresses all data read from <i>input-stream</i> and writes the
compressed data to <i>output-stream</i>.
</blockquote>

<p><a name='gzip-file'>[Function]</a><br>
<b>gzip-file</b> <i>input-file</i> <i>output-file</i> => <i>pathname</i>

<blockquote>
Compresses <i>input-file</i> and writes the compressed data
to <i>output-file</i>.
</blockquote>
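<blockquote>
<p>A small sketch of calling <tt>gzip-stream</tt> with explicitly
opened streams. The helper <tt>gzip-copy</tt> is a hypothetical name,
not part of Salza2. The example assumes Salza2 is installed where ASDF
can find it, and that both streams should be octet streams because the
compression formats are octet-based.

<pre>
;; Assumption: Salza2 is installed where ASDF can find it.
(require :asdf)
(asdf:load-system "salza2")

;; Hypothetical helper, not part of Salza2.
(defun gzip-copy (input-path output-path)
  "Write a gzipped copy of the file at INPUT-PATH to OUTPUT-PATH."
  ;; Assumption: both streams are opened as (unsigned-byte 8) streams.
  (with-open-file (in input-path :element-type '(unsigned-byte 8))
    (with-open-file (out output-path :direction :output
                                     :element-type '(unsigned-byte 8)
                                     :if-exists :supersede)
      (salza2:gzip-stream in out)))
  output-path)
</pre>

<p>When both ends are plain files, <tt>(gzip-file input-file
output-file)</tt> covers the same task in one call;
<tt>gzip-stream</tt> is the better fit when the streams come from
elsewhere, such as a socket.
</blockquote>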
<p><a name='compress-data'>[Function]</a><br>
<b>compress-data</b> <i>data</i> <i>compressor-designator</i>
<tt>&amp;rest</tt> <i>initargs</i> => <i>compressed-data</i>

<blockquote>
Compresses the octet vector <i>data</i> and returns the compressed
data as an octet vector. <i>compressor-designator</i> should be either
a compressor object, designating itself, or a symbol, designating a
compressor created as with <tt>(apply #'make-instance
compressor-designator initargs)</tt>.

<p>For example:

<pre>
* <b>(compress-data (sb-ext:string-to-octets "Hello, hello, hello, hello world.")
                 'zlib-compressor)</b>
#(8 153 243 72 205 201 201 215 81 200 192 164 20 202 243 139 114 82 244 0 194 64 11 139)
</pre>
</blockquote>
<a name='sect-references'><h3>References</h3></a>

<ul>

<li> Deutsch and
Gailly, <a href='http://ietf.org/rfc/rfc1950.txt'>ZLIB Compressed Data
Format Specification version 3.3 (RFC 1950)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1951.txt'>DEFLATE
Compressed Data Format Specification version 1.3 (RFC 1951)</a>

<li> Deutsch, <a href='http://ietf.org/rfc/rfc1952.txt'>GZIP file
format specification version 4.3 (RFC 1952)</a>

<li>
Wikipedia, <a href='http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm'>Rabin-Karp
string search algorithm</a>

</ul>

<a name='sect-acknowledgements'><h3>Acknowledgements</h3></a>

<p>Thanks to Paul Khuong for his help optimizing the modulo-8191
hashing.

<p>Thanks to Austin Haas for providing some test SWF files
demonstrating a data format bug.

<a name='sect-feedback'><h3>Feedback</h3></a>

<p>Please direct any comments, questions, bug reports, or other
feedback to <a href='mailto:xach@xach.com'>Zach Beane</a>.