<title>Salza2 - Create compressed data from Common Lisp</title>
<style type="text/css">
a, a:visited { text-decoration: none }
a[href]:hover { text-decoration: underline }
pre { background: #DDD; padding: 0.25em }
p.download { color: red }
</style>
<h2>Salza2 - Create compressed data from Common Lisp</h2>
<blockquote class='abstract'>

<p>Salza2 is a Common Lisp library for creating compressed data in the
ZLIB, DEFLATE, or GZIP data formats, described in
<a href="http://ietf.org/rfc/rfc1950.txt">RFC 1950</a>,
<a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>, and
<a href="http://ietf.org/rfc/rfc1952.txt">RFC 1952</a>, respectively.
It does not use any external libraries for compression. It does not
yet support decompression. Salza2 is available under
a <a href="COPYING.txt">BSD-like license</a>.
The latest version is 2.1, released on October 19th, 2021.

<p class='download'>Download shortcut:

<p><a href="http://www.xach.com/lisp/salza2.tgz">http://www.xach.com/lisp/salza2.tgz</a>

</blockquote>
<li> <a href='#sect-overview-and-limitations'>Overview and Limitations</a>
<li> <a href='#sect-dictionary'>Dictionary</a>
<li> <a href='#sect-standard-compressors'>Standard Compressors</a>
<li> <a href='#deflate-compressor'><tt>deflate-compressor</tt></a>
<li> <a href='#zlib-compressor'><tt>zlib-compressor</tt></a>
<li> <a href='#gzip-compressor'><tt>gzip-compressor</tt></a>
<li> <a href='#callback'><tt>callback</tt></a>
<li> <a href='#compress-octet'><tt>compress-octet</tt></a>
<li> <a href='#compress-octet-vector'><tt>compress-octet-vector</tt></a>
<li> <a href='#finish-compression'><tt>finish-compression</tt></a>
<li> <a href='#reset'><tt>reset</tt></a>
<li> <a href='#with-compressor'><tt>with-compressor</tt></a>
<li> <a href='#sect-customizing-compressors'>Customizing Compressors</a>
<li> <a href='#write-bits'><tt>write-bits</tt></a>
<li> <a href='#write-octet'><tt>write-octet</tt></a>
<li> <a href='#start-data-format'><tt>start-data-format</tt></a>
<li> <a href='#process-input'><tt>process-input</tt></a>
<li> <a href='#finish-data-format'><tt>finish-data-format</tt></a>
<li> <a href='#sect-checksums'>Checksums</a>
<li> <a href='#adler32-checksum'><tt>adler32-checksum</tt></a>
<li> <a href='#crc32-checksum'><tt>crc32-checksum</tt></a>
<li> <a href='#update'><tt>update</tt></a>
<li> <a href='#result'><tt>result</tt></a>
<li> <a href='#result-octets'><tt>result-octets</tt></a>
<li> <a href='#reset-checksum'><tt>reset</tt></a>
<li> <a href='#sect-shortcuts'>Shortcuts</a>
<li> <a href='#make-stream-output-callback'><tt>make-stream-output-callback</tt></a>
<li> <a href='#gzip-stream'><tt>gzip-stream</tt></a>
<li> <a href='#gzip-file'><tt>gzip-file</tt></a>
<li> <a href='#compress-data'><tt>compress-data</tt></a>
<li> <a href='#sect-gray-streams'>Gray Streams</a>
<li> <a href='#make-compressing-stream'><tt>make-compressing-stream</tt></a>
<li> <a href='#stream-closed-error'><tt>stream-closed-error</tt></a>
<li> <a href='#sect-references'>References</a>
<li> <a href='#sect-acknowledgements'>Acknowledgements</a>
<li> <a href='#sect-feedback'>Feedback</a>
<a name='sect-overview-and-limitations'><h3>Overview and Limitations</h3></a>

<p>Salza2 provides an interface for creating a compressor object. This
object acts as a sink for octets (either individual octets or
vectors of octets), and is a source for octets in a compressed data
format. The compressed octet data is provided to a user-defined
callback that can write it to a stream, copy it to another vector,
or anything else.

<p>Salza2 has built-in compressors that support the ZLIB, DEFLATE, and
GZIP data formats. The classes and generic function protocol are
available to make it easy to support similar formats via subclassing
and new methods. ZLIB and GZIP are extensions to the DEFLATE format
and are implemented as subclasses
of <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
with a few methods implemented for the protocol.

<p>Salza2 is the successor
to <a href="http://cliki.net/Salza">Salza</a>, but it is not
backwards-compatible. Among other changes, Salza2 drops support for
compressing Lisp character data, since the compression formats are
octet-based and obtaining encoded octets from Lisp characters varies
from implementation to implementation.

<p>There are a number of functions that provide a simple interface to
specific tasks such as gzipping a file or compressing a single
vector.

<p>Salza2 does not decode compressed data. There is no support for
dynamically defined Huffman codes. There is currently no interface
for changing the tradeoff between compression speed and compressed
data size.

<a name='sect-dictionary'><h3>Dictionary</h3></a>

<p>The following symbols are exported from the SALZA2 package.

<a name='sect-standard-compressors'><h4>Standard Compressors</h4></a>

<p><a name='deflate-compressor'><a name='zlib-compressor'><a name='gzip-compressor'>[Classes]</a></a></a><br>
<b>deflate-compressor</b><br>
<b>zlib-compressor</b><br>
<b>gzip-compressor</b>

<p>Instances of these classes may be created via <tt>make-instance</tt>. The only
supported initarg is <tt>:CALLBACK</tt>.
See <a href='#callback'><tt>CALLBACK</tt></a> for the expected value.

<p><a name='callback'>[Accessor]</a><br>
<b>callback</b> <i>compressor</i> =&gt; <i>callback</i><br>
(<tt>setf</tt> (<b>callback</b> <i>compressor</i>) <i>new-value</i>)

<p>Gets or sets the callback function of <i>compressor</i>. The callback
should be a function of two arguments, an octet vector and an end
index, and it should process all octets from the start of the vector
below the end index as the compressed output data stream of the
compressor. See <a href='#make-stream-output-callback'><tt>MAKE-STREAM-OUTPUT-CALLBACK</tt></a>
for an example callback.
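
<p>As a rough illustration of the callback protocol described above, the
following sketch collects compressed output into a growing vector instead
of writing it to a stream. The helper name <tt>compress-to-vector</tt> is
hypothetical, and the code assumes that the SALZA2 system is already loaded
and that the input is a simple vector with element type
<tt>(unsigned-byte 8)</tt>.

<pre>
;; Hypothetical helper, not part of Salza2.
(defun compress-to-vector (octets)
  "Compress OCTETS in the DEFLATE format and return the output octets."
  (let* ((output (make-array 0 :element-type '(unsigned-byte 8)
                               :adjustable t :fill-pointer 0))
         (compressor
           (make-instance 'salza2:deflate-compressor
                          :callback
                          (lambda (buffer end)
                            ;; Only the octets below END are output data,
                            ;; so copy exactly that range.
                            (loop for i from 0 below end
                                  do (vector-push-extend (aref buffer i)
                                                         output))))))
    (salza2:compress-octet-vector octets compressor)
    ;; Flush pending data and conclude the data format.
    (salza2:finish-compression compressor)
    output))
</pre>

<p>Copying inside the callback is a cautious choice: it does not rely on
the compressor handing over a fresh buffer on every call.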

<p><a name='compress-octet'>[Function]</a><br>
<b>compress-octet</b> <i>octet</i> <i>compressor</i> =&gt; |

<p>Adds <i>octet</i> to <i>compressor</i> to be compressed.

<p><a name='compress-octet-vector'>[Function]</a><br>
<b>compress-octet-vector</b> <i>vector</i> <i>compressor</i> <tt>&amp;key</tt>
<i>start</i> <i>end</i> =&gt; |

<p>Adds the octets from <i>vector</i> to <i>compressor</i> to be
compressed, beginning with the octet at <i>start</i> and ending at the
octet at <i>end</i> - 1. If <i>start</i> is not specified, it defaults to
0. If <i>end</i> is not specified, it defaults to the total length
of <i>vector</i>. Equivalent to (but much more efficient than) the
following code:

<pre>
(loop for i from start below end
      do (compress-octet (aref vector i) compressor))
</pre>

<p><a name='finish-compression'>[Generic function]</a><br>
<b>finish-compression</b> <i>compressor</i> =&gt; |

<blockquote>Compresses any pending data, concludes the data format
for <i>compressor</i> with
<a href='#finish-data-format'><tt>FINISH-DATA-FORMAT</tt></a>, and
invokes the user callback for the final octets of the compressed data
format. This function must be called at the end of compression to
ensure the validity of the data format; it is called implicitly
by <a href='#with-compressor'><tt>WITH-COMPRESSOR</tt></a>.
</blockquote>

<p><a name='reset'>[Generic function]</a><br>
<b>reset</b> <i>compressor</i> =&gt; |

<p>The default method
for <a href='#deflate-compressor'><tt>DEFLATE-COMPRESSOR</tt></a>
objects resets the internal state of <i>compressor</i> and
calls <a href='#start-data-format'><tt>START-DATA-FORMAT</tt></a>. This
allows the re-use of a single compressor object for multiple
compression tasks.

<p><a name='with-compressor'>[Macro]</a><br>
<b>with-compressor</b> (<i>var</i> <i>class</i>
<tt>&amp;rest</tt> <i>initargs</i>
<tt>&amp;key</tt> <tt>&amp;allow-other-keys</tt>)
<tt>&amp;body</tt> <i>body</i> =&gt; |

<p>Evaluates <i>body</i> with <i>var</i> bound to a new compressor
created as
with <tt>(apply #'make-instance class initargs)</tt>.
<a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>
is implicitly called on the compressor at the end of evaluation.
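
<p>A minimal sketch of how the macro might be used to write a GZIP file
from an octet vector. The function name <tt>write-gzipped-octets</tt> is
hypothetical; the code assumes the SALZA2 system is loaded and that
<i>octets</i> is a simple vector with element type
<tt>(unsigned-byte 8)</tt>.

<pre>
;; Hypothetical helper, not part of Salza2.
(defun write-gzipped-octets (octets pathname)
  "Write OCTETS to PATHNAME in the GZIP format and return PATHNAME."
  (with-open-file (stream pathname
                          :direction :output
                          :if-exists :supersede
                          :element-type '(unsigned-byte 8))
    ;; The class argument is evaluated, so the class name is quoted.
    (salza2:with-compressor
        (compressor 'salza2:gzip-compressor
                    :callback (salza2:make-stream-output-callback stream))
      (salza2:compress-octet-vector octets compressor)))
  ;; FINISH-COMPRESSION has already run by the time the file is closed.
  pathname)
</pre>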

<a name='sect-customizing-compressors'><h4>Customizing Compressors</h4></a>

<p>Compressor objects follow a protocol that makes it easy to create
specialized data formats. The ZLIB data format is essentially the
same as the DEFLATE format with an additional header and a trailing
checksum; this is implemented by creating a new class and adding a
few new methods to the generic functions below.

<p>For example, consider a new compressed data format FOO that
encapsulates a DEFLATE data stream but adds four signature octets,
F0 0D 00 D1, to the start of the output data stream, and adds a
trailing 32-bit length value, MSB first, after the end. It could be
implemented like this:

<pre>
(defclass foo-compressor (deflate-compressor)
  ((data-length
    :initarg :data-length
    :accessor data-length))
  ;; Start the counter at zero so that INCF below has a value to update.
  (:default-initargs
   :data-length 0))

(defmethod <a href='#start-data-format'>start-data-format</a> :before ((compressor foo-compressor))
  ;; The signature octets come before the DEFLATE stream.
  (<a href='#write-octet'>write-octet</a> #xF0 compressor)
  (write-octet #x0D compressor)
  (write-octet #x00 compressor)
  (write-octet #xD1 compressor))

(defmethod <a href='#process-input'>process-input</a> :after ((compressor foo-compressor) input start count)
  (declare (ignore input start))
  ;; Track the total number of uncompressed octets seen so far.
  (incf (data-length compressor) count))

(defmethod <a href='#finish-data-format'>finish-data-format</a> :after ((compressor foo-compressor))
  ;; The trailing 32-bit length is written MSB first.
  (let ((length (data-length compressor)))
    (write-octet (ldb (byte 8 24) length) compressor)
    (write-octet (ldb (byte 8 16) length) compressor)
    (write-octet (ldb (byte 8 8) length) compressor)
    (write-octet (ldb (byte 8 0) length) compressor)))

(defmethod <a href='#reset'>reset</a> :after ((compressor foo-compressor))
  (setf (data-length compressor) 0))
</pre>

<p><a name='write-bits'>[Function]</a><br>
<b>write-bits</b> <i>code</i> <i>size</i> <i>compressor</i> =&gt; |

<p>Writes <i>size</i> low bits of the integer <i>code</i> to the output
buffer of <i>compressor</i>. Follows the bit packing layout described
in <a href="http://ietf.org/rfc/rfc1951.txt">RFC 1951</a>. The bits
are not compressed, but become literal parts of the output stream.

<p><a name='write-octet'>[Function]</a><br>
<b>write-octet</b> <i>octet</i> <i>compressor</i> =&gt; |

<p>Writes <i>octet</i> to the output buffer of <i>compressor</i>. Bits of the
octet are <i>not</i> packed; the octet is added to the output buffer
at the next octet boundary. The octet is not compressed, but becomes a
literal part of the output stream.

<p><a name='start-data-format'>[Generic function]</a><br>
<b>start-data-format</b> <i>compressor</i> =&gt; |

<p>Outputs any prologue bits or octets needed to produce a valid
compressed data stream for <i>compressor</i>. Called from
<tt>initialize-instance</tt> and <a href='#reset'><tt>RESET</tt></a> for
subclasses of <tt>deflate-compressor</tt>. Should not be called directly, but
subclasses may add methods to customize what literal data is added to
the beginning of the output buffer.

<p><a name='process-input'>[Generic function]</a><br>
<b>process-input</b> <i>compressor</i> <i>input</i>
<i>start</i> <i>count</i> =&gt; |

<p>Called when <i>count</i> octets of the octet vector <i>input</i>,
starting from <i>start</i>, are about to be compressed. This generic
function should not be called directly, but may be specialized.

<p>This is useful for data formats that must maintain information about
the uncompressed contents of a compressed data stream, such as
checksums or total data length.

<p><a name='finish-data-format'>[Generic function]</a><br>
<b>finish-data-format</b> <i>compressor</i> =&gt; |

<p>Called
by <a href='#finish-compression'><tt>FINISH-COMPRESSION</tt></a>. Outputs
any epilogue bits or octets needed to produce a valid compressed data
stream for <i>compressor</i>. This generic function should not be called
directly, but may be specialized.

<a name='sect-checksums'><h4>Checksums</h4></a>

<p>Checksums are used in several data formats to check data
integrity. For example, PNG uses a CRC32 checksum for its chunks of
data. Salza2 exports support for two common checksums.

<p><a name='adler32-checksum'><a name='crc32-checksum'>[Standard classes]</a></a><br>
<b>adler32-checksum</b><br>
<b>crc32-checksum</b>

<p>Instances of these classes may be created directly with
<tt>make-instance</tt>.

<p><a name='update'>[Generic function]</a><br>
<b>update</b> <i>checksum</i> <i>buffer</i> <i>start</i> <i>count</i>

<p>Updates <i>checksum</i> with <i>count</i> octets from the octet
vector <i>buffer</i>, starting at <i>start</i>.

<p><a name='result'>[Generic function]</a><br>
<b>result</b> <i>checksum</i> =&gt; <i>result</i>

<p>Returns the accumulated value of <i>checksum</i> as an integer.

<p><a name='result-octets'>[Generic function]</a><br>
<b>result-octets</b> <i>checksum</i> =&gt; <i>result-list</i>

<p>Returns the individual octets of <i>checksum</i> as a list of octets,

<p><a name='reset-checksum'>[Generic function]</a><br>
<b>reset</b> <i>checksum</i> =&gt; |

<p>The default method for checksum objects resets the internal state
of <i>checksum</i> so it may be re-used.
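
<p>The checksum protocol above can be exercised on its own, without a
compressor. The following sketch computes a CRC32 value over a whole octet
vector. The function name <tt>crc32-of-octets</tt> is hypothetical, and the
code assumes the SALZA2 system is loaded and that <i>octets</i> is a simple
vector with element type <tt>(unsigned-byte 8)</tt>.

<pre>
;; Hypothetical helper, not part of Salza2.
(defun crc32-of-octets (octets)
  "Return the CRC32 checksum of OCTETS as an integer."
  (let ((checksum (make-instance 'salza2:crc32-checksum)))
    ;; UPDATE takes a start index and a count, not an end index.
    (salza2:update checksum octets 0 (length octets))
    (salza2:result checksum)))
</pre>

<p>For data that arrives in pieces, <tt>update</tt> can presumably be called
once per piece on the same checksum object before reading the result.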

<a name='sect-shortcuts'><h4>Shortcuts</h4></a>

<p>Some shortcuts for common compression tasks are available.

<p><a name='make-stream-output-callback'>[Function]</a><br>
<b>make-stream-output-callback</b> <i>stream</i> =&gt; <i>callback</i>

<p>Creates and returns a callback function that writes all compressed
data to <i>stream</i>. It is defined like this:

<pre>
(defun make-stream-output-callback (stream)
  (lambda (buffer end)
    (write-sequence buffer stream :end end)))
</pre>

<p><a name='gzip-stream'>[Function]</a><br>
<b>gzip-stream</b> <i>input-stream</i> <i>output-stream</i> =&gt; |

<p>Compresses all data read from <i>input-stream</i> and writes the
compressed data to <i>output-stream</i>.

<p><a name='gzip-file'>[Function]</a><br>
<b>gzip-file</b> <i>input-file</i> <i>output-file</i> =&gt; <i>pathname</i>

<p>Compresses <i>input-file</i> and writes the compressed data
to <i>output-file</i>.
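
<p>To show how the two GZIP shortcuts relate, here is a sketch that
compresses a file by opening both files as octet streams and handing them
to <tt>gzip-stream</tt>. The function name
<tt>gzip-file-via-streams</tt> is hypothetical, and this is not claimed to
be how <tt>gzip-file</tt> itself is implemented. The code assumes the
SALZA2 system is loaded.

<pre>
;; Hypothetical helper, not part of Salza2.
(defun gzip-file-via-streams (input-file output-file)
  "Compress INPUT-FILE into OUTPUT-FILE in the GZIP format."
  (with-open-file (in input-file :element-type '(unsigned-byte 8))
    (with-open-file (out output-file
                         :direction :output
                         :if-exists :supersede
                         :element-type '(unsigned-byte 8))
      ;; Both streams must carry octets, not characters.
      (salza2:gzip-stream in out)))
  (probe-file output-file))
</pre>

<p>When no extra control over the streams is needed, calling
<tt>(salza2:gzip-file input-file output-file)</tt> directly is the shorter
route.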

<p><a name='compress-data'>[Function]</a><br>
<b>compress-data</b> <i>data</i> <i>compressor-designator</i>
<tt>&amp;rest</tt> <i>initargs</i> =&gt; <i>compressed-data</i>

<p>Compresses the octet vector <i>data</i> and returns the compressed
data as an octet vector. <i>compressor-designator</i> should be either
a compressor object, designating itself, or a symbol, designating a
compressor created as with <tt>(apply #'make-instance
compressor-designator initargs)</tt>.

<pre>
* <b>(compress-data (sb-ext:string-to-octets "Hello, hello, hello, hello world.")
                 'zlib-compressor)</b>
#(8 153 243 72 205 201 201 215 81 200 192 164 20 202 243 139 114 82 244 0 194 64 11 139)
</pre>

<a name='sect-gray-streams'><h4>Gray Streams</h4></a>

<p>Salza2 includes support for creating a Gray stream that wraps another
stream and transparently compresses the data written to it.

<p><a name='make-compressing-stream'>[Function]</a><br>
<b>make-compressing-stream</b> <i>compressor-type</i> <i>stream</i>
=&gt; <i>compressing-stream</i>

<p>Returns a <i>compressing-stream</i> that transparently compresses its input
and writes it to <i>stream</i>. <i>compressor-type</i> is a symbol naming the
compressor class to use.

<p>Closing the returned <i>compressing-stream</i> merely finalizes the compression
and does not close <i>stream</i>.
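
<p>A small sketch of the stream interface described above, writing octets
one at a time through a compressing stream into a GZIP file. The function
name <tt>write-gzipped-via-stream</tt> is hypothetical; the code assumes the
SALZA2 system is loaded and uses only <tt>write-byte</tt> and
<tt>close</tt> on the compressing stream.

<pre>
;; Hypothetical helper, not part of Salza2.
(defun write-gzipped-via-stream (octets pathname)
  "Write OCTETS to PATHNAME in the GZIP format through a compressing stream."
  (with-open-file (out pathname
                       :direction :output
                       :if-exists :supersede
                       :element-type '(unsigned-byte 8))
    (let ((gz (salza2:make-compressing-stream 'salza2:gzip-compressor out)))
      (unwind-protect
           (loop for octet across octets
                 do (write-byte octet gz))
        ;; Closing GZ finalizes the compressed data; OUT stays open
        ;; until WITH-OPEN-FILE closes it.
        (close gz))))
  pathname)
</pre>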

<p><a name='stream-closed-error'>[Condition]</a><br>
<b>stream-closed-error</b> <i>stream-error</i>

<p>Signaled when attempting to write to a closed <i>compressing-stream</i>.

<a name='sect-references'><h3>References</h3></a>

<li> Deutsch and Gailly, <a href='http://ietf.org/rfc/rfc1950.txt'>ZLIB Compressed Data
Format Specification version 3.3 (RFC 1950)</a>
<li> Deutsch, <a href='http://ietf.org/rfc/rfc1951.txt'>DEFLATE
Compressed Data Format Specification version 1.3 (RFC 1951)</a>
<li> Deutsch, <a href='http://ietf.org/rfc/rfc1952.txt'>GZIP file
format specification version 4.3 (RFC 1952)</a>
<li> Wikipedia, <a href='http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm'>Rabin-Karp
string search algorithm</a>

<a name='sect-acknowledgements'><h3>Acknowledgements</h3></a>

<p>Thanks to Paul Khuong for his help optimizing the modulo-8191

<p>Thanks to Austin Haas for providing some test SWF files
demonstrating a data format bug.

<a name='sect-feedback'><h3>Feedback</h3></a>

<p>Please direct any comments, questions, bug reports, or other
feedback to <a href='mailto:xach@xach.com'>Zach Beane</a>.