\section{\module{codecs} ---
         Codec registry and base classes}

\declaremodule{standard}{codecs}
\modulesynopsis{Encode and decode data and streams.}
\moduleauthor{Marc-Andre Lemburg}{mal@lemburg.com}
\sectionauthor{Marc-Andre Lemburg}{mal@lemburg.com}

\indexii{Codecs}{encode}
\indexii{Codecs}{decode}
\indexii{stackable}{streams}

This module defines base classes for standard Python codecs (encoders
and decoders) and provides access to the internal Python codec
registry, which manages the codec lookup process.

It defines the following functions:

\begin{funcdesc}{register}{search_function}
Register a codec search function. Search functions are expected to
take one argument, the encoding name in all lower case letters, and
return a tuple of functions \code{(\var{encoder}, \var{decoder},
\var{stream_reader}, \var{stream_writer})} taking the following
arguments:

\var{encoder} and \var{decoder}: These must be functions or methods
which have the same interface as the \method{encode()}/\method{decode()}
methods of \class{Codec} instances (see Codec Interface). The
functions/methods are expected to work in a stateless mode.

\var{stream_reader} and \var{stream_writer}: These have to be
factory functions providing the following interface:

\code{factory(\var{stream}, \var{errors}='strict')}

The factory functions must return objects providing the interfaces
defined by the base classes \class{StreamWriter} and
\class{StreamReader}, respectively. Stream codecs can maintain
state.

Possible values for \var{errors} are \code{'strict'} (raise an exception
in case of an encoding error), \code{'replace'} (replace malformed
data with a suitable replacement marker, such as \character{?}) and
\code{'ignore'} (ignore malformed data and continue without further
notice).

In case a search function cannot find a given encoding, it should
return \code{None}.
\end{funcdesc}

\begin{funcdesc}{lookup}{encoding}
Look up a codec tuple in the Python codec registry and return the
function tuple as defined above.

Encodings are first looked up in the registry's cache. If not found,
the list of registered search functions is scanned. If no codec tuple
is found, a \exception{LookupError} is raised. Otherwise, the codec
tuple is stored in the cache and returned to the caller.
\end{funcdesc}
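
The registration and lookup steps described above can be sketched as
follows. This is only an illustration under stated assumptions: it
assumes a current Python interpreter (where string literals are
Unicode and encoded data are byte strings), the alias name
\code{"myascii"} is hypothetical, and the search function simply
hands back the tuple of the existing \code{"ascii"} codec instead of
implementing a codec of its own.

```python
import codecs


def search_function(encoding):
    # "myascii" is a hypothetical alias used only for this illustration.
    if encoding == "myascii":
        # Reuse the function tuple of an existing codec.
        return codecs.lookup("ascii")
    # Tell the registry that this search function does not know the name.
    return None


codecs.register(search_function)

# The lookup result unpacks into the four functions described above.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("myascii")

# The stateless encoder returns the encoded data and the input length consumed.
encoded, consumed = encoder("abc")

# The optional second argument selects the error handling.
replaced, _ = encoder("caf\u00e9", "replace")  # malformed data becomes "?"
ignored, _ = encoder("caf\u00e9", "ignore")    # malformed data is dropped

# An encoding that no search function recognises raises LookupError.
try:
    codecs.lookup("no_such_encoding_xyz")
    found = True
except LookupError:
    found = False
```

Returning the tuple of an existing codec keeps the sketch short; a
real search function would usually return its own encoder, decoder
and stream factories.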

To simplify working with encoded files or streams, the module
also defines these utility functions:

\begin{funcdesc}{open}{filename, mode\optional{, encoding\optional{,
                       errors\optional{, buffering}}}}
Open an encoded file using the given \var{mode} and return
a wrapped version providing transparent encoding/decoding.

\strong{Note:} The wrapped version will only accept the object format
defined by the codecs, i.e.\ Unicode objects for most built-in
codecs. Output is also codec-dependent and will usually be Unicode as
well.

\var{encoding} specifies the encoding which is to be used for the
file.

\var{errors} may be given to define the error handling. It defaults
to \code{'strict'}, which causes a \exception{ValueError} to be raised
in case an encoding error occurs.

\var{buffering} has the same meaning as for the built-in
\function{open()} function. It defaults to line buffered.
\end{funcdesc}
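
A minimal sketch of \function{open()} in use, assuming a current
Python interpreter in which string literals are Unicode objects. The
file name \code{example.txt} and the temporary directory are
illustrative only.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, created in a throwaway directory.
    path = os.path.join(tmpdir, "example.txt")

    # Unicode text written to the wrapper is stored as UTF-8 on disk.
    f = codecs.open(path, "w", encoding="utf-8")
    f.write("caf\u00e9\n")
    f.close()

    # Reading through the wrapper yields Unicode text again.
    f = codecs.open(path, "r", encoding="utf-8")
    text = f.read()
    f.close()

    # The underlying file holds the encoded bytes.
    with open(path, "rb") as raw:
        raw_bytes = raw.read()

    # With the default 'strict' handling, unencodable text raises ValueError.
    strict_failed = False
    f = codecs.open(path, "w", encoding="ascii")
    try:
        f.write("caf\u00e9")
    except ValueError:
        strict_failed = True
    finally:
        f.close()
```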

\begin{funcdesc}{EncodedFile}{file, input\optional{,
                              output\optional{, errors}}}
Return a wrapped version of \var{file} which provides transparent
encoding translation.

Strings written to the wrapped file are interpreted according to the
given \var{input} encoding and then written to the original file as
strings using the \var{output} encoding. The intermediate encoding will
usually be Unicode but depends on the specified codecs.

If \var{output} is not given, it defaults to \var{input}.

\var{errors} may be given to define the error handling. It defaults to
\code{'strict'}, which causes \exception{ValueError} to be raised in case
an encoding error occurs.
\end{funcdesc}
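
The translation performed by \function{EncodedFile()} can be
sketched as below. The example assumes a current Python interpreter,
where the strings passed to and from the wrapper are byte strings,
and uses an in-memory \code{io.BytesIO} object as a stand-in for a
real file. The encodings are passed positionally as \var{input} and
\var{output}.

```python
import codecs
import io

# Writing: Latin-1 input is translated to UTF-8 in the underlying file.
raw_out = io.BytesIO()
wrapped_out = codecs.EncodedFile(raw_out, "latin-1", "utf-8")
wrapped_out.write(b"caf\xe9")      # "cafe" with e-acute, Latin-1 encoded
stored = raw_out.getvalue()        # the same text, UTF-8 encoded

# Reading: UTF-8 data in the file is handed back as Latin-1.
raw_in = io.BytesIO(b"caf\xc3\xa9")
wrapped_in = codecs.EncodedFile(raw_in, "latin-1", "utf-8")
returned = wrapped_in.read()
```

In both directions the data pass through Unicode as the intermediate
form, which is why the two encodings need not be related to each
other.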

...XXX document codec base classes...

The module also provides the following constants, which are useful
for reading and writing platform-dependent files:

\begin{datadesc}{BOM}
These constants define the byte order marks (BOM) used in data
streams to indicate the byte order used in the stream or file.
\constant{BOM} is either \constant{BOM_BE} or \constant{BOM_LE},
depending on the platform's native byte order, while the others
represent big endian (\samp{_BE} suffix) and little endian
(\samp{_LE} suffix) byte order using 32-bit and 64-bit encodings.
\end{datadesc}
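
As a hedged illustration of how these constants might be used, the
sketch below checks \constant{BOM} against the platform's byte order
and inspects the start of a data stream. It assumes a current Python
interpreter, where the constants are byte strings, and the helper
\code{detect_byte_order()} is hypothetical, not part of the module.

```python
import codecs
import sys

# BOM should match the mark for the platform's native byte order.
native_bom = codecs.BOM_LE if sys.byteorder == "little" else codecs.BOM_BE


def detect_byte_order(data):
    """Hypothetical helper: report the byte order signalled by a leading BOM."""
    if data.startswith(codecs.BOM_BE):
        return "big"
    if data.startswith(codecs.BOM_LE):
        return "little"
    return None  # no recognised mark at the start of the data


# Sample data: a little endian mark followed by little endian text.
sample = codecs.BOM_LE + "hi".encode("utf-16-le")
order = detect_byte_order(sample)
```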