\section{\module{csv} --- CSV File Reading and Writing}

\declaremodule{standard}{csv}
\modulesynopsis{Write and read tabular data to and from delimited files.}
\sectionauthor{Skip Montanaro}{skip@pobox.com}

\indexii{data}{tabular}

The so-called CSV (Comma Separated Values) format is the most common import
and export format for spreadsheets and databases.  There is no ``CSV
standard'', so the format is operationally defined by the many applications
which read and write it.  The lack of a standard means that subtle
differences often exist in the data produced and consumed by different
applications.  These differences can make it annoying to process CSV files
from multiple sources.  Still, while the delimiters and quoting characters
vary, the overall format is similar enough that it is possible to write a
single module which can efficiently manipulate such data, hiding the details
of reading and writing the data from the programmer.

The \module{csv} module implements classes to read and write tabular data in
CSV format.  It allows programmers to say, ``write this data in the format
preferred by Excel,'' or ``read data from this file which was generated by
Excel,'' without knowing the precise details of the CSV format used by
Excel.  Programmers can also describe the CSV formats understood by other
applications or define their own special-purpose CSV formats.

The \module{csv} module's \class{reader} and \class{writer} objects read and
write sequences.  Programmers can also read and write data in dictionary
form using the \class{DictReader} and \class{DictWriter} classes.

This version of the \module{csv} module doesn't support Unicode
input.  Also, there are currently some issues regarding \ASCII{} NUL
characters.  Accordingly, all input should generally be printable
\ASCII{} to be safe.  These restrictions will be removed in the future.

\begin{seealso}
%  \seemodule{array}{Arrays of uniformly typed numeric values.}
  \seepep{305}{CSV File API}
         {The Python Enhancement Proposal which proposed this addition.}
\end{seealso}

\subsection{Module Contents \label{csv-contents}}

The \module{csv} module defines the following functions:

\begin{funcdesc}{reader}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a reader object which will iterate over lines in the given
{}\var{csvfile}.  \var{csvfile} can be any object which supports the
iterator protocol and returns a string each time its \method{next}
method is called.  If \var{csvfile} is a file object, it must be opened with
the \code{'b'} flag on platforms where that makes a difference.  An optional
{}\var{dialect} parameter can be given to select the set of parameters
specific to a particular CSV dialect.  It may be an instance of a subclass
of the \class{Dialect} class or one of the strings returned by the
\function{list_dialects} function.  The other optional {}\var{fmtparam}
keyword arguments can be given to override individual formatting parameters
in the current dialect.  For details of the dialect and formatting
parameters, see section~\ref{csv-fmt-params}, ``Dialects and Formatting
Parameters''.

All data read are returned as strings.  No automatic data type
conversion is performed.
\end{funcdesc}
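
The behaviour described for \function{reader} can be sketched as follows.
This is only an illustration under stated assumptions: it targets a current
Python~3 interpreter (where text-mode input takes the place of the
\code{'b'} flag), and it feeds the reader an in-memory list of lines and an
\class{io.StringIO} object instead of a real file.  The sample data is
invented for the example.

```python
import csv
import io

# Any iterable that yields one string per line can serve as csvfile;
# a plain list is used here so the sketch needs no file on disk.
lines = ['name,qty\r\n', 'widget,3\r\n', '"bolt, hex",12\r\n']

# With no dialect argument the 'excel' dialect is used.
rows = list(csv.reader(lines))

# A fmtparam keyword argument overrides one setting of the dialect.
piped = list(csv.reader(['a|b|c\r\n'], delimiter='|'))

# A file-like object works too; newline='' leaves line endings untouched.
buffered = list(csv.reader(io.StringIO('x,1\r\ny,2\r\n', newline='')))

# Every field is a string; converting '3' or '12' to int is up to the caller.
quantities = [int(row[1]) for row in rows[1:]]
```

Note that the quoted field \code{"bolt, hex"} comes back as a single string
with the quotes removed, and that numeric conversion has to be done
explicitly.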

\begin{funcdesc}{writer}{csvfile\optional{,
                         dialect=\code{'excel'}\optional{, fmtparam}}}
Return a writer object responsible for converting the user's data into
delimited strings on the given file-like object.  \var{csvfile} can be any
object with a \function{write} method.  If \var{csvfile} is a file object,
it must be opened with the \code{'b'} flag on platforms where that makes a
difference.  An optional
{}\var{dialect} parameter can be given to select the set of
parameters specific to a particular CSV dialect.  It may be an instance
of a subclass of the \class{Dialect} class or one of the strings
returned by the \function{list_dialects} function.  The other optional
{}\var{fmtparam} keyword arguments can be given to override individual
formatting parameters in the current dialect.  For details of the dialect
and formatting parameters, see section~\ref{csv-fmt-params}, ``Dialects and
Formatting Parameters''.  To make it as easy as possible to
interface with modules which implement the DB API, the value
\constant{None} is written as the empty string.  While this isn't a
reversible transformation, it makes it easier to dump SQL NULL data values
to CSV files without preprocessing the data returned from a
\code{cursor.fetch*()} call.  All other non-string data are stringified
with \function{str()} before being written.
\end{funcdesc}
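
A small sketch of the conversions described for \function{writer}, assuming
a current Python~3 interpreter and an \class{io.StringIO} buffer standing in
for a real output file (the rows are made-up data of the kind a database
cursor might return):

```python
import csv
import io

buf = io.StringIO(newline='')     # stands in for a file opened for writing
writer = csv.writer(buf)          # 'excel' dialect unless told otherwise

writer.writerow(['id', 'name', 'score'])
writer.writerow([1, None, 2.5])          # None becomes an empty field
writer.writerow([2, 'Smith, J.', None])  # a field holding the delimiter is quoted

output = buf.getvalue()
```

Reading \code{output} back yields empty strings where \constant{None} was
written, which is why the text calls the transformation non-reversible.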

\begin{funcdesc}{register_dialect}{name, dialect}
Associate \var{dialect} with \var{name}.  \var{dialect} must be a subclass
of \class{csv.Dialect}.  \var{name} must be a string or Unicode object.
\end{funcdesc}

\begin{funcdesc}{unregister_dialect}{name}
Delete the dialect associated with \var{name} from the dialect registry.  An
\exception{Error} is raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{get_dialect}{name}
Return the dialect associated with \var{name}.  An \exception{Error} is
raised if \var{name} is not a registered dialect name.
\end{funcdesc}

\begin{funcdesc}{list_dialects}{}
Return the names of all registered dialects.
\end{funcdesc}
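
The four registry functions fit together roughly as in the sketch below.
It assumes a current Python~3 interpreter; the class \class{PipeDialect} and
the registry name \code{'pipes'} are hypothetical names chosen for the
example, and the dialect is derived from the module's \class{excel} class so
that only one attribute needs to change.

```python
import csv

class PipeDialect(csv.excel):
    """Hypothetical dialect: the 'excel' settings with '|' as delimiter."""
    delimiter = '|'

# Associate the dialect class with a name.
csv.register_dialect('pipes', PipeDialect)
registered = 'pipes' in csv.list_dialects()
delim = csv.get_dialect('pipes').delimiter

# The registered name can now be passed wherever a dialect is expected.
parsed = list(csv.reader(['a|b|c\r\n'], dialect='pipes'))

# After removal, looking the name up raises csv.Error.
csv.unregister_dialect('pipes')
try:
    csv.get_dialect('pipes')
    lookup_failed = False
except csv.Error:
    lookup_failed = True
```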

The \module{csv} module defines the following classes:

\begin{classdesc}{DictReader}{csvfile, fieldnames\optional{,
                              restkey=\constant{None}\optional{,
                              restval=\constant{None}\optional{,
                              dialect=\code{'excel'}\optional{,
                              fmtparam}}}}}
Create an object which operates like a regular reader but maps the
information read into a dict whose keys are given by the \var{fieldnames}
parameter.  If the row read has more fields than the fieldnames sequence,
the remaining data is added as a sequence keyed by the value of
\var{restkey}.  If the row read has fewer fields than the fieldnames
sequence, the remaining keys take the value of the optional \var{restval}
parameter.  All other parameters are interpreted as for \class{reader}
objects.
\end{classdesc}
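
The handling of long and short rows can be illustrated with a short sketch.
It assumes a current Python~3 interpreter; the field names, the
\code{'overflow'} key and the \code{'n/a'} filler are arbitrary values
picked for the example.

```python
import csv

# One row with two surplus fields, one row with two missing fields.
lines = ['alice,30,london,extra1,extra2\r\n', 'bob\r\n']

reader = csv.DictReader(lines, fieldnames=['name', 'age', 'city'],
                        restkey='overflow', restval='n/a')

# dict() gives plain dictionaries regardless of the mapping type returned.
records = [dict(row) for row in reader]
long_row, short_row = records
```

Here \code{long\_row['overflow']} holds the surplus fields as a list, and
the keys missing from the short row are filled with \var{restval}.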

\begin{classdesc}{DictWriter}{csvfile, fieldnames\optional{,
                              restval=""\optional{,
                              extrasaction=\code{'raise'}\optional{,
                              dialect=\code{'excel'}\optional{, fmtparam}}}}}
Create an object which operates like a regular writer but maps dictionaries
onto output rows.  The \var{fieldnames} parameter identifies the order in
which values in the dictionary passed to the \method{writerow()} method are
written to the \var{csvfile}.  The optional \var{restval} parameter
specifies the value to be written if the dictionary is missing a key in
\var{fieldnames}.  If the dictionary passed to the \method{writerow()}
method contains a key not found in \var{fieldnames}, the optional
\var{extrasaction} parameter indicates what action to take.  If it is set
to \code{'raise'}, a \exception{ValueError} is raised.  If it is set to
\code{'ignore'}, extra values in the dictionary are ignored.  All other
parameters are interpreted as for \class{writer} objects.
\end{classdesc}
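
The effect of \var{restval} and \var{extrasaction} can be sketched like
this, assuming a current Python~3 interpreter and \class{io.StringIO}
buffers in place of real files (the dictionaries and the \code{'?'} filler
are invented for the example):

```python
import csv
import io

fields = ['name', 'age']

# Lenient writer: unknown keys are dropped, missing keys become '?'.
buf = io.StringIO(newline='')
lenient = csv.DictWriter(buf, fieldnames=fields, restval='?',
                         extrasaction='ignore')
lenient.writerow({'name': 'alice', 'age': 30, 'city': 'london'})
lenient.writerow({'name': 'bob'})
output = buf.getvalue()

# Default writer: extrasaction='raise' rejects keys not in fieldnames.
strict = csv.DictWriter(io.StringIO(newline=''), fieldnames=fields)
try:
    strict.writerow({'name': 'carol', 'shoe': 9})
    raised = False
except ValueError:
    raised = True
```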

\begin{classdesc*}{Dialect}{}
The \class{Dialect} class is a container class relied on primarily for its
attributes, which are used to define the parameters for a specific
\class{reader} or \class{writer} instance.
\end{classdesc*}

\begin{classdesc}{Sniffer}{}
The \class{Sniffer} class is used to deduce the format of a CSV file.
\end{classdesc}

The \class{Sniffer} class provides two methods:

\begin{methoddesc}{sniff}{sample\optional{,delimiters=None}}
Analyze the given \var{sample} and return a \class{Dialect} subclass
reflecting the parameters found.  If the optional \var{delimiters} parameter
is given, it is interpreted as a string containing possible valid delimiter
characters.
\end{methoddesc}

\begin{methoddesc}{has_header}{sample}
Analyze the sample text (presumed to be in CSV format) and return
\constant{True} if the first row appears to be a series of column
headers.
\end{methoddesc}
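
A hedged sketch of both \class{Sniffer} methods follows.  It assumes a
current Python~3 interpreter, and the semicolon-separated sample is invented
for the example.  Both methods are heuristics, so results on other data may
differ from what this small, regular sample produces.

```python
import csv

# A short, regular sample: header row plus two data rows.
sample = 'name;age;city\nalice;30;london\nbob;25;paris\n'

sniffer = csv.Sniffer()

# Restrict the guess to two candidate delimiter characters.
dialect = sniffer.sniff(sample, delimiters=';,')

# The numeric 'age' column under a non-numeric heading suggests a header.
header_found = sniffer.has_header(sample)

# The deduced dialect can be handed straight to a reader.
rows = list(csv.reader(sample.splitlines(), dialect))
```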

The \module{csv} module defines the following constants:

\begin{datadesc}{QUOTE_ALL}
Instructs \class{writer} objects to quote all fields.
\end{datadesc}

\begin{datadesc}{QUOTE_MINIMAL}
Instructs \class{writer} objects to only quote those fields which contain
the current \var{delimiter} or begin with the current \var{quotechar}.
\end{datadesc}

\begin{datadesc}{QUOTE_NONNUMERIC}
Instructs \class{writer} objects to quote all non-numeric fields.
\end{datadesc}

\begin{datadesc}{QUOTE_NONE}
Instructs \class{writer} objects to never quote fields.  When the current
\var{delimiter} occurs in output data it is preceded by the current
\var{escapechar} character.  When \constant{QUOTE_NONE} is in effect, it
is an error not to have a single-character \var{escapechar} defined, even if
no data to be written contains the \var{delimiter} character.
\end{datadesc}
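
To compare the four quoting constants side by side, the sketch below writes
the same row under each of them.  It assumes a current Python~3 interpreter;
\function{render} is a hypothetical helper defined only for this example,
and the backslash is an arbitrary choice of \var{escapechar}.

```python
import csv
import io

def render(row, **fmtparams):
    """Hypothetical helper: write one row and return the resulting text."""
    buf = io.StringIO(newline='')
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

row = ['a,b', 'plain', 42]

quoted_all = render(row, quoting=csv.QUOTE_ALL)
quoted_minimal = render(row, quoting=csv.QUOTE_MINIMAL)
quoted_nonnumeric = render(row, quoting=csv.QUOTE_NONNUMERIC)

# QUOTE_NONE never quotes, so the embedded delimiter is escaped instead.
unquoted = render(row, quoting=csv.QUOTE_NONE, escapechar='\\')
```

Only the first field contains the delimiter, so it is the only field quoted
under \constant{QUOTE_MINIMAL} and the only one escaped under
\constant{QUOTE_NONE}.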

The \module{csv} module defines the following exception:

\begin{excdesc}{Error}
Raised by any of the functions when an error is detected.
\end{excdesc}

\subsection{Dialects and Formatting Parameters\label{csv-fmt-params}}

To make it easier to specify the format of input and output records,
specific formatting parameters are grouped together into dialects.  A
dialect is a subclass of the \class{Dialect} class having a set of specific
attributes and a single \method{validate()} method.  When creating
\class{reader} or \class{writer} objects, the programmer can specify a
string or a subclass of the \class{Dialect} class as the dialect parameter.
In addition to, or instead of, the \var{dialect} parameter, the programmer
can also specify individual formatting parameters, which have the same
names as the attributes defined below for the \class{Dialect} class.

Dialects support the following attributes:

\begin{memberdesc}[Dialect]{delimiter}
A one-character string used to separate fields.  It defaults to \code{','}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{doublequote}
Controls how instances of \var{quotechar} appearing inside a field should
themselves be quoted.  When \constant{True}, the character is doubled.
When \constant{False}, the \var{escapechar} must be a one-character string
which is used as a prefix to the \var{quotechar}.  It defaults to
\constant{True}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{escapechar}
A one-character string used to escape the \var{delimiter} if \var{quoting}
is set to \constant{QUOTE_NONE}.  It defaults to \constant{None}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{lineterminator}
The string used to terminate lines in the CSV file.  It defaults to
\code{'\e r\e n'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quotechar}
A one-character string used to quote elements containing the \var{delimiter}
or which start with the \var{quotechar}.  It defaults to \code{'"'}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{quoting}
Controls when quotes should be generated by the writer.  It can take on any
of the \constant{QUOTE_*} constants (see section~\ref{csv-contents})
and defaults to \constant{QUOTE_MINIMAL}.
\end{memberdesc}

\begin{memberdesc}[Dialect]{skipinitialspace}
When \constant{True}, whitespace immediately following the \var{delimiter}
is ignored.  The default is \constant{False}.
\end{memberdesc}
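
Several of these attributes can be tried out by passing them as keyword
arguments, as in the following sketch.  It assumes a current Python~3
interpreter and \class{io.StringIO} buffers instead of files; the sample
strings and the backslash \var{escapechar} are arbitrary choices for the
example.

```python
import csv
import io

# skipinitialspace: the blank after each delimiter is dropped on input.
spaced = list(csv.reader(['a, b, c\r\n'], skipinitialspace=True))

# doublequote left at its default: an embedded quotechar is doubled.
buf = io.StringIO(newline='')
csv.writer(buf).writerow(['say "hi"'])
doubled = buf.getvalue()

# doublequote=False: the escapechar is placed before the quotechar instead.
params = {'doublequote': False, 'escapechar': '\\'}
buf = io.StringIO(newline='')
csv.writer(buf, **params).writerow(['say "hi"'])
escaped = buf.getvalue()

# Reading with the same parameters recovers the original field.
restored = next(csv.reader(io.StringIO(escaped, newline=''), **params))

# delimiter and lineterminator changed together: tab-separated, LF-terminated.
buf = io.StringIO(newline='')
csv.writer(buf, delimiter='\t', lineterminator='\n').writerow(['x', 'y'])
tabbed = buf.getvalue()
```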

\subsection{Reader Objects}

Reader objects (\class{DictReader} instances and objects returned by
the \function{reader()} function) have the following public methods:

\begin{methoddesc}[csv reader]{next}{}
Return the next row of the reader's iterable object as a list, parsed
according to the current dialect.
\end{methoddesc}
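
Fetching rows one at a time might look like the sketch below.  It assumes a
current Python~3 interpreter, where the method is invoked through the
built-in \function{next()} function rather than by calling \method{next}
directly; the input lines are made up for the example.

```python
import csv

reader = csv.reader(['h1,h2\r\n', '1,2\r\n', '3,4\r\n'])

# Pull the first row explicitly, for example to treat it as a header.
header = next(reader)

# The reader is an iterator, so the remaining rows follow on from there.
remaining = list(reader)
```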

\subsection{Writer Objects}

\class{Writer} objects (\class{DictWriter} instances and objects returned by
the \function{writer()} function) have the following public methods.  A
{}\var{row} must be a sequence of strings or numbers for \class{Writer}
objects and a dictionary mapping fieldnames to strings or numbers (by
passing them through \function{str()} first) for {}\class{DictWriter}
objects.  Note that complex numbers are written out surrounded by
parentheses.  This may cause some problems for other programs which read
CSV files (assuming they support complex numbers at all).

\begin{methoddesc}[csv writer]{writerow}{row}
Write the \var{row} parameter to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}

\begin{methoddesc}[csv writer]{writerows}{rows}
Write every row in the \var{rows} parameter (a list of \var{row} objects as
described above) to the writer's file object, formatted
according to the current dialect.
\end{methoddesc}
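
The two writer methods, and the remark about complex numbers, can be
sketched as follows, assuming a current Python~3 interpreter and an
\class{io.StringIO} buffer in place of a file (the values are arbitrary):

```python
import csv
import io

buf = io.StringIO(newline='')
writer = csv.writer(buf)

writer.writerow(['x', 'y'])            # one row
writer.writerows([[1, 2], [3, 4]])     # a list of rows in a single call
writer.writerow([1 + 2j])              # str() of a complex number has parentheses

output = buf.getvalue()
```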

\subsection{Examples}

The ``Hello, world'' of csv reading is

\begin{verbatim}
import csv
reader = csv.reader(open("some.csv", "rb"))
\end{verbatim}

The corresponding simplest possible writing example is

\begin{verbatim}
import csv
writer = csv.writer(open("some.csv", "wb"))
for row in someiterable:
    writer.writerow(row)
\end{verbatim}