@c PSPP - a program for statistical analysis.
@c Copyright (C) 2017 Free Software Foundation, Inc.
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3
@c or any later version published by the Free Software Foundation;
@c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the section entitled "GNU
@c Free Documentation License".

@node System and Portable File IO
@chapter System and Portable File I/O

The commands in this chapter read, write, and examine system files and
portable files.

* APPLY DICTIONARY:: Apply system file dictionary to active dataset.
* EXPORT:: Write to a portable file.
* GET:: Read from a system file.
* GET DATA:: Read from foreign files.
* IMPORT:: Read from a portable file.
* SAVE:: Write to a system file.
* SAVE TRANSLATE:: Write data in foreign file formats.
* SYSFILE INFO:: Display system file dictionary.
* XEXPORT:: Write to a portable file, as a transformation.
* XSAVE:: Write to a system file, as a transformation.

@node APPLY DICTIONARY
@section APPLY DICTIONARY
@vindex APPLY DICTIONARY

APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.

@cmd{APPLY DICTIONARY} applies the variable labels, value labels,
and missing values taken from a file to corresponding
variables in the active dataset. In some cases it also updates the
weighting variable.

Specify a system file or portable file's name, a data set name
(@pxref{Datasets}), or a file handle name (@pxref{File Handles}). The
dictionary in the file will be read, but it will not replace the
active dataset's dictionary. The file's data will not be read.

Only variables with names that exist in both the active dataset and the
system file are considered. Variables with the same name but different
types (numeric, string) will cause an error message. Otherwise, the
system file variables' attributes will replace those in their matching
active dataset variables:

If a system file variable has a variable label, then it will replace
the variable label of the active dataset variable. If the system
file variable does not have a variable label, then the active dataset
variable's variable label, if any, will be retained.

If the system file variable has custom attributes (@pxref{VARIABLE
ATTRIBUTE}), then those attributes replace the active dataset variable's
custom attributes. If the system file variable does not have custom
attributes, then the active dataset variable's custom attributes, if any,
will be retained.

If the active dataset variable is numeric or short string, then value
labels and missing values, if any, will be copied to the active dataset
variable. If the system file variable does not have value labels or
missing values, then those in the active dataset variable, if any, will not
be changed.

In addition to properties of variables, some properties of the active
dataset's dictionary as a whole are updated:

If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
then those attributes replace the active dataset's custom
attributes.

If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
system file does not, or if the weighting variable in the system file
does not exist in the active dataset, then the active dataset weighting
variable, if any, is retained. Otherwise, the weighting variable in
the system file becomes the active dataset weighting variable.

@cmd{APPLY DICTIONARY} takes effect immediately. It does not read the
active dataset. The system file is not modified.
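
For example, the following sketch assumes two hypothetical files,
@file{survey.sav} and @file{codebook.sav}, that share some variable
names. It copies labels and missing values from the second file into
the active dataset and then displays the resulting dictionary:

@example
GET /FILE='survey.sav'.
APPLY DICTIONARY FROM='codebook.sav'.
DISPLAY DICTIONARY.
@end example

Because the data in @file{codebook.sav} is never read, the number of
cases in that file does not matter.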

@node EXPORT
@section EXPORT
@vindex EXPORT

EXPORT
/OUTFILE='@var{file_name}'
/UNSELECTED=@{RETAIN,DELETE@}
/RENAME=(@var{src_names}=@var{target_names})@dots{}

The @cmd{EXPORT} procedure writes the active dataset's dictionary and
data to a specified portable file.

By default, cases excluded with @cmd{FILTER} are written to the
file. These can be excluded by specifying @subcmd{DELETE} on the
@subcmd{UNSELECTED} subcommand. Specifying @subcmd{RETAIN} makes the
default explicit.

Portable files express real numbers in base 30. Integers are always
expressed to the maximum precision needed to make them exact.
Non-integers are, by default, expressed to the machine's maximum
natural precision (approximately 15 decimal digits on many machines).
If many numbers require this many digits, the portable file may become
significantly larger. As an alternative, the @subcmd{DIGITS}
subcommand may be used to specify the number of decimal digits of
precision to write. @subcmd{DIGITS} applies only to non-integers.

The @subcmd{OUTFILE} subcommand, which is the only required subcommand,
specifies the portable file to be written as a file name string or
a file handle (@pxref{File Handles}).

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format
as on the @cmd{SAVE} procedure (@pxref{SAVE}).

The @subcmd{TYPE} subcommand specifies the character set for use in the
portable file. Its value is currently not used.

The @subcmd{MAP} subcommand is currently ignored.

@cmd{EXPORT} is a procedure. It causes the active dataset to be read.
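
As an illustration, the sketch below writes only the cases that pass a
filter. It drops the filter variable from the output and limits
non-integers to 8 decimal digits of precision. The variable names
@code{age} and @code{adult} and the file name @file{adults.por} are
hypothetical. The @samp{/DIGITS=8} form is assumed from the description
above:

@example
COMPUTE adult = (age >= 18).
FILTER BY adult.
EXPORT /OUTFILE='adults.por'
       /UNSELECTED=DELETE
       /DROP=adult
       /DIGITS=8.
@end example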

@node GET
@section GET
@vindex GET

GET
/FILE=@{'@var{file_name}',@var{file_handle}@}
/RENAME=(@var{src_names}=@var{target_names})@dots{}
/ENCODING='@var{encoding}'

@cmd{GET} clears the current dictionary and active dataset and
replaces them with the dictionary and data from a specified file.

The @subcmd{FILE} subcommand is the only required subcommand. Specify
the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
be read as a string file name or a file handle (@pxref{File Handles}).

By default, all the variables in a file are read. The @subcmd{DROP}
subcommand can be used to specify a list of variables that are not to be
read. By contrast, the @subcmd{KEEP} subcommand can be used to specify
the variables that are to be read, and all other variables are not read.

Normally variables in a file retain the names that they were
saved under. Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an equals sign
(@samp{=}) and the names that they should be renamed to. Multiple
parenthesized groups of variable names can be included on a single
@subcmd{RENAME} subcommand.
Variables' names may be swapped using a @subcmd{RENAME}
subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
omitted. When this is done, only a single variable may be renamed at
once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
Each may be present any number of times. @cmd{GET} never modifies a
file on disk. Only the active dataset read from the file
is affected by these subcommands.
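
Because the subcommands are applied in order, a later subcommand refers
to variables by the names they have after any earlier @subcmd{RENAME}.
This sketch assumes a hypothetical file @file{survey.sav} that contains
variables @code{id}, @code{inc}, and @code{notes}. The @subcmd{KEEP}
subcommand uses the new name @code{income}, so @code{notes} is not read:

@example
GET /FILE='survey.sav'
    /RENAME=(inc=income)
    /KEEP=id income.
@end example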

@pspp{} automatically detects the encoding of string data in the file,
when possible. The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding. Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default. Use @cmd{SYSFILE INFO} to analyze
the encodings that might be valid for a system file. The
@subcmd{ENCODING} subcommand is a @pspp{} extension.

@cmd{GET} does not read the data immediately, only the dictionary. The data
is read later, when a procedure is executed.

Use of @cmd{GET} to read a portable file is a @pspp{} extension.
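
For instance, suppose a hypothetical old system file @file{legacy.sav}
stores its strings in the Windows-1252 code page. That encoding is an
assumption chosen for illustration; @samp{windows-1252} is one valid
@acronym{IANA} character set name. A sketch that names the encoding
explicitly:

@example
GET /FILE='legacy.sav' /ENCODING='windows-1252'.
@end example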

@node GET DATA
@section GET DATA
@vindex GET DATA

GET DATA
/TYPE=@{GNM,ODS,PSQL,TXT@}
@dots{}additional subcommands depending on TYPE@dots{}

The @cmd{GET DATA} command is used to read files and other data
sources created by other applications. When this command is executed,
the current dictionary and active dataset are replaced with variables
and data read from the specified source.

The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
specified. It determines the type of the file or source to read.
@pspp{} currently supports the following file types:

Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).

Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).

Relations from PostgreSQL databases (@url{http://postgresql.org}).

Textual data files in columnar and delimited formats.

Each supported file type has additional subcommands, explained in
separate sections below.

* GET DATA /TYPE=GNM/ODS:: Spreadsheets
* GET DATA /TYPE=PSQL:: Databases
* GET DATA /TYPE=TXT:: Delimited Text Files

@node GET DATA /TYPE=GNM/ODS
@subsection Spreadsheet Files

GET DATA /TYPE=@{GNM, ODS@}
/FILE=@{'@var{file_name}'@}
/SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
/CELLRANGE=@{RANGE '@var{range}', FULL@}
/READNAMES=@{ON, OFF@}
/ASSUMEDSTRWIDTH=@var{n}.

@cindex spreadsheet files

Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
in OpenDocument format
(@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
can be read using the @cmd{GET DATA} command.
Use the @subcmd{TYPE} subcommand to indicate the file's format:
@subcmd{/TYPE=GNM} indicates Gnumeric files, and
@subcmd{/TYPE=ODS} indicates OpenDocument.
The @subcmd{FILE} subcommand is mandatory.
Use it to specify the name of the file to be read.
All other subcommands are optional.

The format of each variable is determined by the format of the spreadsheet
cell containing the first datum for the variable.
If this cell is of string (text) format, then the width of the variable is
determined from the length of the string it contains, unless the
@subcmd{ASSUMEDSTRWIDTH} subcommand is given.

The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
There are two forms of the @subcmd{SHEET} subcommand.
In the first form,
@subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
name of the sheet to read.
In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
integer which is the index of the sheet to read.
The first sheet has the index 1.
If the @subcmd{SHEET} subcommand is omitted, then the command will read the
first sheet in the file.

The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
sheet is read.
To read only part of a sheet, use the form
@subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
columns C--P, and rows 3--19 inclusive.
If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.

If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
the first row are used as the names of the variables in which to store
the data from subsequent rows. This is the default.
If @subcmd{/READNAMES=OFF} is
used, then the variables receive automatically assigned names.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the file.
If omitted, the default value is determined from the length of the
string in the first spreadsheet cell for each variable.
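
The following sketch reads part of one sheet of an OpenDocument
spreadsheet. It takes variable names from the first row that is read
and limits string variables to 32 characters. The file name
@file{inventory.ods}, the sheet name @samp{Stock}, and the cell range
are hypothetical:

@example
GET DATA /TYPE=ODS
         /FILE='inventory.ods'
         /SHEET=NAME 'Stock'
         /CELLRANGE=RANGE 'A1:D100'
         /READNAMES=ON
         /ASSUMEDSTRWIDTH=32.
@end example

The next variant reads all of the second sheet of a hypothetical
Gnumeric file and assigns variable names automatically:

@example
GET DATA /TYPE=GNM /FILE='inventory.gnumeric' /SHEET=INDEX 2 /READNAMES=OFF.
@end example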

@node GET DATA /TYPE=PSQL
@subsection Postgres Database Queries

GET DATA /TYPE=PSQL
/CONNECT=@{@var{connection info}@}
[/ASSUMEDSTRWIDTH=@var{w}]

The PSQL type is used to import data from a PostgreSQL database server.
The server may be located locally or remotely.
Variables are automatically created based on the table column names
or the names specified in the SQL query.
High-precision PostgreSQL data types lose precision when
imported into @pspp{}.
Not all PostgreSQL data types can be represented in @pspp{}.
If a datum cannot be represented, a warning is issued and that
datum is set to SYSMIS.

The @subcmd{CONNECT} subcommand is mandatory.
It is a string specifying the parameters of the database server from
which the data should be fetched.
The format of the string is given in the PostgreSQL manual
@url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.

The @subcmd{SQL} subcommand is mandatory.
It must be a valid SQL string to retrieve data from the database.

The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
variables read from the database.
If omitted, the default value is determined from the length of the
string in the first value read for each variable.

The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
connection.
If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
not given, then an error will occur.
Whether or not the connection is
encrypted depends upon the underlying PostgreSQL client library (libpq) and the
capabilities of the database server.

The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
It specifies an upper limit on the
number of cases to fetch from the database at once.
The default value is 4096.
If your SQL statement fetches a large number of cases but only a small number of
variables, then the data transfer may be faster if you increase this value.
Conversely, if the number of variables is large, or if the machine on which
@pspp{} is running has only a
small amount of memory, then a smaller value will be better.

The following syntax is an example:

GET DATA /TYPE=PSQL
/CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
/SQL='select * from manufacturer'.
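
A second sketch uses the optional subcommands with a hypothetical local
server, database, and table. It accepts an unencrypted connection and
raises the fetch size, which may help when the query returns many cases
but few variables. The @samp{/UNENCRYPTED} and @samp{/BSIZE=@var{n}}
forms are assumptions, because only their meanings are described above:

@example
GET DATA /TYPE=PSQL
         /CONNECT='host=localhost dbname=inventory user=analyst'
         /SQL='select part_id, price from parts'
         /UNENCRYPTED
         /BSIZE=16384.
@end example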

@node GET DATA /TYPE=TXT
@subsection Textual Data Files

GET DATA /TYPE=TXT
/FILE=@{'@var{file_name}',@var{file_handle}@}
[/ENCODING='@var{encoding}']
[/ARRANGEMENT=@{DELIMITED,FIXED@}]
[/FIRSTCASE=@{@var{first_case}@}]
@dots{}additional subcommands depending on ARRANGEMENT@dots{}

When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).

The @subcmd{FILE} subcommand is mandatory. Specify the file to be read as
a string file name or (for textual data only) a
file handle (@pxref{File Handles}).

The @subcmd{ENCODING} subcommand specifies the character encoding of
the file to be read. @xref{INSERT}, for information on supported
encodings.

The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
DELIMITED, the default setting, specifies that fields in the input
data are separated by spaces, tabs, or other user-specified
delimiters. FIXED specifies that fields in the input data appear at
particular fixed column positions within records of a case.

By default, cases are read from the input file starting from the first
line. To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
to the number of the first line to read: 2 to skip the first line, 3
to skip the first two lines, and so on.

@subcmd{IMPORTCASES} is ignored, for compatibility. Use @cmd{N OF
CASES} to limit the number of cases read from a file (@pxref{N OF
CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
(@pxref{SAMPLE}).

The remaining subcommands apply only to one of the two file
arrangements, described below.
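
Here is a minimal sketch of the common subcommands. It assumes a
hypothetical comma-delimited UTF-8 file @file{scores.txt} whose first
two lines are headers. @subcmd{FIRSTCASE} skips the headers, and
@cmd{N OF CASES} limits how many cases are used. The variable names and
formats are also hypothetical:

@example
GET DATA /TYPE=TXT
         /FILE='scores.txt'
         /ENCODING='UTF-8'
         /ARRANGEMENT=DELIMITED
         /FIRSTCASE=3
         /DELIMITERS=','
         /VARIABLES=id F5.0 score F6.2.
N OF CASES 100.
@end example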

* GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
* GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::

@node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
@subsubsection Reading Delimited Data

GET DATA /TYPE=TXT
/FILE=@{'@var{file_name}',@var{file_handle}@}
[/ARRANGEMENT=@{DELIMITED,FIXED@}]
[/FIRSTCASE=@{@var{first_case}@}]
[/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
/DELIMITERS="@var{delimiters}"
[/QUALIFIER="@var{quotes}"]
[/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
/VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
where each @var{del_var} takes the form:
@var{variable} @var{format}

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
input data from text files in delimited format, where fields are
separated by a set of user-specified delimiters. Its capabilities are
similar to those of @cmd{DATA LIST FREE} (@pxref{DATA LIST FREE}), with a
few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

@subcmd{DELIMITERS}, which is required, specifies the set of characters that
may separate fields. Each character in the string specified on
@subcmd{DELIMITERS} separates one field from the next. The end of a line also
separates fields, regardless of @subcmd{DELIMITERS}. Two consecutive
delimiters in the input yield an empty field, as does a delimiter at
the end of a line. A space character as a delimiter is an exception:
consecutive spaces do not yield an empty field and neither does any
number of spaces at the end of a line.

To use a tab as a delimiter, specify @samp{\t} at the beginning of the
@subcmd{DELIMITERS} string. To use a backslash as a delimiter, specify
@samp{\\} as the first delimiter or, if a tab should also be a
delimiter, immediately following @samp{\t}. To read a data file in
which each field appears on a separate line, specify the empty string
for @subcmd{DELIMITERS}.
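
For example, this sketch reads a hypothetical file @file{data.tsv}
whose fields are separated by tabs or commas. Because @samp{\t} must
come at the beginning of the string, it is listed before the comma. The
variable names and formats are also hypothetical:

@example
GET DATA /TYPE=TXT
         /FILE='data.tsv'
         /DELIMITERS="\t,"
         /VARIABLES=name A20 count F8.0.
@end example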

The optional @subcmd{QUALIFIER} subcommand names one or more characters that
can be used to quote values within fields in the input. A field that
begins with one of the specified quote characters ends at the next
matching quote. Intervening delimiters become part of the field,
instead of terminating it. The ability to specify more than one quote
character is a @pspp{} extension.

The character specified on @subcmd{QUALIFIER} can be embedded within a
field that it quotes by doubling the qualifier. For example, if
@samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
specifies a field that contains @samp{a'b}.

The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
the data file. With LINE, the default setting, each line must contain
all the data for exactly one case. For additional flexibility, to
allow a single case to be split among lines or multiple cases to be
contained on a single line, specify VARIABLES @var{n_variables}, where
@var{n_variables} is the number of variables per case.
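
To illustrate @samp{DELCASE=VARIABLES}, suppose a hypothetical file
@file{pairs.txt} contains the two lines @samp{1 2 3 4} and @samp{5 6}.
With two variables per case, the following sketch should read three
cases. In those cases @code{x} and @code{y} are 1 and 2, then 3 and 4,
then 5 and 6, regardless of where the line breaks fall:

@example
GET DATA /TYPE=TXT
         /FILE='pairs.txt'
         /DELIMITERS=' '
         /DELCASE=VARIABLES 2
         /VARIABLES=x F8.0 y F8.0.
@end example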

The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
Specify the name of each variable and its input format (@pxref{Input
and Output Formats}) in the order they should be read from the input
file.

@subsubheading Examples

On a Unix-like system, the @samp{/etc/passwd} file has a format
similar to this:

root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh

The following syntax reads a file in the format used by
@samp{/etc/passwd}:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
/VARIABLES=username A20

Consider the following data on used cars:

model year mileage price type age
Civic 2002 29883 15900 Si 2
Civic 2003 13415 15900 EX 1
Civic 1992 107000 3800 n/a 12
Accord 2002 26613 17900 EX 1

The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2

Consider the following information on animals in a pet store:

'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
, (Years), , , (Dollars), ,
"Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
"Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
"Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
"Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"

The following syntax can be used to read the pet store data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE

@node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
@subsubsection Reading Fixed Columnar Data

@c (modify-syntax-entry ?_ "w")
@c (modify-syntax-entry ?' "'")
@c (modify-syntax-entry ?@ "'")

GET DATA /TYPE=TXT
/FILE=@{'@var{file_name}',@var{file_handle}@}
[/ARRANGEMENT=@{DELIMITED,FIXED@}]
[/FIRSTCASE=@{@var{first_case}@}]
[/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
/VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
[/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
where each @var{fixed_var} takes the form:
@var{variable} @var{start}-@var{end} @var{format}

The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
data from text files in fixed format, where each field is located in
particular fixed column positions within records of a case. Its
capabilities are similar to those of @cmd{DATA LIST FIXED} (@pxref{DATA LIST
FIXED}), with a few enhancements.

The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
subcommands are described above (@pxref{GET DATA /TYPE=TXT}).

The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
integer number of input lines that make up each case. The default
value is 1.

The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
at which each variable can be found. For each variable, specify its
name, followed by its start and end column separated by @samp{-}
(e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
@samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
For this command, columns are numbered starting from 0 at
the left column. Introduce the variables in the second and later
lines of a case by a slash followed by the number of the line within
the case, e.g.@: @samp{/2} for the second line.
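
This sketch shows a case that spans two lines. It assumes a
hypothetical file @file{people.txt} in which each person occupies two
lines. The first line holds a name in columns 0--19 and an ID number in
columns 20--24, and the second line holds a city in columns 0--14.
@subcmd{FIXCASE} is set to 2, and the second line's variable is
introduced with @samp{/2}:

@example
GET DATA /TYPE=TXT /FILE='people.txt' /ARRANGEMENT=FIXED /FIXCASE=2
         /VARIABLES=name 0-19 A id 20-24 F
         /2 city 0-14 A.
@end example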

@subsubheading Examples

Consider the following data on used cars:

model   year    mileage price   type    age
Civic   2002    29883   15900   Si      2
Civic   2003    13415   15900   EX      1
Civic   1992    107000  3800    n/a     12
Accord  2002    26613   17900   EX      1

The following syntax can be used to read the used car data:

@c If you change this example, change the regression test in
@c tests/language/data-io/get-data.at to match.
GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
/VARIABLES=model 0-7 A

@node IMPORT
@section IMPORT
@vindex IMPORT

IMPORT
/FILE='@var{file_name}'
/RENAME=(@var{src_names}=@var{target_names})@dots{}

The @cmd{IMPORT} command clears the active dataset's dictionary and
data and replaces them with a dictionary and data from a system file or
portable file.

The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
the portable file to be read as a file name string or a file handle
(@pxref{File Handles}).

The @subcmd{TYPE} subcommand is currently not used.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).

@cmd{IMPORT} does not read the data immediately, only the dictionary. The
data is read later, when a procedure is executed.

Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
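
For example, the following sketch reads a hypothetical portable file.
It keeps only two of its variables and renames one of them. The file
and variable names are hypothetical:

@example
IMPORT /FILE='survey.por'
       /KEEP=id age
       /RENAME=(age=age_years).
@end example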

@node SAVE
@section SAVE
@vindex SAVE

SAVE
/OUTFILE=@{'@var{file_name}',@var{file_handle}@}
/UNSELECTED=@{RETAIN,DELETE@}
/@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
/PERMISSIONS=@{WRITEABLE,READONLY@}
/VERSION=@var{version}
/RENAME=(@var{src_names}=@var{target_names})@dots{}

The @cmd{SAVE} procedure causes the dictionary and data in the active
dataset to be written to a system file.

@subcmd{OUTFILE} is the only required subcommand. Specify the system file
to be written as a string file name or a file handle
(@pxref{File Handles}).

By default, cases excluded with @cmd{FILTER} are written to the system file.
These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
subcommand. Specifying @subcmd{RETAIN} makes the default explicit.

The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
@subcmd{ZCOMPRESSED} subcommands determine the system file's
compression level:

Data is not compressed. Each numeric value uses 8 bytes of disk
space. Each string value uses one byte per column width, rounded up
to a multiple of 8 bytes.

Data is compressed with a simple algorithm. Each integer numeric
value between @minus{}99 and 151, inclusive, or system-missing value
uses one byte of disk space. Each 8-byte segment of a string that
consists only of spaces uses 1 byte. Any other numeric value or
8-byte string segment uses 9 bytes of disk space.

Data is compressed with the ``deflate'' compression algorithm
specified in RFC@tie{}1951 (the same algorithm used by
@command{gzip}). Files written with this compression level cannot be
read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.

@subcmd{COMPRESSED} is the default compression level. The SET command
(@pxref{SET}) can change this default.

The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
file. WRITEABLE, the default, creates the file with read and write
permission. READONLY creates the file for read-only access.
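
The following sketch writes a hypothetical file @file{archive.zsav}
with @subcmd{ZCOMPRESSED} compression and read-only permissions. It
also leaves out any cases excluded by an active @cmd{FILTER}. A file
written this way cannot be read by PSPP 0.8.1 or earlier or by SPSS 20
or earlier:

@example
SAVE /OUTFILE='archive.zsav'
     /UNSELECTED=DELETE
     /ZCOMPRESSED
     /PERMISSIONS=READONLY.
@end example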

By default, all the variables in the active dataset dictionary are written
to the system file. The @subcmd{DROP} subcommand can be used to specify a list
of variables not to be written. In contrast, @subcmd{KEEP} specifies the variables
to be written, and all variables not specified are not written.

Normally variables are saved to a system file under the same names they
have in the active dataset. Use the @subcmd{RENAME} subcommand to change these names.
Specify, within parentheses, a list of variable names followed by an
equals sign (@samp{=}) and the names that they should be renamed to.
Multiple parenthesized groups of variable names can be included on a
single @subcmd{RENAME} subcommand. Variables' names may be swapped using a
@subcmd{RENAME} subcommand of the
form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.

Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
omitted. When this is done, only a single variable may be renamed at
once. For instance, @subcmd{/RENAME=@var{A}=@var{B}}. This alternate syntax is
deprecated.

@subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
left-to-right order. Each may be present any number of times. @cmd{SAVE}
never modifies the active dataset. @subcmd{DROP}, @subcmd{KEEP}, and
@subcmd{RENAME} only affect the system file written to disk.
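
This sketch assumes hypothetical variables @code{ssn}, @code{inc}, and
@code{age} in the active dataset. It writes a system file that omits
@code{ssn` and stores @code{inc} under the name @code{income}. The
active dataset itself keeps all three variables under their original
names:

@example
SAVE /OUTFILE='public.sav'
     /DROP=ssn
     /RENAME=(inc=income).
@end example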

The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
versions are 2 and 3. The default version is 3. In version 2 system
files, variable names longer than 8 bytes will be truncated. The two
versions are otherwise identical.

The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.

@cmd{SAVE} causes the data to be read. It is a procedure.

@node SAVE TRANSLATE
@section SAVE TRANSLATE
@vindex SAVE TRANSLATE

SAVE TRANSLATE
/OUTFILE=@{'@var{file_name}',@var{file_handle}@}
[/MISSING=@{IGNORE,RECODE@}]
[/DROP=@var{var_list}]
[/KEEP=@var{var_list}]
[/RENAME=(@var{src_names}=@var{target_names})@dots{}]
[/UNSELECTED=@{RETAIN,DELETE@}]
@dots{}additional subcommands depending on TYPE@dots{}

The @cmd{SAVE TRANSLATE} command is used to save data into various
formats understood by other applications.

The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}). @subcmd{TYPE} determines the type of the file
to write. It must be one of the following:

Comma-separated value format.

Tab-delimited format.

By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file. Use
@subcmd{REPLACE} to force an existing file to be overwritten.

With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
values as if they were not missing. Specify MISSING=RECODE to output
numeric user-missing values like system-missing values and string
user-missing values as all spaces.

By default, all the variables in the active dataset dictionary are saved
to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variables
to save. The @subcmd{RENAME} subcommand can also be used to change the names
under which variables are saved. @subcmd{UNSELECTED} determines whether cases
filtered out by the @cmd{FILTER} command are written to the output file.
These subcommands have the same syntax and meaning as on the
@cmd{SAVE} command (@pxref{SAVE}).

Each supported file type has additional subcommands, explained in
separate sections below.

@cmd{SAVE TRANSLATE} causes the data to be read. It is a procedure.

* SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::

@node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
@subsection Writing Comma- and Tab-Separated Data Files

SAVE TRANSLATE
/OUTFILE=@{'@var{file_name}',@var{file_handle}@}
[/MISSING=@{IGNORE,RECODE@}]
[/DROP=@var{var_list}]
[/KEEP=@var{var_list}]
[/RENAME=(@var{src_names}=@var{target_names})@dots{}]
[/UNSELECTED=@{RETAIN,DELETE@}]
[/CELLS=@{VALUES,LABELS@}]
[/TEXTOPTIONS DELIMITER='@var{delimiter}']
[/TEXTOPTIONS QUALIFIER='@var{qualifier}']
[/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
[/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]

The @cmd{SAVE TRANSLATE} command with TYPE=CSV or TYPE=TAB writes data in a
comma- or tab-separated value format similar to that described by
RFC@tie{}4180. Each variable becomes one output column, and each case
becomes one line of output. If @subcmd{FIELDNAMES} is specified, an additional
line at the top of the output file lists variable names.
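
Here is a minimal sketch that assumes a hypothetical output file
@file{survey.csv}. It writes a comma-separated file with a header line
of variable names, overwrites any existing file, and writes value labels
where they exist. The @subcmd{TYPE}, @subcmd{REPLACE}, and
@subcmd{FIELDNAMES} forms shown are assumed from their descriptions in
this section:

@example
SAVE TRANSLATE /OUTFILE='survey.csv'
               /TYPE=CSV
               /REPLACE
               /FIELDNAMES
               /CELLS=LABELS.
@end example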

The CELLS and TEXTOPTIONS FORMAT settings determine how values are
written to the output file:

@item CELLS=VALUES FORMAT=PLAIN (the default settings)
Writes variables to the output in ``plain'' formats that ignore the
details of variable formats. Numeric values are written as plain
decimal numbers with enough digits to indicate their exact values in
machine representation. Numeric values include @samp{e} followed by
an exponent if the exponent value would be less than -4 or greater
than 16. Dates are written in MM/DD/YYYY format and times in HH:MM:SS
format. WKDAY and MONTH values are written as decimal numbers.

Numeric values use, by default, the decimal point character set with
SET DECIMAL (@pxref{SET DECIMAL}). Use DECIMAL=DOT or DECIMAL=COMMA
to force a particular decimal point character.

@item CELLS=VALUES FORMAT=VARIABLE
Writes variables using their print formats. Leading and trailing
spaces are removed from numeric values, and trailing spaces are
removed from string values.

@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
Writes value labels where they exist, and otherwise writes the values
themselves as described above.

Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
values are output as a single space.

For TYPE=TAB, tab characters delimit values. For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line. If DELIMITER is specified, then
the specified string separates values. If DELIMITER is not specified,
then the default is a comma with DECIMAL=DOT or a semicolon with
DECIMAL=COMMA. If DECIMAL is not given either, it is implied by the
decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).

The TEXTOPTIONS QUALIFIER setting specifies a character that is output
before and after a value that contains the delimiter character or the
qualifier character. The default is a double quote (@samp{"}). A
qualifier character that appears within a value is doubled.
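
The following sketch assumes a hypothetical output file
@file{export.csv}. It selects a comma as the decimal point without
naming a delimiter, so under the rules above, values should be
separated by semicolons. It also uses a single quote as the qualifier,
so a string value such as @samp{it's} would be written as
@code{'it''s'}:

@example
SAVE TRANSLATE /OUTFILE='export.csv' /TYPE=CSV /REPLACE
               /TEXTOPTIONS DECIMAL=COMMA
               /TEXTOPTIONS QUALIFIER="'".
@end example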

@node SYSFILE INFO
@section SYSFILE INFO
@vindex SYSFILE INFO

SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].

@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
SPSS/PC+ system file, or SPSS portable file, and displays the
information in its dictionary.

Specify the file to be read as a string file name or a file handle
(@pxref{File Handles}).

@pspp{} automatically detects the encoding of string data in the file,
when possible. The character encoding of old SPSS system files cannot
always be guessed correctly, and SPSS/PC+ system files do not include
any indication of their encoding. Specify the @subcmd{ENCODING}
subcommand with an @acronym{IANA} character set name as its string
argument to override the default, or specify @code{ENCODING='DETECT'}
to analyze and report possibly valid encodings for the system file.
The @subcmd{ENCODING} subcommand is a @pspp{} extension.

@cmd{SYSFILE INFO} does not affect the current active dataset.
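
This sketch assumes a hypothetical old system file @file{legacy.sav}.
The first command lists the encodings that might be valid for its
string data. The second command displays the dictionary as decoded with
one of those encodings; @samp{windows-1252} is chosen here only for
illustration. Neither command changes the active dataset:

@example
SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
@end example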

@node XEXPORT
@section XEXPORT
@vindex XEXPORT

XEXPORT
/OUTFILE='@var{file_name}'
/RENAME=(@var{src_names}=@var{target_names})@dots{}

The @cmd{XEXPORT} transformation writes the active dataset dictionary and
data to a specified portable file.

This transformation is a @pspp{} extension.

It is similar to the @cmd{EXPORT} procedure, with two differences:

@cmd{XEXPORT} is a transformation, not a procedure. It is executed when
the data is read by a procedure or procedure-like command.

@cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.

@xref{EXPORT}, for more information.
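
Because @cmd{XEXPORT} is a transformation, the portable file in this
sketch is written during the same pass over the data that
@cmd{DESCRIPTIVES} makes, rather than in a separate pass. The file and
variable names are hypothetical:

@example
GET /FILE='survey.sav'.
XEXPORT /OUTFILE='survey.por'.
DESCRIPTIVES /VARIABLES=age income.
@end example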

@node XSAVE
@section XSAVE
@vindex XSAVE

XSAVE
/OUTFILE='@var{file_name}'
/@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
/PERMISSIONS=@{WRITEABLE,READONLY@}
/VERSION=@var{version}
/RENAME=(@var{src_names}=@var{target_names})@dots{}

The @cmd{XSAVE} transformation writes the active dataset's dictionary and
data to a system file. It is similar to the @cmd{SAVE}
procedure, with two differences:

@cmd{XSAVE} is a transformation, not a procedure. It is executed when
the data is read by a procedure or procedure-like command.

@cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.

@xref{SAVE}, for more information.
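
In this sketch, a derived variable is computed, and the data, including
that variable, is saved when @cmd{EXECUTE} reads the active dataset.
The file and variable names are hypothetical:

@example
GET /FILE='survey.sav'.
COMPUTE bmi = weight / (height * height).
XSAVE /OUTFILE='survey-bmi.sav' /ZCOMPRESSED.
EXECUTE.
@end example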