doc/files.texi

   1 @node System and Portable File IO
   2 @chapter System and Portable File I/O
   3
   4 The commands in this chapter read, write, and examine system files and
   5 portable files.
   6
   7 @menu
   8 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
   9 * EXPORT::                      Write to a portable file.
  10 * GET::                         Read from a system file.
  11 * GET DATA::                    Read from foreign files.
  12 * IMPORT::                      Read from a portable file.
  13 * SAVE::                        Write to a system file.
  14 * SAVE TRANSLATE::              Write data in foreign file formats.
  15 * SYSFILE INFO::                Display system file dictionary.
  16 * XEXPORT::                     Write to a portable file, as a transformation.
  17 * XSAVE::                       Write to a system file, as a transformation.
  18 @end menu
  19
  20 @node APPLY DICTIONARY
  21 @section APPLY DICTIONARY
  22 @vindex APPLY DICTIONARY
  23
  24 @display
  25 APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
  26 @end display
  27
  28 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  29 and missing values taken from a file to corresponding
  30 variables in the active dataset.  In some cases it also updates the
  31 weighting variable.
  32
  33 Specify a system file or portable file's name, a data set name
  34 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  35 dictionary in the file will be read, but it will not replace the
  36 active dataset's dictionary.  The file's data will not be read.
  37
  38 Only variables with names that exist in both the active dataset and the
  39 system file are considered.  Variables with the same name but different
  40 types (numeric, string) will cause an error message.  Otherwise, the
  41 system file variables' attributes will replace those in their matching
  42 active dataset variables:
  43
  44 @itemize @bullet
  45 @item
  46 If a system file variable has a variable label, then it will replace
  47 the variable label of the active dataset variable.  If the system
  48 file variable does not have a variable label, then the active dataset
  49 variable's variable label, if any, will be retained.
  50
  51 @item
  52 If the system file variable has custom attributes (@pxref{VARIABLE
  53 ATTRIBUTE}), then those attributes replace the active dataset variable's
  54 custom attributes.  If the system file variable does not have custom
  55 attributes, then the active dataset variable's custom attributes, if any,
  56 will be retained.
  57
  58 @item
  59 If the active dataset variable is numeric or short string, then the
  60 system file variable's value labels and missing values, if any, are
  61 copied to it.  If the system file variable does not have value labels or
  62 missing values, then those in the active dataset variable, if any, will not
  63 be disturbed.
  64 @end itemize
  65
  66 In addition to properties of variables, some properties of the active
  67 dataset dictionary as a whole are updated:
  68
  69 @itemize @bullet
  70 @item
  71 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  72 then those attributes replace the active dataset's custom
  73 attributes.
  74
  75 @item
  76 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  77 system file does not, or if the weighting variable in the system file
  78 does not exist in the active dataset, then the active dataset weighting
  79 variable, if any, is retained.  Otherwise, the weighting variable in
  80 the system file becomes the active dataset weighting variable.
  81 @end itemize
  82
  83 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  84 active dataset.  The system file is not modified.
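
As an illustrative sketch, the following syntax loads one file and then
applies the dictionary of a second file to it.  Both file names are
hypothetical.

@example
* Hypothetical file names.
GET /FILE='responses.sav'.
APPLY DICTIONARY FROM='template.sav'.
@end example

Only variables that have the same name and type in both files pick up
labels and missing values from @file{template.sav}.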
  85
  86 @node EXPORT
  87 @section EXPORT
  88 @vindex EXPORT
  89
  90 @display
  91 EXPORT
  92         /OUTFILE='@var{file_name}'
  93         /UNSELECTED=@{RETAIN,DELETE@}
  94         /DIGITS=@var{n}
  95         /DROP=@var{var_list}
  96         /KEEP=@var{var_list}
  97         /RENAME=(@var{src_names}=@var{target_names})@dots{}
  98         /TYPE=@{COMM,TAPE@}
  99         /MAP
 100 @end display
 101
 102 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 103 data to a specified portable file.
 104
 105 By default, cases excluded with FILTER are written to the
 106 file.  These can be excluded by specifying DELETE on the @subcmd{UNSELECTED}
 107 subcommand.  Specifying RETAIN makes the default explicit.
 108
 109 Portable files express real numbers in base 30.  Integers are always
 110 expressed to the maximum precision needed to make them exact.
 111 Non-integers are, by default, expressed to the machine's maximum
 112 natural precision (approximately 15 decimal digits on many machines).
 113 If many numbers require this many digits, the portable file may
 114 significantly increase in size.  As an alternative, the @subcmd{DIGITS}
 115 subcommand may be used to specify the number of decimal digits of
 116 precision to write.  @subcmd{DIGITS} applies only to non-integers.
 117
 118 The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
 119 the portable file to be written as a file name string or
 120 a file handle (@pxref{File Handles}).
 121
 122 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as the
 123 @cmd{SAVE} procedure (@pxref{SAVE}).
 124
 125 The @subcmd{TYPE} subcommand specifies the character set for use in the
 126 portable file.  Its value is currently not used.
 127
 128 The @subcmd{MAP} subcommand is currently ignored.
 129
 130 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
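
A minimal sketch of @cmd{EXPORT}, assuming a hypothetical output file
name and a hypothetical variable @code{notes}:

@example
* Hypothetical file and variable names.
EXPORT /OUTFILE='survey.por'
       /UNSELECTED=DELETE
       /DIGITS=6
       /DROP=notes.
@end example

Here filtered-out cases are not written, non-integers are written
with 6 decimal digits of precision, and @code{notes} is omitted from
the portable file.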
 131
 132 @node GET
 133 @section GET
 134 @vindex GET
 135
 136 @display
 137 GET
 138         /FILE=@{'@var{file_name}',@var{file_handle}@}
 139         /DROP=@var{var_list}
 140         /KEEP=@var{var_list}
 141         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 142         /ENCODING='@var{encoding}'
 143 @end display
 144
 145 @cmd{GET} clears the current dictionary and active dataset and
 146 replaces them with the dictionary and data from a specified file.
 147
 148 The @subcmd{FILE} subcommand is the only required subcommand.  Specify
 149 the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
 150 be read as a string file name or a file handle (@pxref{File Handles}).
 151
 152 By default, all the variables in a file are read.  The @subcmd{DROP}
 153 subcommand can be used to specify a list of variables that are not to be
 154 read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
 155 variables that are to be read, with all other variables not read.
 156
 157 Normally variables in a file retain the names that they were
 158 saved under.  Use the @subcmd{RENAME} subcommand to change these names.
 159 Specify,
 160 within parentheses, a list of variable names followed by an equals sign
 161 (@samp{=}) and the names that they should be renamed to.  Multiple
 162 parenthesized groups of variable names can be included on a single
 163 @subcmd{RENAME} subcommand.
 164 Variables' names may be swapped using a @subcmd{RENAME}
 165 subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 166
 167 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
 168 eliminated.  When this is done, only a single variable may be renamed at
 169 once, as in @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 170 deprecated.
 171
 172 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
 173 Each may be present any number of times.  @cmd{GET} never modifies a
 174 file on disk.  Only the active dataset read from the file
 175 is affected by these subcommands.
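
The left-to-right rule can be sketched as follows.  The file name and
the variable names are hypothetical.

@example
* Hypothetical file and variable names.
GET /FILE='survey.sav'
    /DROP=notes
    /RENAME=(q1 q2=income age)
    /KEEP=id income age.
@end example

Because @subcmd{RENAME} comes before @subcmd{KEEP}, the @subcmd{KEEP}
list must use the new names @code{income} and @code{age} rather than
@code{q1} and @code{q2}.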
 176
 177 @pspp{} automatically detects the encoding of string data in the file,
 178 when possible.  The character encoding of old SPSS system files cannot
 179 always be guessed correctly, and SPSS/PC+ system files do not include
 180 any indication of their encoding.  Specify the @subcmd{ENCODING}
 181 subcommand with an @acronym{IANA} character set name as its string
 182 argument to override the default.  Use @cmd{SYSFILE INFO} to analyze
 183 the encodings that might be valid for a system file.  The
 184 @subcmd{ENCODING} subcommand is a @pspp{} extension.
 185
 186 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 187 is read later, when a procedure is executed.
 188
 189 Use of @cmd{GET} to read a portable file is a @pspp{} extension.
 190
 191 @node GET DATA
 192 @section GET DATA
 193 @vindex GET DATA
 194
 195 @display
 196 GET DATA
 197         /TYPE=@{GNM,ODS,PSQL,TXT@}
 198         @dots{}additional subcommands depending on TYPE@dots{}
 199 @end display
 200
 201 The @cmd{GET DATA} command is used to read files and other data
 202 sources created by other applications.  When this command is executed,
 203 the current dictionary and active dataset are replaced with variables
 204 and data read from the specified source.
 205
 206 The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
 207 specified.  It determines the type of the file or source to read.
 208 @pspp{} currently supports the following file types:
 209
 210 @table @asis
 211 @item GNM
 212 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 213
 214 @item ODS
 215 Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).
 216
 217 @item PSQL
 218 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 219
 220 @item TXT
 221 Textual data files in columnar and delimited formats.
 222 @end table
 223
 224 Each supported file type has additional subcommands, explained in
 225 separate sections below.
 226
 227 @menu
 228 * GET DATA /TYPE=GNM/ODS::     Spreadsheets
 229 * GET DATA /TYPE=PSQL::        Databases
 230 * GET DATA /TYPE=TXT::         Delimited Text Files
 231 @end menu
 232
 233 @node GET DATA /TYPE=GNM/ODS
 234 @subsection Spreadsheet Files
 235
 236 @display
 237 GET DATA /TYPE=@{GNM, ODS@}
 238         /FILE=@{'@var{file_name}'@}
 239         /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
 240         /CELLRANGE=@{RANGE '@var{range}', FULL@}
 241         /READNAMES=@{ON, OFF@}
 242         /ASSUMEDSTRWIDTH=@var{n}.
 243 @end display
 244
 245 @cindex Gnumeric
 246 @cindex OpenDocument
 247 @cindex spreadsheet files
 248
 249 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
 250 in OpenDocument format
 251 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
 252 can be read using the @cmd{GET DATA} command.
 253 Use the @subcmd{TYPE} subcommand to indicate the file's format.
 254 @subcmd{/TYPE=GNM} indicates Gnumeric files;
 255 @subcmd{/TYPE=ODS} indicates OpenDocument files.
 256 The @subcmd{FILE} subcommand is mandatory.
 257 Use it to specify the name of the file to be read.
 258 All other subcommands are optional.
 259
 260 The format of each variable is determined by the format of the spreadsheet
 261 cell containing the first datum for the variable.
 262 If this cell is of string (text) format, then the width of the variable is
 263 determined from the length of the string it contains, unless the
 264 @subcmd{ASSUMEDSTRWIDTH} subcommand is given.
 265
 266 The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
 267 There are two forms of the @subcmd{SHEET} subcommand.
 268 In the first form,
 269 @subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
 270 name of the sheet to read.
 271 In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
 272 integer that is the index of the sheet to read.
 273 The first sheet has the index 1.
 274 If the @subcmd{SHEET} subcommand is omitted, then the command will read the
 275 first sheet in the file.
 276
 277 The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
 278 If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
 279 sheet is read.
 280 To read only part of a sheet, use the form
 281 @subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
 282 For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
 283 columns C--P, and rows 3--19 inclusive.
 284 If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
 285
 286 If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
 287 the first row are used as the names of the variables in which to store
 288 the data from subsequent rows.  This is the default.
 289 If @subcmd{/READNAMES=OFF} is
 290 used, then the variables receive automatically assigned names.
 291
 292 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 293 variables read from the file.
 294 If omitted, the default value is determined from the length of the
 295 string in the first spreadsheet cell for each variable.
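
As a sketch that combines these subcommands, the following syntax reads
part of the second sheet of an OpenDocument spreadsheet.  The file name
is hypothetical.

@example
* Hypothetical file name.
GET DATA /TYPE=ODS
        /FILE='inventory.ods'
        /SHEET=INDEX 2
        /CELLRANGE=RANGE 'C3:P19'
        /READNAMES=ON
        /ASSUMEDSTRWIDTH=40.
@end example

With @subcmd{/READNAMES=ON}, row 3 of the range supplies the variable
names and the data starts at row 4.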
 296
 297
 298 @node GET DATA /TYPE=PSQL
 299 @subsection Postgres Database Queries
 300
 301 @display
 302 GET DATA /TYPE=PSQL
 303          /CONNECT=@{@var{connection info}@}
 304          /SQL=@{@var{query}@}
 305          [/ASSUMEDSTRWIDTH=@var{w}]
 306          [/UNENCRYPTED]
 307          [/BSIZE=@var{n}].
 308 @end display
 309
 310 @cindex postgres
 311 @cindex databases
 312
 313 The PSQL type is used to import data from a postgres database server.
 314 The server may be located locally or remotely.
 315 Variables are automatically created based on the table column names
 316 or the names specified in the SQL query.
 317 Postgres data types of high precision will lose precision when
 318 imported into @pspp{}.
 319 Not all Postgres data types can be represented in @pspp{}.
 320 If a datum cannot be represented, a warning is issued and that
 321 datum is set to SYSMIS.
 322
 323 The @subcmd{CONNECT} subcommand is mandatory.
 324 It is a string specifying the parameters of the database server from
 325 which the data should be fetched.
 326 The format of the string is given in the postgres manual
 327 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 328
 329 The @subcmd{SQL} subcommand is mandatory.
 330 It must be a valid SQL string to retrieve data from the database.
 331
 332 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 333 variables read from the database.
 334 If omitted, the default value is determined from the length of the
 335 string in the first value read for each variable.
 336
 337 The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
 338 connection.
 339 If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
 340 not given, then an error will occur.
 341 Whether or not the connection is
 342 encrypted depends upon the underlying psql library and the
 343 capabilities of the database server.
 344
 345 The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
 346 It specifies an upper limit on the
 347 number of cases to fetch from the database at once.
 348 The default value is 4096.
 349 If your SQL statement fetches a large number of cases but only a small number of
 350 variables, then the data transfer may be faster if you increase this value.
 351 Conversely, if the number of variables is large, or if the machine on which
 352 @pspp{} is running has only a
 353 small amount of memory, then a smaller value will be better.
 354
 355
 356 The following syntax is an example:
 357 @example
 358 GET DATA /TYPE=PSQL
 359      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 360      /SQL='select * from manufacturer'.
 361 @end example
 362
 363
 364 @node GET DATA /TYPE=TXT
 365 @subsection Textual Data Files
 366
 367 @display
 368 GET DATA /TYPE=TXT
 369         /FILE=@{'@var{file_name}',@var{file_handle}@}
 370         [/ENCODING='@var{encoding}']
 371         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 372         [/FIRSTCASE=@{@var{first_case}@}]
 373         [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 374         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 375 @end display
 376
 377 @cindex text files
 378 @cindex data files
 379 When @subcmd{TYPE=TXT} is specified, @cmd{GET DATA} reads data in a delimited or
 380 fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).
 381
 382 The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
 383 a string file name or (for textual data only) a
 384 file handle (@pxref{File Handles}).
 385
 386 The @subcmd{ENCODING} subcommand specifies the character encoding of
 387 the file to be read.  @xref{INSERT}, for information on supported
 388 encodings.
 389
 390 The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
 391 DELIMITED, the default setting, specifies that fields in the input
 392 data are separated by spaces, tabs, or other user-specified
 393 delimiters.  FIXED specifies that fields in the input data appear at
 394 particular fixed column positions within records of a case.
 395
 396 By default, cases are read from the input file starting from the first
 397 line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
 398 to the number of the first line to read: 2 to skip the first line, 3
 399 to skip the first two lines, and so on.
 400
 401 @subcmd{IMPORTCASE} can be used to limit the number of cases read from the
 402 input file.  With the default setting, ALL, all cases in the file are
 403 read.  Specify FIRST @var{max_cases} to read at most @var{max_cases} cases
 404 from the file.  Use @subcmd{PERCENT @var{percent}} to read only @var{percent}
 405 percent, approximately, of the cases contained in the file.  (The
 406 percentage is approximate, because there is no way to accurately count
 407 the number of cases in the file without reading the entire file.  The
 408 number of cases in some kinds of unusual files cannot be estimated;
 409 @pspp{} will read all cases in such files.)
 410
 411 @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE} may be used with delimited and fixed-format
 412 data.  The remaining subcommands, which apply only to one of the two file
 413 arrangements, are described below.
 414
 415 @menu
 416 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 417 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 418 @end menu
 419
 420 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 421 @subsubsection Reading Delimited Data
 422
 423 @display
 424 GET DATA /TYPE=TXT
 425         /FILE=@{'@var{file_name}',@var{file_handle}@}
 426         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 427         [/FIRSTCASE=@{@var{first_case}@}]
 428         [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 429
 430         /DELIMITERS="@var{delimiters}"
 431         [/QUALIFIER="@var{quotes}" [/ESCAPE]]
 432         [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
 433         /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
 434 where each @var{del_var} takes the form:
 435         @var{variable} @var{format}
 436 @end display
 437
 438 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 439 input data from text files in delimited format, where fields are
 440 separated by a set of user-specified delimiters.  Its capabilities are
 441 similar to those of DATA LIST FREE (@pxref{DATA LIST FREE}), with a
 442 few enhancements.
 443
 444 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 445 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 446
 447 @subcmd{DELIMITERS}, which is required, specifies the set of characters that
 448 may separate fields.  Each character in the string specified on
 449 @subcmd{DELIMITERS} separates one field from the next.  The end of a line also
 450 separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
 451 delimiters in the input yield an empty field, as does a delimiter at
 452 the end of a line.  A space character as a delimiter is an exception:
 453 consecutive spaces do not yield an empty field and neither does any
 454 number of spaces at the end of a line.
 455
 456 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 457 @subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
 458 @samp{\\} as the first delimiter or, if a tab should also be a
 459 delimiter, immediately following @samp{\t}.  To read a data file in
 460 which each field appears on a separate line, specify the empty string
 461 for @subcmd{DELIMITERS}.
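
For instance, a file whose fields are separated by either tabs or
commas could be read with syntax along these lines.  The file and
variable names are hypothetical.

@example
* Hypothetical file and variable names.
GET DATA /TYPE=TXT /FILE='scores.txt' /DELIMITERS='\t,'
        /VARIABLES=id F8
                   name A20
                   score F5.1.
@end example

The @samp{\t} comes first in the string, as required, and the comma
that follows it is a second delimiter.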
 462
 463 The optional @subcmd{QUALIFIER} subcommand names one or more characters that
 464 can be used to quote values within fields in the input.  A field that
 465 begins with one of the specified quote characters ends at the next
 466 matching quote.  Intervening delimiters become part of the field,
 467 instead of terminating it.  The ability to specify more than one quote
 468 character is a @pspp{} extension.
 469
 470 By default, a character specified on @subcmd{QUALIFIER} cannot itself be
 471 embedded within a field that it quotes, because the quote character
 472 always terminates the quoted field.  With ESCAPE, however, a doubled
 473 quote character within a quoted field inserts a single instance of the
 474 quote into the field.  For example, if @samp{'} is specified on
 475 @subcmd{QUALIFIER}, then without ESCAPE @code{'a''b'} specifies a pair of
 476 fields that contain @samp{a} and @samp{b}, but with ESCAPE it
 477 specifies a single field that contains @samp{a'b}.  ESCAPE is a @pspp{}
 478 extension.
 479
 480 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
 481 the data file.  With LINE, the default setting, each line must contain
 482 all the data for exactly one case.  For additional flexibility, to
 483 allow a single case to be split among lines or multiple cases to be
 484 contained on a single line, specify VARIABLES @var{n_variables}, where
 485 @var{n_variables} is the number of variables per case.
 486
 487 The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
 488 Specify the name of each variable and its input format (@pxref{Input
 489 and Output Formats}) in the order they should be read from the input
 490 file.
 491
 492 @subsubheading Examples
 493
 494 @noindent
 495 On a Unix-like system, the @samp{/etc/passwd} file has a format
 496 similar to this:
 497
 498 @example
 499 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 500 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 501 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 502 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 503 @end example
 504
 505 @noindent
 506 The following syntax reads a file in the format used by
 507 @samp{/etc/passwd}:
 508
 509 @c If you change this example, change the regression test in
 510 @c tests/language/data-io/get-data.at to match.
 511 @example
 512 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 513         /VARIABLES=username A20
 514                    password A40
 515                    uid F10
 516                    gid F10
 517                    gecos A40
 518                    home A40
 519                    shell A40.
 520 @end example
 521
 522 @noindent
 523 Consider the following data on used cars:
 524
 525 @example
 526 model   year    mileage price   type    age
 527 Civic   2002    29883   15900   Si      2
 528 Civic   2003    13415   15900   EX      1
 529 Civic   1992    107000  3800    n/a     12
 530 Accord  2002    26613   17900   EX      1
 531 @end example
 532
 533 @noindent
 534 The following syntax can be used to read the used car data:
 535
 536 @c If you change this example, change the regression test in
 537 @c tests/language/data-io/get-data.at to match.
 538 @example
 539 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 540         /VARIABLES=model A8
 541                    year F4
 542                    mileage F6
 543                    price F5
 544                    type A4
 545                    age F2.
 546 @end example
 547
 548 @noindent
 549 Consider the following information on animals in a pet store:
 550
 551 @example
 552 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 553 , (Years), , , (Dollars), ,
 554 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 555 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 556 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 557 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 558 @end example
 559
 560 @noindent
 561 The following syntax can be used to read the pet store data:
 562
 563 @c If you change this example, change the regression test in
 564 @c tests/language/data-io/get-data.at to match.
 565 @example
 566 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 567         /FIRSTCASE=3
 568         /VARIABLES=name A10
 569                    age F3.1
 570                    color A5
 571                    received EDATE10
 572                    price F5.2
 573                    height a5
 574                    type a10.
 575 @end example
 576
 577 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 578 @subsubsection Reading Fixed Columnar Data
 579
 580 @c (modify-syntax-entry ?_ "w")
 581 @c (modify-syntax-entry ?' "'")
 582 @c (modify-syntax-entry ?@ "'")
 583
 584 @display
 585 GET DATA /TYPE=TXT
 586         /FILE=@{'@var{file_name}',@var{file_handle}@}
 587         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 588         [/FIRSTCASE=@{@var{first_case}@}]
 589         [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 590
 591         [/FIXCASE=@var{n}]
 592         /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
 593             [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
 594 where each @var{fixed_var} takes the form:
 595         @var{variable} @var{start}-@var{end} @var{format}
 596 @end display
 597
 598 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 599 data from text files in fixed format, where each field is located in
 600 particular fixed column positions within records of a case.  Its
 601 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 602 FIXED}), with a few enhancements.
 603
 604 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 605 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 606
 607 The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
 608 integer number of input lines that make up each case.  The default
 609 value is 1.
 610
 611 The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
 612 at which each variable can be found.  For each variable, specify its
 613 name, followed by its start and end column separated by @samp{-}
 614 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
 615 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
 616 For this command, columns are numbered starting from 0 at
 617 the left column.  Introduce the variables in the second and later
 618 lines of a case by a slash followed by the number of the line within
 619 the case, e.g.@: @samp{/2} for the second line.
 620
 621 @subsubheading Examples
 622
 623 @noindent
 624 Consider the following data on used cars:
 625
 626 @example
 627 model   year    mileage price   type    age
 628 Civic   2002    29883   15900   Si      2
 629 Civic   2003    13415   15900   EX      1
 630 Civic   1992    107000  3800    n/a     12
 631 Accord  2002    26613   17900   EX      1
 632 @end example
 633
 634 @noindent
 635 The following syntax can be used to read the used car data:
 636
 637 @c If you change this example, change the regression test in
 638 @c tests/language/data-io/get-data.at to match.
 639 @example
 640 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 641         /VARIABLES=model 0-7 A
 642                    year 8-15 F
 643                    mileage 16-23 F
 644                    price 24-31 F
 645                    type 32-39 A
 646                    age 40-47 F.
 647 @end example
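
@noindent
As a sketch of a multi-line layout, suppose that each case in a
hypothetical file @file{survey.data} occupies two lines, with an ID and
a name on the first line and a score on the second.  Under that
assumption, syntax along these lines would read it:

@example
* Hypothetical file, variable names, and column positions.
GET DATA /TYPE=TXT /FILE='survey.data' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=id 0-3 F
                   name 4-23 A
                   /2 score 0-4 F.
@end example

@noindent
The @samp{/2} introduces the variables found on the second line of
each case, and column numbering restarts from 0 on that line.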
 648
 649 @node IMPORT
 650 @section IMPORT
 651 @vindex IMPORT
 652
 653 @display
 654 IMPORT
 655         /FILE='@var{file_name}'
 656         /TYPE=@{COMM,TAPE@}
 657         /DROP=@var{var_list}
 658         /KEEP=@var{var_list}
 659         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 660 @end display
 661
 662 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 663 data and
 664 replaces them with a dictionary and data from a system file or
 665 portable file.
 666
 667 The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
 668 the portable file to be read as a file name string or a file handle
 669 (@pxref{File Handles}).
 670
 671 The @subcmd{TYPE} subcommand is currently not used.
 672
 673 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).
 674
 675 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 676 data is read later, when a procedure is executed.
 677
 678 Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
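
A minimal sketch, assuming a hypothetical portable file and
hypothetical variable names:

@example
* Hypothetical file and variable names.
IMPORT /FILE='survey.por'
       /KEEP=id age income.
@end example

Only @code{id}, @code{age}, and @code{income} are read into the active
dataset.  The portable file itself is not changed.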
 679
 680 @node SAVE
 681 @section SAVE
 682 @vindex SAVE
 683
 684 @display
 685 SAVE
 686         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 687         /UNSELECTED=@{RETAIN,DELETE@}
 688         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 689         /PERMISSIONS=@{WRITEABLE,READONLY@}
 690         /DROP=@var{var_list}
 691         /KEEP=@var{var_list}
 692         /VERSION=@var{version}
 693         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 694         /NAMES
 695         /MAP
 696 @end display
 697
 698 The @cmd{SAVE} procedure causes the dictionary and data in the active
 699 dataset to
 700 be written to a system file.
 701
 702 @subcmd{OUTFILE} is the only required subcommand.  Specify the system file
 703 to be written as a string file name or a file handle
 704 (@pxref{File Handles}).
 705
 706 By default, cases excluded with FILTER are written to the system file.
 707 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
 708 subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.
 709
 710 The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
 711 @subcmd{ZCOMPRESSED} subcommands determine the system file's
 712 compression level:
 713
 714 @table @code
 715 @item UNCOMPRESSED
 716 Data is not compressed.  Each numeric value uses 8 bytes of disk
 717 space.  Each string value uses one byte per column width, rounded up
 718 to a multiple of 8 bytes.
 719
 720 @item COMPRESSED
 721 Data is compressed with a simple algorithm.  Each integer numeric
 722 value between @minus{}99 and 151, inclusive, or system missing value
 723 uses one byte of disk space.  Each 8-byte segment of a string that
 724 consists only of spaces uses 1 byte.  Any other numeric value or
 725 8-byte string segment uses 9 bytes of disk space.
 726
 727 @item ZCOMPRESSED
 728 Data is compressed with the ``deflate'' compression algorithm
 729 specified in RFC@tie{}1951 (the same algorithm used by
 730 @command{gzip}).  Files written with this compression level cannot be
 731 read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
 732 @end table
 733
 734 @subcmd{COMPRESSED} is the default compression level.  The SET command
 735 (@pxref{SET}) can change this default.
 736
 737 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 738 file.  WRITEABLE, the default, creates the file with read and write
 739 permission.  READONLY creates the file for read-only access.
 740
By default, all the variables in the active dataset dictionary are written
to the system file.  The @subcmd{DROP} subcommand specifies a list
of variables not to be written.  In contrast, @subcmd{KEEP} specifies the
variables to be written; all other variables are omitted.
 745
 746 Normally variables are saved to a system file under the same names they
 747 have in the active dataset.  Use the @subcmd{RENAME} subcommand to change these names.
 748 Specify, within parentheses, a list of variable names followed by an
 749 equals sign (@samp{=}) and the names that they should be renamed to.
 750 Multiple parenthesized groups of variable names can be included on a
 751 single @subcmd{RENAME} subcommand.  Variables' names may be swapped using a
 752 @subcmd{RENAME} subcommand of the
 753 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 754
 755 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
 756 eliminated.  When this is done, only a single variable may be renamed at
 757 once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 758 deprecated.
 759
 760 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
 761 left-to-right order.  They
 762 each may be present any number of times.  @cmd{SAVE} never modifies
 763 the active dataset.  @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} only
 764 affect the system file written to disk.
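
The following sketch combines these subcommands.  All variable and file
names are hypothetical.  @subcmd{DROP} is applied first, and then the
remaining variables @code{id} and @code{age} are renamed in the file
that is written:

@example
SAVE OUTFILE='subset.sav'
     /DROP=temp1 temp2
     /RENAME=(id age=respondent age_years).
@end example

The active dataset still contains @code{temp1}, @code{temp2},
@code{id}, and @code{age} under their original names afterward.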
 765
 766 The @subcmd{VERSION} subcommand specifies the version of the file format. Valid
 767 versions are 2 and 3.  The default version is 3.  In version 2 system
 768 files, variable names longer than 8 bytes will be truncated.  The two
 769 versions are otherwise identical.
 770
 771 The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
 772
 773 @cmd{SAVE} causes the data to be read.  It is a procedure.
 774
 775 @node SAVE TRANSLATE
 776 @section SAVE TRANSLATE
 777 @vindex SAVE TRANSLATE
 778
 779 @display
 780 SAVE TRANSLATE
 781         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 782         /TYPE=@{CSV,TAB@}
 783         [/REPLACE]
 784         [/MISSING=@{IGNORE,RECODE@}]
 785
 786         [/DROP=@var{var_list}]
 787         [/KEEP=@var{var_list}]
 788         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 789         [/UNSELECTED=@{RETAIN,DELETE@}]
 790         [/MAP]
 791
 792         @dots{}additional subcommands depending on TYPE@dots{}
 793 @end display
 794
 795 The @cmd{SAVE TRANSLATE} command is used to save data into various
 796 formats understood by other applications.
 797
The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}).  @subcmd{TYPE} determines the type of the file
to be written.  It must be one of the following:
 802
 803 @table @asis
 804 @item CSV
Comma-separated value format.
 806
 807 @item TAB
 808 Tab-delimited format.
 809 @end table
 810
 811 By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file.  Use
 812 @subcmd{REPLACE} to force an existing file to be overwritten.
 813
With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
values as if they were not missing.  Specify MISSING=RECODE to output
numeric user-missing values like system-missing values and string
user-missing values as all spaces.
 818
By default, all the variables in the active dataset dictionary are saved
to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variables
to save.  The @subcmd{RENAME} subcommand can also be used to change the names
 822 under which variables are saved.  @subcmd{UNSELECTED} determines whether cases
 823 filtered out by the @cmd{FILTER} command are written to the output file.
 824 These subcommands have the same syntax and meaning as on the
 825 @cmd{SAVE} command (@pxref{SAVE}).
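
As a minimal sketch, assuming hypothetical variables @code{id},
@code{age}, and @code{income} and an arbitrary file name, the following
writes three variables to a tab-delimited file, replaces any existing
file of that name, and leaves out cases excluded by @cmd{FILTER}:

@example
SAVE TRANSLATE
     /OUTFILE='survey.tsv'
     /TYPE=TAB
     /REPLACE
     /KEEP=id age income
     /UNSELECTED=DELETE.
@end example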
 826
 827 Each supported file type has additional subcommands, explained in
 828 separate sections below.
 829
 830 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 831
 832 @menu
 833 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 834 @end menu
 835
 836 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 837 @subsection Writing Comma- and Tab-Separated Data Files
 838
 839 @display
 840 SAVE TRANSLATE
 841         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
        /TYPE=@{CSV,TAB@}
 843         [/REPLACE]
 844         [/MISSING=@{IGNORE,RECODE@}]
 845
 846         [/DROP=@var{var_list}]
 847         [/KEEP=@var{var_list}]
 848         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 849         [/UNSELECTED=@{RETAIN,DELETE@}]
 850
 851         [/FIELDNAMES]
 852         [/CELLS=@{VALUES,LABELS@}]
 853         [/TEXTOPTIONS DELIMITER='@var{delimiter}']
 854         [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
 855         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 856         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 857 @end display
 858
 859 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 860 comma- or tab-separated value format similar to that described by
 861 RFC@tie{}4180.  Each variable becomes one output column, and each case
 862 becomes one line of output.  If FIELDNAMES is specified, an additional
 863 line at the top of the output file lists variable names.
 864
 865 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 866 written to the output file:
 867
 868 @table @asis
 869 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 870 Writes variables to the output in ``plain'' formats that ignore the
 871 details of variable formats.  Numeric values are written as plain
 872 decimal numbers with enough digits to indicate their exact values in
 873 machine representation.  Numeric values include @samp{e} followed by
an exponent if the exponent value would be less than @minus{}4 or greater
 875 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 876 format.  WKDAY and MONTH values are written as decimal numbers.
 877
 878 Numeric values use, by default, the decimal point character set with
 879 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 880 to force a particular decimal point character.
 881
 882 @item CELLS=VALUES FORMAT=VARIABLE
 883 Writes variables using their print formats.  Leading and trailing
 884 spaces are removed from numeric values, and trailing spaces are
 885 removed from string values.
 886
@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
 889 Writes value labels where they exist, and otherwise writes the values
 890 themselves as described above.
 891 @end table
 892
 893 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 894 values are output as a single space.
 895
For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
 900 then the default is a comma with DECIMAL=DOT or a semicolon with
 901 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 902 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
 903
 904 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 905 before and after a value that contains the delimiter character or the
 906 qualifier character.  The default is a double quote (@samp{"}).  A
 907 qualifier character that appears within a value is doubled.
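
The settings described above might be combined as in the following
sketch, which is meant for a reader that expects semicolon-separated
fields and decimal commas.  The file name is hypothetical, and each
@subcmd{TEXTOPTIONS} setting is given on its own subcommand, as in the
syntax diagram:

@example
SAVE TRANSLATE
     /OUTFILE='results.csv'
     /TYPE=CSV
     /REPLACE
     /FIELDNAMES
     /CELLS=LABELS
     /TEXTOPTIONS DELIMITER=';'
     /TEXTOPTIONS DECIMAL=COMMA.
@end example

Here the explicit DELIMITER setting is redundant, since a semicolon is
already the default delimiter with DECIMAL=COMMA.  It is included only
to show where the setting goes.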
 908
 909 @node SYSFILE INFO
 910 @section SYSFILE INFO
 911 @vindex SYSFILE INFO
 912
 913 @display
 914 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
 915 @end display
 916
@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
SPSS/PC+ system file, or SPSS portable file, and displays the
information in its dictionary.  Specify the file to examine as a
file name or file handle.
 923
 924 @pspp{} automatically detects the encoding of string data in the file,
 925 when possible.  The character encoding of old SPSS system files cannot
 926 always be guessed correctly, and SPSS/PC+ system files do not include
 927 any indication of their encoding.  Specify the @subcmd{ENCODING}
 928 subcommand with an @acronym{IANA} character set name as its string
 929 argument to override the default, or specify @code{ENCODING='DETECT'}
 930 to analyze and report possibly valid encodings for the system file.
 931 The @subcmd{ENCODING} subcommand is a @pspp{} extension.
 932
 933 @cmd{SYSFILE INFO} does not affect the current active dataset.
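
For instance, with a hypothetical file @file{survey.sav} whose encoding
is in doubt, the first command below asks for a report of possibly
valid encodings and the second displays the dictionary using one
particular encoding.  The encoding name is only an example:

@example
SYSFILE INFO FILE='survey.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='survey.sav' ENCODING='windows-1252'.
@end example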
 934
 935 @node XEXPORT
 936 @section XEXPORT
 937 @vindex XEXPORT
 938
 939 @display
 940 XEXPORT
 941         /OUTFILE='@var{file_name}'
 942         /DIGITS=@var{n}
 943         /DROP=@var{var_list}
 944         /KEEP=@var{var_list}
 945         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 946         /TYPE=@{COMM,TAPE@}
 947         /MAP
 948 @end display
 949
The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 951 data to a specified portable file.
 952
 953 This transformation is a @pspp{} extension.
 954
 955 It is similar to the @cmd{EXPORT} procedure, with two differences:
 956
 957 @itemize
 958 @item
 959 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 960 the data is read by a procedure or procedure-like command.
 961
 962 @item
 963 @cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
 964 @end itemize
 965
 966 @xref{EXPORT}, for more information.
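
As a sketch of the first difference, the portable file in the following
fragment is not written when @cmd{XEXPORT} is encountered but when the
subsequent procedure reads the data.  The file name and the variable
@code{age} are hypothetical:

@example
XEXPORT OUTFILE='survey.por'.
DESCRIPTIVES VARIABLES=age.
@end example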
 967
 968 @node XSAVE
 969 @section XSAVE
 970 @vindex XSAVE
 971
 972 @display
 973 XSAVE
 974         /OUTFILE='@var{file_name}'
 975         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 976         /PERMISSIONS=@{WRITEABLE,READONLY@}
 977         /DROP=@var{var_list}
 978         /KEEP=@var{var_list}
 979         /VERSION=@var{version}
 980         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 981         /NAMES
 982         /MAP
 983 @end display
 984
 985 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 986 data to a system file.  It is similar to the @cmd{SAVE}
 987 procedure, with two differences:
 988
 989 @itemize
 990 @item
 991 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 992 the data is read by a procedure or procedure-like command.
 993
 994 @item
 995 @cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
 996 @end itemize
 997
 998 @xref{SAVE}, for more information.
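
Because @cmd{XSAVE} is a transformation, several system files can be
written in a single pass over the data.  The following is a minimal
sketch with hypothetical file and variable names.  Neither output file
is written until @cmd{EXECUTE} causes the data to be read:

@example
GET FILE='survey.sav'.
XSAVE OUTFILE='raw.sav'.
COMPUTE income_k = income / 1000.
XSAVE OUTFILE='derived.sav'.
EXECUTE.
@end example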