doc/files.texi

   1 @c PSPP - a program for statistical analysis.
   2 @c Copyright (C) 2017 Free Software Foundation, Inc.
   3 @c Permission is granted to copy, distribute and/or modify this document
   4 @c under the terms of the GNU Free Documentation License, Version 1.3
   5 @c or any later version published by the Free Software Foundation;
   6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
   7 @c A copy of the license is included in the section entitled "GNU
   8 @c Free Documentation License".
   9 @c
  10 @node System and Portable File IO
  11 @chapter System and Portable File I/O
  12
  13 The commands in this chapter read, write, and examine system files and
  14 portable files.
  15
  16 @menu
  17 * APPLY DICTIONARY::            Apply system file dictionary to active dataset.
  18 * EXPORT::                      Write to a portable file.
  19 * GET::                         Read from a system file.
  20 * GET DATA::                    Read from foreign files.
  21 * IMPORT::                      Read from a portable file.
  22 * SAVE::                        Write to a system file.
  23 * SAVE TRANSLATE::              Write data in foreign file formats.
  24 * SYSFILE INFO::                Display system file dictionary.
  25 * XEXPORT::                     Write to a portable file, as a transformation.
  26 * XSAVE::                       Write to a system file, as a transformation.
  27 @end menu
  28
  29 @node APPLY DICTIONARY
  30 @section APPLY DICTIONARY
  31 @vindex APPLY DICTIONARY
  32
  33 @display
  34 APPLY DICTIONARY FROM=@{'@var{file_name}',@var{file_handle}@}.
  35 @end display
  36
  37 @cmd{APPLY DICTIONARY} applies the variable labels, value labels,
  38 and missing values taken from a file to corresponding
  39 variables in the active dataset.  In some cases it also updates the
  40 weighting variable.
  41
  42 Specify a system file or portable file's name, a data set name
  43 (@pxref{Datasets}), or a file handle name (@pxref{File Handles}).  The
  44 dictionary in the file will be read, but it will not replace the
  45 active dataset's dictionary.  The file's data will not be read.
  46
  47 Only variables with names that exist in both the active dataset and the
  48 system file are considered.  Variables with the same name but different
  49 types (numeric, string) will cause an error message.  Otherwise, the
  50 system file variables' attributes will replace those in their matching
  51 active dataset variables:
  52
  53 @itemize @bullet
  54 @item
  55 If a system file variable has a variable label, then it will replace
  56 the variable label of the active dataset variable.  If the system
  57 file variable does not have a variable label, then the active dataset
  58 variable's variable label, if any, will be retained.
  59
  60 @item
  61 If the system file variable has custom attributes (@pxref{VARIABLE
  62 ATTRIBUTE}), then those attributes replace the active dataset variable's
  63 custom attributes.  If the system file variable does not have custom
  64 attributes, then the active dataset variable's custom attributes, if any,
  65 will be retained.
  66
  67 @item
  68 If the active dataset variable is numeric or short string, then value
  69 labels and missing values, if any, will be copied to the active dataset
  70 variable.  If the system file variable does not have value labels or
  71 missing values, then those in the active dataset variable, if any, will not
  72 be disturbed.
  73 @end itemize
  74
  75 In addition to properties of variables, some properties of the active
  76 dataset's dictionary as a whole are updated:
  77
  78 @itemize @bullet
  79 @item
  80 If the system file has custom attributes (@pxref{DATAFILE ATTRIBUTE}),
  81 then those attributes replace the active dataset's custom
  82 attributes.
  83
  84 @item
  85 If the active dataset has a weighting variable (@pxref{WEIGHT}), and the
  86 system file does not, or if the weighting variable in the system file
  87 does not exist in the active dataset, then the active dataset weighting
  88 variable, if any, is retained.  Otherwise, the weighting variable in
  89 the system file becomes the active dataset weighting variable.
  90 @end itemize
  91
  92 @cmd{APPLY DICTIONARY} takes effect immediately.  It does not read the
  93 active dataset.  The system file is not modified.
  94
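@noindent
As a minimal sketch, the following syntax applies the dictionary of a
system file to the active dataset.  The file name @file{template.sav}
is hypothetical:

@example
APPLY DICTIONARY FROM='template.sav'.
@end example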
  95 @node EXPORT
  96 @section EXPORT
  97 @vindex EXPORT
  98
  99 @display
 100 EXPORT
 101         /OUTFILE='@var{file_name}'
 102         /UNSELECTED=@{RETAIN,DELETE@}
 103         /DIGITS=@var{n}
 104         /DROP=@var{var_list}
 105         /KEEP=@var{var_list}
 106         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 107         /TYPE=@{COMM,TAPE@}
 108         /MAP
 109 @end display
 110
 111 The @cmd{EXPORT} procedure writes the active dataset's dictionary and
 112 data to a specified portable file.
 113
 114 By default, cases excluded with FILTER are written to the
 115 file.  These can be excluded by specifying DELETE on the @subcmd{UNSELECTED}
 116 subcommand.  Specifying RETAIN makes the default explicit.
 117
 118 Portable files express real numbers in base 30.  Integers are always
 119 expressed to the maximum precision needed to make them exact.
 120 Non-integers are, by default, expressed to the machine's maximum
 121 natural precision (approximately 15 decimal digits on many machines).
 122 If many numbers require this many digits, the portable file may
 123 significantly increase in size.  As an alternative, the @subcmd{DIGITS}
 124 subcommand may be used to specify the number of decimal digits of
 125 precision to write.  @subcmd{DIGITS} applies only to non-integers.
 126
 127 The @subcmd{OUTFILE} subcommand, which is the only required subcommand, specifies
 128 the portable file to be written as a file name string or
 129 a file handle (@pxref{File Handles}).
 130
 131 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the same format as the
 132 @subcmd{SAVE} procedure (@pxref{SAVE}).
 133
 134 The @subcmd{TYPE} subcommand specifies the character set for use in the
 135 portable file.  Its value is currently not used.
 136
 137 The @subcmd{MAP} subcommand is currently ignored.
 138
 139 @cmd{EXPORT} is a procedure.  It causes the active dataset to be read.
 140
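@noindent
As an illustration, the following syntax writes the active dataset to
a portable file, omits cases excluded by FILTER, and limits
non-integers to 6 decimal digits of precision.  The file name
@file{survey.por} is hypothetical:

@example
EXPORT /OUTFILE='survey.por'
        /UNSELECTED=DELETE
        /DIGITS=6.
@end example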
 141 @node GET
 142 @section GET
 143 @vindex GET
 144
 145 @display
 146 GET
 147         /FILE=@{'@var{file_name}',@var{file_handle}@}
 148         /DROP=@var{var_list}
 149         /KEEP=@var{var_list}
 150         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 151         /ENCODING='@var{encoding}'
 152 @end display
 153
 154 @cmd{GET} clears the current dictionary and active dataset and
 155 replaces them with the dictionary and data from a specified file.
 156
 157 The @subcmd{FILE} subcommand is the only required subcommand.  Specify
 158 the SPSS system file, SPSS/PC+ system file, or SPSS portable file to
 159 be read as a string file name or a file handle (@pxref{File Handles}).
 160
 161 By default, all the variables in a file are read.  The @subcmd{DROP}
 162 subcommand can be used to specify a list of variables that are not to be
 163 read.  By contrast, the @subcmd{KEEP} subcommand can be used to specify
 164 variables that are to be read, with all other variables not read.
 165
 166 Normally variables in a file retain the names that they were
 167 saved under.  Use the @subcmd{RENAME} subcommand to change these names.
 168 Specify,
 169 within parentheses, a list of variable names followed by an equals sign
 170 (@samp{=}) and the names that they should be renamed to.  Multiple
 171 parenthesized groups of variable names can be included on a single
 172 @subcmd{RENAME} subcommand.
 173 Variables' names may be swapped using a @subcmd{RENAME}
 174 subcommand of the form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 175
 176 An alternate syntax for the @subcmd{RENAME} subcommand omits the parentheses.
 177 In this form, only a single variable may be renamed at
 178 once, for instance @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 179 deprecated.
 180
 181 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are executed in left-to-right order.
 182 Each may be present any number of times.  @cmd{GET} never modifies a
 183 file on disk.  Only the active dataset read from the file
 184 is affected by these subcommands.
 185
 186 @pspp{} automatically detects the encoding of string data in the file,
 187 when possible.  The character encoding of old SPSS system files cannot
 188 always be guessed correctly, and SPSS/PC+ system files do not include
 189 any indication of their encoding.  Specify the @subcmd{ENCODING}
 190 subcommand with an @acronym{IANA} character set name as its string
 191 argument to override the default.  Use @cmd{SYSFILE INFO} to analyze
 192 the encodings that might be valid for a system file.  The
 193 @subcmd{ENCODING} subcommand is a @pspp{} extension.
 194
 195 @cmd{GET} does not cause the data to be read, only the dictionary.  The data
 196 is read later, when a procedure is executed.
 197
 198 Use of @cmd{GET} to read a portable file is a @pspp{} extension.
 199
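@noindent
As a sketch of how the subcommands combine, the following syntax reads
three variables from a system file and then renames one of them.
Because @subcmd{KEEP} and @subcmd{RENAME} are executed in left-to-right
order, @subcmd{KEEP} refers to the variable by its original name.  The
file name and variable names are hypothetical:

@example
GET /FILE='survey.sav'
        /KEEP=id age income
        /RENAME=(income=inc).
@end example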
 200 @node GET DATA
 201 @section GET DATA
 202 @vindex GET DATA
 203
 204 @display
 205 GET DATA
 206         /TYPE=@{GNM,ODS,PSQL,TXT@}
 207         @dots{}additional subcommands depending on TYPE@dots{}
 208 @end display
 209
 210 The @cmd{GET DATA} command is used to read files and other data
 211 sources created by other applications.  When this command is executed,
 212 the current dictionary and active dataset are replaced with variables
 213 and data read from the specified source.
 214
 215 The @subcmd{TYPE} subcommand is mandatory and must be the first subcommand
 216 specified.  It determines the type of the file or source to read.
 217 @pspp{} currently supports the following file types:
 218
 219 @table @asis
 220 @item GNM
 221 Spreadsheet files created by Gnumeric (@url{http://gnumeric.org}).
 222
 223 @item ODS
 224 Spreadsheet files in OpenDocument format (@url{http://opendocumentformat.org}).
 225
 226 @item PSQL
 227 Relations from PostgreSQL databases (@url{http://postgresql.org}).
 228
 229 @item TXT
 230 Textual data files in columnar and delimited formats.
 231 @end table
 232
 233 Each supported file type has additional subcommands, explained in
 234 separate sections below.
 235
 236 @menu
 237 * GET DATA /TYPE=GNM/ODS::     Spreadsheets
 238 * GET DATA /TYPE=PSQL::        Databases
 239 * GET DATA /TYPE=TXT::         Delimited Text Files
 240 @end menu
 241
 242 @node GET DATA /TYPE=GNM/ODS
 243 @subsection Spreadsheet Files
 244
 245 @display
 246 GET DATA /TYPE=@{GNM, ODS@}
 247         /FILE=@{'@var{file_name}'@}
 248         /SHEET=@{NAME '@var{sheet_name}', INDEX @var{n}@}
 249         /CELLRANGE=@{RANGE '@var{range}', FULL@}
 250         /READNAMES=@{ON, OFF@}
 251         /ASSUMEDSTRWIDTH=@var{n}.
 252 @end display
 253
 254 @cindex Gnumeric
 255 @cindex OpenDocument
 256 @cindex spreadsheet files
 257
 258 Gnumeric spreadsheets (@url{http://gnumeric.org}), and spreadsheets
 259 in OpenDocument format
 260 (@url{http://libreplanet.org/wiki/Group:OpenDocument/Software})
 261 can be read using the @cmd{GET DATA} command.
 262 Use the @subcmd{TYPE} subcommand to indicate the file's format:
 263 @subcmd{/TYPE=GNM} indicates Gnumeric files, and
 264 @subcmd{/TYPE=ODS} indicates OpenDocument files.
 265 The @subcmd{FILE} subcommand is mandatory.
 266 Use it to specify the name of the file to be read.
 267 All other subcommands are optional.
 268
 269 The format of each variable is determined by the format of the spreadsheet
 270 cell containing the first datum for the variable.
 271 If this cell is of string (text) format, then the width of the variable is
 272 determined from the length of the string it contains, unless the
 273 @subcmd{ASSUMEDSTRWIDTH} subcommand is given.
 274
 275 The @subcmd{SHEET} subcommand specifies the sheet within the spreadsheet file to read.
 276 There are two forms of the @subcmd{SHEET} subcommand.
 277 In the first form,
 278 @subcmd{/SHEET=name @var{sheet_name}}, the string @var{sheet_name} is the
 279 name of the sheet to read.
 280 In the second form, @subcmd{/SHEET=index @var{idx}}, @var{idx} is an
 281 integer which is the index of the sheet to read.
 282 The first sheet has the index 1.
 283 If the @subcmd{SHEET} subcommand is omitted, then the command will read the
 284 first sheet in the file.
 285
 286 The @subcmd{CELLRANGE} subcommand specifies the range of cells within the sheet to read.
 287 If the subcommand is given as @subcmd{/CELLRANGE=FULL}, then the entire
 288 sheet is read.
 289 To read only part of a sheet, use the form
 290 @subcmd{/CELLRANGE=range '@var{top_left_cell}:@var{bottom_right_cell}'}.
 291 For example, the subcommand @subcmd{/CELLRANGE=range 'C3:P19'} reads
 292 columns C--P, and rows 3--19 inclusive.
 293 If no @subcmd{CELLRANGE} subcommand is given, then the entire sheet is read.
 294
 295 If @subcmd{/READNAMES=ON} is specified, then the contents of cells of
 296 the first row are used as the names of the variables in which to store
 297 the data from subsequent rows.  This is the default.
 298 If @subcmd{/READNAMES=OFF} is
 299 used, then the variables receive automatically assigned names.
 300
 301 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 302 variables read from the file.
 303 If omitted, the default value is determined from the length of the
 304 string in the first spreadsheet cell for each variable.
 305
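@noindent
As an illustration, the following syntax reads part of one sheet from
an OpenDocument spreadsheet and takes variable names from the first
row that is read.  The file name, sheet name, and cell range are
hypothetical:

@example
GET DATA /TYPE=ODS
        /FILE='budget.ods'
        /SHEET=NAME 'Q1'
        /CELLRANGE=RANGE 'A1:D20'
        /READNAMES=ON.
@end example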
 306
 307 @node GET DATA /TYPE=PSQL
 308 @subsection Postgres Database Queries
 309
 310 @display
 311 GET DATA /TYPE=PSQL
 312          /CONNECT=@{@var{connection info}@}
 313          /SQL=@{@var{query}@}
 314          [/ASSUMEDSTRWIDTH=@var{w}]
 315          [/UNENCRYPTED]
 316          [/BSIZE=@var{n}].
 317 @end display
 318
 319 @cindex postgres
 320 @cindex databases
 321
 322 The PSQL type is used to import data from a postgres database server.
 323 The server may be located locally or remotely.
 324 Variables are automatically created based on the table column names
 325 or the names specified in the SQL query.
 326 Postgres data types of high precision will lose precision when
 327 imported into @pspp{}.
 328 Not all postgres data types can be represented in @pspp{}.
 329 If a datum cannot be represented, a warning will be issued and that
 330 datum will be set to SYSMIS.
 331
 332 The @subcmd{CONNECT} subcommand is mandatory.
 333 It is a string specifying the parameters of the database server from
 334 which the data should be fetched.
 335 The format of the string is given in the postgres manual
 336 @url{http://www.postgresql.org/docs/8.0/static/libpq.html#LIBPQ-CONNECT}.
 337
 338 The @subcmd{SQL} subcommand is mandatory.
 339 It must be a valid SQL string to retrieve data from the database.
 340
 341 The @subcmd{ASSUMEDSTRWIDTH} subcommand specifies the maximum width of string
 342 variables read from the database.
 343 If omitted, the default value is determined from the length of the
 344 string in the first value read for each variable.
 345
 346 The @subcmd{UNENCRYPTED} subcommand allows data to be retrieved over an insecure
 347 connection.
 348 If the connection is not encrypted, and the @subcmd{UNENCRYPTED} subcommand is
 349 not given, then an error will occur.
 350 Whether or not the connection is
 351 encrypted depends upon the underlying psql library and the
 352 capabilities of the database server.
 353
 354 The @subcmd{BSIZE} subcommand serves only to optimise the speed of data transfer.
 355 It specifies an upper limit on the
 356 number of cases to fetch from the database at once.
 357 The default value is 4096.
 358 If your SQL statement fetches a large number of cases but only a small number of
 359 variables, then the data transfer may be faster if you increase this value.
 360 Conversely, if the number of variables is large, or if the machine on which
 361 @pspp{} is running has only a
 362 small amount of memory, then a smaller value will be better.
 363
 364
 365 The following syntax is an example:
 366 @example
 367 GET DATA /TYPE=PSQL
 368      /CONNECT='host=example.com port=5432 dbname=product user=fred password=xxxx'
 369      /SQL='select * from manufacturer'.
 370 @end example
 371
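@noindent
As a further sketch, the following syntax permits an unencrypted
connection and raises the number of cases fetched at once, which may
suit a query that returns many cases but few variables.  The
connection parameters, column names, and the value 10000 are
illustrative assumptions:

@example
GET DATA /TYPE=PSQL
     /CONNECT='host=localhost dbname=product user=fred'
     /SQL='select name, price from manufacturer'
     /UNENCRYPTED
     /BSIZE=10000.
@end example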
 372
 373 @node GET DATA /TYPE=TXT
 374 @subsection Textual Data Files
 375
 376 @display
 377 GET DATA /TYPE=TXT
 378         /FILE=@{'@var{file_name}',@var{file_handle}@}
 379         [/ENCODING='@var{encoding}']
 380         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 381         [/FIRSTCASE=@{@var{first_case}@}]
 382         [/IMPORTCASES=...]
 383         @dots{}additional subcommands depending on ARRANGEMENT@dots{}
 384 @end display
 385
 386 @cindex text files
 387 @cindex data files
 388 When TYPE=TXT is specified, @cmd{GET DATA} reads data in a delimited or
 389 fixed columnar format, much like @cmd{DATA LIST} (@pxref{DATA LIST}).
 390
 391 The @subcmd{FILE} subcommand is mandatory.  Specify the file to be read as
 392 a string file name or (for textual data only) a
 393 file handle (@pxref{File Handles}).
 394
 395 The @subcmd{ENCODING} subcommand specifies the character encoding of
 396 the file to be read.  @xref{INSERT}, for information on supported
 397 encodings.
 398
 399 The @subcmd{ARRANGEMENT} subcommand determines the file's basic format.
 400 DELIMITED, the default setting, specifies that fields in the input
 401 data are separated by spaces, tabs, or other user-specified
 402 delimiters.  FIXED specifies that fields in the input data appear at
 403 particular fixed column positions within records of a case.
 404
 405 By default, cases are read from the input file starting from the first
 406 line.  To skip lines at the beginning of an input file, set @subcmd{FIRSTCASE}
 407 to the number of the first line to read: 2 to skip the first line, 3
 408 to skip the first two lines, and so on.
 409
 410 @subcmd{IMPORTCASES} is ignored, for compatibility.  Use @cmd{N OF
 411 CASES} to limit the number of cases read from a file (@pxref{N OF
 412 CASES}), or @cmd{SAMPLE} to obtain a random sample of cases
 413 (@pxref{SAMPLE}).
 414
 415 The remaining subcommands apply only to one of the two file
 416 arrangements, described below.
 417
 418 @menu
 419 * GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED::
 420 * GET DATA /TYPE=TXT /ARRANGEMENT=FIXED::
 421 @end menu
 422
 423 @node GET DATA /TYPE=TXT /ARRANGEMENT=DELIMITED
 424 @subsubsection Reading Delimited Data
 425
 426 @display
 427 GET DATA /TYPE=TXT
 428         /FILE=@{'@var{file_name}',@var{file_handle}@}
 429         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 430         [/FIRSTCASE=@{@var{first_case}@}]
 431         [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 432
 433         /DELIMITERS="@var{delimiters}"
 434         [/QUALIFIER="@var{quotes}"
 435         [/DELCASE=@{LINE,VARIABLES @var{n_variables}@}]
 436         /VARIABLES=@var{del_var1} [@var{del_var2}]@dots{}
 437 where each @var{del_var} takes the form:
 438         @var{variable} @var{format}
 439 @end display
 440
 441 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=DELIMITED reads
 442 input data from text files in delimited format, where fields are
 443 separated by a set of user-specified delimiters.  Its capabilities are
 444 similar to those of @cmd{DATA LIST FREE} (@pxref{DATA LIST FREE}), with a
 445 few enhancements.
 446
 447 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 448 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 449
 450 @subcmd{DELIMITERS}, which is required, specifies the set of characters that
 451 may separate fields.  Each character in the string specified on
 452 @subcmd{DELIMITERS} separates one field from the next.  The end of a line also
 453 separates fields, regardless of @subcmd{DELIMITERS}.  Two consecutive
 454 delimiters in the input yield an empty field, as does a delimiter at
 455 the end of a line.  A space character as a delimiter is an exception:
 456 consecutive spaces do not yield an empty field and neither does any
 457 number of spaces at the end of a line.
 458
 459 To use a tab as a delimiter, specify @samp{\t} at the beginning of the
 460 @subcmd{DELIMITERS} string.  To use a backslash as a delimiter, specify
 461 @samp{\\} as the first delimiter or, if a tab should also be a
 462 delimiter, immediately following @samp{\t}.  To read a data file in
 463 which each field appears on a separate line, specify the empty string
 464 for @subcmd{DELIMITERS}.
 465
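@noindent
As a minimal sketch, the following syntax reads a tab-separated file
whose first line is a header.  The file name and variable names are
hypothetical:

@example
GET DATA /TYPE=TXT /FILE='scores.tsv' /DELIMITERS='\t' /FIRSTCASE=2
        /VARIABLES=name A20
                   score F8.2.
@end example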
 466 The optional @subcmd{QUALIFIER} subcommand names one or more characters that
 467 can be used to quote values within fields in the input.  A field that
 468 begins with one of the specified quote characters ends at the next
 469 matching quote.  Intervening delimiters become part of the field,
 470 instead of terminating it.  The ability to specify more than one quote
 471 character is a @pspp{} extension.
 472
 473 The character specified on @subcmd{QUALIFIER} can be embedded within a
 474 field that it quotes by doubling the qualifier.  For example, if
 475 @samp{'} is specified on @subcmd{QUALIFIER}, then @code{'a''b'}
 476 specifies a field that contains @samp{a'b}.
 477
 478 The @subcmd{DELCASE} subcommand controls how data may be broken across lines in
 479 the data file.  With LINE, the default setting, each line must contain
 480 all the data for exactly one case.  For additional flexibility, to
 481 allow a single case to be split among lines or multiple cases to be
 482 contained on a single line, specify VARIABLES @var{n_variables}, where
 483 @var{n_variables} is the number of variables per case.
 484
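@noindent
For illustration, the following syntax reads space-separated values in
which each case consists of three variables, regardless of how the
values are distributed across lines.  The file name and variable names
are hypothetical:

@example
GET DATA /TYPE=TXT /FILE='readings.txt' /DELIMITERS=' '
        /DELCASE=VARIABLES 3
        /VARIABLES=x F8.2
                   y F8.2
                   z F8.2.
@end example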
 485 The @subcmd{VARIABLES} subcommand is required and must be the last subcommand.
 486 Specify the name of each variable and its input format (@pxref{Input
 487 and Output Formats}) in the order they should be read from the input
 488 file.
 489
 490 @subsubheading Examples
 491
 492 @noindent
 493 On a Unix-like system, the @samp{/etc/passwd} file has a format
 494 similar to this:
 495
 496 @example
 497 root:$1$nyeSP5gD$pDq/:0:0:,,,:/root:/bin/bash
 498 blp:$1$BrP/pFg4$g7OG:1000:1000:Ben Pfaff,,,:/home/blp:/bin/bash
 499 john:$1$JBuq/Fioq$g4A:1001:1001:John Darrington,,,:/home/john:/bin/bash
 500 jhs:$1$D3li4hPL$88X1:1002:1002:Jason Stover,,,:/home/jhs:/bin/csh
 501 @end example
 502
 503 @noindent
 504 The following syntax reads a file in the format used by
 505 @samp{/etc/passwd}:
 506
 507 @c If you change this example, change the regression test in
 508 @c tests/language/data-io/get-data.at to match.
 509 @example
 510 GET DATA /TYPE=TXT /FILE='/etc/passwd' /DELIMITERS=':'
 511         /VARIABLES=username A20
 512                    password A40
 513                    uid F10
 514                    gid F10
 515                    gecos A40
 516                    home A40
 517                    shell A40.
 518 @end example
 519
 520 @noindent
 521 Consider the following data on used cars:
 522
 523 @example
 524 model   year    mileage price   type    age
 525 Civic   2002    29883   15900   Si      2
 526 Civic   2003    13415   15900   EX      1
 527 Civic   1992    107000  3800    n/a     12
 528 Accord  2002    26613   17900   EX      1
 529 @end example
 530
 531 @noindent
 532 The following syntax can be used to read the used car data:
 533
 534 @c If you change this example, change the regression test in
 535 @c tests/language/data-io/get-data.at to match.
 536 @example
 537 GET DATA /TYPE=TXT /FILE='cars.data' /DELIMITERS=' ' /FIRSTCASE=2
 538         /VARIABLES=model A8
 539                    year F4
 540                    mileage F6
 541                    price F5
 542                    type A4
 543                    age F2.
 544 @end example
 545
 546 @noindent
 547 Consider the following information on animals in a pet store:
 548
 549 @example
 550 'Pet''s Name', "Age", "Color", "Date Received", "Price", "Height", "Type"
 551 , (Years), , , (Dollars), ,
 552 "Rover", 4.5, Brown, "12 Feb 2004", 80, '1''4"', "Dog"
 553 "Charlie", , Gold, "5 Apr 2007", 12.3, "3""", "Fish"
 554 "Molly", 2, Black, "12 Dec 2006", 25, '5"', "Cat"
 555 "Gilly", , White, "10 Apr 2007", 10, "3""", "Guinea Pig"
 556 @end example
 557
 558 @noindent
 559 The following syntax can be used to read the pet store data:
 560
 561 @c If you change this example, change the regression test in
 562 @c tests/language/data-io/get-data.at to match.
 563 @example
 564 GET DATA /TYPE=TXT /FILE='pets.data' /DELIMITERS=', ' /QUALIFIER='''"' /ESCAPE
 565         /FIRSTCASE=3
 566         /VARIABLES=name A10
 567                    age F3.1
 568                    color A5
 569                    received EDATE10
 570                    price F5.2
 571                    height a5
 572                    type a10.
 573 @end example
 574
 575 @node GET DATA /TYPE=TXT /ARRANGEMENT=FIXED
 576 @subsubsection Reading Fixed Columnar Data
 577
 578 @c (modify-syntax-entry ?_ "w")
 579 @c (modify-syntax-entry ?' "'")
 580 @c (modify-syntax-entry ?@ "'")
 581
 582 @display
 583 GET DATA /TYPE=TXT
 584         /FILE=@{'@var{file_name}',@var{file_handle}@}
 585         [/ARRANGEMENT=@{DELIMITED,FIXED@}]
 586         [/FIRSTCASE=@{@var{first_case}@}]
 587         [/IMPORTCASE=@{ALL,FIRST @var{max_cases},PERCENT @var{percent}@}]
 588
 589         [/FIXCASE=@var{n}]
 590         /VARIABLES @var{fixed_var} [@var{fixed_var}]@dots{}
 591             [/rec# @var{fixed_var} [@var{fixed_var}]@dots{}]@dots{}
 592 where each @var{fixed_var} takes the form:
 593         @var{variable} @var{start}-@var{end} @var{format}
 594 @end display
 595
 596 The @cmd{GET DATA} command with TYPE=TXT and ARRANGEMENT=FIXED reads input
 597 data from text files in fixed format, where each field is located in
 598 particular fixed column positions within records of a case.  Its
 599 capabilities are similar to those of DATA LIST FIXED (@pxref{DATA LIST
 600 FIXED}), with a few enhancements.
 601
 602 The required @subcmd{FILE} subcommand and optional @subcmd{FIRSTCASE} and @subcmd{IMPORTCASE}
 603 subcommands are described above (@pxref{GET DATA /TYPE=TXT}).
 604
 605 The optional @subcmd{FIXCASE} subcommand may be used to specify the positive
 606 integer number of input lines that make up each case.  The default
 607 value is 1.
 608
 609 The @subcmd{VARIABLES} subcommand, which is required, specifies the positions
 610 at which each variable can be found.  For each variable, specify its
 611 name, followed by its start and end column separated by @samp{-}
 612 (e.g.@: @samp{0-9}), followed by an input format type (e.g.@:
 613 @samp{F}) or a full format specification (e.g.@: @samp{DOLLAR12.2}).
 614 For this command, columns are numbered starting from 0 at
 615 the left column.  Introduce the variables in the second and later
 616 lines of a case by a slash followed by the number of the line within
 617 the case, e.g.@: @samp{/2} for the second line.
 618
 619 @subsubheading Examples
 620
 621 @noindent
 622 Consider the following data on used cars:
 623
 624 @example
 625 model   year    mileage price   type    age
 626 Civic   2002    29883   15900   Si      2
 627 Civic   2003    13415   15900   EX      1
 628 Civic   1992    107000  3800    n/a     12
 629 Accord  2002    26613   17900   EX      1
 630 @end example
 631
 632 @noindent
 633 The following syntax can be used to read the used car data:
 634
 635 @c If you change this example, change the regression test in
 636 @c tests/language/data-io/get-data.at to match.
 637 @example
 638 GET DATA /TYPE=TXT /FILE='cars.data' /ARRANGEMENT=FIXED /FIRSTCASE=2
 639         /VARIABLES=model 0-7 A
 640                    year 8-15 F
 641                    mileage 16-23 F
 642                    price 24-31 F
 643                    type 32-39 A
 644                    age 40-47 F.
 645 @end example
 646
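@noindent
As a further sketch, the following syntax reads cases that each span
two input lines, with the variables on the second line introduced by
@samp{/2}.  The file name, variable names, and column positions are
hypothetical:

@example
GET DATA /TYPE=TXT /FILE='people.data' /ARRANGEMENT=FIXED /FIXCASE=2
        /VARIABLES=name 0-19 A
                   /2 age 0-2 F
                      income 3-11 F.
@end example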
 647 @node IMPORT
 648 @section IMPORT
 649 @vindex IMPORT
 650
 651 @display
 652 IMPORT
 653         /FILE='@var{file_name}'
 654         /TYPE=@{COMM,TAPE@}
 655         /DROP=@var{var_list}
 656         /KEEP=@var{var_list}
 657         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 658 @end display
 659
 660 The @cmd{IMPORT} transformation clears the active dataset dictionary and
 661 data and
 662 replaces them with a dictionary and data from a system file or
 663 portable file.
 664
 665 The @subcmd{FILE} subcommand, which is the only required subcommand, specifies
 666 the portable file to be read as a file name string or a file handle
 667 (@pxref{File Handles}).
 668
 669 The @subcmd{TYPE} subcommand is currently not used.
 670
 671 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} follow the syntax used by @cmd{GET} (@pxref{GET}).
 672
 673 @cmd{IMPORT} does not cause the data to be read, only the dictionary.  The
 674 data is read later, when a procedure is executed.
 675
 676 Use of @cmd{IMPORT} to read a system file is a @pspp{} extension.
 677
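@noindent
As an illustration, the following syntax reads three variables from a
portable file.  The file name and variable names are hypothetical:

@example
IMPORT /FILE='survey.por'
        /KEEP=id age income.
@end example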
 678 @node SAVE
 679 @section SAVE
 680 @vindex SAVE
 681
 682 @display
 683 SAVE
 684         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 685         /UNSELECTED=@{RETAIN,DELETE@}
 686         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 687         /PERMISSIONS=@{WRITEABLE,READONLY@}
 688         /DROP=@var{var_list}
 689         /KEEP=@var{var_list}
 690         /VERSION=@var{version}
 691         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 692         /NAMES
 693         /MAP
 694 @end display
 695
 696 The @cmd{SAVE} procedure causes the dictionary and data in the active
 697 dataset to
 698 be written to a system file.
 699
 700 @subcmd{OUTFILE} is the only required subcommand.  Specify the system file
 701 to be written as a string file name or a file handle
 702 (@pxref{File Handles}).
 703
 704 By default, cases excluded with FILTER are written to the system file.
 705 These can be excluded by specifying @subcmd{DELETE} on the @subcmd{UNSELECTED}
 706 subcommand.  Specifying @subcmd{RETAIN} makes the default explicit.
 707
 708 The @subcmd{UNCOMPRESSED}, @subcmd{COMPRESSED}, and
 709 @subcmd{ZCOMPRESSED} subcommands determine the system file's
 710 compression level:
 711
 712 @table @code
 713 @item UNCOMPRESSED
 714 Data is not compressed.  Each numeric value uses 8 bytes of disk
 715 space.  Each string value uses one byte per column width, rounded up
 716 to a multiple of 8 bytes.
 717
 718 @item COMPRESSED
 719 Data is compressed with a simple algorithm.  Each integer numeric
 720 value between @minus{}99 and 151, inclusive, or system missing value
 721 uses one byte of disk space.  Each 8-byte segment of a string that
 722 consists only of spaces uses 1 byte.  Any other numeric value or
 723 8-byte string segment uses 9 bytes of disk space.
 724
 725 @item ZCOMPRESSED
 726 Data is compressed with the ``deflate'' compression algorithm
 727 specified in RFC@tie{}1951 (the same algorithm used by
 728 @command{gzip}).  Files written with this compression level cannot be
 729 read by PSPP 0.8.1 or earlier or by SPSS 20 or earlier.
 730 @end table
 731
 732 @subcmd{COMPRESSED} is the default compression level.  The SET command
 733 (@pxref{SET}) can change this default.
 734
 735 The @subcmd{PERMISSIONS} subcommand specifies permissions for the new system
 736 file.  WRITEABLE, the default, creates the file with read and write
 737 permission.  READONLY creates the file for read-only access.
 738
 739 By default, all the variables in the active dataset dictionary are written
 740 to the system file.  The @subcmd{DROP} subcommand can be used to specify a list
 741 of variables not to be written.  In contrast, @subcmd{KEEP} specifies the
 742 variables to be written; all other variables are omitted.
 743
 744 Normally variables are saved to a system file under the same names they
 745 have in the active dataset.  Use the @subcmd{RENAME} subcommand to change these names.
 746 Specify, within parentheses, a list of variable names followed by an
 747 equals sign (@samp{=}) and the names that they should be renamed to.
 748 Multiple parenthesized groups of variable names can be included on a
 749 single @subcmd{RENAME} subcommand.  Variables' names may be swapped using a
 750 @subcmd{RENAME} subcommand of the
 751 form @subcmd{/RENAME=(@var{A} @var{B}=@var{B} @var{A})}.
 752
 753 Alternate syntax for the @subcmd{RENAME} subcommand allows the parentheses to be
 754 eliminated.  When this is done, only a single variable may be renamed at
 755 once.  For instance, @subcmd{/RENAME=@var{A}=@var{B}}.  This alternate syntax is
 756 deprecated.
 757
 758 @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} are performed in
 759 left-to-right order.  They
 760 each may be present any number of times.  @cmd{SAVE} never modifies
 761 the active dataset.  @subcmd{DROP}, @subcmd{KEEP}, and @subcmd{RENAME} only
 762 affect the system file written to disk.
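
The following sketch illustrates the left-to-right processing.  It
assumes a hypothetical active dataset with variables @code{id},
@code{age}, @code{income}, and @code{notes}; the output file name is
also hypothetical:

@example
SAVE /OUTFILE='subset.sav'
     /DROP=notes
     /RENAME=(id age=respondent years)
     /KEEP=respondent years income.
@end example

Because @subcmd{RENAME} comes before @subcmd{KEEP} here,
@subcmd{KEEP} should refer to the variables by their new names.  The
active dataset itself keeps its original variables and names.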
 763
The @subcmd{VERSION} subcommand specifies the version of the file format.  Valid
versions are 2 and 3.  The default version is 3.  In version 2 system
files, variable names longer than 8 bytes are truncated.  The two
versions are otherwise identical.
 768
 769 The @subcmd{NAMES} and @subcmd{MAP} subcommands are currently ignored.
 770
 771 @cmd{SAVE} causes the data to be read.  It is a procedure.
 772
 773 @node SAVE TRANSLATE
 774 @section SAVE TRANSLATE
 775 @vindex SAVE TRANSLATE
 776
 777 @display
 778 SAVE TRANSLATE
 779         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 780         /TYPE=@{CSV,TAB@}
 781         [/REPLACE]
 782         [/MISSING=@{IGNORE,RECODE@}]
 783
 784         [/DROP=@var{var_list}]
 785         [/KEEP=@var{var_list}]
 786         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 787         [/UNSELECTED=@{RETAIN,DELETE@}]
 788         [/MAP]
 789
 790         @dots{}additional subcommands depending on TYPE@dots{}
 791 @end display
 792
 793 The @cmd{SAVE TRANSLATE} command is used to save data into various
 794 formats understood by other applications.
 795
The @subcmd{OUTFILE} and @subcmd{TYPE} subcommands are mandatory.
@subcmd{OUTFILE} specifies the file to be written, as a string file name or a file handle
(@pxref{File Handles}).  @subcmd{TYPE} determines the type of the file
to write.  It must be one of the following:
 800
 801 @table @asis
 802 @item CSV
Comma-separated value format.
 804
 805 @item TAB
 806 Tab-delimited format.
 807 @end table
 808
 809 By default, @cmd{SAVE TRANSLATE} will not overwrite an existing file.  Use
 810 @subcmd{REPLACE} to force an existing file to be overwritten.
 811
With MISSING=IGNORE, the default, @cmd{SAVE TRANSLATE} treats user-missing
values as if they were not missing.  Specify MISSING=RECODE to output
numeric user-missing values like system-missing values and string
user-missing values as all spaces.
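
As a brief sketch, assume a hypothetical numeric variable
@code{income} for which @minus{}9 is declared user-missing; the output
file name is likewise hypothetical:

@example
MISSING VALUES income (-9).
SAVE TRANSLATE /OUTFILE='income.csv' /TYPE=CSV /REPLACE
    /MISSING=RECODE.
@end example

With MISSING=RECODE, each @minus{}9 would be written the same way as a
system-missing value.  Omitting the subcommand, or specifying
MISSING=IGNORE, would write @minus{}9 itself.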
 816
By default, all the variables in the active dataset dictionary are saved
to the output file, but @subcmd{DROP} or @subcmd{KEEP} can select a subset of variables
to save.  The @subcmd{RENAME} subcommand can also be used to change the names
 820 under which variables are saved.  @subcmd{UNSELECTED} determines whether cases
 821 filtered out by the @cmd{FILTER} command are written to the output file.
 822 These subcommands have the same syntax and meaning as on the
 823 @cmd{SAVE} command (@pxref{SAVE}).
 824
 825 Each supported file type has additional subcommands, explained in
 826 separate sections below.
 827
 828 @cmd{SAVE TRANSLATE} causes the data to be read.  It is a procedure.
 829
 830 @menu
 831 * SAVE TRANSLATE /TYPE=CSV and TYPE=TAB::
 832 @end menu
 833
 834 @node SAVE TRANSLATE /TYPE=CSV and TYPE=TAB
 835 @subsection Writing Comma- and Tab-Separated Data Files
 836
 837 @display
 838 SAVE TRANSLATE
 839         /OUTFILE=@{'@var{file_name}',@var{file_handle}@}
 840         /TYPE=CSV
 841         [/REPLACE]
 842         [/MISSING=@{IGNORE,RECODE@}]
 843
 844         [/DROP=@var{var_list}]
 845         [/KEEP=@var{var_list}]
 846         [/RENAME=(@var{src_names}=@var{target_names})@dots{}]
 847         [/UNSELECTED=@{RETAIN,DELETE@}]
 848
 849         [/FIELDNAMES]
 850         [/CELLS=@{VALUES,LABELS@}]
 851         [/TEXTOPTIONS DELIMITER='@var{delimiter}']
 852         [/TEXTOPTIONS QUALIFIER='@var{qualifier}']
 853         [/TEXTOPTIONS DECIMAL=@{DOT,COMMA@}]
 854         [/TEXTOPTIONS FORMAT=@{PLAIN,VARIABLE@}]
 855 @end display
 856
 857 The SAVE TRANSLATE command with TYPE=CSV or TYPE=TAB writes data in a
 858 comma- or tab-separated value format similar to that described by
 859 RFC@tie{}4180.  Each variable becomes one output column, and each case
 860 becomes one line of output.  If FIELDNAMES is specified, an additional
 861 line at the top of the output file lists variable names.
 862
 863 The CELLS and TEXTOPTIONS FORMAT settings determine how values are
 864 written to the output file:
 865
 866 @table @asis
 867 @item CELLS=VALUES FORMAT=PLAIN (the default settings)
 868 Writes variables to the output in ``plain'' formats that ignore the
 869 details of variable formats.  Numeric values are written as plain
 870 decimal numbers with enough digits to indicate their exact values in
 871 machine representation.  Numeric values include @samp{e} followed by
 872 an exponent if the exponent value would be less than -4 or greater
 873 than 16.  Dates are written in MM/DD/YYYY format and times in HH:MM:SS
 874 format.  WKDAY and MONTH values are written as decimal numbers.
 875
 876 Numeric values use, by default, the decimal point character set with
 877 SET DECIMAL (@pxref{SET DECIMAL}).  Use DECIMAL=DOT or DECIMAL=COMMA
 878 to force a particular decimal point character.
 879
 880 @item CELLS=VALUES FORMAT=VARIABLE
 881 Writes variables using their print formats.  Leading and trailing
 882 spaces are removed from numeric values, and trailing spaces are
 883 removed from string values.
 884
@item CELLS=LABELS FORMAT=PLAIN
@itemx CELLS=LABELS FORMAT=VARIABLE
 887 Writes value labels where they exist, and otherwise writes the values
 888 themselves as described above.
 889 @end table
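
A minimal sketch of writing value labels in place of values.  It
assumes a hypothetical numeric variable @code{sex} in the active
dataset, and the output file name is also hypothetical:

@example
VALUE LABELS sex 1 'Male' 2 'Female'.
SAVE TRANSLATE /OUTFILE='labeled.csv' /TYPE=CSV /REPLACE
    /FIELDNAMES
    /CELLS=LABELS.
@end example

In the resulting file, cases with @code{sex} equal to 1 or 2 would be
expected to show @samp{Male} or @samp{Female}, while any other value,
having no label, is written as the value itself.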
 890
 891 Regardless of CELLS and TEXTOPTIONS FORMAT, numeric system-missing
 892 values are output as a single space.
 893
For TYPE=TAB, tab characters delimit values.  For TYPE=CSV, the
TEXTOPTIONS DELIMITER and DECIMAL settings determine the character
that separates values within a line.  If DELIMITER is specified, then
the specified string separates values.  If DELIMITER is not specified,
 898 then the default is a comma with DECIMAL=DOT or a semicolon with
 899 DECIMAL=COMMA.  If DECIMAL is not given either, it is implied by the
 900 decimal point character set with SET DECIMAL (@pxref{SET DECIMAL}).
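
The two commands below sketch how the separator can be chosen
indirectly or directly; both file names are hypothetical:

@example
SAVE TRANSLATE /OUTFILE='euro.csv' /TYPE=CSV /REPLACE
    /TEXTOPTIONS DECIMAL=COMMA.
SAVE TRANSLATE /OUTFILE='pipe.csv' /TYPE=CSV /REPLACE
    /TEXTOPTIONS DELIMITER='|'.
@end example

Following the rules above, the first command should separate values
with semicolons, because DECIMAL=COMMA is given without DELIMITER.  The
second should separate values with vertical bars.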
 901
 902 The TEXTOPTIONS QUALIFIER setting specifies a character that is output
 903 before and after a value that contains the delimiter character or the
 904 qualifier character.  The default is a double quote (@samp{"}).  A
 905 qualifier character that appears within a value is doubled.
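
As a small illustration of the quoting rule, assuming the default
comma delimiter and double-quote qualifier, a hypothetical string value
that contains both characters would be expected to be written like
this:

@example
Value in the dataset:    He said "hi", twice
Field in the CSV file:   "He said ""hi"", twice"
@end example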
 906
 907 @node SYSFILE INFO
 908 @section SYSFILE INFO
 909 @vindex SYSFILE INFO
 910
 911 @display
 912 SYSFILE INFO FILE='@var{file_name}' [ENCODING='@var{encoding}'].
 913 @end display
 914
@cmd{SYSFILE INFO} reads the dictionary in an SPSS system file,
SPSS/PC+ system file, or SPSS portable file, and displays the
information it contains.

Specify the file to examine on @subcmd{FILE}, as a file name or a
file handle (@pxref{File Handles}).
 921
 922 @pspp{} automatically detects the encoding of string data in the file,
 923 when possible.  The character encoding of old SPSS system files cannot
 924 always be guessed correctly, and SPSS/PC+ system files do not include
 925 any indication of their encoding.  Specify the @subcmd{ENCODING}
 926 subcommand with an @acronym{IANA} character set name as its string
 927 argument to override the default, or specify @code{ENCODING='DETECT'}
 928 to analyze and report possibly valid encodings for the system file.
 929 The @subcmd{ENCODING} subcommand is a @pspp{} extension.
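
A short sketch of both uses of @subcmd{ENCODING}.  The file name
@file{legacy.sav} is hypothetical, and @code{windows-1252} is only one
example of an @acronym{IANA} character set name:

@example
SYSFILE INFO FILE='legacy.sav' ENCODING='DETECT'.
SYSFILE INFO FILE='legacy.sav' ENCODING='windows-1252'.
@end example

The first command asks @pspp{} to report encodings that might be valid
for the file.  The second displays the dictionary with strings
interpreted in the named encoding.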
 930
 931 @cmd{SYSFILE INFO} does not affect the current active dataset.
 932
 933 @node XEXPORT
 934 @section XEXPORT
 935 @vindex XEXPORT
 936
 937 @display
 938 XEXPORT
 939         /OUTFILE='@var{file_name}'
 940         /DIGITS=@var{n}
 941         /DROP=@var{var_list}
 942         /KEEP=@var{var_list}
 943         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 944         /TYPE=@{COMM,TAPE@}
 945         /MAP
 946 @end display
 947
 948 The @cmd{XEXPORT} transformation writes the active dataset dictionary and
 949 data to a specified portable file.
 950
 951 This transformation is a @pspp{} extension.
 952
 953 It is similar to the @cmd{EXPORT} procedure, with two differences:
 954
 955 @itemize
 956 @item
 957 @cmd{XEXPORT} is a transformation, not a procedure.  It is executed when
 958 the data is read by a procedure or procedure-like command.
 959
 960 @item
 961 @cmd{XEXPORT} does not support the @subcmd{UNSELECTED} subcommand.
 962 @end itemize
 963
 964 @xref{EXPORT}, for more information.
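
A minimal sketch of @cmd{XEXPORT} used as a transformation.  The file
names and the variables @code{id} and @code{score} are hypothetical:

@example
GET FILE='input.sav'.
XEXPORT /OUTFILE='snapshot.por' /KEEP=id score.
FREQUENCIES /VARIABLES=score.
@end example

Here the portable file is not written when @cmd{XEXPORT} is parsed.
It is written as a side effect of the pass over the data made by
@cmd{FREQUENCIES}.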
 965
 966 @node XSAVE
 967 @section XSAVE
 968 @vindex XSAVE
 969
 970 @display
 971 XSAVE
 972         /OUTFILE='@var{file_name}'
 973         /@{UNCOMPRESSED,COMPRESSED,ZCOMPRESSED@}
 974         /PERMISSIONS=@{WRITEABLE,READONLY@}
 975         /DROP=@var{var_list}
 976         /KEEP=@var{var_list}
 977         /VERSION=@var{version}
 978         /RENAME=(@var{src_names}=@var{target_names})@dots{}
 979         /NAMES
 980         /MAP
 981 @end display
 982
 983 The @cmd{XSAVE} transformation writes the active dataset's dictionary and
 984 data to a system file.  It is similar to the @cmd{SAVE}
 985 procedure, with two differences:
 986
 987 @itemize
 988 @item
 989 @cmd{XSAVE} is a transformation, not a procedure.  It is executed when
 990 the data is read by a procedure or procedure-like command.
 991
 992 @item
 993 @cmd{XSAVE} does not support the @subcmd{UNSELECTED} subcommand.
 994 @end itemize
 995
 996 @xref{SAVE}, for more information.
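
A minimal sketch of @cmd{XSAVE} used as a transformation.  The file
names and the variables @code{a} and @code{b} are hypothetical:

@example
GET FILE='input.sav'.
COMPUTE total = a + b.
XSAVE /OUTFILE='with-total.sav'.
DESCRIPTIVES /VARIABLES=total.
@end example

Both @cmd{COMPUTE} and @cmd{XSAVE} are transformations, so nothing is
written until @cmd{DESCRIPTIVES} reads the data.  At that point each
case is written to @file{with-total.sav} in the same pass, which avoids
the separate pass over the data that @cmd{SAVE} would require.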