1 @c PSPP - a program for statistical analysis.
2 @c Copyright (C) 2019 Free Software Foundation, Inc.
3 @c Permission is granted to copy, distribute and/or modify this document
4 @c under the terms of the GNU Free Documentation License, Version 1.3
5 @c or any later version published by the Free Software Foundation;
6 @c with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
7 @c A copy of the license is included in the section entitled "GNU
8 @c Free Documentation License".
11 @node SPSS Viewer File Format
12 @appendix SPSS Viewer File Format
14 SPSS Viewer or @file{.spv} files, here called SPV files, are written
15 by SPSS 16 and later to represent the contents of its output editor.
16 This chapter documents the format, based on examination of a corpus of
17 about 8,000 files from a variety of sources. This description is
18 detailed enough to both read and write SPV files.
20 SPSS 15 and earlier versions instead use @file{.spo} files, which have
21 a completely different output format based on the Microsoft Compound
22 Document Format. This format is not documented here.
24 An SPV file is a Zip archive that can be read with @command{zipinfo}
25 and @command{unzip} and similar programs. The final member in the Zip
26 archive is the @dfn{manifest}, a file named
27 @file{META-INF/MANIFEST.MF}. This structure makes SPV files resemble
28 Java ``JAR'' files (and ODF files), but whereas a JAR manifest
29 contains a sequence of colon-delimited key/value pairs, an SPV
30 manifest contains the string @samp{allowPivoting=true}, without a
31 new-line. PSPP uses this string to identify an SPV file; it is
32 invariant across the corpus.@footnote{SPV files always begin with the
33 7-byte sequence 50 4b 03 04 14 00 08, but this is not a useful magic
34 number because most Zip archives start the same way.}@footnote{SPSS
35 writes @file{META-INF/MANIFEST.MF} to every SPV file, but it does not
36 read it or even require it to exist, so using different contents,
37 e.g.@: @samp{allowingPivot=false}, has no effect.}
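The following Python sketch applies PSPP's identification heuristic. It opens the file with the standard @code{zipfile} module and checks that @file{META-INF/MANIFEST.MF} holds exactly @samp{allowPivoting=true}. The function name @code{looks_like_spv} is hypothetical. SPSS itself ignores the manifest, so a file that fails this check might still be one that SPSS accepts.

```python
import io
import zipfile

MANIFEST_NAME = "META-INF/MANIFEST.MF"
MANIFEST_CONTENTS = b"allowPivoting=true"  # no trailing new-line


def looks_like_spv(source):
    """Return True if source (a path or binary file object) is a Zip
    archive whose manifest holds exactly allowPivoting=true."""
    try:
        with zipfile.ZipFile(source) as archive:
            if MANIFEST_NAME not in archive.namelist():
                return False
            return archive.read(MANIFEST_NAME) == MANIFEST_CONTENTS
    except zipfile.BadZipFile:
        return False


# Usage: build a tiny SPV-like archive in memory and check it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("outputViewer0000000000.xml", "<heading/>")
    archive.writestr(MANIFEST_NAME, "allowPivoting=true")
buffer.seek(0)
print(looks_like_spv(buffer))  # True
```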
39 The rest of the members in an SPV file's Zip archive fall into two
40 categories: @dfn{structure} and @dfn{detail} members. Structure
41 member names take the form @file{outputViewer@var{number}.xml} or
42 @file{outputViewer@var{number}_heading.xml}, where @var{number} is a
43 10-digit decimal number. Each of these members represents some kind
44 of output item (a table, a heading, a block of text, etc.) or a group
45 of them. The member whose output goes at the beginning of the
46 document is numbered 0, the next member in the output is numbered 1,
49 Structure members contain XML. This XML is sometimes self-contained,
50 but it often references detail members in the Zip archive, which are
54 @item @file{@var{prefix}_table.xml} and @file{@var{prefix}_tableData.bin}
55 @itemx @file{@var{prefix}_lightTableData.bin}
56 The structure of a table plus its data. Older SPV files pair a
57 @file{@var{prefix}_table.xml} file that describes the table's
58 structure with a binary @file{@var{prefix}_tableData.bin} file that
59 gives its data. Newer SPV files (the majority of those in the corpus)
60 instead include a single @file{@var{prefix}_lightTableData.bin} file
61 that incorporates both into a single binary format.
63 @item @file{@var{prefix}_warning.xml} and @file{@var{prefix}_warningData.bin}
64 @itemx @file{@var{prefix}_lightWarningData.bin}
65 The same format as for tables, with a different name.
67 @item @file{@var{prefix}_notes.xml} and @file{@var{prefix}_notesData.bin}
68 @itemx @file{@var{prefix}_lightNotesData.bin}
69 The same format as for tables, with a different name.
71 @item @file{@var{prefix}_chartData.bin} and @file{@var{prefix}_chart.xml}
72 The structure of a chart plus its data. Charts do not have a
75 @item @file{@var{prefix}_Imagegeneric.png}
76 @itemx @file{@var{prefix}_PastedObjectgeneric.png}
77 @itemx @file{@var{prefix}_imageData.bin}
78 A PNG image referenced by an @code{object} element (in the first two
79 cases) or an @code{image} element (in the final case). @xref{SPV
80 Structure object and image Elements}.
82 @item @file{@var{prefix}_pmml.scf}
83 @itemx @file{@var{prefix}_stats.scf}
84 @itemx @file{@var{prefix}_model.xml}
85 Not yet investigated. The corpus contains few examples.
88 The @file{@var{prefix}} in the names of the detail members is
89 typically an 11-digit decimal number that increases for each item,
90 tending to skip values. Older SPV files use different naming
91 conventions for detail members. Structure members refer to detail
92 members by name, and so their exact names do not matter to readers as
93 long as they are unique.
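The naming scheme for structure members suggests how a reader might find them and put them in output order. The following Python sketch matches the two structure member name forms and sorts the matches by their 10-digit number. The helper @code{structure_members} is hypothetical and is not part of PSPP. The sketch does not match detail members by name, because structure members name their detail members explicitly.

```python
import re

# outputViewer, exactly ten decimal digits (\d repeated ten times),
# an optional _heading, then .xml.
STRUCTURE_NAME = re.compile(
    r"^outputViewer(" + r"\d" * 10 + r")(_heading)?\.xml$")


def structure_members(names):
    """Return (number, name, is_heading) tuples for the structure members
    among names, sorted into output order.  Other members are skipped."""
    found = []
    for name in names:
        match = STRUCTURE_NAME.match(name)
        if match:
            found.append((int(match.group(1)), name, match.group(2) is not None))
    return sorted(found)


names = ["META-INF/MANIFEST.MF",
         "outputViewer0000000001_heading.xml",
         "00000000002_lightTableData.bin",
         "outputViewer0000000000.xml"]
for number, name, is_heading in structure_members(names):
    print(number, name, is_heading)
```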
95 SPSS tolerates corrupted Zip archives that Zip reader libraries tend
96 to reject. These can be fixed up with @command{zip -FF}.
99 * SPV Structure Member Format::
100 * SPV Light Detail Member Format::
101 * SPV Legacy Detail Member Binary Format::
102 * SPV Legacy Detail Member XML Format::
105 @node SPV Structure Member Format
106 @section Structure Member Format
108 A structure member lays out the high-level structure for a group of
109 output items such as headings, tables, and charts. Structure members
110 do not include the details of tables and charts but instead refer to
111 them by their member names.
113 Structure members' XML files claim conformance with a collection of
114 XML Schemas. These schemas are distributed, under a nonfree license,
115 with SPSS binaries. Fortunately, the schemas are not necessary to
116 understand the structure members. The schemas can even
117 be deceptive because they document elements and attributes that are
118 not in the corpus and do not document elements and attributes that are
119 commonly found in the corpus.
121 Structure members use a different XML namespace for each schema, but
122 these namespaces are not entirely consistent. In some SPV files, for
123 example, the @code{viewer-tree} schema is associated with namespace
124 @indicateurl{http://xml.spss.com/spss/viewer-tree} and in others with
125 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} (note the
126 additional @file{viewer/}). Under either name, the schema URIs are
127 not resolvable to obtain the schemas themselves.
129 One may ignore all of the above in interpreting a structure member.
130 The actual XML has a simple and straightforward form that does not
131 require a reader to take schemas or namespaces into account. A
132 structure member's root is a @code{heading} element, which contains
133 @code{heading} or @code{container} elements (or a mix), forming a
134 tree. In turn, @code{container} holds a @code{label} and one more
135 child, usually @code{text} or @code{table}.
137 The following sections document the elements found in structure
138 members in a context-free grammar-like fashion. Consider the
139 following example, which specifies the attributes and content for the
140 @code{container} element:
144 :visibility=(visible | hidden)
145 :page-break-before=(always)?
146 :text-align=(left | center)?
148 => label (table | container_text | graph | model | object | image | tree)
151 Each attribute specification begins with @samp{:} followed by the
152 attribute's name. If the attribute's value has an easily specified
153 form, then @samp{=} and its description follow the name. Finally, if
154 the attribute is optional, the specification ends with @samp{?}. The
155 following value specifications are defined:
158 @item (@var{a} | @var{b} | @dots{})
159 One of the listed literal strings. If only one string is listed, it
160 is the only acceptable value. If @code{OTHER} is listed, then any
161 string not explicitly listed is also accepted.
164 Either @code{true} or @code{false}.
167 A floating-point number followed by a unit, e.g.@: @code{10pt}. Units
168 in the corpus include @code{in} (inch), @code{pt} (points, 72/inch),
169 @code{px} (``device-independent pixels'', 96/inch), and @code{cm}. If
170 the unit is omitted then points should be assumed. The number and
171 unit may be separated by white space.
173 The corpus also includes localized names for units. A reader must
174 understand these to properly interpret the dimension:
178 @code{인치}, @code{pol.}, @code{cala}, @code{cali}
188 A floating-point number.
194 A color in one of the forms @code{#@var{rr}@var{gg}@var{bb}} or
195 @code{@var{rr}@var{gg}@var{bb}}, or the string @code{transparent}, or
196 one of the standard Web color names.
199 @item ref @var{element}
200 @itemx ref(@var{elem1} | @var{elem2} | @dots{})
201 The name from the @code{id} attribute in some element. If one or more
202 elements are named, the name must refer to one of those elements,
203 otherwise any element is acceptable.
206 All elements have an optional @code{id} attribute. If present, its
207 value must be unique. In practice many elements are assigned
208 @code{id} attributes that are never referenced.
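The following Python sketch converts the @code{dimension} and @code{color} value types described above. It uses points as the common unit. The only localized unit names it recognizes are the inch names listed above (@code{인치}, @code{pol.}, @code{cala}, @code{cali}); a complete reader would also need the other localized names in that table. It knows only a few Web color names, chosen for illustration. The function names are hypothetical.

```python
import re

# Points per unit.  Points (72/inch) are assumed when the unit is omitted.
POINTS_PER_UNIT = dict(pt=1.0, px=72 / 96, cm=72 / 2.54)
POINTS_PER_UNIT["in"] = 72.0
# Localized inch names from the table above; other localized unit names
# in the corpus would have to be added the same way.
for inch_name in ("인치", "pol.", "cala", "cali"):
    POINTS_PER_UNIT[inch_name] = 72.0

# A number, optional white space, then an optional unit.
DIMENSION = re.compile(r"^\s*([-+]?\d*\.?\d+)\s*(\S*)\s*$")

# Only a few of the standard Web color names, for illustration.
WEB_COLORS = dict(black=(0, 0, 0), white=(255, 255, 255), red=(255, 0, 0))


def parse_dimension(text):
    """Convert a dimension such as '10pt' or '0.25in' to points."""
    match = DIMENSION.match(text)
    if not match:
        raise ValueError("bad dimension: " + repr(text))
    number, unit = match.groups()
    factor = POINTS_PER_UNIT.get(unit or "pt")
    if factor is None:
        raise ValueError("unknown unit: " + repr(unit))
    return float(number) * factor


def parse_color(text):
    """Return (r, g, b) for a color, or None for 'transparent'."""
    text = text.strip()
    if text == "transparent":
        return None
    if text.lower() in WEB_COLORS:
        return WEB_COLORS[text.lower()]
    digits = text[1:] if text.startswith("#") else text
    if len(digits) != 6:
        raise ValueError("bad color: " + repr(text))
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
```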
210 The content specification for an element supports the following
217 @item @var{a} @var{b}
218 @var{a} followed by @var{b}.
220 @item @var{a} | @var{b} | @var{c}
221 One of @var{a} or @var{b} or @var{c}.
224 Zero or one instances of @var{a}.
227 Zero or more instances of @var{a}.
230 One or more instances of @var{a}.
232 @item (@var{subexpression})
233 Grouping for a subexpression.
242 Element and attribute names are sometimes suffixed by another name in
243 square brackets to distinguish different uses of the same name. For
244 example, structure XML has two @code{text} elements, one inside
245 @code{container}, the other inside @code{pageParagraph}. The former
246 is defined as @code{text[container_text]} and referenced as
247 @code{container_text}, the latter defined as
248 @code{text[pageParagraph_text]} and referenced as
249 @code{pageParagraph_text}.
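The following Python sketch makes the attribute notation concrete. It parses one attribute specification line into four parts: the name, the bracketed suffix, the value description, and whether the attribute is optional. It assumes that each specification fits on one line. It is not the parser PSPP uses for the grammar files named above, and @code{parse_attr_spec} and @code{AttrSpec} are hypothetical names.

```python
import re
from collections import namedtuple

AttrSpec = namedtuple("AttrSpec", "name suffix value optional")

# ":" name, optional "[suffix]", optional "=" value, optional "?".
ATTR_SPEC = re.compile(
    r"^:(?P<name>[-\w]+)(?:\[(?P<suffix>\w+)\])?"
    r"(?:=(?P<value>.+?))?(?P<optional>\?)?$")


def parse_attr_spec(line):
    match = ATTR_SPEC.match(line.strip())
    if not match:
        raise ValueError("not an attribute specification: " + repr(line))
    value = match.group("value")
    if value and value.startswith("(") and value.endswith(")"):
        # A parenthesized list of literal strings separated by "|".
        value = tuple(part.strip() for part in value[1:-1].split("|"))
    return AttrSpec(match.group("name"), match.group("suffix"), value,
                    match.group("optional") is not None)


for line in (":visibility=(visible | hidden)",
             ":page-break-before=(always)?",
             ":type[text_type]=(title | log | text | page-title)",
             ":margin-left=dimension?"):
    print(parse_attr_spec(line))
```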
251 This language is used in the PSPP source code for parsing structure
252 and detail XML members. Refer to
253 @file{src/output/spv/structure-xml.grammar} and
254 @file{src/output/spv/detail-xml.grammar} for the full grammars.
256 The following example shows the contents of a typical structure member
257 for a @cmd{DESCRIPTIVES} procedure. A real structure member is not
258 indented. This example also omits most attributes, all XML namespace
259 information, and the CSS from the embedded HTML:
262 <?xml version="1.0" encoding="utf-8"?>
264 <label>Output</label>
265 <heading commandName="Descriptives">
266 <label>Descriptives</label>
269 <text commandName="Descriptives" type="title">
271 <![CDATA[<head><style type="text/css">...</style></head><BR>Descriptives]]>
275 <container visibility="hidden">
277 <table commandName="Descriptives" subType="Notes" type="note">
279 <dataPath>00000000001_lightNotesData.bin</dataPath>
284 <label>Descriptive Statistics</label>
285 <table commandName="Descriptives" subType="Descriptive Statistics"
288 <dataPath>00000000002_lightTableData.bin</dataPath>
297 * SPV Structure heading Element::
298 * SPV Structure label Element::
299 * SPV Structure container Element::
300 * SPV Structure text Element (Inside @code{container})::
301 * SPV Structure html Element::
302 * SPV Structure table Element::
303 * SPV Structure graph Element::
304 * SPV Structure model Element::
305 * SPV Structure object and image Elements::
306 * SPV Structure tree Element::
307 * SPV Structure Path Elements::
308 * SPV Structure pageSetup Element::
309 * SPV Structure @code{text} Element (Inside @code{pageParagraph})::
312 @node SPV Structure heading Element
313 @subsection The @code{heading} Element
316 heading[root_heading]
322 => label pageSetup? (container | heading)*
327 :visibility[heading_visibility]=(collapsed)?
330 => label (container | heading)*
333 A @code{heading} represents a tree of content that appears in an
334 output viewer window. It contains a @code{label} text string that is
335 shown in the outline view, ordinarily followed by content containers or
336 further nested (sub)sections of output. Unlike heading elements in
337 HTML and other common document formats, which precede the content that
338 they head, @code{heading} contains the elements that appear below the
341 The root of a structure member is a special @code{heading}. The
342 direct children of the root @code{heading} elements in all structure
343 members in an SPV file are siblings. That is, the root @code{heading}
344 elements in all of the structure members conceptually represent the same node.
345 The root heading's @code{label} is ignored (@pxref{SPV Structure
346 label Element}). The root heading in the first structure member in
347 the Zip file may contain a @code{pageSetup} element.
349 The schema implies that any @code{heading} may contain a sequence of
350 any number of @code{heading} and @code{container} elements. This does
351 not work for the root @code{heading} in practice, which must actually
352 contain exactly one @code{container} or @code{heading} child element.
353 Furthermore, if the root heading's child is a @code{heading}, then the
354 structure member's name must end in @file{_heading.xml}; if it is a
355 @code{container} child, then it must not.
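The following Python sketch checks these rules with @code{xml.etree.ElementTree}. It assumes tag names without XML namespaces, as in the @cmd{DESCRIPTIVES} example earlier. Real members carry namespaces, which a reader would have to strip first. The name @code{check_root_heading} is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_root_heading(member_name, xml_text):
    """Return the single item under a structure member's root heading,
    raising ValueError if the member breaks the rules above.  Assumes
    tag names without XML namespaces."""
    root = ET.fromstring(xml_text)
    if root.tag != "heading":
        raise ValueError("root element is not a heading")
    # label and pageSetup may also appear; only items are counted.
    items = [child for child in root if child.tag in ("container", "heading")]
    if len(items) != 1:
        raise ValueError("root heading must hold exactly one item")
    item = items[0]
    if (item.tag == "heading") != member_name.endswith("_heading.xml"):
        raise ValueError("member name does not match the item type")
    return item


member = ("<heading><label>Output</label>"
          "<container visibility='visible'><label>Title</label>"
          "<text type='title'/></container></heading>")
print(check_root_heading("outputViewer0000000000.xml", member).tag)
```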
357 The following attributes have been observed on both document root and
358 nested @code{heading} elements.
360 @defvr {Attribute} @code{creator-version}
361 The version of the software that created this SPV file. A string of
362 the form @code{xxyyzzww} represents software version xx.yy.zz.ww,
363 e.g.@: @code{21000001} is version 21.0.0.1. Trailing pairs of zeros
364 are sometimes omitted, so that @code{21}, @code{210000}, and
365 @code{21000000} are all version 21.0.0.0 (and the corpus contains all
366 three of those forms).
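One way to normalize this attribute, sketched in Python, is to pad the string on the right with zeros to eight digits and then split it into pairs. The function name is hypothetical. Rejecting odd-length strings is an assumption; all of the corpus forms shown above have an even number of digits.

```python
def parse_creator_version(text):
    """Convert a creator-version string to a 4-tuple of integers.
    Omitted trailing pairs of digits are taken as zeros."""
    if not text.isdigit() or len(text) % 2 or len(text) > 8:
        raise ValueError("unexpected creator-version: " + repr(text))
    padded = text.ljust(8, "0")
    return tuple(int(padded[i:i + 2]) for i in range(0, 8, 2))


for text in ("21000001", "21", "210000", "21000000"):
    print(text, parse_creator_version(text))
```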
370 The following attributes have been observed on document root
371 @code{heading} elements only:
373 @defvr {Attribute} @code{creator}
374 The directory in the file system of the software that created this SPV
378 @defvr {Attribute} @code{creation-date-time}
379 The date and time at which the SPV file was written, in a
380 locale-specific format, e.g.@: @code{Friday, May 16, 2014 6:47:37 PM
381 PDT} or @code{lunedì 17 marzo 2014 3.15.48 CET} or even @code{Friday,
382 December 5, 2014 5:00:19 o'clock PM EST}.
385 @defvr {Attribute} @code{lockReader}
386 Whether a reader should be allowed to edit the output. The possible
387 values are @code{true} and @code{false}. The value @code{false} is by
391 @defvr {Attribute} @code{schemaLocation}
392 This is actually an XML Namespace attribute. A reader may ignore it.
396 The following attributes have been observed only on nested
397 @code{heading} elements:
399 @defvr {Attribute} @code{commandName}
400 A locale-invariant identifier for the command that produced the
401 output, e.g.@: @code{Frequencies}, @code{T-Test}, @code{Non Par Corr}.
404 @defvr {Attribute} @code{visibility}
405 If this attribute is absent, the heading's content is expanded in the
406 outline view. If it is set to @code{collapsed}, it is collapsed.
407 (This attribute is never present in a root @code{heading} because the
408 root node is always expanded when a file is loaded, even though the UI
409 can be used to collapse it interactively.)
412 @defvr {Attribute} @code{locale}
413 The locale used for output, in Windows format, which is similar to the
414 format used in Unix with the underscore replaced by a hyphen, e.g.@:
415 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cryl-RS}.
418 @defvr {Attribute} @code{olang}
419 The output language, e.g.@: @code{en}, @code{it}, @code{es},
420 @code{de}, @code{pt-BR}.
423 @node SPV Structure label Element
424 @subsection The @code{label} Element
430 Every @code{heading} and @code{container} holds a @code{label} as its
431 first child. The label text is what appears in the outline pane of
432 the GUI's viewer window. PSPP also puts it into the outline of PDF
433 output. The label text doesn't appear in the output itself.
435 The text in @code{label} describes what it labels, often by naming the
436 statistical procedure that was executed, e.g.@: ``Frequencies'' or
437 ``T-Test''. Labels are often very generic, especially within a
438 @code{container}, e.g.@: ``Title'' or ``Warnings'' or ``Notes''.
439 Label text is localized according to the output language, e.g.@: in
440 Italian a frequency table procedure is labeled ``Frequenze''.
442 The user can edit labels to be anything they want. The corpus
443 contains a few examples of empty labels, ones that contain no text,
444 probably as a result of user editing.
446 The root @code{heading} in an SPV file has a @code{label}, like every
447 @code{heading}. It normally contains ``Output'' but its content is
448 disregarded anyway. The user cannot edit it.
450 @node SPV Structure container Element
451 @subsection The @code{container} Element
455 :visibility=(visible | hidden)
456 :page-break-before=(always)?
457 :text-align=(left | center)?
459 => label (table | container_text | graph | model | object | image | tree)
462 A @code{container} serves to contain and label a @code{table},
463 @code{text}, or other kind of item.
465 This element has the following attributes.
467 @defvr {Attribute} @code{visibility}
468 Whether the container's content is displayed. ``Notes'' tables are
469 often hidden; other data is usually visible.
472 @defvr {Attribute} @code{text-align}
473 Alignment of text within the container. Observed with nested
474 @code{table} and @code{text} elements.
477 @defvr {Attribute} @code{width}
478 The width of the container, e.g.@: @code{1097px}.
481 All of the elements that nest inside @code{container} (except the
482 @code{label}) have the following optional attribute.
484 @defvr {Attribute} @code{commandName}
485 As on the @code{heading} element. The corpus contains one example
486 where @code{commandName} is present but set to the empty string.
489 @node SPV Structure text Element (Inside @code{container})
490 @subsection The @code{text} Element (Inside @code{container})
494 :type[text_type]=(title | log | text | page-title)
500 This @code{text} element is nested inside a @code{container}. There
501 is a different @code{text} element that is nested inside a
502 @code{pageParagraph}.
504 This element has the following attributes.
506 @defvr {Attribute} @code{commandName}
507 @xref{SPV Structure container Element}. For output not specific to a
508 command, this is simply @code{log}.
511 @defvr {Attribute} @code{type}
512 The semantics of the text.
515 @defvr {Attribute} @code{creator-version}
516 As on the @code{heading} element.
519 @node SPV Structure html Element
520 @subsection The @code{html} Element
523 html :lang=(en) => TEXT
526 The element contains an HTML document as text (or, in practice, as
527 CDATA). In some cases, the document starts with @code{<html>} and
528 ends with @code{</html>}; in others the @code{html} element is
529 implied. Generally the HTML includes a @code{head} element with a CSS
530 stylesheet. The HTML body often begins with @code{<BR>}.
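Python's standard @code{html.parser} module can recover the plain text of such a document while skipping the @code{head} and its stylesheet. The following sketch makes two simplifying assumptions: it treats @code{<BR>} as a line break, and it ignores all other markup. The name @code{html_to_text} is hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect document text, skipping head/style and mapping BR to a newline."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <BR> arrives as "br".
        if tag in ("head", "style"):
            self.skip_depth += 1
        elif tag == "br" and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("head", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(document):
    parser = _TextExtractor()
    parser.feed(document)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text('<head><style type="text/css">p color: red</style></head>'
                   '<BR>Descriptives'))
```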
532 The HTML document uses only the following elements:
536 Sometimes, the document is enclosed with
537 @code{<html>}@dots{}@code{</html>}.
540 The HTML body often begins with @code{<BR>} and may contain it as well.
548 The attributes @code{face}, @code{color}, and @code{size} are
549 observed. The value of @code{color} takes one of the forms
550 @code{#@var{rr}@var{gg}@var{bb}} or @code{rgb (@var{r}, @var{g},
551 @var{b})}. The value of @code{size} is a number between 1 and 7,
555 The CSS in the corpus is simple. To understand it, a parser only
556 needs to be able to skip white space, @code{<!--}, and @code{-->}, and
557 parse style only for @code{p} elements. Only the following properties
562 In the form @code{@var{rr}@var{gg}@var{bb}}, e.g.@: @code{000000}, with
566 Either @code{bold} or @code{normal}.
569 Either @code{italic} or @code{normal}.
571 @item text-decoration
572 Either @code{underline} or @code{normal}.
575 A font name, commonly @code{Monospaced} or @code{SansSerif}.
578 Values claim to be in points, e.g.@: @code{14pt}, but the values are
579 actually in ``device-independent pixels'' (px), at 96/inch.
582 This element has the following attributes.
584 @defvr {Attribute} @code{lang}
585 This always contains @code{en} in the corpus.
588 @node SPV Structure table Element
589 @subsection The @code{table} Element
598 :displayFiltering=bool?
600 :orphanTolerance=int?
605 :type[table_type]=(table | note | warning)
606 => tableProperties? tableStructure
608 tableStructure => path? dataPath csvPath?
611 This element has the following attributes.
613 @defvr {Attribute} @code{commandName}
614 @xref{SPV Structure container Element}.
617 @defvr {Attribute} @code{type}
618 One of @code{table}, @code{note}, or @code{warning}.
621 @defvr {Attribute} @code{subType}
622 The locale-invariant command ID for the particular kind of output that
623 this table represents in the procedure. This can be the same as
624 @code{commandName}, e.g.@: @code{Frequencies}, or different, e.g.@:
625 @code{Case Processing Summary}. Generic subtypes @code{Notes} and
626 @code{Warnings} are often used.
629 @defvr {Attribute} @code{tableId}
630 A number that uniquely identifies the table within the SPV file,
631 typically a large negative number such as @code{-4147135649387905023}.
634 @defvr {Attribute} @code{creator-version}
635 As on the @code{heading} element. In the corpus, this is only present
636 for version 21 and up and always includes all 8 digits.
639 @xref{SPV Detail Legacy Properties}, for details on the
640 @code{tableProperties} element.
642 @node SPV Structure graph Element
643 @subsection The @code{graph} Element
658 => dataPath? path csvPath?
661 This element represents a graph. The @code{dataPath} and @code{path}
662 elements name the Zip members that give the details of the graph.
663 Normally, both elements are present; there is only one counterexample
666 @code{csvPath} only appears in one SPV file in the corpus, for two
667 graphs. In these two cases, @code{dataPath}, @code{path}, and
668 @code{csvPath} all appear. These @code{csvPath} elements name Zip members with
669 names of the form @file{@var{number}_csv.bin}, where @var{number} is a
670 many-digit number and the same as the @code{csvFileIds}. The named
671 Zip members are CSV text files (despite the @file{.bin} extension).
672 The CSV files are encoded in UTF-8 and begin with a U+FEFF byte-order
675 @node SPV Structure model Element
676 @subsection The @code{model} Element
688 => ViZml? dataPath? path | pmmlContainerPath statsContainerPath
690 pmmlContainerPath => TEXT
692 statsContainerPath => TEXT
694 ViZml :viewName? => TEXT
697 This element represents a model. The @code{dataPath} and @code{path}
698 elements name the Zip members that give the details of the model.
699 Normally, both elements are present; there is only one counterexample
702 The details are unexplored. The @code{ViZml} element contains base-64
703 encoded text that decodes to a binary format with some embedded text
704 strings, and @code{path} names a Zip member that contains XML.
705 Alternatively, @code{pmmlContainerPath} and @code{statsContainerPath}
706 name Zip members with @file{.scf} extension.
708 @node SPV Structure object and image Elements
709 @subsection The @code{object} and @code{image} Elements
714 :type[object_type]=(unknown)?
724 These two elements represent an image in PNG format. They are
725 equivalent and the corpus contains examples of both. The only
726 difference is the syntax: for @code{object}, the @code{uri} attribute
727 names the Zip member that contains a PNG file; for @code{image}, the
728 text of the inner @code{dataPath} element names the Zip member.
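The two elements differ only in where the member name is stored, so a reader can handle both with one function. The following Python sketch assumes tag names without XML namespaces. The name @code{image_member_name} and the member names in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET


def image_member_name(element):
    """Return the name of the Zip member holding the PNG for an object
    or image element (tag names assumed to lack XML namespaces)."""
    if element.tag == "object":
        return element.get("uri")
    if element.tag == "image":
        data_path = element.find("dataPath")
        if data_path is None or not data_path.text:
            return None
        return data_path.text.strip()
    raise ValueError("not an object or image element: " + element.tag)


obj = ET.fromstring("<object type='unknown' uri='00000000001_Imagegeneric.png'/>")
img = ET.fromstring("<image><dataPath>00000000002_imageData.bin</dataPath></image>")
print(image_member_name(obj), image_member_name(img))
```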
730 PSPP writes @code{object} in output but there is no strong reason to
733 The corpus only contains PNG image files.
735 @node SPV Structure tree Element
736 @subsection The @code{tree} Element
747 This element represents a tree. The @code{dataPath} and @code{path}
748 elements name the Zip members that give the details of the tree.
749 The details are unexplored.
751 @node SPV Structure Path Elements
752 @subsection Path Elements
762 These elements contain the names of the Zip members that hold details
763 for a container. For tables:
767 When a ``light'' format is used, only @code{dataPath} is present, and
768 it names a @file{.bin} member of the Zip file that has @code{light} in
769 its name, e.g.@: @code{0000000001437_lightTableData.bin} (@pxref{SPV
770 Light Detail Member Format}).
773 When the legacy format is used, both are present. In this case,
774 @code{dataPath} names a Zip member with a legacy binary format that
775 contains relevant data (@pxref{SPV Legacy Detail Member Binary
776 Format}), and @code{path} names a Zip member that uses an XML format
777 (@pxref{SPV Legacy Detail Member XML Format}).
780 Graphs normally follow the legacy approach described above. The
781 corpus contains one example of a graph with @code{path} but not
782 @code{dataPath}. The reason is unexplored.
784 Models use @code{path} but not @code{dataPath}. @xref{SPV Structure
785 model Element}, for more information.
787 These elements have no attributes.
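The rules for tables reduce to a small classification step. In the following Python sketch, a @code{tableStructure} with only @code{dataPath} is treated as light, and one with both @code{path} and @code{dataPath} is treated as legacy. The sketch assumes tag names without XML namespaces. The name @code{table_detail_members} and the member names in the usage are hypothetical.

```python
import xml.etree.ElementTree as ET


def table_detail_members(table):
    """Return ("light", data_path) or ("legacy", data_path, xml_path)
    for a table element, following the rules for tables above."""
    structure = table.find("tableStructure")
    if structure is None:
        raise ValueError("table has no tableStructure")
    data_path = structure.findtext("dataPath")
    xml_path = structure.findtext("path")
    if data_path is None:
        raise ValueError("tableStructure has no dataPath")
    if xml_path is None:
        return ("light", data_path.strip())
    return ("legacy", data_path.strip(), xml_path.strip())


light = ET.fromstring("<table type='table'><tableStructure>"
                      "<dataPath>00000000002_lightTableData.bin</dataPath>"
                      "</tableStructure></table>")
print(table_detail_members(light))
```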
789 @node SPV Structure pageSetup Element
790 @subsection The @code{pageSetup} Element
794 :initial-page-number=int?
795 :chart-size=(as-is | full-height | half-height | quarter-height | OTHER)?
796 :margin-left=dimension?
797 :margin-right=dimension?
798 :margin-top=dimension?
799 :margin-bottom=dimension?
800 :paper-height=dimension?
801 :paper-width=dimension?
802 :reference-orientation?
803 :space-after=dimension?
804 => pageHeader pageFooter
806 pageHeader => pageParagraph?
808 pageFooter => pageParagraph?
810 pageParagraph => pageParagraph_text
813 The @code{pageSetup} element has the following attributes.
815 @defvr {Attribute} @code{initial-page-number}
816 The page number to put on the first page of printed output. Usually
820 @defvr {Attribute} @code{chart-size}
821 One of the listed, self-explanatory chart sizes,
822 or a localization (!) of one of these (e.g.@:
823 @code{dimensione attuale}, @code{Wie vorgegeben}).
826 @defvr {Attribute} @code{margin-left}
827 @defvrx {Attribute} @code{margin-right}
828 @defvrx {Attribute} @code{margin-top}
829 @defvrx {Attribute} @code{margin-bottom}
830 Margin sizes, e.g.@: @code{0.25in}.
833 @defvr {Attribute} @code{paper-height}
834 @defvrx {Attribute} @code{paper-width}
838 @defvr {Attribute} @code{reference-orientation}
839 Indicates the orientation of the output page. Either @code{0deg}
840 (portrait) or @code{90deg} (landscape),
843 @defvr {Attribute} @code{space-after}
844 The amount of space between printed objects, typically @code{12pt}.
847 @node SPV Structure @code{text} Element (Inside @code{pageParagraph})
848 @subsection The @code{text} Element (Inside @code{pageParagraph})
851 text[pageParagraph_text] :type=(title | text) => TEXT
854 This @code{text} element is nested inside a @code{pageParagraph}. There
855 is a different @code{text} element that is nested inside a
858 The element is either empty, or contains CDATA that holds almost-XHTML
859 text: in the corpus, either an @code{html} or @code{p} element. It is
860 @emph{almost}-XHTML because the @code{html} element designates the
862 @indicateurl{http://xml.spss.com/spss/viewer/viewer-tree} instead of
863 an XHTML namespace, and because the CDATA can contain substitution
864 variables. The following variables are supported:
869 The current date or time in the preferred format for the locale.
875 First-, second-, third-, or fourth-level heading.
881 Name of the output file.
887 @code{&[Page]} for the page number and @code{&[PageTitle]} for the
890 Typical contents (indented for clarity):
893 <html xmlns="http://xml.spss.com/spss/viewer/viewer-tree">
896 <p style="text-align:right; margin-top: 0">Page &[Page]</p>
901 This element has the following attributes.
903 @defvr {Attribute} @code{type}
907 @node SPV Light Detail Member Format
908 @section Light Detail Member Format
910 This section describes the format of ``light'' detail @file{.bin}
911 members. These members have a binary format which we describe here in
912 terms of a context-free grammar using the following conventions:
915 @item NonTerminal @result{} @dots{}
916 Nonterminals have CamelCaps names, and @result{} indicates a
917 production. The right-hand side of a production is often broken
918 across multiple lines. Break points are chosen for aesthetics only
919 and have no semantic significance.
921 @item 00, 01, @dots{}, ff
922 A byte with a fixed value, written as a pair of hexadecimal digits.
924 @item i0, i1, @dots{}, i9, i10, i11, @dots{}
925 @itemx ib0, ib1, @dots{}, ib9, ib10, ib11, @dots{}
926 A 32-bit integer in little-endian or big-endian byte order,
927 respectively, with a fixed value, written in decimal. Prefixed by
928 @samp{i} for little-endian or @samp{ib} for big-endian.
934 A byte with value 0 or 1.
938 A 16-bit unsigned integer in little-endian or big-endian byte order,
943 A 32-bit unsigned integer in little-endian or big-endian byte order,
948 A 64-bit unsigned integer in little-endian or big-endian byte order,
952 A 64-bit IEEE floating-point number.
955 A 32-bit IEEE floating-point number.
959 A 32-bit unsigned integer, in little-endian or big-endian byte order,
960 respectively, followed by the specified number of bytes of character
961 data. (The encoding is indicated by the Formats nonterminal.)
964 @var{x} is optional, e.g.@: 00? is an optional zero byte.
966 @item @var{x}*@var{n}
967 @var{x} is repeated @var{n} times, e.g.@: byte*10 for ten arbitrary bytes.
969 @item @var{x}[@var{name}]
970 Gives @var{x} the specified @var{name}. Names are used in textual
971 explanations. They are also used, again in brackets, to indicate counts,
972 e.g.@: @code{int32[n] byte*[n]} for a 32-bit integer followed by the
973 specified number of arbitrary bytes.
975 @item @var{a} @math{|} @var{b}
976 Either @var{a} or @var{b}.
979 Parentheses are used for grouping to make precedence clear, especially
980 in the presence of @math{|}, e.g.@: in 00 (01 @math{|} 02 @math{|} 03)
984 @itemx becount(@var{x})
985 A 32-bit unsigned integer, in little-endian or big-endian byte order,
986 respectively, that indicates the number of bytes in @var{x}, followed
990 In a version 1 @file{.bin} member, @var{x}; in version 3, nothing.
991 (The @file{.bin} header indicates the version.)
994 In a version 3 @file{.bin} member, @var{x}; in version 1, nothing.
997 PSPP uses this grammar to parse light detail members. See
998 @file{src/output/spv/light-binary.grammar} in the PSPP source tree for
1001 Little-endian byte order is far more common in this format, but a few
1002 pieces of the format use big-endian byte order.
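The primitive types above map directly onto Python's @code{struct} module. The following reader is a sketch, and its class and method names are invented for illustration. It makes two assumptions: strings are UTF-8, although the real encoding comes from the Formats nonterminal, and @code{double} and @code{float} are little-endian. It implements @code{count(@var{x})} by returning a sub-reader limited to the counted bytes.

```python
import struct


class LightReader:
    """Cursor over the bytes of a light detail member (hypothetical API)."""

    def __init__(self, data, start=0, end=None):
        self.data = data
        self.pos = start
        self.end = len(data) if end is None else end

    def take(self, size):
        """Return the next size raw bytes, refusing to read past end."""
        if self.pos + size > self.end:
            raise ValueError("read past end of data")
        chunk = self.data[self.pos:self.pos + size]
        self.pos += size
        return chunk

    def _unpack(self, fmt):
        return struct.unpack(fmt, self.take(struct.calcsize(fmt)))[0]

    # Fixed-size primitives: "<" is little-endian, ">" is big-endian.
    def byte(self): return self._unpack("<B")
    def int16(self): return self._unpack("<H")
    def int32(self): return self._unpack("<I")
    def int64(self): return self._unpack("<Q")
    def be16(self): return self._unpack(">H")
    def be32(self): return self._unpack(">I")
    def be64(self): return self._unpack(">Q")
    def double(self): return self._unpack("<d")
    def float32(self): return self._unpack("<f")

    def boolean(self):
        value = self.byte()
        if value not in (0, 1):
            raise ValueError("bool must be 0 or 1")
        return value == 1

    def string(self, encoding="utf-8"):
        """int32 byte count, then that many bytes of character data."""
        return self.take(self.int32()).decode(encoding)

    def bestring(self, encoding="utf-8"):
        return self.take(self.be32()).decode(encoding)

    def _sub(self, size):
        # Hand back a reader over the next size bytes, then skip them here.
        sub = LightReader(self.data, self.pos, self.pos + size)
        self.take(size)
        return sub

    def count(self):
        return self._sub(self.int32())

    def becount(self):
        return self._sub(self.be32())


data = (struct.pack("<BBHI", 7, 1, 513, 70000) + struct.pack(">I", 5)
        + struct.pack("<I", 3) + b"abc" + struct.pack("<I", 2) + b"\x01\x02")
reader = LightReader(data)
print(reader.byte(), reader.boolean(), reader.int16(), reader.int32(),
      reader.be32(), reader.string())
```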
1004 Light detail members express linear units in two ways: points (pt), at
1005 72/inch, and ``device-independent pixels'' (px), at 96/inch. To
1006 convert from pt to px, multiply by 1.33 and round up. To convert
1007 from px to pt, divide by 1.33 and round down.
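In Python, the conversions just described might look like the following sketch. The function names are hypothetical. The factor 1.33 is slightly less than the exact ratio 96/72, so the rounding directions matter: with them, 12pt becomes 16px, which converts back to 12pt.

```python
import math


def pt_to_px(points):
    """Points (72/inch) to device-independent pixels: multiply by 1.33,
    round up."""
    return math.ceil(points * 1.33)


def px_to_pt(pixels):
    """Device-independent pixels to points: divide by 1.33, round down."""
    return math.floor(pixels / 1.33)


print(pt_to_px(12), px_to_pt(16))  # 16 12
```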
1009 A ``light'' detail member @file{.bin} consists of a number of sections
1010 concatenated together, terminated by an optional byte 01:
1014 Header Titles Footnotes
1015 Areas Borders PrintSettings TableSettings Formats
1016 Dimensions Axes Cells
1020 The following sections go into more detail.
1023 * SPV Light Member Header::
1024 * SPV Light Member Titles::
1025 * SPV Light Member Footnotes::
1026 * SPV Light Member Areas::
1027 * SPV Light Member Borders::
1028 * SPV Light Member Print Settings::
1029 * SPV Light Member Table Settings::
1030 * SPV Light Member Formats::
1031 * SPV Light Member Dimensions::
1032 * SPV Light Member Categories::
1033 * SPV Light Member Axes::
1034 * SPV Light Member Cells::
1035 * SPV Light Member Value::
1036 * SPV Light Member ValueMod::
1039 @node SPV Light Member Header
1042 An SPV light member begins with a 39-byte header:
1047 (i1 @math{|} i3)[version]
1050 bool[rotate-inner-column-labels]
1051 bool[rotate-outer-row-labels]
1054 int32[min-col-width] int32[max-col-width]
1055 int32[min-row-width] int32[max-row-width]
1059 @code{version} is a version number that affects the interpretation of
1060 some of the other data in the member. We will refer to ``version 1''
1061 and ``version 3'' later on and use v1(@dots{}) and v3(@dots{}) for
1062 version-specific formatting (as described previously).
1064 If @code{rotate-inner-column-labels} is 1, then column labels closest
1065 to the data are rotated 90° counterclockwise; otherwise, they are
1066 shown in the normal way.
1068 If @code{rotate-outer-row-labels} is 1, then row labels farthest from
1069 the data are rotated 90° counterclockwise; otherwise, they are shown
1072 @code{min-col-width} is the minimum width that a column will be
1073 assigned automatically. @code{max-col-width} is the maximum width
1074 that a column will be assigned to accommodate a long column label.
1075 @code{min-row-width} and @code{max-row-width} are a similar range for
1076 the width of row labels. All of these measurements are in 1/96 inch
1077 units (called a ``device independent pixel'' unit in Windows).
1079 @code{table-id} is a binary version of the @code{tableId} attribute in
1080 the structure member that refers to the detail member. For example,
1081 if @code{tableId} is @code{-4122591256483201023}, then @code{table-id}
1082 would be 0xc6c99d183b300001.
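In the example above, @code{table-id} holds the same 64 bits as the signed decimal @code{tableId}, reinterpreted as an unsigned two's complement number. The following Python sketch converts in both directions. The function names are hypothetical, and the sketch handles only the numeric value, not how the header stores it.

```python
import struct


def table_id_from_attr(table_id_attr: str) -> int:
    """Signed decimal tableId attribute -> unsigned 64-bit table-id."""
    # Masking to 64 bits yields the two's complement bit pattern.
    return int(table_id_attr) & 0xFFFFFFFFFFFFFFFF


def table_id_to_attr(table_id: int) -> str:
    """Unsigned 64-bit table-id -> signed decimal tableId attribute."""
    # Reinterpret the same 8 bytes as a signed 64-bit integer.
    (signed,) = struct.unpack("<q", struct.pack("<Q", table_id))
    return str(signed)
```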
1084 The meaning of the other variable parts of the header is not known. A
1085 writer may safely use version 3, true for @code{x0}, false for
1086 @code{x1}, true for @code{x2}, and 0x15 for @code{x3}.
1088 @node SPV Light Member Titles
1094 Value[subtype] 01? 31
1095 Value[user-title] 01?
1096 (31 Value[corner-text] @math{|} 58)
1097 (31 Value[caption] @math{|} 58)
1100 The Titles follow the Header and specify the table's title, caption,
1103 The @code{user-title} reflects any user
1104 editing of the title text or style. The @code{title} is the title
1105 originally generated by the procedure. Both of these are appropriate
1106 for presentation and localized to the user's language. For example,
1107 for a frequency table, @code{title} and @code{user-title} normally
1108 name the variable and @code{c} is simply ``Frequencies''.
1110 @code{subtype} is the same as the @code{subType} attribute in the
1111 @code{table} structure XML element that referred to this member.
1112 @xref{SPV Structure table Element}, for details.
1114 The @code{corner-text}, if present, is shown in the upper-left corner
1115 of the table, above the row headings and to the left of the column
1116 headings. It is usually absent. When row dimension labels are
1117 displayed in the corner (see @code{show-row-labels-in-corner}), corner
1120 The @code{caption}, if present, is shown below the table.
1121 @code{caption} reflects user editing of the caption.
1123 @node SPV Light Member Footnotes
1124 @subsection Footnotes
1127 Footnotes => int32[n-footnotes] Footnote*[n-footnotes]
1128 Footnote => Value[text] (58 @math{|} 31 Value[marker]) int32[show]
1131 Each footnote has @code{text} and an optional custom @code{marker}
1134 The syntax for Value would allow footnotes (and their markers) to
1135 reference other footnotes, but in practice this doesn't work.
1137 @code{show} is a 32-bit signed integer. It is positive to show the
1138 footnote or negative to hide it. Its magnitude is often 1, and in
1139 other cases tends to be the number of references to the footnote.
1140 It is safe to write 1 to show a footnote and -1 to hide it.
1142 @node SPV Light Member Areas
1149 string[typeface] float[size] int32[style] bool[underline]
1150 int32[halign] int32[valign]
1151 string[fg-color] string[bg-color]
1152 bool[alternate] string[alt-fg-color] string[alt-bg-color]
1153 v3(int32[left-margin] int32[right-margin] int32[top-margin] int32[bottom-margin])
1156 Each Area represents the style for a different area of the table, in
1157 the following order: title, caption, footer, corner, column labels,
1158 row labels, data, and layers.
1160 @code{index} is the 1-based index of the Area, i.e.@: 1 for the first
1161 Area, through 8 for the final Area.
1163 @code{typeface} is the string name of the font used in the area. In
1164 the corpus, this is @code{SansSerif} in over 99% of instances and
1165 @code{Times New Roman} in the rest.
1167 @code{size} is the size of the font, in px (@pxref{SPV Light Detail
1168 Member Format}). The most common size in the corpus is 12 px. Even
1169 though @code{size} has a floating-point type, in the corpus its values
1170 are always integers.
1172 @code{style} is a bit mask. Bit 0 (with value 1) is set for bold, bit
1173 1 (with value 2) is set for italic.
1175 @code{underline} is 1 if the font is underlined, 0 otherwise.
1177 @code{halign} specifies horizontal alignment: 0 for center, 2 for
1178 left, 4 for right, 61453 for decimal, 64173 for mixed. Mixed
1179 alignment varies according to type: string data is left-justified, and
1180 numbers and most other formats are right-justified.
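The Area @code{style} bits and @code{halign} codes described above can be decoded as in this Python sketch. The names are hypothetical. Treating every non-string value as right-justified under mixed alignment simplifies ``most other formats''.

```python
# Hypothetical decoding helpers for Area fields; the codes come from
# the description above.
HALIGN_NAMES = (
    (0, "center"),
    (2, "left"),
    (4, "right"),
    (61453, "decimal"),
    (64173, "mixed"),
)


def decode_style(style: int):
    """Return (bold, italic) from the Area style bit mask."""
    return bool(style & 1), bool(style & 2)


def decode_halign(halign: int) -> str:
    for code, name in HALIGN_NAMES:
        if code == halign:
            return name
    raise ValueError("unknown halign %d" % halign)


def resolve_mixed(is_string: bool) -> str:
    # Simplification: strings go left, everything else goes right.
    return "left" if is_string else "right"
```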
1182 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
1185 @code{fg-color} and @code{bg-color} are the foreground color and
1186 background color, respectively. In the corpus, these are always
1187 @code{#000000} and @code{#ffffff}, respectively.
1189 @code{alternate} is 1 if rows should alternate colors, 0 if all rows
1190 should be the same color. When @code{alternate} is 1,
1191 @code{alt-fg-color} and @code{alt-bg-color} specify the colors for the
1192 alternate rows; otherwise they are empty strings.
1194 @code{left-margin}, @code{right-margin}, @code{top-margin}, and
1195 @code{bottom-margin} are measured in px.
1197 @node SPV Light Member Borders
1204 be32[n-borders] Border*[n-borders]
1205 bool[show-grid-lines]
1214 The Borders reflect how borders between regions are drawn.
1216 The fixed value of @code{endian} can be used to validate the
1219 @code{show-grid-lines} is 1 to draw grid lines, otherwise 0.
1221 Each Border describes one kind of border. @code{n-borders} appears to
1222 always be 19. Each @code{border-type} appears exactly once (although in an
1223 unpredictable order) and corresponds to one of the following borders:
1229 Left, top, right, and bottom outer frame.
1231 Left, top, right, and bottom inner frame.
1233 Left and top of data area.
1235 Horizontal and vertical dimension rows.
1237 Horizontal and vertical dimension columns.
1239 Horizontal and vertical category rows.
1241 Horizontal and vertical category columns.
1244 @code{stroke-type} describes how a border is drawn, as one of:
1261 @code{color} is a 32-bit ARGB color. Bits 24--31 are alpha, bits 16--23 are
1262 red, bits 8--15 are green, and bits 0--7 are blue. An alpha of 255 indicates an
1263 opaque color, so opaque black is 0xff000000.
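To unpack a Border @code{color}, shift and mask each 8-bit field. The following Python sketch, with hypothetical function names, also renders the RGB part in the @code{#rrggbb} notation that other parts of a light member use for colors. Dropping alpha in that rendering is a simplification.

```python
def split_argb(color: int):
    """Split a Border color into (alpha, red, green, blue)."""
    return ((color >> 24) & 0xFF,
            (color >> 16) & 0xFF,
            (color >> 8) & 0xFF,
            color & 0xFF)


def argb_to_hex(color: int) -> str:
    # Discard alpha and format the rest as #rrggbb.
    _, r, g, b = split_argb(color)
    return "#%02x%02x%02x" % (r, g, b)
```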
1265 @node SPV Light Member Print Settings
1266 @subsection Print Settings
1273 bool[paginate-layers]
1276 bool[top-continuation]
1277 bool[bottom-continuation]
1278 be32[n-orphan-lines]
1279 bestring[continuation-string])
1282 The PrintSettings reflect settings for printing. The fixed value of
1283 @code{endian} can be used to validate the endianness.
1285 @code{all-layers} is 1 to print all layers, 0 to print only the layer
1286 designated by @code{current-layer} in TableSettings (@pxref{SPV Light
1287 Member Table Settings}).
1289 @code{paginate-layers} is 1 to print each layer at the start of a new
1290 page, 0 otherwise. (This setting is honored only if @code{all-layers} is
1291 1, since otherwise only one layer is printed.)
1293 @code{fit-width} and @code{fit-length} control whether the table is
1294 shrunk to fit within a page's width or length, respectively.
1296 @code{n-orphan-lines} is the minimum number of rows or columns to put
1297 in one part of a table that is broken across pages.
1299 If @code{top-continuation} is 1, then @code{continuation-string} is
1300 printed at the top of a page when a table is broken across pages for
1301 printing; similarly for @code{bottom-continuation} and the bottom of a
1302 page. Usually, @code{continuation-string} is empty.
1304 @node SPV Light Member Table Settings
1305 @subsection Table Settings
1315 bool[show-row-labels-in-corner]
1316 bool[show-alphabetic-markers]
1317 bool[footnote-marker-superscripts]
1320 Breakpoints[row-breaks] Breakpoints[column-breaks]
1321 Keeps[row-keeps] Keeps[column-keeps]
1322 PointKeeps[row-point-keeps] PointKeeps[column-point-keeps]
1325 bestring[table-look]
1328 Breakpoints => be32[n-breaks] be32*[n-breaks]
1330 Keeps => be32[n-keeps] Keep*[n-keeps]
1331 Keep => be32[offset] be32[n]
1333 PointKeeps => be32[n-point-keeps] PointKeep*[n-point-keeps]
1334 PointKeep => be32[offset] be32 be32
1337 The TableSettings reflect display settings. The fixed value of
1338 @code{endian} can be used to validate the endianness.
1340 @code{current-layer} is the displayed layer. Suppose there are
1341 @math{d} layers, numbered 1 through @math{d} in the order given in the
1342 Dimensions (@pxref{SPV Light Member Dimensions}), and that the
1343 displayed value of dimension @math{i} is @math{x_i}, @math{0 \le x_i <
1344 n_i}, where @math{n_i} is the number of categories in dimension
1345 @math{i}. Then @code{current-layer} is calculated by the following
1349 let @code{current-layer} = 0
1350 for each @math{i} from @math{d} downto 1:
1351 @code{current-layer} = (@math{n_i \times} @code{current-layer}) @math{+} @math{x_i}
1354 If @code{omit-empty} is 1, empty rows or columns (ones with nothing in
1355 any cell) are hidden; otherwise, they are shown.
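The @code{current-layer} algorithm given earlier in this subsection produces a mixed-radix number in which the first layer dimension varies fastest. The following Python sketch uses 0-based lists in place of the 1-based subscripts. @code{layer_coordinates} is an inverse derived from the formula, not a documented part of the format. Both names are hypothetical.

```python
def current_layer(x, n):
    """x[i] and n[i] stand for x_(i+1) and n_(i+1) in the text."""
    layer = 0
    for i in reversed(range(len(n))):   # i from d downto 1
        layer = n[i] * layer + x[i]
    return layer


def layer_coordinates(layer, n):
    """Recover each layer dimension's displayed category."""
    x = []
    for ni in n:                        # i from 1 to d
        x.append(layer % ni)
        layer //= ni
    return x
```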
1357 If @code{show-row-labels-in-corner} is 1, then row labels are shown in
1358 the upper left corner; otherwise, they are shown nested.
1360 If @code{show-alphabetic-markers} is 1, markers are shown as letters
1361 (e.g.@: @samp{a}, @samp{b}, @samp{c}, @dots{}); otherwise, they are
1362 shown as numbers starting from 1.
1364 When @code{footnote-marker-superscripts} is 1, footnote markers are shown
1365 as superscripts, otherwise as subscripts.
1367 The Breakpoints are rows or columns after which there is a page break;
1368 for example, a row break of 1 requests a page break after the second
1369 row. Usually no breakpoints are specified, indicating that page
1370 breaks should be selected automatically.
1372 The Keeps are ranges of rows or columns to be kept together without a
1373 page break; for example, a row Keep with @code{offset} 1 and @code{n}
1374 10 requests that the 10 rows starting with the second row be kept
1375 together. Usually no Keeps are specified.
1377 The PointKeeps seem to be generated automatically based on
1378 user-specified Keeps. They seem to indicate a conversion from rows
1379 or columns to pixel or point offsets.
1381 @code{notes} is a text string that contains user-specified notes. It
1382 is displayed when the user hovers the cursor over the table, like text
1383 in the @code{title} attribute in HTML@. It is not printed. It is
1386 @code{table-look} is the name of an SPSS ``TableLook'' table style,
1387 such as ``Default'' or ``Academic''; it is often empty.
1389 TableSettings ends with an arbitrary number of null bytes. A writer
1390 may safely write 82 null bytes.
1392 A writer may safely use 4 for @code{x5} and 0 for @code{x6}.
1394 @node SPV Light Member Formats
1399 int32[n-widths] int32*[n-widths]
1401 int32[current-layer]
1402 bool[x7] bool[x8] bool[x9]
1407 v3(count(X1 count(X2)) count(X3)))
1408 Y0 => int32[epoch] byte[decimal] byte[grouping]
1409 CustomCurrency => int32[n-ccs] string*[n-ccs]
1412 If @code{n-widths} is nonzero, then the accompanying integers are
1413 column widths as manually adjusted by the user.
1415 @code{locale} is a locale including an encoding, such as
1416 @code{en_US.windows-1252} or @code{it_IT.windows-1252}.
1417 (@code{locale} is often duplicated in Y1, described below.)
1419 @code{epoch} is the year that starts the epoch. A 2-digit year is
1420 interpreted as belonging to the 100 years beginning at the epoch. The
1421 default epoch year is 69 years prior to the current year; thus, in
1422 2017 this field by default contains 1948. In the corpus, @code{epoch}
1423 ranges from 1943 to 1948, and in some files it is -1.
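The following Python sketch implements the epoch rule with hypothetical function names. It does not try to interpret an @code{epoch} of -1, whose meaning is not known.

```python
import datetime


def default_epoch(year=None):
    """Default epoch: 69 years before the given (or current) year."""
    if year is None:
        year = datetime.date.today().year
    return year - 69


def expand_two_digit_year(yy: int, epoch: int) -> int:
    """Map a 2-digit year into the 100 years epoch .. epoch + 99."""
    century = epoch - epoch % 100
    year = century + yy
    if year < epoch:
        year += 100
    return year
```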
1425 @code{decimal} is the decimal point character. The observed values
1426 are @samp{.} and @samp{,}.
1428 @code{grouping} is the grouping character. Usually, it is @samp{,} if
1429 @code{decimal} is @samp{.}, and vice versa. Other observed values are
1430 @samp{'} (apostrophe), @samp{ } (space), and zero (presumably
1431 indicating that digits should not be grouped).
1433 @code{n-ccs} is observed as either 0 or 5. When it is 5, the
1434 following strings are CCA through CCE format strings. @xref{Custom
1435 Currency Formats,,, pspp, PSPP}. Most commonly these are all
1436 @code{-,,,} but other strings occur.
1438 A writer may safely use false for @code{x7}, @code{x8}, and @code{x9}.
1442 X0 only appears, optionally, in version 1 members.
1447 string[command] string[command-local]
1448 string[language] string[charset] string[locale]
1451 Y2 => CustomCurrency byte[missing] bool[x17]
1454 @code{command} describes the statistical procedure that generated the
1455 output, in English. It is not necessarily the literal syntax name of
1456 the procedure: for example, NPAR TESTS becomes ``Nonparametric
1457 Tests.'' @code{command-local} is the procedure's name, translated
1458 into the output language; it is often empty and, when it is not,
1459 sometimes the same as @code{command}.
1461 @code{missing} is the character used to indicate that a cell contains
1462 a missing value. It is always observed as @samp{.}.
1464 A writer may safely use false for @code{x17}.
1468 X1 only appears in version 3 members.
1476 byte[show-variables]
1478 int32[x18] int32[x19]
1484 @code{lang} may indicate the language in use. Some values seem to be
1485 0: @t{en}, 1: @t{de}, 2: @t{es}, 3: @t{it}, 5: @t{ko}, 6: @t{pl}, 8:
1486 @t{zh-tw}, 10: @t{pt_BR}, 11: @t{fr}.
1488 @code{show-variables} determines how variables are displayed by
1489 default. A value of 1 means to display variable names, 2 to display
1490 variable labels when available, 3 to display both (name followed by
1491 label, separated by a space). The most common value is 0, which
1492 probably means to use a global default.
1494 @code{show-values} is a similar setting for values. A value of 1
1495 means to display the value, 2 to display the value label when
1496 available, 3 to display both. Again, the most common value is 0,
1497 which probably means to use a global default.
1499 @code{show-title} is 1 to show the title, 10 to hide it.
1501 @code{show-caption} is true to show the caption, false to hide it.
1503 A writer may safely use false for @code{x14}, false for @code{x16}, 0
1504 for @code{lang}, -1 for @code{x18} and @code{x19}, and false for
1509 X2 only appears in version 3 members.
1513 int32[n-row-heights] int32*[n-row-heights]
1514 int32[n-style-map] StyleMap*[n-style-map]
1515 int32[n-styles] StylePair*[n-styles]
1517 StyleMap => int64[cell-index] int16[style-index]
1520 If present, @code{n-row-heights} and the accompanying integers are row
1521 heights as manually adjusted by the user.
1523 The rest of X2 specifies styles for data cells. At first glance this
1524 is odd, because each data cell can have its own style embedded as part
1525 of the data, but in practice X2 specifies a style for a cell only if
1526 that cell is empty (and thus does not appear in the data at all).
1527 Each StyleMap specifies the index of a blank cell, calculated the same
1528 way as in the Cells (@pxref{SPV Light Member Cells}), along with a
1529 0-based index into the accompanying StylePair array.
1531 A writer may safely omit the optional @code{i0 i0} inside the
1532 @code{count(@dots{})}.
1536 X3 only appears in version 3 members.
1540 01 00 byte[x21] 00 00 00
1543 (string[dataset] string[datafile] i0 int32[date] i0)?
1548 @code{small} is a small real number. In the corpus, it overwhelmingly
1549 takes the value 0.0001, with zero occasionally seen. Nonzero numbers
1550 with format 40 (@pxref{SPV Light Member Value}) whose magnitudes are
1551 smaller than @code{small} are displayed in scientific notation. (Thus, a @code{small}
1552 of zero prevents scientific notation from being chosen.)
1554 @code{dataset} is the name of the dataset analyzed to produce the
1555 output, e.g.@: @code{DataSet1}, and @code{datafile} the name of the
1556 file it was read from, e.g.@: @file{C:\Users\foo\bar.sav}. The latter
1557 is sometimes the empty string.
1559 @code{date} is a date, as seconds since the epoch, i.e.@: since
1560 January 1, 1970. Pivot tables within an SPV file often have dates a
1561 few minutes apart, so this is probably a creation date for the table
1562 rather than for the file.
1564 Sometimes @code{dataset}, @code{datafile}, and @code{date} are present
1565 and other times they are absent. The reader can distinguish by
1566 assuming that they are present and then checking whether the
1567 presumptive @code{dataset} contains a null byte (a valid string never
1570 @code{x22} is usually 0 or 2000000.
1572 A writer may safely use 4 for @code{x21} and omit @code{x22} and the
1573 other optional bytes at the end.
1575 @subsubheading Encoding
1577 Formats contains several indications of character encoding:
1581 @code{locale} in Formats itself.
1584 @code{locale} in Y1 (in version 1, Y1 is optionally nested inside X0;
1585 in version 3, Y1 is nested inside X3).
1588 @code{charset} in version 3, in Y1.
1591 @code{lang} in X1, in version 3.
1594 @code{charset}, if present, is a good indication of character
1595 encoding, and in its absence the encoding suffix on @code{locale} in
1598 @code{locale} in Y1 can be disregarded: it is normally the same as
1599 @code{locale} in Formats, and it is only present if @code{charset} is
1602 @code{lang} is not helpful and should be ignored for character
1605 However, the corpus contains many examples of light members whose
1606 strings are encoded in UTF-8 despite declaring some other character
1607 set. Furthermore, the corpus contains several examples of light
1608 members in which some strings are encoded in UTF-8 (and contain
1609 multibyte characters) and other strings are encoded in another
1610 character set (and contain non-ASCII characters). PSPP treats any
1611 valid UTF-8 string as UTF-8 and only falls back to the declared
1612 encoding for strings that are not valid UTF-8.
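The following Python sketch shows the decoding strategy described above. The function names are hypothetical, and falling back to @code{windows-1252} when nothing is declared is an assumption. Python's codecs happen to accept the name @code{windows-1252}, but that may not hold for every name found in SPV files.

```python
def declared_encoding(charset: str, locale: str) -> str:
    # Prefer charset (version 3, in Y1); otherwise use the suffix of
    # locale in Formats, e.g. "en_US.windows-1252" -> "windows-1252".
    if charset:
        return charset
    if "." in locale:
        return locale.split(".", 1)[1]
    return "windows-1252"  # assumption: nothing declared


def decode_light_string(raw: bytes, declared: str) -> str:
    # Valid UTF-8 wins, regardless of what the member declares.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: replace undecodable bytes rather than failing.
        return raw.decode(declared, errors="replace")
```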
1614 The @command{pspp-output} program's @command{strings} command can help
1615 analyze the encoding in an SPV light member. Use @code{pspp-output
1616 --help-dev} to see its usage.
1618 @node SPV Light Member Dimensions
1619 @subsection Dimensions
1621 A pivot table presents multidimensional data. A Dimension identifies
1622 the categories associated with each dimension.
1625 Dimensions => int32[n-dims] Dimension*[n-dims]
1627 Value[name] DimProperties
1628 int32[n-categories] Category*[n-categories]
1633 bool[hide-dim-label]
1634 bool[hide-all-labels]
1638 @code{name} is the name of the dimension, e.g.@: @code{Variables},
1639 @code{Statistics}, or a variable name.
1641 The meanings of @code{x1} and @code{x3} are unknown. @code{x1} is
1642 usually 0 but many other values have been observed. A writer may
1643 safely use 0 for @code{x1} and 2 for @code{x3}.
1645 @code{x2} is 0, 1, or 2. For a pivot table with @var{L} layer
1646 dimensions, @var{R} row dimensions, and @var{C} column dimensions,
1647 @code{x2} is 2 for the first @var{L} dimensions, 0 for the next
1648 @var{R} dimensions, and 1 for the remaining @var{C} dimensions. This
1649 does not mean that the layer dimensions must be presented first,
1650 followed by the row dimensions, followed by the column dimensions---on
1651 the contrary, they are frequently in a different order---but @code{x2}
1652 must follow this pattern to prevent the pivot table from being
1655 If @code{hide-dim-label} is 00, the pivot table displays a label for
1656 the dimension itself. Because usually the group and category labels
1657 are enough explanation, it is usually 01.
1659 If @code{hide-all-labels} is 01, the pivot table omits all labels for
1660 the dimension, including group and category labels. It is usually 00.
1661 When @code{hide-all-labels} is 01, @code{hide-dim-label} is ignored.
1663 @code{dim-index} is usually the 0-based index of the dimension, e.g.@:
1664 0 for the first dimension, 1 for the second, and so on. Sometimes it
1665 is -1. There is no visible difference. A writer may safely use the
1668 @node SPV Light Member Categories
1669 @subsection Categories
1671 Categories are arranged in a tree. Only the leaf nodes in the tree
1672 are really categories; the others just serve as grouping constructs.
1675 Category => Value[name] (Leaf @math{|} Group)
1676 Leaf => 00 00 00 i2 int32[leaf-index] i0
1678 bool[merge] 00 01 int32[x23]
1679 i-1 int32[n-subcategories] Category*[n-subcategories]
1682 @code{name} is the name of the category (or group).
1684 A Leaf represents a leaf category. The Leaf's @code{leaf-index} is a
1685 nonnegative integer unique within the Dimension and less than
1686 @code{n-categories} in the Dimension. If the user does not sort or
1687 rearrange the categories, then @code{leaf-index} starts at 0 for the
1688 first Leaf in the dimension and increments by 1 with each successive
1689 Leaf. If the user does sort or rearrange the categories, then the
1690 order of categories in the file reflects that change and
1691 @code{leaf-index} reflects the original order.
1693 A dimension can have no leaf categories at all. A table that
1694 contains such a dimension necessarily has no data at all.
1696 A Group is a group of nested categories. Usually a Group contains at
1697 least one Category, so that @code{n-subcategories} is positive, but
1698 Groups with zero subcategories have been observed.
1700 If a Group's @code{merge} is 00, the most common value, then the group
1701 is really a distinct group that should be represented as such in the
1702 visual representation and user interface. If @code{merge} is 01, the
1703 categories in this group should be shown and treated as if they were
1704 direct children of the group's containing group (or if it has no
1705 parent group, then direct children of the dimension), and this group's
1706 name is irrelevant and should not be displayed. (Merged groups can be
1709 Writers need not use merged groups.
1711 A Group's @code{x23} appears to be i2 when all of the categories
1712 within a group are leaf categories that directly represent data values
1713 for a variable (e.g.@: in a frequency table or crosstabulation, a group
1714 of values in a variable being tabulated) and i0 otherwise. A writer
1715 may safely write a constant 0 in this field.
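The following Python sketch walks the category tree using a hypothetical in-memory @code{Category} class. @code{leaf_indexes} lists the @code{leaf-index} values in file order. @code{visible_children} flattens merged groups into their parent, as described above.

```python
class Category:
    """Hypothetical in-memory form of a Category (Leaf or Group)."""

    def __init__(self, name, leaf_index=None, merge=False,
                 subcategories=()):
        self.name = name
        self.leaf_index = leaf_index      # not None for a Leaf
        self.merge = merge                # meaningful for Groups only
        self.subcategories = list(subcategories)


def leaf_indexes(categories):
    """Return leaf-index values in the order they appear in the file."""
    result = []
    for cat in categories:
        if cat.leaf_index is not None:
            result.append(cat.leaf_index)
        else:
            result.extend(leaf_indexes(cat.subcategories))
    return result


def visible_children(categories):
    """Replace each merged Group by its children, recursively."""
    result = []
    for cat in categories:
        if cat.leaf_index is None and cat.merge:
            result.extend(visible_children(cat.subcategories))
        else:
            result.append(cat)
    return result
```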
1717 @node SPV Light Member Axes
1720 After the dimensions comes the assignment of each dimension to one of the
1721 axes: layers, rows, and columns.
1725 int32[n-layers] int32[n-rows] int32[n-columns]
1726 int32*[n-layers] int32*[n-rows] int32*[n-columns]
1729 The values of @code{n-layers}, @code{n-rows}, and @code{n-columns}
1730 each specify the number of dimensions displayed in layers, rows, and
1731 columns, respectively. Any of them may be zero. Their values sum to
1732 @code{n-dims} from Dimensions (@pxref{SPV Light Member
1735 The following @code{n-dims} integers, in three groups, are a
1736 permutation of the 0-based dimension numbers. The first
1737 @code{n-layers} integers specify each of the dimensions represented by
1738 layers, the next @code{n-rows} integers specify the dimensions
1739 represented by rows, and the final @code{n-columns} integers specify
1740 the dimensions represented by columns. When there is more than one
1741 dimension of a given kind, the inner dimensions are given first. (For
1742 the layer axis, this means that the first dimension is at the bottom
1743 of the list and the last dimension is at the top when the current
1744 layer is displayed.)
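Before splitting the Axes into their three groups, a reader may want to check that they really form a permutation. The following Python sketch does both, using a hypothetical @code{split_axes} function.

```python
def split_axes(n_layers, n_rows, n_columns, dims, n_dims):
    """Split the flat list of dimension numbers into three axes.

    Raises ValueError unless the counts sum to n_dims and dims is a
    permutation of 0 .. n_dims - 1.
    """
    if n_layers + n_rows + n_columns != n_dims or len(dims) != n_dims:
        raise ValueError("axis counts do not sum to n-dims")
    if sorted(dims) != list(range(n_dims)):
        raise ValueError("dimension numbers are not a permutation")
    layers = list(dims[:n_layers])
    rows = list(dims[n_layers:n_layers + n_rows])
    columns = list(dims[n_layers + n_rows:])
    return layers, rows, columns
```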
1746 @node SPV Light Member Cells
1749 The final part of an SPV light member contains the actual data.
1752 Cells => int32[n-cells] Cell*[n-cells]
1753 Cell => int64[index] v1(00?) Value
1756 A Cell consists of an @code{index} and a Value. Suppose there are
1757 @math{d} dimensions, numbered 1 through @math{d} in the order given in
1758 the Dimensions described previously, and that dimension @math{i} has @math{n_i}
1759 categories. Consider the cell at coordinates @math{x_i}, @math{1 \le
1760 i \le d}, and note that @math{0 \le x_i < n_i}. Then the index is
1761 calculated by the following algorithm:
1765 for each @math{i} from 1 to @math{d}:
1766 @i{index} = (@math{n_i \times} @i{index}) @math{+} @math{x_i}
1769 For example, suppose there are 3 dimensions with 3, 4, and 5
1770 categories, respectively. The cell at coordinates (1, 2, 3) has
1771 index @math{5 \times (4 \times (3 \times 0 + 1) + 2) + 3 = 33}.
1772 Within a given dimension, the index is the @code{leaf-index} in a Leaf.
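The following Python sketch implements the index calculation above and its inverse. The function names are hypothetical. Coordinates are 0-based @code{leaf-index} values listed in Dimensions order.

```python
def cell_index(x, n):
    """Cell index from coordinates x and category counts n."""
    index = 0
    for xi, ni in zip(x, n):            # i from 1 to d
        if not 0 <= xi < ni:
            raise ValueError("coordinate out of range")
        index = ni * index + xi
    return index


def cell_coordinates(index, n):
    """Invert cell_index: the last dimension varies fastest."""
    x = []
    for ni in reversed(n):
        x.append(index % ni)
        index //= ni
    return x[::-1]
```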
1774 @node SPV Light Member Value
1777 Value is used throughout the SPV light member format. It boils down
1778 to a number or a string.
1781 Value => 00? 00? 00? 00? RawValue
1783 01 ValueMod int32[format] double[x]
1784 @math{|} 02 ValueMod int32[format] double[x]
1785 string[var-name] string[value-label] byte[show]
1786 @math{|} 03 string[local] ValueMod string[id] string[c] bool[fixed]
1787 @math{|} 04 ValueMod int32[format] string[value-label] string[var-name]
1788 byte[show] string[s]
1789 @math{|} 05 ValueMod string[var-name] string[var-label] byte[show]
1790 @math{|} 06 string[local] ValueMod string[id] string[c]
1791 @math{|} ValueMod string[template] int32[n-args] Argument*[n-args]
1794 @math{|} int32[x] i0 Value*[x] /* x > 0 */
1797 There are several possible encodings, which one can distinguish by the
1798 first nonzero byte in the encoding.
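The following Python sketch shows this dispatch. A hypothetical @code{classify_value} skips the up to four optional 00 bytes and inspects the next byte. It only identifies the encoding; it does not parse the rest of the Value.

```python
# Hypothetical names for RawValue types 01 through 06, indexed by byte.
RAW_VALUE_KINDS = (
    None,
    "number",              # 01
    "number of variable",  # 02
    "text",                # 03
    "string",              # 04
    "variable",            # 05
    "fixed text",          # 06
)


def classify_value(data: bytes, pos: int = 0):
    """Return (kind, offset of the distinguishing byte)."""
    # Skip up to four optional 00 bytes at the start of a Value.
    skipped = 0
    while skipped < 4 and pos < len(data) and data[pos] == 0:
        pos += 1
        skipped += 1
    first = data[pos]
    if 1 <= first <= 6:
        return RAW_VALUE_KINDS[first], pos
    if first in (0x31, 0x58):
        # A ValueMod begins with 31 or 58, so this is a template.
        return "template", pos
    raise ValueError("unexpected byte %02x at offset %d" % (first, pos))
```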
1802 The numeric value @code{x}, intended to be presented to the user
1803 formatted according to @code{format}, which is about the same as the
1804 format described for system files (@pxref{System File Output
1805 Formats}). The exception is that format 40 is not MTIME but instead
1806 approximately a synonym for F format with a different rule for whether
1807 a value is shown in scientific notation: a value in format 40 is shown
1808 in scientific notation if and only if it is nonzero and its magnitude
1809 is less than @code{small} (@pxref{SPV Light Member Formats}).
1811 Most commonly, @code{format} has width 40 (the maximum).
1813 An @code{x} equal to the most negative finite double, @code{-DBL_MAX},
1814 represents the system-missing value SYSMIS. (HIGHEST and LOWEST have
1815 not been observed.) See @ref{System File Format}, for more about
1816 these special values.
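The following Python sketch shows how a reader might render a type 01 value. It makes three assumptions. First, @code{format} is packed as in system files, with the type in bits 16--23, the width in bits 8--15, and the decimals in bits 0--7. Second, every format is approximated as F apart from the format 40 rule. Third, SYSMIS is printed as @samp{.}, the @code{missing} character observed in Y2. The names are hypothetical.

```python
import sys

SYSMIS = -sys.float_info.max   # -DBL_MAX


def decode_format(fmt: int):
    """Split an int32 format into (type, width, decimals)."""
    return (fmt >> 16) & 0xFF, (fmt >> 8) & 0xFF, fmt & 0xFF


def format_number(x: float, fmt: int, small: float) -> str:
    fmt_type, _width, decimals = decode_format(fmt)
    if x == SYSMIS:
        return "."
    if fmt_type == 40 and x != 0 and abs(x) < small:
        # Format 40: scientific notation for tiny nonzero magnitudes.
        return ("%." + str(decimals) + "E") % x
    # Everything else is approximated here as F format.
    return ("%." + str(decimals) + "f") % x
```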
1819 Similar to @code{01}, with the additional information that @code{x} is
1820 a value of variable @code{var-name} and has value label
1821 @code{value-label}. Both @code{var-name} and @code{value-label} can
1822 be the empty string, the latter very commonly.
1824 @code{show} determines whether to show the numeric value or the value
1825 label. A value of 1 means to show the value, 2 to show the label, 3
1826 to show both, and 0 means to use the default specified in
1827 @code{show-values} (@pxref{SPV Light Member Formats}).
1830 A text string, in two forms: @code{c} is in English, and sometimes
1831 abbreviated or obscure, and @code{local} is localized to the user's
1832 locale. In an English-language locale, the two strings are often the
1833 same, and in the cases where they differ, @code{local} is more
1834 appropriate for a user interface, e.g.@: @code{c} of ``Not a PxP table
1835 for MCN...'' versus @code{local} of ``Computed only for a PxP table,
1836 where P must be greater than 1.''
1838 @code{c} and @code{local} are always either both empty or both
1841 @code{id} is a brief identifying string whose form seems to resemble a
1842 programming language identifier, e.g.@: @code{cumulative_percent} or
1843 @code{factor_14}. It is not unique.
1845 @code{fixed} is 00 for text taken from user input, such as syntax
1846 fragments, expressions, file names, or data set names, and 01 for fixed
1847 text strings such as names of procedures or statistics. In the former
1848 case, @code{id} is always the empty string; in the latter case,
1849 @code{id} is still sometimes empty.
1852 The string value @code{s}, intended to be presented to the user
1853 formatted according to @code{format}. The format for a string is not
1854 too interesting, and the corpus contains many clearly invalid formats
1855 like A16.39 or A255.127 or A134.1, so readers should probably entirely
1856 disregard the format. PSPP only checks @code{format} to distinguish
1859 @code{s} is a value of variable @code{var-name} and has value label
1860 @code{value-label}. @code{var-name} is never empty but
1861 @code{value-label} is commonly empty.
1863 @code{show} has the same meaning as in the encoding for 02.
1866 Variable @code{var-name} with variable label @code{var-label}. In the
1867 corpus, @code{var-name} is rarely empty and @code{var-label} is often
1870 @code{show} determines whether to show the variable name or the
1871 variable label. A value of 1 means to show the name, 2 to show the
1872 label, 3 to show both, and 0 means to use the default specified in
1873 @code{show-variables} (@pxref{SPV Light Member Formats}).
1876 Similar to type 03, with @code{fixed} assumed to be true.
1879 When the first byte of a RawValue is not one of the above, the
1880 RawValue starts with a ValueMod, whose syntax is described in the next
1881 section. (A ValueMod always begins with byte 31 or 58.)
1883 This case is a template string, analogous to @code{printf}, followed
1884 by one or more Arguments, each of which has one or more values. The
1885 template string is copied directly into the output except for the
1886 following special syntax:
1893 Each of these expands to the character following @samp{\\}, to escape
1894 characters that have special meaning in template strings. These are
1895 effective inside and outside the @code{[@dots{}]} syntax forms
1899 Expands to a new-line, inside or outside the @code{[@dots{}]} forms
1903 Expands to a formatted version of argument @var{i}, which must have
1904 only a single value. For example, @code{^1} expands to the first
1905 argument's @code{value}.
1907 @item [:@var{a}:]@var{i}
1908 Expands @var{a} for each of the values in @var{i}. @var{a}
1909 should contain one or more @code{^@var{j}} conversions, which are
1910 drawn from the values for argument @var{i} in order. Some examples
1915 All of the values for the first argument, concatenated.
1918 Expands to the values for the first argument, each followed by
1922 Expands to @code{@var{x} = @var{y}} where @var{x} is the second
1923 argument's first value and @var{y} is its second value. (This would
1924 be used only if the argument has two values. If there were more
1925 values, the second and third values would be directly concatenated,
1926 which would look funny.)
1929 @item [@var{a}:@var{b}:]@var{i}
1930 This extends the previous form so that the first values are expanded
1931 using @var{a} and later values are expanded using @var{b}. For an
1932 unknown reason, within @var{a} the @code{^@var{j}} conversions are
1933 instead written as @code{%@var{j}}. Some examples from the corpus:
1937 Expands to all of the values for the first argument, separated by
1940 @item [%1 = %2:, ^1 = ^2:]1
1941 Given appropriate values for the first argument, expands to @code{X =
1945 Given appropriate values, expands to @code{1, 2, 3}.
1949 The template string is localized to the user's locale.
1952 A writer may safely omit all of the optional 00 bytes at the beginning
1953 of a Value, except that it should write a single 00 byte before a
1956 @node SPV Light Member ValueMod
1957 @subsection ValueMod
1959 A ValueMod can specify special modifications to a Value.
1965 int32[n-refs] int16*[n-refs]
1966 int32[n-subscripts] string*[n-subscripts]
1967 v1(00 (i1 @math{|} i2) 00? 00? int32 00? 00?)
1968 v3(count(TemplateString StylePair))
1970 TemplateString => count((count((i0 (58 @math{|} 31 55))?) (58 @math{|} 31 string[id]))?)
1977 bool[bold] bool[italic] bool[underline] bool[show]
1978 string[fg-color] string[bg-color]
1979 string[typeface] byte[size]
1982 int32[halign] int32[valign] double[decimal-offset]
1983 int16[left-margin] int16[right-margin]
1984 int16[top-margin] int16[bottom-margin]
1987 A ValueMod that begins with ``31'' specifies special modifications to
1990 Each of the @code{n-refs} integers is a reference to a Footnote
1991 (@pxref{SPV Light Member Footnotes}) by 0-based index. Footnote
1992 markers are shown appended to the main text of the Value, as
1993 superscripts or subscripts.
1995 The @code{subscripts}, if present, are strings to append to the main
1996 text of the Value, as subscripts. Each subscript text is a brief
1997 indicator, e.g.@: @samp{a} or @samp{b}, with its meaning indicated by
1998 the table caption. When multiple subscripts are present, they are
1999 displayed separated by commas.
2001 The @code{id} inside the TemplateString, if present, is a template
2002 string for substitutions using the syntax explained previously. It
2003 appears to be an English-language version of the localized template
2004 string in the Value in which the TemplateString is nested. A writer may
2005 safely omit the optional fixed data in TemplateString.
2007 FontStyle and CellStyle, if present, change the style for this
2008 individual Value. In FontStyle, @code{bold}, @code{italic}, and
2009 @code{underline} control the particular style. @code{show} is
2010 ordinarily 1; if it is 0, then the cell data is not shown.
2011 @code{fg-color} and @code{bg-color} are strings in the format
2012 @code{#rrggbb}, e.g.@: @code{#ff0000} for red or @code{#ffffff} for
2013 white. The empty string is also occasionally observed. The
2014 @code{size} is a font size in units of 1/128 inch.
2016 In CellStyle, @code{halign} is 0 for center, 2 for left, 4 for right,
2017 6 for decimal, and 0xffffffad for mixed. For decimal alignment,
2018 @code{decimal-offset} is the decimal point's offset from the right
2019 side of the cell, in pt (@pxref{SPV Light Detail Member Format}).
2020 @code{valign} specifies vertical alignment: 0 for center, 1 for top, 3
2021 for bottom. @code{left-margin}, @code{right-margin},
2022 @code{top-margin}, and @code{bottom-margin} are in pt.
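
As an illustration, the following Python sketch decodes a CellStyle and parses a FontStyle color string. It assumes that the CellStyle fields are packed little-endian with no padding (24 bytes in all), and it reads @code{halign} and @code{valign} as unsigned so that 0xffffffad can be recognized. The names @code{decode_cell_style}, @code{parse_color}, and @code{CELL_STYLE} are hypothetical and do not come from PSPP.

@example
import struct

# Alignment codes from the CellStyle description above.
HALIGN_NAMES = dict([(0, "center"), (2, "left"), (4, "right"),
                     (6, "decimal"), (0xffffffad, "mixed")])
VALIGN_NAMES = dict([(0, "center"), (1, "top"), (3, "bottom")])

# CellStyle: int32 halign, int32 valign, double decimal-offset, then four
# int16 margins (left, right, top, bottom).  Assumption: little-endian.
# halign and valign are read unsigned so that 0xffffffad compares equal.
CELL_STYLE = struct.Struct("<IId4h")

def decode_cell_style(data):
    """Decode the 24 bytes of a CellStyle into readable fields."""
    (halign, valign, decimal_offset,
     left, right, top, bottom) = CELL_STYLE.unpack(data)
    return (HALIGN_NAMES.get(halign, "unknown"),
            VALIGN_NAMES.get(valign, "unknown"),
            decimal_offset,               # pt from the right side of the cell
            (left, right, top, bottom))   # margins, in pt

def parse_color(text):
    """Parse a FontStyle color "#rrggbb" into (r, g, b); "" gives None."""
    if text == "":
        return None
    if len(text) != 7 or not text.startswith("#"):
        raise ValueError("unexpected color string: " + repr(text))
    return tuple(int(text[i:i + 2], 16) for i in (1, 3, 5))

print(decode_cell_style(CELL_STYLE.pack(6, 3, 12.5, 8, 11, 1, 1)))
# prints ('decimal', 'bottom', 12.5, (8, 11, 1, 1))
print(parse_color("#ff0000"))
# prints (255, 0, 0)
@end example

Returning @code{"unknown"} for unlisted codes, instead of raising an error, lets a reader tolerate alignment values that do not appear in the corpus.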
2024 @node SPV Legacy Detail Member Binary Format
2025 @section Legacy Detail Member Binary Format
2027 Whereas the light binary format represents everything about a given
2028 pivot table, the legacy binary format conceptually consists of a
2029 number of named sources, each of which consists of a number of named
2030 variables, each of which is a 1-dimensional array of numbers or
2031 strings or a mix. Thus, the legacy binary member format is quite
2034 This section uses the same context-free grammar notation as in the
2035 previous section, with the following additions:
2039 In a version 0xaf legacy member, @var{x}; in other versions, nothing.
2040 (The legacy member header indicates the version; see below.)
2043 In a version 0xb0 legacy member, @var{x}; in other versions, nothing.
2046 A legacy detail member @file{.bin} has the following overall format:
2050 00 byte[version] int16[n-sources] int32[member-size]
2051 Metadata*[n-sources]
2056 @code{version} is a version number that affects the interpretation of
2057 some of the other data in the member. Versions 0xaf and 0xb0 are
2058 known. We will refer to ``version 0xaf'' and ``version 0xb0'' members
2061 A legacy member consists of @code{n-sources} data sources, each of
2062 which has Metadata and Data.
2064 @code{member-size} is the size of the legacy binary member, in bytes.
2066 The Data and Strings above are commented out because the Metadata has
2067 some oddities that mean that the Data sometimes seem to start at
2068 an unexpected place. The following section goes into detail.
2071 * SPV Legacy Member Metadata::
2072 * SPV Legacy Member Numeric Data::
2073 * SPV Legacy Member String Data::
2076 @node SPV Legacy Member Metadata
2077 @subsection Metadata
2081 int32[n-values] int32[n-variables] int32[data-offset]
2082 vAF(byte*28[source-name])
2083 vB0(byte*64[source-name] int32[x])
2086 A data source has @code{n-variables} variables, each with
2087 @code{n-values} data values.
2089 @code{source-name} is a 28- or 64-byte string padded on the right with
2090 0-bytes. The names that appear in the corpus are very generic:
2091 usually @code{tableData} for pivot table data or @code{source0} for
2094 A given Metadata's @code{data-offset} is the offset, in bytes, from
2095 the beginning of the member to the start of the corresponding Data.
2096 This allows programs to skip to the beginning of the data for a
2097 particular source. In every case in the corpus, the Data follow the
2098 Metadata in the same order, but it is important to use
2099 @code{data-offset} instead of reading sequentially through the file
2100 because of the exception described below.
2102 One SPV file in the corpus has legacy binary members with version 0xb0
2103 but a 28-byte @code{source-name} field (and only a single source). In
2104 practice, this means that the 64-byte @code{source-name} used in
2105 version 0xb0 has a lot of 0-bytes in the middle followed by the
2106 @code{variable-name} of the following Data. As long as a reader
2107 treats the first 0-byte in the @code{source-name} as terminating the
2108 string, it can properly interpret these members.
2110 The meaning of @code{x} in version 0xb0 is unknown.
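
The header and Metadata can be read with a short Python sketch like the one below. It assumes little-endian integers and ASCII source names. It ends each name at its first 0-byte, as recommended above, so it also handles the odd version 0xb0 member whose @code{source-name} is really 28 bytes long. The function @code{read_legacy_metadata} is hypothetical.

@example
import struct

def read_legacy_metadata(member):
    """Parse the header and Metadata of a legacy binary member.

    Returns (version, member_size, sources), where each source is a tuple
    (source_name, n_variables, n_values, data_offset).
    """
    zero, version, n_sources, member_size = struct.unpack_from("<BBHI", member, 0)
    if zero != 0 or version not in (0xaf, 0xb0):
        raise ValueError("not a known legacy binary member version")
    name_len = 28 if version == 0xaf else 64
    pos = 8
    sources = []
    for _ in range(n_sources):
        n_values, n_variables, data_offset = struct.unpack_from("<iii", member, pos)
        pos += 12
        raw_name = member[pos:pos + name_len]
        pos += name_len
        if version == 0xb0:
            pos += 4  # skip int32 x, whose meaning is unknown
        # The first 0-byte ends the name, which also copes with the 0xb0
        # member that actually has a 28-byte source-name.
        name = raw_name.split(b"\0", 1)[0].decode("ascii")
        sources.append((name, n_variables, n_values, data_offset))
    return version, member_size, sources

# A hand-built version 0xaf member with one source and no Data.
header = struct.pack("<BBHI", 0, 0xaf, 1, 48)
metadata = struct.pack("<iii", 3, 2, 48) + b"tableData".ljust(28, b"\0")
print(read_legacy_metadata(header + metadata))
# prints (175, 48, [('tableData', 2, 3, 48)])
@end example

The sketch returns each @code{data-offset} instead of relying on the position reached after reading the Metadata, because in the odd member the bytes after @code{source-name} do not line up with the grammar.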
2112 @node SPV Legacy Member Numeric Data
2113 @subsection Numeric Data
2116 Data => Variable*[n-variables]
2117 Variable => byte*288[variable-name] double*[n-values]
2120 Data follow the Metadata in the legacy binary format, with sources in
2121 the same order (but readers should use the @code{data-offset} in
2122 Metadata records, rather than reading sequentially). Each Variable
2123 begins with a @code{variable-name} that generally indicates its role
2124 in the pivot table, e.g.@: ``cell'', ``cellFormat'',
2125 ``dimension0categories'', ``dimension0group0'', followed by the
2126 numeric data, one double per datum. A double with the maximum
2127 negative double @code{-DBL_MAX} represents the system-missing value
2130 @node SPV Legacy Member String Data
2131 @subsection String Data
2134 Strings => SourceMaps[maps] Labels
2136 SourceMaps => int32[n-maps] SourceMap*[n-maps]
2138 SourceMap => string[source-name] int32[n-variables] VariableMap*[n-variables]
2139 VariableMap => string[variable-name] int32[n-data] DatumMap*[n-data]
2140 DatumMap => int32[value-idx] int32[label-idx]
2142 Labels => int32[n-labels] Label*[n-labels]
2143 Label => int32[frequency] string[label]
2146 Each variable may include a mix of numeric and string data values. If
2147 a legacy binary member contains any string data, Strings is present;
2148 otherwise, it ends just after the last Data element.
2150 The string data overlays the numeric data. When a variable includes
2151 any string data, its Variable represents the string values with a
2152 SYSMIS or NaN placeholder. (Not all such values need be
2155 Each SourceMap provides a mapping between SYSMIS or NaN values in source
2156 @code{source-name} and the string data that they represent.
2157 @code{n-variables} is the number of variables in the source that
2158 include string data. More precisely, it is the 1-based index of the
2159 last variable in the source that includes any string data; thus, it
2160 would be 4 if there are 5 variables and only the fourth one includes
2163 A VariableMap repeats its variable's name, but variables are always
2164 present in the same order as the source, starting from the first
2165 variable, without skipping any even if they have no string values.
2166 Each VariableMap contains DatumMap nonterminals, each of which maps
2167 from a 0-based index within its variable's data to a 0-based label
2168 index, e.g.@: the pair @code{value-idx} = 2, @code{label-idx} = 3 means
2169 that the third data value (which must be SYSMIS or NaN) is to be
2170 replaced by the string of the fourth Label.
2172 The labels themselves follow the pairs. The valuable part of each
2173 label is the string @code{label}. Each label also includes a
2174 @code{frequency} that reports the number of DatumMaps that reference
2175 it (although this is not useful).
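
The following Python sketch combines this subsection with the previous one. It reads the Data for one source starting at its @code{data-offset}, represents @code{-DBL_MAX} as @code{None}, and then overlays the strings described by one SourceMap. It assumes little-endian doubles and ASCII variable names. To keep it short, it takes the VariableMaps and Labels as already-decoded Python lists instead of parsing their bytes. All names are hypothetical.

@example
import math
import struct
import sys

SYSMIS = -sys.float_info.max  # -DBL_MAX marks the system-missing value

def read_source_data(member, data_offset, n_variables, n_values):
    """Return [(variable_name, values), ...] for the Data of one source."""
    variables = []
    pos = data_offset
    for _ in range(n_variables):
        name = member[pos:pos + 288].split(b"\0", 1)[0].decode("ascii")
        pos += 288
        values = struct.unpack_from("<%dd" % n_values, member, pos)
        pos += 8 * n_values
        variables.append((name, [None if v == SYSMIS else v for v in values]))
    return variables

def overlay_strings(variables, variable_maps, labels):
    """Replace SYSMIS/NaN placeholders with strings, in place.

    variable_maps holds one (variable_name, [(value_idx, label_idx), ...])
    per VariableMap, in source order; labels holds (frequency, label) pairs.
    """
    for (name, values), (map_name, datum_maps) in zip(variables, variable_maps):
        if name != map_name:
            raise ValueError("unexpected VariableMap order: " + map_name)
        for value_idx, label_idx in datum_maps:
            old = values[value_idx]
            if old is not None and not math.isnan(old):
                raise ValueError("string data must replace SYSMIS or NaN")
            values[value_idx] = labels[label_idx][1]
    return variables

# One variable "cell" with three values; the last two are placeholders.
member = (b"cell".ljust(288, b"\0")
          + struct.pack("<3d", 1.5, SYSMIS, float("nan")))
data = read_source_data(member, 0, 1, 3)
print(overlay_strings(data, [("cell", [(1, 0), (2, 1)])],
                      [(1, "Yes"), (1, "No")]))
# prints [('cell', [1.5, 'Yes', 'No'])]
@end example

Because @code{zip} stops at the shorter list, variables after the last VariableMap are left alone. This matches the rule that a SourceMap's @code{n-variables} stops at the last variable with string data.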
2177 @node SPV Legacy Detail Member XML Format
2178 @section Legacy Detail Member XML Format
2180 The design of the detail XML format is not what one would end up with
2181 for describing pivot tables. This is because it is a special case
2182 of a much more general format (``visualization XML'' or ``VizML'')
2183 that can describe a wide range of visualizations. Most of this
2184 generality is overkill for tables, and so we end up with a funny
2185 subset of a general-purpose format.
2187 An XML Schema for VizML is available, distributed with SPSS binaries,
2188 under a nonfree license. It contains documentation that is
2189 occasionally helpful.
2191 This section describes the detail XML format using the same notation
2192 already used for the structure XML format (@pxref{SPV Structure Member
2193 Format}). See @file{src/output/spv/detail-xml.grammar} in the PSPP
2194 source tree for the full grammar that it uses for parsing.
2196 The important elements of the detail XML format are:
2200 Variables. @xref{SPV Detail Variable Elements}.
2203 Assignment of variables to axes. A variable can appear as columns, or
2204 rows, or layers. The @code{faceting} element and its sub-elements
2205 describe this assignment.
2208 Styles and other annotations.
2211 This description is not detailed enough to write legacy tables.
2212 Instead, write tables in the light binary format.
2215 * SPV Detail visualization Element::
2216 * SPV Detail Variable Elements::
2217 * SPV Detail extension Element::
2218 * SPV Detail graph Element::
2219 * SPV Detail location Element::
2220 * SPV Detail faceting Element::
2221 * SPV Detail facetLayout Element::
2222 * SPV Detail label Element::
2223 * SPV Detail setCellProperties Element::
2224 * SPV Detail setFormat Element::
2225 * SPV Detail interval Element::
2226 * SPV Detail style Element::
2227 * SPV Detail labelFrame Element::
2228 * SPV Detail Legacy Properties::
2231 @node SPV Detail visualization Element
2232 @subsection The @code{visualization} Element
2240 :style[style_ref]=ref style
2244 => visualization_extension?
2246 (sourceVariable | derivedVariable)+
2255 extension[visualization_extension]
2258 :minWidthSet=(true)?
2259 :maxWidthSet=(true)?
2262 userSource :missing=(listwise | pairwise)? => EMPTY
2264 categoricalDomain => variableReference simpleSort
2266 simpleSort :method[sort_method]=(custom) => categoryOrder
2268 container :style=ref style => container_extension? location+ labelFrame*
2270 extension[container_extension] :combinedFootnotes=(true) => EMPTY
2278 The @code{visualization} element is the root of a detail XML member. It
2279 has the following attributes:
2281 @defvr {Attribute} creator
2282 The version of the software that created this SPV file, as a string of
2283 the form @code{xxyyzz}, which represents software version xx.yy.zz,
2284 e.g.@: @code{160001} is version 16.0.1. The corpus includes major
2285 versions 16 through 19.
2288 @defvr {Attribute} date
2289 The date on which the file was created, as a string of the form
2293 @defvr {Attribute} lang
2294 The locale used for output, in Windows format, which is similar to the
2295 format used in Unix with the underscore replaced by a hyphen, e.g.@:
2296 @code{en-US}, @code{en-GB}, @code{el-GR}, @code{sr-Cyrl-RS}.
2299 @defvr {Attribute} name
2300 The title of the pivot table, localized to the output language.
2303 @defvr {Attribute} style
2304 The base style for the pivot table. In every example in the corpus,
2305 the @code{style} element has no attributes other than @code{id}.
2308 @defvr {Attribute} type
2309 A floating-point number. The meaning is unknown.
2312 @defvr {Attribute} version
2313 The visualization schema version number. In the corpus, the value is
2314 one of 2.4, 2.5, 2.7, and 2.8.
2317 The @code{userSource} element has no visible effect.
2319 The @code{extension} element as a child of @code{visualization} has
2320 the following attributes.
2322 @defvr {Attribute} numRows
2323 An integer that presumably defines the number of rows in the displayed
2327 @defvr {Attribute} showGridline
2328 Always set to @code{false} in the corpus.
2331 @defvr {Attribute} minWidthSet
2332 @defvrx {Attribute} maxWidthSet
2333 Always set to @code{true} in the corpus.
2336 The @code{extension} element as a child of @code{container} has the
2339 @defvr {Attribute} combinedFootnotes
2343 The @code{categoricalDomain} and @code{simpleSort} elements have no
2346 The @code{layerController} element has no visible effect.
2348 @node SPV Detail Variable Elements
2349 @subsection Variable Elements
2351 A ``variable'' in detail XML is a 1-dimensional array of data. Each
2352 element of the array may, independently, have string or numeric
2353 content. All of the variables in a given detail XML member either
2354 have the same number of elements or have zero elements.
2356 Two different elements define variables and their content:
2359 @item sourceVariable
2360 These variables' data comes from the associated @code{tableData.bin}
2363 @item derivedVariable
2364 These variables are defined in terms of a mapping function from a
2365 source variable, or they are empty.
2368 A variable named @code{cell} always exists. This variable holds the
2369 data displayed in the table.
2371 Variables in detail XML roughly correspond to the dimensions in a
2372 light detail member. Each dimension has the following variables with
2373 stylized names, where @var{n} is a number for the dimension starting
2377 @item dimension@var{n}categories
2378 The dimension's leaf categories (@pxref{SPV Light Member Categories}).
2380 @item dimension@var{n}group0
2381 Present only if the dimension's categories are grouped, this variable
2382 holds the group labels for the categories. Grouping is inferred
2383 through adjacent identical labels. Categories that are not part of a
2384 group have empty-string data in this variable.
2386 @item dimension@var{n}group1
2387 Present only if the first-level groups are further grouped, this
2388 variable holds the labels for the second-level groups. There can be
2389 additional variables with further levels of grouping.
2391 @item dimension@var{n}
2395 Determining the data for a (non-empty) variable is a multi-step
2400 Draw initial data from its source, for a @code{sourceVariable}, or
2401 from another named variable, for a @code{derivedVariable}.
2404 Apply mappings from @code{valueMapEntry} elements within the
2405 @code{derivedVariable} element, if any.
2408 Apply mappings from @code{relabel} elements within a @code{format} or
2409 @code{stringFormat} element in the @code{sourceVariable} or
2410 @code{derivedVariable} element, if any.
2413 If the variable is a @code{sourceVariable} with a @code{labelVariable}
2414 attribute, and there were no mappings to apply in previous steps, then
2415 replace each element of the variable by the corresponding value in the
2419 A single variable's data can be modified in two of the steps, if both
2420 @code{valueMapEntry} and @code{relabel} are used. The following
2421 example from the corpus maps several integers to 2, then maps 2 in
2422 turn to the string ``Input'':
2425 <derivedVariable categorical="true" dependsOn="dimension0categories"
2426 id="dimension0group0map" value="map(dimension0group0)">
2428 <relabel from="2" to="Input"/>
2429 <relabel from="10" to="Missing Value Handling"/>
2430 <relabel from="14" to="Resources"/>
2431 <relabel from="0" to=""/>
2432 <relabel from="1" to=""/>
2433 <relabel from="13" to=""/>
2435 <valueMapEntry from="2;3;5;6;7;8;9" to="2"/>
2436 <valueMapEntry from="10;11" to="10"/>
2437 <valueMapEntry from="14;15" to="14"/>
2438 <valueMapEntry from="0" to="0"/>
2439 <valueMapEntry from="1" to="1"/>
2440 <valueMapEntry from="13" to="13"/>
2445 * SPV Detail sourceVariable Element::
2446 * SPV Detail derivedVariable Element::
2447 * SPV Detail valueMapEntry Element::
2450 @node SPV Detail sourceVariable Element
2451 @subsubsection The @code{sourceVariable} Element
2458 :domain=ref categoricalDomain?
2460 :dependsOn=ref sourceVariable?
2462 :labelVariable=ref sourceVariable?
2463 => variable_extension* (format | stringFormat)?
2466 This element defines a variable whose data comes from the
2467 @file{tableData.bin} member that corresponds to this @file{.xml}.
2469 This element has the following attributes.
2471 @defvr {Attribute} id
2472 An @code{id} is always present because this element exists to be
2473 referenced from other elements.
2476 @defvr {Attribute} categorical
2477 Always set to @code{true}.
2480 @defvr {Attribute} source
2481 Always set to @code{tableData}, the @code{source-name} in the
2482 corresponding @file{tableData.bin} member (@pxref{SPV Legacy Member
2486 @defvr {Attribute} sourceName
2487 The name of a variable within the source, corresponding to the
2488 @code{variable-name} in the @file{tableData.bin} member (@pxref{SPV
2489 Legacy Member Numeric Data}).
2492 @defvr {Attribute} label
2493 The variable label, if any.
2496 @defvr {Attribute} labelVariable
2497 The @code{variable-name} of a variable whose string values correspond
2498 one-to-one with the values of this variable and are suitable for use
2502 @defvr {Attribute} dependsOn
2503 This attribute doesn't affect the display of a table.
2506 @node SPV Detail derivedVariable Element
2507 @subsubsection The @code{derivedVariable} Element
2514 :dependsOn=ref sourceVariable?
2515 => variable_extension* (format | stringFormat)? valueMapEntry*
2518 Like @code{sourceVariable}, this element defines a variable whose
2519 values can be used elsewhere in the visualization. Instead of being
2520 read from a data source, the variable's data are defined by a
2521 mathematical expression.
2523 This element has the following attributes.
2525 @defvr {Attribute} id
2526 An @code{id} is always present because this element exists to be
2527 referenced from other elements.
2530 @defvr {Attribute} categorical
2531 Always set to @code{true}.
2534 @defvr {Attribute} value
2535 An expression that defines the variable's value. In theory this could
2536 be an arbitrary expression in terms of constants, functions, and other
2537 variables, e.g.@: @math{(@var{var1} + @var{var2}) / 2}. In practice,
2538 the corpus contains only the following forms of expressions:
2542 @itemx constant(@var{variable})
2543 All zeros. The reason why a variable is sometimes named is unknown.
2544 Sometimes the ``variable name'' has spaces in it.
2546 @item map(@var{variable})
2547 Transforms the values in the named @var{variable} using the
2548 @code{valueMapEntry}s contained within the element.
2552 @defvr {Attribute} dependsOn
2553 This attribute doesn't affect the display of a table.
2556 @node SPV Detail valueMapEntry Element
2557 @subsubsection The @code{valueMapEntry} Element
2560 valueMapEntry :from :to => EMPTY
2563 A @code{valueMapEntry} element defines a mapping from one or more
2564 values of a source expression to a target value. (In the corpus, the
2565 source expression is always just the name of a variable.) Each target
2566 value requires a separate @code{valueMapEntry}. If multiple source
2567 values map to the same target value, they may share one @code{valueMapEntry} or use separate ones.
2569 In the corpus, all of the source and target values are integers.
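
The two-step example from the corpus (@pxref{SPV Detail Variable Elements}) can be reproduced with the following Python sketch. It applies @code{valueMapEntry} mappings first and @code{relabel} mappings second. Because the corpus uses only integer values, it compares values as numbers. It also assumes that a value without a matching entry passes through unchanged, which this section does not specify. The function names and input data are hypothetical.

@example
def parse_entries(pairs, convert):
    """Build a lookup table from (from, to) attribute strings.

    A from attribute may list several source values separated by ";".
    """
    table = dict()
    for from_attr, to_attr in pairs:
        for source in from_attr.split(";"):
            table[float(source)] = convert(to_attr)
    return table

def apply_table(values, table):
    # Assumption: values without a matching entry pass through unchanged.
    return [table.get(v, v) for v in values]

# The valueMapEntry elements from the corpus example; targets are numbers.
value_map = parse_entries([("2;3;5;6;7;8;9", "2"), ("10;11", "10"),
                           ("14;15", "14"), ("0", "0"), ("1", "1"),
                           ("13", "13")], float)
# The relabel elements from the same example; targets are strings.
relabel = parse_entries([("2", "Input"), ("10", "Missing Value Handling"),
                         ("14", "Resources"), ("0", ""), ("1", ""),
                         ("13", "")], str)

data = [3.0, 11.0, 0.0, 15.0]          # hypothetical source values
mapped = apply_table(data, value_map)  # valueMapEntry step
labels = apply_table(mapped, relabel)  # relabel step
print(labels)
# prints ['Input', 'Missing Value Handling', '', 'Resources']
@end example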
2571 @code{valueMapEntry} has the following attributes.
2573 @defvr {Attribute} from
2574 A source value, or multiple source values separated by semicolons,
2575 e.g.@: @code{0} or @code{13;14;15;16}.
2578 @defvr {Attribute} to
2579 The target value, e.g.@: @code{0}.
2582 @node SPV Detail extension Element
2583 @subsection The @code{extension} Element
2585 This is a general-purpose ``extension'' element. Readers that don't
2586 understand a given extension should be able to safely ignore it. The
2587 attributes on this element, and their meanings, vary based on the
2588 context. Each known usage is described separately below. The current
2589 extensions use attributes exclusively, without any nested elements.
2591 @subsubheading @code{container} Parent Element
2594 extension[container_extension] :combinedFootnotes=(true) => EMPTY
2597 With @code{container} as its parent element, @code{extension} has the
2598 following attributes.
2600 @defvr {Attribute} combinedFootnotes
2601 Always set to @code{true} in the corpus.
2604 @subsubheading @code{sourceVariable} and @code{derivedVariable} Parent Element
2607 extension[variable_extension] :from :helpId => EMPTY
2610 With @code{sourceVariable} or @code{derivedVariable} as its parent
2611 element, @code{extension} has the following attributes. A given
2612 parent element often contains several @code{extension} elements that
2613 specify the meaning of the source data's variables or sources, e.g.@:
2616 <extension from="0" helpId="corrected_model"/>
2617 <extension from="3" helpId="error"/>
2618 <extension from="4" helpId="total_9"/>
2619 <extension from="5" helpId="corrected_total"/>
2622 More commonly they are less helpful, e.g.@:
2625 <extension from="0" helpId="notes"/>
2626 <extension from="1" helpId="notes"/>
2627 <extension from="2" helpId="notes"/>
2628 <extension from="5" helpId="notes"/>
2629 <extension from="6" helpId="notes"/>
2630 <extension from="7" helpId="notes"/>
2631 <extension from="8" helpId="notes"/>
2632 <extension from="12" helpId="notes"/>
2633 <extension from="13" helpId="no_help"/>
2634 <extension from="14" helpId="notes"/>
2637 @defvr {Attribute} from
2638 An integer or a name like ``dimension0''.
2641 @defvr {Attribute} helpId
2645 @node SPV Detail graph Element
2646 @subsection The @code{graph} Element
2650 :cellStyle=ref style
2652 => location+ coordinates faceting facetLayout interval
2654 coordinates => EMPTY
2657 @code{graph} has the following attributes.
2659 @defvr {Attribute} cellStyle
2660 @defvrx {Attribute} style
2661 Each of these is the @code{id} of a @code{style} element (@pxref{SPV
2662 Detail style Element}). The former is the default style for
2663 individual cells, the latter for the entire table.
2666 @node SPV Detail location Element
2667 @subsection The @code{location} Element
2671 :part=(height | width | top | bottom | left | right)
2672 :method=(sizeToContent | attach | fixed | same)
2675 :target=ref (labelFrame | graph | container)?
2680 Each instance of this element specifies where some part of the table
2681 frame is located. All the examples in the corpus have four instances
2682 of this element, one for each of the parts @code{height},
2683 @code{width}, @code{left}, and @code{top}. Some examples in the
2684 corpus add a fifth for part @code{bottom}, even though it is not clear
2685 how all of @code{top}, @code{bottom}, and @code{height} can be honored
2686 at the same time. In any case, @code{location} seems to have little
2687 importance in representing tables; a reader can safely ignore it.
2689 @defvr {Attribute} part
2690 The part of the table being located.
2693 @defvr {Attribute} method
2694 How the location is determined:
2698 Based on the natural size of the table. Observed only for
2699 parts @code{height} and @code{width}.
2702 Based on the location specified in @code{target}. Observed only for
2703 parts @code{top} and @code{bottom}.
2706 Using the value in @code{value}. Observed only for parts @code{top},
2707 @code{bottom}, and @code{left}.
2710 Same as the specified @code{target}. Observed only for part
2715 @defvr {Attribute} min
2716 Minimum size. Only observed with value @code{100pt}. Only observed
2717 for part @code{width}.
2720 @defvr {Dependent} target
2721 Required when @code{method} is @code{attach} or @code{same}, not
2722 observed otherwise. This identifies an element to attach to.
2723 Observed with the ID of @code{title}, @code{footnote}, @code{graph},
2727 @defvr {Dependent} value
2728 Required when @code{method} is @code{fixed}, not observed otherwise.
2729 Observed values are @code{0%}, @code{0px}, @code{1px}, and @code{3px}
2730 on parts @code{top} and @code{left}, and @code{100%} on part
2734 @node SPV Detail faceting Element
2735 @subsection The @code{faceting} Element
2738 faceting => layer[layers1]* cross layer[layers2]*
2740 cross => (unity | nest) (unity | nest)
2744 nest => variableReference[vars]+
2746 variableReference :ref=ref (sourceVariable | derivedVariable) => EMPTY
2749 :variable=ref (sourceVariable | derivedVariable)
2752 :method[layer_method]=(nest)?
2757 The @code{faceting} element describes the row, column, and layer
2758 structure of the table. Its @code{cross} child determines the row and
2759 column structure, and each @code{layer} child (if any) represents a
2760 layer. Layers may appear before or after @code{cross}.
2762 The @code{cross} element describes the row and column structure of the
2763 table. It has exactly two children, the first of which describes the
2764 table's columns and the second the table's rows. Each child is a
2765 @code{nest} element if the table has any dimensions along the axis in
2766 question, otherwise a @code{unity} element.
2768 A @code{nest} element consists of one or more dimensions listed from
2769 innermost to outermost, each represented by @code{variableReference}
2770 child elements. Each variable in a dimension is listed in order.
2771 @xref{SPV Detail Variable Elements}, for information on the variables
2772 that comprise a dimension.
2774 A @code{nest} can contain a single dimension, e.g.:
2778 <variableReference ref="dimension0categories"/>
2779 <variableReference ref="dimension0group0"/>
2780 <variableReference ref="dimension0"/>
2785 A @code{nest} can contain multiple dimensions, e.g.:
2789 <variableReference ref="dimension1categories"/>
2790 <variableReference ref="dimension1group0"/>
2791 <variableReference ref="dimension1"/>
2792 <variableReference ref="dimension0categories"/>
2793 <variableReference ref="dimension0"/>
2797 A @code{nest} may have no dimensions, in which case it still has one
2798 @code{variableReference} child, which references a
2799 @code{derivedVariable} whose @code{value} attribute is
2800 @code{constant(0)}. In the corpus, such a @code{derivedVariable} has
2801 @code{row} or @code{column}, respectively, as its @code{id}. This is
2802 equivalent to using a @code{unity} element in place of @code{nest}.
2804 A @code{variableReference} element refers to a variable through its
2805 @code{ref} attribute.
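
As a sketch of how a reader might interpret @code{cross}, the following Python code uses the standard @code{xml.etree.ElementTree} module. It splits the two children into columns and rows, and it groups each @code{nest}'s @code{variableReference} children into dimensions. The grouping relies on stylized names such as @code{dimension1categories}, which are a corpus convention and not a rule of the format. The sketch also ignores XML namespaces, and its function names are hypothetical. The sample XML wraps the multi-dimension @code{nest} example above in @code{faceting} and @code{cross}, with a @code{unity} column axis.

@example
import re
import xml.etree.ElementTree as ET

def axis_dimensions(axis):
    """Group one child of cross into [(n, [ref, ...]), ...], innermost first."""
    if axis.tag == "unity":
        return []
    dims = []
    for var_ref in axis.findall("variableReference"):
        ref = var_ref.get("ref")
        match = re.match(r"dimension(\d+)", ref)
        if match is None:
            continue  # e.g. a constant(0) placeholder such as "row"
        n = int(match.group(1))
        if dims and dims[-1][0] == n:
            dims[-1][1].append(ref)
        else:
            dims.append((n, [ref]))
    return dims

def cross_axes(faceting):
    """Return (columns, rows): cross lists columns first, then rows."""
    columns, rows = list(faceting.find("cross"))
    return axis_dimensions(columns), axis_dimensions(rows)

FACETING = """
<faceting>
  <cross>
    <unity/>
    <nest>
      <variableReference ref="dimension1categories"/>
      <variableReference ref="dimension1group0"/>
      <variableReference ref="dimension1"/>
      <variableReference ref="dimension0categories"/>
      <variableReference ref="dimension0"/>
    </nest>
  </cross>
</faceting>
"""
columns, rows = cross_axes(ET.fromstring(FACETING.strip()))
print(columns)  # prints []
print(rows)
# prints [(1, ['dimension1categories', 'dimension1group0', 'dimension1']),
#         (0, ['dimension0categories', 'dimension0'])]
@end example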
2807 Each @code{layer} element represents a dimension, e.g.:
2810 <layer value="0" variable="dimension0categories" visible="true"/>
2811 <layer value="dimension0" variable="dimension0" visible="false"/>
2815 @code{layer} has the following attributes.
2817 @defvr {Attribute} variable
2818 Refers to a @code{sourceVariable} or @code{derivedVariable} element.
2821 @defvr {Attribute} value
2822 The value to select. For a category variable, this is always
2823 @code{0}; for a data variable, it is the same as the @code{variable}
2827 @defvr {Attribute} visible
2828 Whether the layer is visible. Generally, category layers are visible
2829 and data layers are not, but sometimes this attribute is omitted.
2832 @defvr {Attribute} method
2833 When present, this is always @code{nest}.
2836 @node SPV Detail facetLayout Element
2837 @subsection The @code{facetLayout} Element
2840 facetLayout => tableLayout setCellProperties[scp1]*
2841 facetLevel+ setCellProperties[scp2]*
2844 :verticalTitlesInCorner=bool
2846 :fitCells=(ticks both)?
2850 The @code{facetLayout} element and its descendants control styling for
2853 Its @code{tableLayout} child has the following attributes.
2855 @defvr {Attribute} verticalTitlesInCorner
2856 If true, in the absence of corner text, row headings will be displayed
2860 @defvr {Attribute} style
2861 Refers to a @code{style} element.
2864 @defvr {Attribute} fitCells
2868 @subsubheading The @code{facetLevel} Element
2871 facetLevel :level=int :gap=dimension? => axis
2873 axis :style=ref style => label? majorTicks
2879 :tickFrameStyle=ref style
2880 :labelFrequency=int?
2890 Each @code{facetLevel} describes a @code{variableReference} or
2891 @code{layer}, and a table has one @code{facetLevel} element for
2892 each such element. For example, an SPV detail member that contains
2893 four @code{variableReference} elements and two @code{layer} elements
2894 will contain six @code{facetLevel} elements.
2896 In the corpus, @code{facetLevel} elements and the elements that they
2897 describe are always in the same order. The correspondence may also be
2898 observed in two other ways. First, one may use the @code{level}
2899 attribute, described below. Second, in the corpus, a
2900 @code{facetLevel} always has an @code{id} that is the same as the
2901 @code{id} of the element it describes with @code{_facetLevel}
2902 appended. One should not formally rely on this, of course, but it is
2903 usefully indicative.
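
The following Python sketch pairs each @code{facetLevel} with the element it describes, using the @code{level} attribute described below. It assumes that levels count the @code{variableReference} elements within @code{faceting}, in document order, followed by the @code{layer} elements. Restricting the search to @code{faceting} matters because @code{categoricalDomain} also contains a @code{variableReference}. The XML is a trimmed, hypothetical example, the sketch ignores XML namespaces, and the function name is hypothetical.

@example
import xml.etree.ElementTree as ET

def facet_level_targets(root):
    """Return (level, tag, variable) for each facetLevel under root.

    Assumption: level counts the variableReference elements inside
    faceting, in document order, followed by its layer elements.
    """
    faceting = root.find(".//faceting")
    described = (list(faceting.iter("variableReference"))
                 + list(faceting.iter("layer")))
    result = []
    for facet_level in root.iter("facetLevel"):
        level = int(facet_level.get("level"))
        target = described[level - 1]  # level is 1-based
        variable = target.get("ref") or target.get("variable")
        result.append((level, target.tag, variable))
    return result

GRAPH = """
<graph>
  <faceting>
    <layer value="0" variable="dimension2categories" visible="true"/>
    <cross>
      <nest><variableReference ref="dimension0categories"/></nest>
      <nest><variableReference ref="dimension1categories"/></nest>
    </cross>
  </faceting>
  <facetLayout>
    <facetLevel level="1"/>
    <facetLevel level="2"/>
    <facetLevel level="3"/>
  </facetLayout>
</graph>
"""
for entry in facet_level_targets(ET.fromstring(GRAPH.strip())):
    print(entry)
@end example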
2905 @defvr {Attribute} level
2906 A 1-based index into the @code{variableReference} and @code{layer}
2907 elements, e.g.@: a @code{facetLevel} with a @code{level} of 1
2908 describes the first @code{variableReference} in the SPV detail member,
2909 and in a member with four @code{variableReference} elements, a
2910 @code{facetLevel} with a @code{level} of 5 describes the first
2911 @code{layer} in the member.
2914 @defvr {Attribute} gap
2915 Always observed as @code{0pt}.
2918 Each @code{facetLevel} contains an @code{axis}, which in turn may
2919 contain a @code{label} for the @code{facetLevel} (@pxref{SPV Detail
2920 label Element}) and does contain a @code{majorTicks} element.
2922 @defvr {Attribute} labelAngle
2923 Normally 0. The value -90 causes inner column or outer row labels to
2924 be rotated vertically.
2927 @defvr {Attribute} style
2928 @defvrx {Attribute} tickFrameStyle
2929 Each refers to a @code{style} element. @code{style} is the style of
2930 the tick labels, @code{tickFrameStyle} the style for the frames around
2934 @node SPV Detail label Element
2935 @subsection The @code{label} Element
2940 :textFrameStyle=ref style?
2941 :purpose=(title | subTitle | subSubTitle | layer | footnote)?
2942 => text+ | descriptionGroup
2945 :target=ref faceting
2947 => (description | text)+
2949 description :name=(variable | value) => EMPTY
2953 :definesReference=int?
2954 :position=(subscript | superscript)?
2959 This element represents a label on some aspect of the table.
2961 @defvr {Attribute} style
2962 @defvrx {Attribute} textFrameStyle
2963 Each of these refers to a @code{style} element. @code{style} is the
2964 style of the label text, @code{textFrameStyle} the style for the frame
2968 @defvr {Attribute} purpose
2969 The kind of entity being labeled.
2972 A @code{descriptionGroup} concatenates one or more elements to form a
2973 label. Each element can be a @code{text} element, which contains
2974 literal text, or a @code{description} element that substitutes a value
2977 @defvr {Attribute} target
2978 The @code{id} of an element being described. In the corpus, this is
2979 always @code{faceting}.
2982 @defvr {Attribute} separator
2983 A string to separate the description of multiple groups, if the
2984 @code{target} has more than one. In the corpus, this is always a
2988 Typical contents for a @code{descriptionGroup} are a value by itself:
2990 <description name="value"/>
2992 @noindent or a variable and its value, separated by a colon:
2994 <description name="variable"/><text>:</text><description name="value"/>
2997 A @code{description} is like a macro that expands to some property of
2998 the target of its parent @code{descriptionGroup}. The @code{name}
2999 attribute specifies the property.
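
A @code{descriptionGroup} can be expanded as in the following Python sketch. It assumes that the caller has already determined the strings that the @code{variable} and @code{value} descriptions stand for, for example a layer variable's label and its current value. How a reader obtains those strings, the function name, and the sample strings are all assumptions.

@example
import xml.etree.ElementTree as ET

def expand_description_group(group, variable_text, value_text):
    """Concatenate the text and description children of a descriptionGroup.

    variable_text and value_text are the strings that description
    name="variable" and name="value" stand for; the caller supplies them.
    """
    parts = []
    for child in group:
        if child.tag == "text":
            parts.append(child.text or "")
        elif child.tag == "description":
            name = child.get("name")
            if name == "variable":
                parts.append(variable_text)
            elif name == "value":
                parts.append(value_text)
    return "".join(parts)

GROUP = ('<descriptionGroup target="faceting">'
         '<description name="variable"/><text>:</text>'
         '<description name="value"/>'
         '</descriptionGroup>')
print(expand_description_group(ET.fromstring(GROUP), "Gender", "Female"))
# prints Gender:Female
@end example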
3001 @node SPV Detail setCellProperties Element
3002 @subsection The @code{setCellProperties} Element
3006 :applyToConverse=bool?
3007 => (setStyle | setFrameStyle | setFormat | setMetaData)* union[union_]?
3010 The @code{setCellProperties} element sets style properties of cells or
3011 row or column labels.
3013 Interpreting @code{setCellProperties} requires answering two
3014 questions: which cells or labels to style, and what styles to use.
3016 @subsubheading Which Cells?
3021 intersect => where+ | intersectWhere | alternating | EMPTY
3024 :variable=ref (sourceVariable | derivedVariable)
3029 :variable=ref (sourceVariable | derivedVariable)
3030 :variable2=ref (sourceVariable | derivedVariable)
3033 alternating => EMPTY
3036 When @code{union} is present with @code{intersect} children, each of
3037 those children specifies a group of cells that should be styled, and
3038 the total group is all those cells taken together. When @code{union}
3039 is absent, every cell is styled. One attribute on
3040 @code{setCellProperties} affects the choice of cells:
3042 @defvr {Attribute} applyToConverse
3043 If true, this inverts the meaning of the cell selection: the selected
3044 cells are the ones @emph{not} designated. This is confusing, given
3045 the additional restrictions of @code{union}, but in the corpus
3046 @code{applyToConverse} is never present along with @code{union}.
3049 An @code{intersect} specifies restrictions on the cells to be matched.
3050 Each @code{where} child specifies which values of a given variable to
3051 include. The attributes of @code{where} are:
3053 @defvr {Attribute} variable
3054 Refers to a variable, e.g.@: @code{dimension0categories}. Only
3055 ``categories'' variables make sense here, but other variables, e.g.@:
3056 @code{dimension0group0map}, are sometimes seen. The reader may ignore
3060 @defvr {Attribute} include
3061 A value, or multiple values separated by semicolons,
3062 e.g.@: @code{0} or @code{13;14;15;16}.
3065 PSPP ignores @code{setCellProperties} when @code{intersectWhere} is
3068 @subsubheading What Styles?
3072 :target=ref (labeling | graph | interval | majorTicks)
3076 setMetaData :target=ref graph :key :value => EMPTY
3079 :target=ref (majorTicks | labeling)
3081 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3085 :target=ref majorTicks
3089 The @code{set*} children of @code{setCellProperties} determine the
3092 When @code{setCellProperties} contains a @code{setFormat} whose
3093 @code{target} references a @code{labeling} element, or if it contains
3094 a @code{setStyle} that references a @code{labeling} or @code{interval}
3095 element, the @code{setCellProperties} sets the style for table cells.
3096 The format from the @code{setFormat}, if present, replaces the cells'
3097 format. The style from the @code{setStyle} that references
3098 @code{labeling}, if present, replaces the cells' font and cell
3099 styles, except that the background color is taken instead from the
3100 @code{interval}'s style, if present.
3102 When @code{setCellProperties} contains a @code{setFormat} whose
3103 @code{target} references a @code{majorTicks} element, or if it
3104 contains a @code{setStyle} whose @code{target} references a
3105 @code{majorTicks} element, or if it contains a @code{setFrameStyle} element,
3106 the @code{setCellProperties} sets the style for row or column labels.
3107 In this case, the @code{setCellProperties} always contains a single
3108 @code{where} element whose @code{variable} designates the variable
3109 whose labels are to be styled. The format from the @code{setFormat},
3110 if present, replaces the labels' format. The style from the
3111 @code{setStyle} that references @code{majorTicks}, if present,
3112 replaces the labels' font and cell styles, except that the background
3113 color is taken instead from the @code{setFrameStyle}'s style, if
3116 When @code{setCellProperties} contains a @code{setStyle} whose
3117 @code{target} references a @code{graph} element, and one that
3118 references a @code{labeling} element, and the @code{union} element
3119 contains @code{alternating}, the @code{setCellProperties} sets the
3120 alternate foreground and background colors for the data area. The
3121 foreground color is taken from the style referenced by the
3122 @code{setStyle} that targets the @code{graph}, the background color
3123 from the @code{setStyle} for @code{labeling}.
3125 A reader may ignore a @code{setCellProperties} that only contains
3126 @code{setMetaData}, as well as @code{setMetaData} within other
3127 @code{setCellProperties}.
3129 A reader may ignore a @code{setCellProperties} whose only @code{set*}
3130 child is a @code{setStyle} that targets the @code{graph} element.
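
The rules above can be collected into a small classifier. This Python sketch is hypothetical (the function names and return values are not part of the format). It assumes a @code{kinds} mapping from element @code{id}s to tag names, built by scanning the visualization, and it treats any combination that the rules do not cover as ignorable.

```python
import xml.etree.ElementTree as ET

# Children that can affect styling; setMetaData is deliberately absent.
STYLING_TAGS = ("setStyle", "setFrameStyle", "setFormat")


def target_kinds(children, tag, kinds):
    """Tag names of the elements targeted by children with the given tag."""
    return [kinds.get(c.get("target")) for c in children if c.tag == tag]


def classify_set_cell_properties(scp, kinds):
    """Return 'alternating', 'cells', 'labels', or 'ignore'.

    scp is a parsed setCellProperties element; kinds maps element ids
    to tag names such as "labeling", "graph", "interval", "majorTicks".
    """
    sets = [c for c in scp if c.tag in STYLING_TAGS]
    styles = target_kinds(sets, "setStyle", kinds)
    formats = target_kinds(sets, "setFormat", kinds)
    has_frame = any(c.tag == "setFrameStyle" for c in sets)
    union = scp.find("union")
    alternating = union is not None and union.find(".//alternating") is not None

    # Only setMetaData, or a lone setStyle that targets the graph.
    if not sets or (len(sets) == 1 and styles == ["graph"]):
        return "ignore"
    if alternating and "graph" in styles and "labeling" in styles:
        return "alternating"
    if "labeling" in formats or "labeling" in styles or "interval" in styles:
        return "cells"
    if "majorTicks" in formats or "majorTicks" in styles or has_frame:
        return "labels"
    return "ignore"  # not covered by the rules above


KINDS = dict(lab="labeling", g="graph", iv="interval", mt="majorTicks")
EXAMPLE = ('<setCellProperties><setStyle target="g" style="s1"/>'
           '<setStyle target="lab" style="s2"/>'
           '<union><intersect><alternating/></intersect></union>'
           '</setCellProperties>')
print(classify_set_cell_properties(ET.fromstring(EXAMPLE), KINDS))  # alternating
```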
3132 @subsubheading The @code{setStyle} Element
3136 :target=ref (labeling | graph | interval | majorTicks)
3141 This element associates a style with the target.
3143 @defvr {Attribute} target
3144 The @code{id} of an element whose style is to be set.
3147 @defvr {Attribute} style
3148 The @code{id} of a @code{style} element that identifies the style to
3152 @node SPV Detail setFormat Element
3153 @subsection The @code{setFormat} Element
3157 :target=ref (majorTicks | labeling)
3159 => format | numberFormat | stringFormat+ | dateTimeFormat | elapsedTimeFormat
3162 This element sets the format of the target, ``format'' in this case
3163 meaning the SPSS print format for a variable.
3165 The details of this element vary depending on the schema version, as
3166 declared in the root @code{visualization} element's @code{version}
3167 attribute (@pxref{SPV Detail visualization Element}). A reader can
3168 interpret the content without knowing the schema version.
3170 The @code{setFormat} element itself has the following attributes.
3172 @defvr {Attribute} target
3173 Refers to an element whose format is to be set.
3176 @defvr {Attribute} reset
3177 If this is @code{true}, this format replaces the target's previous
3178 format. If it is @code{false}, it modifies the previous format.
3182 * SPV Detail numberFormat Element::
3183 * SPV Detail stringFormat Element::
3184 * SPV Detail dateTimeFormat Element::
3185 * SPV Detail elapsedTimeFormat Element::
3186 * SPV Detail format Element::
3187 * SPV Detail affix Element::
3190 @node SPV Detail numberFormat Element
3191 @subsubsection The @code{numberFormat} Element
3195 :minimumIntegerDigits=int?
3196 :maximumFractionDigits=int?
3197 :minimumFractionDigits=int?
3199 :scientific=(onlyForSmall | whenNeeded | true | false)?
3206 Specifies a format for displaying a number. The available options are
3207 a superset of those available from PSPP print formats. PSPP chooses a
3208 print format type for a @code{numberFormat} as follows:
3212 If @code{scientific} is @code{true}, uses @code{E} format.
3215 If @code{prefix} is @code{$}, uses @code{DOLLAR} format.
3218 If @code{suffix} is @code{%}, uses @code{PCT} format.
3221 If @code{useGrouping} is @code{true}, uses @code{COMMA} format.
3224 Otherwise, uses @code{F} format.
3227 For translating to a print format, PSPP uses
3228 @code{maximumFractionDigits} as the number of decimals, unless that
3229 attribute is missing or out of the range [0,15], in which case it uses
3232 @defvr {Attribute} minimumIntegerDigits
3233 Minimum number of digits to display before the decimal point. Always
3234 observed as @code{0}.
3237 @defvr {Attribute} maximumFractionDigits
3238 @defvrx {Attribute} minimumFractionDigits
3239 Maximum or minimum, respectively, number of digits to display after
3240 the decimal point. The observed values of each attribute range from 0
3244 @defvr {Attribute} useGrouping
3245 Whether to use the grouping character to group digits in large
3249 @defvr {Attribute} scientific
3250 This attribute controls when and whether the number is formatted in
3251 scientific notation. It takes the following values:
3255 Use scientific notation only when the number's magnitude is smaller
3256 than the value of the @code{small} attribute.
3259 Use scientific notation when the number will not otherwise fit in the
3263 Always use scientific notation. Not observed in the corpus.
3266 Never use scientific notation. A number that won't otherwise fit will
3267 be replaced by an error indication (see the @code{errorCharacter}
3268 attribute). Not observed in the corpus.
3272 @defvr {Attribute} small
3273 Only present when the @code{scientific} attribute is
3274 @code{onlyForSmall}, this is a numeric magnitude below which the
3275 number will be formatted in scientific notation. The values @code{0}
3276 and @code{0.0001} have been observed. The value @code{0} seems like a
3277 pathological choice, since no real number has a magnitude less than 0;
3278 perhaps in practice such a choice is equivalent to setting
3279 @code{scientific} to @code{false}.
3282 @defvr {Attribute} prefix
3283 @defvrx {Attribute} suffix
3284 Specifies a prefix or a suffix to apply to the formatted number. Only
3285 @code{suffix} has been observed, with value @samp{%}.
3288 @node SPV Detail stringFormat Element
3289 @subsubsection The @code{stringFormat} Element
3292 stringFormat => relabel* affix*
3294 relabel :from=real :to => EMPTY
3297 The @code{stringFormat} element specifies how to display a string. By
3298 default, a string is displayed verbatim, but @code{relabel} can change
3301 The @code{relabel} element appears as a child of @code{stringFormat}
3302 (and of @code{format}, when it is used to format strings). It
3303 specifies how to display a given value. It is used to implement value
3304 labels and to display the system-missing value in a human-readable
3305 way. It has the following attributes:
3307 @defvr {Attribute} from
3308 The value to map. In the corpus this is an integer or the
3309 system-missing value @code{-1.797693134862316E300}.
3312 @defvr {Attribute} to
3313 The string to display in place of the value of @code{from}. In the
3314 corpus this is a wide variety of value labels; the system-missing
3315 value is mapped to @samp{.}.
3318 @node SPV Detail dateTimeFormat Element
3319 @subsubsection The @code{dateTimeFormat} Element
3323 :baseFormat[dt_base_format]=(date | time | dateTime)
3325 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3327 :yearAbbreviation=bool?
3332 :monthFormat=(long | short | number | paddedNumber)?
3336 :showDayOfWeek=bool?
3337 :dayOfWeekAbbreviation=bool?
3339 :dayOfMonthPadding=bool?
3341 :minutePadding=bool?
3342 :secondPadding=bool?
3348 :dayType=(month | year)?
3349 :hourFormat=(AMPM | AS_24 | AS_12)?
3353 This element appears only in schema version 2.5 and earlier
3354 (@pxref{SPV Detail visualization Element}).
3356 Data to be formatted in date formats is stored as strings in legacy
3357 data, in the format @code{yyyy-mm-ddTHH:MM:SS.SSS}, and must be parsed
3358 and reformatted by the reader.
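
A minimal Python sketch of the parsing step follows, assuming the string has the fixed layout shown above. The month/day/year output, which resembles @code{ADATE}, is only an illustration; a real reader would format the parsed value according to the print format chosen by the rules later in this section. The function names are hypothetical.

```python
from datetime import datetime

# Layout of legacy date strings, e.g. 2019-09-15T22:15:30.250.
LEGACY_DATETIME = "%Y-%m-%dT%H:%M:%S.%f"


def parse_legacy_datetime(text):
    """Parse a legacy date string into a datetime.

    %f accepts the three fractional digits and scales them to
    microseconds, so ".250" becomes 250000 microseconds.
    """
    return datetime.strptime(text, LEGACY_DATETIME)


def as_month_day_year(text):
    """Reformat in an ADATE-like month/day/year layout (illustrative only)."""
    return parse_legacy_datetime(text).strftime("%m/%d/%Y")


print(as_month_day_year("2019-09-15T22:15:30.250"))  # 09/15/2019
```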
3360 The following attribute is required.
3362 @defvr {Attribute} baseFormat
3363 Specifies whether a date and time are both to be displayed, or just
3367 Many of the attributes' meanings are obvious. The following seem to
3368 be worth documenting.
3370 @defvr {Attribute} separatorChars
3371 Exactly four characters. In order, these are used for: decimal point,
3372 grouping, date separator, time separator. Always @samp{.,-:}.
3375 @defvr {Attribute} mdyOrder
3376 Within a date, the order of the days, months, and years.
3377 @code{dayMonthYear} is the only observed value, but one would expect
3378 @code{monthDayYear} and @code{yearMonthDay} to be reasonable as
3382 @defvr {Attribute} showYear
3383 @defvrx {Attribute} yearAbbreviation
3384 Whether to include the year and, if so, whether the year should be
3385 shown abbreviated, that is, with only 2 digits. Each is @code{true}
3386 or @code{false}; only values of @code{true} and @code{false},
3387 respectively, have been observed.
3390 @defvr {Attribute} showMonth
3391 @defvrx {Attribute} monthFormat
3392 Whether to include the month (@code{true} or @code{false}) and, if so,
3393 how to format it. @code{monthFormat} is one of the following:
3397 The full name of the month, e.g.@: in an English locale,
3401 The abbreviated name of the month, e.g.@: in an English locale,
3405 The number representing the month, e.g.@: 9 for September.
3408 A two-digit number representing the month, e.g.@: 09 for September.
3411 Only values of @code{true} and @code{short}, respectively, have been
3415 @defvr {Attribute} dayType
3416 This attribute is always @code{month} in the corpus, specifying that
3417 the day of the month is to be displayed; a value of @code{year} is
3418 supposed to indicate that the day of the year, where 1 is January 1,
3419 is to be displayed instead.
3422 @defvr {Attribute} hourFormat
3423 @code{hourFormat}, if present, is one of:
3427 The time is displayed with an @code{am} or @code{pm} suffix, e.g.@:
3431 The time is displayed in a 24-hour format, e.g.@: @code{22:15}.
3433 This is the only value observed in the corpus.
3436 The time is displayed in a 12-hour format, without distinguishing
3437 morning or evening, e.g.@: @code{10:15}.
3440 @code{hourFormat} is sometimes present for @code{elapsedTime} formats,
3441 which is confusing since a time duration does not have a concept of AM
3442 or PM. This might indicate a bug in the code that generated the XML
3443 in the corpus, or it might indicate that @code{elapsedTime} is
3444 sometimes used to format a time of day.
3447 For a @code{baseFormat} of @code{date}, PSPP chooses a print format
3448 type based on the following rules:
3452 If @code{showQuarter} is true: @code{QYR}.
3455 Otherwise, if @code{showWeek} is true: @code{WKYR}.
3458 Otherwise, if @code{mdyOrder} is @code{dayMonthYear}:
3462 If @code{monthFormat} is @code{number} or @code{paddedNumber}: @code{EDATE}.
3465 Otherwise: @code{DATE}.
3469 Otherwise, if @code{mdyOrder} is @code{yearMonthDay}: @code{SDATE}.
3472 Otherwise, @code{ADATE}.
3475 For a @code{baseFormat} of @code{dateTime}, PSPP uses @code{YMDHMS} if
3476 @code{mdyOrder} is @code{yearMonthDay} and @code{DATETIME} otherwise.
3477 For a @code{baseFormat} of @code{time}, PSPP uses @code{DTIME} if
3478 @code{showDay} is true, otherwise @code{TIME} if @code{showHour} is
3479 true, otherwise @code{MTIME}.
3481 For a @code{baseFormat} of @code{date}, the chosen width is the
3482 minimum for the format type, adding 2 if @code{yearAbbreviation} is
3483 false or omitted. For other base formats, the chosen width is the
3484 minimum for its type, plus 3 if @code{showSecond} is true, plus 4 more
3485 if @code{showMillis} is also true. Decimals are 0 by default, or 3
3486 if @code{showMillis} is true.
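
The choice of print format type, width, and decimals can be sketched in Python. The minimum widths in @code{MIN_WIDTH} are assumptions for illustration and should be checked against PSPP's own format definitions. Boolean attributes are assumed to arrive as the strings @code{true} and @code{false}, and the sketch applies the decimals rule to every base format.

```python
# Assumed minimum widths of PSPP print format types (verify before use).
MIN_WIDTH = dict(QYR=6, WKYR=8, DATE=9, EDATE=8, SDATE=8, ADATE=8,
                 YMDHMS=16, DATETIME=17, DTIME=8, TIME=5, MTIME=5)


def is_true(attrs, name):
    """Treat a missing boolean attribute as false."""
    return attrs.get(name) == "true"


def date_time_print_format(attrs):
    """Return (type, width, decimals) for dateTimeFormat attributes."""
    base = attrs["baseFormat"]
    order = attrs.get("mdyOrder")
    decimals = 3 if is_true(attrs, "showMillis") else 0

    if base == "date":
        if is_true(attrs, "showQuarter"):
            fmt = "QYR"
        elif is_true(attrs, "showWeek"):
            fmt = "WKYR"
        elif order == "dayMonthYear":
            numeric_month = attrs.get("monthFormat") in ("number", "paddedNumber")
            fmt = "EDATE" if numeric_month else "DATE"
        elif order == "yearMonthDay":
            fmt = "SDATE"
        else:
            fmt = "ADATE"
        width = MIN_WIDTH[fmt]
        if not is_true(attrs, "yearAbbreviation"):
            width += 2  # room for a four-digit year
        return fmt, width, decimals

    if base == "dateTime":
        fmt = "YMDHMS" if order == "yearMonthDay" else "DATETIME"
    elif is_true(attrs, "showDay"):
        fmt = "DTIME"
    elif is_true(attrs, "showHour"):
        fmt = "TIME"
    else:
        fmt = "MTIME"
    width = MIN_WIDTH[fmt]
    if is_true(attrs, "showSecond"):
        width += 3
        if is_true(attrs, "showMillis"):
            width += 4
    return fmt, width, decimals
```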
3488 @node SPV Detail elapsedTimeFormat Element
3489 @subsubsection The @code{elapsedTimeFormat} Element
3493 :baseFormat[dt_base_format]=(date | time | dateTime)
3496 :minutePadding=bool?
3497 :secondPadding=bool?
3507 This element specifies the way to display a time duration.
3509 Data to be formatted in elapsed time formats is stored as strings in
3510 legacy data, in the format @code{H:MM:SS.SSS}, with additional hour
3511 digits as needed for long durations, and must be parsed and
3512 reformatted by the reader.
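
Because the hour field can have any number of digits, a fixed-width pattern cannot parse these strings. The following minimal Python sketch splits on colons instead; it assumes only the first three fractional digits of the seconds field matter (any extra digits are truncated). The function name is hypothetical.

```python
from datetime import timedelta


def parse_legacy_elapsed(text):
    """Parse a legacy elapsed-time string such as '123:04:05.678'."""
    hours, minutes, seconds = text.split(":")
    whole, _, fraction = seconds.partition(".")
    # Keep millisecond precision: pad short fractions, truncate long ones.
    millis = int(fraction.ljust(3, "0")[:3]) if fraction else 0
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(whole), milliseconds=millis)


print(parse_legacy_elapsed("1:02:03.500"))  # 1:02:03.500000
```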
3514 The following attribute is required.
3516 @defvr {Attribute} baseFormat
3517 Specifies whether a day and a time are both to be displayed, or just
3521 The remaining attributes specify exactly how to display the elapsed
3524 For @code{baseFormat} of @code{time}, PSPP converts this element to
3525 print format type @code{DTIME}; otherwise, if @code{showHour} is true,
3526 to @code{TIME}; otherwise, to @code{MTIME}. The chosen width is the
3527 minimum for the chosen type, adding 3 if @code{showSecond} is true,
3528 adding 4 more if @code{showMillis} is also true. Decimals are 0 by
3529 default, or 3 if @code{showMillis} is true.
3531 @node SPV Detail format Element
3532 @subsubsection The @code{format} Element
3536 :baseFormat[f_base_format]=(date | time | dateTime | elapsedTime)?
3539 :mdyOrder=(dayMonthYear | monthDayYear | yearMonthDay)?
3544 :yearAbbreviation=bool?
3546 :monthFormat=(long | short | number | paddedNumber)?
3548 :dayOfMonthPadding=bool?
3552 :showDayOfWeek=bool?
3553 :dayOfWeekAbbreviation=bool?
3555 :minutePadding=bool?
3556 :secondPadding=bool?
3562 :dayType=(month | year)?
3563 :hourFormat=(AMPM | AS_24 | AS_12)?
3564 :minimumIntegerDigits=int?
3565 :maximumFractionDigits=int?
3566 :minimumFractionDigits=int?
3568 :scientific=(onlyForSmall | whenNeeded | true | false)?
3572 :tryStringsAsNumbers=bool?
3573 :negativesOutside=bool?
3577 This element is the union of all of the more-specific format elements.
3578 It is interpreted in the same way as one of those format elements,
3579 using @code{baseFormat} to determine which kind of format to use.
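
The dispatch on @code{baseFormat} can be sketched in Python. This sketch assumes that a @code{format} with no @code{baseFormat} behaves like @code{numberFormat} and applies the @code{numberFormat} print format rules in that case. It does not distinguish a @code{format} used for strings, which relies on its @code{relabel} children instead. The fallback number of decimals is passed in as a parameter rather than hard-coded, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DATE_TIME_BASES = ("date", "time", "dateTime")


def format_kind(format_elem):
    """Name the more-specific element that a format element behaves like."""
    base = format_elem.get("baseFormat")
    if base in DATE_TIME_BASES:
        return "dateTimeFormat"
    if base == "elapsedTime":
        return "elapsedTimeFormat"
    return "numberFormat"


def number_print_format(attrs, fallback_decimals):
    """Choose (type, decimals) following the numberFormat rules."""
    if attrs.get("scientific") == "true":
        fmt = "E"
    elif attrs.get("prefix") == "$":
        fmt = "DOLLAR"
    elif attrs.get("suffix") == "%":
        fmt = "PCT"
    elif attrs.get("useGrouping") == "true":
        fmt = "COMMA"
    else:
        fmt = "F"
    try:
        decimals = int(attrs["maximumFractionDigits"])
    except (KeyError, ValueError):
        decimals = None  # attribute missing or not an integer
    if decimals is None or not 0 <= decimals <= 15:
        decimals = fallback_decimals
    return fmt, decimals


elem = ET.fromstring('<format maximumFractionDigits="3" useGrouping="true"/>')
# fallback_decimals is a placeholder; use the fallback described earlier.
print(format_kind(elem), number_print_format(elem.attrib, fallback_decimals=2))
# numberFormat ('COMMA', 3)
```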
3581 There are a few attributes not present in the more specific formats:
3583 @defvr {Attribute} tryStringsAsNumbers
3584 When this is @code{true}, it is supposed to indicate that string
3585 values should be parsed as numbers and then displayed according to
3586 numeric formatting rules. However, in the corpus it is always
3590 @defvr {Attribute} negativesOutside
3591 If true, the negative sign should be shown before the prefix; if
3592 false, it should be shown after.
3595 @node SPV Detail affix Element
3596 @subsubsection The @code{affix} Element
3600 :definesReference=int
3601 :position=(subscript | superscript)
3607 This defines a suffix (or, theoretically, a prefix) for a formatted
3608 value. It is used to insert a reference to a footnote. It has the
3609 following attributes:
3611 @defvr {Attribute} definesReference
3612 This specifies the footnote number as a natural number: 1 for the
3613 first footnote, 2 for the second, and so on.
3616 @defvr {Attribute} position
3617 Position for the footnote label. Always @code{superscript}.
3620 @defvr {Attribute} suffix
3621 Whether the affix is a suffix (@code{true}) or a prefix
3622 (@code{false}). Always @code{true}.
3625 @defvr {Attribute} value
3626 The text of the suffix or prefix. Typically a letter, e.g.@: @code{a}
3627 for footnote 1, @code{b} for footnote 2, @enddots{} The corpus
3628 contains other values: @code{*}, @code{**}, and a few that begin with
3629 at least one comma: @code{,b}, @code{,c}, @code{,,b}, and @code{,,c}.
3632 @node SPV Detail interval Element
3633 @subsection The @code{interval} Element
3636 interval :style=ref style => labeling footnotes?
3640 :variable=ref (sourceVariable | derivedVariable)
3641 => (formatting | format | footnotes)*
3643 formatting :variable=ref (sourceVariable | derivedVariable) => formatMapping*
3645 formatMapping :from=int => format?
3649 :variable=ref (sourceVariable | derivedVariable)
3652 footnoteMapping :definesReference=int :from=int :to => EMPTY
3655 The @code{interval} element and its descendants determine the basic
3656 formatting and labeling for the table's cells. These basic styles are
3657 overridden by more specific styles set using @code{setCellProperties}
3658 (@pxref{SPV Detail setCellProperties Element}).
3660 The @code{style} attribute of @code{interval} itself may be ignored.
3662 The @code{labeling} element may have a single @code{formatting} child.
3663 If present, its @code{variable} attribute refers to a variable whose
3664 values are format specifiers as numbers, e.g.@: value 0x050802 for F8.2.
3665 However, the numbers are not actually interpreted that way. Instead,
3666 each number present in the variable's data is mapped by a
3667 @code{formatMapping} child of @code{formatting} to a @code{format}
3668 that specifies how to display it.
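
A minimal Python sketch of the lookup follows, assuming the @code{labeling} element has been parsed with ElementTree and that values of the formatting variable arrive as numbers. The helper names are hypothetical. The @code{from} value serves only as a key; the sketch does not decode it into a format type, width, and decimals.

```python
import xml.etree.ElementTree as ET


def format_mappings(labeling):
    """Map each formatMapping 'from' value to its format child (or None)."""
    mappings = dict()
    formatting = labeling.find("formatting")
    if formatting is None:
        return mappings
    for mapping in formatting.findall("formatMapping"):
        # The number is only a lookup key; its bytes are not interpreted.
        mappings[int(mapping.get("from"))] = mapping.find("format")
    return mappings


def format_for_cell(mappings, value):
    """Find the format for one value of the formatting variable."""
    return mappings.get(int(value))


EXAMPLE = ('<labeling variable="v"><formatting variable="f">'
           '<formatMapping from="329730"><format maximumFractionDigits="2"/>'
           '</formatMapping><formatMapping from="1"/></formatting></labeling>')
found = format_for_cell(format_mappings(ET.fromstring(EXAMPLE)), 329730.0)
print(found.get("maximumFractionDigits"))  # 2
```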
3670 The @code{labeling} element may also have a @code{footnotes} child
3671 element. The @code{variable} attribute of this element refers to a
3672 variable whose values are comma-delimited strings that list the
3673 1-based indexes of footnote references. (Cells without any footnote
3674 references are numeric 0 instead of strings.)
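
A Python sketch of decoding one cell's value from the footnotes variable, assuming string values arrive as Python @code{str} and numeric values as numbers. The function name is hypothetical.

```python
def footnote_indexes(value):
    """Return the 1-based footnote indexes referenced by one cell.

    value is a comma-delimited string such as '1,3', or the number 0
    for a cell without footnote references.
    """
    if not isinstance(value, str):
        return []  # numeric 0: no footnote references
    return [int(part) for part in value.split(",") if part.strip()]


print(footnote_indexes("1,3"))  # [1, 3]
```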
3676 Each @code{footnoteMapping} child of the @code{footnotes} element
3677 defines the footnote marker to be its @code{to} attribute text for the
3678 footnote whose 1-based index is given in its @code{definesReference}
3681 @node SPV Detail style Element
3682 @subsection The @code{style} Element
3689 :border-bottom=(solid | thick | thin | double | none)?
3690 :border-top=(solid | thick | thin | double | none)?
3691 :border-left=(solid | thick | thin | double | none)?
3692 :border-right=(solid | thick | thin | double | none)?
3693 :border-bottom-color?
3696 :border-right-color?
3699 :font-weight=(regular | bold)?
3700 :font-style=(regular | italic)?
3701 :font-underline=(none | underline)?
3702 :margin-bottom=dimension?
3703 :margin-left=dimension?
3704 :margin-right=dimension?
3705 :margin-top=dimension?
3706 :textAlignment=(left | right | center | decimal | mixed)?
3707 :labelLocationHorizontal=(positive | negative | center)?
3708 :labelLocationVertical=(positive | negative | center)?
3709 :decimal-offset=dimension?
3716 A @code{style} element has an effect only when it is referenced by
3717 another element to set some aspect of the table's style. Most of the
3718 attributes are self-explanatory. The rest are described below.
3720 @defvr {Attribute} {color}
3721 In some cases, the text color; in others, the background color.
3724 @defvr {Attribute} {color2}
3728 @defvr {Attribute} {labelAngle}
3729 Normally 0. The value -90 causes inner column or outer row labels to
3730 be rotated vertically.
3733 @defvr {Attribute} {labelLocationHorizontal}
3737 @defvr {Attribute} {labelLocationVertical}
3738 The value @code{positive} corresponds to vertically aligning text to
3739 the top of a cell, @code{negative} to the bottom, @code{center} to the
3743 @node SPV Detail labelFrame Element
3744 @subsection The @code{labelFrame} Element
3747 labelFrame :style=ref style => location+ label? paragraph?
3749 paragraph :hangingIndent=dimension? => EMPTY
3752 A @code{labelFrame} element specifies content and style for some
3753 aspect of a table. Only @code{labelFrame} elements that have a
3754 @code{label} child are important. The @code{purpose} attribute in the
3755 @code{label} determines what the @code{labelFrame} affects:
3759 The table's title and its style.
3762 The table's caption and its style.
3765 The table's footnotes and the style for the footer area.
3768 The style for the layer area.
3774 The @code{style} attribute references the style to use for the area.
3776 The @code{label}, if present, specifies the text to put into the title
3777 or caption or footnotes. For footnotes, the label has two @code{text}
3778 children for every footnote, each of which has a @code{usesReference}
3779 attribute identifying the 1-based index of a footnote. The first,
3780 third, fifth, @dots{} @code{text} child specifies the content for a
3781 footnote; the second, fourth, sixth, @dots{} child specifies the
3782 marker. Content tends to end in a new-line, which the reader may wish
3783 to trim; similarly, markers tend to end in @samp{.}.
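
A minimal Python sketch that pairs the @code{text} children of a footnote @code{label} and applies the trimming described above. The function name is hypothetical, and the sketch assumes that every content child is immediately followed by its marker child.

```python
import xml.etree.ElementTree as ET


def footnotes_from_label(label):
    """Return (index, content, marker) tuples for a footnote label.

    index is the 1-based footnote index taken from usesReference.
    """
    texts = label.findall("text")
    notes = []
    # The 1st, 3rd, 5th, ... children hold content; each following
    # child holds the marker for the same footnote.
    for content, marker in zip(texts[0::2], texts[1::2]):
        notes.append((int(content.get("usesReference")),
                      (content.text or "").rstrip("\n"),
                      (marker.text or "").rstrip(".")))
    return notes


EXAMPLE = ('<label purpose="footnote">'
           '<text usesReference="1">First note.\n</text>'
           '<text usesReference="1">a.</text>'
           '<text usesReference="2">Second.\n</text>'
           '<text usesReference="2">b.</text></label>')
print(footnotes_from_label(ET.fromstring(EXAMPLE)))
# [(1, 'First note.', 'a'), (2, 'Second.', 'b')]
```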
3785 The @code{paragraph}, if present, may be ignored, since it is always
3788 @node SPV Detail Legacy Properties
3789 @subsection Legacy Properties
3791 The detail XML format has features for styling most of the aspects of
3792 a table. It also inherits defaults for many aspects from structure
3793 XML, which has the following @code{tableProperties} element:
3798 => generalProperties footnoteProperties cellFormatProperties borderProperties printingProperties
3801 :hideEmptyRows=bool?
3802 :maximumColumnWidth=dimension?
3803 :maximumRowWidth=dimension?
3804 :minimumColumnWidth=dimension?
3805 :minimumRowWidth=dimension?
3806 :rowDimensionLabels=(inCorner | nested)?
3810 :markerPosition=(superscript | subscript)?
3811 :numberFormat=(alphabetic | numeric)?
3814 cellFormatProperties => cell_style+
3817 :alternatingColor=color?
3818 :alternatingTextColor=color?
3826 :font-style=(regular | italic)?
3827 :font-weight=(regular | bold)?
3828 :font-underline=(none | underline)?
3829 :labelLocationVertical=(positive | negative | center)?
3830 :margin-bottom=dimension?
3831 :margin-left=dimension?
3832 :margin-right=dimension?
3833 :margin-top=dimension?
3834 :textAlignment=(left | right | center | decimal | mixed)?
3835 :decimal-offset=dimension?
3838 borderProperties => border_style+
3841 :borderStyleType=(none | solid | dashed | thick | thin | double)?
3846 :printAllLayers=bool?
3847 :rescaleLongTableToFitPage=bool?
3848 :rescaleWideTableToFitPage=bool?
3849 :windowOrphanLines=int?
3851 :continuationTextAtBottom=bool?
3852 :continuationTextAtTop=bool?
3853 :printEachLayerOnSeparatePage=bool?
3857 The @code{name} attribute appears only in standalone @file{.stt} files
3858 (@pxref{SPSS TableLook STT Format}).