Xapian itself doesn't put any restrictions on the contents of a term, other
than that terms can't be empty, and there's an upper limit on the length
(which is backend dependent - chert and glass allow 245 bytes, except
that zero bytes count double in this length).

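The length rule can be checked before a term is added to a document. The
following is only a sketch in plain C++, not part of the Xapian API: the names
``kMaxTermLength``, ``effective_term_length`` and ``is_valid_term`` are made up
for illustration, and the limit of 245 is the chert and glass figure quoted
above.

```cpp
#include <cstddef>
#include <string>

// Assumed limit for the chert and glass backends, taken from the text above.
constexpr std::size_t kMaxTermLength = 245;

// Length of a term as counted against the limit: every byte counts once,
// except zero bytes, which count twice.
std::size_t effective_term_length(const std::string& term) {
    std::size_t length = 0;
    for (char ch : term) {
        length += (ch == '\0') ? 2 : 1;
    }
    return length;
}

// True if the term is non-empty and fits within the assumed limit.
bool is_valid_term(const std::string& term) {
    return !term.empty() && effective_term_length(term) <= kMaxTermLength;
}
```

A check like this is mostly useful for terms built from external data (such as
URLs or pathnames), where the length isn't known in advance.
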
However, Omega and ``Xapian::QueryParser`` impose some rules to aid
interoperability and make it easier to write code that doesn't require
excessive configuring.  It's probably wise to follow these rules unless
you have a good reason not to.  Right now you might not intend to use Omega
or the QueryParser, nor to combine a search with another database.  But if
you later find you do, it'll be much easier if you're using compatible
rules.

The basic idea is that terms won't begin with a capital letter (since they're
usually lower-cased and often stemmed), so any term which starts with a capital
letter is assumed to have a prefix.  For all letters apart from X, this is a
single character prefix and these have predefined standard meanings (or are
reserved for standard meanings but currently unallocated).

X starts a multi-capital letter user-defined prefix.  If you want a prefix for
something without a standard prefix, you create your own starting with an X
(e.g. XSHOESIZE).  The prefix ends with the first non-capital.  If the term
you're prefixing starts with a capital letter or ":", add a ":" between prefix
and term to resolve ambiguity about where the prefix ends and the term begins.

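As an illustration of the colon rule, here is a small sketch in plain C++ which
builds a prefixed term and splits it back apart.  It follows the rule as worded
above and is not taken from Xapian or Omega; the helper names ``make_term`` and
``split_term`` are hypothetical, and only ASCII capitals are considered.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// ASCII-only check; std::isupper is avoided because it depends on the locale.
bool is_ascii_upper(char ch) {
    return ch >= 'A' && ch <= 'Z';
}

// Join a prefix and a value, inserting ':' when the value starts with a
// capital letter or ':' so the end of the prefix stays unambiguous.
std::string make_term(const std::string& prefix, const std::string& value) {
    if (!value.empty() && (is_ascii_upper(value[0]) || value[0] == ':')) {
        return prefix + ':' + value;
    }
    return prefix + value;
}

// Split a term into (prefix, value).  An 'X' prefix runs up to the first
// non-capital; any other capital is a one-character prefix.
std::pair<std::string, std::string> split_term(const std::string& term) {
    std::size_t end = 0;
    if (!term.empty() && is_ascii_upper(term[0])) {
        end = 1;
        if (term[0] == 'X') {
            while (end < term.size() && is_ascii_upper(term[end])) ++end;
        }
    }
    std::size_t value_start = end;
    // Skip the separator added by make_term(), if present.
    if (end > 0 && value_start < term.size() && term[value_start] == ':') {
        ++value_start;
    }
    return {term.substr(0, end), term.substr(value_start)};
}
```

With these helpers, ``make_term("XM", "Stone")`` gives ``XM:Stone``, while
``make_term("XM", "stone")`` gives ``XMstone``, which is why lowercasing field
contents avoids the need for the colon.
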
Here's the current allocation list:

B
        Topic (mnemonic: what the document is aBout)
D
        Date (numeric format: YYYYMMDD or "latest" - e.g. D20050224 or Dlatest)
E
        Extension (folded to lowercase - e.g. Ehtml, or E for no extension)
G
        newsGroup (or similar entity - e.g. a web forum name)
I
        boolean filter term for "can see" permission (mnemonic: Include)
J
        Site term (mnemonic: Jumping off point)
M
        Month (numeric format: YYYYMM)
N
        ISO couNtry code (or domaiN name)
R
        Raw (i.e. unstemmed) term (unused by Xapian since 1.0.0)
U
        full URL of indexed document - if the resulting term would be > 240
        bytes, a hashing scheme is used to prevent overflowing
        the Xapian term length limit (see omindex for how to do this).
V
        boolean filter term for "can't see" permission (mnemonic: grep -v)
X
        longer prefix for user-defined use

Reserved but currently unallocated: CW

There are two main uses for prefixes - boolean filters and free-text fields.

If the documents being indexed describe objects in a museum, you might
have a 'material' field, which records what each object is primarily made of.
So a sundial might be 'material=Stone', a letter might be 'material=paper',
etc.  There's no standard prefix for 'material', so you might allocate ``XM``.
If you lowercase the field contents, you can avoid having to add a colon to
separate the prefix and content, so documents would be indexed by terms such as
``XMstone`` or ``XMpaper``.

If you're indexing using scriptindex, and have a field in the input file
such as "material=Stone", then your index script would have a rule
such as::

    material : lower boolean=XM

You can then restrict a search in Omega by passing a B parameter with one
of these as the value, e.g. ``B=XMstone``.

In your HTML search form, you can allow the user to select this using a set of
radio buttons::

    <input type="radio" name="B" value=""> Any<br>
    <input type="radio" name="B" value="XMpaper"> Paper<br>
    <input type="radio" name="B" value="XMstone"> Stone<br>

If you want to have multiple sets of radio buttons for selecting different
boolean filters, you can make use of Omega's preprocessing of CGI parameter
names by calling them "B 1", "B 2", etc. (names are truncated at the first
space - see `cgiparams.html <cgiparams.html>`_ for full details).

You can also use a select tag::

    <select name="B">
    <option value="">Any</option>
    <option value="XMpaper">Paper</option>
    <option value="XMstone">Stone</option>
    </select>

Or if you want the user to be able to select more than one material to filter
by, you can use checkboxes instead of radio buttons::

    <input type="checkbox" name="B" value="XMpaper"> Paper<br>
    <input type="checkbox" name="B" value="XMstone"> Stone<br>

Or a multiple select::

    <select multiple name="B">
    <option value="XMpaper">Paper</option>
    <option value="XMstone">Stone</option>
    </select>

These will work in the natural way - if no materials are selected, then no
filtering by material will happen; if multiple materials are selected, then
items made of any of the materials will match (in detail, groups of filter
terms with the same prefix are combined with ``OP_OR``; then these groups
are combined with ``OP_AND``).

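The grouping can be pictured with a short sketch in plain C++.  This is not
Omega's code: ``term_prefix`` and ``describe_filter`` are hypothetical helpers
which just render the resulting boolean expression as text, assuming prefixes
follow the rules described earlier.  Empty values (such as the "Any" option)
are ignored.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Prefix of a term: one capital letter, or 'X' plus any further capitals.
std::string term_prefix(const std::string& term) {
    auto is_upper = [](char ch) { return ch >= 'A' && ch <= 'Z'; };
    if (term.empty() || !is_upper(term[0])) return std::string();
    std::size_t end = 1;
    if (term[0] == 'X') {
        while (end < term.size() && is_upper(term[end])) ++end;
    }
    return term.substr(0, end);
}

// Render the filter as text: OR within a prefix group, AND between groups.
std::string describe_filter(const std::vector<std::string>& filter_terms) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const std::string& term : filter_terms) {
        if (!term.empty()) groups[term_prefix(term)].push_back(term);
    }
    std::string result;
    for (const auto& group : groups) {
        if (!result.empty()) result += " AND ";
        result += '(';
        for (std::size_t i = 0; i < group.second.size(); ++i) {
            if (i > 0) result += " OR ";
            result += group.second[i];
        }
        result += ')';
    }
    return result;
}
```

For the ``B`` values ``XMpaper``, ``XMstone`` and ``Ehtml`` this renders
``(Ehtml) AND (XMpaper OR XMstone)``; the groups come out in prefix order only
because the sketch stores them in a ``std::map``.
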
Or perhaps the museum records multiple materials per object - e.g. a clock
might be made of brass, glass and wood.  This can be handled smoothly too - you
can specify multiple material fields to scriptindex::

    material=brass
    material=glass
    material=wood

You may then want multiple filters on material to mean "find me objects
which contain **all** of these materials" (rather than the default meaning
of "find me objects which contain **any** of these materials") - to do this
you want to set ``XM`` as a non-exclusive prefix, which you do like so (this
needs Omega 1.3.4 or later)::

    $setmap{nonexclusiveprefix,XM,true}

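The difference between the two meanings can be shown with a small sketch in
plain C++.  It models a document as a set of terms and is only an illustration
of the "any" versus "all" semantics; ``group_matches`` is a hypothetical name
and this isn't how Omega or Xapian evaluate filters internally.

```cpp
#include <set>
#include <string>
#include <vector>

// Check one group of filter terms that share a prefix against a document's
// terms.  Exclusive (the default): any one term is enough.  Non-exclusive:
// every term must be present.
bool group_matches(const std::set<std::string>& document_terms,
                   const std::vector<std::string>& group,
                   bool non_exclusive) {
    for (const std::string& term : group) {
        const bool present = document_terms.count(term) > 0;
        if (non_exclusive && !present) return false;  // AND: one miss fails
        if (!non_exclusive && present) return true;   // OR: one hit suffices
    }
    // AND: nothing was missing.  OR: nothing was found, which only counts
    // as a match when there were no filter terms to begin with.
    return non_exclusive || group.empty();
}
```

So a clock indexed with ``XMbrass``, ``XMglass`` and ``XMwood`` matches a
filter on ``XMbrass`` and ``XMpaper`` by default, but not once ``XM`` is
treated as non-exclusive.
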
You can also allow the user to restrict a search with a boolean filter
specified in the text query (e.g. ``material:paper`` -> ``XMpaper``) by adding
this to the start of your OmegaScript template::

    $setmap{boolprefix,material,XM}

Multiple aliases are allowed::

    $setmap{boolprefix,material,XM,madeof,XM}

This decoupling of internal and external names is also useful if you want
to offer search frontends in more than one language, as it allows the
prefixes the user sees to be translated.

If the user specifies multiple filters in the query string, for example
``material:wood material:paper``, then these are combined using similar logic
to that used for filters specified by ``B`` CGI parameters: terms with the
same prefix are combined with ``OP_OR`` by default, or with ``OP_AND`` if
specified by ``$setmap{nonexclusiveprefix,...}``.

Say you want to index the title of the document such that the user can
search within the title by specifying title:report (for example) in their
query.

Title has standard prefix S, so you'd generate terms as normal, but then
add an "S" prefix.  If you're using scriptindex, then you do this by
adding "index=S" to the scriptindex rule like so::

    title : field=title index=S

You then need to tell ``Xapian::QueryParser`` that "title:" maps to an "S"
prefix.  If you're using Omega, then you do so by adding this to your
OmegaScript template (at the start is best)::

    $setmap{prefix,title,S}

Or if you're writing your own search frontend, like this::

    Xapian::QueryParser qp;
    qp.add_prefix("title", "S");
    // And similar lines for other free-text prefixes...
    // And any other QueryParser configuration (e.g. stemmer, stopper).
    Xapian::Query query = qp.parse_query(user_query_string);

You can add multiple aliases for a prefix (e.g. title and subject for S), and
the decoupling of "UI prefix" and "term prefix" means you can easily translate
the "UI prefixes" if you have frontends in different languages.

Note that if you want words from the subject to be found without a prefix, you
either need to generate unprefixed terms as well as the prefixed ones, or map
the empty prefix to both "" and "S" like so::

    Xapian::QueryParser qp;
    // Search both subject and body if no field is specified:
    qp.add_prefix("", "");
    qp.add_prefix("", "S");
    // Search just the subject if 'subject:' is specified:
    qp.add_prefix("subject", "S");
    Xapian::Query query = qp.parse_query(user_query_string);

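To see what mapping the empty prefix to two term prefixes amounts to, here is a
sketch in plain C++ which doesn't use Xapian at all.  ``PrefixMap`` and
``expand_word`` are hypothetical names, and the sketch ignores everything else
the QueryParser does (stemming, operators and so on); it only lists which terms
a single word would be looked up as.

```cpp
#include <map>
#include <string>
#include <vector>

// Field name as typed by the user -> term prefixes to search.  The empty
// field name covers words given without any "field:" part.
using PrefixMap = std::multimap<std::string, std::string>;

// List the terms a single word expands to for the given field.
std::vector<std::string> expand_word(const PrefixMap& prefixes,
                                     const std::string& field,
                                     const std::string& word) {
    std::vector<std::string> terms;
    auto range = prefixes.equal_range(field);
    for (auto it = range.first; it != range.second; ++it) {
        terms.push_back(it->second + word);
    }
    return terms;
}
```

With the three mappings from the example above, the word ``report`` with no
field expands to ``report`` and ``Sreport``, while ``subject:report`` expands
to ``Sreport`` only.
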