If you just want a very quick overview, you might prefer to read the
`quick-start guide <quickstart.html>`_.

Omega operates on a set of databases. Each database is created and updated
separately using either omindex or `scriptindex <scriptindex.html>`_. You can
search these databases (or any other Xapian database with suitable contents)
via a web front-end provided by omega, a CGI application. A search can also be
done over more than one database at once.

There are separate documents covering `CGI parameters <cgiparams.html>`_, the
`Term Prefixes <termprefixes.html>`_ which are conventionally used, and
`OmegaScript <omegascript.html>`_, the language used to define omega's web
interface. Omega ships with several OmegaScript templates and you can
use these, modify them, or just write your own. See the "Supplied Templates"
section below for details of the supplied templates.

Omega parses queries using the ``Xapian::QueryParser`` class - for the supported
syntax, see queryparser.html in the xapian-core documentation, available online
at https://xapian.org/docs/queryparser.html

Term construction
=================

Documents within an omega database are indexed by two types of terms: those
used for a weighted search from a parsed query string (the CGI parameter
``P``), and those used for boolean filtering (the CGI parameters ``B`` and
``N`` - the latter is a negated variant of 'B' and was added in Omega 1.3.5).

Boolean terms always start with a prefix which is an initial capital letter (or
multiple capital letters if the first character is `X`) which denotes the
category of the term (e.g. `T` for MIME type).

Parsed query terms may have a prefix, but don't always. Those from the body of
the document in unstemmed form don't; stemmed terms have a `Z` prefix; terms
from other fields have a prefix to indicate the field, such as `S` for the
document title; stemmed terms from a field have both prefixes, e.g. `ZS`.

The "english" stemmer is used by default - you can configure this for omindex
and scriptindex with ``--stemmer=LANGUAGE`` (use ``--stemmer=none`` to disable
stemming, see omindex ``--help`` for the list of accepted language names). At
search time you can configure the stemmer by adding ``$set{stemmer,LANGUAGE}``
to the top of your OmegaScript template.

The two term types are used as follows when building the query:

The ``P`` parameter is parsed using `Xapian::QueryParser` to give a
`Xapian::Query` object denoted as `P-terms` below.

There are two ways that ``B`` and ``N`` parameters are handled, depending on
whether the term prefix has been configured as "non-exclusive" or not. The
default is "exclusive" (and in versions before 1.3.4, this was how all ``B``
parameters were handled).

Exclusive Boolean Prefix
------------------------

B(oolean) terms from 'B' parameters with the same prefix are ORed together,
like so::

         [   OR   ]
        /    | ... \
   B(F,1) B(F,2)...B(F,n)

Where B(F,1) is the first boolean term with prefix F from a 'B' parameter, and
so on.

Non-Exclusive Boolean Prefix
----------------------------

For example, ``$setmap{nonexclusiveprefix,K,true}`` sets prefix `K` as
non-exclusive, which means that multiple filter terms from 'B' parameters will
be combined with "AND" instead of "OR", like so::

         [   AND   ]
        /     | ... \
   B(K,1) B(K,2)... B(K,m)

Combining the Boolean Filters
-----------------------------

The subqueries for each prefix from "B" parameters are combined with AND,
to make this (which we refer to as "B-filter" below)::

                    [     AND     ]
                   /       | ...   \
          [   OR   ]                [   AND   ]
         /    | ... \              /     | ... \
    B(F,1) B(F,2)...B(F,n)    B(K,1) B(K,2)...B(K,m)

All the terms from all 'N' parameters are combined together with "OR", to
make this (which we refer to as "N-filter" below)::

                [   OR   ]
               /   ...    \
    N(F,1)...N(F,n) N(K,1)...N(K,m)

Putting it all together
-----------------------

The P-terms are filtered by the B-filter using "FILTER" and by the N-filter
using "AND_NOT"::

    (P-terms FILTER B-filter) AND_NOT N-filter

The intent here is to allow filtering on arbitrary (and, typically,
orthogonal) characteristics of the document. For instance, by adding
boolean terms "Ttext/html", "Ttext/plain" and "J/press" you would be
filtering the parsed query to only retrieve documents that are both in
the "/press" site *and* which are either of MIME type text/html or
text/plain. (See below for more information about sites.)

If B-terms or N-terms is absent, that part of the query is simply omitted.

If there is no parsed query, the boolean filter is promoted to
be the query, and the weighting scheme is set to boolean. This has
the effect of applying the boolean filter to the whole database. If
there are only N-terms, then ``Query::MatchAll`` is used for the left
side of the "AND_NOT".

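The combination rules above can be sketched in Python. This is only an
illustrative model which builds nested tuples rather than real
``Xapian::Query`` objects; the function names and the tuple representation are
assumptions made for this sketch and are not part of Omega.

```python
from collections import defaultdict


def term_prefix(term):
    """Return the prefix: one capital letter, or a run of capitals after X."""
    if term.startswith("X"):
        end = 1
        while end < len(term) and term[end].isupper():
            end += 1
        return term[:end]
    return term[:1]


def combine(op, parts):
    """Join sub-queries with op; a single sub-query needs no operator."""
    parts = list(parts)
    return parts[0] if len(parts) == 1 else (op, *parts)


def build_query(p_terms, b_terms=(), n_terms=(), non_exclusive=()):
    """Model (P-terms FILTER B-filter) AND_NOT N-filter as nested tuples."""
    groups = defaultdict(list)
    for term in b_terms:
        groups[term_prefix(term)].append(term)
    # Same-prefix terms: AND if the prefix is non-exclusive, else OR.
    b_subqueries = [
        combine("AND" if prefix in non_exclusive else "OR", terms)
        for prefix, terms in groups.items()
    ]
    query = p_terms
    if b_subqueries:
        b_filter = combine("AND", b_subqueries)
        # With no parsed query, the boolean filter becomes the query.
        query = ("FILTER", query, b_filter) if query is not None else b_filter
    if n_terms:
        n_filter = combine("OR", n_terms)
        # With only N-terms, MatchAll stands in for the left-hand side.
        left = query if query is not None else "MatchAll"
        query = ("AND_NOT", left, n_filter)
    return query
```

For the "/press" example above,
``build_query("P-terms", ["Ttext/html", "Ttext/plain", "J/press"])`` gives a
FILTER node whose right-hand side ANDs the ORed `T` terms with the single `J`
term.
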
In order to add more boolean prefixes, you will need to alter the
``index_file()`` function in omindex.cc. Currently omindex adds several
useful ones, detailed below.

Parsed query terms are constructed from the title, body and keywords
of a document. (Not all document types support all three areas of
text.) Title terms are stored with position data starting at 0, body
terms starting 100 beyond title terms, and keyword terms starting 100
beyond body terms. This allows queries using positional data without
causing false matches across the different types of term.

Sites
=====

Within a database, Omega supports multiple sites. These are recorded
using boolean terms (see 'Term construction', above) to allow
filtering on them.

Sites work by all the documents within them sharing a common base
URL. For instance, you might have two sites, one for your press area
and one for your product descriptions:

- \http://example.com/press/index.html
- \http://example.com/press/bigrelease.html
- \http://example.com/products/bigproduct.html
- \http://example.com/products/littleproduct.html

You could index all documents within \http://example.com/press/ using a
site of '/press', and all within \http://example.com/products/ using
a site of '/products'.

Sites are also useful because omindex indexes documents through the
file system, not by fetching from the web server. If you don't have a
URL to file system mapping which puts all documents under one
hierarchy, you'll need to index each separate section as a site.

An obvious example of this is the way that many web servers map URLs
of the form <\http://example.com/~<username>/> to a directory within
that user's home directory (such as ~<username>/pub on a Unix
system). In this case, you can index each user's home page separately,
as a site of the form '/~<username>'. You can then use boolean
filters to allow people to search only a specific home page (or a
group of them), or omit such terms to search everyone's pages.

Note that the site specified when you index is used to build the
complete URL that the results page links to. Thus while sites will
typically want to be relative to the hostname part of the URL (e.g.
'/site' rather than '\http://example.com/site'), you can use them
to have a single search across several different hostnames. This will
still work if you actually store each distinct hostname in a different
database.

omindex is fairly simple to use, for example::

    omindex --db default --url http://example.com/ /var/www/example.com

For a full list of command line options supported, see ``man omindex``
or ``omindex --help``.

You *must* specify the database to index into (it's created if it doesn't
exist, but parent directories must exist). You will often also want to specify
the base URL (which is used as the site, and can be relative to the hostname -
starts '/' - or absolute - starts with a scheme, e.g.
'\http://example.com/products/'). If not specified, the base URL defaults to
``/``.

You also need to tell omindex which directory to index. This should be given
either as a single directory (in which case it is taken to be the
directory base of the entire site being indexed), or as two arguments,
the first being the directory base of the site being indexed, and the
second being a relative directory within that to index.

For instance, in the example above, if you separate your products by
size, you might end up with:

- \http://example.com/press/index.html
- \http://example.com/press/bigrelease.html
- \http://example.com/products/large/bigproduct.html
- \http://example.com/products/small/littleproduct.html

If the entire website is stored in the file system under the directory
/www/example, then you would probably index the site in two
passes, one for the '/press' site and one for the '/products' site. You
might use the following commands::

    $ omindex -p --db /var/lib/omega/data/default --url /press /www/example/press
    $ omindex -p --db /var/lib/omega/data/default --url /products /www/example/products

If you add a new large product, but don't want to reindex the whole of
the products section, you could do::

    $ omindex -p --db /var/lib/omega/data/default --url /products /www/example/products large

and just the large products will be reindexed. You need to do it like that, and
not like this::

    $ omindex -p --db /var/lib/omega/data/default --url /products/large /www/example/products/large

because that would make the large products part of a new site,
'/products/large', which is unlikely to be what you want, as large
products would no longer come up in a search of the products
site. (Note that the ``--depth-limit`` option may come in handy if you have
sites '/products' and '/products/large', or similar.)

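To make the relationship between the base URL, the base directory and the
resulting document URL concrete, here is a minimal Python sketch. The function
``document_url`` is hypothetical and ignores details such as escaping of
special characters in URLs, which this document does not describe.

```python
import posixpath


def document_url(base_url, base_dir, file_path):
    """Map a file under base_dir to a URL below base_url."""
    rel = posixpath.relpath(file_path, base_dir)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{file_path} is not under {base_dir}")
    # The site (base URL) is joined to the path relative to the base directory.
    return base_url.rstrip("/") + "/" + rel
```

With a base URL of ``/products`` and base directory ``/www/example/products``,
the file ``/www/example/products/large/bigproduct.html`` maps to
``/products/large/bigproduct.html``. Indexing with ``--url /products/large``
and ``/www/example/products/large`` would produce the same URL, but the
document would be recorded under a different site.
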
omindex has built-in support for indexing HTML, PHP, text files, CSV
(Comma-Separated Values) files, SVG, Atom feeds, and AbiWord documents. It can
also index a number of other formats using external programs. Filter programs
are run with CPU, time and memory limits to prevent a runaway filter from
blocking indexing of other files.

The way omindex decides how to index a file is based around MIME content-types.
First of all omindex will look up a file's extension in its extension to MIME
type map. If there's no entry, it will then ask libmagic to examine the
contents of the file and try to determine a MIME type.

The following formats are supported as standard (you can tell omindex to use
other filters too - see below):

* HTML (.html, .htm, .shtml, .shtm, .xhtml, .xhtm)
* PHP (.php) - our HTML parser knows to ignore PHP code
* text files (.txt, .text)
* SVG (.svg)
* CSV (Comma-Separated Values) files (.csv)
* PDF (.pdf) if pdftotext is available (comes with poppler or xpdf)
* PostScript (.ps, .eps, .ai) if ps2pdf (from ghostscript) and pdftotext (comes
  with poppler or xpdf) are available
* OpenOffice/StarOffice documents (.sxc, .stc, .sxd, .std, .sxi, .sti, .sxm,
  .sxw, .sxg, .stw) if unzip is available
* OpenDocument format documents (.odt, .ods, .odp, .odg, .odc, .odf, .odb,
  .odi, .odm, .ott, .ots, .otp, .otg, .otc, .otf, .oti, .oth) if unzip is
  available
* MS Word documents (.dot) if antiword is available (.doc files are left to
  libmagic, as they may actually be RTF: AbiWord saves RTF when asked to save
  as .doc, and Microsoft Word quietly loads RTF files with a .doc extension)
* MS Excel documents (.xls, .xlb, .xlt, .xlr, .xla) if xls2csv is available
* MS Powerpoint documents (.ppt, .pps) if catppt is available (comes with
  catdoc)
* MS Office 2007 documents (.docx, .docm, .dotx, .dotm, .xlsx, .xlsm, .xltx,
  .xltm, .pptx, .pptm, .potx, .potm, .ppsx, .ppsm) if unzip is available
* Wordperfect documents (.wpd) if wpd2text is available (comes with libwpd)
* MS Works documents (.wps, .wpt) if wps2text is available (comes with libwps)
* MS Outlook message (.msg) if perl with Email::Outlook::Message and
  HTML::Parser modules is available
* MS Publisher documents (.pub) if pub2xhtml is available (comes with libmspub)
* MS Visio documents (.vsd, .vss, .vst, .vsw, .vsdx, .vssx, .vstx, .vsdm,
  .vssm, .vstm) if vsd2xhtml is available (comes with libvisio)
* Apple Keynote documents (.key, .kth, .apxl) if key2text is available (comes
  with libetonyek)
* Apple Numbers documents (.numbers) if numbers2text is available (comes with
  libetonyek)
* Apple Pages documents (.pages) if pages2text is available (comes with
  libetonyek)
* AbiWord documents (.abw, .awt)
* Compressed AbiWord documents (.zabw)
* Rich Text Format documents (.rtf) if unrtf is available
* Perl POD documentation (.pl, .pm, .pod) if pod2text is available
* reStructured text (.rst, .rest) if rst2html is available (comes with
  docutils)
* Markdown (.md, .markdown) if markdown is available
* TeX DVI files (.dvi) if catdvi is available
* DjVu files (.djv, .djvu) if djvutxt is available
* OpenXPS and XPS files (.oxps, .xps) if unzip is available
* Debian packages (.deb, .udeb) if dpkg-deb is available
* RPM packages (.rpm) if rpm is available
* Atom feeds (.atom)
* MAFF (.maff) if unzip is available
* MHTML (.mhtml, .mht) if perl with MIME::Tools is available
* MIME email messages (.eml) and USENET articles if perl with MIME::Tools and
  HTML::Parser is available
* vCard files (.vcf, .vcard) if perl with Text::vCard is available

If you have additional extensions that represent one of these types, you can
add an additional MIME mapping using the ``--mime-type`` option. For
instance, if your press releases are PostScript files with extension
``.posts`` you can tell omindex this like so::

    $ omindex --db /var/lib/omega/data/default --url /press /www/example/press --mime-type posts:application/postscript

The syntax of ``--mime-type`` is 'ext:type', where ext is the extension of
a file of that type (everything after the last '.'). The ``type`` can be any
string, but to be useful there either needs to be a filter set for that type
- either using ``--filter`` or ``--read-filters``, or by ``type`` being
understood by default:

.. include:: inc/mimetypes.rst

You can specify ``*`` as the MIME sub-type for ``--filter``, for example if you
have a filter you want to apply to any video files, you could specify it using
``--filter 'video/*:index-video-file'``. Note that this is checked right after
checking for the exact MIME type, so will override any built-in filters which
would otherwise match. Also you can't use arbitrary wildcards, just ``*`` for
the entire sub-type. And be careful to quote ``*`` to protect it from the
shell. Support for this was added in 1.3.3.

If there's no specific filter, and no subtype wildcard, then ``*/*`` is checked
(assuming the mimetype contains a ``/``), and after that ``*`` (for any
mimetype string). Combined with filter command ``true`` for indexing by
meta-data only, you can specify a fall back case of indexing by meta-data
only using ``--filter '*:true'``. Support for this was added in 1.3.4.

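The lookup order described above (exact type, then ``type/*``, then ``*/*``,
then ``*``) can be modelled with a short Python sketch. The ``find_filter``
function and the dictionary of filters are assumptions made for illustration;
omindex's own data structures may differ.

```python
def find_filter(mimetype, filters):
    """Return the command registered for mimetype, or None if there is none."""
    candidates = [mimetype]
    if "/" in mimetype:
        major = mimetype.split("/", 1)[0]
        # Sub-type wildcard first, then the fully wildcarded type.
        candidates += [major + "/*", "*/*"]
    # "*" matches any mimetype string, with or without a "/".
    candidates.append("*")
    for key in candidates:
        if key in filters:
            return filters[key]
    return None
```

For example, with filters registered for ``video/*`` and ``*``, a
``video/mp4`` file is handled by the ``video/*`` command, while any other
type without a more specific entry falls through to the ``*`` command.
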
There are also two special values that can be specified instead of a MIME
type:

* ignore - tells omindex to quietly ignore such files
* skip - tells omindex to skip such files

By default no extensions are marked as "skip", and the following extensions are
marked as "ignore":

.. include:: inc/ignored.rst

If you wish to remove a MIME mapping, you can do this by omitting the type -
for example if you have ``.dot`` files which are inputs for the graphviz
tool ``dot``, then you may wish to remove the default mapping for ``.dot``
files and let libmagic be used to determine their type, which you can do
using: ``--mime-type=dot:`` (if you want to *ignore* all ``.dot`` files,
instead use ``--mime-type=dot:ignore``).

The lookup of extensions in the MIME mappings is case sensitive, but if an
extension isn't found and includes upper case ASCII letters, they're converted
to lower case and the lookup is repeated, so you effectively get case
insensitive lookup for mappings specified with a lower-case extension, but
you can set different handling for differently cased variants if you need
to.

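A small Python sketch of the mapping behaviour just described may help. Both
function names are hypothetical, and the mapping is modelled as a plain
dictionary; a result of ``None`` stands for "no entry, so fall back to
libmagic".

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def apply_mime_type_option(mime_map, option):
    """Apply one 'ext:type' value; an empty type removes the mapping."""
    extension, _, mime_type = option.partition(":")
    if mime_type:
        mime_map[extension] = mime_type
    else:
        mime_map.pop(extension, None)


def lookup_mime_type(extension, mime_map):
    """Case-sensitive lookup, retried with ASCII letters lower-cased."""
    if extension in mime_map:
        return mime_map[extension]
    lowered = extension.translate(_ASCII_LOWER)
    if lowered != extension:
        return mime_map.get(lowered)
    return None
```

So ``PDF`` finds a mapping registered for ``pdf``, unless a separate mapping
for ``PDF`` has been added, in which case the exact match wins.
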
You can add support for additional MIME content types (or override existing
ones) using the ``--filter`` and/or ``--read-filters`` options to specify a
command to run. At present, this command needs to produce output in either
HTML, SVG, or plain text format (as of 1.3.3, you can specify the character
encoding that the output will be in; in earlier versions, plain text output had
to be UTF-8). Support for SVG output from external commands was added in

As of 1.3.3, the command can include certain placeholders which are
substituted:

* Any ``%f`` in this command will be replaced with the filename of the file to
  extract (suitably escaped to protect it from the shell, so don't put quotes
  around ``%f``).

  If you don't include ``%f`` in the command, then the filename of the file to
  be extracted will be appended to the command, separated by a space.

* Any ``%t`` in this command will be replaced with a filename in a temporary
  directory (suitably escaped to protect it from the shell, so don't put
  quotes around ``%t``). The extension of this filename will reflect the
  expected output format (either ``.html``, ``.svg`` or ``.txt``). If you
  don't use ``%t`` in the command, then omindex will expect output on
  ``stdout`` (prior to 1.3.3, output had to be on ``stdout``).

* ``%%`` can be used should you need a literal ``%`` in the command.

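The placeholder rules above can be illustrated with a Python sketch. The
``expand_filter_command`` function is hypothetical, and ``shlex.quote`` is
used here as a stand-in for whatever shell escaping omindex actually applies.

```python
import shlex


def expand_filter_command(command, filename, tmpfile):
    """Return (shell_command, uses_tmpfile) for one input file."""
    out = []
    saw_f = uses_tmpfile = False
    i = 0
    while i < len(command):
        ch = command[i]
        nxt = command[i + 1:i + 2]
        if ch == "%" and nxt in ("f", "t", "%"):
            if nxt == "f":
                out.append(shlex.quote(filename))
                saw_f = True
            elif nxt == "t":
                out.append(shlex.quote(tmpfile))
                uses_tmpfile = True
            else:
                out.append("%")  # "%%" gives a literal "%"
            i += 2
        else:
            out.append(ch)
            i += 1
    if not saw_f:
        # No %f: append the filename, separated by a space.
        out.append(" " + shlex.quote(filename))
    return "".join(out), uses_tmpfile
```

The second element of the result tells the caller whether to read the output
from the temporary file or from ``stdout``.
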
For example, if you'd prefer to use AbiWord to extract text from Word documents
(by default, omindex uses antiword), then you can pass the option
``--filter=application/msword:'abiword --to=txt --to-name=fd://1'`` to
omindex.

Another example - if you wanted to handle files of MIME type
``application/octet-stream`` by running them through ``strings -n8``, you can
pass the option ``--filter=application/octet-stream:'strings -n8'``.

A more complex example: to process ``.foo`` files with the (fictional)
``foo2utf16`` utility which produces UTF-16 text but doesn't support writing
output to stdout, run omindex with ``-Mfoo:text/x-foo
-Ftext/x-foo,,utf-16:'foo2utf16 %f %t'``.

A less contrived example of the use of ``--filter`` makes use of LibreOffice,
via the unoconv script, to extract text from various formats. First you
need to start a listening instance (if you don't, unoconv will start up
LibreOffice for every file, which is rather inefficient) - the ``&`` tells
the shell to run it in the background::

    unoconv --listener &

Then run omindex with options such as
``--filter=application/msword,html:'unoconv --stdout -f html'`` (you'll want
to repeat this for each format which you want to use LibreOffice on).

If you specify ``false`` as the command in ``--filter``, omindex will skip
files with the specified MIME type. (As of 1.2.20 and 1.3.3 ``false`` is
explicitly checked for; in earlier versions this will also work, at least
on Unix where ``false`` is a command which ignores its arguments and exits with
a non-zero status.)

If you specify ``true`` as the command in ``--filter``, omindex won't try
to extract text from the file, but will index it such that it can be searched
for via metadata which comes from the filing system (filename, extension, mime
content-type, last modified time, size). (As of 1.2.22 and 1.3.4 ``true`` is
explicitly checked for; in earlier versions this will also work, at least
on Unix where ``true`` is a command which ignores its arguments and exits with
a zero status.)

If you know of a reliable filter which can extract text from a file format
which might be of interest to others, please let us know so we can consider
including it as a standard filter.

The ``--duplicates`` option controls how omindex handles documents which map
to a URL which is already in the database. The default (which can be
explicitly set with ``--duplicates=replace``) is to reindex if the last
modified time of the file is newer than that recorded in the database.
The alternative is ``--duplicates=ignore``, which will never reindex an
existing document. If you only add documents, this avoids the overhead
of checking the last modified time. It also allows you to prioritise
adding completely new documents to the database over updating existing ones.

By default, omindex will remove any document in the database which has a URL
that doesn't correspond to a file seen on disk - in other words, it will clear
out everything that doesn't exist any more. However if you are building up
an omega database with several runs of omindex, this is not
appropriate (as each run would delete the data from the previous run),
so you should use the ``--no-delete`` option. Note that if you
choose to work like this, it is impossible to prune old documents from
the database using omindex. If this is a problem for you, an
alternative is to index each subsite into a different database, and
merge all the databases together when searching.

``--depth-limit`` allows you to prevent omindex from descending more than
a certain number of directories. Specifying ``--depth-limit=0`` means no limit
is imposed on recursion; ``--depth-limit=1`` means don't descend into any
subdirectories of the start directory.

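The meaning of the depth limit values can be shown with a short Python sketch
of a directory walk. The ``files_to_index`` function is hypothetical; it does
not follow symbolic links to directories, which is a simplification made for
this example rather than a statement about omindex.

```python
import os


def files_to_index(start_dir, depth_limit=0):
    """Yield file paths under start_dir, honouring a depth limit."""
    def walk(directory, depth):
        with os.scandir(directory) as entries:
            for entry in sorted(entries, key=lambda e: e.name):
                if entry.is_dir(follow_symlinks=False):
                    # A limit of 0 means unlimited recursion; a limit of 1
                    # means never leave the start directory.
                    if depth_limit == 0 or depth < depth_limit:
                        yield from walk(entry.path, depth + 1)
                elif entry.is_file():
                    yield entry.path
    yield from walk(start_dir, 1)
```

With a limit of 2, files in the start directory and in its immediate
subdirectories are returned, but nothing deeper.
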
Tracking files which couldn't be indexed
----------------------------------------

In older versions, omindex only tracked files which it successfully indexed -
if a file couldn't be read, or a filter program failed on it, or it was marked
not to be indexed (e.g. with an HTML meta tag) then it would be retried on
subsequent runs. Starting from version 1.3.4, omindex now tracks failed
files in the user metadata of the database, along with their sizes and last
modified times, and uses this data to skip files which previously failed and
haven't changed since.

You can force omindex to retry such files using the ``--retry-failed`` option.
One situation in which this is useful is if you've upgraded a filter program
to a newer version which you suspect will index some files which previously
failed.

Currently there's no mechanism for automatically removing failure entries
when the file they refer to is removed or renamed. These lingering entries are
harmless, except they bloat the database a little. A simple way to clear them
out is to run periodically with ``--retry-failed`` as this removes any existing
failure entries before indexing starts.

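As a rough model of this bookkeeping, the following Python sketch keeps
failure entries in a dictionary keyed by path. The function names and the
in-memory dictionary are assumptions; omindex stores this data in the
database's user metadata.

```python
def start_run(failed, retry_failed=False):
    """With --retry-failed, drop all failure entries before indexing."""
    if retry_failed:
        failed.clear()


def record_failure(failed, path, size, mtime):
    """Remember a file which couldn't be indexed."""
    failed[path] = (size, mtime)


def should_skip(failed, path, size, mtime):
    """Skip a file which failed before and hasn't changed since."""
    return failed.get(path) == (size, mtime)
```

A file whose size or last modified time has changed no longer matches its
failure entry, so it is tried again on the next run.
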
The document ``<title>`` tag is used as the document title. Metadata in various
``<meta>`` tags is also understood - these values of the ``name`` parameter are
currently handled when found:

* ``author``, ``dcterms.creator``, ``dcterms.contributor``: author(s)
* ``created``, ``dcterms.issued``: document creation date
* ``classification``: document topic
* ``keywords``, ``dcterms.subject``, ``dcterms.description``: indexed as extra
  document text (but not stored in the sample)
* ``description``: by default, handled as ``keywords``, as of Omega 1.4.4.
  If ``omindex`` is run with ``--sample=description``, then this is used as
  the preferred source for the stored sample of document text (HTML documents
  with no ``description`` fall back to a sample from the body; if
  ``description`` occurs multiple times then second and subsequent are handled
  as ``keywords``). In Omega 1.4.2 and earlier, ``--sample`` wasn't supported
  and the behaviour was as if ``--sample=description`` had been specified. In
  Omega 1.4.3, ``--sample`` was added, but the default was
  ``--sample=description`` (contrary to the intended and documented behaviour)
  - you can use ``--sample=body`` with 1.4.3 and later to store a sample from
  the document body.

The HTML parser will look for the 'robots' META tag, and won't index pages
which are marked as ``noindex`` or ``none``, for example any of the following::

    <meta name="robots" content="noindex,nofollow">
    <meta name="robots" content="noindex">
    <meta name="robots" content="none">

The ``omindex`` option ``--ignore-exclusions`` disables this behaviour, so
the files with the above will be indexed anyway.

Sometimes it is useful to be able to exclude just part of a page from being
indexed (for example you may not want to index navigation links, or a footer
which appears on every page). To allow this, the parser supports "magic"
comments to mark sections of the document to not index. Two formats are
supported - htdig_noindex (used by ht://Dig) and UdmComment (used by
mnoGoSearch)::

    Index this bit <!--htdig_noindex-->but <b>not</b> this<!--/htdig_noindex-->

::

    <!--UdmComment--><div>Boring copyright notice</div><!--/UdmComment-->

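The two exclusion mechanisms can be approximated in Python as follows. This
is a sketch using the standard library's HTML parser and a regular
expression, not Omega's own parser; the names ``strip_noindex_sections``,
``RobotsCheck`` and ``is_indexable`` are made up for this example.

```python
import re
from html.parser import HTMLParser

# Matches a section between a pair of matching "magic" comments.
_NOINDEX = re.compile(
    r"<!--(htdig_noindex|UdmComment)-->.*?<!--/\1-->", re.DOTALL
)


def strip_noindex_sections(html):
    """Remove sections marked with htdig_noindex or UdmComment."""
    return _NOINDEX.sub("", html)


class RobotsCheck(HTMLParser):
    """Record whether a robots META tag says noindex or none."""

    def __init__(self):
        super().__init__()
        self.indexable = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            values = {v.strip().lower() for v in content.split(",")}
            if values & {"noindex", "none"}:
                self.indexable = False


def is_indexable(html, ignore_exclusions=False):
    """Return False for pages excluded by a robots META tag."""
    if ignore_exclusions:
        return True
    checker = RobotsCheck()
    checker.feed(html)
    return checker.indexable
```

Here ``ignore_exclusions`` plays the role of ``--ignore-exclusions`` for the
robots check only; whether that option also affects the magic comments is not
covered by this sketch.
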
omindex will create the following boolean terms when it indexes a
document:

E
    Extension of the file (e.g. `Epdf`) [since Omega 1.2.5]
T
    MIME type
J
    The base URL, omitting any trailing slash (so if the base URL was just
    `/`, the term is just `J`). If the resulting term would be > 240
    bytes, it's hashed in the same way as `U` prefix terms are. Mnemonic: the
    Jumping-off point. [since Omega 1.3.4]
H
    hostname of site (if supplied - this term won't exist if you index a
    site with base URL '/press', for instance). Since Omega 1.3.4, if the
    resulting term would be > 240 bytes, it's hashed in the same way as `U`
    prefix terms are.
P
    path terms - one term for the directory which the document is in, and for
    each parent directory, with no trailing slashes [since Omega 1.3.4 -
    in earlier versions, there was just one `P` term for the path of site (i.e.
    the rest of the site base URL) - this will be amongst the terms Omega 1.3.4
    adds]. Since Omega 1.3.4, if the resulting term would be > 240 bytes, it's
    hashed in the same way as `U` prefix terms are.
U
    full URL of indexed document - if the resulting term would be > 240 bytes,
    a hashing scheme is used to avoid overflowing Xapian's term length limit.
D
    date (numeric format: YYYYMMDD)

    date can also have the magical form "latest" - a document indexed
    by the term Dlatest matches any date-range without an end date.
    You can index dynamic documents which are always up to date
    with Dlatest and they'll match as expected. (If you use sort by date,
    you'll probably also want to set the value containing the timestamp to
    a "max" value so dynamic documents match a date in the far future).
M
    month (numeric format: YYYYMM)
Y
    year (four digits)

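As an illustration of the simpler terms in this list, the Python sketch below
builds the extension, MIME type and date terms for one file. It is not
omindex's implementation: the function name is hypothetical, the `T` (MIME
type) and `Y` (year) prefixes follow Xapian's conventional term prefixes, the
date is assumed to be supplied by the caller, the extension is used exactly as
it appears in the filename, and the URL-based terms (`J`, `H`, `P`, `U`) and
their hashing are left out.

```python
import datetime
import os


def basic_boolean_terms(filename, mime_type, date):
    """Return E, T, D, M and Y terms for one file (J, H, P, U omitted)."""
    terms = []
    extension = os.path.splitext(filename)[1].lstrip(".")
    if extension:
        terms.append("E" + extension)
    terms.append("T" + mime_type)
    terms.append("D" + date.strftime("%Y%m%d"))  # YYYYMMDD
    terms.append("M" + date.strftime("%Y%m"))    # YYYYMM
    terms.append("Y" + date.strftime("%Y"))      # year
    return terms
```

For a file ``report.pdf`` of type ``application/pdf`` dated 9 March 2024, this
gives ``Epdf``, ``Tapplication/pdf``, ``D20240309``, ``M202403`` and
``Y2024``.
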
Most of the omega CGI configuration is dynamic, by setting CGI
parameters. However some things must be configured using a
configuration file. The configuration file is searched for in
the following locations:

- Firstly, if the "OMEGA_CONFIG_FILE" environment variable is
  set, its value is used as the full path to a configuration file.
- Next (if the environment variable is not set, or the file pointed
  to is not present), the file "omega.conf" in the same directory as
  the Omega CGI is used.
- Next (if neither of the previous steps found a file), the file
  "${sysconfdir}/omega.conf" (e.g. /etc/omega.conf on Linux systems)
  is used.
- Finally, if no configuration file is found, default values are used.

The format of the file is very simple: a line per option, with the
option name followed by its value, separated by whitespace. Blank
lines are ignored. If the first non-whitespace character on a line
is a '#', omega treats the line as a comment and ignores it.

The current options are:

- `database_dir`: the directory containing all the Omega databases
- `template_dir`: the directory containing the OmegaScript templates
- `log_dir`: the directory which the OmegaScript `$log` command writes log
  files to
- `cdb_dir`: the directory which the OmegaScript `$lookup` command
  looks for CDB files in

The default values (used if no configuration file is found) are::

    database_dir /var/lib/omega/data
    template_dir /var/lib/omega/templates
    log_dir /var/log/omega
    cdb_dir /var/lib/omega/cdb

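The search order and file format described above can be sketched in Python as
follows. The function names are hypothetical, ``sysconfdir`` is assumed to be
``/etc``, and the sketch assumes that options missing from a configuration
file keep their default values.

```python
import os

DEFAULTS = {
    "database_dir": "/var/lib/omega/data",
    "template_dir": "/var/lib/omega/templates",
    "log_dir": "/var/log/omega",
    "cdb_dir": "/var/lib/omega/cdb",
}


def candidate_paths(environ, cgi_dir, sysconfdir="/etc"):
    """List the configuration file locations in the order they are tried."""
    paths = []
    if environ.get("OMEGA_CONFIG_FILE"):
        paths.append(environ["OMEGA_CONFIG_FILE"])
    paths.append(os.path.join(cgi_dir, "omega.conf"))
    paths.append(os.path.join(sysconfdir, "omega.conf"))
    return paths


def parse_config(text):
    """Parse 'name value' lines, ignoring blank lines and # comments."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2:
            config[parts[0]] = parts[1]
    return config


def load_config(environ, cgi_dir, sysconfdir="/etc"):
    """Read the first configuration file found, else use the defaults."""
    for path in candidate_paths(environ, cgi_dir, sysconfdir):
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                return parse_config(handle.read())
    return dict(DEFAULTS)
```

For instance, ``parse_config("database_dir /srv/omega/data")`` overrides only
``database_dir`` and leaves the other three options at their defaults.
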
Note that, with Apache, environment variables may be set using mod_env, and
with Apache 1.3.7 or later this may be used inside a .htaccess file. This
makes it reasonably easy to share a single system installed copy of Omega
between multiple users.

Supplied Templates
==================

The OmegaScript templates supplied with Omega are:

* query - This is the default template, providing a typical Web search
  interface.
* topterms - This is just like query, but provides a "top terms" feature
  which suggests terms the user might want to add to their query to
  obtain better results.
* godmode - Allows you to inspect a database showing which terms index
  each document, and which documents are indexed by each term.
* opensearch - Provides results in OpenSearch format (for more details
  see http://www.opensearch.org/).
* xml - Provides results in a custom XML format.
* emptydocs - Shows a list of documents with zero length. If CGI parameter
  TERM is set to a non-empty value, then only documents indexed by that given
  term are shown (e.g. TERM=Tapplication/pdf to show PDF files with no text);
  otherwise all zero length documents are shown.

There are also "helper fragments" used by the templates above:

* inc/anyalldropbox - Provides a choice of matching "any" or "all" terms
  by default as a drop down box.
* inc/anyallradio - Provides a choice of matching "any" or "all" terms
  by default as radio buttons.
* toptermsjs - Provides some JavaScript used by the topterms template.

Document data construction
==========================

This is only useful if you need to inject your own documents into the
database independently of omindex, such as if you are indexing
dynamically-generated documents that are served using a server-side
system such as PHP or ASP, but which you can determine the contents of
in some way, such as documents generated from reasonably static
database contents.

The document data field stores some summary information about the
document, in the following (sample) format::

    url=<url>
    sample=<sample>
    caption=<title>

Further fields may be added (although omindex doesn't currently add any
others), and may be looked up from OmegaScript using the $field{}
command.

As of Omega 0.9.3, you can alternatively add something like this near the
start of your OmegaScript template::

    $set{fieldnames,$split{caption sample url}}

Then you need only give the field values in the document data, which can
save a lot of space in a large database. With the setting of fieldnames
above, the first line of document data can be accessed with $field{caption},
the second with $field{sample}, and the third with $field{url}.

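The two layouts of the document data can be compared with a small Python
sketch. The ``parse_document_data`` function is hypothetical and only models
the lookup which ``$field{}`` performs; it is not code from Omega.

```python
def parse_document_data(data, fieldnames=None):
    """Return a dict of fields from a document data string."""
    lines = data.splitlines()
    if fieldnames is not None:
        # Values only: the n-th line belongs to the n-th configured name.
        return dict(zip(fieldnames, lines))
    fields = {}
    for line in lines:
        # Otherwise each line is name=value.
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields
```

With ``fieldnames`` set to ``["caption", "sample", "url"]``, the data needs to
hold only the three values, one per line, and gives the same result as the
longer ``name=value`` form.
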
At search time, Omega uses a built-in list of stopwords, which are::

    a about an and are as at be by en for from how i in is it of on or that the
    this to was what when where which who why will with you your