documentation/manual/en/module_specs/Zend_Search_Lucene-Overview.xml

   1 <?xml version="1.0" encoding="UTF-8"?>
   2 <!-- Reviewed: no -->
   3 <sect1 id="zend.search.lucene.overview">
   4     <title>Overview</title>
   5
   6     <sect2 id="zend.search.lucene.introduction">
   7         <title>Introduction</title>
   8
   9         <para>
  10             <classname>Zend_Search_Lucene</classname> is a general purpose text search engine
  11             written entirely in <acronym>PHP</acronym> 5. Since it stores its index on the
  12             filesystem and does not require a database server, it can add search capabilities to
  13             almost any <acronym>PHP</acronym>-driven website.
  14             <classname>Zend_Search_Lucene</classname> supports the following features:
  15
  16             <itemizedlist>
  17                 <listitem>
  18                     <para>Ranked searching - best results returned first</para>
  19                 </listitem>
  20
  21                 <listitem>
  22                     <para>
  23                        Many powerful query types: phrase queries, boolean queries, wildcard queries,
  24                        proximity queries, range queries and many others.
  25                     </para>
  26                 </listitem>
  27
  28                 <listitem>
  29                     <para>Search by specific field (e.g., title, author, contents)</para>
  30                 </listitem>
  31             </itemizedlist>
  32
  33             <classname>Zend_Search_Lucene</classname> was derived from the Apache Lucene project.
            Starting with Zend Framework 1.6, the supported Lucene index format versions are 1.4
            through 2.3. For more information on Lucene, visit <ulink
                url="http://lucene.apache.org/java/docs/"/>.
  37         </para>
  38
  39         <note>
  40             <title/>
  41
  42             <para>
                Previous <classname>Zend_Search_Lucene</classname> implementations supported the
                Lucene 1.4 (1.9) - 2.1 index formats.
  45             </para>
  46
  47             <para>
                Starting with Zend Framework 1.5, any index created in a pre-2.1 index format is
                automatically upgraded to the Lucene 2.1 format after the
                <classname>Zend_Search_Lucene</classname> update, and it is then no longer
                compatible with the <classname>Zend_Search_Lucene</classname> implementations
                included in Zend Framework 1.0.x.
  53             </para>
  54         </note>
  55     </sect2>
  56
  57     <sect2 id="zend.search.lucene.index-creation.documents-and-fields">
  58         <title>Document and Field Objects</title>
  59
  60         <para>
  61             <classname>Zend_Search_Lucene</classname> operates with documents as atomic objects for
  62             indexing. A document is divided into named fields, and fields have content that can be
  63             searched.
  64         </para>
  65
  66         <para>
            A document is represented by the <classname>Zend_Search_Lucene_Document</classname>
            class, and objects of this class contain instances of
            <classname>Zend_Search_Lucene_Field</classname> that represent the fields of the
            document.
  71         </para>
  72
  73         <para>
            Any information can be added to the index. Application-specific information or
            metadata can be stored in the document fields and later retrieved with the document
            during search.
  77         </para>
  78
  79         <para>
  80             It is the responsibility of your application to control the indexer.
  81             This means that data can be indexed from any source
  82             that is accessible by your application. For example, this could be the
  83             filesystem, a database, an <acronym>HTML</acronym> form, etc.
  84         </para>
  85
  86         <para>
            The <classname>Zend_Search_Lucene_Field</classname> class provides several static
            methods to create fields with different characteristics:
  89         </para>
  90
  91         <programlisting language="php"><![CDATA[
$doc = new Zend_Search_Lucene_Document();

// Field is not tokenized, but is indexed and stored within the index.
// Stored fields can be retrieved from the index.
$doc->addField(Zend_Search_Lucene_Field::Keyword('doctype',
                                                 'autogenerated'));

// Field is not tokenized nor indexed, but is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
                                                   time()));

// Binary String valued Field that is not tokenized nor indexed,
// but is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::Binary('icon',
                                                $iconData));

// Field is tokenized and indexed, and is stored in the index.
$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
                                              'Document annotation text'));

// Field is tokenized and indexed, but is not stored in the index.
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  'My document content'));
 115 ]]></programlisting>
 116
 117         <para>
 118             Each of these methods (excluding the
 119             <methodname>Zend_Search_Lucene_Field::Binary()</methodname> method) has an optional
 120             <varname>$encoding</varname> parameter for specifying input data encoding.
 121         </para>
 122
 123         <para>
 124             Encoding may differ for different documents as well as for different fields within one
 125             document:
 126         </para>
 127
 128         <programlisting language="php"><![CDATA[
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title',
                                              $title,
                                              'iso-8859-1'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents',
                                                  $contents,
                                                  'utf-8'));
 136 ]]></programlisting>
 137
 138         <para>
            If the encoding parameter is omitted, the current locale is used at processing time.
            For example:
 141         </para>
 142
 143         <programlisting language="php"><![CDATA[
setlocale(LC_ALL, 'de_DE.iso-8859-1');
// ...
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', $contents));
 147 ]]></programlisting>
 148
 149         <para>
 150             Fields are always stored and returned from the index in UTF-8 encoding. Any required
 151             conversion to UTF-8 happens automatically.
 152         </para>
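
        <para>
            The conversion can be observed on a document before it is indexed. The following is a
            minimal sketch, assuming Zend Framework 1 is on the include path; the sample string is
            purely illustrative.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';

// "\xFC" is the single-byte ISO-8859-1 encoding of "ü" (illustrative input).
$title = "M\xFCnchen";

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('title', $title, 'iso-8859-1'));

// getFieldUtf8Value() returns the field value converted to UTF-8,
// using the encoding declared when the field was created.
$utf8 = $doc->getFieldUtf8Value('title');
]]></programlisting>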
 153
 154         <para>
            Text analyzers (<link linkend="zend.search.lucene.extending.analysis">see below</link>)
            may also convert text to other encodings. The default analyzer, for instance, converts
            text to 'ASCII//TRANSLIT' encoding. Be careful, however: this transliteration may
            depend on the current locale.
 159         </para>
 160
 161         <para>
            Field names are defined at your discretion in the <methodname>addField()</methodname>
            call.
 164         </para>
 165
 166         <para>
 167             Java Lucene uses the 'contents' field as a default field to search.
 168             <classname>Zend_Search_Lucene</classname> searches through all fields by default, but
 169             the behavior is configurable. See the <link
 170                 linkend="zend.search.lucene.query-language.fields">"Default search field"</link>
 171             chapter for details.
 172         </para>
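
        <para>
            The sketch below ties these points together: it stores metadata with a document,
            retrieves it from a search hit, and narrows the default search field. It assumes Zend
            Framework 1 is on the include path and relies on
            <methodname>Zend_Search_Lucene::create()</methodname>,
            <methodname>find()</methodname> and
            <methodname>Zend_Search_Lucene::setDefaultSearchField()</methodname>, which are
            described in other chapters. The temporary index directory is a hypothetical location
            chosen for illustration.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';
require_once 'Zend/Search/Lucene/Field.php';

// Hypothetical throwaway index location.
$indexDir = sys_get_temp_dir() . '/zsl-overview-' . uniqid();
$index    = Zend_Search_Lucene::create($indexDir);

$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Keyword('doctype', 'autogenerated'));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created', time()));
$doc->addField(Zend_Search_Lucene_Field::Text('annotation', 'Document annotation text'));
$doc->addField(Zend_Search_Lucene_Field::UnStored('contents', 'My document content'));
$index->addDocument($doc);
$index->commit();

// With no default field set, an unqualified term is searched in all indexed fields.
$hitsAll = $index->find('annotation');

// Stored fields are available on each hit; 'contents' is UnStored and is not.
foreach ($hitsAll as $hit) {
    echo $hit->doctype, ' created at ', $hit->created, "\n";
}

// Restrict unqualified terms to the 'contents' field, as Java Lucene does.
Zend_Search_Lucene::setDefaultSearchField('contents');
$hitsContents = $index->find('annotation');

// NULL restores searching through all fields.
Zend_Search_Lucene::setDefaultSearchField(null);
]]></programlisting>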
 173     </sect2>
 174
 175     <sect2 id="zend.search.lucene.index-creation.understanding-field-types">
 176         <title>Understanding Field Types</title>
 177
 178         <itemizedlist>
 179             <listitem>
 180                 <para>
 181                     <code>Keyword</code> fields are stored and indexed, meaning that they can be
 182                     searched as well as displayed in search results. They are not split up into
 183                     separate words by tokenization. Enumerated database fields usually translate
 184                     well to Keyword fields in <classname>Zend_Search_Lucene</classname>.
 185                 </para>
 186             </listitem>
 187
 188             <listitem>
 189                 <para>
 190                     <code>UnIndexed</code> fields are not searchable, but they are returned with
 191                     search hits. Database timestamps, primary keys, file system paths, and other
 192                     external identifiers are good candidates for UnIndexed fields.
 193                 </para>
 194             </listitem>
 195
 196             <listitem>
 197                 <para>
 198                     <code>Binary</code> fields are not tokenized or indexed, but are stored for
 199                     retrieval with search hits. They can be used to store any data encoded as a
 200                     binary string, such as an image icon.
 201                 </para>
 202             </listitem>
 203
 204             <listitem>
 205                 <para>
 206                     <code>Text</code> fields are stored, indexed, and tokenized. Text fields are
 207                     appropriate for storing information like subjects and titles that need to be
 208                     searchable as well as returned with search results.
 209                 </para>
 210             </listitem>
 211
 212             <listitem>
 213                 <para>
 214                     <code>UnStored</code> fields are tokenized and indexed, but not stored in the
 215                     index. Large amounts of text are best indexed using this type of field. Storing
 216                     data creates a larger index on disk, so if you need to search but not redisplay
 217                     the data, use an UnStored field. UnStored fields are practical when using a
 218                     <classname>Zend_Search_Lucene</classname> index in combination with a relational
 219                     database. You can index large data fields with UnStored fields for searching,
 220                     and retrieve them from your relational database by using a separate field as an
 221                     identifier.
 222                </para>
 223
 224                 <table id="zend.search.lucene.index-creation.understanding-field-types.table">
 225                     <title>Zend_Search_Lucene_Field Types</title>
 226
 227                     <tgroup cols="5">
 228                         <thead>
 229                             <row>
 230                                 <entry>Field Type</entry>
 231                                 <entry>Stored</entry>
 232                                 <entry>Indexed</entry>
 233                                 <entry>Tokenized</entry>
 234                                 <entry>Binary</entry>
 235                             </row>
 236                         </thead>
 237
 238                         <tbody>
 239                             <row>
 240                                 <entry>Keyword</entry>
 241                                 <entry>Yes</entry>
 242                                 <entry>Yes</entry>
 243                                 <entry>No</entry>
 244                                 <entry>No</entry>
 245                             </row>
 246
 247                             <row>
 248                                 <entry>UnIndexed</entry>
 249                                 <entry>Yes</entry>
 250                                 <entry>No</entry>
 251                                 <entry>No</entry>
 252                                 <entry>No</entry>
 253                             </row>
 254
 255                             <row>
 256                                 <entry>Binary</entry>
 257                                 <entry>Yes</entry>
 258                                 <entry>No</entry>
 259                                 <entry>No</entry>
 260                                 <entry>Yes</entry>
 261                             </row>
 262
 263                             <row>
 264                                 <entry>Text</entry>
 265                                 <entry>Yes</entry>
 266                                 <entry>Yes</entry>
 267                                 <entry>Yes</entry>
 268                                 <entry>No</entry>
 269                             </row>
 270
 271                             <row>
 272                                 <entry>UnStored</entry>
 273                                 <entry>No</entry>
 274                                 <entry>Yes</entry>
 275                                 <entry>Yes</entry>
 276                                 <entry>No</entry>
 277                             </row>
 278                         </tbody>
 279                     </tgroup>
 280                 </table>
 281            </listitem>
 282        </itemizedlist>
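
        <para>
            The flags in the table can also be read from the field objects themselves. The
            following is a minimal sketch, assuming Zend Framework 1 is on the include path; it
            reads the public <varname>isStored</varname>, <varname>isIndexed</varname>,
            <varname>isTokenized</varname> and <varname>isBinary</varname> properties of
            <classname>Zend_Search_Lucene_Field</classname>. The field values are placeholders.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene/Field.php';

// One field per factory method; the values are placeholders.
$fields = array(
    'Keyword'   => Zend_Search_Lucene_Field::Keyword('doctype', 'autogenerated'),
    'UnIndexed' => Zend_Search_Lucene_Field::UnIndexed('created', (string) time()),
    'Binary'    => Zend_Search_Lucene_Field::Binary('icon', "\x00\x01"),
    'Text'      => Zend_Search_Lucene_Field::Text('annotation', 'Document annotation text'),
    'UnStored'  => Zend_Search_Lucene_Field::UnStored('contents', 'My document content'),
);

// Print one row per field type, with 1 for "Yes" and 0 for "No".
foreach ($fields as $type => $field) {
    printf(
        "%-10s stored=%d indexed=%d tokenized=%d binary=%d\n",
        $type,
        $field->isStored,
        $field->isIndexed,
        $field->isTokenized,
        $field->isBinary
    );
}
]]></programlisting>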
 283     </sect2>
 284
 285     <sect2 id="zend.search.lucene.index-creation.html-documents">
 286         <title>HTML documents</title>
 287
 288         <para>
            <classname>Zend_Search_Lucene</classname> offers an <acronym>HTML</acronym> parsing
            feature. Documents can be created directly from an <acronym>HTML</acronym> file or
            string:
 292         </para>
 293
 294         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTMLFile($filename);
$index->addDocument($doc);
// ...
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$index->addDocument($doc);
 300 ]]></programlisting>
 301
 302         <para>
            The <classname>Zend_Search_Lucene_Document_Html</classname> class uses the
            <methodname>DOMDocument::loadHTML()</methodname> and
            <methodname>DOMDocument::loadHTMLFile()</methodname> methods to parse the source
            <acronym>HTML</acronym>, so the <acronym>HTML</acronym> does not need to be well formed
            or to be <acronym>XHTML</acronym>. On the other hand, parsing is sensitive to the
            encoding specified by the "meta http-equiv" header tag.
 309         </para>
 310
 311         <para>
            The <classname>Zend_Search_Lucene_Document_Html</classname> class recognizes the
            document title, the body, and the document header meta tags.
 314         </para>
 315
 316         <para>
            The 'title' field holds the /html/head/title value. It is stored within the index,
            tokenized, and available for search.
 319         </para>
 320
 321         <para>
 322             The 'body' field is the actual body content of the <acronym>HTML</acronym> file or
 323             string. It doesn't include scripts, comments or attributes.
 324         </para>
 325
 326         <para>
            The <methodname>loadHTML()</methodname> and <methodname>loadHTMLFile()</methodname>
            methods of the <classname>Zend_Search_Lucene_Document_Html</classname> class also accept
            an optional second argument. If it is set to <constant>TRUE</constant>, the body content
            is also stored within the index and can be retrieved from it. By default, the body is
            tokenized and indexed, but not stored.
 332         </para>
 333
 334         <para>
            The optional third parameter of the <methodname>loadHTML()</methodname> and
            <methodname>loadHTMLFile()</methodname> methods specifies the source
            <acronym>HTML</acronym> document encoding. It is used if the encoding is not specified
            by a Content-type HTTP-EQUIV meta tag.
 339         </para>
 340
 341         <para>
            Other document header meta tags produce additional document fields. The field name is
            taken from the 'name' attribute, and the 'content' attribute supplies the field value.
            These fields are tokenized, indexed and stored, so documents may be searched by their
            meta tags (for example, by keywords).
 346         </para>
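
        <para>
            The following minimal sketch shows the fields produced from a small
            <acronym>HTML</acronym> string, assuming Zend Framework 1 is on the include path. The
            markup and the 'keywords' meta tag are illustrative. The second argument requests a
            stored body, and the third supplies a fallback encoding.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene/Document/Html.php';

// Illustrative markup with a title, one named meta tag, and a script.
$htmlString = '<html><head>'
    . '<title>Sample page</title>'
    . '<meta name="keywords" content="lucene search">'
    . '</head><body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>';

// TRUE stores the body in the index; 'utf-8' is used because the markup
// carries no Content-type HTTP-EQUIV meta tag.
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString, true, 'utf-8');

$fieldNames = $doc->getFieldNames();
$title      = trim($doc->getFieldValue('title'));
$keywords   = $doc->getFieldValue('keywords');
$bodyStored = $doc->getField('body')->isStored;
]]></programlisting>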
 347
 348         <para>
 349             Parsed documents may be augmented by the programmer with any other field:
 350         </para>
 351
 352         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('created',
                                                   time()));
$doc->addField(Zend_Search_Lucene_Field::UnIndexed('updated',
                                                   time()));
$doc->addField(Zend_Search_Lucene_Field::Text('annotation',
                                              'Document annotation text'));
$index->addDocument($doc);
 361 ]]></programlisting>
 362
 363         <para>
 364             Document links are not included in the generated document, but may be retrieved with
 365             the <methodname>Zend_Search_Lucene_Document_Html::getLinks()</methodname> and
 366             <methodname>Zend_Search_Lucene_Document_Html::getHeaderLinks()</methodname> methods:
 367         </para>
 368
 369         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$linksArray = $doc->getLinks();
$headerLinksArray = $doc->getHeaderLinks();
 373 ]]></programlisting>
 374
 375         <para>
            Starting with Zend Framework 1.6, it is also possible to exclude links with the
            <code>rel</code> attribute set to <code>'nofollow'</code>. Use
            <methodname>Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true)</methodname>
            to turn on this option.
 380         </para>
 381
 382         <para>
            The
            <methodname>Zend_Search_Lucene_Document_Html::getExcludeNoFollowLinks()</methodname>
            method returns the current state of the "Exclude nofollow links" flag.
 385         </para>
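
        <para>
            As a minimal sketch of this option, assuming Zend Framework 1 is on the include path,
            the example below parses an illustrative <acronym>HTML</acronym> string with two links.
            Because the flag is static, it is set before the document is parsed and reset
            afterwards.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene/Document/Html.php';

// Illustrative markup: one ordinary link and one marked rel="nofollow".
$htmlString = '<html><head><title>Links</title></head><body>'
    . '<a href="http://example.com/a">followed</a>'
    . '<a href="http://example.com/b" rel="nofollow">not followed</a>'
    . '</body></html>';

// Turn the option on before parsing, since links are collected at load time.
Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(true);
$doc   = Zend_Search_Lucene_Document_Html::loadHTML($htmlString);
$links = $doc->getLinks();

// Restore the default so that later parsing is unaffected.
Zend_Search_Lucene_Document_Html::setExcludeNoFollowLinks(false);
]]></programlisting>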
 386     </sect2>
 387
 388     <sect2 id="zend.search.lucene.index-creation.docx-documents">
 389         <title>Word 2007 documents</title>
 390
 391         <para>
 392             <classname>Zend_Search_Lucene</classname> offers a Word 2007 parsing feature. Documents
 393             can be created directly from a Word 2007 file:
 394         </para>
 395
 396         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename);
$index->addDocument($doc);
 399 ]]></programlisting>
 400
 401         <para>
            The <classname>Zend_Search_Lucene_Document_Docx</classname> class uses the
            <code>ZipArchive</code> class and <code>simplexml</code> methods to parse the source
            document. If the <code>ZipArchive</code> class (from the php_zip module) is not
            available, <classname>Zend_Search_Lucene_Document_Docx</classname> cannot be used
            either.
 407         </para>
 408
 409         <para>
            The <classname>Zend_Search_Lucene_Document_Docx</classname> class recognizes document
            metadata and document text. Depending on the document contents, the metadata consists
            of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
            modified, and created.
 414         </para>
 415
 416         <para>
 417             The 'filename' field is the actual Word 2007 file name.
 418         </para>
 419
 420         <para>
 421             The 'title' field is the actual document title.
 422         </para>
 423
 424         <para>
 425             The 'subject' field is the actual document subject.
 426         </para>
 427
 428         <para>
 429             The 'creator' field is the actual document creator.
 430         </para>
 431
 432         <para>
 433             The 'keywords' field contains the actual document keywords.
 434         </para>
 435
 436         <para>
 437             The 'description' field is the actual document description.
 438         </para>
 439
 440         <para>
            The 'lastModifiedBy' field is the name of the user who last modified the document.
 442         </para>
 443
 444         <para>
 445             The 'revision' field is the actual document revision number.
 446         </para>
 447
 448         <para>
 449             The 'modified' field is the actual document last modified date / time.
 450         </para>
 451
 452         <para>
 453             The 'created' field is the actual document creation date / time.
 454         </para>
 455
 456         <para>
            The 'body' field is the actual body content of the Word 2007 document. It includes only
            normal text; comments and revisions are not included.
 459         </para>
 460
 461         <para>
            The <methodname>loadDocxFile()</methodname> method of the
            <classname>Zend_Search_Lucene_Document_Docx</classname> class also accepts an optional
            second argument. If it is set to <constant>TRUE</constant>, the body content is also
            stored within the index and can be retrieved from it. By default, the body is tokenized
            and indexed, but not stored.
 467         </para>
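
        <para>
            The following is a minimal sketch of loading a Word 2007 file with a stored body,
            assuming Zend Framework 1 is on the include path. The helper name
            <methodname>loadDocxWithBody()</methodname> is hypothetical, and the
            <code>ZipArchive</code> check mirrors the requirement described above.
        </para>

        <programlisting language="php"><![CDATA[
require_once 'Zend/Search/Lucene/Document/Docx.php';

/**
 * Hypothetical helper: parse a .docx file and keep the body retrievable.
 */
function loadDocxWithBody($filename)
{
    // The Docx parser depends on ZipArchive (php_zip module).
    if (!class_exists('ZipArchive')) {
        throw new RuntimeException('ZipArchive is required to parse Word 2007 files.');
    }

    // TRUE as the second argument stores the body in addition to indexing it.
    return Zend_Search_Lucene_Document_Docx::loadDocxFile($filename, true);
}
]]></programlisting>

        <para>
            A caller would pass the returned document to
            <methodname>$index->addDocument()</methodname> and could later read the body back from
            a search hit.
        </para>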
 468
 469         <para>
 470             Parsed documents may be augmented by the programmer with any other field:
 471         </para>
 472
 473         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Docx::loadDocxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time())
);
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text')
);
$index->addDocument($doc);
 484 ]]></programlisting>
 485
 486     </sect2>
 487
 488     <sect2 id="zend.search.lucene.index-creation.pptx-documents">
        <title>PowerPoint 2007 documents</title>
 490
 491         <para>
            <classname>Zend_Search_Lucene</classname> offers a PowerPoint 2007 parsing feature.
            Documents can be created directly from a PowerPoint 2007 file:
 494         </para>
 495
 496         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);
$index->addDocument($doc);
 499 ]]></programlisting>
 500
 501         <para>
            The <classname>Zend_Search_Lucene_Document_Pptx</classname> class uses the
            <code>ZipArchive</code> class and <code>simplexml</code> methods to parse the source
            document. If the <code>ZipArchive</code> class (from the php_zip module) is not
            available, <classname>Zend_Search_Lucene_Document_Pptx</classname> cannot be used
            either.
 507         </para>
 508
 509         <para>
            The <classname>Zend_Search_Lucene_Document_Pptx</classname> class recognizes document
            metadata and document text. Depending on the document contents, the metadata consists
            of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
            modified, and created.
 514         </para>
 515
 516         <para>
            The 'filename' field is the actual PowerPoint 2007 file name.
 518         </para>
 519
 520         <para>
 521             The 'title' field is the actual document title.
 522         </para>
 523
 524         <para>
 525             The 'subject' field is the actual document subject.
 526         </para>
 527
 528         <para>
 529             The 'creator' field is the actual document creator.
 530         </para>
 531
 532         <para>
 533             The 'keywords' field contains the actual document keywords.
 534         </para>
 535
 536         <para>
 537             The 'description' field is the actual document description.
 538         </para>
 539
 540         <para>
            The 'lastModifiedBy' field is the name of the user who last modified the document.
 542         </para>
 543
 544         <para>
 545             The 'revision' field is the actual document revision number.
 546         </para>
 547
 548         <para>
 549             The 'modified' field is the actual document last modified date / time.
 550         </para>
 551
 552         <para>
 553             The 'created' field is the actual document creation date / time.
 554         </para>
 555
 556         <para>
            The 'body' field is the actual content of all slides and slide notes in the PowerPoint
            2007 document.
 559         </para>
 560
 561         <para>
            The <methodname>loadPptxFile()</methodname> method of the
            <classname>Zend_Search_Lucene_Document_Pptx</classname> class also accepts an optional
            second argument. If it is set to <constant>TRUE</constant>, the body content is also
            stored within the index and can be retrieved from it. By default, the body is tokenized
            and indexed, but not stored.
 567         </para>
 568
 569         <para>
 570             Parsed documents may be augmented by the programmer with any other field:
 571         </para>
 572
 573         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Pptx::loadPptxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));
$index->addDocument($doc);
 582 ]]></programlisting>
 583     </sect2>
 584
 585     <sect2 id="zend.search.lucene.index-creation.xlsx-documents">
 586         <title>Excel 2007 documents</title>
 587         <para>
            <classname>Zend_Search_Lucene</classname> offers an Excel 2007 parsing feature.
            Documents can be created directly from an Excel 2007 file:
 590         </para>
 591
 592         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$index->addDocument($doc);
 595 ]]></programlisting>
 596
 597         <para>
            The <classname>Zend_Search_Lucene_Document_Xlsx</classname> class uses the
            <code>ZipArchive</code> class and <code>simplexml</code> methods to parse the source
            document. If the <code>ZipArchive</code> class (from the php_zip module) is not
            available, <classname>Zend_Search_Lucene_Document_Xlsx</classname> cannot be used
            either.
 603         </para>
 604
 605         <para>
            The <classname>Zend_Search_Lucene_Document_Xlsx</classname> class recognizes document
            metadata and document text. Depending on the document contents, the metadata consists
            of filename, title, subject, creator, keywords, description, lastModifiedBy, revision,
            modified, and created.
 610         </para>
 611
 612         <para>
 613             The 'filename' field is the actual Excel 2007 file name.
 614         </para>
 615
 616         <para>
 617             The 'title' field is the actual document title.
 618         </para>
 619
 620         <para>
 621             The 'subject' field is the actual document subject.
 622         </para>
 623
 624         <para>
 625             The 'creator' field is the actual document creator.
 626         </para>
 627
 628         <para>
 629             The 'keywords' field contains the actual document keywords.
 630         </para>
 631
 632         <para>
 633             The 'description' field is the actual document description.
 634         </para>
 635
 636         <para>
            The 'lastModifiedBy' field is the name of the user who last modified the document.
 638         </para>
 639
 640         <para>
 641             The 'revision' field is the actual document revision number.
 642         </para>
 643
 644         <para>
 645             The 'modified' field is the actual document last modified date / time.
 646         </para>
 647
 648         <para>
 649             The 'created' field is the actual document creation date / time.
 650         </para>
 651
 652         <para>
 653             The 'body' field is the actual content of all cells in all worksheets of the Excel 2007
 654             document.
 655         </para>
 656
 657         <para>
            The <methodname>loadXlsxFile()</methodname> method of the
            <classname>Zend_Search_Lucene_Document_Xlsx</classname> class also accepts an optional
            second argument. If it is set to <constant>TRUE</constant>, the body content is also
            stored within the index and can be retrieved from it. By default, the body is tokenized
            and indexed, but not stored.
 663         </para>
 664
 665         <para>
 666             Parsed documents may be augmented by the programmer with any other field:
 667         </para>
 668
 669         <programlisting language="php"><![CDATA[
$doc = Zend_Search_Lucene_Document_Xlsx::loadXlsxFile($filename);
$doc->addField(Zend_Search_Lucene_Field::UnIndexed(
    'indexTime',
    time()));
$doc->addField(Zend_Search_Lucene_Field::Text(
    'annotation',
    'Document annotation text'));
$index->addDocument($doc);
 678 ]]></programlisting>
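
        <para>
            Because the Word, PowerPoint and Excel loaders share the same calling convention, an
            application that indexes mixed Office 2007 files might select the loader by file
            extension. The following is only a sketch under that assumption: the helper names
            <methodname>officeLoaderFor()</methodname> and
            <methodname>addOfficeFile()</methodname> are hypothetical and not part of Zend
            Framework.
        </para>

        <programlisting language="php"><![CDATA[
/**
 * Hypothetical helper: map a file name to the matching loader callback,
 * or NULL when the extension is not a supported Office 2007 type.
 */
function officeLoaderFor($filename)
{
    $loaders = array(
        'docx' => array('Zend_Search_Lucene_Document_Docx', 'loadDocxFile'),
        'pptx' => array('Zend_Search_Lucene_Document_Pptx', 'loadPptxFile'),
        'xlsx' => array('Zend_Search_Lucene_Document_Xlsx', 'loadXlsxFile'),
    );

    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return isset($loaders[$extension]) ? $loaders[$extension] : null;
}

/**
 * Hypothetical helper: parse the file and add it to an open index.
 * Returns TRUE if the file was added, FALSE if no loader was usable.
 */
function addOfficeFile($index, $filename, $storeBody = false)
{
    $loader = officeLoaderFor($filename);
    if ($loader === null || !class_exists('ZipArchive')) {
        return false;
    }

    // Zend Framework 1 maps class names to paths by replacing '_' with '/'.
    require_once str_replace('_', '/', $loader[0]) . '.php';

    $doc = call_user_func($loader, $filename, $storeBody);
    $index->addDocument($doc);

    return true;
}
]]></programlisting>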
 679     </sect2>
 680 </sect1>