libstdc++-v3/doc/xml/manual/policy_data_structures.xml

   1 <chapter xmlns="http://docbook.org/ns/docbook" version="5.0"
   2          xml:id="manual.ext.containers.pbds" xreflabel="pbds">
   3   <info>
   4     <title>Policy-Based Data Structures</title>
   5     <keywordset>
   6       <keyword>ISO C++</keyword>
   7       <keyword>policy</keyword>
   8       <keyword>container</keyword>
   9       <keyword>data</keyword>
  10       <keyword>structure</keyword>
  11       <keyword>associated</keyword>
  12       <keyword>tree</keyword>
  13       <keyword>trie</keyword>
  14       <keyword>hash</keyword>
  15       <keyword>metaprogramming</keyword>
  16     </keywordset>
  17   </info>
  18   <?dbhtml filename="policy_data_structures.html"?>
  19
  20   <!-- 2006-04-01 Ami Tavory -->
  21   <!-- 2011-05-25 Benjamin Kosnik -->
  22
  23   <!-- S01: intro -->
  24   <section xml:id="pbds.intro">
  25     <info><title>Intro</title></info>
  26
  27     <para>
  28       This is a library of policy-based elementary data structures:
  29       associative containers and priority queues. It is designed for
  30       high performance, flexibility, semantic safety, and conformance to
  31       the corresponding containers in <literal>std</literal> and
  32       <literal>std::tr1</literal> (except for some points where it differs
  33       by design).
  34     </para>
  35     <para>
  36     </para>
  37
  38     <section xml:id="pbds.intro.issues">
  39       <info><title>Performance Issues</title></info>
  40       <para>
  41       </para>
  42
  43       <para>
  44         An attempt is made to categorize the wide variety of possible
  45         container designs in terms of performance-impacting factors. These
  46         performance factors are translated into design policies and
  47         incorporated into container design.
  48       </para>
  49
  50       <para>
  51         There is tension between unraveling these factors and keeping the
  52         resulting set of policies coherent. Every attempt is made to keep
  53         the set of factors minimal. However, in many cases multiple factors
  54         make for long template names. Aliases and typedefs are used in the
  55         source files wherever possible, but the generated names for
  56         external symbols can still be long in binary files and debuggers.
  57       </para>
  58
  59       <para>
  60         In many cases, the longer names allow capabilities and behaviours
  61         controlled by macros to also be unambiguously emitted as distinct
  62         generated names.
  63       </para>
  64
  65       <para>
  66         Specific issues found while unraveling performance factors in the
  67         design of associative containers and priority queues follow.
  68       </para>
  69
  70       <section xml:id="pbds.intro.issues.associative">
  71         <info><title>Associative</title></info>
  72
  73         <para>
  74           Associative containers depend on their composite policies to a very
  75           large extent. Implicitly hard-wiring policies can hamper their
  76           performance and limit their functionality. An efficient hash-based
  77           container, for example, requires policies for testing key
  78           equivalence, hashing keys, translating hash values into positions
  79           within the hash table, and determining when and how to resize the
  80           table internally. A tree-based container can efficiently support
  81           order statistics, i.e., the ability to query the position of
  82           each key within the sequence of keys in the container, but only if
  83           the container is supplied with a policy to internally update
  84           meta-data. There are many other such examples.
  85         </para>
  86
  87         <para>
  88           Ideally, all associative containers would share the same
  89           interface. Unfortunately, underlying data structures and mapping
  90           semantics differentiate between different containers. For example,
  91           suppose one writes a generic function manipulating an associative
  92           container.
  93         </para>
  94
  95         <programlisting>
  96           template&lt;typename Cntnr&gt;
  97           void
  98           some_op_sequence(Cntnr&amp; r_cnt)
  99           {
 100           ...
 101           }
 102         </programlisting>
 103
 104         <para>
 105           Given this, what can one assume about the instantiating
 106           container? The answer varies according to its underlying data
 107           structure. If the underlying data structure of
 108           <literal>Cntnr</literal> is based on a tree or trie, then the order
 109           of elements is well defined; otherwise, it is not, in general. If
 110           the underlying data structure of <literal>Cntnr</literal> is based
 111           on a collision-chaining hash table, then modifying
 112           <literal>r_cnt</literal> will not invalidate its iterators' order;
 113           if the underlying data structure is a probing hash table, then this
 114           is not the case. If the underlying data structure is based on a tree
 115           or trie, then a reference to the container can efficiently be split;
 116           otherwise, it cannot, in general. If the underlying data structure
 117           is a red-black tree, then splitting a reference to the container is
 118           exception-free; if it is an ordered-vector tree, exceptions can be
 119           thrown.
 120         </para>
 121
 122       </section>
 123
 124       <section xml:id="pbds.intro.issues.priority_queue">
 125         <info><title>Priority Queues</title></info>
 126
 127         <para>
 128           Priority queues are useful when one needs to efficiently access a
 129           minimum (or maximum) value as the set of values changes.
 130         </para>
 131
 132         <para>
 133           Most useful data structures for priority queues have a relatively
 134           simple structure, as they are geared toward relatively simple
 135           requirements. Unfortunately, these structures do not support access
 136           to an arbitrary value, which turns out to be necessary in many
 137           algorithms, for example when decreasing an arbitrary value in a
 138           graph algorithm. Therefore, some extra mechanism is necessary
 139           for accessing arbitrary values. There are at least two
 140           alternatives: embedding an associative container in a priority
 141           queue, or allowing cross-referencing through iterators. The first
 142           solution adds significant overhead; the second solution requires a
 143           precise definition of iterator invalidation, which is the next
 144           point.
 145         </para>
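             <para>
               The second alternative, cross-referencing through iterators,
               can be sketched with this library's own priority queue, whose
               <function>push</function> returns a point iterator that
               <function>modify</function> later accepts. This is only a
               minimal sketch, assuming a GCC toolchain that provides
               <filename>ext/pb_ds/priority_queue.hpp</filename>; the names
               <literal>min_queue</literal> and
               <literal>decrease_then_top</literal> are illustrative and not
               part of the library.
             </para>
             <programlisting>
               #include &lt;cassert&gt;
               #include &lt;functional&gt;
               #include &lt;ext/pb_ds/priority_queue.hpp&gt;

               // Illustrative alias: a pairing heap ordered so that top() is the minimum.
               typedef __gnu_pbds::priority_queue&lt;int, std::greater&lt;int&gt;,
                                                  __gnu_pbds::pairing_heap_tag&gt;
                 min_queue;

               // Pushes three values, lowers the last one through its point
               // iterator, and returns the new minimum.
               int
               decrease_then_top()
               {
                 min_queue q;
                 q.push(5);
                 q.push(3);
                 min_queue::point_iterator it = q.push(9); // handle to the value 9
                 assert(q.size() == 3);
                 q.modify(it, 1);                          // decrease 9 to 1
                 return q.top();
               }
             </programlisting>
             <para>
               The handle plays the role that a separate key-to-position
               table would otherwise play in an algorithm such as Dijkstra's,
               which is why its validity after later operations needs a
               precise definition.
             </para>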
 146
 147         <para>
 148           Priority queues, like hash-based containers, store values in an
 149           order that is meaningless and undefined externally. For example, a
 150           <code>push</code> operation can internally reorganize the
 151           values. Because of this characteristic, describing a priority
 152           queue's iterators is difficult: on the one hand, the values to which
 153           iterators point can remain valid, but on the other, the logical
 154           order of iterators can change unpredictably.
 155         </para>
 156
 157         <para>
 158           Roughly speaking, any element that is both inserted into a priority
 159           queue (e.g., through <code>push</code>) and removed
 160           from it (e.g., through <code>pop</code>) incurs a
 161           logarithmic overhead (in the amortized sense). Different underlying
 162           data structures place the actual cost differently: some are
 163           optimized for amortized complexity, whereas others guarantee that
 164           specific operations only have a constant cost. One underlying data
 165           structure might be chosen if modifying a value is frequent
 166           (Dijkstra's shortest-path algorithm), whereas a different one might
 167           be chosen otherwise. Unfortunately, an array-based binary heap, an
 168           underlying data structure that optimizes (in the amortized sense)
 169           <code>push</code> and <code>pop</code> operations, differs from the
 170           others in terms of its invalidation guarantees. Other design
 171           decisions also impact the cost and placement of the overhead, at the
 172           expense of more difference in the kinds of operations that the
 173           underlying data structure can support. These differences pose a
 174           challenge when creating a uniform interface for priority queues.
 175         </para>
 176       </section>
 177     </section>
 178
 179     <section xml:id="pbds.intro.motivation">
 180       <info><title>Goals</title></info>
 181
 182       <para>
 183         Many fine associative-container libraries have already been written,
 184         most notably the C++ standard's associative containers. Why
 185         then write another library? This section shows some possible
 186         advantages of this library, when considering the challenges in
 187         the introduction. Many of these points stem from the fact that
 188         the ISO C++ process introduced associative containers in a
 189         two-step process (first standardizing tree-based containers,
 190         only then adding hash-based containers, which are fundamentally
 191         different), did not standardize priority queues as containers,
 192         and (in our opinion) overloads the iterator concept.
 193       </para>
 194
 195       <section xml:id="pbds.intro.motivation.associative">
 196         <info><title>Associative</title></info>
 197         <para>
 198         </para>
 199
 200         <section xml:id="motivation.associative.policy">
 201           <info><title>Policy Choices</title></info>
 202           <para>
 203             Associative containers require a relatively large number of
 204             policies to function efficiently in various settings. In some
 205             cases this is needed for making their common operations more
 206             efficient, and in other cases this allows them to support a
 207             larger set of operations.
 208           </para>
 209
 210           <orderedlist>
 211             <listitem>
 212               <para>
 213                 Hash-based containers, for example, support look-up and
 214                 insertion methods (<function>find</function> and
 215                 <function>insert</function>). In order to locate elements
 216                 quickly, they are supplied a hash functor, which specifies
 217                 how to transform a key object into some size type; a hash
 218                 functor might transform <constant>"hello"</constant>
 219                 into <constant>1123002298</constant>. A hash table, though,
 220                 requires transforming each key object into a size-type
 221                 value within a specific range; a hash table with 128
 222                 entries might transform <constant>"hello"</constant> into
 223                 position <constant>63</constant>. The policy by which the
 224                 hash value is transformed into a position within the table
 225                 can dramatically affect performance.  Hash-based containers
 226                 also do not resize naturally (as opposed to tree-based
 227                 containers, for example). The appropriate resize policy is
 228                 unfortunately intertwined with the policy that transforms
 229                 a hash value into a position within the table.
 230               </para>
 231             </listitem>
 232
 233             <listitem>
 234               <para>
 235                 Tree-based containers, for example, also support look-up and
 236                 insertion methods, and are primarily useful when maintaining
 237                 order between elements is important. In some cases, though,
 238                 one can utilize their balancing algorithms for completely
 239                 different purposes.
 240               </para>
 241
 242               <para>
 243                 Figure A shows a tree in which each node contains two entries:
 244                 a floating-point key, and some size-type
 245                 <emphasis>metadata</emphasis> (in bold beneath it) that is
 246                 the number of nodes in the sub-tree. (The root has key 0.99,
 247                 and has 5 nodes (including itself) in its sub-tree.) A
 248                 container based on this data structure can obviously answer
 249                 efficiently whether 0.3 is in the container object, but it
 250                 can also answer what the order of 0.3 is among all keys in
 251                 the container object: see <xref linkend="biblio.clrs2001"/>.
 252
 253               </para>
 254
 255               <para>
 256                 As another example, Figure B shows a tree in which each node
 257                 contains two entries: a half-open geometric line interval,
 258                 and some numeric <emphasis>metadata</emphasis> (in bold beneath
 259                 it) that is the largest endpoint of all intervals in its
 260                 sub-tree.  (The root describes the interval <constant>[20,
 261                 36)</constant>, and the largest endpoint in its sub-tree is
 262                 99.) A container based on this data structure can obviously
 263                 answer efficiently whether <constant>[3, 41)</constant> is
 264                 in the container object, but it can also answer efficiently
 265                 whether the container object has intervals that intersect
 266                 <constant>[3, 41)</constant>. These types of queries are
 267                 very useful in geometric algorithms and lease-management
 268                 algorithms.
 269               </para>
 270
 271               <para>
 272                 It is important to note, however, that as the trees are
 273                 modified, their internal structure changes. To maintain
 274                 these invariants, one must supply some policy that is aware
 275                 of these changes.  Without this, it would be better to use a
 276                 linked list (in itself very efficient for these purposes).
 277               </para>
 278
 279             </listitem>
 280           </orderedlist>
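               <para>
                 The distinction made in the first item, between hashing a
                 key and mapping the resulting hash value to a table position,
                 can be sketched as two separate steps. This is a minimal
                 sketch in standard C++ only: <literal>mod_position</literal>,
                 <literal>mask_position</literal>, and
                 <literal>position_of</literal> are hypothetical names, and
                 <classname>std::hash</classname> merely stands in for
                 whatever hash functor a container is given.
               </para>
               <programlisting>
                 #include &lt;cassert&gt;
                 #include &lt;cstddef&gt;
                 #include &lt;functional&gt;
                 #include &lt;string&gt;

                 // Hypothetical range-hashing policy: maps any hash value
                 // into a position in [0, table_size).
                 std::size_t
                 mod_position(std::size_t hash, std::size_t table_size)
                 { return hash % table_size; }

                 // Alternative policy; valid only when table_size is a power of two.
                 std::size_t
                 mask_position(std::size_t hash, std::size_t table_size)
                 {
                   assert(table_size != 0 &amp;&amp; (table_size &amp; (table_size - 1)) == 0);
                   return hash &amp; (table_size - 1);
                 }

                 // Step 1 hashes the key; step 2 maps the hash value to a position.
                 std::size_t
                 position_of(const std::string&amp; key, std::size_t table_size)
                 { return mod_position(std::hash&lt;std::string&gt;()(key), table_size); }
               </programlisting>
               <para>
                 The masking variant only works for power-of-two table sizes,
                 which illustrates why the choice of range-hashing policy
                 constrains the sizes a resize policy may pick.
               </para>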
 281
 282           <figure>
 283             <title>Node Invariants</title>
 284             <mediaobject>
 285               <imageobject>
 286                 <imagedata align="center" format="PNG" scale="100"
 287                            fileref="../images/pbds_node_invariants.png"/>
 288               </imageobject>
 289               <textobject>
 290                 <phrase>Node Invariants</phrase>
 291               </textobject>
 292             </mediaobject>
 293           </figure>
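               <para>
                 The order-statistics case can be tried with the node-update
                 policy this library provides for that purpose. The following
                 is a minimal sketch, assuming a GCC release recent enough to
                 spell the empty mapped type
                 <classname>__gnu_pbds::null_type</classname>; the names
                 <literal>ordered_set</literal>,
                 <literal>make_example</literal>, and
                 <literal>rank_of</literal>, as well as the keys other than
                 0.99 and 0.3, are illustrative and not taken from the figure.
               </para>
               <programlisting>
                 #include &lt;cassert&gt;
                 #include &lt;cstddef&gt;
                 #include &lt;functional&gt;
                 #include &lt;ext/pb_ds/assoc_container.hpp&gt;
                 #include &lt;ext/pb_ds/tree_policy.hpp&gt;

                 // A red-black tree set whose nodes also maintain sub-tree sizes.
                 typedef __gnu_pbds::tree&lt;double, __gnu_pbds::null_type,
                                          std::less&lt;double&gt;,
                                          __gnu_pbds::rb_tree_tag,
                                          __gnu_pbds::tree_order_statistics_node_update&gt;
                   ordered_set;

                 // Builds a five-key set; the keys are illustrative only.
                 ordered_set
                 make_example()
                 {
                   const double keys[] = { 0.99, 0.3, 0.5, 0.1, 1.5 };
                   ordered_set s;
                   for (std::size_t i = 0; i != sizeof(keys) / sizeof(keys[0]); ++i)
                     s.insert(keys[i]);
                   assert(s.size() == 5);
                   return s;
                 }

                 // Returns how many keys in the set are strictly smaller than key.
                 std::size_t
                 rank_of(const ordered_set&amp; s, double key)
                 { return s.order_of_key(key); }
               </programlisting>
               <para>
                 The inverse query is available as
                 <function>find_by_order</function>, which returns an iterator
                 to the key at a given zero-based position.
               </para>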
 294
 295         </section>
 296
 297         <section xml:id="motivation.associative.underlying">
 298           <info><title>Underlying Data Structures</title></info>
 299           <para>
 300             The standard C++ library contains associative containers based on
 301             red-black trees and collision-chaining hash tables. These are
 302             very useful, but they are not ideal for all types of
 303             settings.
 304           </para>
 305
 306           <para>
 307             The figure below shows the different underlying data structures
 308             currently supported in this library.
 309           </para>
 310
 311           <figure>
 312             <title>Underlying Associative Data Structures</title>
 313             <mediaobject>
 314               <imageobject>
 315                 <imagedata align="center" format="PNG" scale="100"
 316                            fileref="../images/pbds_different_underlying_dss_1.png"/>
 317               </imageobject>
 318               <textobject>
 319                 <phrase>Underlying Associative Data Structures</phrase>
 320               </textobject>
 321             </mediaobject>
 322           </figure>
 323
 324           <para>
 325             A shows a collision-chaining hash-table, B shows a probing
 326             hash-table, C shows a red-black tree, D shows a splay tree, E shows
 327             a tree based on an ordered vector (implicit in the order of the
 328             elements), F shows a PATRICIA trie, and G shows a list-based
 329             container with update policies.
 330           </para>
 331
 332           <para>
 333             Each of these data structures has some performance benefits, in
 334             terms of speed, size or both. For now, note that vector-based trees
 335             and probing hash tables manipulate memory more efficiently than
 336             red-black trees and collision-chaining hash tables, and that
 337             list-based associative containers are very useful for constructing
 338             "multimaps".
 339           </para>
 340
 341           <para>
 342             Now consider a function manipulating a generic associative
 343             container:
 344           </para>
 345           <programlisting>
 346             template&lt;class Cntnr&gt;
 347             int
 348             some_op_sequence(Cntnr &amp;r_cnt)
 349             {
 350             ...
 351             }
 352           </programlisting>
 353
 354           <para>
 355             Ideally, the underlying data structure
 356             of <classname>Cntnr</classname> would not affect what can be
 357             done with <varname>r_cnt</varname>.  Unfortunately, this is not
 358             the case.
 359           </para>
 360
 361           <para>
 362             For example, if <classname>Cntnr</classname>
 363             is <classname>std::map</classname>, then the function can
 364             use
 365           </para>
 366           <programlisting>
 367             std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar);
 368           </programlisting>
 369           <para>
 370             in order to apply <classname>foobar</classname> to all
 371             elements between <classname>foo</classname> and
 372             <classname>bar</classname>. If
 373             <classname>Cntnr</classname> is a hash-based container,
 374             then this call's results are undefined.
 375           </para>
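               <para>
                 For the <classname>std::map</classname> case, the call above
                 can be made concrete as follows. This is a sketch under
                 stated assumptions: <literal>sum_values</literal> and
                 <literal>sum_between</literal> are hypothetical names, and
                 both keys are assumed to be present, with the first not
                 greater than the second.
               </para>
               <programlisting>
                 #include &lt;algorithm&gt;
                 #include &lt;cassert&gt;
                 #include &lt;map&gt;
                 #include &lt;utility&gt;

                 // Hypothetical functor: sums the mapped values it visits.
                 struct sum_values
                 {
                   int total;
                   sum_values() : total(0) { }
                   void operator()(const std::pair&lt;const int, int&gt;&amp; p)
                   { total += p.second; }
                 };

                 // Sums the mapped values for keys in [first_key, last_key).
                 // Meaningful only because std::map iterates in key order.
                 int
                 sum_between(const std::map&lt;int, int&gt;&amp; m, int first_key, int last_key)
                 {
                   assert(first_key &lt;= last_key);
                   assert(m.find(first_key) != m.end());
                   return std::for_each(m.find(first_key), m.find(last_key),
                                        sum_values()).total;
                 }
               </programlisting>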
 376
 377           <para>
 378             Also, if <classname>Cntnr</classname> is tree-based, the type
 379             and object of the comparison functor can be
 380             accessed. If <classname>Cntnr</classname> is hash-based, these
 381             queries are nonsensical.
 382           </para>
 383
 384           <para>
 385             There are various other differences based on the container's
 386             underlying data structure. For one, they can be constructed by,
 387             and queried for, different policies. Furthermore:
 388           </para>
 389
 390           <orderedlist>
 391             <listitem>
 392               <para>
 393                 Containers based on C, D, E and F store elements in a
 394                 meaningful order; the others store elements in a meaningless
 395                 (and probably time-varying) order. By implication, only
 396                 containers based on C, D, E and F can
 397                 support <function>erase</function> operations taking an
 398                 iterator and returning an iterator to the following element
 399                 without performance loss.
 400               </para>
 401             </listitem>
 402
 403             <listitem>
 404               <para>
 405                 Containers based on C, D, E, and F can be split and joined
 406                 efficiently, while the others cannot. Containers based on C
 407                 and D, furthermore, can guarantee that this is exception-free;
 408                 containers based on E cannot guarantee this.
 409               </para>
 410             </listitem>
 411
 412             <listitem>
 413               <para>
 414                 Containers based on all but E can guarantee that
 415                 erasing an element is exception free; containers based on E
 416                 cannot guarantee this. Containers based on all but B and E
 417                 can guarantee that modifying an object of their type does
 418                 not invalidate iterators or references to their elements,
 419                 while containers based on B and E cannot. Containers based
 420                 on C, D, and E can furthermore make a stronger guarantee,
 421                 namely that modifying an object of their type does not
 422                 affect the order of iterators.
 423               </para>
 424             </listitem>
 425           </orderedlist>
 426
 427           <para>
 428             A unified tag and traits system (as used for the C++ standard
 429             library iterators, for example) can ease generic manipulation of
 430             associative containers based on different underlying data
 431             structures.
 432           </para>
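               <para>
                 The idea of a tag and traits system can be sketched with
                 standard containers alone. Every name below
                 (<literal>ordered_tag</literal>,
                 <literal>unordered_tag</literal>,
                 <literal>sketch_traits</literal>, and
                 <literal>range_between_finds_is_meaningful</literal>) is
                 hypothetical and chosen for illustration; none of them is
                 this library's actual interface.
               </para>
               <programlisting>
                 #include &lt;map&gt;
                 #include &lt;unordered_map&gt;

                 // Hypothetical tags describing the underlying data structure.
                 struct ordered_tag { };
                 struct unordered_tag { };

                 // Hypothetical traits class, specialized per container below.
                 template&lt;typename Cntnr&gt;
                 struct sketch_traits;

                 template&lt;typename K, typename V&gt;
                 struct sketch_traits&lt;std::map&lt;K, V&gt; &gt;
                 {
                   typedef ordered_tag category;
                   static const bool order_preserving = true;
                 };

                 template&lt;typename K, typename V&gt;
                 struct sketch_traits&lt;std::unordered_map&lt;K, V&gt; &gt;
                 {
                   typedef unordered_tag category;
                   static const bool order_preserving = false;
                 };

                 // Tag-dispatched overloads, selected at compile time.
                 template&lt;typename Cntnr&gt;
                 bool
                 range_between_finds_is_meaningful(const Cntnr&amp;, ordered_tag)
                 { return true; }

                 template&lt;typename Cntnr&gt;
                 bool
                 range_between_finds_is_meaningful(const Cntnr&amp;, unordered_tag)
                 { return false; }

                 // Entry point for generic code: looks up the tag and dispatches.
                 template&lt;typename Cntnr&gt;
                 bool
                 range_between_finds_is_meaningful(const Cntnr&amp; c)
                 {
                   typedef typename sketch_traits&lt;Cntnr&gt;::category category;
                   return range_between_finds_is_meaningful(c, category());
                 }
               </programlisting>
               <para>
                 A generic function such as
                 <function>some_op_sequence</function> could query such a
                 traits class instead of hard-wiring assumptions about
                 <classname>Cntnr</classname>.
               </para>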
 433
 434         </section>
 435
 436         <section xml:id="motivation.associative.iterators">
 437           <info><title>Iterators</title></info>
 438           <para>
 439             Iterators are central to the design of the standard library
 440             containers, because of the container/algorithm/iterator
 441             decomposition that allows an algorithm to operate on a range
 442             through iterators of some sequence.  Iterators, then, are useful
 443             because they allow going over a
 444             specific <emphasis>sequence</emphasis>.  The standard library
 445             also uses iterators for accessing a
 446             specific <emphasis>element</emphasis>: when an associative
 447             container returns one through <function>find</function>. The
 448             standard library consistently uses the same types of iterators
 449             for both purposes: going over a range, and accessing a specific
 450             found element. Before the introduction of hash-based containers
 451             to the standard library, this made sense (with the exception of
 452             priority queues, which are discussed later).
 453           </para>
 454
 455           <para>
 456             When the standard associative containers are used together with
 457             non-order-preserving associative containers (and with
 458             priority queues), there is a possible need for
 459             different types of iterators for self-organizing containers:
 460             the iterator concept seems overloaded to mean two different
 461             things (in some cases). <!-- <remark> XXX
 462             "ds_gen.html#find_range">Design::Associative
 463             Containers::Data-Structure Genericity::Point-Type and Range-Type
 464             Methods</remark>. -->
 465           </para>
 466
 467           <section xml:id="associative.iterators.using">
 468             <info>
 469               <title>Using Point Iterators for Range Operations</title>
 470             </info>
 471             <para>
 472               Suppose <classname>cntnr</classname> is some associative
 473               container, and say <varname>c</varname> is an object of
 474               type <classname>cntnr</classname>. Then what will be the outcome
 475               of
 476             </para>
 477
 478             <programlisting>
 479               std::for_each(c.find(1), c.find(5), foo);
 480             </programlisting>
 481
 482             <para>
 483               If <varname>c</varname> is a tree-based container
 484               object, then an in-order walk will
 485               apply <classname>foo</classname> to the relevant elements,
 486               as in the graphic below, label A. If <varname>c</varname> is
 487               a hash-based container, then the order of elements between any
 488               two elements is undefined (and probably time-varying); there is
 489               no guarantee that the elements traversed will coincide with the
 490               <emphasis>logical</emphasis> elements between 1 and 5, as in
 491               label B.
 492             </para>
 493
 494             <figure>
 495               <title>Range Iteration in Different Data Structures</title>
 496               <mediaobject>
 497                 <imageobject>
 498                   <imagedata align="center" format="PNG" scale="100"
 499                              fileref="../images/pbds_point_iterators_range_ops_1.png"/>
 500                 </imageobject>
 501                 <textobject>
 502                   <phrase>Range Iteration in Different Data Structures</phrase>
 503                 </textobject>
 504               </mediaobject>
 505             </figure>
 506
 507             <para>
 508               In our opinion, this problem is not caused merely by the fact that
 509               red-black trees are order preserving while
 510               collision-chaining hash tables are (generally) not; it
 511               is more fundamental. Most of the standard's containers
 512               order sequences in a well-defined manner that is
 513               determined by their <emphasis>interface</emphasis>:
 514               calling <function>insert</function> on a tree-based
 515               container modifies its sequence in a predictable way, as
 516               does calling <function>push_back</function> on a list or
 517               a vector. Conversely, collision-chaining hash tables,
 518               probing hash tables, priority queues, and list-based
 519               containers (which are very useful for "multimaps") are
 520               self-organizing data structures; each
 521               operation modifies their sequences in a manner that is
 522               (practically) determined by their
 523               <emphasis>implementation</emphasis>.
 524             </para>
 525
 526             <para>
 527               Consequently, applying an algorithm to a sequence obtained from most
 528               containers may or may not make sense, but applying it to a
 529               sub-sequence of a self-organizing container does not.
 530             </para>
 531           </section>
 532
 533           <section xml:id="associative.iterators.cost">
 534             <info>
 535               <title>Cost to Point Iterators to Enable Range Operations</title>
 536             </info>
 537             <para>
 538               Suppose <varname>c</varname> is some collision-chaining
 539               hash-based container object, and one calls
 540             </para>
 541             <programlisting>c.find(3)</programlisting>
 542             <para>
 543               Then what composes the returned iterator?
 544             </para>
 545
 546             <para>
 547               In the graphic below, label A shows the simplest (and
 548               most efficient) implementation of a collision-chaining
 549               hash table.  The little box marked
 550               <classname>point_iterator</classname> shows an object
 551               that contains a pointer to the element's node. Note that
 552               this "iterator" has no way to move to the next element
 553               (it cannot support
 554               <function>operator++</function>). Conversely, the little
 555               box marked <classname>iterator</classname> stores a
 556               pointer to the element as well as some other
 557               information (the bucket number of the element). The
 558               second iterator, then, is "heavier" than the first one:
 559               it requires more time and space. If we were to use a
 560               different container to cross-reference into this
 561               hash-table using these iterators, it would take much
 562               more space. As noted above, nothing much can be done by
 563               incrementing these iterators, so why is this extra
 564               information needed?
 565             </para>
 566
 567             <para>
 568               Alternatively, one might create a collision-chaining hash-table
 569               where the lists might be linked, forming a monolithic total-element
 570               list, as in the graphic below, label B.  Here the iterators are as
 571               light as can be, but the hash-table's operations are more
 572               complicated.
 573             </para>
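                 <para>
                   The difference in weight between the two kinds of iterator
                   can be sketched with two plain structures. The layouts
                   below are hypothetical, written only to mirror the
                   description above (a node pointer versus a node pointer
                   plus a bucket number); they are not the library's actual
                   iterator types.
                 </para>
                 <programlisting>
                   #include &lt;cstddef&gt;

                   // Hypothetical chained-bucket node.
                   struct node
                   {
                     int   value;
                     node* next;   // next node in the same bucket's chain
                   };

                   // Can be de-referenced, but cannot reach the next element.
                   struct point_iterator_sketch
                   {
                     node* p_node;
                     int&amp; operator*() const { return p_node-&gt;value; }
                   };

                   // Also carries the bucket number that an operator++ would
                   // need in order to move on to the next bucket.
                   struct range_iterator_sketch
                   {
                     node*       p_node;
                     std::size_t bucket;
                     int&amp; operator*() const { return p_node-&gt;value; }
                   };

                   static_assert(sizeof(range_iterator_sketch)
                                 &gt; sizeof(point_iterator_sketch),
                                 "the range-capable iterator is the larger one");
                 </programlisting>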
 574
 575             <figure>
 576               <title>Point Iteration in Hash Data Structures</title>
 577               <mediaobject>
 578                 <imageobject>
 579                   <imagedata align="center" format="PNG" scale="100"
 580                              fileref="../images/pbds_point_iterators_range_ops_2.png"/>
 581                 </imageobject>
 582                 <textobject>
 583                   <phrase>Point Iteration in Hash Data Structures</phrase>
 584                 </textobject>
 585               </mediaobject>
 586             </figure>
 587
 588             <para>
 589               It should be noted that containers based on collision-chaining
 590               hash-tables are not the only ones with this type of behavior;
 591               many other self-organizing data structures display it as well.
 592             </para>
 593           </section>
 594
 595           <section xml:id="associative.iterators.invalidation">
 596             <info><title>Invalidation Guarantees</title></info>
 597             <para>Consider the following snippet:</para>
 598             <programlisting>
 599               it = c.find(3);
 600               c.erase(5);
 601             </programlisting>
 602
 603             <para>
 604               Following the call to <classname>erase</classname>, what is the
 605               validity of <classname>it</classname>: can it be de-referenced?
 606               Can it be incremented?
 607             </para>
 608
 609             <para>
 610               The answer depends on the underlying data structure of the
 611               container. The graphic below shows three cases: A1 and A2 show
 612               a red-black tree; B1 and B2 show a probing hash-table; C1 and C2
 613               show a collision-chaining hash table.
 614             </para>
 615
 616             <figure>
 617               <title>Effect of erase in different underlying data structures</title>
 618               <mediaobject>
 619                 <imageobject>
 620                   <imagedata align="center" format="PNG" scale="100"
 621                              fileref="../images/pbds_invalidation_guarantee_erase.png"/>
 622                 </imageobject>
 623                 <textobject>
 624                   <phrase>Effect of erase in different underlying data structures</phrase>
 625                 </textobject>
 626               </mediaobject>
 627             </figure>
 628
 629             <orderedlist>
 630               <listitem>
 631                 <para>
 632                   Erasing 5 from A1 yields A2. Clearly, an iterator to 3 can
 633                   be de-referenced and incremented. The sequence of iterators
 634                   changed, but in a way that is well-defined by the interface.
 635                 </para>
 636               </listitem>
 637
 638               <listitem>
 639                 <para>
 640                   Erasing 5 from B1 yields B2. Clearly, an iterator to 3 is
 641                   not valid at all - it cannot be de-referenced or
 642                   incremented; the order of iterators changed in a way that is
 643                   (practically) determined by the implementation and not by
 644                   the interface.
 645                 </para>
 646               </listitem>
 647
 648               <listitem>
 649                 <para>
 650                   Erasing 5 from C1 yields C2. Here the situation is more
 651                   complicated. On the one hand, there is no problem in
 652                   de-referencing <varname>it</varname>. On the other hand,
 653                   the order of iterators changed in a way that is
 654                   (practically) determined by the implementation and not by
 655                   the interface.
 656                 </para>
 657               </listitem>
 658             </orderedlist>
 659
 660             <para>
 661               So in the standard library containers, it is not always possible
 662               to express whether <varname>it</varname> is valid or not. This
 663               is true also for <function>insert</function>. Again, the
 664               iterator concept seems overloaded.
 665             </para>
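                 <para>
                   The three cases can also be told apart at compile time.
                   The following is a minimal sketch, assuming the
                   <classname>container_traits</classname> class template and
                   the invalidation-guarantee tag types declared by this
                   library's headers. The helper
                   <function>has_guarantee</function> is hypothetical and
                   exists only for illustration, and the expected tags reflect
                   our reading of the headers rather than a documented
                   promise.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;type_traits&gt;
                   #include &lt;ext/pb_ds/assoc_container.hpp&gt;

                   using namespace __gnu_pbds;

                   typedef tree&lt;int, char&gt;          tree_map; // red-black tree by default
                   typedef cc_hash_table&lt;int, char&gt; cc_map;   // collision-chaining hash table
                   typedef gp_hash_table&lt;int, char&gt; gp_map;   // probing hash table

                   // Hypothetical helper: true if container type C advertises
                   // exactly the invalidation-guarantee tag G.
                   template&lt;typename C, typename G&gt;
                   bool has_guarantee()
                   {
                     typedef typename container_traits&lt;C&gt;::invalidation_guarantee actual;
                     return std::is_same&lt;actual, G&gt;::value;
                   }

                   int main()
                   {
                     // Case A: iterators stay valid and keep a well-defined order.
                     assert((has_guarantee&lt;tree_map, range_invalidation_guarantee&gt;()));
                     // Case C: iterators can be de-referenced, but not relied on for order.
                     assert((has_guarantee&lt;cc_map, point_invalidation_guarantee&gt;()));
                     // Case B: no guarantee once the container is modified.
                     assert((has_guarantee&lt;gp_map, basic_invalidation_guarantee&gt;()));
                   }
                 </programlisting>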
 666           </section>
 667         </section> <!--iterators-->
 668
 669
 670         <section xml:id="motivation.associative.functions">
 671           <info><title>Functional</title></info>
 672           <para>
 673           </para>
 674
 675           <para>
 676             The design of the functional overlay to the underlying data
 677             structures differs slightly from some of the conventions used in
 678             the C++ standard. A strict public interface should comprise
 679             only methods whose operations depend on the class's internal
 680             structure; other operations are best designed as external
 681             functions (see <xref linkend="biblio.meyers02both"/>). By this
 682             rubric, the standard associative containers lack some useful
 683             methods, and provide other methods which would be better
 684             removed.
 685           </para>
 686
 687           <section xml:id="motivation.associative.functions.erase">
 688             <info><title><function>erase</function></title></info>
 689
 690             <orderedlist>
 691               <listitem>
 692                 <para>
 693                   Order-preserving standard associative containers provide the
 694                   method
 695                 </para>
 696                 <programlisting>
 697                   iterator
 698                   erase(iterator it)
 699                 </programlisting>
 700
 701                 <para>
 702                   which takes an iterator, erases the corresponding
 703                   element, and returns an iterator to the following
 704                   element. Standard hash-based associative containers
 705                   also provide this method. This seemingly
 706                   increases genericity between associative containers,
 707                   since it is possible to use
 708                 </para>
 709                 <programlisting>
 710                   typename C::iterator it = c.begin();
 711                   typename C::iterator e_it = c.end();
 712
 713                   while (it != e_it)
 714                     it = pred(*it) ? c.erase(it) : ++it;
 715                 </programlisting>
 716
 717                 <para>
 718                   in order to erase from a container object
 719                   <varname>c</varname> all elements which match a
 720                   predicate <varname>pred</varname>. However, in a
 721                   different sense this actually decreases genericity: an
 722                   integral implication of this method is that tree-based
 723                   associative containers' memory use is linear in the total
 724                   number of elements they store, while hash-based
 725                   containers' memory use is unbounded in the total number of
 726                   elements they store. Assume a hash-based container is
 727                   allowed to decrease its size when an element is
 728                   erased. Then the elements might be rehashed, which means
 729                   that there is no "next" element - it is simply
 730                   undefined. Consequently, it is possible to infer from the
 731                   fact that the standard library's hash-based containers
 732                   provide this method that they cannot downsize when
 733                   elements are erased. As a consequence, different code is
 734                   needed to manipulate different containers, assuming that
 735                   memory should be conserved. Therefore, this library's
 736                   non-order-preserving associative containers omit this
 737                   method.
 738                 </para>
 739               </listitem>
 740
 741               <listitem>
 742                 <para>
 743                   All associative containers include a conditional-erase method
 744                 </para>
 745                 <programlisting>
 746                   template&lt;
 747                   class Pred&gt;
 748                   size_type
 749                   erase_if
 750                   (Pred pred)
 751                 </programlisting>
 752                 <para>
 753                   which erases all elements matching a predicate. This is probably the
 754                   only way to ensure linear-time multiple-item erase which can
 755                   actually downsize a container.
 756                 </para>
 757               </listitem>
 758
 759               <listitem>
 760                 <para>
 761                   The standard associative containers provide methods for
 762                   multiple-item erase of the form
 763                 </para>
 764                 <programlisting>
 765                   size_type
 766                   erase(It b, It e)
 767                 </programlisting>
 768                 <para>
 769                   erasing a range of elements given by a pair of
 770                   iterators. For tree-based or trie-based containers, this can
 771                   be implemented more efficiently as a (small) sequence of split
 772                   and join operations. For other, unordered, containers, this
 773                   method isn't much better than an external loop. Moreover,
 774                   if <varname>c</varname> is a hash-based container,
 775                   then
 776                 </para>
 777                 <programlisting>
 778                   c.erase(c.find(2), c.find(5))
 779                 </programlisting>
 780                 <para>
 781                   is almost certain to do something
 782                   different than erasing all elements whose keys are between 2
 783                   and 5, and is likely to produce other undefined behavior.
 784                 </para>
 785               </listitem>
 786             </orderedlist>
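                 <para>
                   The conditional-erase method from the second item might be
                   used as in the following minimal sketch. The predicate
                   <classname>has_odd_key</classname> and the helper
                   <function>erase_odd_keys</function> are hypothetical names
                   chosen for illustration; the sketch assumes that the
                   predicate is invoked with the container's stored value, a
                   key/mapped pair.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;cstddef&gt;
                   #include &lt;utility&gt;
                   #include &lt;ext/pb_ds/assoc_container.hpp&gt;

                   typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; map_type;

                   // Hypothetical predicate: true for entries whose key is odd.
                   struct has_odd_key
                   {
                     bool operator()(const std::pair&lt;const int, char&gt;&amp; v) const
                     { return v.first % 2 != 0; }
                   };

                   // Fills c with keys 0..9, erases the odd ones in a single
                   // pass, and returns the number of erased entries.
                   std::size_t erase_odd_keys(map_type&amp; c)
                   {
                     for (int i = 0; i &lt; 10; ++i)
                       c[i] = 'x';
                     return c.erase_if(has_odd_key());
                   }

                   int main()
                   {
                     map_type c;
                     assert(erase_odd_keys(c) == 5);
                     assert(c.size() == 5);
                     assert(c.find(3) == c.end());
                     assert(c.find(4) != c.end());
                   }
                 </programlisting>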
 787           </section> <!-- erase -->
 788
 789           <section xml:id="motivation.associative.functions.split">
 790             <info>
 791               <title>
 792                 <function>split</function> and <function>join</function>
 793               </title>
 794             </info>
 795             <para>
 796               It is well-known that tree-based and trie-based container
 797               objects can be efficiently split or joined (see
 798               <xref linkend="biblio.clrs2001"/>). Externally splitting or
 799               joining trees is super-linear, and, furthermore, can throw
 800               exceptions. Split and join methods, consequently, seem good
 801               choices for tree-based container methods, especially since,
 802               as noted above, they are efficient replacements for erasing
 803               sub-sequences.
 804             </para>
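                 <para>
                   As a minimal sketch of these two methods, assuming that
                   <function>split</function> moves every entry whose key is
                   greater than the given key into the other container, and
                   that <function>join</function> requires the key ranges of
                   the two containers not to overlap. The helper
                   <function>fill_keys</function> is hypothetical.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;ext/pb_ds/assoc_container.hpp&gt;

                   typedef __gnu_pbds::tree&lt;int, char&gt; tree_map;

                   // Hypothetical helper: inserts the keys 1..n into t.
                   void fill_keys(tree_map&amp; t, int n)
                   {
                     for (int i = 1; i &lt;= n; ++i)
                       t[i] = 'x';
                   }

                   int main()
                   {
                     tree_map low, high;
                     fill_keys(low, 6);

                     // Move every entry with a key greater than 3 into high.
                     low.split(3, high);
                     assert(low.size() == 3);
                     assert(high.size() == 3);
                     assert(low.find(4) == low.end());
                     assert(high.find(4) != high.end());

                     // Merge the entries back; high is left empty.
                     low.join(high);
                     assert(low.size() == 6);
                     assert(high.empty());
                   }
                 </programlisting>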
 805
 806           </section> <!-- split -->
 807
 808           <section xml:id="motivation.associative.functions.insert">
 809             <info>
 810               <title>
 811                 <function>insert</function>
 812               </title>
 813             </info>
 814             <para>
 815               The standard associative containers provide methods of the form
 816             </para>
 817             <programlisting>
 818               template&lt;class It&gt;
 819               size_type
 820               insert(It b, It e);
 821             </programlisting>
 822
 823             <para>
 824               for inserting a range of elements given by a pair of
 825               iterators. This can be implemented as an external loop
 826               or, more efficiently, as a join operation (in the case of
 827               tree-based or trie-based containers). Moreover, these methods seem
 828               similar to constructors taking a range given by a pair of
 829               iterators; the constructors, however, are transactional, whereas
 830               the insert methods are not; this is possibly confusing.
 831             </para>
 832
 833           </section> <!-- insert -->
 834
 835           <section xml:id="motivation.associative.functions.compare">
 836             <info>
 837               <title>
 838                 <function>operator==</function> and <function>operator&lt;=</function>
 839               </title>
 840             </info>
 841
 842             <para>
 843               Associative containers are parametrized by policies that allow
 844               testing key equivalence: a hash-based container can do this through
 845               its equivalence functor, and a tree-based container can do this
 846               through its comparison functor. In addition, some standard
 847               associative containers have global function operators, like
 848               <function>operator==</function> and <function>operator&lt;=</function>,
 849               that allow comparing entire associative containers.
 850             </para>
 851
 852             <para>
 853               In our opinion, these functions are better left out. To begin
 854               with, they do not significantly improve over an external
 855               loop. More importantly, however, they are possibly misleading -
 856               <function>operator==</function>, for example, usually checks for
 857               equivalence, or interchangeability, but the associative
 858               container cannot check for values' equivalence, only keys'
 859               equivalence; also, are two containers considered equivalent if
 860               they store the same values in a different order? This is an
 861               arbitrary decision.
 862             </para>
 863           </section> <!-- compare -->
 864
 865         </section>  <!-- functional -->
 866
 867       </section> <!--associative-->
 868
 869       <section xml:id="pbds.intro.motivation.priority_queue">
 870         <info><title>Priority Queues</title></info>
 871
 872         <section xml:id="motivation.priority_queue.policy">
 873           <info><title>Policy Choices</title></info>
 874
 875           <para>
 876             Priority queues are containers that allow efficiently inserting
 877             values and accessing the maximal value (in the sense of the
 878             container's comparison functor). Their interface
 879             supports <function>push</function>
 880             and <function>pop</function>. The standard
 881             container <classname>std::priority_queue</classname> indeed supports
 882             these methods, but little else. For algorithmic and
 883             software-engineering purposes, other methods are needed:
 884           </para>
 885
 886           <orderedlist>
 887             <listitem>
 888               <para>
 889                 Many graph algorithms (see
 890                 <xref linkend="biblio.clrs2001"/>) require increasing a
 891                 value in a priority queue (again, in the sense of the
 892                 container's comparison functor), or joining two
 893                 priority-queue objects.
 894               </para>
 895             </listitem>
 896
 897             <listitem>
 898               <para>The return type of <classname>priority_queue</classname>'s
 899               <function>push</function> method is a point-type iterator, which can
 900               be used for modifying or erasing arbitrary values. For
 901               example:</para>
 902               <programlisting>
 903                 priority_queue&lt;int&gt; p;
 904                 priority_queue&lt;int&gt;::point_iterator it = p.push(3);
 905                 p.modify(it, 4);
 906               </programlisting>
 907
 908               <para>These types of cross-referencing operations are necessary
 909               for making priority queues useful for different applications,
 910               especially graph applications.</para>
 911
 912             </listitem>
 913             <listitem>
 914               <para>
 915                 It is sometimes necessary to erase an arbitrary value in a
 916                 priority queue. For example, consider
 917                 the <function>select</function> function for monitoring
 918                 file descriptors:
 919               </para>
 920
 921               <programlisting>
 922                 int
 923                 select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds,
 924                 struct timeval *timeout);
 925               </programlisting>
 926               <para>
 927                 then, as the select documentation states:
 928               </para>
 929               <para>
 930                 <quote>
 931                   The nfds argument specifies the range of file
 932                   descriptors to be tested. The select() function tests file
 933                 descriptors in the range of 0 to nfds-1.</quote>
 934               </para>
 935
 936               <para>
 937                 It stands to reason, therefore, that we might wish to
 938                 maintain a minimal value for <varname>nfds</varname>, and
 939                 priority queues immediately come to mind. Note, though, that
 940                 when a socket is closed, the minimal file descriptor might
 941                 change; in the absence of an efficient means to erase an
 942                 arbitrary value from a priority queue, we might as well
 943                 avoid its use altogether.
 944               </para>
 945
 946               <para>
 947                 The standard containers typically support iterators. It is
 948                 somewhat unusual
 949                 for <classname>std::priority_queue</classname> to omit them
 950                 (See <xref linkend="biblio.meyers01stl"/>). One might
 951                 ask why priority queues need to support iterators, since
 952                 they are self-organizing containers with a different purpose
 953                 than abstracting sequences. There are several reasons:
 954               </para>
 955               <orderedlist>
 956                 <listitem>
 957                   <para>
 958                     Iterators (even in self-organizing containers) are
 959                     useful for many purposes: cross-referencing
 960                     containers, serialization, and debugging code that uses
 961                     these containers.
 962                   </para>
 963                 </listitem>
 964
 965                 <listitem>
 966                   <para>
 967                     The standard library's hash-based containers support
 968                     iterators, even though they too are self-organizing
 969                     containers with a different purpose than abstracting
 970                     sequences.
 971                   </para>
 972                 </listitem>
 973
 974                 <listitem>
 975                   <para>
 976                     In standard-library-like containers, it is natural to specify the
 977                     interface of operations for modifying a value or erasing
 978                     a value (discussed previously) in terms of iterators.
 979                     It should be noted that the standard
 980                     containers also use iterators for accessing and
 981                     manipulating a specific value. In hash-based
 982                     containers, one checks the existence of a key by
 983                     comparing the iterator returned by <function>find</function> to the
 984                     iterator returned by <function>end</function>, and not by comparing a
 985                     pointer returned by <function>find</function> to <type>NULL</type>.
 986                   </para>
 987                 </listitem>
 988               </orderedlist>
 989             </listitem>
 990           </orderedlist>
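                 <para>
                   The operations discussed in this list can be combined as in
                   the following minimal sketch, which uses this library's
                   <classname>priority_queue</classname> with its default
                   template arguments. The helper
                   <function>modify_and_erase</function> is hypothetical, and
                   the sketch assumes that point iterators remain usable after
                   further pushes with the default underlying data structure.
                 </para>
                 <programlisting>
                   #include &lt;cassert&gt;
                   #include &lt;ext/pb_ds/priority_queue.hpp&gt;

                   typedef __gnu_pbds::priority_queue&lt;int&gt; pq_type;

                   // Hypothetical helper: pushes 3, 7 and 5, then raises the 3
                   // to 9 and erases the 7 through the point iterators returned
                   // by push. Returns the resulting top value.
                   int modify_and_erase(pq_type&amp; p)
                   {
                     pq_type::point_iterator it3 = p.push(3);
                     pq_type::point_iterator it7 = p.push(7);
                     p.push(5);
                     p.modify(it3, 9); // the queue now holds 9, 7, 5
                     p.erase(it7);     // the queue now holds 9, 5
                     return p.top();
                   }

                   int main()
                   {
                     pq_type p;
                     assert(modify_and_erase(p) == 9);
                     assert(p.size() == 2);

                     pq_type q;
                     q.push(20);
                     p.join(q);        // moves the contents of q into p
                     assert(p.top() == 20);
                     assert(q.empty());
                   }
                 </programlisting>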
 991
 992         </section>
 993
 994         <section xml:id="motivation.priority_queue.underlying">
 995           <info><title>Underlying Data Structures</title></info>
 996
 997           <para>
 998             There are three main implementations of priority queues: the
 999             first employs a binary heap, typically one which uses a
1000             sequence; the second uses a tree (or forest of trees), which is
1001             typically less structured than an associative container's tree;
1002             the third simply uses an associative container. These are
1003             shown in the figure below with labels A1 and A2, B, and C.
1004           </para>
1005
1006           <figure>
1007             <title>Underlying Priority Queue Data Structures</title>
1008             <mediaobject>
1009               <imageobject>
1010                 <imagedata align="center" format="PNG" scale="100"
1011                            fileref="../images/pbds_different_underlying_dss_2.png"/>
1012               </imageobject>
1013               <textobject>
1014                 <phrase>Underlying Priority Queue Data Structures</phrase>
1015               </textobject>
1016             </mediaobject>
1017           </figure>
1018
1019           <para>
1020             No single implementation can completely replace any of the
1021             others. Some have better <function>push</function>
1022             and <function>pop</function> amortized performance, some have
1023             better bounded (worst case) response time than others, some
1024             optimize a single method at the expense of others, etc. In
1025             general the "best" implementation is dictated by the specific
1026             problem.
1027           </para>
1028
1029           <para>
1030             As with associative containers, the more implementations
1031             co-exist, the more necessary a traits mechanism is for handling
1032             generic containers safely and efficiently. This is especially
1033             important for priority queues, since the invalidation guarantees
1034             of one of the most useful data structures - binary heaps - are
1035             markedly different from those of most of the others.
1036           </para>
1037
1038         </section>
1039
1040         <section xml:id="motivation.priority_queue.binary_heap">
1041           <info><title>Binary Heaps</title></info>
1042
1043
1044           <para>
1045             Binary heaps are one of the most useful underlying
1046             data structures for priority queues. They are very efficient in
1047             terms of memory (since they don't require per-value structure
1048             metadata), and have the best amortized <function>push</function> and
1049             <function>pop</function> performance for primitive types like
1050             <type>int</type>.
1051           </para>
1052
1053           <para>
1054             The standard library's <classname>priority_queue</classname>
1055             implements this data structure as an adapter over a sequence,
1056             typically
1057             <classname>std::vector</classname>
1058             or <classname>std::deque</classname>, which correspond to labels
1059             A1 and A2 respectively in the graphic above.
1060           </para>
1061
1062           <para>
1063             This is indeed an elegant example of the adapter concept and
1064             the algorithm/container/iterator decomposition. (See <xref linkend="biblio.nelson96stlpq"/>). There are
1065             several reasons why a binary-heap priority queue
1066             may be better implemented as a container instead of a
1067             sequence adapter:
1068           </para>
1069
1070           <orderedlist>
1071             <listitem>
1072               <para>
1073                 <classname>std::priority_queue</classname> cannot erase values
1074                 from its adapted sequence (irrespective of the sequence
1075                 type). This means that the memory use of
1076                 an <classname>std::priority_queue</classname> object is always
1077                 proportional to the maximal number of values it ever contained,
1078                 and not to the number of values that it currently
1079                 contains. (See <filename>performance/priority_queue_text_pop_mem_usage.cc</filename>.)
1080                 This implementation of binary heaps acts very differently than
1081                 other underlying data structures (See also pairing heaps).
1082               </para>
1083             </listitem>
1084
1085             <listitem>
1086               <para>
1087                 Some combinations of adapted sequences and value types
1088                 are very inefficient or just don't make sense. If one uses
1089                 <classname>std::priority_queue&lt;std::string,
1090                 std::vector&lt;std::string&gt; &gt;</classname>, for example, then not only will each
1091                 operation perform a logarithmic number of
1092                 <classname>std::string</classname> assignments, but, furthermore, any
1093                 operation (including <function>pop</function>) can render the container
1094                 useless due to exceptions. Conversely, if one uses
1095                 <classname>std::priority_queue&lt;int, std::deque&lt;int&gt;
1096                 &gt;</classname>, then each operation incurs a logarithmic
1097                 number of indirect accesses (through pointers) unnecessarily.
1098                 It might be better to let the container make a conservative
1099                 deduction of whether to use the structure labeled A1 or A2 in the graphic above.
1100               </para>
1101             </listitem>
1102
1103             <listitem>
1104               <para>
1105                 There does not seem to be a systematic way to determine
1106                 what exactly can be done with the priority queue.
1107               </para>
1108               <orderedlist>
1109                 <listitem>
1110                   <para>
1111                     If <varname>p</varname> is a priority queue adapting an
1112                     <classname>std::vector</classname>, then it is possible to iterate over
1113                     all values by using <function>&amp;p.top()</function> and
1114                     <function>&amp;p.top() + p.size()</function>, but this will not work
1115                     if <varname>p</varname> is adapting an <classname>std::deque</classname>; in any
1116                     case, one cannot use <function>p.begin()</function> and
1117                     <function>p.end()</function>. If a different sequence is adapted, it
1118                     is even more difficult to determine what can be
1119                     done.
1120                   </para>
1121                 </listitem>
1122
1123                 <listitem>
1124                   <para>
1125                     If <varname>p</varname> is a priority queue adapting an
1126                     <classname>std::deque</classname>, then the reference returned by
1127                   </para>
1128                   <programlisting>
1129                     p.top()
1130                   </programlisting>
1131                   <para>
1132                     will remain valid until it is popped,
1133                     but if <varname>p</varname> adapts an <classname>std::vector</classname>, the
1134                     next <function>push</function> will invalidate it. If a different
1135                     sequence is adapted, it is even more difficult to
1136                     determine what can be done.
1137                   </para>
1138                 </listitem>
1139               </orderedlist>
1140             </listitem>
1141
1142             <listitem>
1143               <para>
1144                 Sequence-based binary heaps can still implement
1145                 linear-time <function>erase</function> and <function>modify</function> operations.
1146                 This means that if one needs to erase a small
1147                 (say logarithmic) number of values, then one might still
1148                 choose this underlying data structure. Using
1149                 <classname>std::priority_queue</classname>, however, this will generally
1150                 change the order of growth of the entire sequence of
1151                 operations.
1152               </para>
1153             </listitem>
1154           </orderedlist>
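               <para>
                 The pointer-based iteration mentioned in the third item can
                 be sketched as follows. The function
                 <function>sum_all</function> is a hypothetical name; it
                 relies on <function>top</function> referring to the first
                 element of a contiguous <classname>std::vector</classname>,
                 which is exactly the kind of undocumented dependency on the
                 adapted sequence that the item warns about.
               </para>
               <programlisting>
                 #include &lt;cassert&gt;
                 #include &lt;queue&gt;

                 // Sums every value of a vector-backed std::priority_queue
                 // without popping. Not valid if the adapted sequence is
                 // std::deque, whose storage is not contiguous.
                 int sum_all(const std::priority_queue&lt;int&gt;&amp; p)
                 {
                   if (p.empty())
                     return 0;
                   const int* first = &amp;p.top();
                   const int* last = first + p.size();
                   int sum = 0;
                   for (const int* q = first; q != last; ++q)
                     sum += *q;
                   return sum;
                 }

                 int main()
                 {
                   std::priority_queue&lt;int&gt; p; // adapts std::vector&lt;int&gt; by default
                   p.push(4);
                   p.push(1);
                   p.push(7);
                   assert(sum_all(p) == 12);
                 }
               </programlisting>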
1155
1156         </section>
1157       </section>
1158     </section> <!-- goals/motivation -->
1159   </section> <!-- intro -->
1160
1161   <!-- S02: Using -->
1162   <section xml:id="containers.pbds.using">
1163     <info><title>Using</title></info>
1164     <?dbhtml filename="policy_data_structures_using.html"?>
1165
1166     <section xml:id="pbds.using.prereq">
1167       <info><title>Prerequisites</title></info>
1168
1169       <para>The library contains only header files, and does not require any
1170       other libraries except the standard C++ library. All classes are
1171       defined in namespace <code>__gnu_pbds</code>. The library internally
1172       uses macros beginning with <code>PB_DS</code>, but
1173       <code>#undef</code>s anything it <code>#define</code>s (except for
1174       header guards). Compiling the library in an environment where macros
1175       beginning with <code>PB_DS</code> are defined may yield unpredictable
1176       results in compilation, execution, or both.</para>
1177
1178       <para>
1179         Further dependencies are necessary to create the visual output
1180         for the performance tests. To create these graphs, an
1181         additional package is needed: <command>pychart</command>.
1182       </para>
1183     </section>
1184
1185     <section xml:id="pbds.using.organization">
1186       <info><title>Organization</title></info>
1187
1188       <para>
1189         The various data structures are organized as follows.
1190       </para>
1191
1192       <itemizedlist>
1193         <listitem>
1194           <para>
1195             Branch-Based
1196           </para>
1197
1198           <itemizedlist>
1199             <listitem>
1200               <para>
1201                 <classname>basic_branch</classname>
1202                 is an abstract base class for branch-based
1203                 associative-containers
1204               </para>
1205             </listitem>
1206
1207             <listitem>
1208               <para>
1209                 <classname>tree</classname>
1210                 is a concrete base class for tree-based
1211                 associative-containers
1212               </para>
1213             </listitem>
1214
1215             <listitem>
1216               <para>
1217                 <classname>trie</classname>
1218                 is a concrete base class for trie-based
1219                 associative-containers
1220               </para>
1221             </listitem>
1222           </itemizedlist>
1223         </listitem>
1224
1225         <listitem>
1226           <para>
1227             Hash-Based
1228           </para>
1229           <itemizedlist>
1230             <listitem>
1231               <para>
1232                 <classname>basic_hash_table</classname>
1233                 is an abstract base class for hash-based
1234                 associative-containers
1235               </para>
1236             </listitem>
1237
1238             <listitem>
1239               <para>
1240                 <classname>cc_hash_table</classname>
1241                 is a concrete collision-chaining hash-based
1242                 associative container
1243               </para>
1244             </listitem>
1245
1246             <listitem>
1247               <para>
1248                 <classname>gp_hash_table</classname>
1249                 is a concrete (general) probing hash-based
1250                 associative container
1251               </para>
1252             </listitem>
1253           </itemizedlist>
1254         </listitem>
1255
1256         <listitem>
1257           <para>
1258             List-Based
1259           </para>
1260           <itemizedlist>
1261             <listitem>
1262               <para>
1263                 <classname>list_update</classname>
1264                 is a list-based update-policy associative container
1265               </para>
1266             </listitem>
1267           </itemizedlist>
1268         </listitem>
1269         <listitem>
1270           <para>
1271             Heap-Based
1272           </para>
1273           <itemizedlist>
1274             <listitem>
1275               <para>
1276                 <classname>priority_queue</classname>
1277                 is a priority queue
1278               </para>
1279             </listitem>
1280           </itemizedlist>
1281         </listitem>
1282       </itemizedlist>
1283
1284       <para>
1285         The hierarchy is composed naturally so that commonality is
1286         captured by base classes. Thus <function>operator[]</function>
1287         is defined at the base of any hierarchy, since all derived
1288         containers support it. Conversely <function>split</function> is
1289         defined in <classname>basic_branch</classname>, since only
1290         tree-like containers support it.
1291       </para>
1292
1293       <para>
1294         In addition, there are the following diagnostics classes,
1295         used to report errors specific to this library's data
1296         structures.
1297       </para>
1298
1299       <figure>
1300         <title>Exception Hierarchy</title>
1301         <mediaobject>
1302           <imageobject>
1303             <imagedata align="center" format="PDF" scale="75"
1304                        fileref="../images/pbds_exception_hierarchy.pdf"/>
1305           </imageobject>
1306           <imageobject>
1307             <imagedata align="center" format="PNG" scale="100"
1308                        fileref="../images/pbds_exception_hierarchy.png"/>
1309           </imageobject>
1310           <textobject>
1311             <phrase>Exception Hierarchy</phrase>
1312           </textobject>
1313         </mediaobject>
1314       </figure>
1315
1316     </section>
1317
1318     <section xml:id="pbds.using.tutorial">
1319       <info><title>Tutorial</title></info>
1320
1321       <section xml:id="pbds.using.tutorial.basic">
1322         <info><title>Basic Use</title></info>
1323
1324         <para>
1325           For the most part, the policy-based containers in
1326           namespace <literal>__gnu_pbds</literal> have the same interface as
1327           the equivalent containers in the standard C++ library, except for
1328           the names used for the container classes themselves. For example,
1329           this shows basic operations on a collision-chaining hash-based
1330           container:
1331         </para>
1332         <programlisting>
1333           #include &lt;cassert&gt;
1334           #include &lt;ext/pb_ds/assoc_container.hpp&gt;
1335           int main()
1336           {
1337             __gnu_pbds::cc_hash_table&lt;int, char&gt; c;
1338             c[2] = 'b';                    // insert the pair (2, 'b')
1339             assert(c.find(1) == c.end());  // key 1 was never inserted
1340           }
1341         </programlisting>
1342
1343         <para>
1344           The container is called
1345           <classname>__gnu_pbds::cc_hash_table</classname> instead of
1346           <classname>std::unordered_map</classname>, since <quote>unordered
1347           map</quote> does not necessarily mean a hash-based map as implied by
1348           the C++ library (C++11 or TR1). For example, list-based associative
1349           containers, which are very useful for the construction of
1350           "multimaps," are also unordered.
1351         </para>
1352
1353         <para>This snippet shows a red-black tree based container:</para>
1354
1355         <programlisting>
1356           #include &lt;cassert&gt;
1357           #include &lt;ext/pb_ds/assoc_container.hpp&gt;
1358           int main()
1359           {
1360             __gnu_pbds::tree&lt;int, char&gt; c;
1361             c[2] = 'b';                    // insert the pair (2, 'b')
1362             assert(c.find(2) != c.end());  // key 2 is now present
1363           }
1364         </programlisting>
1365
1366         <para>The container is called <classname>tree</classname> instead of
1367         <classname>map</classname> since the underlying data structures are
1368         being named with specificity.
1369         </para>
1370
1371         <para>
1372           The member function naming convention is to strive to be the same as
1373           the equivalent member functions in other C++ standard library
1374           containers. The familiar methods are unchanged:
1375           <function>begin</function>, <function>end</function>,
1376           <function>size</function>, <function>empty</function>, and
1377           <function>clear</function>.
1378         </para>
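        <para>
          As an illustrative sketch (not taken from the library's
          testsuite), the function below walks a
          <classname>__gnu_pbds::tree</classname> with the familiar
          <function>begin</function> and <function>end</function> methods.
          The names <type>map_type</type> and
          <function>values_in_key_order</function> are hypothetical and
          exist only for this example.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;string&gt;

          // Hypothetical alias used only in this sketch.
          typedef __gnu_pbds::tree&lt;int, char&gt; map_type;

          // Concatenates the mapped values in iteration order, which for
          // a tree-based container is key order.
          std::string
          values_in_key_order(const map_type&amp; c)
          {
            std::string out;
            for (map_type::const_iterator it = c.begin(); it != c.end(); ++it)
              out += it-&gt;second;
            return out;
          }
        </programlisting>
        <para>
          Inserting the keys 3, 1, and 2 with mapped values
          <literal>'c'</literal>, <literal>'a'</literal>, and
          <literal>'b'</literal> should make this function return
          <literal>"abc"</literal>, after which <function>clear</function>
          leaves the container <function>empty</function>.
        </para>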
1379
1380         <para>
1381           This isn't to say that things are exactly as one would expect, given
1382           the container requirements and interfaces in the C++ standard.
1383         </para>
1384
1385         <para>
1386           The names of containers' policies and policy accessors are
1387           different from the usual. For example, if <type>hash_type</type> is
1388         some type of hash-based container, then</para>
1389
1390         <programlisting>
1391           hash_type::hash_fn
1392         </programlisting>
1393
1394         <para>
1395           gives the type of its hash functor, and if <varname>obj</varname> is
1396           some hash-based container object, then
1397         </para>
1398
1399         <programlisting>
1400           obj.get_hash_fn()
1401         </programlisting>
1402
1403         <para>will return a reference to its hash-functor object.</para>
1404
1405
1406         <para>
1407           Similarly, if <type>tree_type</type> is some type of tree-based
1408           container, then
1409         </para>
1410
1411         <programlisting>
1412           tree_type::cmp_fn
1413         </programlisting>
1414
1415         <para>
1416           gives the type of its comparison functor, and if
1417           <varname>obj</varname> is some tree-based container object,
1418           then
1419         </para>
1420
1421         <programlisting>
1422           obj.get_cmp_fn()
1423         </programlisting>
1424
1425         <para>will return a reference to its comparison-functor object.</para>
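        <para>
          The policy types and accessors described above can be sketched
          as follows. This is a minimal illustration that assumes the
          default hash and comparison functors; the helper names
          <function>hash_with_container_fn</function> and
          <function>ordered_before</function> are hypothetical.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;cstddef&gt;

          typedef __gnu_pbds::cc_hash_table&lt;int, char&gt; hash_type;
          typedef __gnu_pbds::tree&lt;int, char&gt; tree_type;

          // Hashes a key with the container's own hash functor.
          std::size_t
          hash_with_container_fn(const hash_type&amp; obj, int key)
          {
            const hash_type::hash_fn&amp; h = obj.get_hash_fn();
            return h(key);
          }

          // Compares two keys with the container's own comparison functor.
          bool
          ordered_before(const tree_type&amp; obj, int lhs, int rhs)
          {
            const tree_type::cmp_fn&amp; cmp = obj.get_cmp_fn();
            return cmp(lhs, rhs);
          }
        </programlisting>
        <para>
          With the default comparison functor,
          <literal>ordered_before(t, 1, 2)</literal> is true and
          <literal>ordered_before(t, 2, 1)</literal> is false for any
          <type>tree_type</type> object <varname>t</varname>.
        </para>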
1426
1427         <para>
1428           It would be nice to give names consistent with those in the existing
1429           C++ standard (inclusive of TR1). Unfortunately, these standard
1430           containers don't consistently name types and methods. For example,
1431           <classname>std::tr1::unordered_map</classname> uses
1432           <type>hasher</type> for the hash functor, but
1433           <classname>std::map</classname> uses <type>key_compare</type> for
1434           the comparison functor. Similarly, the accessor for
1435           <classname>std::tr1::unordered_map</classname>'s hash functor is
1436           <function>hash_function</function>, but <classname>std::map</classname>
1437           uses <function>key_comp</function> for accessing the comparison functor.
1438         </para>
1439
1440         <para>
1441           Instead, <literal>__gnu_pbds</literal> attempts to be internally
1442           consistent, and uses standard-derived terminology if possible.
1443         </para>
1444
1445         <para>
1446           Another source of difference is in scope:
1447           <literal>__gnu_pbds</literal> contains more types of associative
1448           containers than the standard C++ library, and more opportunities
1449           to configure these new containers, since different types of
1450           associative containers are useful in different settings.
1451         </para>
1452
1453         <para>
1454           Namespace <literal>__gnu_pbds</literal> contains different classes for
1455           hash-based containers, tree-based containers, trie-based containers,
1456           and list-based containers.
1457         </para>
1458
1459         <para>
1460           Since associative containers share parts of their interface, they
1461           are organized as a class hierarchy.
1462         </para>
1463
1464         <para>Each type or method is defined in the most-common ancestor
1465         in which it makes sense.
1466         </para>
1467
1468         <para>For example, all associative containers support iteration
1469         expressed in the following form:
1470         </para>
1471
1472         <programlisting>
1473           const_iterator
1474           begin() const;
1475
1476           iterator
1477           begin();
1478
1479           const_iterator
1480           end() const;
1481
1482           iterator
1483           end();
1484         </programlisting>
1485
1486         <para>
1487           But not all containers contain or use hash functors. Yet, both
1488           collision-chaining and (general) probing hash-based associative
1489           containers have a hash functor, so
1490           <classname>basic_hash_table</classname> contains the interface:
1491         </para>
1492
1493         <programlisting>
1494           const hash_fn&amp;
1495           get_hash_fn() const;
1496
1497           hash_fn&amp;
1498           get_hash_fn();
1499         </programlisting>
1500
1501         <para>
1502           so all hash-based associative containers inherit the same
1503           hash-functor accessor methods.
1504         </para>
1505
1506       </section> <!--basic use -->
1507
1508       <section xml:id="pbds.using.tutorial.configuring">
1509         <info>
1510           <title>
1511             Configuring via Template Parameters
1512           </title>
1513         </info>
1514
1515         <para>
1516           In general, each of this library's containers is
1517           parametrized by more policies than those of the standard library. For
1518           example, the standard hash-based container is parametrized as
1519           follows:
1520         </para>
1521         <programlisting>
1522           template&lt;typename Key, typename Mapped, typename Hash,
1523                    typename Pred, typename Allocator, bool Cache_Hash_Code&gt;
1524           class unordered_map;
1525         </programlisting>
1526
1527         <para>
1528           and so can be configured by key type, mapped type, a functor
1529           that translates keys to unsigned integral types, an equivalence
1530           predicate, an allocator, and an indicator whether to store hash
1531           values with each entry. This library's collision-chaining
1532           hash-based container is parametrized as
1533         </para>
1534         <programlisting>
1535           template&lt;typename Key, typename Mapped, typename Hash_Fn,
1536                    typename Eq_Fn, typename Comb_Hash_Fn,
1537                    typename Resize_Policy, bool Store_Hash,
1538                    typename Allocator&gt;
1539           class cc_hash_table;
1540         </programlisting>
1541
1542         <para>
1543           and so can be configured by the first four types of
1544           <classname>std::tr1::unordered_map</classname>, then a
1545           policy for translating the key-hash result into a position
1546           within the table, then a policy by which the table resizes,
1547           an indicator whether to store hash values with each entry,
1548           and an allocator (which is typically the last template
1549           parameter in standard containers).
1550         </para>
1551
1552         <para>
1553           Nearly all policy parameters have default values, so this
1554           need not be considered for casual use. It is important to
1555           note, however, that hash-based containers' policies can
1556           dramatically alter their performance in different settings,
1557           and that tree-based containers' policies can make them
1558           useful for other purposes than just look-up.
1559         </para>
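        <para>
          As a hedged sketch of how a tree-based container's policies can
          serve purposes other than look-up, the following configures a
          red-black tree for order statistics. It assumes the library's
          <classname>tree_order_statistics_node_update</classname> policy
          and that <classname>tree</classname> takes its parameters in the
          order key, mapped type, comparison functor, tag, node-update
          policy; the names <type>ordered_set</type>,
          <function>count_smaller</function>, and
          <function>kth_smallest</function> are hypothetical.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;ext/pb_ds/tree_policy.hpp&gt;
          #include &lt;cstddef&gt;
          #include &lt;functional&gt;

          // A set of ints (mapped type is null_type) whose nodes also
          // maintain subtree sizes via the node-update policy.
          typedef __gnu_pbds::tree&lt;int, __gnu_pbds::null_type, std::less&lt;int&gt;,
                                   __gnu_pbds::rb_tree_tag,
                                   __gnu_pbds::tree_order_statistics_node_update&gt;
            ordered_set;

          // Number of stored keys strictly smaller than key.
          std::size_t
          count_smaller(const ordered_set&amp; s, int key)
          { return s.order_of_key(key); }

          // The k-th smallest key (zero-based); requires k &lt; s.size().
          int
          kth_smallest(const ordered_set&amp; s, std::size_t k)
          { return *s.find_by_order(k); }
        </programlisting>
        <para>
          After inserting 30, 10, and 20,
          <literal>count_smaller(s, 20)</literal> yields 1 and
          <literal>kth_smallest(s, 0)</literal> yields 10.
        </para>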
1560
1561
1562         <para>As opposed to associative containers, priority queues have
1563         relatively few configuration options. The priority queue is
1564         parametrized as follows:</para>
1565         <programlisting>
1566           template&lt;typename Value_Type, typename Cmp_Fn, typename Tag,
1567                    typename Allocator&gt;
1568           class priority_queue;
1569         </programlisting>
1570
1571         <para>The <classname>Value_Type</classname>, <classname>Cmp_Fn</classname>, and
1572         <classname>Allocator</classname> parameters are the container's value type,
1573         comparison-functor type, and allocator type, respectively;
1574         these are very similar to the standard's priority queue. The
1575         <classname>Tag</classname> parameter is different: there are a number of
1576         pre-defined tag types corresponding to binary heaps, binomial
1577         heaps, etc., and <classname>Tag</classname> should be instantiated
1578         by one of them.</para>
1579
1580         <para>Note that as opposed to the
1581         <classname>std::priority_queue</classname>,
1582         <classname>__gnu_pbds::priority_queue</classname> is not a
1583         sequence-adapter; it is a regular container.</para>
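        <para>
          A minimal sketch of these parameters in use follows. It assumes
          the <classname>binomial_heap_tag</classname> tag type and that
          <function>push</function> returns a point-type iterator that
          <function>modify</function> accepts; the names
          <type>pq_type</type> and
          <function>top_after_lowering_maximum</function> are hypothetical.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/priority_queue.hpp&gt;
          #include &lt;functional&gt;

          // Priority queue of ints backed by a binomial heap.
          typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                             __gnu_pbds::binomial_heap_tag&gt;
            pq_type;

          int
          top_after_lowering_maximum()
          {
            pq_type q;
            q.push(3);
            pq_type::point_iterator it = q.push(5);  // remember where 5 lives
            q.push(4);
            q.modify(it, 1);  // change the stored 5 to 1 in place
            return q.top();   // largest remaining value
          }
        </programlisting>
        <para>
          With <classname>std::less</classname> as the comparison functor
          the largest value is on top, so the function returns 4 once the
          former maximum has been lowered to 1.
        </para>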
1584
1585       </section>
1586
1587       <section xml:id="pbds.using.tutorial.traits">
1588         <info>
1589           <title>
1590             Querying Container Attributes
1591           </title>
1592         </info>
1593         <para></para>
1594
1595         <para>A container's underlying data structure
1596         affects its performance; unfortunately, it can also affect
1597         its interface. When manipulating associative containers
1598         generically, it is often useful to be able to statically
1599         determine what they can support and what they cannot.
1600         </para>
1601
1602         <para>Happily, the standard provides a good solution to a similar
1603         problem - that of the different behavior of iterators. If
1604         <classname>It</classname> is an iterator, then
1605         </para>
1606         <programlisting>
1607           typename std::iterator_traits&lt;It&gt;::iterator_category
1608         </programlisting>
1609
1610         <para>is one of a small number of pre-defined tag classes, and
1611         </para>
1612         <programlisting>
1613           typename std::iterator_traits&lt;It&gt;::value_type
1614         </programlisting>
1615
1616         <para>is the value type to which the iterator "points".</para>
1617
1618         <para>
1619           Similarly, in this library, if <type>C</type> is a
1620           container, then <classname>container_traits</classname> is a
1621           trait class that stores information about the kind of
1622           container that is implemented.
1623         </para>
1624         <programlisting>
1625           typename container_traits&lt;C&gt;::container_category
1626         </programlisting>
1627         <para>
1628           is one of a small number of predefined tag structures that
1629           uniquely identifies the type of underlying data structure.
1630         </para>
1631
1632         <para>In most cases, however, the exact underlying data
1633         structure is not really important, but what is important is
1634         one of its other attributes: whether it guarantees storing
1635         elements by key order, for example. For this one can
1636         use</para>
1637         <programlisting>
1638           container_traits&lt;C&gt;::order_preserving
1639         </programlisting>
1640         <para>
1641           Also,
1642         </para>
1643         <programlisting>
1644           typename container_traits&lt;C&gt;::invalidation_guarantee
1645         </programlisting>
1646
1647         <para>is the container's invalidation guarantee. Invalidation
1648         guarantees are especially important regarding priority queues,
1649         since in this library's design, iterators are practically the
1650         only way to manipulate them.</para>
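        <para>
          The queries above can be sketched as follows. This is an
          illustration only: it assumes that
          <literal>order_preserving</literal> is a compile-time constant
          and that the tag types <classname>rb_tree_tag</classname> and
          <classname>cc_hash_tag</classname> identify the two containers
          used in the tutorial. The function names are hypothetical.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;string&gt;

          // True if the container type guarantees storing keys in order.
          template&lt;typename Cntnr&gt;
          bool
          keeps_key_order()
          { return __gnu_pbds::container_traits&lt;Cntnr&gt;::order_preserving != 0; }

          // Tag dispatch: one overload per underlying data structure.
          inline std::string
          describe(__gnu_pbds::rb_tree_tag)
          { return "red-black tree"; }

          inline std::string
          describe(__gnu_pbds::cc_hash_tag)
          { return "collision-chaining hash table"; }

          // Selects the overload from the container's category tag.
          template&lt;typename Cntnr&gt;
          std::string
          describe_container(const Cntnr&amp;)
          {
            typedef typename __gnu_pbds::container_traits&lt;Cntnr&gt;::container_category
              category;
            return describe(category());
          }
        </programlisting>
        <para>
          Because the overload is chosen at compile time, passing a
          container whose category has no matching
          <function>describe</function> overload is a compile-time error
          rather than a run-time surprise.
        </para>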
1651       </section>
1652
1653       <section xml:id="pbds.using.tutorial.point_range_iteration">
1654         <info>
1655           <title>
1656             Point and Range Iteration
1657           </title>
1658         </info>
1659         <para></para>
1660
1661         <para>This library differentiates between two types of methods
1662         and iterators: point-type, and range-type. For example,
1663         <function>find</function> and <function>insert</function> are point-type methods, since
1664         they each deal with a specific element; their returned
1665         iterators are point-type iterators. <function>begin</function> and
1666         <function>end</function> are range-type methods, since they are not used to
1667         find a specific element, but rather to go over all elements in
1668         a container object; their returned iterators are range-type
1669         iterators.
1670         </para>
1671
1672         <para>Most containers store elements in an order that is
1673         determined by their interface. Correspondingly, it is fine that
1674         their point-type iterators are synonymous with their range-type
1675         iterators. For example, in the following snippet
1676         </para>
1677         <programlisting>
1678           std::for_each(c.find(1), c.find(5), foo);
1679         </programlisting>
1680         <para>
1681           two point-type iterators (returned by <function>find</function>) are used
1682           for a range-type purpose - going over all elements whose key is
1683           between 1 and 5.
1684         </para>
1685
1686         <para>
1687           Conversely, the above snippet makes no sense for
1688           self-organizing containers - ones that order (and reorder)
1689           their elements by implementation. It would be nice to have a
1690           uniform iterator system that would allow the above snippet to
1691           compile only if it made sense.
1692         </para>
1693
1694         <para>
1695           This could trivially be done by specializing
1696           <function>std::for_each</function> for the case of iterators returned by
1697           <classname>std::tr1::unordered_map</classname>, but this would only solve the
1698           problem for one algorithm and one container. Fundamentally, the
1699           problem is that one can loop using a self-organizing
1700           container's point-type iterators.
1701         </para>
1702
1703         <para>
1704           This library's containers define two families of
1705           iterators: <type>point_const_iterator</type> and
1706           <type>point_iterator</type> are the iterator types returned by
1707           point-type methods; <type>const_iterator</type> and
1708           <type>iterator</type> are the iterator types returned by range-type
1709           methods.
1710         </para>
1711         <programlisting>
1712           class &lt;- some container -&gt;
1713           {
1714           public:
1715           ...
1716
1717           typedef &lt;- something -&gt; const_iterator;
1718
1719           typedef &lt;- something -&gt; iterator;
1720
1721           typedef &lt;- something -&gt; point_const_iterator;
1722
1723           typedef &lt;- something -&gt; point_iterator;
1724
1725           ...
1726
1727           public:
1728           ...
1729
1730           const_iterator begin () const;
1731
1732           iterator begin();
1733
1734           point_const_iterator find(...) const;
1735
1736           point_iterator find(...);
1737           };
1738         </programlisting>
1739
1740         <para>For
1741         containers whose interface defines sequence order, it
1742         is very simple: point-type and range-type iterators are exactly
1743         the same, which means that the above snippet will compile if it
1744         is used for an order-preserving associative container.
1745         </para>
1746
1747         <para>
1748           For self-organizing containers, however (hash-based
1749           containers being one example), the preceding snippet will
1750           not compile, because their point-type iterators do not support
1751           <function>operator++</function>.
1752         </para>
1753
1754         <para>In any case, both for order-preserving and self-organizing
1755         containers, the following snippet will compile:
1756         </para>
1757         <programlisting>
1758           typename Cntnr::point_iterator it = c.find(2);
1759         </programlisting>
1760
1761         <para>
1762           because a range-type iterator can always be converted to a
1763           point-type iterator.
1764         </para>
1765
1766         <para>Distinguishing between iterator types also
1767         raises the point that a container's iterators might have
1768         different invalidation rules concerning their de-referencing
1769         abilities and movement abilities. This now corresponds exactly
1770         to the question of whether point-type and range-type iterators
1771         are valid. As explained above, <classname>container_traits</classname> allows
1772         querying a container for its data structure attributes. The
1773         iterator-invalidation guarantees are certainly a property of
1774         the underlying data structure, and so
1775         </para>
1776         <programlisting>
1777           container_traits&lt;C&gt;::invalidation_guarantee
1778         </programlisting>
1779
1780         <para>
1781           gives one of three pre-determined types that answer this
1782           query.
1783         </para>
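        <para>
          The two iterator families can be illustrated with a pair of
          generic helpers. This is a sketch under the assumptions stated
          in this section; the names <function>has_key</function> and
          <function>count_by_iteration</function> are hypothetical.
        </para>
        <programlisting>
          #include &lt;ext/pb_ds/assoc_container.hpp&gt;
          #include &lt;cstddef&gt;

          // Point-type use: look up one element. This compiles for both
          // order-preserving and self-organizing containers.
          template&lt;typename Cntnr&gt;
          bool
          has_key(Cntnr&amp; c, const typename Cntnr::key_type&amp; key)
          {
            typename Cntnr::point_iterator it = c.find(key);
            return it != c.end();
          }

          // Range-type use: visit every element via begin() and end().
          template&lt;typename Cntnr&gt;
          std::size_t
          count_by_iteration(const Cntnr&amp; c)
          {
            std::size_t n = 0;
            for (typename Cntnr::const_iterator it = c.begin();
                 it != c.end(); ++it)
              ++n;
            return n;
          }
        </programlisting>
        <para>
          Both helpers can be instantiated with
          <classname>cc_hash_table&lt;int, char&gt;</classname> as well as
          <classname>tree&lt;int, char&gt;</classname>, since neither
          increments a point-type iterator.
        </para>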
1784
1785       </section>
1786     </section> <!-- tutorial -->
1787
1788     <section xml:id="pbds.using.examples">
1789       <info><title>Examples</title></info>
1790       <para>
1791         Additional code examples are provided in the source
1792         distribution, as part of the regression and performance
1793         testsuite.
1794       </para>
1795
1796       <section xml:id="pbds.using.examples.basic">
1797         <info><title>Intermediate Use</title></info>
1798
1799         <itemizedlist>
1800           <listitem>
1801             <para>
1802               Basic use of maps:
1803               <filename>basic_map.cc</filename>
1804             </para>
1805           </listitem>
1806
1807           <listitem>
1808             <para>
1809               Basic use of sets:
1810               <filename>basic_set.cc</filename>
1811             </para>
1812           </listitem>
1813
1814           <listitem>
1815             <para>
1816               Conditionally erasing values from an associative container object:
1817               <filename>erase_if.cc</filename>
1818             </para>
1819           </listitem>
1820
1821           <listitem>
1822             <para>
1823               Basic use of multimaps:
1824               <filename>basic_multimap.cc</filename>
1825             </para>
1826           </listitem>
1827
1828           <listitem>
1829             <para>
1830               Basic use of multisets:
1831               <filename>basic_multiset.cc</filename>
1832             </para>
1833           </listitem>
1834
1835           <listitem>
1836             <para>
1837               Basic use of priority queues:
1838               <filename>basic_priority_queue.cc</filename>
1839             </para>
1840           </listitem>
1841
1842           <listitem>
1843             <para>
1844               Splitting and joining priority queues:
1845               <filename>priority_queue_split_join.cc</filename>
1846             </para>
1847           </listitem>
1848
1849           <listitem>
1850             <para>
1851               Conditionally erasing values from a priority queue:
1852               <filename>priority_queue_erase_if.cc</filename>
1853             </para>
1854           </listitem>
1855         </itemizedlist>
1856
1857       </section>
1858
1859       <section xml:id="pbds.using.examples.query">
1860         <info><title>Querying with <classname>container_traits</classname> </title></info>
1861         <itemizedlist>
1862           <listitem>
1863             <para>
1864               Using <classname>container_traits</classname> to query
1865               about an associative container's underlying data structure behavior:
1866               <filename>assoc_container_traits.cc</filename>
1867             </para>
1868           </listitem>
1869
1870           <listitem>
1871             <para>
1872               A non-compiling example showing wrong use of finding keys in
1873               hash-based containers: <filename>hash_find_neg.cc</filename>
1874             </para>
1875           </listitem>
1876           <listitem>
1877             <para>
1878               Using <classname>container_traits</classname>
1879               to query about a priority queue's underlying data structure behavior:
1880               <filename>priority_queue_container_traits.cc</filename>
1881             </para>
1882           </listitem>
1883
1884         </itemizedlist>
1885
1886       </section>
1887
1888       <section xml:id="pbds.using.examples.container">
1889         <info><title>By Container Method</title></info>
1890         <para></para>
1891
1892         <section xml:id="pbds.using.examples.container.hash">
1893           <info><title>Hash-Based</title></info>
1894
1895           <section xml:id="pbds.using.examples.container.hash.resize">
1896             <info><title>size Related</title></info>
1897
1898             <itemizedlist>
1899               <listitem>
1900                 <para>
1901                   Setting the initial size of a hash-based container
1902                   object:
1903                   <filename>hash_initial_size.cc</filename>
1904                 </para>
1905               </listitem>
1906
1907               <listitem>
1908                 <para>
1909                   A non-compiling example showing how not to resize a
1910                   hash-based container object:
1911                   <filename>hash_resize_neg.cc</filename>
1912                 </para>
1913               </listitem>
1914
1915               <listitem>
1916                 <para>
1917                   Resizing a hash-based container object:
1918                   <filename>hash_resize.cc</filename>
1919                 </para>
1920               </listitem>
1921
1922               <listitem>
1923                 <para>
1924                   Showing an illegal resize of a hash-based container
1925                   object:
1926                   <filename>hash_illegal_resize.cc</filename>
1927                 </para>
1928               </listitem>
1929
1930               <listitem>
1931                 <para>
1932                   Changing the load factors of a hash-based container
1933                   object: <filename>hash_load_set_change.cc</filename>
1934                 </para>
1935               </listitem>
1936             </itemizedlist>
1937           </section>
1938
1939           <section xml:id="pbds.using.examples.container.hash.hashor">
1940             <info><title>Hashing Function Related</title></info>
1941             <para></para>
1942
1943             <itemizedlist>
1944               <listitem>
1945                 <para>
1946                   Using a modulo range-hashing function for the case of an
1947                   unknown skewed key distribution:
1948                   <filename>hash_mod.cc</filename>
1949                 </para>
1950               </listitem>
1951
1952               <listitem>
1953                 <para>
1954                   Writing a range-hashing functor for the case of a known
1955                   skewed key distribution:
1956                   <filename>shift_mask.cc</filename>
1957                 </para>
1958               </listitem>
1959
1960               <listitem>
1961                 <para>
1962                   Storing the hash value along with each key:
1963                   <filename>store_hash.cc</filename>
1964                 </para>
1965               </listitem>
1966
1967               <listitem>
1968                 <para>
1969                   Writing a ranged-hash functor:
1970                   <filename>ranged_hash.cc</filename>
1971                 </para>
1972               </listitem>
1973             </itemizedlist>
1974
1975           </section>
1976
1977         </section>
1978
1979         <section xml:id="pbds.using.examples.container.branch">
1980           <info><title>Branch-Based</title></info>
1981
1982
1983           <section xml:id="pbds.using.examples.container.branch.split">
1984             <info><title>split or join Related</title></info>
1985
1986             <itemizedlist>
1987               <listitem>
1988                 <para>
1989                   Joining two tree-based container objects:
1990                   <filename>tree_join.cc</filename>
1991                 </para>
1992               </listitem>
1993
1994               <listitem>
1995                 <para>
1996                   Splitting a PATRICIA trie container object:
1997                   <filename>trie_split.cc</filename>
1998                 </para>
1999               </listitem>
2000
2001               <listitem>
2002                 <para>
2003                   Order statistics while joining two tree-based container
2004                   objects:
2005                   <filename>tree_order_statistics_join.cc</filename>
2006                 </para>
2007               </listitem>
2008             </itemizedlist>
2009
2010           </section>
2011
2012           <section xml:id="pbds.using.examples.container.branch.invariants">
2013             <info><title>Node Invariants</title></info>
2014
2015             <itemizedlist>
2016               <listitem>
2017                 <para>
2018                   Using trees for order statistics:
2019                   <filename>tree_order_statistics.cc</filename>
2020                 </para>
2021               </listitem>
2022
2023               <listitem>
2024                 <para>
2025                   Augmenting trees to support operations on line
2026                   intervals:
2027                   <filename>tree_intervals.cc</filename>
2028                 </para>
2029               </listitem>
2030             </itemizedlist>
2031
2032           </section>
2033
2034           <section xml:id="pbds.using.examples.container.branch.trie">
2035             <info><title>trie</title></info>
2036             <itemizedlist>
2037               <listitem>
2038                 <para>
2039                   Using a PATRICIA trie for DNA strings:
2040                   <filename>trie_dna.cc</filename>
2041                 </para>
2042               </listitem>
2043
2044               <listitem>
2045                 <para>
2046                   Using a PATRICIA
2047                   trie for finding all entries whose key matches a given prefix:
2048                   <filename>trie_prefix_search.cc</filename>
2049                 </para>
2050               </listitem>
2051             </itemizedlist>
2052
2053           </section>
2054
2055         </section>
2056
2057         <section xml:id="pbds.using.examples.container.priority_queue">
2058           <info><title>Priority Queues</title></info>
2059           <itemizedlist>
2060             <listitem>
2061               <para>
2062                 Cross referencing an associative container and a priority
2063                 queue: <filename>priority_queue_xref.cc</filename>
2064               </para>
2065             </listitem>
2066
2067             <listitem>
2068               <para>
2069                 Cross referencing a vector and a priority queue using a
2070                 very simple version of Dijkstra's shortest path
2071                 algorithm:
2072                 <filename>priority_queue_dijkstra.cc</filename>
2073               </para>
2074             </listitem>
2075           </itemizedlist>
2076
2077         </section>
2078
2079
2080       </section>
2081
2082     </section>
2083
2084   </section> <!-- using -->
2085
2086   <!-- S03: Design -->
2087
2088
2089 <section xml:id="containers.pbds.design">
2090   <info><title>Design</title></info>
2091   <?dbhtml filename="policy_data_structures_design.html"?>
2092   <para></para>
2093
2094   <section xml:id="pbds.design.concepts">
2095     <info><title>Concepts</title></info>
2096
2097     <section xml:id="pbds.design.concepts.null_type">
2098       <info><title>Null Policy Classes</title></info>
2099
2100       <para>
2101         Associative containers are typically parametrized by various
2102         policies. For example, a hash-based associative container is
2103         parametrized by a hash-functor, transforming each key into a
2104         non-negative numerical type. Each such value is then further mapped
2105         into a position within the table. The mapping of a key into a
2106         position within the table is therefore a two-step process.
2107       </para>
2108
2109       <para>
2110         In some cases, instantiations are redundant. For example, when the
2111         keys are integers, it is possible to use a redundant hash policy,
2112         which transforms each key into its value.
2113       </para>
2114
2115       <para>
2116         In some other cases, these policies are irrelevant.  For example, a
2117         hash-based associative container might transform keys into positions
2118         within a table by a different method than the two-step method
2119         described above. In such a case, the hash functor is simply
2120         irrelevant.
2121       </para>
2122
2123       <para>
2124         When a policy is either redundant or irrelevant, it can be replaced
2125         by <classname>null_type</classname>.
2126       </para>
2127
2128       <para>
2129         For example, a <emphasis>set</emphasis> is an associative
2130         container with one of its template parameters (the one for the
2131         mapped type) replaced with <classname>null_type</classname>. Other
2132         places where this technique makes simplifications possible
2133         include node updates in tree and trie data structures, and hash
2134         and probe functions for hash data structures.
2135       </para>
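      <para>
        As a minimal sketch of a redundant policy, assuming the library
        is reachable through
        <filename>&lt;ext/pb_ds/assoc_container.hpp&gt;</filename> in
        namespace <literal>__gnu_pbds</literal>, the following passes a
        hash functor that simply returns an integer key's own value. The
        names <literal>identity_hash</literal>,
        <literal>int_char_map</literal>, and
        <literal>identity_hash_demo</literal> are hypothetical and exist
        only for illustration.
      </para>
      <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

// Hypothetical "redundant" hash policy: an integer key is its own hash value.
struct identity_hash
{
  std::size_t
  operator()(int key) const
  { return static_cast<std::size_t>(key); }
};

// A "map" from int to char that uses the redundant hash policy above.
typedef __gnu_pbds::cc_hash_table<int, char, identity_hash> int_char_map;

// Stores one entry and looks it up again; returns the mapped char.
inline char
identity_hash_demo()
{
  int_char_map m;
  m[7] = 'x';
  return m.find(7)->second;
}
```
]]></programlisting>
      <para>
        The second step of the two-step process, mapping the hash value
        to a table position, is left to the container's default policies
        in this sketch.
      </para>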
    </section>

    <section xml:id="pbds.design.concepts.associative_semantics">
      <info><title>Map and Set Semantics</title></info>

      <section xml:id="concepts.associative_semantics.set_vs_map">
        <info>
          <title>
            Distinguishing Between Maps and Sets
          </title>
        </info>

        <para>
          Anyone familiar with the standard knows that there are four kinds
          of associative containers: maps, sets, multimaps, and
          multisets. The map datatype associates each key with
          some data.
        </para>

        <para>
          Sets are associative containers that simply store keys;
          they do not map them to anything. In the standard, each map class
          has a corresponding set class. E.g.,
          <classname>std::map&lt;int, char&gt;</classname> maps each
          <classname>int</classname> to a <classname>char</classname>, but
          <classname>std::set&lt;int&gt;</classname> simply stores
          <classname>int</classname>s. In this library, however, there are no
          distinct classes for maps and sets. Instead, an associative
          container's <classname>Mapped</classname> template parameter is a policy: if
          it is instantiated by <classname>null_type</classname>, then it
          is a "set"; otherwise, it is a "map". E.g.,
        </para>
        <programlisting>
          cc_hash_table&lt;int, char&gt;
        </programlisting>
        <para>
          is a "map" mapping each <type>int</type> value to a
          <type>char</type>, but
        </para>
        <programlisting>
          cc_hash_table&lt;int, null_type&gt;
        </programlisting>
        <para>
          is a type that uniquely stores <type>int</type> values.
        </para>
        <para>Once the <classname>Mapped</classname> template parameter is
        instantiated by <classname>null_type</classname>, the "set" acts
        very similarly to the standard's sets: it does not map each key
        to a distinct <classname>null_type</classname> object. Also, the
        container's <type>value_type</type> is essentially its
        <type>key_type</type>, just as with the standard's sets.</para>
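        <para>
          The distinction can be sketched as follows, assuming the
          containers are available from
          <filename>&lt;ext/pb_ds/assoc_container.hpp&gt;</filename> in
          namespace <literal>__gnu_pbds</literal>. The typedefs
          <literal>map_t</literal> and <literal>set_t</literal> and the
          two helper functions are hypothetical names used only for this
          illustration.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> map_t;                    // "map"
typedef __gnu_pbds::cc_hash_table<int, __gnu_pbds::null_type> set_t;   // "set"

// Fills a "map" and returns the char associated with key 1.
inline char
map_demo()
{
  map_t m;
  m.insert(std::make_pair(1, 'a'));   // values are key/mapped pairs
  m[2] = 'b';                         // subscripting is available for "maps"
  return m.find(1)->second;
}

// Fills a "set" and returns the number of distinct keys stored.
inline std::size_t
set_demo()
{
  set_t s;
  s.insert(4);                        // values are just keys
  s.insert(4);                        // already present: nothing is added
  s.insert(5);
  return s.size();
}
```
]]></programlisting>
        <para>
          Both containers are instantiations of the same class template;
          only the <classname>Mapped</classname> argument differs.
        </para>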

        <para>
          The standard's multimaps and multisets allow, respectively,
          non-uniquely mapping keys and non-uniquely storing keys. As
          discussed, the reasons why this might be necessary are (1) that
          a key might be decomposed into a primary key and a secondary
          key, (2) that a key might appear more than once, or (3) any
          combination of (1) and (2). Correspondingly, one should use
          (1) "maps" mapping primary keys to secondary keys, (2) "maps"
          mapping keys to size types, or (3) any combination of (1) and
          (2). Thus, for example, an
          <classname>std::multiset&lt;int&gt;</classname> might be used to store
          multiple instances of integers, but using this library's
          containers, one might use
        </para>
        <programlisting>
          tree&lt;int, size_t&gt;
        </programlisting>

        <para>
          i.e., a "map" of <type>int</type>s to
          <type>size_t</type>s.
        </para>
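        <para>
          A minimal sketch of this counting idiom follows, assuming
          <classname>tree</classname> is taken from
          <filename>&lt;ext/pb_ds/assoc_container.hpp&gt;</filename> in
          namespace <literal>__gnu_pbds</literal>. The helpers
          <literal>multiset_insert</literal> and
          <literal>multiset_count</literal> are hypothetical; they are
          not part of the library.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <ext/pb_ds/assoc_container.hpp>

// "Multiset" of ints: each distinct key maps to its occurrence count.
typedef __gnu_pbds::tree<int, std::size_t> int_multiset;

// Hypothetical helper: logically inserts one more instance of key.
// operator[] value-initializes a missing count to 0 before the increment.
inline void
multiset_insert(int_multiset& c, int key)
{ ++c[key]; }

// Hypothetical helper: number of logical instances of key (0 if absent).
inline std::size_t
multiset_count(const int_multiset& c, int key)
{
  int_multiset::point_const_iterator it = c.find(key);
  return it == c.end() ? 0 : it->second;
}
```
]]></programlisting>
        <para>
          Note that <function>size</function> on such a container
          reports the number of distinct keys, not the number of logical
          instances.
        </para>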
        <para>
          These "multimaps" and "multisets" might be confusing to
          anyone familiar with the standard's <classname>std::multimap</classname> and
          <classname>std::multiset</classname>, because there is no clear
          correspondence between the two. For example, in some cases
          where one uses <classname>std::multiset</classname> in the standard, one might use
          in this library a "multimap" of "multisets", i.e., a
          container that maps each primary key to an associative
          container that maps each secondary key to the number of times
          it occurs.
        </para>

        <para>
          When one uses a "multimap," one should choose with care the
          type of container used for secondary keys.
        </para>
      </section> <!-- map vs set -->


      <section xml:id="concepts.associative_semantics.multi">
        <info><title>Alternatives to <classname>std::multiset</classname> and <classname>std::multimap</classname></title></info>

        <para>
          Brace oneself: this library does not contain containers like
          <classname>std::multimap</classname> or
          <classname>std::multiset</classname>. Instead, these data
          structures can be synthesized via manipulation of the
          <classname>Mapped</classname> template parameter.
        </para>
        <para>
          One maps the unique part of a key (the primary key) into an
          associative-container of the (originally) non-unique parts of
          the key (the secondary keys). A primary associative-container
          is an associative container of primary keys; a secondary
          associative-container is an associative container of
          secondary keys.
        </para>

        <para>
          Let us step back a bit and start from the beginning.
        </para>


        <para>
          Maps (or sets) allow mapping (or storing) unique-key values.
          The standard library also supplies associative containers which
          map (or store) multiple values with equivalent keys:
          <classname>std::multimap</classname>, <classname>std::multiset</classname>,
          <classname>std::tr1::unordered_multimap</classname>, and
          <classname>std::tr1::unordered_multiset</classname>. We first discuss how these might
          be used, then why we think it is best to avoid them.
        </para>

        <para>
          Suppose one builds a simple bank-account application that
          records, for each client (identified by an <classname>std::string</classname>)
          and account-id (an <type>unsigned long</type>),
          the balance in the account (a
          <type>float</type>). Suppose further that ordering this
          information is not useful, so a hash-based container is
          preferable to a tree-based container. Then one can use
        </para>

        <programlisting>
          std::tr1::unordered_map&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
        </programlisting>

        <para>
          which hashes every combination of client and account-id. This
          might work well, except for the fact that it is now impossible
          to efficiently list all of the accounts of a specific client
          (this would practically require iterating over all
          entries). Instead, one can use
        </para>

        <programlisting>
          std::tr1::unordered_multimap&lt;std::pair&lt;std::string, unsigned long&gt;, float, ...&gt;
        </programlisting>

        <para>
          which hashes the client only, and decides equivalence based on
          the client only. This ensures that all accounts belonging to a
          specific client are stored consecutively.
        </para>

        <para>
          Also, suppose one wants a priority queue of integers
          (a container that supports <function>push</function>,
          <function>pop</function>, and <function>top</function> operations, the last of which
          returns the largest <type>int</type>) that also supports
          operations such as <function>find</function> and <function>lower_bound</function>. A
          reasonable solution is to build an adapter over
          <classname>std::set&lt;int&gt;</classname>. In this adapter,
          <function>push</function> will just call the tree-based
          associative container's <function>insert</function> method; <function>pop</function>
          will call its <function>end</function> method, and use it to reach the
          preceding element (which must be the largest). This might
          work well, except that the container object cannot hold
          multiple instances of the same integer (<function>push(4)</function>
          will be a no-op if <constant>4</constant> is already in the
          container object). If multiple keys are necessary, then one
          might build the adapter over an
          <classname>std::multiset&lt;int&gt;</classname>.
        </para>
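        <para>
          Such an adapter might look roughly like the sketch below. The
          class name <literal>searchable_int_pq</literal> and its exact
          interface are assumptions made for illustration; only
          standard-library facilities are used, and
          <function>top</function> and <function>pop</function> assume a
          non-empty queue.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical adapter: a max-priority queue of ints that also supports
// find and lower_bound. It is built on std::multiset, so duplicates are kept.
class searchable_int_pq
{
public:
  typedef std::multiset<int>::const_iterator const_iterator;

  void push(int v) { m_set.insert(v); }

  // The largest element is the one preceding end(). Requires !empty().
  int top() const { return *std::prev(m_set.end()); }

  // Erases a single instance of the largest element. Requires !empty().
  void pop() { m_set.erase(std::prev(m_set.end())); }

  const_iterator find(int v) const { return m_set.find(v); }
  const_iterator lower_bound(int v) const { return m_set.lower_bound(v); }
  const_iterator end() const { return m_set.end(); }

  std::size_t size() const { return m_set.size(); }
  bool empty() const { return m_set.empty(); }

private:
  std::multiset<int> m_set;
};
```
]]></programlisting>
        <para>
          Replacing <classname>std::multiset&lt;int&gt;</classname> with
          <classname>std::set&lt;int&gt;</classname> in this sketch would
          silently drop repeated pushes of the same value.
        </para>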

        <para>
          The standard library's non-unique-mapping containers are useful
          when (1) a key can be decomposed into a primary key and a
          secondary key, (2) a key is needed multiple times, or (3) any
          combination of (1) and (2).
        </para>

        <para>
          The graphic below shows how the standard library's container
          design works internally; in this figure nodes shaded equally
          represent equivalent-key values. Equivalent keys are stored
          consecutively using the properties of the underlying data
          structure: binary search trees (label A) store equivalent-key
          values consecutively (in the sense of an in-order walk)
          naturally; collision-chaining hash tables (label B) store
          equivalent-key values in the same bucket, and the bucket can be
          arranged so that equivalent-key values are consecutive.
        </para>

        <figure>
          <title>Non-unique Mapping Standard Containers</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_embedded_lists_1.png"/>
            </imageobject>
            <textobject>
              <phrase>Non-unique Mapping Standard Containers</phrase>
            </textobject>
          </mediaobject>
        </figure>

        <para>
          Put differently, the standard's non-unique mapping
          associative-containers are associative containers that map
          primary keys to linked lists that are embedded into the
          container. The graphic below shows again the two
          containers from the first graphic above, this time with
          the embedded linked lists of the grayed nodes marked
          explicitly.
        </para>

        <figure xml:id="fig.pbds_embedded_lists_2">
          <title>
            Effect of embedded lists in
            <classname>std::multimap</classname>
          </title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_embedded_lists_2.png"/>
            </imageobject>
            <textobject>
              <phrase>
                Effect of embedded lists in
                <classname>std::multimap</classname>
              </phrase>
            </textobject>
          </mediaobject>
        </figure>

        <para>
          These embedded linked lists have several disadvantages.
        </para>

        <orderedlist>
          <listitem>
            <para>
              The underlying data structure embeds the linked lists
              according to its own considerations, which means that the
              search path for a value might include several different
              equivalent-key values. For example, the search path for
              the black node in the first graphic, under either label A
              or label B, includes more than a single gray node.
            </para>
          </listitem>

          <listitem>
            <para>
              The links of the linked lists are the underlying data
              structure's nodes, which typically are quite structured.  In
              the case of tree-based containers (the graphic above, label
              A), each "link" is actually a node with three pointers (one
              to a parent and two to children), and a
              relatively-complicated iteration algorithm. The linked
              lists, therefore, can take up quite a lot of memory, and
              iterating over all values equal to a given key (through the
              return value of the standard
              library's <function>equal_range</function>) can be
              expensive.
            </para>
          </listitem>

          <listitem>
            <para>
              The primary key is stored multiple times; this uses more
              memory.
            </para>
          </listitem>

          <listitem>
            <para>
              Finally, the interface of this design excludes several
              useful underlying data structures. Of all the unordered
              self-organizing data structures, practically only
              collision-chaining hash tables can (efficiently) guarantee
              that equivalent-key values are stored consecutively.
            </para>
          </listitem>
        </orderedlist>

        <para>
          The above reasons hold even when the ratio of secondary keys to
          primary keys (or average number of identical keys) is small, but
          when it is large, there are more severe problems:
        </para>

        <orderedlist>
          <listitem>
            <para>
              The underlying data structures order the links inside each
              embedded linked list according to their internal
              considerations, which effectively means that each of the
              lists is unordered. Irrespective of the underlying data
              structure, searching for a specific value can degrade to
              linear complexity.
            </para>
          </listitem>

          <listitem>
            <para>
              Similarly to the above point, it is impossible to apply
              to the secondary keys considerations that apply to primary
              keys. For example, it is not possible to maintain secondary
              keys in sorted order.
            </para>
          </listitem>

          <listitem>
            <para>
              While the interface "understands" that all equivalent-key
              values constitute a distinct list (through
              <function>equal_range</function>), the underlying data
              structure typically does not. This means that operations such
              as erasing from a tree-based container all values whose keys
              are equivalent to a given key can be super-linear in the
              size of the tree; this is also true for several other
              operations that target a specific list.
            </para>
          </listitem>

        </orderedlist>

        <para>
          In this library, all associative containers map
          (or store) unique-key values. One can (1) map primary keys to
          secondary associative-containers (containers of
          secondary keys) or non-associative containers, (2) map identical
          keys to a size-type representing the number of times they
          occur, or (3) any combination of (1) and (2). Instead of
          allowing multiple equivalent-key values, this library
          supplies associative containers based on underlying
          data structures that are suitable as secondary
          associative-containers.
        </para>

        <para>
          In the figure below, labels A and B show the equivalent
          underlying data structures in this library, corresponding to
          labels A and B, respectively, of the first graphic above. Each
          shaded box represents some size-type or secondary
          associative-container.
        </para>

        <figure>
          <title>Non-unique Mapping Containers</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_embedded_lists_3.png"/>
            </imageobject>
            <textobject>
              <phrase>Non-unique Mapping Containers</phrase>
            </textobject>
          </mediaobject>
        </figure>

        <para>
          In the first example above, then, one would use an associative
          container mapping each client to an associative container which
          maps each account-id to a balance (see
          <filename>example/basic_multimap.cc</filename>); in the second
          example, one would use an associative container mapping
          each <classname>int</classname> to some size-type indicating the
          number of times it logically occurs
          (see <filename>example/basic_multiset.cc</filename>).
        </para>

        <para>
          See the discussion in list-based container types for containers
          especially suited as secondary associative-containers.
        </para>
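        <para>
          For the bank-account scenario, a primary container of
          secondary containers could be sketched as below. This assumes
          <classname>cc_hash_table</classname> from
          <filename>&lt;ext/pb_ds/assoc_container.hpp&gt;</filename> in
          namespace <literal>__gnu_pbds</literal> with its default hash
          policies; the typedefs and the helpers
          <literal>set_balance</literal> and
          <literal>num_accounts</literal> are hypothetical and are not
          taken from the example files named above.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>

// Secondary container: account-id -> balance, for a single client.
typedef __gnu_pbds::cc_hash_table<unsigned long, float> account_map;

// Primary container: client name -> that client's accounts.
typedef __gnu_pbds::cc_hash_table<std::string, account_map> bank_map;

// Hypothetical helper: records the balance of one account.
inline void
set_balance(bank_map& bank, const std::string& client,
            unsigned long account_id, float balance)
{ bank[client][account_id] = balance; }

// Hypothetical helper: number of accounts held by one client, obtained
// without visiting the entries of any other client.
inline std::size_t
num_accounts(const bank_map& bank, const std::string& client)
{
  bank_map::point_const_iterator it = bank.find(client);
  return it == bank.end() ? 0 : it->second.size();
}
```
]]></programlisting>
        <para>
          Each client name is stored once, and the secondary container
          type can be chosen independently of the primary one.
        </para>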
      </section>

    </section> <!-- map and set semantics -->

    <section xml:id="pbds.design.concepts.iterator_semantics">
      <info><title>Iterator Semantics</title></info>

      <section xml:id="concepts.iterator_semantics.point_and_range">
        <info><title>Point and Range Iterators</title></info>

        <para>
          Iterator concepts are bifurcated in this design: they
          comprise point-type and range-type iteration.
        </para>

        <para>
          A point-type iterator is an iterator that refers to a specific
          element, as returned through an
          associative-container's <function>find</function> method.
        </para>

        <para>
          A range-type iterator is an iterator that is used to go over a
          sequence of elements, as returned by a container's
          <function>begin</function> method.
        </para>

        <para>
          A point-type method is a method that
          returns a point-type iterator; a range-type method is a method
          that returns a range-type iterator.
        </para>

        <para>For most containers, these types are synonymous; for
        self-organizing containers, such as hash-based containers or
        priority queues, these are inherently different (in any
        implementation, including that of C++ standard library
        components). In this design, the difference is made explicit:
        they are distinct types.
        </para>
      </section>


      <section xml:id="concepts.iterator_semantics.both">
        <info><title>Distinguishing Point and Range Iterators</title></info>

        <para>When using this library, it is necessary to differentiate
        between two types of methods and iterators: point-type methods and
        iterators, and range-type methods and iterators. Each associative
        container's interface includes the methods:</para>
        <programlisting>
          point_const_iterator
          find(const_key_reference r_key) const;

          point_iterator
          find(const_key_reference r_key);

          std::pair&lt;point_iterator,bool&gt;
          insert(const_reference r_val);
        </programlisting>

        <para>The relationship between these iterator types varies between
        container types. The figure below shows the invariants between
        point-type and range-type iterators. <emphasis>A</emphasis> shows
        the most general invariant: <literal>iterator</literal> can
        always be converted to <literal>point_iterator</literal>.
        <emphasis>B</emphasis> shows invariants for order-preserving
        containers: point-type iterators are synonymous with range-type
        iterators. Orthogonally, <emphasis>C</emphasis> shows invariants
        for "set" containers: iterators are synonymous with const
        iterators.</para>

        <figure>
          <title>Point Iterator Hierarchy</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_point_iterator_hierarchy.png"/>
            </imageobject>
            <textobject>
              <phrase>Point Iterator Hierarchy</phrase>
            </textobject>
          </mediaobject>
        </figure>
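        <para>
          The two kinds of access can be sketched on a hash-based "map"
          as follows, assuming <classname>cc_hash_table</classname> from
          <filename>&lt;ext/pb_ds/assoc_container.hpp&gt;</filename> in
          namespace <literal>__gnu_pbds</literal>. The function
          <literal>point_and_range_demo</literal> is a hypothetical
          name, and the final conversion relies on the invariant stated
          for <emphasis>A</emphasis>.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> map_t;

// Sums the keys of a hash-based "map" using a range-type iterator, then
// increments the value mapped to `key` through a point-type iterator.
inline int
point_and_range_demo(map_t& m, int key)
{
  int key_sum = 0;

  // Range-type iteration: begin()/end() return iterators that support ++.
  for (map_t::iterator it = m.begin(); it != m.end(); ++it)
    key_sum += it->first;

  // Point-type access: find() returns a point_iterator, which can be
  // dereferenced and compared, but is not meant to be incremented.
  map_t::point_iterator p = m.find(key);
  if (p != m.end())
    ++p->second;

  // A range-type iterator converts to a point-type iterator.
  map_t::iterator first = m.begin();
  map_t::point_iterator q = first;
  (void) q;

  return key_sum;
}
```
]]></programlisting>
        <para>
          For an order-preserving container such as a tree, the same
          code is expected to compile unchanged, since there the two
          iterator kinds are synonymous.
        </para>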


        <para>Note that point-type iterators in self-organizing containers
        (hash-based associative containers) lack movement
        operators, such as <literal>operator++</literal>; in fact, this
        is the reason why this library differs from the standard C++ library's
        design on this point.</para>

        <para>Typically, one can determine an iterator's movement
        capabilities using
        <literal>std::iterator_traits&lt;It&gt;::iterator_category</literal>,
        which is a <literal>struct</literal> indicating the iterator's
        movement capabilities. Unfortunately, none of the standard predefined
        categories reflects an iterator's <emphasis>not</emphasis> having any
        movement capabilities whatsoever. Consequently,
        <literal>pb_ds</literal> adds a type
        <literal>trivial_iterator_tag</literal> (whose name is taken from
        a concept in C++ standardese), which is the category of iterators
        with no movement capabilities. All other standard C++ library
        tags, such as <literal>forward_iterator_tag</literal>, retain their
        common use.</para>
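        <para>
          As a sketch of how this tag might be inspected, the helper
          trait below (hypothetical name
          <literal>is_point_only</literal>) compares an iterator's
          <literal>iterator_category</literal> with
          <literal>trivial_iterator_tag</literal>. It assumes the tag
          lives in namespace <literal>__gnu_pbds</literal> and is
          declared in
          <filename>&lt;ext/pb_ds/tag_and_trait.hpp&gt;</filename>, and
          it needs a C++11 compiler for
          <filename>&lt;type_traits&gt;</filename>.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <type_traits>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

typedef __gnu_pbds::cc_hash_table<int, char> hash_map_t;
typedef __gnu_pbds::tree<int, char> tree_map_t;

// Hypothetical helper trait: true if It advertises no movement
// capabilities, i.e. its category is trivial_iterator_tag.
template<typename It>
struct is_point_only
: std::is_same<typename It::iterator_category,
               __gnu_pbds::trivial_iterator_tag>
{ };
```
]]></programlisting>
        <para>
          Under these assumptions the trait is expected to hold for
          <literal>hash_map_t::point_iterator</literal>, but not for
          <literal>hash_map_t::iterator</literal> or for the iterators
          of the order-preserving <literal>tree_map_t</literal>.
        </para>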

      </section>

      <section xml:id="pbds.design.concepts.invalidation">
        <info><title>Invalidation Guarantees</title></info>
        <para>
          If one manipulates a container object, then iterators previously
          obtained from it can be invalidated. In some cases a
          previously-obtained iterator cannot be de-referenced; in other cases,
          the iterator's next or previous element might have changed
          unpredictably. This corresponds exactly to the question of whether a
          point-type or range-type iterator (see previous concept) is valid or
          not. In this design, one can query a container (at compile time) about
          its invalidation guarantees.
        </para>


        <para>
          For example, given three different types of associative
          containers, a modifying operation (say, <function>erase</function>)
          can invalidate iterators in three different ways: the iterator of
          one container remains completely valid (it can be de-referenced
          and incremented); the iterator of a different container cannot
          even be de-referenced; the iterator of the third container can be
          de-referenced, but its "next" iterator changes unpredictably.
        </para>

        <para>
          Distinguishing between point and range types allows fine-grained
          invalidation guarantees, because these questions correspond exactly
          to the question of whether point-type iterators and range-type
          iterators are valid. The graphic below shows tags corresponding to
          different types of invalidation guarantees.
        </para>

        <figure>
          <title>Invalidation Guarantee Tags Hierarchy</title>
          <mediaobject>
            <imageobject>
              <imagedata align="center" format="PDF" scale="75"
                         fileref="../images/pbds_invalidation_tag_hierarchy.pdf"/>
            </imageobject>
            <imageobject>
              <imagedata align="center" format="PNG" scale="100"
                         fileref="../images/pbds_invalidation_tag_hierarchy.png"/>
            </imageobject>
            <textobject>
              <phrase>Invalidation Guarantee Tags Hierarchy</phrase>
            </textobject>
          </mediaobject>
        </figure>

        <itemizedlist>
          <listitem>
            <para>
              <classname>basic_invalidation_guarantee</classname>
              corresponds to a basic guarantee that a point-type iterator,
              a found pointer, or a found reference, remains valid as long
              as the container object is not modified.
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>point_invalidation_guarantee</classname>
              corresponds to a guarantee that a point-type iterator, a
              found pointer, or a found reference, remains valid even if
              the container object is modified.
            </para>
          </listitem>

          <listitem>
            <para>
              <classname>range_invalidation_guarantee</classname>
              corresponds to a guarantee that a range-type iterator remains
              valid even if the container object is modified.
            </para>
          </listitem>
        </itemizedlist>

        <para>To find the invalidation guarantee of a
        container, one can use</para>
        <programlisting>
          typename container_traits&lt;Cntnr&gt;::invalidation_guarantee
        </programlisting>

        <para>Note that this hierarchy corresponds to the logic it
        represents: if a container has range-invalidation guarantees,
        then it must also have point-invalidation guarantees;
        correspondingly, its invalidation guarantee (in this case
        <classname>range_invalidation_guarantee</classname>)
        can be cast to its base class (in this case <classname>point_invalidation_guarantee</classname>).
        This means that this hierarchy can be used easily with
        standard metaprogramming techniques, by specializing on the
        type of <literal>invalidation_guarantee</literal>.</para>
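        <para>
          One such technique is tag dispatch through overloading,
          sketched below. The overload set
          <literal>guarantee_name</literal> and the helper
          <literal>invalidation_guarantee_of</literal> are hypothetical
          names; the tags and <classname>container_traits</classname>
          are assumed to be found in namespace
          <literal>__gnu_pbds</literal> via
          <filename>&lt;ext/pb_ds/tag_and_trait.hpp&gt;</filename>.
        </para>
        <programlisting language="cpp"><![CDATA[
```cpp
#include <cassert>
#include <cstring>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tag_and_trait.hpp>

// Overloads on the guarantee tags. Overload resolution selects the one
// whose parameter type matches the container's tag most closely.
inline const char*
guarantee_name(__gnu_pbds::basic_invalidation_guarantee)
{ return "basic"; }

inline const char*
guarantee_name(__gnu_pbds::point_invalidation_guarantee)
{ return "point"; }

inline const char*
guarantee_name(__gnu_pbds::range_invalidation_guarantee)
{ return "range"; }

// Hypothetical helper: names the invalidation guarantee of Cntnr.
template<typename Cntnr>
const char*
invalidation_guarantee_of()
{
  typedef typename __gnu_pbds::container_traits<Cntnr>::invalidation_guarantee
    guarantee;
  return guarantee_name(guarantee());
}
```
]]></programlisting>
        <para>
          If the <literal>range</literal> overload were removed, a
          container with
          <classname>range_invalidation_guarantee</classname> would fall
          back to the <literal>point</literal> overload, because the tag
          converts to its base class.
        </para>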

        <para>
          These types of problems were addressed, in a more general
          setting, in <xref linkend="biblio.meyers96more"/> - Item 2. In
          our opinion, an invalidation-guarantee hierarchy would solve
          these problems in all container types - not just associative
          containers.
        </para>

      </section>
    </section> <!-- iterator semantics -->

    <section xml:id="pbds.design.concepts.genericity">
      <info><title>Genericity</title></info>

      <para>
        The design attempts to address the following problem of
        data-structure genericity. When writing a function manipulating
        a generic container object, what is the behavior of the object?
        Suppose one writes
      </para>
      <programlisting>
        template&lt;typename Cntnr&gt;
        void
        some_op_sequence(Cntnr &amp;r_container)
        {
        ...
        }
      </programlisting>

      <para>
        then one needs to address the following questions in the body
        of <function>some_op_sequence</function>:
      </para>

      <itemizedlist>
        <listitem>
          <para>
            Which types and methods does <literal>Cntnr</literal> support?
            Containers based on hash tables can be queried for the
            hash-functor type and object; this is meaningless for tree-based
            containers. Containers based on trees can be split, joined, or
            can erase iterators and return the following iterator; this
            cannot be done by hash-based containers.
          </para>
        </listitem>

        <listitem>
          <para>
            What are the exception and invalidation guarantees
            of <literal>Cntnr</literal>? A container based on a probing
            hash-table invalidates all iterators when it is modified; this
            is not the case for containers based on node-based
            trees. Containers based on a node-based tree can be split or
            joined without exceptions; this is not the case for containers
            based on vector-based trees.
          </para>
        </listitem>

        <listitem>
          <para>
            How does the container maintain its elements? Tree-based and
            trie-based containers store elements by key order; others,
            typically, do not. Containers based on splay trees or lists
            with update policies "cache" "frequently accessed" elements;
            containers based on most other underlying data structures do
            not.
          </para>
        </listitem>
        <listitem>
          <para>
            How does one query a container about characteristics and
            capabilities? What is the relationship between two different
            data structures, if any?
          </para>
        </listitem>
      </itemizedlist>

      <para>The remainder of this section explains these issues in
      detail.</para>
2797
2798
2799       <section xml:id="concepts.genericity.tag">
2800         <info><title>Tag</title></info>
2801         <para>
2802           Tags are very useful for manipulating generic types. For example, if
2803           <literal>It</literal> is an iterator class, then <literal>typename
2804           It::iterator_category</literal> or <literal>typename
2805           std::iterator_traits&lt;It&gt;::iterator_category</literal> will
2806           yield its category, and <literal>typename
2807           std::iterator_traits&lt;It&gt;::value_type</literal> will yield its
2808           value type.
2809         </para>
2810
2811         <para>
2812           This library contains a container tag hierarchy corresponding to the
2813           diagram below.
2814         </para>
2815
2816         <figure>
2817           <title>Container Tag Hierarchy</title>
2818           <mediaobject>
2819             <imageobject>
2820               <imagedata align="center" format="PDF" scale="75"
2821                          fileref="../images/pbds_container_tag_hierarchy.pdf"/>
2822             </imageobject>
2823             <imageobject>
2824               <imagedata align="center" format="PNG" scale="100"
2825                          fileref="../images/pbds_container_tag_hierarchy.png"/>
2826             </imageobject>
2827             <textobject>
2828               <phrase>Container Tag Hierarchy</phrase>
2829             </textobject>
2830           </mediaobject>
2831         </figure>
2832
2833         <para>
2834           Given any container <type>Cntnr</type>, the tag of
2835           the underlying data structure can be found via <literal>typename
2836           Cntnr::container_category</literal>.
2837         </para>
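             <para>
               As a minimal sketch of how such a tag might be used, the
               following dispatches on
               <literal>Cntnr::container_category</literal> through ordinary
               overload resolution. The helper names
               <function>describe</function> and
               <function>describe_container</function> are hypothetical and
               not part of the library; only the tag types and the
               <literal>container_category</literal> typedef come from it.
             </para>
             <programlisting><![CDATA[
#include <ext/pb_ds/assoc_container.hpp>
#include <string>

// Hypothetical helpers: overload resolution picks the overload whose
// parameter type is a base class of the container's tag.
inline std::string describe(__gnu_pbds::basic_hash_tag)
{ return "hash-based"; }

inline std::string describe(__gnu_pbds::tree_tag)
{ return "tree-based"; }

template<typename Cntnr>
std::string describe_container(const Cntnr&)
{
  // The tag types are empty classes, so a temporary is cheap to create.
  typedef typename Cntnr::container_category category;
  return describe(category());
}
]]></programlisting>
             <para>
               Under these assumptions, a default
               <classname>cc_hash_table</classname> selects the first
               overload and a default <classname>tree</classname> selects
               the second, because their tags derive from
               <classname>basic_hash_tag</classname> and
               <classname>tree_tag</classname>, respectively.
             </para>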
2838
2839       </section> <!-- tag -->
2840
2841       <section xml:id="concepts.genericity.traits">
2842         <info><title>Traits</title></info>
2844
2845         <para>Additionally, a traits mechanism can be used to query a
2846         container type for its attributes. Given any container
2847         <literal>Cntnr</literal>, <literal>container_traits&lt;Cntnr&gt;</literal>
2848         is a traits class identifying the properties of the
2849         container.</para>
2850
2851         <para>To find whether a container can throw when a key is erased (which
2852         is true for vector-based trees, for example), one can
2853         use
2854         </para>
2855         <programlisting>container_traits&lt;Cntnr&gt;::erase_can_throw</programlisting>
2856
2857         <para>
2858           Some of the definitions in <classname>container_traits</classname>
2859           are dependent on other
2860           definitions. If <classname>container_traits&lt;Cntnr&gt;::order_preserving</classname>
2861           is <constant>true</constant> (which is the case for containers
2862           based on trees and tries), then the container can be split or
2863           joined; in this
2864           case, <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2865           indicates whether splits or joins can throw exceptions (which is
2866           true for vector-based trees);
2867           otherwise <classname>container_traits&lt;Cntnr&gt;::split_join_can_throw</classname>
2868           will yield a compilation error. (This is somewhat similar to a
2869           compile-time version of the COM model).
2870         </para>
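             <para>
               The queries above might be wrapped as in the following
               sketch. The wrapper functions and typedef names are
               hypothetical; the traits members
               <literal>erase_can_throw</literal> and
               <literal>order_preserving</literal> are the ones named in
               the text, and <classname>ov_tree_tag</classname> is assumed
               to select the vector-based tree.
             </para>
             <programlisting><![CDATA[
#include <ext/pb_ds/assoc_container.hpp>
#include <functional>

// Hypothetical wrappers around container_traits members named in the text.
template<typename Cntnr>
bool erase_may_throw()
{ return __gnu_pbds::container_traits<Cntnr>::erase_can_throw; }

template<typename Cntnr>
bool keeps_key_order()
{ return __gnu_pbds::container_traits<Cntnr>::order_preserving; }

// A vector-based (ordered-vector) tree, a node-based red-black tree,
// and a collision-chaining hash table.
typedef __gnu_pbds::tree<int, int, std::less<int>,
                         __gnu_pbds::ov_tree_tag> vector_tree;
typedef __gnu_pbds::tree<int, int, std::less<int>,
                         __gnu_pbds::rb_tree_tag> node_tree;
typedef __gnu_pbds::cc_hash_table<int, int> chained_table;
]]></programlisting>
             <para>
               With these typedefs,
               <literal>erase_may_throw&lt;vector_tree&gt;()</literal> is
               expected to be true and
               <literal>erase_may_throw&lt;node_tree&gt;()</literal> false,
               matching the description of vector-based trees above.
             </para>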
2871
2872       </section> <!-- traits -->
2873
2874     </section> <!-- genericity -->
2875   </section> <!-- concepts -->
2876
2877   <section xml:id="pbds.design.container">
2878     <info><title>By Container</title></info>
2879
2880     <!-- hash -->
2881     <section xml:id="pbds.design.container.hash">
2882       <info><title>hash</title></info>
2883
2884       <!--
2885
2886 // hash policies
2887 /// general terms / background
2888 /// range hashing policies
2889 /// ranged-hash policies
2890 /// implementation
2891
2892 // resize policies
2893 /// general
2894 /// size policies
2895 /// trigger policies
2896 /// implementation
2897
2898 // policy interactions
2899 /// probe/size/trigger
2900 /// hash/trigger
2901 /// eq/hash/storing hash values
2902 /// size/load-check trigger
2903       -->
2904       <section xml:id="container.hash.interface">
2905         <info><title>Interface</title></info>
2906
2907
2908
2909         <para>
2910           The collision-chaining hash-based container has the
2911         following declaration.</para>
2912         <programlisting>
2913           template&lt;
2914             typename Key,
2915             typename Mapped,
2916             typename Hash_Fn = std::hash&lt;Key&gt;,
2917             typename Eq_Fn = std::equal_to&lt;Key&gt;,
2918             typename Comb_Hash_Fn = direct_mask_range_hashing&lt;&gt;,
2919             typename Resize_Policy = /* default explained below */,
2920             bool Store_Hash = false,
2921             typename Allocator = std::allocator&lt;char&gt; &gt;
2922           class cc_hash_table;
2923         </programlisting>
2924
2925         <para>The parameters have the following meaning:</para>
2926
2927         <orderedlist>
2928           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
2929
2930           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
2931
2932           <listitem><para><classname>Hash_Fn</classname> is a key hashing functor.</para></listitem>
2933
2934           <listitem><para><classname>Eq_Fn</classname> is a key equivalence functor.</para></listitem>
2935
2936           <listitem><para><classname>Comb_Hash_Fn</classname> is a range-hashing functor;
2937           it describes how to translate hash values into positions
2938           within the table. </para></listitem>
2939
2940           <listitem><para><classname>Resize_Policy</classname> describes how a container object
2941           should change its internal size. </para></listitem>
2942
2943           <listitem><para><classname>Store_Hash</classname> indicates whether the hash value
2944           should be stored with each entry. </para></listitem>
2945
2946           <listitem><para><classname>Allocator</classname> is an allocator
2947           type.</para></listitem>
2948         </orderedlist>
2949
2950         <para>The probing hash-based container has the following
2951         declaration.</para>
2952         <programlisting>
2953           template&lt;
2954             typename Key,
2955             typename Mapped,
2956             typename Hash_Fn = std::hash&lt;Key&gt;,
2957             typename Eq_Fn = std::equal_to&lt;Key&gt;,
2958             typename Comb_Probe_Fn = direct_mask_range_hashing&lt;&gt;,
2959             typename Probe_Fn = /* default explained below */,
2960             typename Resize_Policy = /* default explained below */,
2961             bool Store_Hash = false,
2962             typename Allocator = std::allocator&lt;char&gt; &gt;
2963           class gp_hash_table;
2964         </programlisting>
2965
2966         <para>The parameters are identical to those of the
2967         collision-chaining container, except for the following.</para>
2968
2969         <orderedlist>
2970           <listitem><para><classname>Comb_Probe_Fn</classname> describes how to transform a probe
2971           sequence into a sequence of positions within the table.</para></listitem>
2972
2973           <listitem><para><classname>Probe_Fn</classname> describes a probe sequence policy.</para></listitem>
2974         </orderedlist>
2975
2976         <para>Some of the default template values depend on the values of
2977         other parameters, and are explained below.</para>
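             <para>
               As an illustration of the two declarations, the sketch below
               instantiates each container with explicitly chosen hash and
               combining policies, leaving
               <classname>Resize_Policy</classname>,
               <classname>Store_Hash</classname>, and
               <classname>Allocator</classname> at their defaults. The
               typedef and function names are hypothetical, and
               <classname>std::hash</classname> assumes a C++11 or later
               compiler.
             </para>
             <programlisting><![CDATA[
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
#include <cstddef>
#include <functional>
#include <string>

// Collision-chaining table combining hash values by modulo.
typedef __gnu_pbds::cc_hash_table<
  std::string, int,
  std::hash<std::string>, std::equal_to<std::string>,
  __gnu_pbds::direct_mod_range_hashing<> > chained_map;

// Probing table combining by bit mask, with a linear probe sequence.
typedef __gnu_pbds::gp_hash_table<
  int, int,
  std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::linear_probe_fn<> > probing_map;

// Hypothetical helper: counts the same word twice and returns the count.
inline int count_twice(const std::string& word)
{
  chained_map counts;
  ++counts[word];
  ++counts[word];
  return counts[word];
}

// Hypothetical helper: stores n squares and returns the element count.
inline std::size_t fill_probing(int n)
{
  probing_map squares;
  for (int i = 0; i < n; ++i)
    squares[i] = i * i;
  return squares.size();
}
]]></programlisting>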
2978
2979       </section>
2980       <section xml:id="container.hash.details">
2981         <info><title>Details</title></info>
2982
2983         <section xml:id="container.hash.details.hash_policies">
2984           <info><title>Hash Policies</title></info>
2985
2986           <section xml:id="details.hash_policies.general">
2987             <info><title>General</title></info>
2988
2989             <para>The following explains some of the functions that hashing
2990             involves. The graphic below illustrates the discussion.</para>
2991
2992             <figure>
2993               <title>Hash functions, ranged-hash functions, and
2994               range-hashing functions</title>
2995               <mediaobject>
2996                 <imageobject>
2997                   <imagedata align="center" format="PNG" scale="100"
2998                              fileref="../images/pbds_hash_ranged_hash_range_hashing_fns.png"/>
2999                 </imageobject>
3000                 <textobject>
3001                   <phrase>Hash functions, ranged-hash functions, and
3002                   range-hashing functions</phrase>
3003                 </textobject>
3004               </mediaobject>
3005             </figure>
3006
3007             <para>Let U be a domain (e.g., the integers, or the
3008             strings of 3 characters). A hash-table algorithm needs to map
3009             elements of U "uniformly" into the range [0,..., m -
3010             1] (where m is a non-negative integral value, and
3011             is, in general, time varying). I.e., the algorithm needs
3012             a ranged-hash function</para>
3013
3014             <para>
3015               f : U × Z<subscript>+</subscript> → Z<subscript>+</subscript>
3016             </para>
3017
3018             <para>such that for any u in U ,</para>
3019
3020             <para>0 ≤ f(u, m) ≤ m - 1</para>
3021
3022             <para>and which has "good uniformity" properties (see
3023             <xref linkend="biblio.knuth98sorting"/>).
3024             One
3025             common solution is to use the composition of the hash
3026             function</para>
3027
3028             <para>h : U → Z<subscript>+</subscript> ,</para>
3029
3030             <para>which maps elements of U into the non-negative
3031             integrals, and</para>
3032
3033             <para>g : Z<subscript>+</subscript> × Z<subscript>+</subscript> →
3034             Z<subscript>+</subscript>,</para>
3035
3036             <para>which maps a non-negative hash value, and a non-negative
3037             range upper-bound into a non-negative integral in the range
3038             between 0 (inclusive) and the range upper bound (exclusive),
3039             i.e., for any r in Z<subscript>+</subscript>,</para>
3040
3041             <para>0 ≤ g(r, m) ≤ m - 1</para>
3042
3043
3044             <para>The resulting ranged-hash function is</para>
3045
3046             <!-- ranged_hash_composed_of_hash_and_range_hashing -->
3047             <equation>
3048               <title>Ranged Hash Function</title>
3049               <mathphrase>
3050                 f(u , m) = g(h(u), m)
3051               </mathphrase>
3052             </equation>
3053
3054             <para>From the above, it is obvious that given g and
3055             h, f can always be composed (however the converse
3056             is not true). The standard's hash-based containers allow specifying
3057             a hash function, and use a hard-wired range-hashing function;
3058             the ranged-hash function is implicitly composed.</para>
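                 <para>
                   The composition f(u, m) = g(h(u), m) can be sketched in
                   plain C++ as follows. This is an illustration of the
                   idea only, not the library's implementation; the names
                   <function>range_by_division</function> and
                   <function>ranged_hash</function> are hypothetical, and
                   the division method is assumed for g.
                 </para>
                 <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical range-hashing function g(r, m): the division method.
inline std::size_t range_by_division(std::size_t r, std::size_t m)
{
  assert(m > 0); // the range [0, m - 1] must not be empty
  return r % m;
}

// Ranged-hash function f(u, m) = g(h(u), m), composed from a hash
// functor h and the range-hashing function above.
template<typename Key, typename Hash_Fn>
std::size_t ranged_hash(const Key& u, std::size_t m, Hash_Fn h)
{ return range_by_division(h(u), m); }
]]></programlisting>
                 <para>
                   For example,
                   <literal>ranged_hash(std::string("abc"), 7,
                   std::hash&lt;std::string&gt;())</literal> yields a
                   position in the range [0, 6], whatever value the hash
                   functor produces.
                 </para>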
3059
3060             <para>The above describes the case where a key is to be mapped
3061             into a single position within a hash table, e.g.,
3062             in a collision-chaining table. In other cases, a key is to be
3063             mapped into a sequence of positions within a table,
3064             e.g., in a probing table. Similar terms apply in this
3065             case: the table requires a ranged probe function,
3066             mapping a key into a sequence of positions within the table.
3067             This is typically achieved by composing a hash function
3068             mapping the key into a non-negative integral type, a
3069             probe function transforming the hash value into a
3070             sequence of hash values, and a range-hashing function
3071             transforming the sequence of hash values into a sequence of
3072             positions.</para>
3073
3074           </section>
3075
3076           <section xml:id="details.hash_policies.range">
3077             <info><title>Range Hashing</title></info>
3078
3079             <para>Some common choices for range-hashing functions are the
3080             division, multiplication, and middle-square methods (<xref linkend="biblio.knuth98sorting"/>), defined
3081             as</para>
3082
3083             <equation>
3084               <title>Range-Hashing, Division Method</title>
3085               <mathphrase>
3086                 g(r, m) = r mod m
3087               </mathphrase>
3088             </equation>
3089
3090
3091
3092             <para>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</para>
3093
3094             <para>and</para>
3095
3096             <para>g(r, m) = ⌈ u/v ( r<superscript>2</superscript> mod v ) ⌉</para>
3097
3098             <para>respectively, for some positive integrals u and
3099             v (typically powers of 2), and some a. Each of
3100             these range-hashing functions works best in a different
3101             setting.</para>
3102
3103             <para>The division method (see above) is a
3104             very common choice. However, even this single method can be
3105             implemented in two very different ways. It can be
3106             implemented using the
3107             low-level % (modulo) operation (for any m), or the
3108             low-level &amp; (bit-mask) operation (for the case where
3109             m is a power of 2), i.e.,</para>
3110
3111             <equation>
3112               <title>Division via Prime Modulo</title>
3113               <mathphrase>
3114                 g(r, m) = r % m
3115               </mathphrase>
3116             </equation>
3117
3118             <para>and</para>
3119
3120             <equation>
3121               <title>Division via Bit Mask</title>
3122               <mathphrase>
3123                 g(r, m) = r &amp; (m - 1), (with m =
3124                 2<superscript>k</superscript> for some k)
3125               </mathphrase>
3126             </equation>
3127
3128
3129             <para>respectively.</para>
3130
3131             <para>The % (modulo) implementation has the advantage that for
3132             m a prime far from a power of 2, g(r, m) is
3133             affected by all the bits of r (minimizing the chance of
3134             collision). It has the disadvantage of using the costly modulo
3135             operation. This method is hard-wired into SGI's
3136             implementation.</para>
3137
3138             <para>The &amp; (bit-mask) implementation has the advantage of
3139             relying on the fast bitwise AND operation. It has the
3140             disadvantage that g(r, m) is affected only by the
3141             low-order bits of r. This method is hard-wired into
3142             Dinkumware's implementation.</para>
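                 <para>
                   The trade-off between the two implementations can be
                   seen in a small sketch. The function names
                   <function>range_mod</function> and
                   <function>range_mask</function> are hypothetical and
                   stand in for the two forms of g(r, m).
                 </para>
                 <programlisting><![CDATA[
#include <cassert>
#include <cstddef>

// Division via modulo: valid for any m > 0.
inline std::size_t range_mod(std::size_t r, std::size_t m)
{
  assert(m > 0);
  return r % m;
}

// Division via bit mask: valid only when m is a power of 2.
inline std::size_t range_mask(std::size_t r, std::size_t m)
{
  assert(m > 0 && (m & (m - 1)) == 0);
  return r & (m - 1);
}
]]></programlisting>
                 <para>
                   With m = 16, the hash values 0x10 and 0x20 differ only
                   in their high-order bits, so
                   <function>range_mask</function> maps both to position 0.
                   With the prime m = 17,
                   <function>range_mod</function> maps them to positions 16
                   and 15, respectively.
                 </para>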
3143
3144
3145           </section>
3146
3147           <section xml:id="details.hash_policies.ranged">
3148             <info><title>Ranged Hash</title></info>
3149
3150             <para>In some cases it is beneficial to allow the
3151             client to directly specify a ranged-hash function. It is
3152             true that the writer of the ranged-hash function cannot rely
3153             on the values of m having specific numerical properties
3154             suitable for hashing (in the sense used in <xref linkend="biblio.knuth98sorting"/>), since
3155             the values of m are determined by a resize policy with
3156             possibly orthogonal considerations.</para>
3157
3158             <para>There are two cases where a ranged-hash function can be
3159             superior. The first is when using perfect hashing; the
3160             second is when the values of m can be used to estimate
3161             the "general" number of distinct values required. The latter is
3162             described in the following.</para>
3163
3164             <para>Let</para>
3165
3166             <para>
3167               s = [ s<subscript>0</subscript>,..., s<subscript>t - 1</subscript>]
3168             </para>
3169
3170             <para>be a string of t characters, each of which is from
3171             domain S. Consider the following ranged-hash
3172             function:</para>
3173             <equation>
3174               <title>
3175                 A Standard String Hash Function
3176               </title>
3177               <mathphrase>
3178                 f<subscript>1</subscript>(s, m) = ∑ <subscript>i =
3179                 0</subscript><superscript>t - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3180               </mathphrase>
3181             </equation>
3182
3183
3184             <para>where a is some non-negative integral value. This is
3185             the standard string-hashing function used in SGI's
3186             implementation (with a = 5). Its advantage is that
3187             it takes into account all of the characters of the string.</para>
3188
3189             <para>Now assume that s is the string representation
3190             of a long DNA sequence (and so S = {'A', 'C', 'G',
3191             'T'}). In this case, scanning the entire string might be
3192             prohibitively expensive. A possible alternative might be to use
3193             only the first k characters of the string, where</para>
3194
3195             <para>|S|<superscript>k</superscript> ≥ m ,</para>
3196
3197             <para>i.e., using the hash function</para>
3198
3199             <equation>
3200               <title>
3201                 Only k String DNA Hash
3202               </title>
3203               <mathphrase>
3204                 f<subscript>2</subscript>(s, m) = ∑ <subscript>i
3205                 = 0</subscript><superscript>k - 1</superscript> s<subscript>i</subscript> a<superscript>i</superscript> mod m
3206               </mathphrase>
3207             </equation>
3208
3209             <para>requiring scanning over only</para>
3210
3211             <para>k = log<subscript>4</subscript>( m )</para>
3212
3213             <para>characters.</para>
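                 <para>
                   A minimal sketch of such a prefix-only ranged-hash
                   function follows. It assumes a = 5 purely for
                   illustration, takes k as the smallest integer with
                   4<superscript>k</superscript> ≥ m, and assumes m is small
                   enough that m<superscript>2</superscript> fits in
                   <type>std::size_t</type>. The names
                   <function>prefix_length</function> and
                   <function>dna_prefix_hash</function> are hypothetical.
                 </para>
                 <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <string>

// Smallest k with 4^k >= m, i.e., k = ceil(log_4(m)).
inline std::size_t prefix_length(std::size_t m)
{
  std::size_t k = 0;
  for (std::size_t reach = 1; reach < m; reach *= 4)
    ++k;
  return k;
}

// f_2(s, m): the sum of s_i * a^i over the first k characters, modulo m.
inline std::size_t dna_prefix_hash(const std::string& s, std::size_t m)
{
  assert(m > 0);
  const std::size_t a = 5; // illustrative multiplier
  std::size_t k = prefix_length(m);
  if (k > s.size())
    k = s.size(); // short strings are scanned in full
  std::size_t sum = 0;
  std::size_t power = 1 % m; // a^0 mod m
  for (std::size_t i = 0; i < k; ++i)
    {
      const std::size_t c = static_cast<unsigned char>(s[i]);
      sum = (sum + (c % m) * power) % m;
      power = (power * a) % m;
    }
  return sum;
}
]]></programlisting>
                 <para>
                   For m = 16 only the first two characters are scanned, so
                   two sequences sharing the prefix "AC" hash to the same
                   position regardless of their length.
                 </para>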
3214
3215             <para>Other, more elaborate hash functions might scan k
3216             characters starting at a random position (determined at each
3217             resize), or scan k random positions (determined at
3218             each resize), i.e., using</para>
3219
3220             <para>f<subscript>3</subscript>(s, m) = ∑ <subscript>i =
3221             r<subscript>0</subscript></subscript><superscript>r<subscript>0</subscript> + k - 1</superscript> s<subscript>i</subscript>
3222             a<superscript>i</superscript> mod m ,</para>
3223
3224             <para>or</para>
3225
3226             <para>f<subscript>4</subscript>(s, m) = ∑ <subscript>i = 0</subscript><superscript>k -
3227             1</superscript> s<subscript>r<subscript>i</subscript></subscript> a<superscript>r<subscript>i</subscript></superscript> mod
3228             m ,</para>
3229
3230             <para>respectively, for r<subscript>0</subscript>,..., r<subscript>k-1</subscript>
3231             each in the (inclusive) range [0,...,t-1].</para>
3232
3233             <para>It should be noted that the above functions cannot be
3234             decomposed into a hash function and a range-hashing function, as in the Ranged Hash Function equation above.</para>
3235
3236
3237           </section>
3238
3239           <section xml:id="details.hash_policies.implementation">
3240             <info><title>Implementation</title></info>
3241
3242             <para>This sub-subsection describes the implementation of
3243             the above in this library. It first explains range-hashing
3244             functions in collision-chaining tables, then ranged-hash
3245             functions in collision-chaining tables, then probing-based
3246             tables, and finally lists the relevant classes in this
3247             library.</para>
3248
3249             <section xml:id="hash_policies.implementation.collision-chaining">
3250               <info><title>
3251                 Range-Hashing and Ranged-Hashes in Collision-Chaining Tables
3252               </title></info>
3253
3254
3255               <para><classname>cc_hash_table</classname> is
3256               parametrized by <classname>Hash_Fn</classname> and <classname>Comb_Hash_Fn</classname>, a
3257               hash functor and a combining hash functor, respectively.</para>
3258
3259               <para>In general, <classname>Comb_Hash_Fn</classname> is considered a
3260               range-hashing functor. <classname>cc_hash_table</classname>
3261               synthesizes a ranged-hash function from <classname>Hash_Fn</classname> and
3262               <classname>Comb_Hash_Fn</classname>. The figure below shows an <classname>insert</classname> sequence
3263               diagram for this case. The user inserts an element (point A),
3264               the container transforms the key into a non-negative integral
3265               using the hash functor (points B and C), and transforms the
3266               result into a position using the combining functor (points D
3267               and E).</para>
3268
3269               <figure>
3270                 <title>Insert hash sequence diagram</title>
3271                 <mediaobject>
3272                   <imageobject>
3273                     <imagedata align="center" format="PNG" scale="100"
3274                                fileref="../images/pbds_hash_range_hashing_seq_diagram.png"/>
3275                   </imageobject>
3276                   <textobject>
3277                     <phrase>Insert hash sequence diagram</phrase>
3278                   </textobject>
3279                 </mediaobject>
3280               </figure>
3281
3282               <para>If <classname>cc_hash_table</classname>'s
3283               hash functor, <classname>Hash_Fn</classname>, is instantiated by <classname>null_type</classname>, then <classname>Comb_Hash_Fn</classname> is taken to be
3284               a ranged-hash function. The graphic below shows an <function>insert</function> sequence
3285               diagram. The user inserts an element (point A), the container
3286               transforms the key into a position using the combining functor
3287               (points B and C).</para>
3288
3289               <figure>
3290                 <title>Insert hash sequence diagram with a null policy</title>
3291                 <mediaobject>
3292                   <imageobject>
3293                     <imagedata align="center" format="PNG" scale="100"
3294                                fileref="../images/pbds_hash_range_hashing_seq_diagram2.png"/>
3295                   </imageobject>
3296                   <textobject>
3297                     <phrase>Insert hash sequence diagram with a null policy</phrase>
3298                   </textobject>
3299                 </mediaobject>
3300               </figure>
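                   <para>
                     The null-policy case might look like the following
                     sketch, in which <classname>Hash_Fn</classname> is
                     <classname>null_type</classname> and the combining
                     functor acts as a ranged-hash function. The class
                     <classname>int_ranged_hash_fn</classname> and its
                     helpers are hypothetical. The members it provides
                     (<literal>size_type</literal>,
                     <function>notify_resized</function>,
                     <function>operator()</function>, and
                     <function>swap</function>) are assumed here to be what
                     the container requires of such a functor.
                   </para>
                   <programlisting><![CDATA[
#include <ext/pb_ds/assoc_container.hpp>
#include <algorithm>
#include <cstddef>
#include <functional>

// Hypothetical ranged-hash functor for int keys. The container is
// assumed to call notify_resized() with the table size m before it
// hashes any key.
class int_ranged_hash_fn
{
public:
  typedef std::size_t size_type;

  int_ranged_hash_fn() : m_size(0) { }

  void swap(int_ranged_hash_fn& other)
  { std::swap(m_size, other.m_size); }

  void notify_resized(size_type size)
  { m_size = size; }

  // Maps a key directly into a position in [0, m_size - 1].
  size_type operator()(const int& key) const
  { return static_cast<size_type>(key) % m_size; }

private:
  size_type m_size;
};

// A set of ints: Mapped and Hash_Fn are both null_type.
typedef __gnu_pbds::cc_hash_table<
  int, __gnu_pbds::null_type, __gnu_pbds::null_type,
  std::equal_to<int>, int_ranged_hash_fn> ranged_hash_set;

// Hypothetical helper: inserts a key and reports whether it is found.
inline bool insert_and_find(int key)
{
  ranged_hash_set s;
  s.insert(key);
  return s.find(key) != s.end();
}
]]></programlisting>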
3301
3302             </section>
3303
3304             <section xml:id="hash_policies.implementation.probe">
3305               <info><title>
3306                 Probing tables
3307               </title></info>
3308               <para><classname>gp_hash_table</classname> is parametrized by
3309               <classname>Hash_Fn</classname>, <classname>Probe_Fn</classname>,
3310               and <classname>Comb_Probe_Fn</classname>. As before, if
3311               <classname>Hash_Fn</classname> and <classname>Probe_Fn</classname>
3312               are both <classname>null_type</classname>, then
3313               <classname>Comb_Probe_Fn</classname> is a ranged-probe
3314               functor. Otherwise, <classname>Hash_Fn</classname> is a hash
3315               functor, <classname>Probe_Fn</classname> is a functor for offsets
3316               from a hash value, and <classname>Comb_Probe_Fn</classname>
3317               transforms a probe sequence into a sequence of positions within
3318               the table.</para>
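                   <para>
                     The decomposed case can be illustrated independently
                     of the library. In the sketch below, a probe function
                     yields the offset of the i-th probe, and a bit mask
                     combines the hash value and the offset into a
                     position. All names are hypothetical, and the offsets
                     i and i<superscript>2</superscript> are the usual
                     textbook choices for linear and quadratic probing, not
                     necessarily those of the library's classes.
                   </para>
                   <programlisting><![CDATA[
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical probe functions: the offset of the i-th probe.
inline std::size_t linear_offset(std::size_t i) { return i; }
inline std::size_t quadratic_offset(std::size_t i) { return i * i; }

// Maps hash value r to its first n positions in a table of size m
// (m a power of 2), combining each probe by bit mask.
template<typename Probe_Fn>
std::vector<std::size_t>
probe_positions(std::size_t r, std::size_t m, std::size_t n, Probe_Fn probe)
{
  assert(m > 0 && (m & (m - 1)) == 0);
  std::vector<std::size_t> positions;
  for (std::size_t i = 0; i < n; ++i)
    positions.push_back((r + probe(i)) & (m - 1));
  return positions;
}
]]></programlisting>
                   <para>
                     For r = 6 and m = 8, the first four linear positions
                     are 6, 7, 0, 1, and the first four quadratic positions
                     are 6, 7, 2, 7.
                   </para>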
3319
3320             </section>
3321
3322             <section xml:id="hash_policies.implementation.predefined">
3323               <info><title>
3324                 Pre-Defined Policies
3325               </title></info>
3326
3327               <para>This library contains some pre-defined classes
3328               implementing range-hashing and probing functions:</para>
3329
3330               <orderedlist>
3331                 <listitem><para><classname>direct_mask_range_hashing</classname>
3332                 and <classname>direct_mod_range_hashing</classname>
3333                 are range-hashing functions based on a bit-mask and a modulo
3334                 operation, respectively.</para></listitem>
3335
3336                 <listitem><para><classname>linear_probe_fn</classname> and
3337                 <classname>quadratic_probe_fn</classname> are
3338                 a linear probe and a quadratic probe function,
3339                 respectively.</para></listitem>
3340               </orderedlist>
3341
3342               <para>
3343                 The graphic below shows the relationships.
3344               </para>
3345               <figure>
3346                 <title>Hash policy class diagram</title>
3347                 <mediaobject>
3348                   <imageobject>
3349                     <imagedata align="center" format="PNG" scale="100"
3350                                fileref="../images/pbds_hash_policy_cd.png"/>
3351                   </imageobject>
3352                   <textobject>
3353                     <phrase>Hash policy class diagram</phrase>
3354                   </textobject>
3355                 </mediaobject>
3356               </figure>
3357
3358
3359             </section>
3360
3361           </section> <!-- impl -->
3362
3363         </section>
3364
3365         <section xml:id="container.hash.details.resize_policies">
3366           <info><title>Resize Policies</title></info>
3367
3368           <section xml:id="resize_policies.general">
3369             <info><title>General</title></info>
3370
3371             <para>Hash-tables, as opposed to trees, do not naturally grow or
3372             shrink. It is necessary to specify policies to determine how
3373             and when a hash table should change its size. Usually, resize
3374             policies can be decomposed into orthogonal policies:</para>
3375
3376             <orderedlist>
3377               <listitem><para>A size policy indicating how a hash table
3378               should grow (e.g., it should multiply by powers of
3379               2).</para></listitem>
3380
3381               <listitem><para>A trigger policy indicating when a hash
3382               table should grow (e.g., a load factor is
3383               exceeded).</para></listitem>
3384             </orderedlist>
3385
3386           </section>
3387
3388           <section xml:id="resize_policies.size">
3389             <info><title>Size Policies</title></info>
3390
3391
3392             <para>Size policies determine how a hash table changes size. These
3393             policies are simple, and there are relatively few sensible
3394             options. An exponential-size policy (with the initial size and
3395             growth factors both powers of 2) works well with a mask-based
3396             range-hashing function, and is the
3397             hard-wired policy used by Dinkumware. A
3398             prime-list based policy works well with a modulo-prime range
3399             hashing function and is the hard-wired policy used by SGI's
3400             implementation.</para>
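                 <para>
                   The two size policies can be sketched as plain
                   functions. The names are hypothetical, and the prime
                   list is a short illustrative one rather than the list
                   used by any particular implementation.
                 </para>
                 <programlisting><![CDATA[
#include <cstddef>

// Exponential policy: doubling keeps a power-of-2 size a power of 2,
// which suits a mask-based range-hashing function.
inline std::size_t next_exponential_size(std::size_t size)
{ return size * 2; }

// Prime-list policy: the next size is the first listed prime larger
// than the current size, which suits a modulo-prime function.
inline std::size_t next_prime_size(std::size_t size)
{
  static const std::size_t primes[] = { 53, 97, 193, 389, 769, 1543 };
  const std::size_t count = sizeof(primes) / sizeof(primes[0]);
  for (std::size_t i = 0; i < count; ++i)
    if (primes[i] > size)
      return primes[i];
  return primes[count - 1]; // list exhausted: stay at the largest entry
}
]]></programlisting>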
3401
3402           </section>
3403
3404           <section xml:id="resize_policies.trigger">
3405             <info><title>Trigger Policies</title></info>
3406
3407             <para>Trigger policies determine when a hash table changes size.
3408             Following is a description of two policies: load-check
3409             policies, and collision-check policies.</para>
3410
3411             <para>Load-check policies are straightforward. The user specifies
3412             two factors, α<subscript>min</subscript> and
3413             α<subscript>max</subscript>, and the hash table maintains the
3414             invariant that</para>
3415
3416             <para>α<subscript>min</subscript> ≤ (number of
3417             stored elements) / (hash-table size) ≤
3418             α<subscript>max</subscript>
3419             <!-- <remark>load factor min max</remark> -->
3420             </para>
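                 <para>
                   A load-check trigger enforcing this invariant might be
                   sketched as follows. The struct and its member names are
                   hypothetical, and the bounds are supplied by the caller.
                 </para>
                 <programlisting><![CDATA[
#include <cstddef>

// Hypothetical load-check trigger with bounds alpha_min and alpha_max.
struct load_check_trigger
{
  double alpha_min;
  double alpha_max;

  // True if n elements in a table of size m exceed the upper bound.
  bool should_grow(std::size_t n, std::size_t m) const
  { return static_cast<double>(n) > alpha_max * static_cast<double>(m); }

  // True if n elements in a table of size m fall below the lower bound.
  bool should_shrink(std::size_t n, std::size_t m) const
  { return static_cast<double>(n) < alpha_min * static_cast<double>(m); }
};
]]></programlisting>
                 <para>
                   With bounds 0.125 and 0.5 and a table of size 16, the
                   trigger asks for growth once more than 8 elements are
                   stored, and for shrinking once fewer than 2 remain.
                 </para>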
3421
3422             <para>Collision-check policies work in the opposite direction of
3423             load-check policies. They focus on keeping the number of
3424             collisions moderate and hoping that the size of the table will
3425             not grow very large, instead of keeping a moderate load-factor
3426             and hoping that the number of collisions will be small. A
3427             maximal collision-check policy resizes when the longest
3428             probe-sequence grows too large.</para>
3429
3430             <para>Consider the graphic below. Let the size of the hash table
3431             be denoted by m, the length of a probe sequence be denoted by k,
3432             and some load factor be denoted by α. We would like to
3433             calculate the minimal length k such that, if there were α
3434             m elements in the hash table, a probe sequence of length k would
3435             be found with probability at most 1/m.</para>
3436
3437             <figure>
3438               <title>Balls and bins</title>
3439               <mediaobject>
3440                 <imageobject>
3441                   <imagedata align="center" format="PNG" scale="100"
3442                              fileref="../images/pbds_balls_and_bins.png"/>
3443                 </imageobject>
3444                 <textobject>
3445                   <phrase>Balls and bins</phrase>
3446                 </textobject>
3447               </mediaobject>
3448             </figure>
3449
3450             <para>Denote the probability that a probe sequence of length
3451             k appears in bin i by p<subscript>i</subscript>, the
3452             length of the probe sequence of bin i by
3453             l<subscript>i</subscript>, and assume uniform distribution. Then</para>
3454
3455
3456
3457             <equation>
3458               <title>
3459                 Probability of Probe Sequence of Length k
3460               </title>
3461               <mathphrase>
3462                 p<subscript>1</subscript> =
3463               </mathphrase>
3464             </equation>
3465
3466             <para>P(l<subscript>1</subscript> ≥ k) =</para>
3467
3468             <para>
3469               P(l<subscript>1</subscript> ≥ α ( 1 + k / α - 1 ) ) ≤ (a)
3470             </para>
3471
3472             <para>
3473               e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) /2)
3474             </para>
3475
3476             <para>where (a) follows from the Chernoff bound (<xref linkend="biblio.motwani95random"/>). To
3477             calculate the probability that some bin contains a probe
3478             sequence of length at least k, we note that the
3479             l<subscript>i</subscript> are negatively dependent
3480             (<xref linkend="biblio.dubhashi98neg"/>).
3481             Let
3482             I(.) denote the indicator function. Then</para>
3483
3484             <equation>
3485               <title>
3486                 Probability Probe Sequence in Some Bin
3487               </title>
3488               <mathphrase>
3489                 P( exists<subscript>i</subscript> l<subscript>i</subscript> ≥ k ) =
3490               </mathphrase>
3491             </equation>
3492
3493             <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript>
3494             I(l<subscript>i</subscript> ≥ k) ≥ 1 ) =</para>
3495
3496             <para>P ( ∑ <subscript>i = 1</subscript><superscript>m</superscript> I (
3497             l<subscript>i</subscript> ≥ k ) ≥ m p<subscript>1</subscript> ( 1 + 1 / (m
3498             p<subscript>1</subscript>) - 1 ) ) ≤ (a)</para>
3499
3500             <para>e ^ ( ( - m p<subscript>1</subscript> ( 1 / (m p<subscript>1</subscript>)
3501             - 1 ) <superscript>2</superscript> ) / 2 ) ,</para>
3502
3503             <para>where (a) follows from the fact that the Chernoff bound can
3504             be applied to negatively-dependent variables (<xref
3505             linkend="biblio.dubhashi98neg"/>). Inserting the first probability
3506             equation into the second one, and equating with 1/m, we
3507             obtain</para>
3508
3509
            <para>k ~ √ ( 2 α ln 2 m ln(m) ) .</para>
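            <para>As a rough numerical illustration only, the sketch below
            evaluates the right-hand side of the first bound above,
            e ^ ( - ( α ( k / α - 1 )<superscript>2</superscript> ) / 2 ),
            for a given load factor and probe length. The helper name
            <function>probe_length_bound</function> is hypothetical and is
            not part of this library.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cmath>

// Evaluates the bound on P(l_1 >= k) stated above for load factor
// alpha and probe length k (with k >= alpha).
// Illustrative helper only; not a library function.
inline double
probe_length_bound(double alpha, double k)
{
  const double d = k / alpha - 1.0;
  return std::exp(-alpha * d * d / 2.0);
}
```
]]></programlisting>
            <para>For a fixed α, the value falls off quickly as k grows,
            which is the behaviour the derivation relies on.</para>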
3512
3513           </section>
3514
3515           <section xml:id="resize_policies.impl">
3516             <info><title>Implementation</title></info>
3517
            <para>This sub-subsection describes the implementation of the
            above in this library. It first describes resize policies and
            their decomposition into trigger and size policies, then
            describes pre-defined classes, and finally discusses controlled
            access to the policies' internals.</para>
3523
3524             <section xml:id="resize_policies.impl.decomposition">
3525               <info><title>Decomposition</title></info>
3526
3527
3528               <para>Each hash-based container is parametrized by a
3529               <classname>Resize_Policy</classname> parameter; the container derives
3530               <classname>public</classname>ly from <classname>Resize_Policy</classname>. For
3531               example:</para>
3532               <programlisting>
3533                 cc_hash_table&lt;typename Key,
3534                 typename Mapped,
3535                 ...
3536                 typename Resize_Policy
3537                 ...&gt; : public Resize_Policy
3538               </programlisting>
3539
3540               <para>As a container object is modified, it continuously notifies
3541               its <classname>Resize_Policy</classname> base of internal changes
3542               (e.g., collisions encountered and elements being
3543               inserted). It queries its <classname>Resize_Policy</classname> base whether
3544               it needs to be resized, and if so, to what size.</para>
3545
3546               <para>The graphic below shows a (possible) sequence diagram
3547               of an insert operation. The user inserts an element; the hash
3548               table notifies its resize policy that a search has started
3549               (point A); in this case, a single collision is encountered -
3550               the table notifies its resize policy of this (point B); the
3551               container finally notifies its resize policy that the search
3552               has ended (point C); it then queries its resize policy whether
3553               a resize is needed, and if so, what is the new size (points D
3554               to G); following the resize, it notifies the policy that a
3555               resize has completed (point H); finally, the element is
3556               inserted, and the policy notified (point I).</para>
3557
3558               <figure>
3559                 <title>Insert resize sequence diagram</title>
3560                 <mediaobject>
3561                   <imageobject>
3562                     <imagedata align="center" format="PNG" scale="100"
3563                                fileref="../images/pbds_insert_resize_sequence_diagram1.png"/>
3564                   </imageobject>
3565                   <textobject>
3566                     <phrase>Insert resize sequence diagram</phrase>
3567                   </textobject>
3568                 </mediaobject>
3569               </figure>
3570
3571
              <para>In practice, a resize policy can usually be decomposed
              orthogonally into a size policy and a trigger policy. Consequently,
              the library contains a single class for instantiating a resize
              policy: <classname>hash_standard_resize_policy</classname>
              is parametrized by <classname>Size_Policy</classname> and
              <classname>Trigger_Policy</classname>, derives <classname>public</classname>ly from
              both, and acts as a standard delegate (<xref linkend="biblio.gof"/>)
              to these policies.</para>
3580
3581               <para>The two graphics immediately below show sequence diagrams
3582               illustrating the interaction between the standard resize policy
3583               and its trigger and size policies, respectively.</para>
3584
3585               <figure>
3586                 <title>Standard resize policy trigger sequence
3587                 diagram</title>
3588                 <mediaobject>
3589                   <imageobject>
3590                     <imagedata align="center" format="PNG" scale="100"
3591                                fileref="../images/pbds_insert_resize_sequence_diagram2.png"/>
3592                   </imageobject>
3593                   <textobject>
3594                     <phrase>Standard resize policy trigger sequence
3595                     diagram</phrase>
3596                   </textobject>
3597                 </mediaobject>
3598               </figure>
3599
3600               <figure>
3601                 <title>Standard resize policy size sequence
3602                 diagram</title>
3603                 <mediaobject>
3604                   <imageobject>
3605                     <imagedata align="center" format="PNG" scale="100"
3606                                fileref="../images/pbds_insert_resize_sequence_diagram3.png"/>
3607                   </imageobject>
3608                   <textobject>
3609                     <phrase>Standard resize policy size sequence
3610                     diagram</phrase>
3611                   </textobject>
3612                 </mediaobject>
3613               </figure>
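              <para>The decomposition and the notify/query protocol described
              above can be sketched with toy classes. Everything in this
              sketch (<classname>toy_table</classname>,
              <classname>toy_resize_policy</classname>,
              <classname>toy_load_trigger</classname>,
              <classname>toy_doubling_size</classname> and their member names)
              is hypothetical and far simpler than the library's classes; the
              toy container only counts entries.</para>
              <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical trigger policy: asks for a resize once load exceeds 1/2.
struct toy_load_trigger
{
  bool resize_needed;
  toy_load_trigger() : resize_needed(false) { }

  void notify_inserted(std::size_t num_entries, std::size_t table_size)
  { resize_needed = 2 * num_entries > table_size; }

  bool is_resize_needed() const { return resize_needed; }
  void notify_resized() { resize_needed = false; }
};

// Hypothetical size policy: doubles the table.
struct toy_doubling_size
{
  std::size_t get_larger_size(std::size_t size) const { return 2 * size; }
};

// Resize policy assembled from the two; it only forwards to its bases.
template<typename Size_Policy, typename Trigger_Policy>
struct toy_resize_policy : public Size_Policy, public Trigger_Policy
{
  std::size_t get_new_size(std::size_t size) const
  { return Size_Policy::get_larger_size(size); }
};

// Toy container: derives publicly from its resize policy, notifies it
// of changes, and queries it (once per operation) about resizing.
template<typename Resize_Policy>
class toy_table : public Resize_Policy
{
public:
  toy_table() : m_size(8), m_entries(0) { }

  void insert_one()
  {
    ++m_entries;
    this->notify_inserted(m_entries, m_size);
    if (this->is_resize_needed())
      {
        m_size = this->get_new_size(m_size);
        this->notify_resized();
      }
  }

  std::size_t actual_size() const { return m_size; }   // buckets
  std::size_t size() const { return m_entries; }       // stored values

private:
  std::size_t m_size;
  std::size_t m_entries;
};
```
]]></programlisting>
              <para>Because the trigger and the size policy know nothing
              about each other, either one can be swapped without touching
              the other or the container.</para>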
3614
3615
3616             </section>
3617
3618             <section xml:id="resize_policies.impl.predefined">
3619               <info><title>Predefined Policies</title></info>
3620               <para>The library includes the following
3621               instantiations of size and trigger policies:</para>
3622
3623               <orderedlist>
3624                 <listitem><para><classname>hash_load_check_resize_trigger</classname>
3625                 implements a load check trigger policy.</para></listitem>
3626
3627                 <listitem><para><classname>cc_hash_max_collision_check_resize_trigger</classname>
3628                 implements a collision check trigger policy.</para></listitem>
3629
3630                 <listitem><para><classname>hash_exponential_size_policy</classname>
3631                 implements an exponential-size policy (which should be used
3632                 with mask range hashing).</para></listitem>
3633
                <listitem><para><classname>hash_prime_size_policy</classname>
                implements a size policy based on a sequence of primes
                (which should be used with mod range
                hashing).</para></listitem>
3638               </orderedlist>
3639
3640               <para>The graphic below gives an overall picture of the resize-related
3641               classes. <classname>basic_hash_table</classname>
3642               is parametrized by <classname>Resize_Policy</classname>, which it subclasses
3643               publicly. This class is currently instantiated only by <classname>hash_standard_resize_policy</classname>.
3644               <classname>hash_standard_resize_policy</classname>
3645               itself is parametrized by <classname>Trigger_Policy</classname> and
3646               <classname>Size_Policy</classname>. Currently, <classname>Trigger_Policy</classname> is
3647               instantiated by <classname>hash_load_check_resize_trigger</classname>,
3648               or <classname>cc_hash_max_collision_check_resize_trigger</classname>;
3649               <classname>Size_Policy</classname> is instantiated by <classname>hash_exponential_size_policy</classname>,
3650               or <classname>hash_prime_size_policy</classname>.</para>
3651
3652             </section>
3653
3654             <section xml:id="resize_policies.impl.internals">
              <info><title>Controlling Access to Internals</title></info>
3656
3657               <para>There are cases where (controlled) access to resize
3658               policies' internals is beneficial. E.g., it is sometimes
3659               useful to query a hash-table for the table's actual size (as
3660               opposed to its <function>size()</function> - the number of values it
3661               currently holds); it is sometimes useful to set a table's
3662               initial size, externally resize it, or change load factors.</para>
3663
3664               <para>Clearly, supporting such methods both decreases the
3665               encapsulation of hash-based containers, and increases the
3666               diversity between different associative-containers' interfaces.
3667               Conversely, omitting such methods can decrease containers'
3668               flexibility.</para>
3669
3670               <para>In order to avoid, to the extent possible, the above
3671               conflict, the hash-based containers themselves do not address
3672               any of these questions; this is deferred to the resize policies,
3673               which are easier to change or replace. Thus, for example,
3674               neither <classname>cc_hash_table</classname> nor
3675               <classname>gp_hash_table</classname>
              contains methods for querying the actual size of the table; this
3677               is deferred to <classname>hash_standard_resize_policy</classname>.</para>
3678
              <para>Furthermore, the policies themselves are parametrized by
              template arguments that determine the methods they support
              (<xref linkend="biblio.alexandrescu01modern"/>
              shows techniques for doing so). <classname>hash_standard_resize_policy</classname>
3684               is parametrized by <classname>External_Size_Access</classname> that
3685               determines whether it supports methods for querying the actual
3686               size of the table or resizing it. <classname>hash_load_check_resize_trigger</classname>
3687               is parametrized by <classname>External_Load_Access</classname> that
3688               determines whether it supports methods for querying or
3689               modifying the loads. <classname>cc_hash_max_collision_check_resize_trigger</classname>
3690               is parametrized by <classname>External_Load_Access</classname> that
3691               determines whether it supports methods for querying the
3692               load.</para>
3693
3694               <para>Some operations, for example, resizing a container at
3695               run time, or changing the load factors of a load-check trigger
3696               policy, require the container itself to resize. As mentioned
3697               above, the hash-based containers themselves do not contain
3698               these types of methods, only their resize policies.
3699               Consequently, there must be some mechanism for a resize policy
3700               to manipulate the hash-based container. As the hash-based
3701               container is a subclass of the resize policy, this is done
3702               through virtual methods. Each hash-based container has a
3703               <classname>private</classname> <classname>virtual</classname> method:</para>
3704               <programlisting>
3705                 virtual void
3706                 do_resize
3707                 (size_type new_size);
3708               </programlisting>
3709
3710               <para>which resizes the container. Implementations of
3711               <classname>Resize_Policy</classname> can export public methods for resizing
3712               the container externally; these methods internally call
3713               <classname>do_resize</classname> to resize the table.</para>
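              <para>As a hedged illustration of the above, the sketch below
              instantiates <classname>cc_hash_table</classname> with a
              <classname>hash_standard_resize_policy</classname> whose
              <classname>External_Size_Access</classname> argument is
              <literal>true</literal>, so that the policy's
              <function>resize</function> and
              <function>get_actual_size</function> methods become usable
              through the container. The typedef name
              <classname>resizable_map</classname> and the helper
              <function>reserve_buckets</function> are hypothetical; the use
              of <classname>std::hash</classname> assumes a C++11 or later
              compiler.</para>
              <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>

// Collision-chaining table whose resize policy allows external size
// access (third argument of hash_standard_resize_policy is true).
typedef __gnu_pbds::cc_hash_table<
  int, char, std::hash<int>, std::equal_to<int>,
  __gnu_pbds::direct_mask_range_hashing<>,
  __gnu_pbds::hash_standard_resize_policy<
    __gnu_pbds::hash_exponential_size_policy<>,
    __gnu_pbds::hash_load_check_resize_trigger<>,
    true> >
resizable_map;

// Asks the resize policy for at least n buckets and reports the
// table's actual size afterwards. Both calls are inherited from the
// resize policy base, not defined by the container itself.
inline std::size_t
reserve_buckets(resizable_map& m, std::size_t n)
{
  m.resize(n);
  return m.get_actual_size();
}
```
]]></programlisting>
              <para>The size policy decides the exact size chosen, so the
              value returned may be larger than the value requested.</para>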
3714
3715
3716             </section>
3717
3718           </section>
3719
3720
3721         </section> <!-- resize policies -->
3722
3723         <section xml:id="container.hash.details.policy_interaction">
3724           <info><title>Policy Interactions</title></info>
3727           <para>Hash-tables are unfortunately especially susceptible to
3728           choice of policies. One of the more complicated aspects of this
3729           is that poor combinations of good policies can form a poor
3730           container. Following are some considerations.</para>
3731
3732           <section xml:id="policy_interaction.probesizetrigger">
3733             <info><title>probe/size/trigger</title></info>
3734
3735             <para>Some combinations do not work well for probing containers.
3736             For example, combining a quadratic probe policy with an
3737             exponential size policy can yield a poor container: when an
3738             element is inserted, a trigger policy might decide that there
3739             is no need to resize, as the table still contains unused
3740             entries; the probe sequence, however, might never reach any of
3741             the unused entries.</para>
3742
            <para>Unfortunately, this library cannot detect such problems at
            compilation (in general this is as hard as the halting problem).
            It therefore defines an exception class,
            <classname>insert_error</classname>, which is thrown in this
            case.</para>
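            <para>The problem can be made concrete with a small stand-alone
            sketch. It assumes a probe sequence of the textbook form
            (start + i<superscript>2</superscript>) mod table size; whether a
            given probe policy has exactly this form is an assumption, and
            <function>reachable_slots</function> is a hypothetical helper,
            not a library function.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the distinct slots visited by the probe sequence
// (start + i * i) % table_size. Since i * i modulo table_size repeats
// with period table_size, i = 0 .. table_size - 1 covers every slot
// the sequence can ever reach.
inline std::size_t
reachable_slots(std::size_t start, std::size_t table_size)
{
  std::vector<bool> seen(table_size, false);
  std::size_t count = 0;
  for (std::size_t i = 0; i < table_size; ++i)
    {
      const std::size_t pos = (start + i * i) % table_size;
      if (!seen[pos])
        {
          seen[pos] = true;
          ++count;
        }
    }
  return count;
}
```
]]></programlisting>
            <para>With a power-of-two table of 8 entries this sequence
            reaches only 3 slots, so a key can fail to find a free entry
            while the table is far from full and the trigger policy sees
            no reason to resize.</para>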
3747
3748           </section>
3749
3750           <section xml:id="policy_interaction.hashtrigger">
3751             <info><title>hash/trigger</title></info>
3752
3753             <para>Some trigger policies are especially susceptible to poor
3754             hash functions. Suppose, as an extreme case, that the hash
3755             function transforms each key to the same hash value. After some
3756             inserts, a collision detecting policy will always indicate that
3757             the container needs to grow.</para>
3758
3759             <para>The library, therefore, by design, limits each operation to
3760             one resize. For each <classname>insert</classname>, for example, it queries
3761             only once whether a resize is needed.</para>
3762
3763           </section>
3764
3765           <section xml:id="policy_interaction.eqstorehash">
3766             <info><title>equivalence functors/storing hash values/hash</title></info>
3767
3768             <para><classname>cc_hash_table</classname> and
3769             <classname>gp_hash_table</classname> are
3770             parametrized by an equivalence functor and by a
            <classname>Store_Hash</classname> parameter. If the latter parameter is
            <classname>true</classname>, then the container stores with each entry
            a hash value, and uses this value in case of collisions to
            determine whether to apply the equivalence functor. This can
            lower the cost of collisions for some types, but increase the
            cost of collisions for other types.</para>
3777
3778             <para>If a ranged-hash function or ranged probe function is
3779             directly supplied, however, then it makes no sense to store the
3780             hash value with each entry. This library's container will
3781             fail at compilation, by design, if this is attempted.</para>
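            <para>The idea behind storing hash values can be sketched as
            follows: the cached hash values are compared first, and the
            equivalence functor is invoked only when they agree. The types
            <classname>cached_entry</classname> and
            <classname>counting_equal</classname> and the function
            <function>entry_matches</function> are hypothetical and do not
            reflect the library's internal layout.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical chain entry that caches its key's hash value.
struct cached_entry
{
  std::string key;
  std::size_t hash;
};

// Equivalence functor that counts how often it is invoked.
struct counting_equal
{
  mutable std::size_t calls;
  counting_equal() : calls(0) { }

  bool operator()(const std::string& a, const std::string& b) const
  {
    ++calls;
    return a == b;
  }
};

// The cheap integer comparison runs first; the possibly expensive
// equivalence functor runs only when the cached hash values agree.
template<typename Eq_Fn>
bool
entry_matches(const cached_entry& e, const std::string& key,
              std::size_t key_hash, const Eq_Fn& eq)
{ return e.hash == key_hash && eq(e.key, key); }
```
]]></programlisting>
            <para>For keys that are expensive to compare, such as long
            strings, this saves work on most collisions; for keys that are
            cheap to compare, such as integers, the extra stored value and
            comparison are pure overhead.</para>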
3782
3783           </section>
3784
3785           <section xml:id="policy_interaction.sizeloadtrigger">
3786             <info><title>size/load-check trigger</title></info>
3787
            <para>Assume a size policy issues an increasing sequence of sizes
            a, a q, a q<superscript>2</superscript>, a q<superscript>3</superscript>, ... For
            example, an exponential size policy might issue the sequence of
            sizes 8, 16, 32, 64, ... (a = 8, q = 2).</para>
3792
3793             <para>If a load-check trigger policy is used, with loads
3794             α<subscript>min</subscript> and α<subscript>max</subscript>,
3795             respectively, then it is a good idea to have:</para>
3796
3797             <orderedlist>
3798               <listitem><para>α<subscript>max</subscript> ~ 1 / q</para></listitem>
3799
3800               <listitem><para>α<subscript>min</subscript> &lt; 1 / (2 q)</para></listitem>
3801             </orderedlist>
3802
3803             <para>This will ensure that the amortized hash cost of each
3804             modifying operation is at most approximately 3.</para>
3805
            <para>α<subscript>min</subscript> ~ α<subscript>max</subscript> is, in
            any case, a bad choice, and α<subscript>min</subscript> &gt;
            α<subscript>max</subscript> is horrendous.</para>
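            <para>The amortized figure can be checked with a small
            simulation. The sketch below is a simplified model, not the
            library's algorithm: it considers insertions only (so
            α<subscript>min</subscript> plays no role), charges one hash
            computation per insert and one per entry rehashed on a resize,
            and uses the hypothetical function name
            <function>hash_cost</function>.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Simulates n insertions into a table that starts with `size` buckets
// and grows by a factor of q whenever the load exceeds alpha_max.
// Returns the number of hash computations performed: one per insert
// plus one per entry rehashed during a resize.
inline std::size_t
hash_cost(std::size_t n, std::size_t size, std::size_t q, double alpha_max)
{
  std::size_t entries = 0;
  std::size_t cost = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      ++entries;
      ++cost;                          // hash the new key
      if (entries > alpha_max * size)
        {
          size *= q;
          cost += entries;             // rehash all stored entries
        }
    }
  return cost;
}
```
]]></programlisting>
            <para>With an initial size of 8, q = 2 and
            α<subscript>max</subscript> = 1 / q, a run of 1000 insertions
            costs 2028 hash computations in this model, about 2 per
            operation; just after a resize the ratio is close to 3.</para>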
3809
3810           </section>
3811
3812         </section>
3813
3814       </section> <!-- details -->
3815
3816     </section> <!-- hash -->
3817
3818     <!-- tree -->
3819     <section xml:id="pbds.design.container.tree">
3820       <info><title>tree</title></info>
3821
3822       <section xml:id="container.tree.interface">
3823         <info><title>Interface</title></info>
3824
3825         <para>The tree-based container has the following declaration:</para>
3826         <programlisting>
3827           template&lt;
3828           typename Key,
3829           typename Mapped,
3830           typename Cmp_Fn = std::less&lt;Key&gt;,
3831           typename Tag = rb_tree_tag,
3832           template&lt;
3833           typename Const_Node_Iterator,
3834           typename Node_Iterator,
3835           typename Cmp_Fn_,
3836           typename Allocator_&gt;
3837           class Node_Update = null_node_update,
3838           typename Allocator = std::allocator&lt;char&gt; &gt;
3839           class tree;
3840         </programlisting>
3841
3842         <para>The parameters have the following meaning:</para>
3843
3844         <orderedlist>
3845           <listitem>
3846           <para><classname>Key</classname> is the key type.</para></listitem>
3847
3848           <listitem>
3849           <para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
3850
3851           <listitem>
          <para><classname>Cmp_Fn</classname> is a key comparison functor.</para></listitem>
3853
3854           <listitem>
3855             <para><classname>Tag</classname> specifies which underlying data structure
3856           to use.</para></listitem>
3857
3858           <listitem>
3859             <para><classname>Node_Update</classname> is a policy for updating node
3860           invariants.</para></listitem>
3861
3862           <listitem>
3863             <para><classname>Allocator</classname> is an allocator
3864           type.</para></listitem>
3865         </orderedlist>
3866
3867         <para>The <classname>Tag</classname> parameter specifies which underlying
3868         data structure to use. Instantiating it by <classname>rb_tree_tag</classname>, <classname>splay_tree_tag</classname>, or
3869         <classname>ov_tree_tag</classname>,
        specifies an underlying red-black tree, splay tree, or
        ordered-vector tree, respectively; any other tag is illegal.
        Note that containers based on the former two contain more types
        and methods than the latter (e.g.,
        <classname>reverse_iterator</classname> and <classname>rbegin</classname>), and have different
        exception and invalidation guarantees.</para>
3876
3877       </section>
3878
3879       <section xml:id="container.tree.details">
3880         <info><title>Details</title></info>
3881
3882         <section xml:id="container.tree.node">
3883           <info><title>Node Invariants</title></info>
3884
3885
3886           <para>Consider the two trees in the graphic below, labels A and B. The first
3887           is a tree of floats; the second is a tree of pairs, each
3888           signifying a geometric line interval. Each element in a tree is referred to as a node of the tree. Of course, each of
3889           these trees can support the usual queries: the first can easily
3890           search for <classname>0.4</classname>; the second can easily search for
3891           <classname>std::make_pair(10, 41)</classname>.</para>
3892
          <para>Each of these trees can efficiently support other queries.
          The first can efficiently determine that the 2nd key in the
          tree is <constant>0.3</constant>; the second can efficiently determine
          whether any of its intervals overlaps
          <classname>std::make_pair(29,42)</classname> (useful in geometric
          applications or distributed file systems with leases, for
          example).  It should be noted that a <classname>std::set</classname> can
          only solve these types of problems with linear complexity.</para>
3901
3902           <para>In order to do so, each tree stores some metadata in
          each node, and maintains node invariants (see <xref linkend="biblio.clrs2001"/>). The first stores in
3904           each node the size of the sub-tree rooted at the node; the
3905           second stores at each node the maximal endpoint of the
3906           intervals at the sub-tree rooted at the node.</para>
3907
3908           <figure>
3909             <title>Tree node invariants</title>
3910             <mediaobject>
3911               <imageobject>
3912                 <imagedata align="center" format="PNG" scale="100"
3913                            fileref="../images/pbds_tree_node_invariants.png"/>
3914               </imageobject>
3915               <textobject>
3916                 <phrase>Tree node invariants</phrase>
3917               </textobject>
3918             </mediaobject>
3919           </figure>
3920
3921           <para>Supporting such trees is difficult for a number of
3922           reasons:</para>
3923
3924           <orderedlist>
3925             <listitem><para>There must be a way to specify what a node's metadata
3926             should be (if any).</para></listitem>
3927
            <listitem><para>Various operations can invalidate node
            invariants.  The graphic below shows how a right rotation,
            performed on A, results in B, with nodes x and y having
            corrupted invariants (the grayed nodes in C). It also shows
            how an insert, performed on D, results in E, with nodes x and y
            having corrupted invariants (the grayed nodes in F). It is not
            feasible to know outside the tree the effect of an operation on
            the nodes of the tree.</para></listitem>
3936
3937             <listitem><para>The search paths of standard associative containers are
3938             defined by comparisons between keys, and not through
3939             metadata.</para></listitem>
3940
3941             <listitem><para>It is not feasible to know in advance which methods trees
3942             can support. Besides the usual <classname>find</classname> method, the
3943             first tree can support a <classname>find_by_order</classname> method, while
3944             the second can support an <classname>overlaps</classname> method.</para></listitem>
3945           </orderedlist>
3946
3947           <figure>
3948             <title>Tree node invalidation</title>
3949             <mediaobject>
3950               <imageobject>
3951                 <imagedata align="center" format="PNG" scale="100"
3952                            fileref="../images/pbds_tree_node_invalidations.png"/>
3953               </imageobject>
3954               <textobject>
3955                 <phrase>Tree node invalidation</phrase>
3956               </textobject>
3957             </mediaobject>
3958           </figure>
3959
3960           <para>These problems are solved by a combination of two means:
3961           node iterators, and template-template node updater
3962           parameters.</para>
3963
3964           <section xml:id="container.tree.node.iterators">
3965             <info><title>Node Iterators</title></info>
3966
3967
3968             <para>Each tree-based container defines two additional iterator
3969             types, <classname>const_node_iterator</classname>
            and <classname>node_iterator</classname>.
            These iterators allow descending from a node to one of its
            children. Node iterators allow search paths different from those
            determined by the comparison functor. The <classname>tree</classname>
            supports the methods:</para>
3975             <programlisting>
3976               const_node_iterator
3977               node_begin() const;
3978
3979               node_iterator
3980               node_begin();
3981
3982               const_node_iterator
3983               node_end() const;
3984
3985               node_iterator
3986               node_end();
3987             </programlisting>
3988
            <para>The first pair returns node iterators corresponding to the
            root node of the tree; the latter pair returns node iterators
            corresponding to a just-after-leaf node.</para>
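            <para>As a small, hedged illustration of descending through a
            tree with node iterators, the sketch below counts the nodes of
            a tree by recursing into both children. It uses the node
            iterator members <function>get_l_child</function> and
            <function>get_r_child</function>; the typedef
            <classname>int_set</classname> and the function
            <function>count_nodes</function> are hypothetical names, and
            <classname>null_type</classname> as the mapped type assumes a
            reasonably recent release of this library.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag>
int_set;

// Counts the nodes in the sub-tree rooted at nd_it by descending to
// both children; end_it is the just-after-leaf node iterator.
template<typename Node_Iterator>
std::size_t
count_nodes(Node_Iterator nd_it, Node_Iterator end_it)
{
  if (nd_it == end_it)
    return 0;
  return 1 + count_nodes(nd_it.get_l_child(), end_it)
           + count_nodes(nd_it.get_r_child(), end_it);
}
```
]]></programlisting>
            <para>Calling <code>count_nodes(s.node_begin(), s.node_end())</code>
            on a container <varname>s</varname> visits every node without
            using the comparison functor at all.</para>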
3992           </section>
3993
3994           <section xml:id="container.tree.node.updator">
            <info><title>Node Updater</title></info>
3996
3997             <para>The tree-based containers are parametrized by a
3998             <classname>Node_Update</classname> template-template parameter. A
3999             tree-based container instantiates
4000             <classname>Node_Update</classname> to some
4001             <classname>node_update</classname> class, and publicly subclasses
4002             <classname>node_update</classname>. The graphic below shows this
4003             scheme, as well as some predefined policies (which are explained
4004             below).</para>
4005
4006             <figure>
4007               <title>A tree and its update policy</title>
4008               <mediaobject>
4009                 <imageobject>
4010                   <imagedata align="center" format="PNG" scale="100"
4011                              fileref="../images/pbds_tree_node_updator_policy_cd.png"/>
4012                 </imageobject>
4013                 <textobject>
4014                   <phrase>A tree and its update policy</phrase>
4015                 </textobject>
4016               </mediaobject>
4017             </figure>
4018
4019             <para><classname>node_update</classname> (an instantiation of
4020             <classname>Node_Update</classname>) must define <classname>metadata_type</classname> as
4021             the type of metadata it requires. For order statistics,
4022             e.g., <classname>metadata_type</classname> might be <classname>size_t</classname>.
4023             The tree defines within each node a <classname>metadata_type</classname>
4024             object.</para>
4025
4026             <para><classname>node_update</classname> must also define the following method
4027             for restoring node invariants:</para>
4028             <programlisting>
4029               void
4030               operator()(node_iterator nd_it, const_node_iterator end_nd_it)
4031             </programlisting>
4032
            <para>In this method, <varname>nd_it</varname> is a
            <classname>node_iterator</classname> corresponding to a node
            for which A) all descendants have valid invariants, and B) its own
            invariants might be violated; <varname>end_nd_it</varname> is
            a <classname>const_node_iterator</classname> corresponding to a
            just-after-leaf node. This method should correct the node
            invariants of the node pointed to by
            <varname>nd_it</varname>. For example, say node x in the
            graphic below, label A, has an invalid invariant, but its children,
            y and z, have valid invariants. After the invocation, all three
            nodes should have valid invariants, as in label B.</para>
4044
4045
4046             <figure>
4047               <title>Restoring node invariants</title>
4048               <mediaobject>
4049                 <imageobject>
4050                   <imagedata align="center" format="PNG" scale="100"
4051                              fileref="../images/pbds_restoring_node_invariants.png"/>
4052                 </imageobject>
4053                 <textobject>
4054                   <phrase>Restoring node invariants</phrase>
4055                 </textobject>
4056               </mediaobject>
4057             </figure>
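            <para>A minimal sketch of such an updater, under the
            assumptions stated here, is shown below. It stores in each node
            the size of the sub-tree rooted at that node. The class name
            <classname>subtree_size_update</classname> and the typedef
            <classname>sized_set</classname> are hypothetical; writing the
            metadata through a <literal>const_cast</literal> on
            <function>get_metadata</function> is an assumption about how an
            updater is expected to modify it.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// Hypothetical node updater: each node's metadata is the number of
// nodes in the sub-tree rooted at that node.
template<typename Node_CItr, typename Node_Itr,
         typename Cmp_Fn, typename _Alloc>
struct subtree_size_update
{
  typedef std::size_t metadata_type;

  // Restores the invariant of nd_it, assuming that the invariants of
  // its children (if any) already hold.
  void
  operator()(Node_Itr nd_it, Node_CItr end_nd_it)
  {
    Node_Itr l = nd_it.get_l_child();
    Node_Itr r = nd_it.get_r_child();
    std::size_t sz = 1;
    if (l != end_nd_it)
      sz += l.get_metadata();
    if (r != end_nd_it)
      sz += r.get_metadata();
    // get_metadata() yields a const reference; the updater casts the
    // constness away to store the corrected value.
    const_cast<metadata_type&>(nd_it.get_metadata()) = sz;
  }
};

typedef __gnu_pbds::tree<int, __gnu_pbds::null_type, std::less<int>,
                         __gnu_pbds::rb_tree_tag, subtree_size_update>
sized_set;
```
]]></programlisting>
            <para>With this updater the metadata of the root node should
            always equal the number of elements in the container, which
            gives a simple way to check that the invariant is being
            maintained.</para>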
4058
            <para>When a tree operation might invalidate some node invariant,
            it invokes this method in its <classname>node_update</classname> base to
            restore the invariant. For example, the graphic below shows
            an <function>insert</function> operation (point A); the tree performs some
            operations, and calls the update functor three times (points B,
            C, and D). (It is well known that any <function>insert</function>,
            <function>erase</function>, <function>split</function> or <function>join</function> can restore
            all node invariants by a small number of node invariant updates;
            see <xref linkend="biblio.clrs2001"/>.)</para>
4068
4069             <figure>
4070               <title>Insert update sequence</title>
4071               <mediaobject>
4072                 <imageobject>
4073                   <imagedata align="center" format="PNG" scale="100"
4074                              fileref="../images/pbds_update_seq_diagram.png"/>
4075                 </imageobject>
4076                 <textobject>
4077                   <phrase>Insert update sequence</phrase>
4078                 </textobject>
4079               </mediaobject>
4080             </figure>
4081
4082             <para>To complete the description of the scheme, three questions
4083             need to be answered:</para>
4084
4085             <orderedlist>
4086               <listitem><para>How can a tree which supports order statistics define a
4087               method such as <classname>find_by_order</classname>?</para></listitem>
4088
4089               <listitem><para>How can the node updater base access methods of the
4090               tree?</para></listitem>
4091
4092               <listitem><para>How can the following cyclic dependency be resolved?
4093               <classname>node_update</classname> is a base class of the tree, yet it
4094               uses node iterators defined in the tree (its child).</para></listitem>
4095             </orderedlist>
4096
4097             <para>The first two questions are answered by the fact that
4098             <classname>node_update</classname> (an instantiation of
4099             <classname>Node_Update</classname>) is a <emphasis>public</emphasis> base class
4100             of the tree. Consequently:</para>
4101
4102             <orderedlist>
              <listitem><para>Any public methods of
              <classname>node_update</classname> are automatically methods of
              the tree (<xref linkend="biblio.alexandrescu01modern"/>).
              Thus an order-statistics node updater,
              <classname>tree_order_statistics_node_update</classname>, defines
              the <function>find_by_order</function> method; any tree
              instantiated by this policy consequently supports this method as
              well.</para></listitem>
4111
              <listitem><para>In C++, if a base class declares a method as
              <literal>virtual</literal>, it is
              <literal>virtual</literal> in its subclasses. If
              <classname>node_update</classname> needs to access one of the
              tree's methods, say the member function
              <function>end</function>, it simply declares that method as
              pure <literal>virtual</literal>; the tree's own definition
              then overrides it.</para></listitem>
4119             </orderedlist>
4120
4121             <para>The cyclic dependency is solved through template-template
4122             parameters. <classname>Node_Update</classname> is parametrized by
4123             the tree's node iterators, its comparison functor, and its
4124             allocator type. Thus, instantiations of
4125             <classname>Node_Update</classname> have all information
4126             required.</para>
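            <para>As a minimal sketch of the first point, the following
            shows <function>find_by_order</function> and
            <function>order_of_key</function> being called on a tree even
            though they are declared by the node-update policy. It assumes
            a GCC release that provides
            <classname>__gnu_pbds::null_type</classname>; the helper names
            <function>kth_smallest</function>, <function>rank_of</function>,
            and <function>make_sample</function> are illustrative and not
            part of the library.</para>
            <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

// An ordered set of ints whose node updater maintains order statistics.
typedef __gnu_pbds::tree<
  int,
  __gnu_pbds::null_type,
  std::less<int>,
  __gnu_pbds::rb_tree_tag,
  __gnu_pbds::tree_order_statistics_node_update> ordered_set;

// Returns the k-th smallest key (0-based); k must be less than s.size().
int kth_smallest(const ordered_set& s, std::size_t k)
{
  // find_by_order is declared by the node updater, a public base of the tree.
  return *s.find_by_order(k);
}

// Returns the number of keys strictly smaller than key.
std::size_t rank_of(const ordered_set& s, int key)
{
  return s.order_of_key(key);
}

// Builds a small set for demonstration purposes.
ordered_set make_sample()
{
  ordered_set s;
  s.insert(50);
  s.insert(10);
  s.insert(30);
  return s;
}
```
]]></programlisting>
            <para>Replacing the last template argument with
            <classname>null_node_update</classname> would make both helper
            functions fail to compile, since the tree would no longer
            inherit these methods.</para>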
4127
            <para>This library assumes that constructing a metadata object and
            modifying it are exception free. Suppose that during some method,
            say <classname>insert</classname>, a metadata-related operation
            (e.g., changing the value of a metadata object) throws an exception.
            Rolling back the method would then be unusually complex.</para>
4133
4134             <para>Previously, a distinction was made between redundant
4135             policies and null policies. Node invariants show a
4136             case where null policies are required.</para>
4137
            <para>Assume a regular tree is required, one which need not
            support order statistics or interval overlap queries.
            Seemingly, in this case a redundant policy (a policy which
            doesn't affect nodes' contents) would suffice. This would lead
            to the following drawbacks:</para>
4143
4144             <orderedlist>
4145               <listitem><para>Each node would carry a useless metadata object, wasting
4146               space.</para></listitem>
4147
              <listitem><para>The tree cannot know if its
              <classname>Node_Update</classname> policy actually modifies a
              node's metadata (in general this is undecidable, since the
              halting problem reduces to it). In the graphic
              below, assume the shaded node is inserted. The tree would have
              to traverse the useless path shown to the root, applying
              redundant updates all the way.</para></listitem>
4154             </orderedlist>
4155             <figure>
4156               <title>Useless update path</title>
4157               <mediaobject>
4158                 <imageobject>
4159                   <imagedata align="center" format="PNG" scale="100"
4160                              fileref="../images/pbds_rationale_null_node_updator.png"/>
4161                 </imageobject>
4162                 <textobject>
4163                   <phrase>Useless update path</phrase>
4164                 </textobject>
4165               </mediaobject>
4166             </figure>
4167
4168
            <para>A null policy class, <classname>null_node_update</classname>,
            solves both these problems. The tree detects that node
            invariants are irrelevant, and defines its node types and
            methods accordingly.</para>
4172
4173           </section>
4174
4175         </section>
4176
4177         <section xml:id="container.tree.details.split">
4178           <info><title>Split and Join</title></info>
4179
4180           <para>Tree-based containers support split and join methods.
4181           It is possible to split a tree so that it passes
4182           all nodes with keys larger than a given key to a different
4183           tree. These methods have the following advantages over the
4184           alternative of externally inserting to the destination
4185           tree and erasing from the source tree:</para>
4186
4187           <orderedlist>
            <listitem><para>These methods are efficient: red-black trees are split
            and joined in poly-logarithmic complexity; ordered-vector
            trees are split and joined in linear complexity. The
            alternatives have super-linear complexity.</para></listitem>

            <listitem><para>Aside from orders of growth, these operations perform
            few allocations and de-allocations. For red-black trees, allocations are not performed,
            and the methods are exception-free.</para></listitem>
4196           </orderedlist>
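          <para>The following sketch illustrates the two methods on the
          default (red-black) tree. It assumes that
          <function>split</function> moves all keys larger than the given
          key into the other tree, as described above, and that
          <function>join</function> is only called when the key ranges of
          the two trees do not overlap. The helper names
          <function>split_off_larger</function> and
          <function>merge_back</function> are illustrative.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <ext/pb_ds/assoc_container.hpp>

// A set of ints using the default comparison, tag, and node updater.
typedef __gnu_pbds::tree<int, __gnu_pbds::null_type> int_set;

// Moves every key larger than pivot out of src and returns those keys.
int_set split_off_larger(int_set& src, int pivot)
{
  int_set larger;
  src.split(pivot, larger);
  return larger;
}

// Moves all keys of other into dst, leaving other empty.
// Assumes that the key ranges of dst and other do not overlap.
void merge_back(int_set& dst, int_set& other)
{
  dst.join(other);
}
```
]]></programlisting>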
4197         </section>
4198
4199       </section> <!-- details -->
4200
4201     </section> <!-- tree -->
4202
4203     <!-- trie -->
4204     <section xml:id="pbds.design.container.trie">
4205       <info><title>Trie</title></info>
4206
4207       <section xml:id="container.trie.interface">
4208         <info><title>Interface</title></info>
4209
4210         <para>The trie-based container has the following declaration:</para>
4211         <programlisting>
4212           template&lt;typename Key,
4213           typename Mapped,
          typename E_Access_Traits = typename detail::default_trie_access_traits&lt;Key&gt;::type,
4215           typename Tag = pat_trie_tag,
4216           template&lt;typename Const_Node_Iterator,
4217           typename Node_Iterator,
4218           typename E_Access_Traits_,
4219           typename Allocator_&gt;
4220           class Node_Update = null_node_update,
4221           typename Allocator = std::allocator&lt;char&gt; &gt;
4222           class trie;
4223         </programlisting>
4224
4225         <para>The parameters have the following meaning:</para>
4226
4227         <orderedlist>
4228           <listitem><para><classname>Key</classname> is the key type.</para></listitem>
4229
4230           <listitem><para><classname>Mapped</classname> is the mapped-policy.</para></listitem>
4231
          <listitem><para><classname>E_Access_Traits</classname> is described below.</para></listitem>
4233
4234           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4235           to use, and is described shortly.</para></listitem>
4236
4237           <listitem><para><classname>Node_Update</classname> is a policy for updating node
4238           invariants. This is described below.</para></listitem>
4239
4240           <listitem><para><classname>Allocator</classname> is an allocator
4241           type.</para></listitem>
4242         </orderedlist>
4243
        <para>The <classname>Tag</classname> parameter specifies which underlying
        data structure to use. Instantiating it by <classname>pat_trie_tag</classname> specifies an
        underlying PATRICIA trie (explained shortly); any other tag is
        currently illegal.</para>
4248
4249         <para>Following is a description of a (PATRICIA) trie
4250         (this implementation follows <xref linkend="biblio.okasaki98mereable"/> and
4251         <xref linkend="biblio.filliatre2000ptset"/>).
4252         </para>
4253
4254         <para>A (PATRICIA) trie is similar to a tree, but with the
4255         following differences:</para>
4256
4257         <orderedlist>
4258           <listitem><para>It explicitly views keys as a sequence of elements.
4259           E.g., a trie can view a string as a sequence of
4260           characters; a trie can view a number as a sequence of
4261           bits.</para></listitem>
4262
4263           <listitem><para>It is not (necessarily) binary. Each node has fan-out n
4264           + 1, where n is the number of distinct
4265           elements.</para></listitem>
4266
4267           <listitem><para>It stores values only at leaf nodes.</para></listitem>
4268
          <listitem><para>Internal nodes have the properties that A) each has at
          least two children, and B) each shares the same prefix with
          all of its descendants.</para></listitem>
4272         </orderedlist>
4273
4274         <para>A (PATRICIA) trie has some useful properties:</para>
4275
4276         <orderedlist>
          <listitem><para>It can be configured to use large node fan-out, giving it
          very efficient find performance (albeit at the cost of insertion
          complexity and size).</para></listitem>

          <listitem><para>It works well for common-prefix keys.</para></listitem>

          <listitem><para>It can efficiently support queries such as which
          keys match a certain prefix. This is sometimes useful in file
          systems and routers, and for "type-ahead" (predictive text) matching
          on mobile devices.</para></listitem>
4287         </orderedlist>
4288
4289
4290       </section>
4291
4292       <section xml:id="container.trie.details">
4293         <info><title>Details</title></info>
4294
4295         <section xml:id="container.trie.details.etraits">
4296           <info><title>Element Access Traits</title></info>
4297
          <para>A trie inherently views its keys as sequences of elements.
          For example, a trie can view a string as a sequence of
          characters. A trie needs to map each of n elements to a
          number in {0, ..., n - 1}. For example, a trie can map a
          character <varname>c</varname> to
          <programlisting>static_cast&lt;size_t&gt;(c)</programlisting>.</para>
4304
4305           <para>Seemingly, then, a trie can assume that its keys support
4306           (const) iterators, and that the <classname>value_type</classname> of this
4307           iterator can be cast to a <classname>size_t</classname>. There are several
4308           reasons, though, to decouple the mechanism by which the trie
4309           accesses its keys' elements from the trie:</para>
4310
4311           <orderedlist>
4312             <listitem><para>In some cases, the numerical value of an element is
4313             inappropriate. Consider a trie storing DNA strings. It is
4314             logical to use a trie with a fan-out of 5 = 1 + |{'A', 'C',
4315             'G', 'T'}|. This requires mapping 'T' to 3, though.</para></listitem>
4316
            <listitem><para>In some cases the keys' iterators are different from what
            is needed. For example, a trie can be used to search for
            common suffixes, by using strings'
            <classname>reverse_iterator</classname>. As another example, a trie mapping
            UNICODE strings would have a huge fan-out if each node
            branched on a UNICODE character; instead, one can define an
            iterator iterating over groups of 8 bits (or fewer).</para></listitem>
4324           </orderedlist>
4325
          <para>The trie is,
          consequently, parametrized by <classname>E_Access_Traits</classname>:
          traits which instruct how to access sequences' elements.
          <classname>string_trie_e_access_traits</classname>
          is a traits class for strings. Each such traits class defines some
          types, such as:</para>
          <programlisting>
            typename E_Access_Traits::const_iterator
          </programlisting>

          <para>which is a const iterator iterating over a key's elements. The
          traits class must also define methods for obtaining an iterator
          to the first and last element of a key.</para>
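          <para>To make the DNA case concrete, the following is a
          hypothetical traits class that maps the four bases to positions
          0 through 3 instead of using their character codes. The class
          name <classname>dna_access_traits</classname> and its member
          names (<function>begin</function>, <function>end</function>,
          <function>e_pos</function>, <varname>max_size</varname>) are
          assumptions chosen to mirror the description above; the sketch
          is not checked against the exact requirements the library places
          on an access-traits class.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical access traits for keys made of the characters A, C, G, T.
struct dna_access_traits
{
  typedef std::string key_type;
  typedef std::string::const_iterator const_iterator;
  typedef char e_type;

  // Number of distinct elements a node would branch on.
  enum { max_size = 4 };

  // Iterator to the first element of a key.
  static const_iterator begin(const key_type& key)
  { return key.begin(); }

  // Iterator to one past the last element of a key.
  static const_iterator end(const key_type& key)
  { return key.end(); }

  // Maps an element to a position in [0, max_size).
  static std::size_t e_pos(e_type e)
  {
    switch (e)
      {
      case 'A': return 0;
      case 'C': return 1;
      case 'G': return 2;
      default:  return 3; // 'T'; other characters are not validated here
      }
  }
};
```
]]></programlisting>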
4339
4340           <para>The graphic below shows a
4341           (PATRICIA) trie resulting from inserting the words: "I wish
4342           that I could ever see a poem lovely as a trie" (which,
4343           unfortunately, does not rhyme).</para>
4344
          <para>The leaf nodes contain values; each internal node contains
          two <classname>typename E_Access_Traits::const_iterator</classname>
          objects, indicating the maximal common prefix of all keys in
          the sub-tree. For example, the shaded internal node roots a
          sub-tree with leaves "a" and "as". The maximal common prefix is
          "a". The internal node consequently contains two const
          iterators, one pointing to <varname>'a'</varname>, and the other to
          <varname>'s'</varname>.</para>
4353
4354           <figure>
4355             <title>A PATRICIA trie</title>
4356             <mediaobject>
4357               <imageobject>
4358                 <imagedata align="center" format="PNG" scale="100"
4359                            fileref="../images/pbds_pat_trie.png"/>
4360               </imageobject>
4361               <textobject>
4362                 <phrase>A PATRICIA trie</phrase>
4363               </textobject>
4364             </mediaobject>
4365           </figure>
4366
4367         </section>
4368
4369         <section xml:id="container.trie.details.node">
4370           <info><title>Node Invariants</title></info>
4371
          <para>Trie-based containers support node invariants, as do
          tree-based containers. There are two minor
          differences, though, which, unfortunately, prevent them from
          sharing the same node-updating policies:</para>
4376
4377           <orderedlist>
4378             <listitem>
4379               <para>A trie's <classname>Node_Update</classname> template-template
4380               parameter is parametrized by <classname>E_Access_Traits</classname>, while
4381               a tree's <classname>Node_Update</classname> template-template parameter is
4382             parametrized by <classname>Cmp_Fn</classname>.</para></listitem>
4383
            <listitem><para>Tree-based containers store values in all nodes, while
            trie-based containers (at least in this implementation) store
            values in leaves.</para></listitem>
4387           </orderedlist>
4388
4389           <para>The graphic below shows the scheme, as well as some predefined
4390           policies (which are explained below).</para>
4391
4392           <figure>
4393             <title>A trie and its update policy</title>
4394             <mediaobject>
4395               <imageobject>
4396                 <imagedata align="center" format="PNG" scale="100"
4397                            fileref="../images/pbds_trie_node_updator_policy_cd.png"/>
4398               </imageobject>
4399               <textobject>
4400                 <phrase>A trie and its update policy</phrase>
4401               </textobject>
4402             </mediaobject>
4403           </figure>
4404
4405
4406           <para>This library offers the following pre-defined trie node
4407           updating policies:</para>
4408
4409           <orderedlist>
4410             <listitem>
4411               <para>
4412                 <classname>trie_order_statistics_node_update</classname>
4413                 supports order statistics.
4414               </para>
4415             </listitem>
4416
4417             <listitem><para><classname>trie_prefix_search_node_update</classname>
4418             supports searching for ranges that match a given prefix.</para></listitem>
4419
4420             <listitem><para><classname>null_node_update</classname>
4421             is the null node updater.</para></listitem>
4422           </orderedlist>
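          <para>As a sketch of the prefix-search policy in use, the
          following counts the keys that share a given prefix. It assumes
          the string access-traits class is spelled
          <classname>trie_string_access_traits</classname>, as in recent
          GCC releases, and that the policy exposes a
          <function>prefix_range</function> method returning a pair of
          iterators; <function>count_with_prefix</function> is an
          illustrative helper.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/trie_policy.hpp>

// A set of strings stored in a PATRICIA trie with prefix-search support.
typedef __gnu_pbds::trie<
  std::string,
  __gnu_pbds::null_type,
  __gnu_pbds::trie_string_access_traits<>,
  __gnu_pbds::pat_trie_tag,
  __gnu_pbds::trie_prefix_search_node_update> string_trie;

// Counts the keys in t that start with prefix.
std::size_t count_with_prefix(const string_trie& t, const std::string& prefix)
{
  typedef string_trie::const_iterator const_iterator;
  // prefix_range is provided by the node-update policy.
  const std::pair<const_iterator, const_iterator> range =
    t.prefix_range(prefix);
  std::size_t n = 0;
  for (const_iterator it = range.first; it != range.second; ++it)
    ++n;
  return n;
}
```
]]></programlisting>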
4423
4424         </section>
4425
4426         <section xml:id="container.trie.details.split">
4427           <info><title>Split and Join</title></info>
          <para>Trie-based containers support split and join methods; the
          rationale is the same as that for tree-based containers supporting
          these methods.</para>
4431         </section>
4432
4433       </section> <!-- details -->
4434
4435     </section> <!-- trie -->
4436
4437     <!-- list_update -->
4438     <section xml:id="pbds.design.container.list">
4439       <info><title>List</title></info>
4440
4441       <section xml:id="container.list.interface">
4442         <info><title>Interface</title></info>
4443
4444         <para>The list-based container has the following declaration:</para>
4445         <programlisting>
4446           template&lt;typename Key,
4447           typename Mapped,
4448           typename Eq_Fn = std::equal_to&lt;Key&gt;,
          typename Update_Policy = lu_move_to_front_policy&lt;&gt;,
4450           typename Allocator = std::allocator&lt;char&gt; &gt;
4451           class list_update;
4452         </programlisting>
4453
4454         <para>The parameters have the following meaning:</para>
4455
4456         <orderedlist>
4457           <listitem>
4458             <para>
4459               <classname>Key</classname> is the key type.
4460             </para>
4461           </listitem>
4462
4463           <listitem>
4464             <para>
4465               <classname>Mapped</classname> is the mapped-policy.
4466             </para>
4467           </listitem>
4468
4469           <listitem>
4470             <para>
4471               <classname>Eq_Fn</classname> is a key equivalence functor.
4472             </para>
4473           </listitem>
4474
4475           <listitem>
4476             <para>
4477               <classname>Update_Policy</classname> is a policy updating positions in
4478               the list based on access patterns. It is described in the
4479               following subsection.
4480             </para>
4481           </listitem>
4482
4483           <listitem>
4484             <para>
4485               <classname>Allocator</classname> is an allocator type.
4486             </para>
4487           </listitem>
4488         </orderedlist>
4489
        <para>A list-based associative container is a container that
        stores elements in a linked list. It does not keep the elements
        in any particular order related to the keys. List-based
        containers are primarily useful for creating "multimaps". In fact,
        list-based containers are designed in this library expressly for
        this purpose.</para>
4496
4497         <para>List-based containers might also be useful for some rare
4498         cases, where a key is encapsulated to the extent that only
4499         key-equivalence can be tested. Hash-based containers need to know
4500         how to transform a key into a size type, and tree-based containers
4501         need to know if some key is larger than another.  List-based
4502         associative containers, conversely, only need to know if two keys
4503         are equivalent.</para>
4504
        <para>Since a list-based associative container does not order
        elements by keys, is it possible to order the list in some
        useful manner? Remarkably, many on-line competitive
        algorithms exist for reordering lists to reflect access
        prediction (see <xref linkend="biblio.motwani95random"/> and <xref linkend="biblio.andrew04mtf"/>).
        </para>
4511
4512       </section>
4513
4514       <section xml:id="container.list.details">
4515         <info><title>Details</title></info>
4516         <para>
4517         </para>
4518         <section xml:id="container.list.details.ds">
4519           <info><title>Underlying Data Structure</title></info>
4520
4521           <para>The graphic below shows a
4522           simple list of integer keys. If we search for the integer 6, we
4523           are paying an overhead: the link with key 6 is only the fifth
4524           link; if it were the first link, it could be accessed
4525           faster.</para>
4526
4527           <figure>
4528             <title>A simple list</title>
4529             <mediaobject>
4530               <imageobject>
4531                 <imagedata align="center" format="PNG" scale="100"
4532                            fileref="../images/pbds_simple_list.png"/>
4533               </imageobject>
4534               <textobject>
4535                 <phrase>A simple list</phrase>
4536               </textobject>
4537             </mediaobject>
4538           </figure>
4539
4540           <para>List-update algorithms reorder lists as elements are
4541           accessed. They try to determine, by the access history, which
4542           keys to move to the front of the list. Some of these algorithms
4543           require adding some metadata alongside each entry.</para>
4544
4545           <para>For example, in the graphic below label A shows the counter
4546           algorithm. Each node contains both a key and a count metadata
4547           (shown in bold). When an element is accessed (e.g. 6) its count is
4548           incremented, as shown in label B. If the count reaches some
4549           predetermined value, say 10, as shown in label C, the count is set
4550           to 0 and the node is moved to the front of the list, as in label
4551           D.
4552           </para>
4553
4554           <figure>
4555             <title>The counter algorithm</title>
4556             <mediaobject>
4557               <imageobject>
4558                 <imagedata align="center" format="PNG" scale="100"
4559                            fileref="../images/pbds_list_update.png"/>
4560               </imageobject>
4561               <textobject>
4562                 <phrase>The counter algorithm</phrase>
4563               </textobject>
4564             </mediaobject>
4565           </figure>
4566
4567
4568         </section>
4569
4570         <section xml:id="container.list.details.policies">
4571           <info><title>Policies</title></info>
4572
          <para>This library allows instantiating lists with policies
          implementing any algorithm moving nodes to the front of the
          list (policies implementing algorithms interchanging nodes are
          unsupported).</para>

          <para>Associative containers based on lists are parametrized by an
          <classname>Update_Policy</classname> parameter. This parameter defines the
4580           type of metadata each node contains, how to create the
4581           metadata, and how to decide, using this metadata, whether to
4582           move a node to the front of the list. A list-based associative
4583           container object derives (publicly) from its update policy.
4584           </para>
4585
4586           <para>An instantiation of <classname>Update_Policy</classname> must define
4587           internally <classname>update_metadata</classname> as the metadata it
4588           requires. Internally, each node of the list contains, besides
4589           the usual key and data, an instance of <classname>typename
4590           Update_Policy::update_metadata</classname>.</para>
4591
4592           <para>An instantiation of <classname>Update_Policy</classname> must define
4593           internally two operators:</para>
4594           <programlisting>
4595             update_metadata
4596             operator()();
4597
4598             bool
4599             operator()(update_metadata &amp;);
4600           </programlisting>
4601
          <para>The first is called by the container object, when creating a
          new node, to create the node's metadata. The second is called
          by the container object, when a node is accessed (i.e.,
          when a find operation's key is equivalent to the key of the
          node), to determine whether to move the node to the front of
          the list.
          </para>
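          <para>The two operators can be illustrated with a self-contained
          sketch of the counter algorithm. Both
          <classname>counter_policy_sketch</classname> and
          <classname>toy_list</classname> are hypothetical names written
          for this illustration; they are not the library's
          <classname>lu_counter_policy</classname> or
          <classname>list_update</classname>, and the toy container
          stores bare integer keys only.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical policy implementing the counter algorithm with threshold Max.
template<std::size_t Max>
struct counter_policy_sketch
{
  typedef std::size_t update_metadata;

  // Creates the metadata of a newly inserted node.
  update_metadata operator()() const
  { return 0; }

  // Called on access; returns true if the node should move to the front.
  bool operator()(update_metadata& count) const
  {
    if (++count < Max)
      return false;
    count = 0;
    return true;
  }
};

// Toy list that derives publicly from its update policy.
template<typename Policy>
class toy_list : public Policy
{
  typedef std::pair<int, typename Policy::update_metadata> entry;
  std::list<entry> m_entries;

public:
  // Appends key with freshly created metadata.
  void insert(int key)
  {
    Policy& policy = *this;
    m_entries.push_back(entry(key, policy()));
  }

  // Returns true if key is present, applying the policy on a hit.
  bool find(int key)
  {
    Policy& policy = *this;
    typedef typename std::list<entry>::iterator iterator;
    for (iterator it = m_entries.begin(); it != m_entries.end(); ++it)
      if (it->first == key)
        {
          if (policy(it->second))
            m_entries.splice(m_entries.begin(), m_entries, it);
          return true;
        }
    return false;
  }

  // Key currently at the front; the list must not be empty.
  int front_key() const
  { return m_entries.front().first; }
};
```
]]></programlisting>
          <para>With a threshold of 2, the second successful
          <function>find</function> for a key moves its node to the
          front and resets its count.</para>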
4609
          <para>The library contains two predefined implementations of
          list-update policies. The first
          is <classname>lu_counter_policy</classname>, which implements the
          counter algorithm described above. The second is
          <classname>lu_move_to_front_policy</classname>,
          which unconditionally moves an accessed element to the front of
          the list. The latter type is very useful in this library,
          since there is no need to associate metadata with each element
          (see <xref linkend="biblio.andrew04mtf"/>).
          </para>
4620
4621         </section>
4622
4623         <section xml:id="container.list.details.mapped">
4624           <info><title>Use in Multimaps</title></info>
4625
4626           <para>In this library, there are no equivalents for the standard's
4627           multimaps and multisets; instead one uses an associative
4628           container mapping primary keys to secondary keys.</para>
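          <para>A minimal sketch of this arrangement, assuming a tree of
          primary keys whose mapped type is a list-based set of secondary
          keys, might look as follows. The type names and the helpers
          <function>add</function> and <function>secondary_count</function>
          are illustrative.</para>
          <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>

// Secondary keys are kept in a list-based set.
typedef __gnu_pbds::list_update<int, __gnu_pbds::null_type> secondary_keys;

// Primary keys map to their set of secondary keys.
typedef __gnu_pbds::tree<std::string, secondary_keys> multimap_like;

// Associates secondary with primary; duplicate pairs are ignored.
void add(multimap_like& m, const std::string& primary, int secondary)
{
  m[primary].insert(secondary);
}

// Returns the number of secondary keys stored under primary.
std::size_t secondary_count(const multimap_like& m, const std::string& primary)
{
  multimap_like::const_iterator it = m.find(primary);
  return it == m.end() ? 0 : it->second.size();
}
```
]]></programlisting>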
4629
4630           <para>List-based containers are especially useful as associative
4631           containers for secondary keys. In fact, they are implemented
4632           here expressly for this purpose.</para>
4633
4634           <para>To begin with, these containers use very little per-entry
4635           structure memory overhead, since they can be implemented as
4636           singly-linked lists. (Arrays use even lower per-entry memory
4637           overhead, but they are less flexible in moving around entries,
4638           and have weaker invalidation guarantees).</para>
4639
4640           <para>More importantly, though, list-based containers use very
4641           little per-container memory overhead. The memory overhead of an
4642           empty list-based container is practically that of a pointer.
4643           This is important for when they are used as secondary
4644           associative-containers in situations where the average ratio of
4645           secondary keys to primary keys is low (or even 1).</para>
4646
4647           <para>In order to reduce the per-container memory overhead as much
4648           as possible, they are implemented as closely as possible to
4649           singly-linked lists.</para>
4650
4651           <orderedlist>
4652             <listitem>
4653               <para>
                List-based containers do not store internally the number
                of values that they hold. This means that their <function>size</function>
                method has linear complexity (as was permitted for
                <classname>std::list</classname> before C++11).
                Note that finding the number of equivalent-key values in a
                standard multimap also has linear complexity (because it must be
                done via <function>std::distance</function> of the
                multimap's <function>equal_range</function> method), but usually with
                higher constants.
4662               </para>
4663             </listitem>
4664
4665             <listitem>
4666               <para>
4667                 Most associative-container objects each hold a policy
4668                 object (a hash-based container object holds a
4669                 hash functor). List-based containers, conversely, only have
4670                 class-wide policy objects.
4671               </para>
4672             </listitem>
4673           </orderedlist>
4674
4675
4676         </section>
4677
4678       </section> <!-- details -->
4679
4680     </section> <!-- list -->
4681
4682
4683     <!-- priority_queue -->
4684     <section xml:id="pbds.design.container.priority_queue">
4685       <info><title>Priority Queue</title></info>
4686
4687       <section xml:id="container.priority_queue.interface">
4688         <info><title>Interface</title></info>
4689
4690         <para>The priority queue container has the following
4691         declaration:
4692         </para>
4693         <programlisting>
4694           template&lt;typename  Value_Type,
4695           typename  Cmp_Fn = std::less&lt;Value_Type&gt;,
4696           typename  Tag = pairing_heap_tag,
4697           typename  Allocator = std::allocator&lt;char &gt; &gt;
4698           class priority_queue;
4699         </programlisting>
4700
4701         <para>The parameters have the following meaning:</para>
4702
4703         <orderedlist>
4704           <listitem><para><classname>Value_Type</classname> is the value type.</para></listitem>
4705
          <listitem><para><classname>Cmp_Fn</classname> is a value comparison functor.</para></listitem>
4707
4708           <listitem><para><classname>Tag</classname> specifies which underlying data structure
4709           to use.</para></listitem>
4710
4711           <listitem><para><classname>Allocator</classname> is an allocator
4712           type.</para></listitem>
4713         </orderedlist>
4714
        <para>The <classname>Tag</classname> parameter specifies which underlying
        data structure to use. Instantiating it by <classname>pairing_heap_tag</classname>,
        <classname>binary_heap_tag</classname>,
4717         <classname>binomial_heap_tag</classname>,
4718         <classname>rc_binomial_heap_tag</classname>,
4719         or <classname>thin_heap_tag</classname>,
4720         specifies, respectively,
4721         an underlying pairing heap (<xref linkend="biblio.fredman86pairing"/>),
4722         binary heap (<xref linkend="biblio.clrs2001"/>),
4723         binomial heap (<xref linkend="biblio.clrs2001"/>),
4724         a binomial heap with a redundant binary counter (<xref linkend="biblio.maverick_lowerbounds"/>),
4725         or a thin heap (<xref linkend="biblio.kt99fat_heaps"/>).
4726         </para>
4727
4728         <para>
4729           As mentioned in the tutorial,
4730           <classname>__gnu_pbds::priority_queue</classname> shares most of the
4731           same interface with <classname>std::priority_queue</classname>.
4732           E.g. if <varname>q</varname> is a priority queue of type
4733           <classname>Q</classname>, then <function>q.top()</function> will
4734           return the "largest" value in the container (according to
4735           <classname>typename
4736           Q::cmp_fn</classname>). <classname>__gnu_pbds::priority_queue</classname>
4737           has a larger (and very slightly different) interface than
4738           <classname>std::priority_queue</classname>, however, since typically
4739           <classname>push</classname> and <classname>pop</classname> are deemed
4740         insufficient for manipulating priority-queues. </para>
4741
        <para>Different settings require different priority-queue
        implementations, which are described later; the discussion of
        priority-queue traits covers ways to differentiate between the
        traits of the different implementations.</para>
4746
4747
4748       </section>
4749
4750       <section xml:id="container.priority_queue.details">
4751         <info><title>Details</title></info>
4752
4753         <section xml:id="container.priority_queue.details.iterators">
4754           <info><title>Iterators</title></info>
4755
4756           <para>There are many different underlying-data structures for
4757           implementing priority queues. Unfortunately, most such
4758           structures are oriented towards making <function>push</function> and
4759           <function>top</function> efficient, and consequently don't allow efficient
4760           access of other elements: for instance, they cannot support an efficient
4761           <function>find</function> method. In the use case where it
4762           is important to both access and "do something with" an
4763           arbitrary value, one would be out of luck. For example, many graph algorithms require
4764           modifying a value (typically increasing it in the sense of the
4765           priority queue's comparison functor).</para>
4766
4767           <para>In order to access and manipulate an arbitrary value in a
4768           priority queue, one needs to reference the internals of the
4769           priority queue from some form of an associative container -
4770           this is unavoidable. Of course, in order to maintain the
4771           encapsulation of the priority queue, this needs to be done in a
4772           way that minimizes exposure to implementation internals.</para>
4773
4774           <para>In this library the priority queue's <function>push</function>
4775           method returns an iterator, which, if valid, can be used for subsequent <function>modify</function> and
4776           <function>erase</function> operations. This both preserves the priority
4777           queue's encapsulation, and allows accessing arbitrary values (since the
4778           iterators returned from the <function>push</function> operation can be
4779           stored in some form of associative container).</para>
4780
4781           <para>Priority queues' iterators present a problem regarding their
4782           invalidation guarantees. One assumes that calling
4783           <function>operator++</function> on an iterator will associate it
4784           with the "next" value. Priority-queues are
4785           self-organizing: each operation changes what the "next" value
4786           means. Consequently, it does not make sense for <function>push</function>
4787           to return an iterator that can be incremented - this can have
4788           no possible use. Also, as in the case of hash-based containers,
4789           it is awkward to define whether a subsequent <function>push</function> operation
4790           invalidates a prior returned iterator: it invalidates it in the
4791           sense that its "next" value is not related to what it
4792           previously considered to be its "next" value. However, it might not
4793           invalidate it, in the sense that it can be
4794           de-referenced and used for <function>modify</function> and <function>erase</function>
4795           operations.</para>
4796
4797           <para>As with the unordered associative
4798           containers, this library distinguishes between
4799           point-type and range-type iterators. A priority queue's <classname>iterator</classname> can always be
4800           converted to a <classname>point_iterator</classname>, and a
4801           <classname>const_iterator</classname> can always be converted to a
4802           <classname>point_const_iterator</classname>.</para>
4803
4804           <para>The following snippet demonstrates manipulating an arbitrary
4805           value:</para>
4806           <programlisting>
4807             // A priority queue of integers.
4808             priority_queue&lt;int &gt; p;
4809
4810             // Insert some values into the priority queue.
4811             priority_queue&lt;int &gt;::point_iterator it = p.push(0);
4812
4813             p.push(1);
4814             p.push(2);
4815
4816             // Now modify a value.
4817             p.modify(it, 3);
4818
4819             assert(p.top() == 3);
4820           </programlisting>
4821
4822
4823           <para>It should be noted that an alternative design could embed an
4824           associative container in a priority queue. Could, but most
4825           probably should not. To begin with, it should be noted that one
4826           could always encapsulate a priority queue and an associative
4827           container mapping values to priority queue iterators with no
4828           performance loss. One cannot, however, "un-encapsulate" a priority
4829           queue embedding an associative container, which might lead to
4830           performance loss. Assume that one needs to associate each value
4831           with some data unrelated to priority queues. Then, using
4832           this library's design, one could use an
4833           associative container mapping each value to a pair consisting of
4834           this data and a priority queue's iterator. The embedded
4835           design would require two associative containers. Similar
4836           problems might arise in cases where a value can reside
4837           simultaneously in many priority queues.</para>
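
          <para>As a rough sketch of the cross-referencing design described
          above, the following keeps a <classname>std::map</classname> from
          names to a record holding unrelated data together with a priority
          queue <classname>point_iterator</classname>. The names
          <classname>entry</classname>, <classname>record</classname>, and
          <classname>task_table</classname> are hypothetical and are not part
          of the library; names are assumed to be unique.</para>
          <programlisting>
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;
            #include &lt;map&gt;
            #include &lt;string&gt;
            #include &lt;utility&gt;

            // Hypothetical types, for illustration only.
            typedef std::pair&lt;int, std::string&gt; entry;             // (priority, name)
            typedef __gnu_pbds::priority_queue&lt;entry&gt; queue_type;  // default policies

            struct record
            {
              std::string note;                // data unrelated to the priority queue
              queue_type::point_iterator pos;  // where the entry lives in the queue
            };

            struct task_table
            {
              queue_type queue;
              std::map&lt;std::string, record&gt; by_name;

              // Push a new entry and remember the iterator returned by push.
              void add(const std::string&amp; name, int priority, const std::string&amp; note)
              {
                record r = { note, queue.push(entry(priority, name)) };
                by_name.insert(std::make_pair(name, r));
              }

              // Change the priority of an existing entry through its stored iterator.
              bool reprioritize(const std::string&amp; name, int priority)
              {
                std::map&lt;std::string, record&gt;::iterator found = by_name.find(name);
                if (found == by_name.end())
                  return false;
                queue.modify(found-&gt;second.pos, entry(priority, name));
                return true;
              }
            };
          </programlisting>
          <para>A single associative container suffices here, since each
          mapped record carries both the unrelated data and the iterator.</para>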
4838
4839         </section>
4840
4841
4842         <section xml:id="container.priority_queue.details.d">
4843           <info><title>Underlying Data Structure</title></info>
4844
4845           <para>There are three main implementations of priority queues: the
4846           first employs a binary heap, typically one which uses a
4847           sequence; the second uses a tree (or forest of trees), which is
4848           typically less structured than an associative container's tree;
4849           the third simply uses an associative container. These are
4850           shown in the graphic below under labels A1 and A2, label B, and label C, respectively.</para>
4851
4852           <figure>
4853             <title>Underlying Priority-Queue Data-Structures.</title>
4854             <mediaobject>
4855               <imageobject>
4856                 <imagedata align="center" format="PNG" scale="100"
4857                            fileref="../images/pbds_priority_queue_different_underlying_dss.png"/>
4858               </imageobject>
4859               <textobject>
4860                 <phrase>Underlying Priority-Queue Data-Structures.</phrase>
4861               </textobject>
4862             </mediaobject>
4863           </figure>
4864
4865           <para>Roughly speaking, any value that is both pushed and popped
4866           from a priority queue must incur a logarithmic expense (in the
4867           amortized sense). Any priority queue implementation that
4868           avoided this would violate known bounds on comparison-based
4869           sorting (see <xref linkend="biblio.clrs2001"/> and <xref linkend="biblio.brodal96priority"/>).
4870           </para>
4871
4872           <para>Most implementations do
4873           not differ in the asymptotic amortized complexity of
4874           <function>push</function> and <function>pop</function> operations, but they differ in
4875           the constants involved, in the complexity of other operations
4876           (e.g., <function>modify</function>), and in the worst-case
4877           complexity of single operations. In general, the more
4878           "structured" an implementation (i.e., the more internal
4879           invariants it possesses) - the higher its amortized complexity
4880           of <function>push</function> and <function>pop</function> operations.</para>
4881
4882           <para>This library implements different algorithms using a
4883           single class: <classname>priority_queue</classname>.
4884           Instantiating the <classname>Tag</classname> template parameter "selects"
4885           the implementation:</para>
4886
4887           <orderedlist>
4888             <listitem><para>
4889               Instantiating <classname>Tag = binary_heap_tag</classname> creates
4890               a binary heap of the form represented in the graphic by labels A1 or A2. The former is internally
4891               selected by <classname>priority_queue</classname>
4892               if <classname>Value_Type</classname> is instantiated by a primitive type
4893               (e.g., an <type>int</type>); the latter is
4894               internally selected for all other types (e.g.,
4895               <classname>std::string</classname>). This implementation is relatively
4896               unstructured, and so has good <function>push</function> and <function>pop</function>
4897               performance; it is the "best-in-kind" for primitive
4898               types, e.g., <type>int</type>s. Conversely, it has
4899               poor worst-case performance, and can support only linear-time
4900             <function>modify</function> and <function>erase</function> operations.</para></listitem>
4901
4902             <listitem><para>Instantiating <classname>Tag =
4903             pairing_heap_tag</classname> creates a pairing heap of the form
4904             represented by label B in the graphic above. This
4905             implementation too is relatively unstructured, and so has good
4906             <function>push</function> and <function>pop</function>
4907             performance; it is the "best-in-kind" for non-primitive types,
4908             e.g., <classname>std::string</classname>s. It also has very good
4909             worst-case <function>push</function> and
4910             <function>join</function> performance (O(1)), but has high
4911             worst-case <function>pop</function>
4912             complexity.</para></listitem>
4913
4914             <listitem><para>Instantiating <classname>Tag =
4915             binomial_heap_tag</classname> creates a binomial heap of the
4916             form represented by label B in the graphic above. This
4917             implementation is more structured than a pairing heap, and so
4918             has worse <function>push</function> and <function>pop</function>
4919             performance. Conversely, it has sub-linear worst-case bounds, e.g., for
4920             <function>pop</function>, and so it might be preferred in
4921             cases where responsiveness is important.</para></listitem>
4922
4923             <listitem><para>Instantiating <classname>Tag =
4924             rc_binomial_heap_tag</classname> creates a binomial heap of the
4925             form represented by label B in the graphic above, accompanied by a redundant
4926             counter which governs the trees. This implementation is
4927             therefore more structured than a binomial heap, and so has worse
4928             <function>push</function> and <function>pop</function>
4929             performance. Conversely, it guarantees O(1)
4930             <function>push</function> complexity, and so it might be
4931             preferred in cases where the responsiveness of a binomial heap
4932             is insufficient.</para></listitem>
4933
4934             <listitem><para>Instantiating <classname>Tag =
4935             thin_heap_tag</classname> creates a thin heap of the form
4936             represented by label B in the graphic above. This
4937             implementation too is more structured than a pairing heap, and
4938             so has worse <function>push</function> and
4939             <function>pop</function> performance. Conversely, it has better
4940             worst-case complexities than a Fibonacci heap, with identical
4941             amortized complexities, and so might be more appropriate for some graph
4942             algorithms.</para></listitem>
4943           </orderedlist>
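
          <para>The tag selection listed above can be sketched as follows.
          The helper <function>largest_of_three</function> is hypothetical
          and only illustrates that the same code can be instantiated with
          any of the five tags; it makes no claim about which tag performs
          best in a given setting.</para>
          <programlisting>
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;
            #include &lt;functional&gt;

            // Hypothetical helper: push three values into a priority queue whose
            // implementation is selected by Tag, and return the top value.
            template&lt;typename Tag&gt;
            int largest_of_three(int a, int b, int c)
            {
              typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;, Tag&gt; queue_type;

              queue_type q;
              q.push(a);
              q.push(b);
              q.push(c);
              return q.top();
            }
          </programlisting>
          <para>For instance,
          <literal>largest_of_three&lt;__gnu_pbds::binary_heap_tag&gt;(2, 7, 5)</literal>
          and
          <literal>largest_of_three&lt;__gnu_pbds::thin_heap_tag&gt;(2, 7, 5)</literal>
          use different underlying data structures through the same interface.</para>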
4944
4945           <para>Of course, one can use any order-preserving associative
4946           container as a priority queue, as represented by label C in the graphic above, possibly by creating an adapter class
4947           over the associative container (much as
4948           <classname>std::priority_queue</classname> can adapt <classname>std::vector</classname>).
4949           This has the advantage that no cross-referencing is necessary
4950           at all; the priority queue itself is an associative container.
4951           However, most associative containers are too structured to compete with
4952           priority queues in terms of <function>push</function> and <function>pop</function>
4953           performance.</para>
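
          <para>A minimal sketch of such an adapter, assuming
          <classname>std::multiset</classname> as the order-preserving
          associative container. The class
          <classname>set_priority_queue</classname> is hypothetical and is
          not provided by this library.</para>
          <programlisting>
            #include &lt;cstddef&gt;
            #include &lt;functional&gt;
            #include &lt;set&gt;

            // Hypothetical adapter: a priority queue built on std::multiset,
            // whose largest element (in the sense of Cmp) is the "top".
            template&lt;typename T, typename Cmp = std::less&lt;T&gt; &gt;
            class set_priority_queue
            {
              typedef std::multiset&lt;T, Cmp&gt; set_type;
              set_type m_set;

            public:
              typedef typename set_type::iterator iterator;

              iterator push(const T&amp; v) { return m_set.insert(v); }

              // Precondition for top and pop: the queue is not empty.
              const T&amp; top() const { return *m_set.rbegin(); }

              void pop()
              {
                iterator last = m_set.end();
                --last;
                m_set.erase(last);
              }

              // No cross-referencing is needed: the container finds values itself.
              iterator find(const T&amp; v) { return m_set.find(v); }
              iterator end() { return m_set.end(); }
              void erase(iterator it) { m_set.erase(it); }

              bool empty() const { return m_set.empty(); }
              std::size_t size() const { return m_set.size(); }
            };
          </programlisting>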
4954
4955
4956
4957         </section>
4958
4959         <section xml:id="container.priority_queue.details.traits">
4960           <info><title>Traits</title></info>
4961
4962           <para>It would be nice if all priority queues could
4963           share exactly the same behavior regardless of implementation. Sadly, this is not possible. One example is the join operation: joining
4964           two binary heaps might throw an exception (without corrupting
4965           either of the heaps on which it operates), but joining two pairing
4966           heaps is exception free.</para>
4967
4968           <para>Tags and traits are very useful for manipulating generic
4969           types. <classname>__gnu_pbds::priority_queue</classname>
4970           publicly defines <classname>container_category</classname> as one of the tags. Given any
4971           container <classname>Cntnr</classname>, the tag of the underlying
4972           data structure can be found via <classname>typename
4973           Cntnr::container_category</classname>; this is one of the possible tags shown in the graphic below.
4974           </para>
4975
4976           <figure>
4977             <title>Priority-Queue Data-Structure Tags.</title>
4978             <mediaobject>
4979               <imageobject>
4980                 <imagedata align="center" format="PNG" scale="100"
4981                  fileref="../images/pbds_priority_queue_tag_hierarchy.png"/>
4982               </imageobject>
4983               <textobject>
4984                 <phrase>Priority-Queue Data-Structure Tags.</phrase>
4985               </textobject>
4986             </mediaobject>
4987           </figure>
4988
4989
4990           <para>Additionally, a traits mechanism can be used to query a
4991           container type for its attributes. Given any container
4992           <classname>Cntnr</classname>, <programlisting>__gnu_pbds::container_traits&lt;Cntnr&gt;</programlisting>
4993           is a traits class identifying the properties of the
4994           container.</para>
4995
4996           <para>To find if a container might throw if two of its objects are
4997           joined, one can use
4998           <programlisting>
4999             container_traits&lt;Cntnr&gt;::split_join_can_throw
5000           </programlisting>
5001           </para>
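
          <para>As a small sketch of this query, the hypothetical helper
          <function>join_might_throw</function> below wraps the trait in a
          function; the two typedefs are likewise only for illustration.
          Per the discussion above, the result is expected to be true for a
          binary heap and false for a pairing heap.</para>
          <programlisting>
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;
            #include &lt;functional&gt;

            // Hypothetical helper: true if joining two Cntnr objects might throw.
            template&lt;typename Cntnr&gt;
            bool join_might_throw()
            {
              return __gnu_pbds::container_traits&lt;Cntnr&gt;::split_join_can_throw;
            }

            // Illustrative typedefs selecting two different implementations.
            typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                               __gnu_pbds::binary_heap_tag&gt; binary_queue;
            typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                               __gnu_pbds::pairing_heap_tag&gt; pairing_queue;
          </programlisting>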
5002
5003           <para>
5004             Different priority-queue implementations have different invalidation guarantees. This is
5005             especially important, since there is no way to access an arbitrary
5006             value of a priority queue except through iterators. As with
5007             associative containers, one can use
5008             <programlisting>
5009               container_traits&lt;Cntnr&gt;::invalidation_guarantee
5010             </programlisting>
5011           to get the invalidation guarantee type of a priority queue.</para>
5012
5013           <para>It is easy to see from the graphic of underlying data structures above what <classname>container_traits&lt;Cntnr&gt;::invalidation_guarantee</classname>
5014           will be for different implementations. All implementations of
5015           the type represented by label B have <classname>point_invalidation_guarantee</classname>:
5016           the container can freely internally reorganize the nodes -
5017           range-type iterators are invalidated, but point-type iterators
5018           are always valid. Implementations of the type represented by labels A1 and A2 have <classname>basic_invalidation_guarantee</classname>:
5019           the container can freely internally reallocate the array - both
5020           point-type and range-type iterators might be invalidated.</para>
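
          <para>One possible way to act on this at compile time is sketched
          below, assuming a C++11 compiler for
          <classname>std::is_base_of</classname>. The trait
          <classname>keeps_point_iterators</classname> is hypothetical; it
          relies on the invalidation-guarantee tags forming an inheritance
          chain, so that any guarantee at least as strong as
          <classname>point_invalidation_guarantee</classname> derives from
          (or is) that tag.</para>
          <programlisting>
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;
            #include &lt;functional&gt;
            #include &lt;type_traits&gt;

            // Hypothetical trait: true if stored point-type iterators of Cntnr
            // stay valid while the container is modified.
            template&lt;typename Cntnr&gt;
            struct keeps_point_iterators
            : std::is_base_of&lt;
                __gnu_pbds::point_invalidation_guarantee,
                typename __gnu_pbds::container_traits&lt;Cntnr&gt;::invalidation_guarantee&gt;
            { };

            // Illustrative typedefs selecting two different implementations.
            typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                               __gnu_pbds::pairing_heap_tag&gt; node_queue;
            typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                               __gnu_pbds::binary_heap_tag&gt; array_queue;
          </programlisting>
          <para>Code that stores iterators returned by
          <function>push</function> could, for example, place
          <literal>static_assert(keeps_point_iterators&lt;node_queue&gt;::value, "...")</literal>
          next to the container typedef.</para>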
5021
5022           <para>
5023             This has major implications, and constitutes a good reason to avoid
5024             using binary heaps. A binary heap can perform <function>modify</function>
5025             or <function>erase</function> efficiently given a valid point-type
5026             iterator. However, in order to supply it with a valid point-type
5027             iterator, one needs to iterate (linearly) over all
5028             values, then supply the relevant iterator (recall that a
5029             range-type iterator can always be converted to a point-type
5030             iterator). This means that if the number of <function>modify</function> or
5031             <function>erase</function> operations is non-negligible (say
5032             super-logarithmic in the total sequence of operations) - binary
5033             heaps will perform badly.
5034           </para>
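
          <para>The linear search described above might look like the
          following sketch. The helper
          <function>modify_first_match</function> is hypothetical; it relies
          on the conversion from a range-type iterator to a point-type
          iterator when calling <function>modify</function>, and it stops
          iterating as soon as the container has been changed.</para>
          <programlisting>
            #include &lt;ext/pb_ds/priority_queue.hpp&gt;
            #include &lt;functional&gt;

            // Hypothetical helper: find the first value equal to old_value by
            // iterating over the whole queue (linear time), then modify it.
            template&lt;typename Queue&gt;
            bool modify_first_match(Queue&amp; q,
                                    const typename Queue::value_type&amp; old_value,
                                    const typename Queue::value_type&amp; new_value)
            {
              for (typename Queue::iterator it = q.begin(); it != q.end(); ++it)
                if (*it == old_value)
                  {
                    // The range-type iterator converts to a point-type iterator.
                    q.modify(it, new_value);
                    return true;
                  }
              return false;
            }

            // Illustrative typedef: a binary heap of integers.
            typedef __gnu_pbds::priority_queue&lt;int, std::less&lt;int&gt;,
                                               __gnu_pbds::binary_heap_tag&gt; int_binary_heap;
          </programlisting>
          <para>Each call costs time linear in the size of the queue before
          the modification itself, which is why frequent
          <function>modify</function> or <function>erase</function>
          operations favor the node-based implementations.</para>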
5035
5036         </section>
5037
5038       </section> <!-- details -->
5039
5040     </section> <!-- priority_queue -->
5041
5042
5043
5044   </section> <!-- container -->
5045
5046   </section> <!-- design -->
5047
5048
5049
5050   <!-- S04: Test -->
5051   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5052               href="test_policy_data_structures.xml">
5053   </xi:include>
5054
5055   <!-- S05: Reference/Acknowledgments -->
5056   <section xml:id="pbds.ack">
5057     <info><title>Acknowledgments</title></info>
5058     <?dbhtml filename="policy_data_structures_ack.html"?>
5059
5060     <para>
5061       Written by Ami Tavory and Vladimir Dreizin (IBM Haifa Research
5062       Laboratories), and Benjamin Kosnik (Red Hat).
5063     </para>
5064
5065     <para>
5066       This library was partially written at IBM's Haifa Research Labs.
5067       It is based heavily on policy-based design and uses many useful
5068       techniques from Modern C++ Design: Generic Programming and Design
5069       Patterns Applied by Andrei Alexandrescu.
5070     </para>
5071
5072     <para>
5073       Two ideas are borrowed from the SGI-STL implementation:
5074     </para>
5075
5076     <orderedlist>
5077       <listitem>
5078         <para>
5079           The prime-based resize policies use a list of primes taken from
5080           the SGI-STL implementation.
5081         </para>
5082       </listitem>
5083
5084       <listitem>
5085         <para>
5086           The red-black trees contain both a root node and a header node
5087           (containing metadata), connected in a way that forward and
5088           reverse iteration can be performed efficiently.
5089         </para>
5090       </listitem>
5091     </orderedlist>
5092
5093     <para>
5094       Some test utilities borrow ideas from
5095       <link xmlns:xlink="http://www.w3.org/1999/xlink"
5096             xlink:href="http://www.boost.org/libs/timer/">boost::timer</link>.
5097     </para>
5098
5099     <para>
5100       We would like to thank Scott Meyers for useful comments (without
5101       attributing to him any flaws in the design or implementation of the
5102       library).
5103     </para>
5104     <para>We would like to thank Matt Austern for the suggestion to
5105     include tries.</para>
5106   </section>
5107
5108   <!-- S06: Biblio -->
5109 <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" parse="xml"
5110             href="policy_data_structures_biblio.xml">
5111 </xi:include>
5112
5113 </chapter>