src/backend/access/gist/README

   1 $PostgreSQL$
   2
   3 GiST Indexing
   4 =============
   5
   6 This directory contains an implementation of GiST indexing for Postgres.
   7
   8 GiST stands for Generalized Search Tree. It was introduced in the seminal paper
   9 "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
  10 Jeffrey F. Naughton, Avi Pfeffer:
  11
  12     http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
  13
  14 and implemented by J. Hellerstein and P. Aoki in an early version of
  15 PostgreSQL (more details are available from The GiST Indexing Project
  16 at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
  17 project it had a limited number of features and was in rare use.
  18
  19 The current implementation of GiST supports:
  20
  21   * Variable length keys
  22   * Composite keys (multi-key)
  23   * provides NULL-safe interface to GiST core
  24   * Concurrency
  25   * Recovery support via WAL logging
  26
  27 The support for concurrency implemented in PostgreSQL was developed based on
  28 the paper "Access Methods for Next-Generation Database Systems" by
  29 Marcel Kornaker:
  30
  31     http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz
  32
  33 The original algorithms were modified in several ways:
  34
  35 * They should be adapted to PostgreSQL conventions. For example, the SEARCH
  36   algorithm was considerably changed, because in PostgreSQL function search
  37   should return one tuple (next), not all tuples at once. Also, it should
  38   release page locks between calls.
  39 * Since we added support for variable length keys, it's not possible to
  40   guarantee enough free space for all keys on pages after splitting. User
  41   defined function picksplit doesn't have information about size of tuples
  42   (each tuple may contain several keys as in multicolumn index while picksplit
  43   could work with only one key) and pages.
  44 * We modified original INSERT algorithm for performance reason. In particular,
  45   it is now a single-pass algorithm.
  46 * Since the papers were theoretical, some details were omitted and we
  47   have to find out ourself how to solve some specific problems.
  48
  49 Because of the above reasons, we have to revised interaction of GiST
  50 core and PostgreSQL WAL system. Moreover, we encountered (and solved)
  51 a problem of uncompleted insertions when recovering after crash, which
  52 was not touched in the paper.
  53
  54 Search Algorithm
  55 ----------------
  56
  57 Function gettuple finds a tuple which satisfies the search
  58 predicate. It store their state and returns next tuple under
  59 subsequent calls. Stack contains page, its LSN and LSN of parent page
  60 and currentposition is saved between calls.
  61
  62 gettuple(search-pred)
  63         if ( firsttime )
  64                 push(stack, [root, 0, 0]) // page, LSN, parentLSN
  65                 currentposition=0
  66         end
  67         ptr = top of stack
  68         while(true)
  69                 latch( ptr->page, S-mode )
  70                 if ( ptr->page->lsn != ptr->lsn )
  71                         ptr->lsn = ptr->page->lsn
  72                         currentposition=0
  73                         if ( ptr->parentlsn < ptr->page->nsn )
  74                                 add to stack rightlink
  75                 else
  76                         currentposition++
  77                 end
  78
  79                 while(true)
  80                         currentposition = find_first_match( currentposition )
  81                         if ( currentposition is invalid )
  82                                 unlatch( ptr->page )
  83                                 pop stack
  84                                 ptr = top of stack
  85                                 if (ptr is NULL)
  86                                         return NULL
  87                                 break loop
  88                         else if ( ptr->page is leaf )
  89                                 unlatch( ptr->page )
  90                                 return tuple
  91                         else
  92                                 add to stack child page
  93                         end
  94                         currentposition++
  95                 end
  96         end
  97
  98
  99 Insert Algorithm
 100 ----------------
 101
 102 INSERT guarantees that the GiST tree remains balanced. User defined key method
 103 Penalty is used for choosing a subtree to insert; method PickSplit is used for
 104 the node splitting algorithm; method Union is used for propagating changes
 105 upward to maintain the tree properties.
 106
 107 NOTICE: We modified original INSERT algorithm for performance reason. In
 108 particularly, it is now a single-pass algorithm.
 109
 110 Function findLeaf is used to identify subtree for insertion. Page, in which
 111 insertion is proceeded, is locked as well as its parent page. Functions
 112 findParent and findPath are used to find parent pages, which could be changed
 113 because of concurrent access. Function pageSplit is reccurrent and could split
 114 page by more than 2 pages, which could be necessary if keys have different
 115 lengths or more than one key are inserted (in such situation, user defined
 116 function pickSplit cannot guarantee free space on page).
 117
 118 findLeaf(new-key)
 119         push(stack, [root, 0]) //page, LSN
 120         while(true)
 121                 ptr = top of stack
 122                 latch( ptr->page, S-mode )
 123                 ptr->lsn = ptr->page->lsn
 124                 if ( exists ptr->parent AND ptr->parent->lsn < ptr->page->nsn )
 125                         unlatch( ptr->page )
 126                         pop stack
 127                 else if ( ptr->page is not leaf )
 128                         push( stack, [get_best_child(ptr->page, new-key), 0] )
 129                         unlatch( ptr->page )
 130                 else
 131                         unlatch( ptr->page )
 132                         latch( ptr->page, X-mode )
 133                         if ( ptr->page is not leaf )
 134                                 //the only root page can become a non-leaf
 135                                 unlatch( ptr->page )
 136                         else if ( ptr->parent->lsn < ptr->page->nsn )
 137                                 unlatch( ptr->page )
 138                                 pop stack
 139                         else
 140                                 return stack
 141                         end
 142                 end
 143         end
 144
 145 findPath( stack item )
 146         push stack, [root, 0, 0] // page, LSN, parent
 147         while( stack )
 148                 ptr = top of stack
 149                 latch( ptr->page, S-mode )
 150                 if ( ptr->parent->page->lsn < ptr->page->nsn )
 151                         push stack, [ ptr->page->rightlink, 0, ptr->parent ]
 152                 end
 153                 for( each tuple on page )
 154                         if ( tuple->pagepointer == item->page )
 155                                 return stack
 156                         else
 157                                 add to stack at the end [tuple->pagepointer,0, ptr]
 158                         end
 159                 end
 160                 unlatch( ptr->page )
 161                 pop stack
 162         end
 163
 164 findParent( stack item )
 165         parent = item->parent
 166         latch( parent->page, X-mode )
 167         if ( parent->page->lsn != parent->lsn )
 168                 while(true)
 169                         search parent tuple on parent->page, if found the return
 170                         rightlink = parent->page->rightlink
 171                         unlatch( parent->page )
 172                         if ( rightlink is incorrect )
 173                                 break loop
 174                         end
 175                         parent->page = rightlink
 176                         latch( parent->page, X-mode )
 177                 end
 178                 newstack = findPath( item->parent )
 179                 replace part of stack to new one
 180                 return findParent( item )
 181         end
 182
 183 pageSplit(page, allkeys)
 184         (lkeys, rkeys) = pickSplit( allkeys )
 185         if ( page is root )
 186                 lpage = new page
 187         else
 188                 lpage = page
 189         rpage = new page
 190         if ( no space left on rpage )
 191                 newkeys = pageSplit( rpage, rkeys )
 192         else
 193                 push newkeys, union(rkeys)
 194         end
 195         if ( no space left on lpage )
 196                 push newkeys, pageSplit( lpage, lkeys )
 197         else
 198                 push newkeys, union(lkeys)
 199         end
 200         return newkeys
 201
 202
 203 placetopage(page, keysarray)
 204         if ( no space left on page )
 205                 keysarray = pageSplit(page, [ extract_keys(page), keysarray])
 206                 last page in chain gets old NSN,
 207                 original and others - new NSN equals to LSN
 208                 if ( page is root )
 209                         make new root with keysarray
 210                 end
 211         else
 212                 put keysarray on page
 213                 if ( length of keysarray > 1 )
 214                         keysarray = [ union(keysarray) ]
 215                 end
 216         end
 217
 218 insert(new-key)
 219         stack = findLeaf(new-key)
 220         keysarray = [new-key]
 221         ptr = top of stack
 222         while(true)
 223                 findParent( ptr ) //findParent latches parent page
 224                 keysarray = placetopage(ptr->page, keysarray)
 225                 unlatch( ptr->page )
 226                 pop stack;
 227                 ptr = top of stack
 228                 if (length of keysarray == 1)
 229                         newboundingkey = union(oldboundingkey, keysarray)
 230                         if (newboundingkey == oldboundingkey)
 231                                 unlatch ptr->page
 232                                 break loop
 233                         end
 234                 end
 235         end
 236
 237 Authors:
 238         Teodor Sigaev   <teodor@sigaev.ru>
 239         Oleg Bartunov   <oleg@sai.msu.su>