6 This directory contains an implementation of GiST indexing for Postgres.
8 GiST stands for Generalized Search Tree. It was introduced in the seminal paper
9 "Generalized Search Trees for Database Systems", 1995, Joseph M. Hellerstein,
10 Jeffrey F. Naughton, Avi Pfeffer:
12 http://www.sai.msu.su/~megera/postgres/gist/papers/gist.ps
14 and implemented by J. Hellerstein and P. Aoki in an early version of
15 PostgreSQL (more details are available from The GiST Indexing Project
16 at Berkeley at http://gist.cs.berkeley.edu/). As a "university"
17 project it had a limited number of features and was in rare use.
19 The current implementation of GiST supports:
21 * Variable length keys
22 * Composite keys (multi-key)
23 * provides NULL-safe interface to GiST core
25 * Recovery support via WAL logging
27 The support for concurrency implemented in PostgreSQL was developed based on
28 the paper "Access Methods for Next-Generation Database Systems" by
31 http://www.sai.msu.su/~megera/postgres/gist/papers/concurrency/access-methods-for-next-generation.pdf.gz
33 The original algorithms were modified in several ways:
35 * They should be adapted to PostgreSQL conventions. For example, the SEARCH
36 algorithm was considerably changed, because in PostgreSQL function search
37 should return one tuple (next), not all tuples at once. Also, it should
38 release page locks between calls.
39 * Since we added support for variable length keys, it's not possible to
40 guarantee enough free space for all keys on pages after splitting. User
41 defined function picksplit doesn't have information about size of tuples
42 (each tuple may contain several keys as in multicolumn index while picksplit
43 could work with only one key) and pages.
44 * We modified original INSERT algorithm for performance reason. In particular,
45 it is now a single-pass algorithm.
46 * Since the papers were theoretical, some details were omitted and we
47 have to find out ourself how to solve some specific problems.
49 Because of the above reasons, we have to revised interaction of GiST
50 core and PostgreSQL WAL system. Moreover, we encountered (and solved)
51 a problem of uncompleted insertions when recovering after crash, which
52 was not touched in the paper.
57 Function gettuple finds a tuple which satisfies the search
58 predicate. It store their state and returns next tuple under
59 subsequent calls. Stack contains page, its LSN and LSN of parent page
60 and currentposition is saved between calls.
64 push(stack, [root, 0, 0]) // page, LSN, parentLSN
69 latch( ptr->page, S-mode )
70 if ( ptr->page->lsn != ptr->lsn )
71 ptr->lsn = ptr->page->lsn
73 if ( ptr->parentlsn < ptr->page->nsn )
74 add to stack rightlink
80 currentposition = find_first_match( currentposition )
81 if ( currentposition is invalid )
88 else if ( ptr->page is leaf )
92 add to stack child page
102 INSERT guarantees that the GiST tree remains balanced. User defined key method
103 Penalty is used for choosing a subtree to insert; method PickSplit is used for
104 the node splitting algorithm; method Union is used for propagating changes
105 upward to maintain the tree properties.
107 NOTICE: We modified original INSERT algorithm for performance reason. In
108 particularly, it is now a single-pass algorithm.
110 Function findLeaf is used to identify subtree for insertion. Page, in which
111 insertion is proceeded, is locked as well as its parent page. Functions
112 findParent and findPath are used to find parent pages, which could be changed
113 because of concurrent access. Function pageSplit is reccurrent and could split
114 page by more than 2 pages, which could be necessary if keys have different
115 lengths or more than one key are inserted (in such situation, user defined
116 function pickSplit cannot guarantee free space on page).
119 push(stack, [root, 0]) //page, LSN
122 latch( ptr->page, S-mode )
123 ptr->lsn = ptr->page->lsn
124 if ( exists ptr->parent AND ptr->parent->lsn < ptr->page->nsn )
127 else if ( ptr->page is not leaf )
128 push( stack, [get_best_child(ptr->page, new-key), 0] )
132 latch( ptr->page, X-mode )
133 if ( ptr->page is not leaf )
134 //the only root page can become a non-leaf
136 else if ( ptr->parent->lsn < ptr->page->nsn )
145 findPath( stack item )
146 push stack, [root, 0, 0] // page, LSN, parent
149 latch( ptr->page, S-mode )
150 if ( ptr->parent->page->lsn < ptr->page->nsn )
151 push stack, [ ptr->page->rightlink, 0, ptr->parent ]
153 for( each tuple on page )
154 if ( tuple->pagepointer == item->page )
157 add to stack at the end [tuple->pagepointer,0, ptr]
164 findParent( stack item )
165 parent = item->parent
166 latch( parent->page, X-mode )
167 if ( parent->page->lsn != parent->lsn )
169 search parent tuple on parent->page, if found the return
170 rightlink = parent->page->rightlink
171 unlatch( parent->page )
172 if ( rightlink is incorrect )
175 parent->page = rightlink
176 latch( parent->page, X-mode )
178 newstack = findPath( item->parent )
179 replace part of stack to new one
180 return findParent( item )
183 pageSplit(page, allkeys)
184 (lkeys, rkeys) = pickSplit( allkeys )
190 if ( no space left on rpage )
191 newkeys = pageSplit( rpage, rkeys )
193 push newkeys, union(rkeys)
195 if ( no space left on lpage )
196 push newkeys, pageSplit( lpage, lkeys )
198 push newkeys, union(lkeys)
203 placetopage(page, keysarray)
204 if ( no space left on page )
205 keysarray = pageSplit(page, [ extract_keys(page), keysarray])
206 last page in chain gets old NSN,
207 original and others - new NSN equals to LSN
209 make new root with keysarray
212 put keysarray on page
213 if ( length of keysarray > 1 )
214 keysarray = [ union(keysarray) ]
219 stack = findLeaf(new-key)
220 keysarray = [new-key]
223 findParent( ptr ) //findParent latches parent page
224 keysarray = placetopage(ptr->page, keysarray)
228 if (length of keysarray == 1)
229 newboundingkey = union(oldboundingkey, keysarray)
230 if (newboundingkey == oldboundingkey)
238 Teodor Sigaev <teodor@sigaev.ru>
239 Oleg Bartunov <oleg@sai.msu.su>