1 \documentclass[twocolumn
]{article
}
3 \title{The Dynamic Binary search tree
}
10 \section{Introduction
}
12 Binary trees are an important data organization in which searching for
13 an element is performed in $O(
\log{n
})$ steps. Dynamic trees were new
14 elements can be added and removed may become unbalanced and thus more
15 sophisticated binary tree implementations have been suggested to keep
16 the tree in $O(
\log{n
})$ limits.
18 Our goal is controlling the height of a tree to small values. A balanced
19 tree fulfills this goal but a tree need not necessarily be balanced to have
20 its height controlled.
22 The algorithm for an efficient implementation of dynamic tree is described,
23 along with comparisons with existing dynamic tree implementations.
25 \section{Description of the tree
}
27 The basic idea is using a simple binary search tree and setting a limit on
28 it's height. Let us use for the rest of this study the number
32 as height
29 limit. This restricts the total number of nodes to no more than $
2^
{32}$ as
30 long as duplicate elements are not allowed.
31 We should notice that due to the logarithmic nature of complexity,
32 in real-life applications a balanced tree will never reach such values for
35 Supposing we can efficiently fulfill this condition, locating an element in
36 the tree will be performed in no more than
32 steps. Typically
37 $O(
32)<O(
\log{n
})$. Actually it may result in a few more steps.
38 In practice it is an unimportant
39 difference and therefore it is as fast as any balanced tree.
41 Removing an element from the tree can be easily done without the necessity
42 of a balancing operation as long as by removing an element the
43 height of the tree is not increased. This condition can be implemented.
45 If the node to be removed has two child nodes then the procedure is to find
46 the biggest element from the smaller ones or the smallest from the bigger
47 ones and replace the node to be removed with this element. A sample code is
48 illustrated in
\ref{app:rmv
}
50 Adding a new element to the tree is the most important operation. While
51 inserting the element the number of comparisons performed is counted. The
52 simple rule, when this number reaches
32 the tree must be rebalanced, is
55 Balancing the tree is accomplished in $n$ steps. However, the frequency at
56 which the tree needs to be balanced again is a function of $
\frac{1}{2^n
}$.
57 As a result adding a new element is in the great majority of the cases done
58 in less than
32 steps; but rebalancing is more rare as the number of nodes
60 \footnote{A wosre-case behavior were a lot of elements have to be inserted
61 in an empty tree and they arrive sorted is solved in
\ref{init
}}.
63 \subsection{Balancing the tree
}
65 The binary tree can be entirely reconstructed in $n$ steps using a simple
66 algorithm. First, an array of $n$ pointers is allocated an all the elements
67 of the tree are placed in this array in sorted order. Let us remember than an
68 always-balanced tree implementation requires extra memory for the node
69 structures and therefore the extra allocation of the array is yet efficient;
70 furthermore, in modern multithreaded database systems an operation which
71 alters part of the data is given full control of the hardware and the memory
72 usage burst will in this case be safely performed.
74 The next step is to reset the root of the tree and start inserting elements
75 in the order at which they will result in a perfectly balanced tree. This is
76 done by first inserting the middle element $N/
2$ of the array, then
77 elements $N/
4$ and $
3 N /
4$, then $N/
8$, $
3 N /
8$, $
5 N /
8 \ldots$
80 By this procedure $
2^k -
1$ where $k$ the integer for which
81 $
2^k
\leq N <
2^
{k +
1}$ nodes can be inserted. For the remaining $N-(
2^k -
1)$
82 nodes, if any, various algorithms of $O(n)$ steps can be applied. One way
83 that is used in the sample code is to proceed normally as if there were
84 $
2^
{k+
1}$ nodes and exclude the elements $i$ for which the modulo of the
85 division $i *
2^k / N$ is greater or equal to $N-(
2^k -
1)$ or zero; in other
86 words excluding those numbers $a$ for which there is a number $b$ so that
88 \left\lfloor \frac{a N
}{2^
{k+
1}} \right\rfloor =
89 \left\lfloor \frac{b N
}{2^k
} \right\rfloor
93 The complexity of the above algorithm will be linear if when inserting nodes
94 in the new tree redundant comparisons are avoided.
96 A sample implementation is provided at
\ref{code:balance
}.
98 \section{Characteristics of the tree
}
100 The main feature of the this tree is that after is has been balanced, the
101 more nodes it contains, the more new elements be distributed in external
102 nodes and thus it will be much more unlikely to become ``unbalanced''.
106 Memory requirements for this tree are minimal. The standard memory needed
107 is as much as any simple binary tree and the same as a double linked
108 list. Memory usage bursts occur when
109 balancing is performed but this is generally a rare and very fast action.
110 Memory due to recursion is not trivial since tree height is limited
113 Locating elements will happen in less than
32 steps. Generally after
114 balancing has been performed locating elements is a matter of $
\log{n
}$.
116 Removing elements is also a matter of less than
32 steps.
117 However, there are no rotations performed and therefore removing an
118 element is possibly performed faster than in an always-balanced tree.
120 Adding elements in most of the cases is a matter of less than
32 very
122 Balancing is rare and becomes even more rare as the tree grows and it
123 requires $n$ simple steps.
125 In long term performance, for a growing tree the number of nodes $n$
126 divided with the number of steps spent for balancing, tends to be
130 Tests have shown that the tree will need to balance
2 times for
131 1,
000,
000 elements arriving in random order, once after
20,
000 elements
132 and once again at about
400,
000.
134 \section{Database initialization
}
137 One case were the suggested tree would prove inefficient is inserting a lot
138 of elements in sorted order in an
\emph{empty
} tree.
139 If the tree is not empty and has been balanced before then the arriving
140 elements will be distributed on the external nodes.
142 This is a case which would occur when initially loading the data from
143 storage devices. The workaround to this problem is similar to the way the
144 tree is balanced. The -sorted- elements will have to be inserted in the tree
145 in the order that they will make a balanced tree, first the middle element,
146 $N/
4$, $
3*N/
4$ and so on.
148 Once a perfectly balance tree is constructed there should be no worry for
149 this worse-case occurring unless most of the elements are removed.
151 Generally, in cases were a long sequence of elements is to be inserted
152 in the tree, if the tree has few nodes it may be preferable to sort the
153 sequence of elements and then insert the right elements first.
154 This procedure is worth the tradeoff if the arriving elements are almost
155 sorted, and in this case special sorting algorithms can be applied.
157 Another solution is to make the elements really random, by using a random
158 tree. In a
\emph{random tree
}, the comparison of elements is heads or tails.
159 After the sequence of elements has been inserted in a random tree (which
160 will never be unbalanced), they can be read in-order and sent to the dbs
163 \section{Conclusions
}
165 The tree structure with the characteristics that have been presented is
166 efficient for specific database requirements.
168 When the database is idle or the specific tree is not used, is consumes the
169 least memory compared to other balanced tree structures.
170 That is the primary reason for the design of this structure.
172 When the tree is just accessed for locating an element this is as fast as
175 Theoretically, balanced tree implementations which perform rotations after
176 a node is inserted or removed from the tree are better. In practice, and
177 more specificly in a database environment limiting the height of the tree to
178 a `small' number is more than adequate.
179 Except from memory efficiency, the suggested tree structure may possibly
180 behave faster as well; the worse case scenario will either occur at the
181 initialization of the database or else is can be easilly prevented.
184 \section{Sample code
}
186 A sample implementation in C of the presented tree is provided.
187 For the sample code the simple definition of a node with a pointer to
188 data is used. The function
{\tt compare ()
} is the application-specific
189 function which compares the data of two nodes.
192 #define MAX_HEIGHT
32
193 typedef struct tnode node;
201 \subsection{Removing a node
}
204 When removing an element which has two children,
205 both the smallest from the bigger elements and the
206 biggest from the smaller elements are located. By convention the deepest one
207 is selected to replace the node to be removed.
210 void remove (node *n)
212 node *np, *nl, *nr, *nlp, *nrp;
215 if (!(isroot = (np = root) == n))
217 if (compare (np->data, n->data) >
0)
218 if (np->less == n) break;
221 if (np->more == n) break;
224 if (!(n->less && n->more))
226 if (isroot) root = (n->less) ? n->less : n->more;
228 if (np->less == n) np->less = (n->less) ? n->less : n->more;
229 else np->more = (n->less) ? n->less : n->more;
232 for (i =
0, nlp = NULL, nl = n->less; nl->more; i++)
233 nlp = nl, nl = nl->more;
234 for (j =
0, nrp = NULL, nr = n->more; nr->less; j++)
235 nrp = nr, nr = nr->less;
238 if (isroot) root = nl;
240 if (np->less == n) np->less = nl;
249 if (isroot) root = nr;
251 if (np->less == n) np->less = nr;
263 \subsection{Balancing
}
266 The algorithm for reconstructing a balanced tree.
267 {\tt alloca
} is a stack allocation mechanism.
269 Special care should be taken that the variables
{\tt i, j, D, k
} in
270 {\tt balance()
} are
64 bit integers. If not then the algorithm will fail for
271 trees with more than about
65000 nodes.
277 void count_nodes (node *n)
280 if (n->less) count_nodes (n->less);
281 if (n->more) count_nodes (n->more);
283 void tree_to_array (node *n)
286 tree_to_array (n->less);
288 tree_to_array (n->more);
293 long int i, j, D, k, k2, k3;
295 /** Generate a sorted array **/
298 npp = snpp = (node**) alloca (sizeof (node*) * suint);
299 tree_to_array (root);
300 /** Insert the top
2^k -
1 nodes **/
301 root = npp
[suint /
2];
302 for (i =
4; i <= suint +
1; i *=
2)
303 for (j =
2; j < i; j +=
4)
306 npp
[k2
]->less = npp
[suint * (j -
1) / i
];
307 npp
[k2
]->more = npp
[suint * (j +
1) / i
];
309 /** Test whether there are nodes left **/
310 if ((k = suint +
1 - i /
2)) ==
0)
312 for (i /=
2, j =
1; j < i; j +=
2)
314 npp
[k2
]->less = npp
[k2
]->more = NULL;
317 /** Proceed normally but for specific nodes **/
318 for (j =
2; j < i; j +=
4)
321 D = (k2 = suint * (j -
1) / i) * (i /
2)
% suint;
322 if (D >= k || D ==
0) /* k2 Excluded */
323 npp
[k3
]->less = NULL;
324 else
{ /* k2 Inserted */
325 npp
[k3
]->less = npp
[k2
];
326 npp
[k2
]->less = npp
[k2
]->more = NULL;
328 D = (k2 = suint * (j +
1) / i) * (i /
2)
% suint;
329 if (D >= k || D ==
0)
330 npp
[k3
]->more = NULL;
332 npp
[k3
]->more = npp
[k2
];
333 npp
[k2
]->less = npp
[k2
]->more = NULL;
340 \subsection{Inserting a node
}
342 The procedure is similar to that of a simple binary search tree.
350 n->less = n->more = NULL;
357 if (compare (np->data, n->data) >
0)
358 if (!np->less)
{ np->less = n; break;
}
361 if (!np->more)
{ np->more = n; break;
}
367 } else if (h > MAX_HEIGHT)
{
368 // If that ever happens please send me a mail and I'll send
369 // you
10.000$ for having a tree with
4bill nodes
370 // or make MAX_HEIGHT
34 and go to
16bills
371 printf ("Tree full. Reached
2^
%i nodes\n", MAX_HEIGHT);