B-tree variations
As described below, a number of variations to the basic B-tree algorithms can be attempted in order to save time or space.
- B-tree implementations often explicitly store keys in B-tree nodes. The B-TREE-P routines, however, store only Pick item id's in the B-tree, and the actual keys are computed on the fly as the B-tree is traversed. Branches #8 discusses the trade-offs of the two approaches.
- B-TREE-P supports secondary sorts by concatenating subsidiary keys to the primary keys using nulls in the BTPKEY routine. Some B-tree implementations (such as the MUMPS environment used by many hospitals) instead let keys also act as root nodes for whole subsidiary B-trees, creating a hierarchy of B-trees. For example, the root B-tree might have patient numbers as keys. Each patient key can also serve as the root for a second level B-tree containing visit dates as keys, each visit date can be the root for a third level B-tree containing medical procedure codes, and so on.
- B-TREE-P performs a binary search inside each node. Sometimes a simple linear search or some other technique can provide better performance.
- Because the value of adjacent keys and pointers within a node are usually very similar, it's generally easy to save space by compressing nodes using algorithms that store only the first key or pointer along with the computed differences between subsequent keys and pointers in the node.
- A trick some B-tree implementors use: whenever traveling down a B-tree, split nearly full nodes right away, so splitting never has to propagate back up the tree. (This can be particularly valuable in multi-user environments, since only low-level portions of the B-tree have to be locked against simultaneous access during updates, instead of the entire tree having to be locked from the root down.)
- So-called B*-trees keep each node at least 2/3 full instead of just 1/2 by redistributing keys until 2 child nodes are full, then splitting the 2 full nodes into 3 nodes each 2/3 full. As a result, nodes are generally fuller and trees are more shallow (leading to faster searches).
- Each B-TREE-P node has a pointer back to its parent node. Some B-tree algorithms don't store a parent pointer at all, and instead use recursive stack techniques to traverse nodes. Or, as in so-called B+-trees, all keys are stored at the lowest level of the B-tree in leaves, which can then be easily linked in one "leaf list" for sequential traversal (although managing the "index keys" in the higher intermediate nodes becomes slightly more complicated.)
- In the same way that redistributing keys after an underflow can avoid having to concatenate nodes, redistributing after an overflow can avoid splitting. In other words, instead of splitting as soon as a node fills, try redistributing keys with a neighbor and only split when the neighbor is too full.
Many more variations like the above have been invented over the years by B-tree developers. For a good starting point at researching B-tree techniques, Semaphore Corporation recommends consulting The Ubiquitous B-Tree by Douglas Comer, published in Computing Surveys Vol. 11, No. 2, June 1979.
B-TREE-P copyright © 1987-1999 by Semaphore Corporation