This C++ Tutorial Explains the B Tree & B+ Tree Data Structures. They are Used to Store Data in Disks When the Entire Data Cannot be Stored in the Main Memory:
B-tree is a self-balanced tree as well as a specialized m-way tree that is used for disk access.
When the amount of data to be stored is very high, we cannot store the entire data in the main memory. Hence we store data in the disk. Data access from the disk takes more time when compared to the main memory access.
When the number of keys of the data stored in disks is very high, the data is usually accessed in the form of blocks. The time to access these blocks is directly proportional to the height of the tree.
What You Will Learn:
B-Tree In C++
The B-Tree is a flat tree i.e. the height of the B tree is kept to a minimum. Instead, as many keys are put in each node of the B-tree. By keeping the height of the B-tree to the minimum, the access is faster when compared to other balanced trees like AVL trees.
A typical B-tree is shown below:
Generally, the node size in B-tree is kept the same as the block size.
Listed below are some of the properties of B-Tree.
- All leaves of B-tree are at the same level.
- A B-tree of order m can have at most m-1 keys and m children.
- Every node in B-tree has at most m children.
- Root node must have at least two nodes.
- Every node except the root node and the leaf node contain m/2 children.
Next, we discuss some of the basic operations of B-tree.
Basic Operations Of B-Tree
Given below are some of the Basic operations of B-Tree.
Searching in B tree is similar to that in BST. In the above tree if we want to look for item 3, then we will proceed as follows:
- Compare 3 with root elements. Here, 3 < 6 and 3 <15. So we proceed to the left subtree.
- Compare 3 with 4 in the left subtree. As 3<4, we proceed to the left subtree of node 4.
- The next node has two keys, 2 and 3. 3 > 2 while 3=3. So we have found the key at this node. Return to the found location.
Searching in B tree depends on the height of the tree. It usually takes O (log n) amount of time to search for a given item.
The insertion of a new item in B tree is done at the leaf nodes level.
Following is the sequence of steps (algorithm) to insert a new item in the B tree.
- Traverse the B tree to find a location at the leaf nodes to insert the item.
- If the leaf node contains keys < m-1, then insert a node in increasing order.
- Else if leaf node keys = m-1, then:
- Insert a new item in an increasing number of items.
- Take the median of the nodes and split the node into two nodes.
- Push median to its parent node.
- If parent node key = m-1, then repeat the above steps for parent node as well. This is done until we find the appropriate leaf node.
- Finally, insert the element.
We have demonstrated insertion in B tree as follows.
Let us insert item 70 in the tree shown below.
The given tree is with minimum degree ‘m’ = 3. Hence, each node can accommodate, 2*m -1 = 5 keys inside it.
Now we insert item 70. As we can have 5 keys in a node, we compare element 70 with the root element 30. Since 70 > 30, we will insert item 70 in the right subtree.
In the right subtree, we have a node with keys 40, 50, 60. As each node can have 5 keys in it, we will insert the item 70 in this node itself.
After insertion, the B-Tree looks as follows.
Just like insertion, the deletion of the key is also carried out at leaf nodes level. But unlike insertion, deletion is more complicated. The key to be deleted can be either a leaf node or an internal node.
To delete a key, we follow the below sequence of steps (algorithm).
1. Locate the leaf node.
2. In case the number of keys in a node > m/2, then delete the given key from the node.
3. In case the leaf node is not having m/2 keys, then we need to complete the keys by taking keys from the right or left subtrees to maintain the B tree.
We follow these steps:
- If the left subtree contains m/2 elements then we push its largest element to the parent node and then move the intervening element to the place where the key was deleted.
- If the right subtree contains m/2 elements then we push its smallest element to the parent node and then move the intervening element to the place where the key was deleted.
4. If no node has m/2 nodes then the above steps cannot be performed, thus we create a new leaf node by joining two leaf nodes and the intervening element of the parent node.
5. If a parent is without m/2 nodes, then we repeat the above steps for the parent node in question. These steps are repeated until we get a balanced B tree.
Given below is an illustration to delete a node from B tree.
Consider the following B tree once again.
Let's assume that we want to delete the key 60 from the B tree. If we check the B tree, we can find that the key to be deleted is present in the leaf node. We delete the key 60 from this leaf node.
Now the tree will look as shown below:
We can notice that after deleting the key 60, the B tree leaf node satisfies the required properties and we need not do any more modifications to the B tree.
However, if we had to delete key 20, then only key 10 would have remained in the left leaf node. In this case, we would have to shift root 30 to the leaf node and move key 40 to the root.
Thus while deleting a key from B tree, care should be taken and the properties of the B tree should not be violated.
Traversal In B Tree
Traversal in B tree is also similar to inorder traversal in BST. We start recursively from the left then come to root and proceed towards the left subtree.
Applications Of B Trees
- B trees are used to index the data especially in large databases as access to data stored in large databases on disks is very time-consuming.
- Searching of data in larger unsorted data sets takes a lot of time but this can be improved significantly with indexing using B tree.
B+ Tree In C++
B+ tree is an extension of the B tree. The difference in B+ tree and B tree is that in B tree the keys and records can be stored as internal as well as leaf nodes whereas in B+ trees, the records are stored as leaf nodes and the keys are stored only in internal nodes.
The records are linked to each other in a linked list fashion. This arrangement makes the searches of B+ trees faster and efficient. Internal nodes of the B+ tree are called index nodes.
The B+ trees have two orders i.e. one for internal nodes and other for leaf or external nodes.
An Example of B+ Tree is shown below.
As B+ tree is an extension of B-tree, the basic operations that we discussed under the topic B-tree still holds.
While inserting as well as deleting, we should maintain the basic properties of B+ Trees intact. However, deletion operation in the B+ tree is comparatively easier as the data is stored only in the leaf nodes and it will be deleted from the leaf nodes always.
Advantages Of B+ Trees
- We can fetch records in an equal number of disk accesses.
- Compared to the B tree, the height of the B+ tree is less and remains balanced.
- We use keys for indexing.
- Data in the B+ tree can be accessed sequentially or directly as the leaf nodes are arranged in a linked list.
- Search is faster as data is stored in leaf nodes only and as a linked list.
Difference Between B-Tree And B+ Tree
|Data is stored in leaf nodes as well as internal nodes.||Data is stored only in leaf nodes.|
|Searching is a bit slower as data is stored in internal as well as leaf nodes.||Searching is faster as the data is stored only in the leaf nodes.|
|No redundant search keys are present.||Redundant search keys may be present.|
|Deletion operation is complex.||Deletion operation is easy as data can be directly deleted from the leaf nodes.|
|Leaf nodes cannot be linked together.||Leaf nodes are linked together to form a linked list.|
In this tutorial, we have discussed B-tree and B+ tree in detail. These two data structures are used when there is a very high amount of data and when the entire data cannot be stored in the main memory. Thus to store data in disks, we make use of B-tree and B+ tree.
B-tree search is slightly slower as the data is stored in internal nodes as well as leaf nodes in it. B+ tree is an extension of B-tree and the data here is stored in leaf nodes only. Due to this factor, searching in a B+ tree is faster and efficient.