Introduction
Binary search trees are very important data structures
in computer science. They can be used to implement ordered sets
and ordered maps.
To make them more efficient, it is often necessary to implement
them as self-balancing trees. Self-balancing trees are trees
that balance themselves after every insertion and deletion.
In this article, we will implement an AVL tree, which is
such a self-balancing tree.
Binary Search Trees
Binary search trees are trees that are ordered by their keys. If we were to do an in-order traversal of the tree, we would get the keys in ascending order:
Due to their sorted nature, binary search trees provide three
operations:
-
Insertion:
traverse down the tree until
we find a leaf node, and insert a new node with
the provided value as its left or right child.
If we find a node with the same value along the way, the value is already in the tree and we do nothing. Binary search trees do not support duplicate values. -
Deletion:
traverse down the tree until
we find a node with the value we want to delete.
We replace the node with its left child.
If it doesn't have a left child, we replace it
with its right child.
If we don't find a node with the value to delete, we do nothing. - Search: traverse down the tree until we find the node we are looking for. Then we return true. If we don't find it, we return false.
Binary search trees are quite easy to implement, but they can be very inefficient, because they are not self-balancing.
Imagine that we implement a binary search tree with the algorithms described above, and we insert the integers between 1 and 5 in order.
We will end up with a tree like this:
That doesn't look very much like a binary tree!
In fact, we have a unary tree, which is a tree with only
one node as child. Do you know another name for this data
structure?
I do, it's a linked list.
When we try to search for the value 5, we have to traverse
down the entire tree, which completely defeats the purpose of
a binary search tree.
This is why we need self-balancing trees.
"Balanced" Trees
Before we can understand what a self-balancing tree is,
we first have to define what makes a tree balanced.
This definition depends on the type of self-balancing tree
we are dealing with.
Since we are going to implement an AVL tree in this article,
we will use the following definition:
The above tree is balanced, because there is no node
which children's heights differ by more than one.
Note that the height of a null node is 0.
Let's look at node 3 and find out if it is balanced or not.
Node 3 only has a left child, which has a height of 1.
The right child of node 3 is null, so it has a height of 0.
Since the height difference between the heights of the
children of node 3 is one, this node is balanced.
Now let's look at node 8.
Both children of node 8 have heights of 1, so node 8 is
balanced.
We can use the same logic for all other nodes and see that
they are all balanced.
The definition of a balanced AVL tree also allows for some
intuition:
AVL trees are balanced if they look "kind of complete".
All levels in the tree must be filled, except for the last two
levels, which may be partially filled with gaps in between.
If we were to add a node to the tree, we might have to
rebalance the tree.
In the above image, we can see that when we add a node with
value 2 to the tree, we have to rebalance the tree, because
the tree is not balanced anymore at node 3.
We will have to perform a rotation to fix the tree.
In particular, this rotation is called a "left-right rotation",
because the three nodes we are rotating around are in a
left-right order.
We can see that the tree is properly balanced again after
the rotation.
To make things more efficient, we store the height of each
node in the tree.
It's important to realise that we have to update the
heights of all nodes in the traversal path when we insert
the new node.
AVL Tree Rotations
We can generalise what we have discussed so far to four different scenarios, which all require their own rotation. Before we go over these four scenarios, let's first write some essential code for AVL tree nodes:
1
class AVLTreeNode<T>
2
{
3
public T value;
4
public int height;
5
public AVLTreeNode<T> left;
6
public AVLTreeNode<T> right;
7
8
/**
9
* Constructs an AVL tree leaf node with a given value.
10
*/
11
public
12
AVLTreeNode(T value)
13
{
14
this.value = value;
15
height = 1;
16
left = null;
17
right = null;
18
}
19
}
20
21
class AVLTree<T extends Comparable<T>>
22
{
23
public AVLTreeNode<T> root;
24
int size;
25
26
/**
27
* Constructs an empty AVL tree.
28
*/
29
public
30
AVLTree()
31
{
32
root = null;
33
size = 0;
34
}
35
36
public void
37
insert(T value) { ... }
38
39
public void
40
remove(T value) { ... }
41
42
public boolean
43
has(T value) { ... }
44
45
/**
46
* Returns the height of a node in an AVL tree.
47
* If the node is null, returns 0.
48
*/
49
private int
50
avlHeight(AVLTreeNode<T> node)
51
{
52
if(node == null)
53
{
54
return 0;
55
}
56
57
return node.height;
58
}
59
}
Now we will go over the four rotation scenarios.
We will create a function for each of them, which will take in
a node and perform the appropriate rotation.
In the end, the new root node of the subtree is returned.
Left-Left Rotation
In this scenario, we have a left-left imbalance.
The tree leans too much towards the left-left side.
The height of Y is at least two greater than the height of T4.
The highest child of Y is its left child: X.
Therefore we will have to perform a rotation around X, Y and Z
to fix the tree.
1
/**
2
* Rebalances the subtree rooted at node `z` for the left-left case.
3
*/
4
private AVLTreeNode<T>
5
avlRebalanceLeftLeft(AVLTreeNode<T> z)
6
{
7
AVLTreeNode<T> y = z.left;
8
AVLTreeNode<T> x = y.left;
9
AVLTreeNode<T> T1 = x.left;
10
AVLTreeNode<T> T2 = x.right;
11
AVLTreeNode<T> T3 = y.right;
12
AVLTreeNode<T> T4 = z.right;
13
14
// Reorganise the nodes.
15
16
z.left = T3;
17
y.right = z;
18
19
// Update the heights.
20
21
x.height = 1 + Math.max(avlHeight(T1), avlHeight(T2));
22
y.height = 1 + Math.max(avlHeight(T3), avlHeight(T4));
23
z.height = 1 + Math.max(x.height, y.height);
24
25
// Return the new root.
26
27
return y;
28
}
Left-Right Rotation
In this scenario, we have a left-right imbalance.
The tree leans too much towards the left-right side.
The height of Y is at least two greater than the height of T4.
The highest child of Y is its right child: X.
Therefore we will have to perform a rotation around X, Y and Z
to fix the tree.
1
/**
2
* Rebalances the subtree rooted at node `z` for the left-right case.
3
*/
4
private AVLTreeNode<T>
5
avlRebalanceLeftRight(AVLTreeNode<T> z)
6
{
7
AVLTreeNode<T> y = z.left;
8
AVLTreeNode<T> x = y.right;
9
AVLTreeNode<T> T1 = y.left;
10
AVLTreeNode<T> T2 = x.left;
11
AVLTreeNode<T> T3 = x.right;
12
AVLTreeNode<T> T4 = z.right;
13
14
// Reorganise the nodes.
15
16
y.right = T2;
17
z.left = T3;
18
x.left = y;
19
x.right = z;
20
21
// Update the heights.
22
23
y.height = 1 + Math.max(avlHeight(T1), avlHeight(T2));
24
z.height = 1 + Math.max(avlHeight(T3), avlHeight(T4));
25
x.height = 1 + Math.max(y.height, z.height);
26
27
// Return the new root.
28
29
return x;
30
}
Right-Right Rotation
In this scenario, we have a right-right imbalance.
The tree leans too much towards the right-right side.
The height of Y is at least two greater than the height of T1.
The highest child of Y is its right child: X.
Therefore we will have to perform a rotation around X, Y and Z
to fix the tree.
1
/**
2
* Rebalances the subtree rooted at node `z` for the right-right case.
3
*/
4
private AVLTreeNode<T>
5
avlRebalanceRightRight(AVLTreeNode<T> z)
6
{
7
AVLTreeNode<T> y = z.right;
8
AVLTreeNode<T> x = y.right;
9
AVLTreeNode<T> T1 = z.left;
10
AVLTreeNode<T> T2 = y.left;
11
AVLTreeNode<T> T3 = x.left;
12
AVLTreeNode<T> T4 = x.right;
13
14
// Reorganise the nodes.
15
16
z.right = T2;
17
y.left = z;
18
19
// Update the heights.
20
21
z.height = 1 + Math.max(avlHeight(T1), avlHeight(T2));
22
x.height = 1 + Math.max(avlHeight(T3), avlHeight(T4));
23
y.height = 1 + Math.max(x.height, z.height);
24
25
// Return the new root.
26
27
return y;
28
}
Right-Left Rotation
In this scenario, we have a right-left imbalance.
The tree leans too much towards the right-left side.
The height of Y is at least two greater than the height of T1.
The highest child of Y is its left child: X.
Therefore we will have to perform a rotation around X, Y and Z
to fix the tree.
1
/**
2
* Rebalances the subtree rooted at node `z` for the right-left case.
3
*/
4
private AVLTreeNode<T>
5
avlRebalanceRightLeft(AVLTreeNode<T> z)
6
{
7
AVLTreeNode<T> y = z.right;
8
AVLTreeNode<T> x = y.left;
9
AVLTreeNode<T> T1 = z.left;
10
AVLTreeNode<T> T2 = x.left;
11
AVLTreeNode<T> T3 = x.right;
12
AVLTreeNode<T> T4 = y.right;
13
14
// Reorganise the nodes.
15
16
z.right = T2;
17
y.left = T3;
18
x.left = z;
19
x.right = y;
20
21
// Update the heights.
22
23
z.height = 1 + Math.max(avlHeight(T1), avlHeight(T2));
24
y.height = 1 + Math.max(avlHeight(T3), avlHeight(T4));
25
x.height = 1 + Math.max(y.height, z.height);
26
27
// Return the new root.
28
29
return x;
30
}
AVL Tree Insertion
One of the three main operations in an AVL tree is the
insertion of a particular value.
The insertion process takes
time, where is the number of nodes in the tree.
The insertion process is implemented in the following
three steps:
-
Traverse down the tree in binary search
fashion; if we come across a node with a value
larger than the value we are trying to insert,
we go left. If we come across a smaller node,
we go right.
We have to keep in mind one edge case: if we find a node with the same value as the value we are trying to insert, we will stop the insertion process and not do anything. AVL trees don't store duplicate values. - When we are done traversing, we have found a leaf node where we will insert the new value. We create a node for the value and attach it to the correct side of the leaf node.
-
We traverse back up the tree. In each step
we recompute the height of the current node,
and check if the node is balanced.
If a node becomes unbalanced, we check which rotation we have to perform in order to fix the tree. We perform the rotation.
1
private AVLTreeNode<T>
2
avlInsert(AVLTreeNode<T> node, T value)
3
{
4
// When we reach the end of the tree
5
// insert the new value at this position.
6
7
if (node == null)
8
{
9
size++;
10
return new AVLTreeNode<T>(value);
11
}
12
13
// Traverse the tree according to how the new value compares
14
// to the left and right children of the current node.
15
16
if (value.compareTo(node.value) < 0)
17
{
18
node.left = avlInsert(node.left, value);
19
}
20
else if (value.compareTo(node.value) > 0)
21
{
22
node.right = avlInsert(node.right, value);
23
}
24
else
25
{
26
// The value already exists in the tree.
27
// We will do nothing.
28
29
return node;
30
}
31
32
// Get the height of the left and right children.
33
34
int leftHeight = avlHeight(node.left);
35
int rightHeight = avlHeight(node.right);
36
37
// After the recursive downwards step, we will traverse the
38
// tree upwards again and update the height of the current node.
39
40
node.height = 1 + Math.max(leftHeight, rightHeight);
41
42
// We will check if the current node is unbalanced.
43
// If it is, we will have to find a rotation to fix it.
44
45
int balance = leftHeight - rightHeight;
46
47
if (balance > 1)
48
{
49
// The left subtree is too tall.
50
51
if (value.compareTo(node.left.value) < 0)
52
{
53
// Left-left case.
54
55
return avlRebalanceLeftLeft(node);
56
}
57
else
58
{
59
// Left-right case.
60
61
return avlRebalanceLeftRight(node);
62
}
63
}
64
65
if (balance < -1)
66
{
67
// The right subtree is too tall.
68
69
if (value.compareTo(node.right.value) > 0)
70
{
71
// Right-right case.
72
73
return avlRebalanceRightRight(node);
74
}
75
else
76
{
77
// Right-left case.
78
79
return avlRebalanceRightLeft(node);
80
}
81
}
82
83
// This node is already balanced, so we just return it
84
// as it currently is.
85
86
return node;
87
}
Note for the attentive reader:
I have received a very good question about the part of the
above code where we decide which rotation to perform.
In the code above, we check in what subtree we inserted
the new value. We make the decision based on that information,
and not on which subtree is actually higher. This is not a
mistake, but an optimisation. It is slightly more efficient
to compare the node to the value we inserted than to compare
heights of both children.
The reason this works is because the newly inserted node
will always be in the subtree that is too large. After all,
the new node caused the imbalance in the first place.
If we get to an unbalanced node, we can now simply check in
what subtree we inserted the new value to get the correct
rotation we have to perform. If the node is in the left-right
subtree, we will perform a left-right rotation etc.
We cannot perform this optimisation if we want to
remove a node. So in the AVL removal code you will see below,
we will instead check which subtree is higher the
unoptimised way.
AVL Tree Removal
The second fundamental operation of an AVL tree is the
removal of a value from the tree.
This algorithm also takes
time.
The algorithm is as follows:
- Traverse down the tree in binary search fashion; if we come across a node with a value larger than the value we are trying to remove, we go left. If we come across a smaller node, we go right.
-
When we come across the node we want to delete,
we will delete it from the AVL tree.
There are three cases to consider:- If the node has no children, we can simply delete it.
- If the node has one child, we will can it with its child.
- If the node has two children, we will replace it with the smallest value in its right subtree.
-
We traverse back up the tree. In each step we recompute the height of the current node, and check if the node is balanced.
If a node becomes unbalanced, we check which rotation we have to perform in order to fix the tree. We perform the rotation.
1
private int
2
avlBalance(AVLTreeNode<T> node)
3
{
4
if (node == null)
5
{
6
return 0;
7
}
8
9
return avlHeight(node.left) - avlHeight(node.right);
10
}
11
12
private AVLTreeNode<T>
13
avlRemove(AVLTreeNode<T> node, T value)
14
{
15
// When we reach the end of the tree,
16
// the value we want to remove is not in the tree.
17
// We will simply return and do nothing.
18
19
if (node == null)
20
{
21
return null;
22
}
23
24
// Traverse the tree according to how the value to be removed
25
// compares to the left and right children of the current node.
26
27
if (value.compareTo(node.value) < 0)
28
{
29
node.left = avlRemove(node.left, value);
30
}
31
else if (value.compareTo(node.value) > 0)
32
{
33
node.right = avlRemove(node.right, value);
34
}
35
else
36
{
37
// We found the node to be removed.
38
// Now we have to check if this node has one or two children
39
// or if it is a leaf.
40
41
size--;
42
node = avlDeleteNode(node);
43
}
44
45
// Make sure the node exists.
46
47
if (node == null)
48
{
49
return null;
50
}
51
52
// Get the height of the left and right children.
53
54
int leftHeight = avlHeight(node.left);
55
int rightHeight = avlHeight(node.right);
56
57
// After the recursive downwards step, we will traverse the
58
// tree upwards again and update the height of the current node.
59
60
node.height = 1 + Math.max(leftHeight, rightHeight);
61
62
// We will check if the current node is unbalanced.
63
// If it is, we will have to find a rotation to fix it.
64
65
int balance = leftHeight - rightHeight;
66
67
if (balance > 1)
68
{
69
// The left subtree is too tall.
70
71
if (avlBalance(node.left) >= 0)
72
{
73
// Left-left case.
74
75
return avlRebalanceLeftLeft(node);
76
}
77
else
78
{
79
// Left-right case.
80
81
return avlRebalanceLeftRight(node);
82
}
83
}
84
85
if (balance < -1)
86
{
87
// The right subtree is too tall.
88
89
if (avlBalance(node.left) <= 0)
90
{
91
// Right-right case.
92
93
return avlRebalanceRightRight(node);
94
}
95
else
96
{
97
// Right-left case.
98
99
return avlRebalanceRightLeft(node);
100
}
101
}
102
103
// This node is already balanced, so we just return it
104
// as it currently is.
105
106
return node;
107
}
108
109
private AVLTreeNode<T>
110
avlDeleteNode(AVLTreeNode<T> node)
111
{
112
// Check how many children this node has.
113
114
if (node.left == null && node.right == null)
115
{
116
// The node has no children.
117
// We can just delete it from the tree.
118
119
return null;
120
}
121
122
if (node.left == null)
123
{
124
// The node has only a right child.
125
// We can just replace the node with its right child.
126
127
return node.right;
128
}
129
130
if (node.right == null)
131
{
132
// The node has only a left child.
133
// We can just replace the node with its left child.
134
135
return node.left;
136
}
137
138
// The node has two children.
139
// We will replace the node with its in-order successor.
140
// This is the left-most child of the right subtree.
141
142
AVLTreeNode<T> successor = node.right;
143
144
while (successor.left != null)
145
{
146
successor = successor.left;
147
}
148
149
// Copy the value from the successor node.
150
151
node.value = successor.value;
152
153
// Delete the successor node.
154
155
node.right = avlRemove(node.right, successor.value);
156
157
// Calling the above function has the unwanted side-effect
158
// of decrementing the size of the AVL tree again.
159
// We will counter this by incrementing the size back up.
160
161
size++;
162
163
// Return the "new" root of the subtree.
164
// It's not actually new, because we just replace the value
165
// with the value of the successor node.
166
167
return node;
168
}
AVL Tree Find
The final important AVL tree function is the search function. It takes a value and returns true if the value is in the tree, or false otherwise.
1
public boolean
2
avlHas(AVLTreeNode<T> node, T value)
3
{
4
// If we reach the end of the tree, the value is not in the tree.
5
6
if (node == null)
7
{
8
return false;
9
}
10
11
// Traverse the tree according to how the value to be found
12
// compares to the left and right children of the current node.
13
14
if (value.compareTo(node.value) < 0)
15
{
16
return avlHas(node.left, value);
17
}
18
19
if (value.compareTo(node.value) > 0)
20
{
21
return avlHas(node.right, value);
22
}
23
24
// We found the value.
25
26
return true;
27
}
Wrapping It Up
With the code segments above, we have now completed all the necessary functions to implement an AVL tree. We can now create implementations for the three fundamental methods of an AVL tree:
1
public void
2
insert(T value)
3
{
4
root = avlInsert(root, value);
5
}
6
7
public void
8
remove(T value)
9
{
10
root = avlRemove(root, value);
11
}
12
13
public boolean
14
has(T value)
15
{
16
return avlHas(root, value);
17
}
The full code listing can be found here.
More Ideas
Before ending this post, I would like to make you think about
how our implementation of the AVL trees can be modified
to implement a map data structure.
The implementation we made is a data structure that implements
a set with operations.
In a set, we can add distinct values, remove them, and check
for their existence.
A map builds on top of the idea of the set, and allows us to
look up values by their keys.
In a map, we can add values by their keys, remove them by their
keys, and check for their existence by their keys.
We can also update the value associated with a key.
What would we have to change in the functions we wrote, and
how can we implement the updating?
Final Thoughts
AVL trees are efficient self-balancing trees that can be used
to implement sets and maps. AVL trees support
operations for
insertion, removal, and lookup.
I hope you enjoyed this post and properly understand how AVL
trees work under the hood and how they are implemented.
Feel free to
contact me
if you have any feedback.