Red Black Tree Data Structure – Detailed Language Agnostic Intro

This post is a detailed introduction to the red black tree data structure. For those that are interested in data structures, the red black tree is a self-balancing binary search tree.

Before reading further, let me warn you. This tutorial is not going to be a light read. As mentioned in the title, the tutorial will dive into the depth of the red-black tree. However, if you read through the entire tutorial, you will understand the red-black tree. And most likely, you will never forget it again. This is assuming that you spent a good deal of time thinking about and applying the theory in this post.

Please note that this tutorial is language​ agnostic. Other tutorials on implementing in languages such as Java, JavaScript and C++ will be made in the future. 

More...

Pre-requisite Knowledge​

The content in this tutorial is not geared towards beginners. Those who have not been programming for long will definitely struggle with the material presented. Readers should have an understanding of

1. Introduction to the Red Black Tree

As I mentioned, the red black tree is a self-balancing binary tree. The AVL tree is also a self-balancing binary tree. However, the difference between the two is the underlying assumptions. These differences primarily revolve around how it handles situations where the tree becomes unbalanced. 

Search operations are slightly faster on AVL trees compared to the red-black trees. The reason for this is because in AVL trees, balance checks are implemented on every data manipulating operation. In contrast, red-black trees do not check on every insertion.

Therefore, while search operations may be slightly slower due to AVL trees being slightly more balanced, insertion and removal operations on the red-black trees are much faster than AVL trees.

I have said this many times. I will say it again. In software engineering, especially in data structures and algorithms, assumptions are crucial. Assumptions enable us to tailor specific algorithms to tackle a group of problems in an optimal way. What happens if we use an array to store everything? We would not be able to perform complex searches or retrieve data in an efficient and systematic way.

Unlike AVL trees, red black trees do not deal with the height parameters. Instead, every node in the red black tree is given a color. Care to take a stab at what these colors are?​

The red black tree is named aptly, because each node in the tree is either red or black.

Surprise of the century right? Bet you didn't see that one coming. Nevertheless, this is an important piece of information that forms the foundation of the red black tree. In the next section, we will dive deeper into the underlying assumptions behind the red black tree.

2. Red Black Tree Specification Sheet​

2.1 Key Features


The red black tree performs better than the AVL tree when data manipulation happens frequently. In contrast, the AVL tree outperforms the red black tree when the primary use case is looking up data in the tree. Below are some of the key features of the red black tree.

2.2 Assumptions


Like all data structures, the red black tree is built on strict assumptions. As a result, it is very important that readers know what these assumptions are. Afterwards, the next step is to understand why these assumptions are needed.

​Please take time to study these assumptions, as they are very important. Without these assumptions the red-black tree will not be a red-black tree. 

  • Each node is either red or black
  • The root node is always black.
  • All leaf nodes (NULL or NIL) are black
  • If a node is red, then its children are black
  • Each red node has a black parent.
  • Every path from a given node to any of its descendant null nodes contain the same number of black nodes.
  • By default, each new node is red.

Don't bother memorizing all the assumptions. Simply reading through each assumption a couple of times should suffice. In section two, we will be exploring the assumptions and logic behind the red-black tree. To prove visual illustration of all the assumptions below in action, lets take a look at an example.

Example of a balanced red black tree

I see a Red Leaf Node. Isn't this a violation of the red black tree rule?

Good observation. However, just as with the AVL tree's height parameter, null pointers do count as "leaf nodes". This is why we have broken black lines. In the basic binary search tree, the nodes with key values of 4, 12 and 18 in the example above would count as leaf nodes. However, for the red black properties, null pointers constitute as the leaf nodes. To sum it up,

In the red black tree, NULL pointers are the leaf nodes and they are ALWAYS black.

Assumption No.6 in the List above is Confusing. Please Explain

Admittedly, when I was learning the red black tree, I had to do a double take to understand what this mean. For the sake of clarity, I will be referring to the diagram above. Lets first examine the assumption.

Every path from a given node to any of its descendant null nodes contain the same number of black nodes.

Come again? Okay, let's review the visual diagram from the previous section.

Example of a balanced red black tree

Take a look at the root node 8. Try tracing to any part of the tree. For example, try tracing it to all the null nodes and count the number of black nodes. The number of black nodes should be three for all the given paths. Therefore, assumption number 6 is NOT violated in the given example. 

2.3 Big-O Analysis


As both the red-black and AVL are self-balancing binary search trees, their performance is often compared against each other. 

Readers might be asking: Why does the worst case stick out in the table below? Generally, when analyzing the performance of an algorithm, we are concerned most often with the worst case. ​

To keep it short, it is because by identify the algorithm's performance in the worst case, we can deduce its longest running time. Therefore, we are able to guarantee that the algorithm will finish in that time or earlier. In the future, I will write a separate post on the big-o notation. ​

Operation

Space Complexity

Search

Insert

Remove

Worst Case

O (n)

O (log n)

O (log n)

O (log n)

Average

O (n)

O (log n)

O (log n)

O (log n)

source: bigocheatsheet.com

3. Reviewing Binary Search Tree Rotations

Before moving on, let us review the rotations in the Binary Search Tree. Please do take note that the rotation logic will be slightly different to that in the AVL tree. 

We also need to consider rotating an entire tree, as opposed to the basic rotation cases in the AVL tree, which only presents four cases where the tree is rigidly balanced.

If readers did not understand what I just said, I recommend reading the AVL tree article before proceeding. Attached below is a useful 3 minute video that explains rotations in depth. In case readers are wondering, no, the video is not made by me. The author of the video Michael Sambol makes great videos explaining difficult concepts in a clear and concise manner. So, I do recommend readers to check out his video series on the red-black tree.

The reason why I posted up the video is to remind readers that we are doing left/right rotations on an entire tree. In the upcoming sub-sections, we will go through and examine each of the rotation cases. ​

4. Fixing Violations in the Red Black Tree

Have you ever though of the reason why rotations are made? The simple answer is to minimize the height of the tree. Ultimately, we want to move the sub-trees with greater key values down and the ones with smaller key values up.

Another valid and the more common ​response would be to ensure that the tree is balanced and that the red black tree properties are not violated. 

There are four possible cases of property violation in the red black tree. Each of these cases require recoloring the nodes, making rotations or a combination of both. Below, I have attached an image of the four possible rotation/recoloring cases so that readers can familiarize themselves with the cases.

Just as in the AVL tree, after the recoloring/rotation, we need to check the tree recursively to see whether all the sub-trees fit in with the definition of a red-black tree.​ As with any tree data structure, it is best to perform the checks recursively, due to the nature of the tree data structure.

4.1 Case One - Node Inserted has a Red Uncle

The first case is the simplest. All we need to do is simply recolor the nodes. It is important to note that by default, newly inserted nodes are red. For example, if we inserted a new node with key value of 6 to the tree below, it would look something like this. In the example below, the inserted node 6 has a red uncle 10. How is it an uncle? The parent is 4 right? The parent of 4 is 8. Therefore, 8 is the grandparent of 6. 4 and 10 are siblings, which is why 10 is the uncle of 6. 

red black tree recolor case one
Red Black Tree Rule Violation Case One
  • A red node must have TWO black children
  • Every path from a given node to any of its descendant null nodes contains the same number of black nodes.

In the example above, node with key value 6 is a newly inserted node. Its parent, 4, is also a red node. As mentioned above, red nodes MUST have two black child nodes. However, 6 is inserted as a red node. 

Solution: Recolor the inserted node and grandparent to red. Recolor the parent and uncle node to black.

Take note also that the uncle of 6 (in this example, 10) is also a red node. Therefore, we need to recolor the root node 8 and newly inserted node 6 to black. We also need to recolor nodes 4 and 10 from red to black. The result will look something like this.

red black tree recolor case one result

Now, we can see in the result above that the red black tree is no longer violated (assuming that A is a NULL or a black node). However, we also need to make checks to ensure whether the red black rules are being violated elsewhere. Therefore, we need to do this recursively and make adjustments accordingly. 

Note - Symmetric cases: Please do bear in that for all the rotation/recoloring cases, we can have symmetric cases. Fore example, take a look at the image below. Looks pretty symmetric to the case one example at the start of this section right? The example below is exactly the same as the first case one recoloring example that we examined in this section.

Symmetric version for the recoloring case one

4.2 Case Two - Uncle of Inserted Node is Black

Case two looks something like this. The uncle of the inserted node (10) as you can see is black. We already know from the previous case that the violation occurs, because by default, each inserted node is a red node. A red node must have two black children. As in the previous example, it is evident that 4 (a red node) has another red node as its child.

Case two: the uncle is black. Requires a single rotation

In this case, we rotate 6's parent 4 to the left.​

If you have read my tutorial on AVL trees, this should be very simple. The process is exactly the same as the AVL tree example. Because I don't like to repeat myself, I will not be going in-depth into the rotation process.​ Assuming that node A is black, 4 now has two red children. 

Result of case two is case three: the uncle is black and the inserted node is the left child of the parent.

But wait a​ minute, the node 6 is red and has a left child. Isn't this violating the red-black properties? I thought red nodes must ALWAYS have two black children. 

If readers felt that way, then yes, you are correct. This is the case 3 situation, which we will be discussing very shortly. ​

5.3 Case Three - Uncle of Inserted Node is Black and is a Left Child

Okay, this might be very clear at first, but you will get me once you set this diagram. In the diagram below, the uncle 10 is black. Since 6, a red node, has a red child, this is already a violation of the red black properties.

Result of case two is case three: the uncle is black and the inserted node is the left child of the parent.

In this situation, we will make a right rotation on node 8, the root of the tree. In another words, 6 will become the new root node and its right child will become 8, the former root node. The former root node 8's right child will remain as 8. The result will look something like the following.

Case three: Make a right rotation on root node

Afterwards, we need to recolor nodes 6 and 8. Node will now become black and node 8 will be recolored to red. The result should appear as follows.

Case Three: after the rotation, recolor the root node and the appropriate child

5.4 Case Four - Uncle of Inserted Node is Red and is a left Child

The final case is very similar to case three. The major difference is that the in the previous case, the uncle was black. However, in case four, the uncle (10) is red. This current case will look something like the following.

red black tree violation fix case four

Handling this situation is very simple: all we need to do is recolor some nodes. In the example above, we would first have to recolor the root node 8 to red. Afterwards, recolor its children 6 and 8 to black. The result will look like this.

Case four is simple. All we need to do is recolor some nodes

Congratulations! You have now gone through all the possible violation fixes in the red black tree.

6 FAQ

In this section, we will be going through some commonly asked questions regarding the red black tree. From now on, I plan on adding an FAQ section to most of the tutorials. These FAQ sections will continue to grow based on the questions I receive, so please leave a question down below as a comment and I will answer it as soon as possible. Thank you!

6.1. What are some of the factors I should consider when choosing to use the red black tree?

Excellent question!​ If you are considering a balanced binary search tree, think of the key distinguishing feature of the red black tree. Compared to other self-balancing trees such as the AVL tree, insertions are going to be FAST.

Will data be continuously added onto the tree? Will the tree be simply used as a search table? If data is continuously added onto the tree, the red black tree might be the right solution. Why? Because the red black tree is not as strictly balanced as some of its siblings. ​

6.2. What are some Examples of Red Black Tree Implementations in the Java API?

The java.util.treeMap​ and java.util.TreeSet are examples that utilize the red-black tree as its underlying data structure. 

 

About the Author Jay

I am a programmer currently living in Seoul, South Korea. I created this blog as an outlet to express what I know / have been learning in text form for retaining knowledge and also to hopefully help the wider community. I am passionate about data structures and algorithms. The back-end and databases is where my heart is at.

follow me on: