Tree: Hierarchical Structure

Prologue: When Arrays Stopped Being Enough

When I first learned programming, I only knew arrays and lists. Data structures that lined things up in a row. Perfect for storing student rosters, shopping lists, or to-do items. But when I started working on real projects, I encountered data that simply couldn't be arranged in a straight line.

I had to store a company org chart. CEO at the top, CTO and CFO below, development team lead under the CTO, and developers under that. How do you represent this with an array? ["CEO", "CTO", "CFO", "Dev Lead", "Developer1", "Developer2"]? Then how do you know who reports to whom?

Looking at my file explorer, I realized something. My computer's folder structure cannot possibly be flattened into a line. Under C:\, there's Users and Program Files. Under Users, there's a folder with my name. Inside that, Documents and Downloads. This is a hierarchy. There's a top and bottom, parent-child relationships.

That's when I learned about the Tree data structure.

At First, I Didn't Get It: "Why Is the Tree Upside Down?"

When I first heard the term "tree", I was confused. Real trees have roots at the bottom and branches reaching upward, but in CS, trees are inverted. The Root is at the top, and branches extend downward. Why draw it upside down?

I understood later. It's about "drawing convenience". When representing data in computers, putting the "starting point" at the top is more intuitive. Think about org charts. You draw the CEO at the top, department heads below, team members further down. Family trees work the same way. The ancestor goes at the top, descendants branch downward.

Eventually, I accepted it not as an "upside-down tree", but as a "hierarchical diagram with the starting point at the top".

The Terminology Swamp: Root, Parent, Child, Leaf, Depth, Height...

The most frustrating part of learning trees was the sheer number of terms. Root, Parent, Child, Sibling, Leaf, Internal Node, Depth, Height, Level... I couldn't keep them straight.

I sorted it out this way: Use the metaphor of a family tree + corporate org chart, and every term becomes clear.

Root

The topmost node. The founding ancestor. In a company, the CEO. In a computer file system, the C:\ drive. A tree must have exactly one Root. If there are two, that's a forest.

Parent and Child

Direct superior-subordinate relationship. My immediate boss is my Parent, my direct report is my Child. A node can have multiple Children, but only one Parent. (Root has no Parent.)

Sibling

Nodes that share the same Parent. Colleagues working under the same boss.

Leaf

A node with no Children. The lowest-level employee. An intern. In a file system, an actual file (not a folder).

Internal Node

A node with at least one Child. If Root has Children, it's also an Internal Node. Team lead level or higher.

Depth

How far you are from the Root. Root has Depth 0. Root's Children have Depth 1, their Children have Depth 2... In corporate terms, "how many levels down are you?"

Height

The distance from a node to its farthest Leaf. Leaves have Height 0. The Height of the entire tree is the Height of the Root. "How many levels remain below this node?"

Level

Sometimes used synonymously with Depth, but usually means "the set of nodes at the same Depth". Level 0 is just the Root, Level 1 is Root's children, Level 2 is grandchildren...

At first, I confused Depth and Height. I understood it this way: Depth is measured from the top down, Height is measured from the bottom up. Think of a building. 1st floor, 2nd floor, 3rd floor is Depth. "This building is 10 stories tall" is Height.

Types of Trees: Binary Trees Are Most Common

There are many types of trees. At first, I thought "a tree is a tree, what's the difference?" But I learned that the limit on number of Children completely changes the properties.

General Tree (N-ary Tree)

No limit on number of Children. The CEO can have 10 executives or 50. A folder can contain 1,000 files. Flexible, but algorithm design gets complex.

Binary Tree

Each node can have at most 2 Children: Left Child and Right Child. 90% of algorithm problems involve binary trees. Why? Implementation is simple, and it's easy to handle with recursive structures.

I understood binary trees as "Twenty Questions". With each question, you cut the possibilities in half. "Is it an animal?" → Yes/No → "Can it fly?" → Yes/No... Keep halving, and you reach the answer quickly.

Binary Search Tree (BST)

A special form of binary tree. Left Child is smaller than Parent, Right Child is larger than Parent. This rule makes searching incredibly fast.

For example, if you have the sorted array [1, 3, 5, 7, 9, 11, 13], you can build a BST like this:

Want to find 7? Check Root, done. Want to find 1? Smaller than Root, go left. Smaller than 3, go left again. Found it. Maximum 3 steps. In an array with sequential search, it could take 7 steps.

But BST's problem is that if it becomes "unbalanced", efficiency tanks. If you insert [1, 2, 3, 4, 5] in order into a BST:

This is just a straight line, identical to an array. Search time becomes O(n), a total disaster.

Balanced Tree

That's why we have AVL Trees, Red-Black Trees, and other "self-balancing" trees. They rearrange the tree whenever you insert/delete nodes to minimize Height. This guarantees search time is always O(log n).

When I learned that database indexes are implemented with B-Trees, a type of balanced tree, I thought "Ah, that's why databases are fast."

Aha Moment: "The File Explorer Is a Tree!"

The moment I truly understood trees was while looking at Windows Explorer. I saw the folder tree structure on the left, and suddenly it clicked. "This IS a tree!"

C:\ → Root
Users, Program Files → Root's Children
Users\RATIA → Child of Users
RATIA\Documents\report.docx → Leaf (actual file)

Clicking a folder to expand it, clicking again to collapse it—that's tree "traversal". I had been using trees every single day.

HTML DOM on web pages was the same. The <html> tag is the Root, <head> and <body> are Children, and <div>, <p>, <span> are all nodes. When a browser renders a page, it traverses this tree and draws each element.

Corporate org charts, family trees, tournament brackets, decision flowcharts... Everything was a tree.

Tree Traversal: In What Order Do You Visit Nodes?

Once you've built a tree, you often need to "visit every node without missing any". Maybe you're summing up all file sizes in a file system, or extracting all text from the DOM.

The problem is that trees aren't "lined up in a row", so the order isn't obvious. With an array, you just go front to back. But with a tree, when you go "top to bottom", there are multiple branches.

When I first learned the 4 traversal methods, I thought "why are these even necessary?" But once I realized each traversal solves a different problem, it made sense.

Preorder Traversal: Root → Left → Right

"Me first, then my children"

This is the order when copying folders. First create the folder, then copy the files inside. Use this when Root must be processed first.

def preorder(node):
    if node is None:
        return
    print(node.value)  # Me first
    preorder(node.left)  # Left child
    preorder(node.right)  # Right child

Inorder Traversal: Left → Root → Right

"Go to the leftmost end, then me, then right"

In a BST, traversing in this order produces sorted output! This was amazing to me. Because of BST's property (Left < Root < Right), it automatically gives you ascending order.

def inorder(node):
    if node is None:
        return
    inorder(node.left)  # To the leftmost end
    print(node.value)  # Me
    inorder(node.right)  # Right

Postorder Traversal: Left → Right → Root

"Children first, then me"

This is the order when deleting folders. First delete all files inside, then delete the folder itself. Also used when calculating file sizes. Calculate the sizes of subfolders first, then sum them up for the parent folder.

def postorder(node):
    if node is None:
        return
    postorder(node.left)  # Left children
    postorder(node.right)  # Right children
    print(node.value)  # Me last

Level-order Traversal: Top to Bottom, Same Level Together

Also called BFS (Breadth-First Search). Start from Root, visit all nodes at the same depth first, then move to the next depth.

Like visiting "C-level executives → Team leads → Team members" in an org chart. Implemented using a Queue.

from collections import deque

def level_order(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()
        print(node.value)
        if node.left:
            queue.append(node.left)
        if node.right:
            queue.append(node.right)

I remember Level-order not as "breadth-first search", but as "process same rank together".

Code Implementation: Starting with the Node Class

I understood the theory, but how do you actually implement it in code? At first, I tried to "represent the tree with an array" and failed miserably. Calculating array indices was way too complex.

Eventually, I realized that creating a Node class is the most intuitive approach. Each node holds its value and pointers to its children.

Binary Tree Node Implementation

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

# Building a tree
root = TreeNode(1)
root.left = TreeNode(2)
root.right = TreeNode(3)
root.left.left = TreeNode(4)
root.left.right = TreeNode(5)

#       1
#      / \
#     2   3
#    / \
#   4   5

Simple. Each node has just three attributes: value, left, right. left and right point to other nodes, or are None if absent.

Inserting a Node into a BST

BST has rules, so insertion automatically determines the position. Repeat "smaller goes left, larger goes right".

class BST:
    def __init__(self):
        self.root = None

    def insert(self, value):
        if self.root is None:
            self.root = TreeNode(value)
        else:
            self._insert_recursive(self.root, value)

    def _insert_recursive(self, node, value):
        if value < node.value:
            if node.left is None:
                node.left = TreeNode(value)
            else:
                self._insert_recursive(node.left, value)
        else:
            if node.right is None:
                node.right = TreeNode(value)
            else:
                self._insert_recursive(node.right, value)

    def search(self, value):
        return self._search_recursive(self.root, value)

    def _search_recursive(self, node, value):
        if node is None:
            return False
        if node.value == value:
            return True
        elif value < node.value:
            return self._search_recursive(node.left, value)
        else:
            return self._search_recursive(node.right, value)

# Usage example
bst = BST()
bst.insert(7)
bst.insert(3)
bst.insert(11)
bst.insert(1)
bst.insert(5)

print(bst.search(5))  # True
print(bst.search(10))  # False

Using recursion makes the code clean. Since trees are recursive structures (they repeat the same pattern), implementing with recursive functions is natural.

Recursion was hard at first, but I understood it with the mindset: "Do my job, delegate the rest to my children". The insert function just "determines whether this value goes left or right", then delegates actual insertion to the child node's insert.

Real-World Application: Calculating File System Size

We've learned theory and code. Now let's solve an actual problem. Suppose you need to write a function that "calculates the total size of a folder". A folder contains files and subfolders. Subfolders contain more files and folders.

This is a classic tree problem. Folders are nodes, subfolders/files are children. You need postorder traversal. Why? Because you need to know the sizes of subfolders first before calculating the parent folder's size.

class FileNode:
    def __init__(self, name, size=0, is_folder=False):
        self.name = name
        self.size = size  # Actual size if file, 0 if folder
        self.is_folder = is_folder
        self.children = []  # Only folders have children

    def add_child(self, child):
        self.children.append(child)

def calculate_folder_size(node):
    if not node.is_folder:
        # If file, return its size
        return node.size

    # If folder, sum all children's sizes
    total_size = 0
    for child in node.children:
        total_size += calculate_folder_size(child)  # Recursion

    return total_size

# Building file system structure
root = FileNode("C:", is_folder=True)
users = FileNode("Users", is_folder=True)
documents = FileNode("Documents", is_folder=True)
file1 = FileNode("report.docx", size=500)
file2 = FileNode("image.png", size=1200)

documents.add_child(file1)
documents.add_child(file2)
users.add_child(documents)
root.add_child(users)

print(f"Total size: {calculate_folder_size(root)} KB")  # 1700

Running this code, I felt "Ah, I actually understand trees now". Process children first with recursion, then combine their results to produce the parent's result. That's the essence of postorder traversal.

Another Application: Extracting All Text from DOM

Suppose you need to extract all text from a web page. HTML DOM is a tree structure. A <div> contains a <p>, and the <p> contains text. You need to visit all nodes and collect text.

For this, preorder, inorder, or postorder doesn't matter. You just need to visit all nodes.

function extractAllText(element) {
    let text = '';

    // If text node, add as is
    if (element.nodeType === Node.TEXT_NODE) {
        return element.textContent;
    }

    // If element node, traverse children
    for (let child of element.childNodes) {
        text += extractAllText(child);
    }

    return text;
}

// Usage example
const bodyText = extractAllText(document.body);
console.log(bodyText);

This is also recursion. "My text + children's text" combined, done.

Summary: Trees Are a Tool for Representing Relationships

When I first learned trees, I thought "How is this different from arrays? It's just more complicated." But after actually using them, I realized: Trees are a tool for representing relationships.

Arrays represent "order". 1st place, 2nd place, 3rd place. First item, second item. But many things in the real world aren't about order—they're about "hierarchy". Boss-subordinate, parent-child, folder-file, category-subcategory.

Now I naturally think of trees in these situations:

Building an org chart → Tree
Navigating a file system → Tree
Parsing a web page → Tree (DOM)
Need fast searching → BST or balanced tree
Managing routing rules → Tree (Trie)

The real power of trees comes from their recursive structure. Parts of a tree are also trees. Called subtrees. This property makes recursive algorithms a perfect fit. Break the problem into smaller problems, keep breaking, and solve.

In conclusion, this is how I've come to understand trees:

A tree is a hierarchical structure that branches out from a single starting point (Root). Parent-child relationships are clear, and there are no cycles. It's used to represent countless real-world hierarchies: file systems, org charts, DOM, database indexes. Thanks to its recursive nature, code implementation also falls cleanly into recursive functions.

I can't go back to the days when I only knew arrays. Knowing trees, I see the world differently.