The data structures you choose directly determine the performance characteristics of your software. Picking a hash table instead of a sorted array for membership queries, or a B+ tree instead of a binary search tree for range scans on disk, is a decision with measurable impact. This page provides a concise reference for the most important data structures: what they are, how they work, and their time and space trade-offs, so you can make those choices deliberately.

Linear Structures

Linear structures store elements in a sequence. The trade-offs between them come down to where you spend time: at the ends, in the middle, or on memory locality.
| Structure | Access | Search | Insert (head/tail) | Insert (middle) | Delete | Space |
|---|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) (shift) | O(n) | O(n) | O(n) |
| Linked list (singly) | O(n) | O(n) | O(1) head / O(n) tail without tail ptr | O(1) with ptr | O(1) with ptr | O(n) + pointer overhead |
| Doubly linked list | O(n) | O(n) | O(1) (head and tail) | O(1) with ptr | O(1) with ptr | O(n) + 2× pointer overhead |
| Stack | O(1) top | O(n) | O(1) push | n/a | O(1) pop | O(n) |
| Queue | O(1) front | O(n) | O(1) enqueue | n/a | O(1) dequeue | O(n) |
| Deque | O(1) ends | O(n) | O(1) both ends | n/a | O(1) both ends | O(n) |
Arrays excel when you need cache-friendly sequential access or constant-time random access by index. Every element is contiguous in memory, so the CPU prefetcher is effective. Linked lists excel when you frequently insert or delete from arbitrary positions and you already have a pointer to the location—no shifting required. The cost is pointer indirection (cache unfriendly) and per-node allocation overhead. Stacks (LIFO) appear in function call management, expression evaluation, DFS, and undo/redo systems. Queues (FIFO) appear in BFS, task scheduling, and producer-consumer pipelines.
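As a quick illustration, Python's `collections.deque` covers all three end-oriented structures in the table (a minimal sketch):

```python
from collections import deque

# A deque gives O(1) pushes and pops at both ends, so it can serve as a
# stack (LIFO), a queue (FIFO), or a true double-ended queue.
stack = deque()
stack.append(1); stack.append(2); stack.append(3)
assert stack.pop() == 3            # LIFO: last in, first out

queue = deque()
queue.append(1); queue.append(2); queue.append(3)
assert queue.popleft() == 1        # FIFO: first in, first out
```

Note that `deque` is a linked structure of fixed-size blocks, so it trades a little of the array's cache locality for O(1) operations at both ends.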

Trees

Trees are hierarchical structures where each node has zero or more children. The most important variants differ in their balance guarantees and therefore their worst-case performance.

Binary Tree

A tree where each node has at most two children (left and right). Without balance constraints, a binary tree degenerates to a linked list in the worst case (O(n) search).
  • Full binary tree: every node has 0 or 2 children.
  • Complete binary tree: all levels are fully filled except possibly the last, which is filled left to right.
  • Perfect binary tree: all levels are fully filled.

Binary Search Tree (BST)

A binary tree satisfying the BST property: for every node, all values in the left subtree are smaller and all values in the right subtree are larger.
| Operation | Average | Worst case |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
The worst case occurs when elements are inserted in sorted order, producing a degenerate chain. Self-balancing BSTs fix this.
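The BST property and its degenerate worst case can be sketched in a few lines (plain Python, illustrative helper names):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Unbalanced BST insert: go left for smaller keys, right for larger,
    # and attach a new leaf where the search falls off the tree.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

root = None
for k in [5, 3, 8, 1, 4]:
    root = insert(root, k)
assert search(root, 4) and not search(root, 7)

# Inserting keys in sorted order produces a degenerate chain of height n.
chain = None
for k in range(1, 6):
    chain = insert(chain, k)
assert height(chain) == 5
```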

AVL Tree

An AVL tree maintains the balance factor (height of left subtree − height of right subtree) at every node within [−1, 1]. After an insertion or deletion that violates this invariant, the tree performs rotations to restore balance.
  • Guaranteed O(log n) height.
  • Stricter balance than red-black trees → faster lookups.
  • More rotations on insert/delete → slower writes.
AVL trees are preferred when reads dominate (e.g., in-memory lookup tables).
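To make the rotation mechanics concrete, here is a minimal sketch (an illustrative node class, not a full AVL implementation) of detecting a left-left imbalance and fixing it with a single right rotation:

```python
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def balance(t):
    # AVL balance factor: height(left) - height(right); must stay in [-1, 1].
    return height(t.left) - height(t.right)

def rotate_right(y):
    # Single rotation fixing a left-left imbalance: y's left child x
    # becomes the new subtree root, and y adopts x's right subtree.
    x = y.left
    y.left, x.right = x.right, y
    return x

# Chain 3 <- 2 <- 1 (left-left case): balance factor +2 at the root.
t = N(3, N(2, N(1)))
assert balance(t) == 2
t = rotate_right(t)
assert t.key == 2 and balance(t) == 0
```

A production AVL tree also caches heights in the nodes and handles the symmetric right-right case plus the two double-rotation (left-right, right-left) cases.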

Red-Black Tree

A red-black tree is a self-balancing BST where each node is colored red or black and four invariants are maintained:
  1. Every node is red or black.
  2. The root is black.
  3. No two adjacent red nodes (red node cannot have a red parent or red child).
  4. Every path from a node to its descendant null pointers has the same number of black nodes.
These rules guarantee that the longest path is at most twice the shortest, bounding height at O(log n). Red-black trees require fewer rotations than AVL trees on insert/delete, making them better for write-heavy workloads. They are the backing structure for std::map, std::set (C++ STL), TreeMap (Java), and epoll’s internal socket tracking in the Linux kernel.

B+ Tree

A B+ tree is a multi-way balanced tree where every node can hold multiple keys (up to order m). All data records are stored in leaf nodes, which are linked in a doubly-linked list for efficient range scans. Internal nodes store only keys (routing information). Properties:
  • Tree height is O(log_m n) — very shallow for large m.
  • Each node fits in one disk page (typically 4 KB or 16 KB), minimizing I/O.
  • Sequential range scans traverse the leaf linked list, not the tree.
B+ trees are the standard index structure for relational databases. MySQL InnoDB uses a B+ tree for both primary (clustered) indexes and secondary indexes. The shallow height means a query touching millions of rows typically reads only 3–4 pages.
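The shallow-height claim is easy to check with a back-of-envelope estimate (a sketch only; real trees depend on fill factor and page layout):

```python
import math

def bplus_height(n_keys, fanout):
    # Rough height bound for a B+ tree with the given fanout (keys per
    # node): ceil(log_fanout(n)). Real trees vary with fill factor.
    return math.ceil(math.log(n_keys, fanout))

# With roughly 400 keys per 16 KB page, 100 million rows need ~4 levels,
# i.e. about 4 page reads per point lookup.
assert bplus_height(100_000_000, 400) == 4
```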

Heaps and Priority Queues

A heap is a complete binary tree stored implicitly in an array that satisfies the heap property:
  • Max-heap: every parent is ≥ its children. Root holds the maximum.
  • Min-heap: every parent is ≤ its children. Root holds the minimum.

Array Representation

For a node at index i (1-indexed):
  • Left child: 2i
  • Right child: 2i+1
  • Parent: ⌊i/2⌋
No pointers are needed; the structure is implicit in the array layout.
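The index arithmetic can be checked directly (a sketch using a 1-indexed Python list with a dummy slot at index 0):

```python
# 1-indexed implicit layout: index 0 is an unused placeholder.
heap = [None, 10, 8, 9, 4, 7]     # a valid max-heap

def left(i):   return 2 * i
def right(i):  return 2 * i + 1
def parent(i): return i // 2

# Node at index 2 (value 8) has children at indices 4 and 5
# and its parent at index 1.
assert heap[left(2)] == 4 and heap[right(2)] == 7
assert heap[parent(2)] == 10
```

With 0-indexing (as in Python's `heapq`), the formulas shift to children at 2i+1 and 2i+2 and parent at ⌊(i−1)/2⌋.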

Operations

| Operation | Time |
|---|---|
| Find min/max (peek) | O(1) |
| Insert | O(log n) |
| Extract min/max | O(log n) |
| Build heap from array (heapify) | O(n) |
| Decrease/increase key | O(log n) |
Heapify builds a valid heap in O(n) by calling sift-down from the last non-leaf node upward—faster than inserting n elements one by one (O(n log n)).
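Python's standard-library `heapq` module demonstrates both points: O(n) heapify, then repeated pops yield elements in sorted order.

```python
import heapq

data = [9, 4, 7, 1, 8, 3]
heapq.heapify(data)               # O(n) bottom-up sift-down
assert data[0] == 1               # min-heap: smallest element at the root

# Popping all n elements is O(n log n) total: this is heap sort.
assert [heapq.heappop(data) for _ in range(len(data))] == [1, 3, 4, 7, 8, 9]
```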

Use Cases

  • Priority queues: task schedulers, Dijkstra’s shortest path, Prim’s MST.
  • Heap sort: in-place O(n log n) sort.
  • K-th largest/smallest: maintain a min-heap of size k; O(n log k).
  • Median maintenance: two heaps (max-heap for lower half, min-heap for upper half).
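The k-th-largest pattern from the list above can be sketched as follows (illustrative function name):

```python
import heapq

def kth_largest(nums, k):
    # Maintain a min-heap holding the k largest values seen so far.
    # Each push/pop is O(log k), so the whole pass is O(n log k).
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)   # evict the smallest of the current top-k
    return heap[0]                # root = k-th largest overall

assert kth_largest([3, 2, 1, 5, 6, 4], k=2) == 5
```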

Hash Tables

A hash table maps keys to values in O(1) average time by using a hash function to compute an array index from a key.

Hash Functions

A good hash function distributes keys uniformly across the bucket array, minimizing collisions. Common approaches include polynomial rolling hashes for strings and bit-mixing algorithms (FNV-1a, MurmurHash3, xxHash) for general data. Cryptographic hash functions (e.g., SHA-256) are designed for security rather than speed and are generally too slow for hash tables.

Collision Resolution

Separate chaining: each bucket holds a linked list (or another structure) of entries that hash to it. Lookup walks the list. Works well at high load factors but adds pointer overhead.

Open addressing: all entries live in the array itself. On a collision, probe alternative slots:
  • Linear probing: check i+1, i+2, … — simple but suffers clustering.
  • Quadratic probing: check i+1, i+4, i+9, … — reduces primary clustering.
  • Double hashing: use a second hash function for the step size — best distribution.
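The probing idea can be illustrated with a toy linear-probing table (a sketch only: no resizing, no deletion/tombstones, which any real implementation would need):

```python
class ProbingTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # each slot is None or (key, value)

    def _probe(self, key):
        # Linear probing: start at hash(key) mod m, then try i+1, i+2, ...
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None              # an empty slot ends the probe chain
            if self.slots[i][0] == key:
                return self.slots[i][1]

t = ProbingTable()
t.put("a", 1); t.put("b", 2); t.put("a", 3)
assert t.get("a") == 3 and t.get("b") == 2 and t.get("c") is None
```

The "empty slot ends the chain" rule in `get` is exactly why deletion needs tombstones: physically clearing a slot could break probe chains that pass through it.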

Load Factor

The load factor is α = n / m, where n is the number of entries and m is the number of buckets.
  • At α < 0.7, separate chaining and open addressing both perform well.
  • Above α ≈ 0.75 (Java HashMap’s threshold), performance degrades. The table rehashes by doubling capacity and reinserting all entries — O(n) amortized.
| Aspect | Separate Chaining | Open Addressing |
|---|---|---|
| Memory per entry | Key + value + pointer | Key + value only |
| Cache friendliness | Poor (pointer chasing) | Good (array locality) |
| Deletion | Easy (unlink node) | Requires tombstone markers |
| High load factor | Degrades gracefully | Degrades quickly |
Real-world hash maps: C++ std::unordered_map uses separate chaining. Rust’s HashMap uses SwissTable-style open addressing (the hashbrown implementation; earlier versions used Robin Hood probing). Python dict uses a custom open-addressing scheme.

Graphs

A graph G = (V, E) consists of a set of vertices V and edges E. Graphs may be directed or undirected, weighted or unweighted.

Representation

| Representation | Space | Edge lookup | Enumerate neighbors | Best for |
|---|---|---|---|---|
| Adjacency matrix | O(V²) | O(1) | O(V) | Dense graphs |
| Adjacency list | O(V + E) | O(degree) | O(degree) | Sparse graphs |
Most real-world graphs are sparse, so adjacency lists (typically vector<vector<pair<int,int>>> or a linked-forward-star) are the standard choice in competitive programming and production graph libraries.
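A minimal adjacency-list construction (a sketch, assuming a weighted directed edge list; the C++ equivalent would be the `vector<vector<pair<int,int>>>` mentioned above):

```python
from collections import defaultdict

def build_adj(edges):
    # Adjacency list for a weighted directed graph: O(V + E) space,
    # O(degree) neighbor enumeration.
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    return adj

adj = build_adj([(0, 1, 5), (0, 2, 3), (2, 3, 1)])
assert adj[0] == [(1, 5), (2, 3)]
assert adj[3] == []               # vertex 3 has no outgoing edges
```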

BFS and DFS

BFS (Breadth-First Search) explores layer by layer using a queue. It finds the shortest path (by hop count) in an unweighted graph and is the basis for multi-source shortest path and bipartite checking. DFS (Depth-First Search) explores as deep as possible before backtracking, using a stack (implicit via recursion). It underlies topological sort, cycle detection, SCC (Tarjan’s / Kosaraju’s), and many tree-based DP patterns. Both run in O(V + E).
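A minimal BFS sketch computing hop counts (illustrative names; the graph is an unweighted adjacency list):

```python
from collections import deque

def bfs_dist(adj, src):
    # Layer-by-layer traversal with a queue. dist[v] is the hop count
    # from src, which is the shortest path length in an unweighted graph.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit = shortest hop count
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
assert bfs_dist(adj, 0) == {0: 0, 1: 1, 2: 1, 3: 2}
```

Swapping the queue for a stack (or recursion) turns this traversal into DFS; the visit order changes but the O(V + E) bound does not.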

Shortest Path Algorithms

| Algorithm | Graph type | Time | Notes |
|---|---|---|---|
| BFS | Unweighted | O(V + E) | Optimal hop count |
| Dijkstra (binary heap) | Non-negative weights | O((V + E) log V) | Greedy; fails on negative edges |
| Bellman-Ford | Any weights | O(VE) | Detects negative cycles |
| SPFA | Any weights | O(VE) worst, often faster | Bellman-Ford with queue optimization |
| Floyd-Warshall | Any weights, all pairs | O(V³) | DP over intermediate nodes |
Dijkstra uses a greedy strategy: always relax the unvisited node with the smallest known distance. With a priority queue (min-heap), it runs in O((V + E) log V). Floyd-Warshall computes all-pairs shortest paths using a simple DP recurrence:
f[k][x][y] = min(f[k-1][x][y], f[k-1][x][k] + f[k-1][k][y])
Because iteration k never changes f[x][k] or f[k][y] (routing through k twice cannot improve a path), the rolling first dimension can be dropped:
f[x][y] = min(f[x][y], f[x][k] + f[k][y])   // for each k in 1..V
Bellman-Ford relaxes all edges V-1 times. If a further relaxation is still possible in iteration V, a negative cycle exists.
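Dijkstra's greedy loop with a binary heap can be sketched as follows (a minimal version using "lazy deletion": stale heap entries are skipped rather than updated in place):

```python
import heapq

def dijkstra(adj, src):
    # Greedy relaxation with a min-heap; requires non-negative weights.
    # adj maps each vertex to a list of (neighbor, weight) pairs.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                  # stale entry: u was already settled
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: []}
assert dijkstra(adj, 0) == {0: 0, 2: 1, 1: 3}   # 0->2->1 beats 0->1
```

Lazy deletion keeps the code simple at the cost of up to O(E) heap entries, which is where the O((V + E) log V) bound comes from.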

Topological Sort

Topological sort orders vertices of a DAG (directed acyclic graph) such that for every edge u → v, u appears before v. It is the basis for build systems, dependency resolution, and course prerequisite ordering. The standard BFS-based approach (Kahn’s algorithm):
  1. Compute the in-degree of every vertex.
  2. Enqueue all vertices with in-degree 0.
  3. Dequeue a vertex, output it, decrement the in-degree of its neighbors, enqueue any neighbor whose in-degree reaches 0.
  4. If fewer than V vertices were output, the graph contains a cycle.
Time: O(V + E).
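The four steps above can be sketched as follows (illustrative function name; vertices are 0..n−1):

```python
from collections import deque

def topo_sort(n, edges):
    # Kahn's algorithm: repeatedly output vertices whose in-degree is 0.
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(i for i in range(n) if indeg[i] == 0)   # step 2
    order = []
    while q:                                           # step 3
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else None          # step 4: None => cycle

assert topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]) == [0, 1, 2, 3]
assert topo_sort(2, [(0, 1), (1, 0)]) is None          # cycle detected
```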