The data structures you choose directly determine the performance characteristics of your software. Picking a hash table instead of a sorted array for membership queries, or a B+ tree instead of a binary search tree for range scans on disk, is a decision with measurable impact. This page provides a concise reference for the most important data structures: what they are, how they work, and their time and space trade-offs, so you can make those choices deliberately.

Linear Structures

Linear structures store elements in a sequence. The trade-offs between them come down to where you spend time: at the ends, in the middle, or on memory locality.
| Structure | Access | Search | Insert (head/tail) | Insert (middle) | Delete | Space |
|---|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) (shift) | O(n) | O(n) | O(n) |
| Linked list (singly) | O(n) | O(n) | O(1) head / O(n) tail without tail ptr | O(1) with ptr | O(1) with ptr | O(n) + pointer overhead |
| Doubly linked list | O(n) | O(n) | O(1) (head and tail) | O(1) with ptr | O(1) with ptr | O(n) + 2× pointer overhead |
| Stack | O(1) top | O(n) | O(1) push | n/a | O(1) pop | O(n) |
| Queue | O(1) front | O(n) | O(1) enqueue | n/a | O(1) dequeue | O(n) |
| Deque | O(1) ends | O(n) | O(1) both ends | n/a | O(1) both ends | O(n) |
Arrays excel when you need cache-friendly sequential access or constant-time random access by index. Every element is contiguous in memory, so the CPU prefetcher is effective. Linked lists excel when you frequently insert or delete from arbitrary positions and you already have a pointer to the location—no shifting required. The cost is pointer indirection (cache unfriendly) and per-node allocation overhead. Stacks (LIFO) appear in function call management, expression evaluation, DFS, and undo/redo systems. Queues (FIFO) appear in BFS, task scheduling, and producer-consumer pipelines.
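As a quick illustration, Python's `collections.deque` covers all three end-oriented structures in the table (a minimal sketch):

```python
from collections import deque

# A deque gives O(1) pushes and pops at both ends, so it can serve as a
# stack (LIFO), a queue (FIFO), or a true double-ended queue.
stack = deque()
stack.append(1); stack.append(2); stack.append(3)
assert stack.pop() == 3            # LIFO: last in, first out

queue = deque()
queue.append(1); queue.append(2); queue.append(3)
assert queue.popleft() == 1        # FIFO: first in, first out
```

Note that `deque` is a linked structure of fixed-size blocks, so it trades a little of the array's cache locality for O(1) operations at both ends.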

Trees

Trees are hierarchical structures where each node has zero or more children. The most important variants differ in their balance guarantees and therefore their worst-case performance.

Binary Tree

A tree where each node has at most two children (left and right). Without balance constraints, a binary tree degenerates to a linked list in the worst case (O(n) search).
  • Full binary tree: every node has 0 or 2 children.
  • Complete binary tree: all levels are fully filled except possibly the last, which is filled left to right.
  • Perfect binary tree: all levels are fully filled.

Binary Search Tree (BST)

A binary tree satisfying the BST property: for every node, all values in the left subtree are smaller and all values in the right subtree are larger.
| Operation | Average | Worst case |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
The worst case occurs when elements are inserted in sorted order, producing a degenerate chain. Self-balancing BSTs fix this.
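The BST property and its degenerate worst case can be sketched in a few lines (plain Python, illustrative helper names):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Unbalanced BST insert: go left for smaller keys, right for larger,
    # and attach a new leaf where the search falls off the tree.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def height(root):
    return 0 if root is None else 1 + max(height(root.left), height(root.right))

root = None
for k in [5, 3, 8, 1, 4]:
    root = insert(root, k)
assert search(root, 4) and not search(root, 7)

# Inserting keys in sorted order produces a degenerate chain of height n.
chain = None
for k in range(1, 6):
    chain = insert(chain, k)
assert height(chain) == 5
```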

AVL Tree

An AVL tree maintains the balance factor (height of left subtree − height of right subtree) at every node within [−1, 1]. After an insertion or deletion that violates this invariant, the tree performs rotations to restore balance.
  • Guaranteed O(log n) height.
  • Stricter balance than red-black trees → faster lookups.
  • More rotations on insert/delete → slower writes.
AVL trees are preferred when reads dominate (e.g., in-memory lookup tables).
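To make the rotation mechanics concrete, here is a minimal sketch (an illustrative node class, not a full AVL implementation) of detecting a left-left imbalance and fixing it with a single right rotation:

```python
class N:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(t):
    return 0 if t is None else 1 + max(height(t.left), height(t.right))

def balance(t):
    # AVL balance factor: height(left) - height(right); must stay in [-1, 1].
    return height(t.left) - height(t.right)

def rotate_right(y):
    # Single rotation fixing a left-left imbalance: y's left child x
    # becomes the new subtree root, and y adopts x's right subtree.
    x = y.left
    y.left, x.right = x.right, y
    return x

# Chain 3 <- 2 <- 1 (left-left case): balance factor +2 at the root.
t = N(3, N(2, N(1)))
assert balance(t) == 2
t = rotate_right(t)
assert t.key == 2 and balance(t) == 0
```

A production AVL tree also caches heights in the nodes and handles the symmetric right-right case plus the two double-rotation (left-right, right-left) cases.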

Red-Black Tree

A red-black tree is a self-balancing BST where each node is colored red or black and four invariants are maintained:
  1. Every node is red or black.
  2. The root is black.
  3. No two adjacent red nodes (red node cannot have a red parent or red child).
  4. Every path from a node to its descendant null pointers has the same number of black nodes.
These rules guarantee that the longest path is at most twice the shortest, bounding height at O(log n). Red-black trees require fewer rotations than AVL trees on insert/delete, making them better for write-heavy workloads. They are the backing structure for std::map, std::set (C++ STL), TreeMap (Java), and epoll’s internal socket tracking in the Linux kernel.

B+ Tree

A B+ tree is a multi-way balanced tree where every node can hold multiple keys (up to order m). All data records are stored in leaf nodes, which are linked in a doubly-linked list for efficient range scans. Internal nodes store only keys (routing information). Properties:
  • Tree height is O(log_m n) — very shallow for large m.
  • Each node fits in one disk page (typically 4 KB or 16 KB), minimizing I/O.
  • Sequential range scans traverse the leaf linked list, not the tree.
B+ trees are the standard index structure for relational databases. MySQL InnoDB uses a B+ tree for both primary (clustered) indexes and secondary indexes. The shallow height means a query touching millions of rows typically reads only 3–4 pages.
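The shallow-height claim is easy to check with a back-of-envelope estimate (a sketch only; real trees depend on fill factor and page layout):

```python
import math

def bplus_height(n_keys, fanout):
    # Rough height bound for a B+ tree with the given fanout (keys per
    # node): ceil(log_fanout(n)). Real trees vary with fill factor.
    return math.ceil(math.log(n_keys, fanout))

# With roughly 400 keys per 16 KB page, 100 million rows need ~4 levels,
# i.e. about 4 page reads per point lookup.
assert bplus_height(100_000_000, 400) == 4
```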

Heaps and Priority Queues

A heap is a complete binary tree stored implicitly in an array that satisfies the heap property:
  • Max-heap: every parent is ≥ its children. Root holds the maximum.
  • Min-heap: every parent is ≤ its children. Root holds the minimum.

Array Representation

For a node at index i (1-indexed):
  • Left child: 2i
  • Right child: 2i+1
  • Parent: ⌊i/2⌋
No pointers are needed; the structure is implicit in the array layout.
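The index arithmetic can be checked directly (a sketch using a 1-indexed Python list with a dummy slot at index 0):

```python
# 1-indexed implicit layout: index 0 is an unused placeholder.
heap = [None, 10, 8, 9, 4, 7]     # a valid max-heap

def left(i):   return 2 * i
def right(i):  return 2 * i + 1
def parent(i): return i // 2

# Node at index 2 (value 8) has children at indices 4 and 5
# and its parent at index 1.
assert heap[left(2)] == 4 and heap[right(2)] == 7
assert heap[parent(2)] == 10
```

With 0-indexing (as in Python's `heapq`), the formulas shift to children at 2i+1 and 2i+2 and parent at ⌊(i−1)/2⌋.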

Operations

| Operation | Time |
|---|---|
| Find min/max (peek) | O(1) |
| Insert | O(log n) |
| Extract min/max | O(log n) |
| Build heap from array (heapify) | O(n) |
| Decrease/increase key | O(log n) |
Heapify builds a valid heap in O(n) by calling sift-down from the last non-leaf node upward—faster than inserting n elements one by one (O(n log n)).
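Python's standard-library `heapq` module demonstrates both points: O(n) heapify, then repeated pops yield elements in sorted order.

```python
import heapq

data = [9, 4, 7, 1, 8, 3]
heapq.heapify(data)               # O(n) bottom-up sift-down
assert data[0] == 1               # min-heap: smallest element at the root

# Popping all n elements is O(n log n) total: this is heap sort.
assert [heapq.heappop(data) for _ in range(len(data))] == [1, 3, 4, 7, 8, 9]
```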

Use Cases

  • Priority queues: task schedulers, Dijkstra’s shortest path, Prim’s MST.
  • Heap sort: in-place O(n log n) sort.
  • K-th largest/smallest: maintain a min-heap of size k; O(n log k).
  • Median maintenance: two heaps (max-heap for lower half, min-heap for upper half).
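The k-th-largest pattern from the list above can be sketched as follows (illustrative function name):

```python
import heapq

def kth_largest(nums, k):
    # Maintain a min-heap holding the k largest values seen so far.
    # Each push/pop is O(log k), so the whole pass is O(n log k).
    heap = []
    for x in nums:
        heapq.heappush(heap, x)
        if len(heap) > k:
            heapq.heappop(heap)   # evict the smallest of the current top-k
    return heap[0]                # root = k-th largest overall

assert kth_largest([3, 2, 1, 5, 6, 4], k=2) == 5
```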

Hash Tables

A hash table maps keys to values in O(1) average time by using a hash function to compute an array index from a key.

Hash Functions

A good hash function distributes keys uniformly across the bucket array, minimizing collisions. Common approaches include polynomial rolling hashes for strings and bit-mixing algorithms (FNV-1a, MurmurHash3, xxHash) for general data. Cryptographic hash functions (e.g., SHA-256) are designed for security rather than speed and are generally too slow for hash tables.

Collision Resolution

Separate chaining: each bucket holds a linked list (or another structure) of entries that hash to it. Lookup walks the list. Works well at high load factors but adds pointer overhead.

Open addressing: all entries live in the array itself. On a collision, probe alternative slots:
  • Linear probing: check i+1, i+2, … — simple but suffers clustering.
  • Quadratic probing: check i+1, i+4, i+9, … — reduces primary clustering.
  • Double hashing: use a second hash function for the step size — best distribution.
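The probing idea can be illustrated with a toy linear-probing table (a sketch only: no resizing, no deletion/tombstones, which any real implementation would need):

```python
class ProbingTable:
    def __init__(self, capacity=8):
        self.slots = [None] * capacity   # each slot is None or (key, value)

    def _probe(self, key):
        # Linear probing: start at hash(key) mod m, then try i+1, i+2, ...
        i = hash(key) % len(self.slots)
        while True:
            yield i
            i = (i + 1) % len(self.slots)

    def put(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return

    def get(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None              # an empty slot ends the probe chain
            if self.slots[i][0] == key:
                return self.slots[i][1]

t = ProbingTable()
t.put("a", 1); t.put("b", 2); t.put("a", 3)
assert t.get("a") == 3 and t.get("b") == 2 and t.get("c") is None
```

The "empty slot ends the chain" rule in `get` is exactly why deletion needs tombstones: physically clearing a slot could break probe chains that pass through it.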

Load Factor

The load factor is α = n / m, where n is the number of entries and m is the number of buckets.
  • At α < 0.7, separate chaining and open addressing both perform well.
  • Above α ≈ 0.75 (Java HashMap’s threshold), performance degrades. The table rehashes by doubling capacity and reinserting all entries — O(n) amortized.
| Aspect | Separate Chaining | Open Addressing |
|---|---|---|
| Memory per entry | Key + value + pointer | Key + value only |
| Cache friendliness | Poor (pointer chasing) | Good (array locality) |
| Deletion | Easy (unlink node) | Requires tombstone markers |
| High load factor | Degrades gracefully | Degrades quickly |
Real-world hash maps: C++ std::unordered_map uses separate chaining. Rust’s HashMap uses SwissTable-style open addressing (the hashbrown implementation; earlier versions used Robin Hood probing). Python dict uses a custom open-addressing scheme.

Graphs

A graph G = (V, E) consists of a set of vertices V and edges E. Graphs may be directed or undirected, weighted or unweighted.

Representation

| Representation | Space | Edge lookup | Enumerate neighbors | Best for |
|---|---|---|---|---|
| Adjacency matrix | O(V²) | O(1) | O(V) | Dense graphs |
| Adjacency list | O(V + E) | O(degree) | O(degree) | Sparse graphs |
Most real-world graphs are sparse, so adjacency lists (typically vector<vector<pair<int,int>>> or a linked-forward-star) are the standard choice in competitive programming and production graph libraries.
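A minimal adjacency-list construction (a sketch, assuming a weighted directed edge list; the C++ equivalent would be the `vector<vector<pair<int,int>>>` mentioned above):

```python
from collections import defaultdict

def build_adj(edges):
    # Adjacency list for a weighted directed graph: O(V + E) space,
    # O(degree) neighbor enumeration.
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
    return adj

adj = build_adj([(0, 1, 5), (0, 2, 3), (2, 3, 1)])
assert adj[0] == [(1, 5), (2, 3)]
assert adj[3] == []               # vertex 3 has no outgoing edges
```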

BFS and DFS

BFS (Breadth-First Search) explores layer by layer using a queue. It finds the shortest path (by hop count) in an unweighted graph and is the basis for multi-source shortest path and bipartite checking. DFS (Depth-First Search) explores as deep as possible before backtracking, using a stack (implicit via recursion). It underlies topological sort, cycle detection, SCC (Tarjan’s / Kosaraju’s), and many tree-based DP patterns. Both run in O(V + E).
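A minimal BFS sketch computing hop counts (illustrative names; the graph is an unweighted adjacency list):

```python
from collections import deque

def bfs_dist(adj, src):
    # Layer-by-layer traversal with a queue. dist[v] is the hop count
    # from src, which is the shortest path length in an unweighted graph.
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit = shortest hop count
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
assert bfs_dist(adj, 0) == {0: 0, 1: 1, 2: 1, 3: 2}
```

Swapping the queue for a stack (or recursion) turns this traversal into DFS; the visit order changes but the O(V + E) bound does not.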

Shortest Path Algorithms

| Algorithm | Graph type | Time | Notes |
|---|---|---|---|
| BFS | Unweighted | O(V + E) | Optimal hop count |
| Dijkstra (binary heap) | Non-negative weights | O((V + E) log V) | Greedy; fails on negative edges |
| Bellman-Ford | Any weights | O(VE) | Detects negative cycles |
| SPFA | Any weights | O(VE) worst, often faster | Bellman-Ford with queue optimization |
| Floyd-Warshall | Any weights, all pairs | O(V³) | DP over intermediate nodes |
Dijkstra uses a greedy strategy: always relax the unvisited node with the smallest known distance. With a priority queue (min-heap), it runs in O((V + E) log V). Floyd-Warshall computes all-pairs shortest paths using a simple DP recurrence:
f[k][x][y] = min(f[k-1][x][y], f[k-1][x][k] + f[k-1][k][y])
Because iteration k never changes f[x][k] or f[k][y] (routing through k twice cannot improve a path), the rolling first dimension can be dropped:
f[x][y] = min(f[x][y], f[x][k] + f[k][y])   // for each k in 1..V
Bellman-Ford relaxes all edges V-1 times. If a further relaxation is still possible in iteration V, a negative cycle exists.
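Dijkstra's greedy loop with a binary heap can be sketched as follows (a minimal version using "lazy deletion": stale heap entries are skipped rather than updated in place):

```python
import heapq

def dijkstra(adj, src):
    # Greedy relaxation with a min-heap; requires non-negative weights.
    # adj maps each vertex to a list of (neighbor, weight) pairs.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                  # stale entry: u was already settled
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

adj = {0: [(1, 4), (2, 1)], 2: [(1, 2)], 1: []}
assert dijkstra(adj, 0) == {0: 0, 2: 1, 1: 3}   # 0->2->1 beats 0->1
```

Lazy deletion keeps the code simple at the cost of up to O(E) heap entries, which is where the O((V + E) log V) bound comes from.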

Topological Sort

Topological sort orders vertices of a DAG (directed acyclic graph) such that for every edge u → v, u appears before v. It is the basis for build systems, dependency resolution, and course prerequisite ordering. The standard BFS-based approach (Kahn’s algorithm):
  1. Compute the in-degree of every vertex.
  2. Enqueue all vertices with in-degree 0.
  3. Dequeue a vertex, output it, decrement the in-degree of its neighbors, enqueue any neighbor whose in-degree reaches 0.
  4. If fewer than V vertices were output, the graph contains a cycle.
Time: O(V + E).
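The four steps above can be sketched as follows (illustrative function name; vertices are 0..n−1):

```python
from collections import deque

def topo_sort(n, edges):
    # Kahn's algorithm: repeatedly output vertices whose in-degree is 0.
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(i for i in range(n) if indeg[i] == 0)   # step 2
    order = []
    while q:                                           # step 3
        u = q.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order if len(order) == n else None          # step 4: None => cycle

assert topo_sort(4, [(0, 1), (0, 2), (1, 3), (2, 3)]) == [0, 1, 2, 3]
assert topo_sort(2, [(0, 1), (1, 0)]) is None          # cycle detected
```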