> ## Documentation Index
> Fetch the complete documentation index at: https://docs.pebchip.top/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# Core Data Structures: Arrays, Trees, Graphs, and Hash Tables

> Reference guide for essential data structures including linked lists, binary trees, heaps, hash tables, and graphs with complexity analysis.

The data structures you choose directly determine the performance characteristics of your software. Picking a hash table instead of a sorted array for membership queries, or a B+ tree instead of a binary search tree for range scans on disk, are decisions with measurable impact. This page provides a concise reference for the most important data structures—what they are, how they work, and their time and space trade-offs—so you can make those choices deliberately.

## Linear Structures

Linear structures store elements in a sequence. The trade-offs between them come down to where you spend time: at the ends, in the middle, or on memory locality.

| Structure                | Access     | Search | Insert (head/tail)                         | Insert (middle) | Delete         | Space                      |
| ------------------------ | ---------- | ------ | ------------------------------------------ | --------------- | -------------- | -------------------------- |
| **Array**                | O(1)       | O(n)   | O(n) (shift)                               | O(n)            | O(n)           | O(n)                       |
| **Linked list (singly)** | O(n)       | O(n)   | O(1) (head) / O(n) (tail without tail ptr) | O(1) with ptr   | O(1) with ptr  | O(n) + pointer overhead    |
| **Doubly linked list**   | O(n)       | O(n)   | O(1) (head and tail)                       | O(1) with ptr   | O(1) with ptr  | O(n) + 2× pointer overhead |
| **Stack**                | O(1) top   | O(n)   | O(1) push                                  | —               | O(1) pop       | O(n)                       |
| **Queue**                | O(1) front | O(n)   | O(1) enqueue                               | —               | O(1) dequeue   | O(n)                       |
| **Deque**                | O(1) ends  | O(n)   | O(1) both ends                             | —               | O(1) both ends | O(n)                       |

**Arrays** excel when you need cache-friendly sequential access or constant-time random access by index. Every element is contiguous in memory, so the CPU prefetcher is effective.

**Linked lists** excel when you frequently insert or delete from arbitrary positions and you already have a pointer to the location—no shifting required. The cost is pointer indirection (cache unfriendly) and per-node allocation overhead.

**Stacks** (LIFO) appear in function call management, expression evaluation, DFS, and undo/redo systems.

**Queues** (FIFO) appear in BFS, task scheduling, and producer-consumer pipelines.

***

## Trees

Trees are hierarchical structures where each node has zero or more children. The most important variants differ in their balance guarantees and therefore their worst-case performance.

### Binary Tree

A tree where each node has at most two children (left and right). Without balance constraints, a binary tree degenerates to a linked list in the worst case (O(n) search).

**Full binary tree**: every node has 0 or 2 children.\
**Complete binary tree**: all levels are fully filled except possibly the last, which is filled left to right.\
**Perfect binary tree**: all levels are fully filled.

### Binary Search Tree (BST)

A binary tree satisfying the **BST property**: for every node, all values in the left subtree are smaller and all values in the right subtree are larger.

| Operation | Average  | Worst case |
| --------- | -------- | ---------- |
| Search    | O(log n) | O(n)       |
| Insert    | O(log n) | O(n)       |
| Delete    | O(log n) | O(n)       |

The worst case occurs when elements are inserted in sorted order, producing a degenerate chain. Self-balancing BSTs fix this.

### AVL Tree

An AVL tree maintains the **balance factor** (height of left subtree − height of right subtree) at every node within {-1, 0, 1}. After an insertion or deletion that violates this invariant, the tree performs **rotations** to restore balance.

* Guaranteed O(log n) height.
* Stricter balance than red-black trees → faster lookups.
* More rotations on insert/delete → slower writes.

AVL trees are preferred when reads dominate (e.g., in-memory lookup tables).

### Red-Black Tree

A red-black tree is a self-balancing BST where each node is colored red or black and four invariants are maintained:

1. Every node is red or black.
2. The root is black.
3. No two adjacent red nodes (red node cannot have a red parent or red child).
4. Every path from a node to its descendant null pointers has the same number of black nodes.

These rules guarantee that the longest path is at most twice the shortest, bounding height at O(log n).

Red-black trees require fewer rotations than AVL trees on insert/delete, making them better for write-heavy workloads. They are the backing structure for `std::map`, `std::set` (C++ STL), `TreeMap` (Java), and `epoll`'s internal socket tracking in the Linux kernel.

### B+ Tree

A B+ tree is a multi-way balanced tree where every node can hold multiple keys (up to order *m*). All data records are stored in **leaf nodes**, which are linked in a doubly-linked list for efficient range scans. Internal nodes store only keys (routing information).

Properties:

* Tree height is O(log\_m n) — very shallow for large *m*.
* Each node fits in one disk page (typically 4 KB or 16 KB), minimizing I/O.
* Sequential range scans traverse the leaf linked list, not the tree.

B+ trees are the standard index structure for relational databases. **MySQL InnoDB** uses a B+ tree for both primary (clustered) indexes and secondary indexes. The shallow height means a query touching millions of rows typically reads only 3–4 pages.

***

## Heaps and Priority Queues

A **heap** is a complete binary tree stored implicitly in an array that satisfies the heap property:

* **Max-heap**: every parent is ≥ its children. Root holds the maximum.
* **Min-heap**: every parent is ≤ its children. Root holds the minimum.

### Array Representation

For a node at index `i` (1-indexed):

* Left child: `2i`
* Right child: `2i+1`
* Parent: `⌊i/2⌋`

No pointers are needed; the structure is implicit in the array layout.

### Operations

| Operation                       | Time     |
| ------------------------------- | -------- |
| Find min/max (peek)             | O(1)     |
| Insert                          | O(log n) |
| Extract min/max                 | O(log n) |
| Build heap from array (heapify) | O(n)     |
| Decrease/increase key           | O(log n) |

**Heapify** builds a valid heap in O(n) by calling sift-down from the last non-leaf node upward—faster than inserting n elements one by one (O(n log n)).

### Use Cases

* **Priority queues**: task schedulers, Dijkstra's shortest path, Prim's MST.
* **Heap sort**: in-place O(n log n) sort.
* **K-th largest/smallest**: maintain a min-heap of size k; O(n log k).
* **Median maintenance**: two heaps (max-heap for lower half, min-heap for upper half).

***

## Hash Tables

A hash table maps keys to values in O(1) average time by using a **hash function** to compute an array index from a key.

### Hash Functions

A good hash function distributes keys uniformly across the bucket array, minimizing collisions. Common approaches include polynomial rolling hash for strings, and bit-mixing algorithms (FNV-1a, MurmurHash3, xxHash) for general data. Cryptographic hash functions (SHA-256) are too slow for hash tables.

### Collision Resolution

**Separate chaining**: each bucket holds a linked list (or another structure) of entries that hash to it. Lookup walks the list. Works well at high load factors but adds pointer overhead.

**Open addressing**: all entries live in the array itself. On a collision, probe alternative slots:

* *Linear probing*: check `i+1, i+2, …` — simple but suffers clustering.
* *Quadratic probing*: check `i+1, i+4, i+9, …` — reduces primary clustering.
* *Double hashing*: use a second hash function for the step size — best distribution.

### Load Factor

`load factor α = n / m` where `n` is the number of entries and `m` is the number of buckets.

* At α \< 0.7, separate chaining and open addressing both perform well.
* Above α ≈ 0.75 (Java HashMap's threshold), performance degrades. The table **rehashes** by doubling capacity and reinserting all entries — O(n) amortized.

| Aspect             | Separate Chaining      | Open Addressing            |
| ------------------ | ---------------------- | -------------------------- |
| Memory per entry   | Key + value + pointer  | Key + value only           |
| Cache friendliness | Poor (pointer chasing) | Good (array locality)      |
| Deletion           | Easy (unlink node)     | Requires tombstone markers |
| High load factor   | Degrades gracefully    | Degrades quickly           |

Real-world hash maps: C++ `std::unordered_map` uses separate chaining. Rust's `HashMap` uses Robin Hood open addressing. Python `dict` uses a custom open-addressing scheme.

***

## Graphs

A **graph** G = (V, E) consists of a set of vertices V and edges E. Graphs may be directed or undirected, weighted or unweighted.

### Representation

| Representation       | Space    | Edge lookup | Enumerate neighbors | Best for      |
| -------------------- | -------- | ----------- | ------------------- | ------------- |
| **Adjacency matrix** | O(V²)    | O(1)        | O(V)                | Dense graphs  |
| **Adjacency list**   | O(V + E) | O(degree)   | O(degree)           | Sparse graphs |

Most real-world graphs are sparse, so adjacency lists (typically `vector<vector<pair<int,int>>>` or a linked-forward-star) are the standard choice in competitive programming and production graph libraries.

### BFS and DFS

**BFS (Breadth-First Search)** explores layer by layer using a queue. It finds the shortest path (by hop count) in an unweighted graph and is the basis for multi-source shortest path and bipartite checking.

**DFS (Depth-First Search)** explores as deep as possible before backtracking, using a stack (implicit via recursion). It underlies topological sort, cycle detection, SCC (Tarjan's / Kosaraju's), and many tree-based DP patterns.

Both run in O(V + E).

### Shortest Path Algorithms

| Algorithm                  | Graph type             | Time                      | Notes                                |
| -------------------------- | ---------------------- | ------------------------- | ------------------------------------ |
| **BFS**                    | Unweighted             | O(V + E)                  | Optimal hop count                    |
| **Dijkstra** (binary heap) | Non-negative weights   | O((V + E) log V)          | Greedy; fails on negative edges      |
| **Bellman-Ford**           | Any weights            | O(VE)                     | Detects negative cycles              |
| **SPFA**                   | Any weights            | O(VE) worst, often faster | Bellman-Ford with queue optimization |
| **Floyd-Warshall**         | Any weights, all pairs | O(V³)                     | DP over intermediate nodes           |

**Dijkstra** uses a greedy strategy: always relax the unvisited node with the smallest known distance. With a priority queue (min-heap), it runs in O((V + E) log V).

**Floyd-Warshall** computes all-pairs shortest paths using a simple DP recurrence:

```
f[k][x][y] = min(f[k-1][x][y], f[k-1][x][k] + f[k-1][k][y])
```

Because the first dimension has no effect on the final answer, you compress it:

```
f[x][y] = min(f[x][y], f[x][k] + f[k][y])   // for each k in 1..V
```

**Bellman-Ford** relaxes all edges V-1 times. If a further relaxation is still possible in iteration V, a negative cycle exists.

### Topological Sort

Topological sort orders vertices of a **DAG** (directed acyclic graph) such that for every edge u → v, u appears before v. It is the basis for build systems, dependency resolution, and course prerequisite ordering.

The standard BFS-based approach (Kahn's algorithm):

1. Compute the in-degree of every vertex.
2. Enqueue all vertices with in-degree 0.
3. Dequeue a vertex, output it, decrement the in-degree of its neighbors, enqueue any neighbor whose in-degree reaches 0.
4. If fewer than V vertices were output, the graph contains a cycle.

Time: O(V + E).