What is a B+ tree and why does MySQL use it for indexes?
A B+ tree is a self-balancing, ordered tree where:
- Internal (non-leaf) nodes store only index keys and child pointers — no actual row data.
- Leaf nodes store the full data (or primary key values for secondary indexes) and are linked in a doubly-linked list in sorted order.
Why not a hash index? A hash index supports only exact-match lookups; it cannot serve range queries (e.g., WHERE age BETWEEN 20 AND 30) or ordered scans. MySQL's Memory engine uses hash indexes, but InnoDB requires range support.

Why not a binary search tree or red-black tree? These trees are tall (O(log₂ n) height), and each level typically requires a disk I/O. A red-black tree with 1 million rows has roughly 20 levels, meaning 20 disk reads per lookup.

Why a B+ tree over a B-tree? A B-tree stores data in every node, which reduces the number of keys that fit per page. A B+ tree stores data only in leaf nodes, so internal nodes are dense with keys, making the tree dramatically shorter: typically 3–4 levels for hundreds of millions of rows. InnoDB reads 16 KB per disk I/O, so more keys per page means fewer I/Os per lookup.

Range queries: because leaf nodes form a linked list sorted by key, a range scan only needs to find the first matching leaf node and then follow the list: a single disk seek followed by sequential reads.
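The page-size arithmetic can be checked with a quick back-of-the-envelope calculation. The key and pointer sizes below are illustrative assumptions (an 8-byte BIGINT key plus a 6-byte child-page pointer):

```go
package main

import "fmt"

func main() {
	const pageSize = 16 * 1024 // InnoDB default page size: 16 KB

	// Illustrative assumptions: an 8-byte BIGINT key plus a
	// 6-byte child-page pointer per entry in an internal node.
	const keySize, ptrSize = 8, 6
	fanout := pageSize / (keySize + ptrSize) // entries per internal page

	// Assume roughly 1 KB per row, so ~16 rows per 16 KB leaf page.
	rowsPerLeaf := pageSize / 1024

	// A 3-level tree: root -> internal level -> leaves.
	capacity := fanout * fanout * rowsPerLeaf
	fmt.Printf("fanout=%d, 3-level capacity=%d rows\n", fanout, capacity)
}
```

With these assumptions, three levels already cover about 21.9 million rows; a fourth level multiplies that by the fanout again, which is why 3–4 levels suffice for hundreds of millions of rows.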
What is the difference between clustered and secondary indexes?
Clustered index (primary key index in InnoDB):
- The B+ tree leaf nodes contain the complete row data.
- There is exactly one clustered index per table — by default, InnoDB uses the primary key.
- Looking up a row by primary key requires traversing only one B+ tree to reach the data.
- Rows are physically stored in primary key order, so range scans on the primary key are sequential disk reads.
Secondary index (any non-primary index):
- The B+ tree leaf nodes contain the indexed column value(s) plus the primary key value, not the full row.
- A query that uses a secondary index typically requires two B+ tree traversals: first through the secondary index to find the primary key, then through the clustered index (primary key index) to fetch the full row. This second traversal is called a row lookup (often described as "going back to the table").
- Exception: covering indexes. If all columns a query needs exist in the secondary index itself, MySQL can return the result without the row lookup. This is a significant performance optimization.
Example: for SELECT id FROM users WHERE age = 30 AND sex = 'M' with a composite secondary index on (age, sex), the query is covered by the index (id is already stored in the leaf node), so no row lookup is needed.

MyISAM: MyISAM uses non-clustered indexes for everything, including primary keys. All index leaf nodes store a physical row address (a pointer to the disk location). Both primary and secondary index lookups require following the pointer to fetch the actual row.
Explain MVCC and how it enables repeatable reads
MVCC (Multi-Version Concurrency Control) is InnoDB's lock-free mechanism for implementing read isolation. Instead of blocking reads with write locks, InnoDB keeps multiple versions of each row and serves reads from the appropriate historical version.

Three hidden columns on every InnoDB row:
- DB_TRX_ID: the ID of the last transaction that modified this row.
- DB_ROLL_PTR: a pointer to the previous version of this row in the undo log.
- DB_ROW_ID: an implicit auto-increment ID used when no primary key is defined.
Version chain: each update copies the old row version into the undo log and points the new version's DB_ROLL_PTR at it. This creates a version chain from newest to oldest. The chain is maintained per row, not per transaction.

Read View: when a transaction executes its first SELECT (under Repeatable Read), InnoDB creates a Read View snapshot containing:
- ids: the list of all currently active (uncommitted) transaction IDs.
- min_trx_id: the smallest active transaction ID.
- max_trx_id: the ID the next new transaction would receive.
- creator_trx_id: this transaction's own ID.
Visibility check: to decide whether a row version with a given DB_TRX_ID is visible to the current Read View:
- If DB_TRX_ID < min_trx_id → the transaction committed before this Read View was created → visible.
- If DB_TRX_ID >= max_trx_id → the transaction started after the Read View → not visible.
- If DB_TRX_ID is in ids → the transaction is still active → not visible.
- Otherwise (committed between min_trx_id and max_trx_id) → visible.

If a version is not visible, InnoDB follows DB_ROLL_PTR to the previous version and repeats the check.

How this enables Repeatable Read: under Repeatable Read, the Read View is created once at the transaction's first read and reused for all subsequent reads. Even if another transaction commits a change during your transaction, your Read View was taken before that commit, so you continue reading the old version. The same query always returns the same result.

Under Read Committed, a new Read View is created for every SELECT, so you see the latest committed data each time.
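The visibility rules can be sketched as a small function. This is a simplified model: field names are illustrative, and the real InnoDB check additionally treats the transaction's own changes as visible:

```go
package main

import "fmt"

// ReadView is a simplified model of InnoDB's read view snapshot.
type ReadView struct {
	ids      map[uint64]bool // active (uncommitted) transaction IDs
	minTrxID uint64          // smallest active transaction ID
	maxTrxID uint64          // ID the next new transaction would get
}

// Visible reports whether a row version written by trxID can be seen
// through this read view, following the rules above.
func (rv ReadView) Visible(trxID uint64) bool {
	switch {
	case trxID < rv.minTrxID:
		return true // committed before the view was created
	case trxID >= rv.maxTrxID:
		return false // started after the view was created
	case rv.ids[trxID]:
		return false // still active when the view was created
	default:
		return true // committed between min and max
	}
}

func main() {
	rv := ReadView{ids: map[uint64]bool{100: true}, minTrxID: 100, maxTrxID: 103}
	fmt.Println(rv.Visible(99))  // true: committed before the view
	fmt.Println(rv.Visible(100)) // false: still active
	fmt.Println(rv.Visible(101)) // true: committed within the window
	fmt.Println(rv.Visible(103)) // false: started after the view
}
```

When Visible returns false, the real engine would dereference DB_ROLL_PTR and re-run the same check on the older version.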
What are the 4 transaction isolation levels and their tradeoffs?
SQL defines four isolation levels, each preventing a different class of concurrency anomaly.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Highest |
| Read Committed | Prevented | Possible | Possible | High |
| Repeatable Read | Prevented | Prevented | Possible* | Medium |
| Serializable | Prevented | Prevented | Prevented | Lowest |

*InnoDB prevents phantom reads under Repeatable Read using gap locks in addition to MVCC.

Read Uncommitted: transactions can read uncommitted changes from other transactions. This allows dirty reads, i.e., reading data that may be rolled back. Almost never used in practice.

Read Committed: a transaction only sees data committed before each of its SELECT statements. This prevents dirty reads but allows non-repeatable reads: re-reading the same row within a transaction can return different data if another transaction committed a change in between. The default level in PostgreSQL and Oracle.

Repeatable Read: a transaction sees a consistent snapshot taken at the start of the transaction. Re-reading any row always returns the same result, which prevents non-repeatable reads. The MySQL InnoDB default level. Phantom reads (new rows appearing in a range query) are theoretically possible, but InnoDB's gap locks largely prevent them.

Serializable: transactions execute as if they were fully sequential. All reads are blocking reads (shared locks). Maximum isolation but also the lowest concurrency. Use only when strict correctness requirements outweigh performance.

MySQL InnoDB implements Read Committed and Repeatable Read using MVCC (no locking for reads). Serializable adds shared locks on all reads.
What is the difference between binlog, redo log, and undo log?
MySQL uses three distinct logs, each serving a different purpose at a different layer.

Undo log (InnoDB engine layer):
- Records the before-image of data before a write operation.
- Purpose 1 — transaction rollback: If a transaction aborts, InnoDB uses the undo log to restore the previous values.
- Purpose 2 — MVCC: Other transactions read historical row versions from the undo log chain when their Read View predates the current row version.
- Generated at transaction start, before the write.
Redo log (InnoDB engine layer):
- Records the after-image of data modifications, i.e., the changes that were applied.
- Purpose — crash recovery and durability: InnoDB writes changes to the redo log before updating the buffer pool page on disk. If the database crashes before the dirty page is flushed, InnoDB replays the redo log on restart to recover committed transactions. This is the WAL (Write-Ahead Logging) pattern.
- The redo log is a fixed-size circular file. A write pointer and a checkpoint pointer track what has been written vs. what has been flushed to data files.
- Generated during the transaction as changes happen.
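The write-ahead idea behind the redo log can be sketched as a toy log-then-apply loop. This illustrates the WAL pattern only, not InnoDB's actual on-disk format:

```go
package main

import "fmt"

// record is one redo entry: apply value to page key (a toy after-image).
type record struct {
	key   string
	value int
}

// db holds a durable log and volatile "pages" (a buffer-pool stand-in).
type db struct {
	log   []record       // appended (and fsynced) before pages change
	pages map[string]int // in-memory state; may be lost on crash
}

// set appends to the log first (write-ahead), then applies the change.
func (d *db) set(key string, value int) {
	d.log = append(d.log, record{key, value}) // 1. log the change
	d.pages[key] = value                      // 2. apply it to the page
}

// recover rebuilds pages by replaying the log, as after a crash.
func (d *db) recover() {
	d.pages = map[string]int{}
	for _, r := range d.log {
		d.pages[r.key] = r.value
	}
}

func main() {
	d := &db{pages: map[string]int{}}
	d.set("a", 1)
	d.set("a", 2)
	d.pages = nil // simulate losing volatile state in a crash
	d.recover()
	fmt.Println(d.pages["a"]) // 2: committed state recovered from the log
}
```

Because every change reaches the log before the page, replaying the log after a crash always reconstructs the committed state; the real redo log adds checkpoints so only the unflushed tail needs replaying.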
Binlog (MySQL Server layer):
- Records every DDL and DML operation that modifies data, in statement or row-image format.
- Purpose 1 — replication: The primary writes binlog; replicas consume it to stay in sync.
- Purpose 2 — point-in-time recovery: You can replay binlog on top of a backup to restore data to any point in time.
- Written after the transaction commits, in a separate write from the redo log.
| Log | Layer | Purpose | When Written |
|---|---|---|---|
| Undo log | InnoDB | Rollback + MVCC | Before the write |
| Redo log | InnoDB | Crash recovery | During transaction |
| Binlog | MySQL Server | Replication + backup | After commit |
Why is Redis fast? (5 reasons)
Redis routinely handles hundreds of thousands of operations per second on a single instance. Five design decisions explain why:

1. In-memory storage. All data lives in RAM. Memory access is roughly 100,000× faster than a random SSD read. Redis never waits for a disk seek to serve a read or write.
2. Efficient data structures. Redis implements each data type with a structure optimized for its access patterns: dynamic strings for values, skip lists for sorted sets, hash tables for hashes and sets, linked lists for lists. Operations like ZADD, HGET, and LPUSH are O(1) or O(log n) by design.
3. Single-threaded command processing. Redis processes commands in a single thread, eliminating lock contention and context-switching overhead. There are no deadlocks to resolve, no thread synchronization primitives, and no wasted CPU cycles from context switches. (Redis 6.0+ uses threads for I/O, but command execution remains single-threaded.)
4. Non-blocking I/O with multiplexing. Redis uses an event loop (epoll on Linux, kqueue on macOS) to handle thousands of client connections concurrently without spawning a thread per client. A single thread monitors all connections and processes ready events. This avoids the memory and scheduling overhead of thousands of threads.
5. Efficient memory management. Redis uses its own allocator (jemalloc by default) and handles key expiration proactively: expired keys are deleted lazily (on access) and periodically (background scan). This avoids GC pauses and keeps the working set lean.
What are Redis's data types and when would you use each?
Redis provides five primary data types plus specialized structures for specific use cases.

String: the most basic type; stores any binary-safe value up to 512 MB. Use it for simple caching, counters (INCR), feature flags, and rate limiting. Commands: SET, GET, INCR, EXPIRE.

Hash: a map of string fields to string values. Ideal for representing objects with multiple attributes (user profiles, session data) where you want to read or update individual fields without fetching the entire object. More memory-efficient than storing a JSON string when fields are accessed individually. Commands: HSET, HGET, HGETALL, HDEL.

List: an ordered sequence of strings, implemented as a doubly-linked list. Use it for task queues (LPUSH + BRPOP for a blocking consumer), activity feeds, or any FIFO/LIFO structure. Commands: LPUSH, RPUSH, BRPOP, LRANGE.

Set: an unordered collection of unique strings. Use it for tracking unique visitors, user tags, or computing intersections/unions across sets (SINTERSTORE, SUNIONSTORE). Add, remove, and membership checks are O(1). Commands: SADD, SMEMBERS, SISMEMBER, SINTER.

Sorted Set (ZSet): a set where every member has a floating-point score, and members are stored in score order. Use it for leaderboards, priority queues, and time-series data keyed by timestamp. Commands: ZADD, ZREVRANGE, ZRANGEBYSCORE.

Stream (Redis 5.0+): an append-only log structure with consumer group support, acknowledgment, and message replay. Use it as a lightweight message queue for reliable event processing.

Pub/Sub: a fire-and-forget broadcast channel. Messages are delivered to all active subscribers but are not persisted. Use it for real-time notifications where losing a message during downtime is acceptable.
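Sorted-set leaderboard semantics can be modeled in a few lines. This is a toy in-memory stand-in for ZADD/ZREVRANGE, not a Redis client:

```go
package main

import (
	"fmt"
	"sort"
)

// leaderboard mimics a sorted set: member -> score, read back by score.
type leaderboard map[string]float64

// zadd sets (or updates) a member's score, like ZADD.
func (l leaderboard) zadd(member string, score float64) { l[member] = score }

// zrevrange returns all members by descending score, like ZREVRANGE key 0 -1.
func (l leaderboard) zrevrange() []string {
	members := make([]string, 0, len(l))
	for m := range l {
		members = append(members, m)
	}
	sort.Slice(members, func(i, j int) bool {
		return l[members[i]] > l[members[j]]
	})
	return members
}

func main() {
	lb := leaderboard{}
	lb.zadd("alice", 120)
	lb.zadd("bob", 95)
	lb.zadd("carol", 150)
	fmt.Println(lb.zrevrange()) // [carol alice bob]
}
```

The real structure keeps this ordering incrementally (skip list plus hash table), so reads never pay a full sort the way this sketch does.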
RDB vs AOF persistence: which should you use?
Redis offers two persistence mechanisms with different durability/performance tradeoffs, plus a hybrid mode.

RDB (Redis Database Snapshot):
- Periodically writes a full binary snapshot of in-memory data to disk.
- Snapshots are generated by BGSAVE, which forks a child process. The child writes the snapshot while the parent continues serving requests, using copy-on-write to avoid blocking.
- Advantages: fast restart (loading a binary file is much faster than replaying commands), compact file size, minimal runtime overhead.
- Disadvantages: You lose data written since the last snapshot on crash. If snapshots run every 5 minutes, up to 5 minutes of writes are lost.
AOF (Append-Only File):
- Appends every write command to a log file after it executes. On restart, Redis replays the log to reconstruct the dataset.
- Three fsync policies: always (fsync every command; safest, slowest), everysec (fsync once per second; at most 1 second of data loss), no (the OS decides; fastest, most data at risk).
- Advantages: much lower data loss, at most 1 second with everysec.
- Disadvantages: larger files, slower restarts for large datasets, and AOF rewrite (compaction) adds periodic I/O load.
Mixed persistence (Redis 4.0+):
- The AOF file begins with an RDB snapshot (for fast loading) followed by the AOF commands recorded since the snapshot. Combines fast startup with low data loss.
- Recommended for most production workloads.
Which to choose:
- If you can tolerate some data loss and need fast restarts: RDB only.
- If data durability matters and you can accept slightly slower restarts: AOF with everysec.
- For production systems where both matter: mixed persistence.
- For pure caching (data can be rebuilt from source): disable persistence entirely for maximum performance.
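As a sketch, the mixed-persistence setup above maps to a redis.conf fragment like this (directive names per the standard redis.conf; the save thresholds are illustrative and should be tuned to your workload):

```
# RDB: snapshot if >=1 key changed in 900s, >=10 in 300s, >=10000 in 60s
save 900 1
save 300 10
save 60 10000

# AOF with per-second fsync
appendonly yes
appendfsync everysec

# Mixed persistence: RDB preamble inside the AOF file (Redis 4.0+)
aof-use-rdb-preamble yes
```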
How do you handle cache penetration, cache breakdown, and cache avalanche?
These three failure modes describe different ways a caching layer can be bypassed under load, sending traffic directly to your database.

Cache penetration: requests for keys that exist in neither the cache nor the database. Attackers or bugs generate random or non-existent keys; every request misses the cache and hits the database.

Solutions:
- Cache null values: When the database returns no result, store a null value in the cache with a short TTL. Subsequent requests for the same key are served from cache.
- Bloom filter: Add a Bloom filter in front of the cache. If the filter says a key definitely does not exist, return immediately without checking cache or database. The Bloom filter has a small false-positive rate but zero false negatives — if it says “no”, the key truly doesn’t exist.
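A Bloom filter can be sketched with double hashing over FNV. This is a minimal illustration; production systems typically use a library with a tuned bit-array size and hash count:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter: k derived bit positions per key.
type bloom struct {
	bits []bool
	k    int
}

func newBloom(m, k int) *bloom { return &bloom{bits: make([]bool, m), k: k} }

// positions derives k bit positions via double hashing: h1 + i*h2.
func (b *bloom) positions(key string) []int {
	h := fnv.New64a()
	h.Write([]byte(key))
	sum := h.Sum64()
	h1, h2 := sum>>32, sum&0xffffffff|1 // |1 keeps h2 odd (non-zero)
	out := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		out[i] = int((h1 + uint64(i)*h2) % uint64(len(b.bits)))
	}
	return out
}

// Add marks all k positions for the key.
func (b *bloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// MayContain is false only when the key was definitely never added;
// true may occasionally be a false positive.
func (b *bloom) MayContain(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := newBloom(1<<16, 4)
	f.Add("user:42")
	fmt.Println(f.MayContain("user:42"))      // true: was added
	fmt.Println(f.MayContain("user:missing")) // almost certainly false
}
```

In the cache-penetration setup, the filter is populated with every valid key at startup (and on insert); a "definitely not present" answer short-circuits both the cache and the database.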
Cache breakdown: a single hot key expires while many concurrent requests are reading it; all of them miss the cache at once and stampede the database.

Solutions:
- Mutex / distributed lock: only one request is allowed to reload the key from the database; others wait or return a stale value. In Go, singleflight is the standard tool for this pattern.
- Never-expire hot keys: for keys that must always be available, set no TTL and invalidate manually when the underlying data changes.
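The mutex idea can be sketched as a minimal singleflight-style deduplicator. This is a toy version of the pattern behind golang.org/x/sync/singleflight; use the real package in production:

```go
package main

import (
	"fmt"
	"sync"
)

// call tracks one in-flight load; later callers wait on wg and share val.
type call struct {
	wg  sync.WaitGroup
	val string
}

// group deduplicates concurrent loads of the same key.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key among concurrent callers; the rest block and
// receive the same result instead of hitting the database themselves.
func (g *group) Do(key string, fn func() string) string {
	g.mu.Lock()
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // another caller is already loading this key
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val = fn() // only this caller queries the database
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val
}

func main() {
	g := &group{calls: map[string]*call{}}
	var wg sync.WaitGroup
	results := make([]string, 10)
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Ten concurrent requests for the same hot key share one load.
			results[i] = g.Do("hot-key", func() string { return "value" })
		}(i)
	}
	wg.Wait()
	fmt.Println(results[0])
}
```

A distributed variant replaces the in-process mutex with a Redis lock (e.g., SET key value NX PX ttl) so that only one instance across the fleet reloads the key.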
Cache avalanche: a large number of keys expire at the same moment (or the cache itself goes down), so a flood of requests hits the database simultaneously.

Solutions:
- Jitter on TTLs: add a random offset to each key's TTL (e.g., base TTL ± 1–5 minutes) so keys expire at different times rather than all at once.
- Circuit breaker: if the database is overwhelmed, return a degraded response rather than queueing more requests that will time out.
- Redis high availability: use Redis Sentinel or Cluster to prevent a Redis failure from being the trigger for a full avalanche.
- Read replicas and database sharding: distribute database load so a traffic spike is survivable.
Redis List vs Pub/Sub vs Stream for message queuing: tradeoffs?
Redis offers three mechanisms for message queuing, each with different reliability characteristics.

List (LPUSH + BRPOP):
- The producer pushes to one end with LPUSH; the consumer blocks and pops from the other with BRPOP.
- Messages are delivered to exactly one consumer (first come, first served).
- No message acknowledgment — once a message is popped, it is gone. If the consumer crashes before processing, the message is lost.
- No consumer groups, no replay, no persistence beyond Redis’s own persistence.
- Use when: You want a simple, low-latency work queue with one consumer type and can accept at-most-once delivery.
Pub/Sub (PUBLISH + SUBSCRIBE):
- The producer publishes to a channel; all active subscribers receive a copy simultaneously.
- Messages are delivered to all current subscribers (fan-out). Subscribers not connected at publish time receive nothing — no message persistence.
- No acknowledgment, no replay, no consumer groups.
- Use when: You need fire-and-forget broadcast to multiple recipients and losing messages during downtime is acceptable. Real-time notifications, live dashboards, chat room broadcasts.
Stream (XADD + XREADGROUP, Redis 5.0+):
- An append-only log with auto-generated IDs (timestamp + sequence). Messages are persisted and replayable.
- Supports consumer groups: a message is delivered to one consumer within a group, and the consumer must acknowledge receipt with XACK. Unacknowledged messages stay in a pending list and can be claimed by other consumers.
- Supports multiple independent consumer groups, each maintaining its own position in the stream.
- Use when: You need reliable at-least-once delivery, consumer group semantics, message replay, or are building a lightweight event bus. A closer approximation to Kafka than List or Pub/Sub.
| Feature | List | Pub/Sub | Stream |
|---|---|---|---|
| Delivery | One consumer | All active subscribers | One per group |
| Persistence | Redis persistence only | None | Yes |
| Acknowledgment | No | No | Yes (XACK) |
| Replay | No | No | Yes |
| Consumer groups | No | No | Yes |
| Best for | Simple queues | Broadcast | Reliable event processing |