Processes vs Threads
A process is an independent execution unit with its own virtual address space, file descriptor table, and OS-level resources. A thread is a lightweight execution unit that shares the address space and resources of its parent process.

| Property | Process | Thread |
|---|---|---|
| Memory | Private address space | Shared with other threads in the same process |
| File descriptors | Private (inherited at fork, then diverge) | Shared |
| Communication | IPC (pipes, sockets, shared memory, signals) | Shared memory, global variables |
| Context switch cost | High — must switch page tables, TLB flush | Lower — only registers and stack pointer change |
| Crash isolation | A crashing process does not corrupt other processes | A crashing thread can corrupt the entire process |
| Creation cost | High (fork + exec) | Lower (pthread_create or std::thread) |
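The memory row of the table can be demonstrated directly. The sketch below uses Python's threading and multiprocessing modules as stand-ins for pthread_create and fork; the `bump` helper and `box` dict are illustrative names, not part of any real API. A thread's write to shared state is visible to its parent, while a child process mutates only its own copy of the address space.

```python
import threading
import multiprocessing

def bump(box):
    box["n"] += 1

def thread_vs_process():
    box = {"n": 0}

    # A thread runs in the same address space, so its write is visible here.
    t = threading.Thread(target=bump, args=(box,))
    t.start(); t.join()
    after_thread = box["n"]            # 1

    # A child process gets its own address space (copy-on-write under fork,
    # a fresh interpreter under spawn); its increment never reaches this dict.
    p = multiprocessing.Process(target=bump, args=(box,))
    p.start(); p.join()
    after_process = box["n"]           # still 1

    return after_thread, after_process

if __name__ == "__main__":
    print(thread_vs_process())         # (1, 1)
```

The same experiment in C (a global incremented after `fork()` vs after `pthread_create()`) shows the identical split: the forked child's increment is invisible to the parent.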
When to Use Each
Use separate processes when you need strong isolation—for example, sandboxed plugins, multi-tenant workers, or any scenario where one failure must not bring down the rest of the application. Use threads within a process when tasks share state naturally (e.g., a web server handling concurrent requests against the same in-memory cache) and when the context-switch overhead of processes would be prohibitive.

Context Switching Cost
During a context switch the OS saves the current execution context (registers, program counter, stack pointer) and loads the saved context of the next process or thread. For threads within the same process, the virtual address space does not change, so the expensive TLB flush and page-table swap are avoided. That said, even thread context switches carry overhead from saving/restoring registers and flushing CPU caches, which accumulates under high thread counts.

Inter-Process Communication
Linux provides several IPC mechanisms:

| Mechanism | Description | Suitable for |
|---|---|---|
| Anonymous pipe | Byte stream between parent and child via fork | Parent-child communication |
| Named pipe (FIFO) | Named file in the filesystem; any process can open it | Unrelated processes |
| Message queue | Kernel-managed message list; structured data; lifetime tied to kernel | Asynchronous, structured IPC |
| Shared memory | Two processes map the same physical pages into their address spaces | High-throughput IPC (fastest mechanism) |
| Semaphore | Integer counter for mutual exclusion and synchronization (P/V operations) | Protecting shared memory |
| Signal | Asynchronous notification (e.g., SIGINT, SIGKILL) | Exception handling, process control |
| Socket | Full-duplex byte stream; works cross-host | Any two processes, including across the network |
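As a concrete example of the first row, the sketch below sends a request over a pipe to a child process and reads the reply. It uses Python's multiprocessing.Pipe, which layers message framing (pickled objects) on top of an OS-level pipe or socket pair, so it is a stand-in for the raw pipe(2)/fork pattern rather than a literal translation; `worker` and `pipe_demo` are illustrative names.

```python
from multiprocessing import Pipe, Process

def worker(conn):
    # Child: receive one request over its end of the pipe and reply.
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

def pipe_demo():
    parent_end, child_end = Pipe()            # full-duplex pipe pair
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send("ping")
    reply = parent_end.recv()                 # blocks until the child answers
    p.join()
    return reply

if __name__ == "__main__":
    print(pipe_demo())                        # PING
```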
Virtual Memory
Every process sees a flat, private virtual address space. On x86-64 Linux, virtual addresses are 48 bits wide, with the lower 128 TiB (2^47 bytes) available to user space. The OS and MMU (Memory Management Unit) transparently map virtual addresses to physical RAM.

Address Spaces and Page Tables
Virtual memory is divided into fixed-size pages (typically 4 KB). Physical RAM is divided into frames of the same size. The OS maintains a page table for each process that maps page numbers to frame numbers. A virtual address is split into a virtual page number (the high-order bits, used to index the page table) and a page offset (the low-order bits, carried unchanged into the physical address).

TLB (Translation Lookaside Buffer)
Walking the page table on every memory access would double or triple memory latency. The TLB is a small, fast hardware cache inside the CPU that stores recent virtual-to-physical translations.

- A TLB hit resolves the translation in 1–2 cycles.
- A TLB miss triggers a page-table walk (multiple memory accesses), performed either by a hardware page-table walker or by the OS.
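The address split described above is plain bit arithmetic, sketched below for 4 KiB pages. The `page_table` dict is a toy single-level stand-in for the real multi-level structure, and the mapping values are made up for illustration.

```python
PAGE_SIZE = 4096                              # 4 KiB pages
OFFSET_BITS = PAGE_SIZE.bit_length() - 1      # 12 offset bits

def split(vaddr):
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def translate(vaddr, page_table):
    """Look up the frame for the page, then splice the offset back in."""
    vpn, offset = split(vaddr)
    frame = page_table[vpn]                   # a real MMU faults on a miss
    return (frame << OFFSET_BITS) | offset

page_table = {0x12: 0x7}                      # toy mapping: page 0x12 -> frame 0x7
print(hex(translate(0x12ABC, page_table)))    # 0x7abc
```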
Page Faults
When a process accesses a virtual page that has no current mapping (the page is not in RAM), the CPU raises a page fault. The OS handles it by:

- Checking whether the access is valid (within the process’s allocated virtual space).
- Finding the page on disk (swap space or a memory-mapped file).
- Allocating a free frame, loading the page, updating the page table.
- Resuming the faulting instruction.
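Demand paging is easiest to observe through a memory-mapped file: mapping costs almost nothing up front, and each page is faulted in on first access. The sketch below uses Python's mmap module as a thin wrapper over mmap(2); `mmap_read` is an illustrative helper name.

```python
import mmap
import os
import tempfile

def mmap_read(path, n):
    """Read the first n bytes of a file through a memory mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # m[:n] is an ordinary memory access into the mapping; any page
            # not yet resident is brought in by a page fault, exactly the
            # sequence of steps listed above.
            return bytes(m[:n])

fd, path = tempfile.mkstemp()
os.write(fd, b"hello page cache")
os.close(fd)
first5 = mmap_read(path, 5)
print(first5)                                 # b'hello'
os.unlink(path)
```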
CPU Scheduling
The scheduler decides which ready process or thread runs next. Every algorithm makes trade-offs between throughput, fairness, response time, and starvation avoidance.

| Algorithm | How it works | Strength | Weakness |
|---|---|---|---|
| FCFS | Run in arrival order; non-preemptive | Simple | Long jobs block short ones (convoy effect) |
| SJF | Run shortest job first; non-preemptive | Optimal average turnaround time | Requires knowing runtime in advance; long jobs may starve |
| STCF / SRT | Preemptive SJF; new arrival can preempt current job | Better response time than SJF | Long jobs can starve indefinitely |
| Round Robin | Each job runs for a fixed time quantum, then is preempted | Fair; no starvation | Average turnaround time can be worse than FCFS |
| Priority | Higher-priority jobs run first | Flexible | Low-priority jobs may starve |
| MLFQ | Multiple queues with decreasing priority; new jobs start high, demote on CPU use | Balances response time and throughput | Complex; gaming by malicious processes |
- Turnaround time = completion time − arrival time
- Response time = first run time − arrival time
- Weighted turnaround = turnaround / run time (normalized fairness)
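The metrics above can be computed by a few lines of simulation. The sketch below runs non-preemptive FCFS over a made-up job list and reproduces the convoy effect from the table: one long job drags every later job's turnaround up to its own.

```python
def fcfs_metrics(jobs):
    """jobs: list of (arrival, burst), sorted by arrival time.
    Returns per-job (turnaround, response) under non-preemptive FCFS."""
    clock, out = 0, []
    for arrival, burst in jobs:
        start = max(clock, arrival)       # wait for the CPU (or for arrival)
        clock = start + burst
        turnaround = clock - arrival      # completion time - arrival time
        response = start - arrival        # first run time - arrival time
        out.append((turnaround, response))
    return out

# Convoy effect: a 100-unit job at t=0 delays two 1-unit jobs behind it.
print(fcfs_metrics([(0, 100), (1, 1), (2, 1)]))
# [(100, 0), (100, 99), (100, 99)]
```

Swapping in an SJF ordering for the same workload is a one-line change (sort the ready jobs by burst) and makes the average turnaround drop sharply, which is the trade-off the table summarizes.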
Synchronization
When threads share mutable state, you need synchronization primitives to prevent data races and enforce ordering.

Mutex (Mutual Exclusion Lock)
A mutex has two states: locked and unlocked. Only one thread can hold the lock at a time; all others block until it is released.

Semaphore
A semaphore is an integer counter supporting two atomic operations:

- P (wait/down): decrement; block if the value would go negative.
- V (signal/up): increment; wake one blocked thread.
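Both primitives appear in the sketch below, using Python's threading module in place of pthread mutexes and POSIX semaphores: a mutex serializes the increment, while a counting semaphore bounds how many threads are inside the section at once. The `work` function and thread count are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()            # mutex: at most one holder
slots = threading.Semaphore(2)     # counting semaphore: at most 2 inside

def work():
    global counter
    with slots:                    # P: decrement, block if already 0
        with lock:                 # lock: only one thread mutates counter
            counter += 1
    # leaving each 'with' block performs the matching V / unlock

threads = [threading.Thread(target=work) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                     # 10 — no lost updates
```

Without the lock, `counter += 1` is a read-modify-write race and the final value could fall short of 10.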
Condition Variables
A condition variable allows a thread to sleep until a specific condition is true, releasing the associated mutex atomically while sleeping.

Spin Lock
A spin lock busy-waits (polls in a loop) instead of sleeping. It is faster than a mutex when the expected wait time is very short (less than a context switch), but wastes CPU cycles if the wait is long. Use spin locks for very short critical sections in kernel or real-time code; prefer mutexes in application code.

Deadlock
A deadlock occurs when a set of processes each wait for a resource held by another member of the set—none can ever proceed. The four Coffman conditions must all hold simultaneously for deadlock to occur:

| Condition | Meaning |
|---|---|
| Mutual exclusion | A resource can be held by only one process at a time |
| Hold and wait | A process holds at least one resource while waiting for another |
| No preemption | Resources cannot be forcibly taken from a process |
| Circular wait | A cycle exists in the resource-allocation graph |
Breaking any one of the four conditions prevents deadlock:

- Eliminate hold-and-wait by requiring processes to request all resources at once.
- Allow preemption for resources that can be safely reclaimed.
- Impose a global ordering on resource types and require processes to request in that order (breaks circular wait).
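The last strategy, a global lock ordering, is sketched below with Python threads; ordering by `id()` stands in for whatever fixed ranking a real system assigns to resource types, and `acquire_in_order`, `transfer`, and `done` are illustrative names. Because both threads take the locks in the same order, no cycle can form in the wait-for graph.

```python
import threading

lock_a, lock_b = threading.Lock(), threading.Lock()
done = []

def acquire_in_order(*locks):
    """Acquire locks in one fixed global order (here: by id()),
    breaking the circular-wait condition."""
    ordered = sorted(locks, key=id)
    for lk in ordered:
        lk.acquire()
    return ordered

def transfer():
    held = acquire_in_order(lock_a, lock_b)
    try:
        done.append(True)          # critical section touching both resources
    finally:
        for lk in reversed(held):  # release in reverse acquisition order
            lk.release()

t1 = threading.Thread(target=transfer)
t2 = threading.Thread(target=transfer)
t1.start(); t2.start()
t1.join(); t2.join()
print(len(done))                   # 2 — both threads completed
```

If one thread instead took lock_a then lock_b while the other took lock_b then lock_a, the two could block each other forever — exactly the circular wait the ordering rules out.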
I/O Models
How your application interacts with the OS during I/O determines its concurrency and performance characteristics.

Blocking I/O
The simplest model. A call like read() blocks the thread until data is available. One thread per connection is required, which limits scalability.
Non-blocking I/O
read() returns immediately with EAGAIN if no data is available. The application must poll repeatedly (“busy-wait”). CPU usage is high when data is sparse.
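The non-blocking behavior is easy to see with a socket pair. In the sketch below, Python surfaces EAGAIN/EWOULDBLOCK as a BlockingIOError; `try_read` is an illustrative helper, and the select call is only there to make the second read deterministic rather than part of the non-blocking model itself.

```python
import select
import socket

a, b = socket.socketpair()
a.setblocking(False)               # recv now fails fast instead of blocking

def try_read(sock):
    try:
        return sock.recv(16)
    except BlockingIOError:        # Python's face of EAGAIN/EWOULDBLOCK
        return None

empty = try_read(a)                # nothing sent yet -> returns None at once
b.send(b"hi")
select.select([a], [], [], 1.0)    # wait until the data is actually readable
ready = try_read(a)
print(empty, ready)                # None b'hi'
a.close(); b.close()
```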
I/O Multiplexing: select / poll / epoll
A single thread monitors many file descriptors simultaneously and only acts when one is ready.

| Mechanism | Storage | Max FDs | Per-call copy | Find ready FDs |
|---|---|---|---|---|
select | Bitmap | 1024 (FD_SETSIZE) | Full bitmap copied to kernel each call | O(n) scan |
poll | Array (linked list) | System limit | Full array copied each call | O(n) scan |
epoll | Red-black tree (kernel) + ready list | System limit | Register once with epoll_ctl; no per-call copy | O(1) — only ready FDs returned |
epoll is the Linux workhorse for high-concurrency servers (nginx, Redis, Node.js). It uses event-driven notification: the kernel maintains a ready list and epoll_wait returns only the descriptors with pending events, rather than rescanning every registered fd.
epoll supports two triggering modes:
- Level-triggered (LT): epoll_wait keeps returning the fd until its buffer is drained. Easier to use correctly; the default mode.
- Edge-triggered (ET): epoll_wait fires once per state change. Requires you to drain the buffer completely in one shot. More efficient (fewer syscalls) but harder to implement correctly.
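The register-once-then-wait pattern looks like the sketch below. Python's selectors module picks the best multiplexer for the platform (epoll on Linux, kqueue on macOS/BSD), so this is a portable stand-in for the epoll_ctl/epoll_wait calls rather than the raw API; level-triggered semantics apply.

```python
import selectors
import socket

sel = selectors.DefaultSelector()       # epoll on Linux, kqueue on macOS/BSD
a, b = socket.socketpair()
a.setblocking(False)

# Register interest once, comparable to epoll_ctl(EPOLL_CTL_ADD, ...).
sel.register(a, selectors.EVENT_READ)

b.send(b"event!")

data = b""
# One wait call returns only the descriptors that are ready.
for key, events in sel.select(timeout=1.0):
    data = key.fileobj.recv(16)
print(data)                             # b'event!'

sel.unregister(a)
a.close(); b.close()
```

A real server would loop on `sel.select()`, registering each accepted connection and dispatching on readiness — one thread serving thousands of sockets.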
Asynchronous I/O
The process submits an I/O request and immediately continues. The OS completes the I/O entirely in the background and notifies the process (via signal or callback) when done. Unlike I/O multiplexing, the application does not perform the actual read/write—the OS does. Linux AIO and io_uring implement this model.
Zero-Copy
Traditional file transfer requires four context switches and four data copies (disk → kernel buffer → user buffer → socket buffer → NIC). Zero-copy techniques reduce this:

- mmap + write: maps the kernel buffer into user space; eliminates the kernel→user copy. Still two syscalls, four context switches, three data copies.
- sendfile: one syscall copies directly from kernel buffer to socket buffer. Two context switches, three data copies.
- sendfile with SG-DMA: on kernels ≥ 2.4, the NIC DMA engine reads directly from the kernel buffer. Two context switches, two data copies (both DMA, no CPU involvement in data movement).
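The sendfile path can be exercised from Python: socket.sendfile uses the sendfile(2) zero-copy syscall where the platform supports it and falls back to an ordinary read/send loop elsewhere. The sketch below pushes a small file through a socket pair; `send_via_sendfile` is an illustrative helper, and the payload is deliberately tiny so the blocking send cannot fill the socket buffer.

```python
import os
import socket
import tempfile

def send_via_sendfile(path):
    """Send a file over a socket, using the kernel sendfile path
    (file pages -> socket buffer, no user-space copy) where available."""
    src, dst = socket.socketpair()
    with open(path, "rb") as f:
        src.sendfile(f)            # zero-copy on Linux; read/send fallback otherwise
    src.close()                    # signals EOF to the reading side
    chunks = []
    while (c := dst.recv(4096)):
        chunks.append(c)
    dst.close()
    return b"".join(chunks)

fd, path = tempfile.mkstemp()
os.write(fd, b"zero-copy payload")
os.close(fd)
received = send_via_sendfile(path)
print(received)                    # b'zero-copy payload'
os.unlink(path)
```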