Operating system concepts are the foundation beneath every application you write. When you debug a deadlocked service, tune a high-concurrency server, or investigate memory pressure, you are reasoning directly about processes, virtual memory, scheduling, and I/O models. This page covers the core OS concepts that come up most often in backend engineering and system design, drawn from both academic foundations and practical Linux internals.

Processes vs Threads

A process is an independent execution unit with its own virtual address space, file descriptor table, and OS-level resources. A thread is a lightweight execution unit that shares the address space and resources of its parent process.
| Property | Process | Thread |
| --- | --- | --- |
| Memory | Private address space | Shared with other threads in the same process |
| File descriptors | Private (inherited at fork, then diverge) | Shared |
| Communication | IPC (pipes, sockets, shared memory, signals) | Shared memory, global variables |
| Context switch cost | High: must switch page tables, flush TLB | Lower: only registers and stack pointer change |
| Crash isolation | A crashing process does not corrupt other processes | A crashing thread can corrupt the entire process |
| Creation cost | High (fork + exec) | Lower (pthread_create or std::thread) |

When to Use Each

Use separate processes when you need strong isolation—for example, sandboxed plugins, multi-tenant workers, or any scenario where one failure must not bring down the rest of the application. Use threads within a process when tasks share state naturally (e.g., a web server handling concurrent requests against the same in-memory cache) and when the context-switch overhead of processes would be prohibitive.

Context Switching Cost

During a context switch the OS saves the current execution context (registers, program counter, stack pointer) and loads the saved context of the next process or thread. For threads within the same process, the virtual address space does not change, so the expensive TLB flush and page-table swap are avoided. That said, even thread context switches carry overhead from saving/restoring registers and flushing CPU caches, which accumulates under high thread counts.

Inter-Process Communication

Linux provides several IPC mechanisms:
| Mechanism | Description | Suitable for |
| --- | --- | --- |
| Anonymous pipe | Byte stream between parent and child via fork | Parent-child communication |
| Named pipe (FIFO) | Named file in the filesystem; any process can open it | Unrelated processes |
| Message queue | Kernel-managed message list; structured data; lifetime tied to kernel | Asynchronous, structured IPC |
| Shared memory | Two processes map the same physical pages into their address spaces | High-throughput IPC (fastest mechanism) |
| Semaphore | Integer counter for mutual exclusion and synchronization (P/V operations) | Protecting shared memory |
| Signal | Asynchronous notification (e.g., SIGINT, SIGKILL) | Exception handling, process control |
| Socket | Full-duplex byte stream; works cross-host | Any two processes, including across the network |

Virtual Memory

Every process sees a flat, private virtual address space (48 bits of virtual address space on typical x86-64 Linux, with user space occupying the lower half). The OS and MMU (Memory Management Unit) transparently map virtual addresses to physical RAM.

Address Spaces and Page Tables

Virtual memory is divided into fixed-size pages (typically 4 KB). Physical RAM is divided into frames of the same size. The OS maintains a page table for each process that maps page numbers to frame numbers. A virtual address is split into:
[ Page number | Page offset ]
The MMU looks up the page number in the page table to find the frame number, then combines it with the page offset to produce the physical address.

TLB (Translation Lookaside Buffer)

Walking the page table on every memory access would double or triple memory latency. The TLB is a small, fast hardware cache inside the CPU that stores recent virtual-to-physical translations.
  • A TLB hit resolves the translation in 1–2 cycles.
  • A TLB miss triggers a page-table walk (multiple memory accesses), performed by a hardware page-table walker on x86 or by the OS on architectures with software-managed TLBs.
On a process context switch the TLB must be flushed (or tagged with an address space ID), which is part of why process switches are more expensive than thread switches.

Page Faults

When a process accesses a virtual page that has no current mapping (the page is not in RAM), the CPU raises a page fault. The OS handles it by:
  1. Checking whether the access is valid (within the process’s allocated virtual space).
  2. Finding the page on disk (swap space or a memory-mapped file).
  3. Allocating a free frame, loading the page, updating the page table.
  4. Resuming the faulting instruction.
Minor page faults (page exists but not yet mapped for this process, e.g., a copy-on-write page) are fast. Major page faults (page must be read from disk) are expensive—many microseconds to milliseconds.

CPU Scheduling

The scheduler decides which ready process or thread runs next. Every algorithm makes trade-offs between throughput, fairness, response time, and starvation avoidance.
| Algorithm | How it works | Strength | Weakness |
| --- | --- | --- | --- |
| FCFS | Run in arrival order; non-preemptive | Simple | Long jobs block short ones (convoy effect) |
| SJF | Run shortest job first; non-preemptive | Optimal average turnaround time | Requires knowing runtime in advance; long jobs may starve |
| STCF / SRT | Preemptive SJF; a new arrival can preempt the current job | Better response time than SJF | Long jobs can starve indefinitely |
| Round Robin | Each job runs for a fixed time quantum, then is preempted | Fair; no starvation | Turnaround time can be worse than FCFS |
| Priority | Higher-priority jobs run first | Flexible | Low-priority jobs may starve |
| MLFQ | Multiple queues of decreasing priority; new jobs start high and are demoted as they use CPU | Balances response time and throughput | Complex; can be gamed by processes that yield strategically |
Key metrics:
  • Turnaround time = completion time − arrival time
  • Response time = first run time − arrival time
  • Weighted turnaround = turnaround / run time (normalized fairness)
Round Robin with a very small quantum behaves like a fair-share scheduler but generates high context-switch overhead. The time quantum must be large enough that context-switch cost is a small fraction of quantum length but small enough that response time stays acceptable.

Synchronization

When threads share mutable state, you need synchronization primitives to prevent data races and enforce ordering.

Mutex (Mutual Exclusion Lock)

A mutex has two states: locked and unlocked. Only one thread can hold the lock at a time; all others block until it is released.
#include <pthread.h>

pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
long long cnt = 0;  /* protected by count_mutex; volatile is neither needed nor sufficient for thread safety */

void *thread(void *vargp) {
    long long niters = *((long long *)vargp);
    for (long long i = 0; i < niters; i++) {
        pthread_mutex_lock(&count_mutex);   // enter critical section
        cnt++;                              // critical section
        pthread_mutex_unlock(&count_mutex); // exit critical section
    }
    return NULL;
}

Semaphore

A semaphore is an integer counter supporting two atomic operations:
  • P (wait/down): decrement; block if the value would go negative.
  • V (signal/up): increment; wake one blocked thread.
Initialized to 1, a semaphore acts as a mutex. Initialized to 0, it enforces ordering (process A must run before process B).

Condition Variables

A condition variable allows a thread to sleep until a specific condition is true, releasing the associated mutex atomically while sleeping.
pthread_cond_wait(&cond, &mutex);    // atomically release mutex + sleep
pthread_cond_signal(&cond);          // wake one waiting thread
pthread_cond_broadcast(&cond);       // wake all waiting threads
This is the standard pattern for producer-consumer queues: the consumer sleeps on the condition variable when the queue is empty; the producer signals after adding an item.

Spin Lock

A spin lock busy-waits (polls in a loop) instead of sleeping. It is faster than a mutex when the expected wait time is very short (less than a context switch), but wastes CPU cycles if the wait is long. Use spin locks for very short critical sections in kernel or real-time code; prefer mutexes in application code.

Deadlock

A deadlock occurs when a set of processes each wait for a resource held by another member of the set—none can ever proceed. The four Coffman conditions must all hold simultaneously for deadlock to occur:
| Condition | Meaning |
| --- | --- |
| Mutual exclusion | A resource can be held by only one process at a time |
| Hold and wait | A process holds at least one resource while waiting for another |
| No preemption | Resources cannot be forcibly taken from a process |
| Circular wait | A cycle exists in the resource-allocation graph |
Prevention: break any one condition.
  • Eliminate hold-and-wait by requiring processes to request all resources at once.
  • Allow preemption for resources that can be safely reclaimed.
  • Impose a global ordering on resource types and require processes to request in that order (breaks circular wait).
Avoidance: use the Banker’s Algorithm to admit a resource request only if the resulting state is provably safe. Detection and recovery: let deadlocks happen, detect them with a resource-allocation graph, then resolve by killing or rolling back one process.

I/O Models

How your application interacts with the OS during I/O determines its concurrency and performance characteristics.

Blocking I/O

The simplest model. A call like read() blocks the thread until data is available. One thread per connection is required, which limits scalability.

Non-blocking I/O

read() returns immediately with EAGAIN if no data is available. The application must poll repeatedly (“busy-wait”). CPU usage is high when data is sparse.

I/O Multiplexing: select / poll / epoll

A single thread monitors many file descriptors simultaneously and only acts when one is ready.
| Mechanism | Storage | Max FDs | Per-call copy | Find ready FDs |
| --- | --- | --- | --- | --- |
| select | Bitmap | 1024 (FD_SETSIZE) | Full bitmap copied to kernel each call | O(n) scan |
| poll | Array of pollfd structs | System limit | Full array copied each call | O(n) scan |
| epoll | Red-black tree (kernel) + ready list | System limit | Register once with epoll_ctl; no per-call copy | O(1): only ready FDs returned |
epoll is the Linux workhorse for high-concurrency servers (nginx, Redis, Node.js). It uses event-driven notification:
epoll_create()   → creates the epoll instance
epoll_ctl()      → registers/modifies/removes a socket interest
epoll_wait()     → blocks until one or more FDs are ready; returns only ready ones
epoll supports two triggering modes:
  • Level-triggered (LT): epoll_wait keeps returning the fd until its buffer is drained. Easier to use correctly; default mode.
  • Edge-triggered (ET): epoll_wait fires once per state change. Requires you to drain the buffer completely in one shot. More efficient (fewer syscalls) but harder to implement correctly.

Asynchronous I/O

The process submits an I/O request and immediately continues. The OS completes the I/O entirely in the background and notifies the process (via signal or callback) when done. Unlike I/O multiplexing, the application does not perform the actual read/write—the OS does. Linux AIO and io_uring implement this model.

Zero-Copy

Traditional file transfer requires four context switches and four data copies (disk → kernel buffer → user buffer → socket buffer → NIC). Zero-copy techniques reduce this:
  • mmap + write: maps the kernel buffer into user space; eliminates the kernel→user copy. Still two syscalls, four context switches, three data copies.
  • sendfile: one syscall copies directly from kernel buffer to socket buffer. Two context switches, three data copies.
  • sendfile with SG-DMA: on kernels ≥ 2.4, the NIC DMA engine reads directly from the kernel buffer. Two context switches, two data copies (both DMA, no CPU involvement in data movement).
Zero-copy is why tools like Kafka and Nginx can sustain very high file-serving throughput with minimal CPU overhead.