# Computer Networking: TCP, HTTP, and Web Protocols

> Understand TCP three-way handshake, four-way teardown, HTTP versions, WebSocket, SSL/TLS, and key networking concepts for backend engineers.

Networking knowledge underpins every backend system you build. Whether you are debugging a connection timeout, choosing between HTTP/2 and HTTP/3, or hardening an API against CSRF attacks, the concepts here—TCP connection management, HTTP evolution, real-time transports, TLS, and web security—come up constantly. This page distills the most important ideas from the transport and application layers so you can reason about your systems with confidence.

## TCP Connection Management

TCP is a **connection-oriented, reliable, byte-stream** transport protocol. Before any data flows, both sides must negotiate a shared connection state. After data exchange, both sides must cleanly tear down that state.

### Three-Way Handshake

The three-way handshake synchronizes sequence numbers and confirms that both sides can send and receive.

| Step | Sender | Packet                     | Receiver | State change          |
| ---- | ------ | -------------------------- | -------- | --------------------- |
| 1    | Client | `SYN` (seq=x)              | Server   | Server: `SYN_RCVD`    |
| 2    | Server | `SYN-ACK` (seq=y, ack=x+1) | Client   | Client: `ESTABLISHED` |
| 3    | Client | `ACK` (ack=y+1)            | Server   | Server: `ESTABLISHED` |

Key properties:

* In a standard handshake the **first two packets carry no data**; the third ACK and all subsequent packets can. (TCP Fast Open is an opt-in extension that allows data in the SYN.)
* Sequence numbers (`seq`) prevent duplicate or reordered packets from being misinterpreted.
* The exchange confirms a working bidirectional path before any application data is sent.
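
The sequence and acknowledgment numbers in the table can be traced with a small simulation. This is only a sketch: `threeWayHandshake` and the values `1000` and `5000` are illustrative stand-ins for the random initial sequence numbers a real TCP stack would choose.

```javascript
// Simulate the three handshake segments for chosen initial sequence numbers.
// `x` and `y` stand in for the random ISNs a real stack would pick.
// `>>> 0` keeps the arithmetic inside the 32-bit sequence space.
function threeWayHandshake(x, y) {
  const syn = { from: 'client', flags: ['SYN'], seq: x >>> 0 };
  const synAck = { from: 'server', flags: ['SYN', 'ACK'], seq: y >>> 0, ack: (x + 1) >>> 0 };
  const ack = { from: 'client', flags: ['ACK'], seq: (x + 1) >>> 0, ack: (y + 1) >>> 0 };
  return [syn, synAck, ack];
}

for (const s of threeWayHandshake(1000, 5000)) {
  console.log(`${s.from}: ${s.flags.join('+')} seq=${s.seq} ack=${s.ack ?? '-'}`);
}
```

Each side acknowledges the other's sequence number plus one, which is how both ends learn that their own SYN arrived.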

### Why Two Handshakes Are Not Enough

A two-way handshake lets the server reach `ESTABLISHED` as soon as it receives a single SYN. This creates two problems:

1. **Historical duplicates** — a delayed SYN from a previous connection can arrive and cause the server to open a connection the client never intended. With three messages the client can send a RST when it receives a SYN-ACK for a connection it did not initiate.
2. **Sequence number asymmetry** — the client never gets to confirm that the server's chosen sequence number was received, so reliable communication cannot be established.

Three-way is the theoretical minimum for a reliable, bidirectional connection.
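
The historical-duplicate case can be sketched from the client's point of view. The function name `onSynAck` and the packet shape are hypothetical; the point is that the third message gives the client a chance to reject a SYN-ACK that does not match the SYN it is currently waiting on.

```javascript
// Client-side check when a SYN-ACK arrives.
// `pendingIsn` is the ISN of the SYN the client is waiting on,
// or null if it has no connection attempt in flight.
function onSynAck(pendingIsn, synAck) {
  const expectedAck = pendingIsn === null ? null : (pendingIsn + 1) >>> 0;
  if (synAck.ack !== expectedAck) {
    // The server is answering a SYN the client no longer cares about: refuse it.
    // A reset takes its sequence number from the ACK field of the offending segment.
    return { flags: ['RST'], seq: synAck.ack };
  }
  return { flags: ['ACK'], seq: expectedAck, ack: (synAck.seq + 1) >>> 0 };
}

// Reply to the current attempt (ISN 2000) is acknowledged.
console.log(onSynAck(2000, { seq: 7000, ack: 2001 }));
// Reply to a stale SYN (old ISN 90) is reset.
console.log(onSynAck(2000, { seq: 7000, ack: 91 }));
```

With only two messages the server would already consider the stale connection open before the client had any opportunity to send that reset.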

### Four-Way Teardown

Closing a TCP connection requires four messages because each direction must be shut down independently. The side that closes first is the **active closer**; only it enters `TIME_WAIT`.

| Step | Sender         | Packet | Meaning                               |
| ---- | -------------- | ------ | ------------------------------------- |
| 1    | Active closer  | `FIN`  | "I have no more data to send."        |
| 2    | Passive closer | `ACK`  | "Received your FIN."                  |
| 3    | Passive closer | `FIN`  | "I have no more data to send either." |
| 4    | Active closer  | `ACK`  | "Received your FIN."                  |

After sending the final ACK the active closer waits in `TIME_WAIT` before fully closing.
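
The four messages map onto the standard TCP state diagram. The sketch below walks both sides through it; the state names come from the TCP specification, while the event names (`sendFin`, `recvAck`, and so on) and the helper `walkStates` are made up for this illustration.

```javascript
// Transition tables: state -> event -> next state.
const ACTIVE_CLOSER = {
  ESTABLISHED: { sendFin: 'FIN_WAIT_1' },
  FIN_WAIT_1: { recvAck: 'FIN_WAIT_2' },
  FIN_WAIT_2: { recvFin: 'TIME_WAIT' }, // the final ACK is sent here
  TIME_WAIT: { timeout2Msl: 'CLOSED' },
};
const PASSIVE_CLOSER = {
  ESTABLISHED: { recvFin: 'CLOSE_WAIT' }, // the ACK for the first FIN is sent here
  CLOSE_WAIT: { sendFin: 'LAST_ACK' },
  LAST_ACK: { recvAck: 'CLOSED' },
};

// Apply events in order and return every state visited.
function walkStates(table, events) {
  let state = 'ESTABLISHED';
  const path = [state];
  for (const event of events) {
    const next = table[state] && table[state][event];
    if (!next) throw new Error(`illegal event ${event} in state ${state}`);
    state = next;
    path.push(state);
  }
  return path;
}

console.log(walkStates(ACTIVE_CLOSER, ['sendFin', 'recvAck', 'recvFin', 'timeout2Msl']).join(' -> '));
console.log(walkStates(PASSIVE_CLOSER, ['recvFin', 'sendFin', 'recvAck']).join(' -> '));
```

Only the active closer's path passes through `TIME_WAIT`; the passive closer goes straight from `LAST_ACK` to `CLOSED`.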

### TIME\_WAIT: What It Is and Why It Matters

`TIME_WAIT` lasts for **2×MSL** (Maximum Segment Lifetime). MSL is the longest time any segment can survive in the network before being discarded. RFC 793 suggests an MSL of 2 minutes; Linux does not expose it as a tunable and hard-codes `TIME_WAIT` to 60 s, which corresponds to an MSL of 30 s.

**Why 2×MSL?**\
The final ACK can take up to one MSL to reach the passive closer. If it is lost, the passive closer retransmits its FIN, which can take up to one more MSL to travel back. Waiting 2×MSL lets the active closer answer any retransmitted FIN, and it also gives stray segments from the old connection time to expire before the same address and port pair is reused.

**Problems with too many TIME\_WAIT sockets:**

* **Exhausted file descriptors** — each socket occupies an fd; Linux enforces limits via `/proc/sys/fs/file-max`, `/etc/security/limits.conf`, and `/proc/sys/fs/nr_open`.
* **Exhausted ephemeral ports** — a client-side port in `TIME_WAIT` cannot be reused for a new connection to the same destination; the default Linux range is `32768–60999` (tunable via `net.ipv4.ip_local_port_range`).

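If you see thousands of `TIME_WAIT` sockets you are likely in a high-throughput short-connection scenario. Common mitigations include connection pooling, so fewer connections are closed in the first place; `SO_REUSEADDR`, so a restarted server can bind a port that still has `TIME_WAIT` sockets; and `net.ipv4.tcp_tw_reuse`, so outbound connections can reuse ports in `TIME_WAIT`.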
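
A back-of-the-envelope calculation shows how quickly short-lived connections can run out of ports. The function `maxConnectionRate` is illustrative, and the port range and the 60 s hold time are assumed inputs; substitute the values from your own system.

```javascript
// Upper bound on new outbound connections per second to ONE destination (IP, port)
// before every ephemeral port is parked in TIME_WAIT.
function maxConnectionRate(portLow, portHigh, timeWaitSeconds) {
  const usablePorts = portHigh - portLow + 1;
  return usablePorts / timeWaitSeconds;
}

// Assumed inputs: ports 32768 through 60999, each held for 60 s after close.
const rate = maxConnectionRate(32768, 60999, 60);
console.log(`${Math.floor(rate)} connections/s`); // 470 connections/s
```

Beyond that rate, new connections to the same destination start failing until old `TIME_WAIT` entries expire, which is why pooling usually helps more than kernel tuning.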

***

## HTTP Versions

HTTP has evolved through four major versions, each addressing performance bottlenecks from the previous generation.

### HTTP/1.0

Every request opened a new TCP connection. Three-way handshake overhead was paid on every resource fetch.

### HTTP/1.1

* **Persistent connections**: connections stay open by default (`Connection: keep-alive` was the opt-in extension in HTTP/1.0), so one TCP connection can serve multiple requests, eliminating per-request handshake cost.
* **Pipelining**: the client can send multiple requests without waiting for each response, though responses must still arrive in order (head-of-line blocking at the HTTP layer).
* **Remaining limitations**: headers are sent uncompressed on every request; server can only respond in request order; no server push.
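
The cost of in-order responses can be seen in a toy model. This sketch assumes all requests are sent at `t = 0`, the server works on them concurrently, and transfer time is negligible; `completionTimes` and the numbers are illustrative only.

```javascript
// Each entry is how long the server needs to produce that response (ms).
function completionTimes(serviceTimes, { inOrder }) {
  let previousDone = 0;
  return serviceTimes.map((ready) => {
    // In-order delivery: a response cannot be delivered before the one ahead of it.
    const done = inOrder ? Math.max(ready, previousDone) : ready;
    previousDone = done;
    return done;
  });
}

const serviceTimes = [500, 20, 30]; // one slow response followed by two fast ones
console.log(completionTimes(serviceTimes, { inOrder: true }));  // [ 500, 500, 500 ]
console.log(completionTimes(serviceTimes, { inOrder: false })); // [ 500, 20, 30 ]
```

With pipelining, the two fast responses wait behind the slow one. Delivering responses independently removes that wait.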

### HTTP/2

| Feature            | Detail                                                                                                                               |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------ |
| Binary framing     | All data is encoded as binary frames (Headers Frame + Data Frame), not text                                                          |
| Header compression | HPACK eliminates redundant headers across requests on the same connection                                                            |
| Multiplexing       | Multiple independent **streams** share one TCP connection; each stream has a unique ID; frames from different streams can interleave |
| Server push        | The server can proactively send resources (e.g., CSS, JS) before the client requests them                                            |

HTTP/2 solves HTTP-level head-of-line blocking but still suffers from **TCP-level** head-of-line blocking: a lost TCP segment stalls all streams until retransmission completes.
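
To make the binary framing and stream IDs from the table concrete, here is a sketch that decodes the fixed 9-byte header that precedes every HTTP/2 frame (layout per RFC 9113, section 4.1). The helper name `parseFrameHeader` is hypothetical, and only the two frame types mentioned above are given names.

```javascript
// Frame type codes for the two types named in the table; others print as hex.
const FRAME_TYPES = { 0x0: 'DATA', 0x1: 'HEADERS' };

function parseFrameHeader(buf) {
  if (buf.length < 9) throw new RangeError('an HTTP/2 frame header is 9 bytes');
  return {
    length: buf.readUIntBE(0, 3),               // 24-bit payload length
    type: FRAME_TYPES[buf[3]] ?? `0x${buf[3].toString(16)}`,
    flags: buf[4],                              // 8 bits of per-type flags
    streamId: buf.readUInt32BE(5) & 0x7fffffff, // 31-bit stream ID, reserved high bit dropped
  };
}

// A HEADERS frame header: 16-byte payload, END_HEADERS flag (0x4), stream 1.
const header = Buffer.from([0x00, 0x00, 0x10, 0x01, 0x04, 0x00, 0x00, 0x00, 0x01]);
console.log(parseFrameHeader(header));
```

Because every frame names its stream, the receiver can reassemble interleaved frames from many streams arriving on one connection.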

### HTTP/3

HTTP/3 replaces TCP with **QUIC** (Quick UDP Internet Connections), a protocol built on UDP.

* **No TCP head-of-line blocking** — each QUIC stream is independent; a lost packet only stalls the stream it belongs to.
* **0-RTT reconnection** — QUIC can resume a prior session without a full handshake, reducing latency for returning users.
* **Built-in encryption** — TLS 1.3 is mandatory and integrated into the QUIC handshake.
* **Connection migration** — connections are identified by a connection ID, not by the 4-tuple (src IP, src port, dst IP, dst port), so mobile clients survive network changes without reconnecting.
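
Connection migration comes down to which key the server uses to find a session. The sketch below contrasts the two lookups with plain maps; the packet fields, addresses, and the connection ID `c0ffee` are invented for illustration, and real QUIC additionally validates the new network path before trusting it.

```javascript
const byFourTuple = new Map();    // TCP-style key
const byConnectionId = new Map(); // QUIC-style key

const tupleKey = (p) => `${p.srcIp}:${p.srcPort}->${p.dstIp}:${p.dstPort}`;

function register(packet, session) {
  byFourTuple.set(tupleKey(packet), session);
  byConnectionId.set(packet.connectionId, session);
}

// The client starts on Wi-Fi.
const onWifi = { srcIp: '192.0.2.10', srcPort: 50000, dstIp: '203.0.113.5', dstPort: 443, connectionId: 'c0ffee' };
register(onWifi, { user: 'alice' });

// It then moves to cellular: new source IP and port, same connection ID.
const onCellular = { ...onWifi, srcIp: '198.51.100.7', srcPort: 41000 };

console.log(byFourTuple.get(tupleKey(onCellular)));       // undefined: the 4-tuple no longer matches
console.log(byConnectionId.get(onCellular.connectionId)); // { user: 'alice' }: session found
```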

***

## GET vs POST

| Property        | GET                                                             | POST                                |
| --------------- | --------------------------------------------------------------- | ----------------------------------- |
| Purpose         | Retrieve a resource                                             | Submit data / trigger a side effect |
| Data location   | URL query string                                                | Request body                        |
| Idempotent      | Yes — calling it N times has the same effect as calling it once | Not guaranteed                      |
| Safe            | Yes — no server state change intended                           | No                                  |
| Cacheable       | Yes (by default)                                                | No (by default)                     |
| Bookmark/share  | Yes                                                             | No                                  |
| Data size limit | Constrained by URL length limits (\~2 KB practical)             | No defined limit                    |

**Idempotency** means you can safely retry a GET without worrying about duplicating effects, which is why retry logic in HTTP clients defaults to retrying only idempotent methods. Because GET is also safe, browsers and proxies can freely cache its responses.

A common interview trap: GET can technically have a body (the spec allows it) but most servers and intermediaries ignore it, so you should not rely on it.
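
The retry rule for idempotent methods can be sketched as a small predicate. The method list follows RFC 9110 (PUT, DELETE, and the safe methods); `shouldRetry` and its default of three attempts are assumptions for this example, not the behavior of any particular HTTP client.

```javascript
// Methods defined as idempotent by RFC 9110.
const IDEMPOTENT_METHODS = new Set(['GET', 'HEAD', 'OPTIONS', 'TRACE', 'PUT', 'DELETE']);

// Decide whether a failed request may be sent again.
// `attempt` counts attempts already made (1 after the first failure).
function shouldRetry(method, attempt, maxAttempts = 3) {
  return IDEMPOTENT_METHODS.has(method.toUpperCase()) && attempt < maxAttempts;
}

console.log(shouldRetry('GET', 1));  // true
console.log(shouldRetry('POST', 1)); // false: a retry could submit the data twice
console.log(shouldRetry('GET', 3));  // false: attempts exhausted
```

If a POST endpoint must be retryable, a common approach is to make it idempotent at the application level, for example with a client-supplied request ID that the server deduplicates.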

***

## WebSocket vs SSE

Both WebSocket and Server-Sent Events (SSE) solve the problem of receiving server data without constant polling. They make different trade-offs.

| Property                     | WebSocket                                         | SSE                                                            |
| ---------------------------- | ------------------------------------------------- | -------------------------------------------------------------- |
| Direction                    | Bidirectional                                     | Server → client only                                           |
| Protocol                     | Custom WS protocol (upgrade from HTTP)            | Plain HTTP (chunked `text/event-stream`)                       |
| Binary support               | Yes                                               | No (text only)                                                 |
| Auto-reconnect               | Manual (heartbeat + retry logic required)         | Built into browser `EventSource`                               |
| Firewall/proxy compatibility | Can fail through strict proxies                   | Excellent (standard HTTP)                                      |
| Browser API                  | `WebSocket`                                       | `EventSource`                                                  |
| Use cases                    | Chat, gaming, collaborative editing, live trading | Feed updates, notifications, log streaming, AI token streaming |

**When to use WebSocket:** your application needs true bidirectional real-time communication—for example, a chat room where users send and receive messages, or a collaborative document editor.

**When to use SSE:** the server pushes a stream of events and the client only reads—for example, streaming LLM output tokens, live sports scores, or server log tailing. SSE is simpler to implement and works through most corporate proxies that block WebSocket upgrades.
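
Part of what makes SSE simple is its wire format: plain text fields, with a blank line ending each event. A minimal serializer might look like the sketch below; `formatSseEvent` is a hypothetical helper, and the `token` event name is just an example.

```javascript
// Serialize one event in the text/event-stream wire format.
function formatSseEvent({ event, id, data }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  if (id) lines.push(`id: ${id}`);
  // Each line of the payload needs its own "data:" field.
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n'; // a blank line ends the event
}

process.stdout.write(formatSseEvent({ event: 'token', id: '7', data: 'Hello\nworld' }));
```

On the server you would respond with `Content-Type: text/event-stream` and write one such string per event. The browser's `EventSource` reassembles multi-line `data:` fields and resends the last `id` when it reconnects.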

***

## SSL/TLS

HTTPS adds a TLS layer between TCP and HTTP. HTTP traffic is plaintext on port 80; HTTPS encrypts it on port 443.

### TLS Handshake (RSA-based)

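The TLS handshake negotiates encryption keys before any HTTP data flows. The steps below describe the RSA key exchange used by TLS 1.2 and earlier, simplified to four messages. TLS 1.3 removed RSA key exchange in favor of ephemeral Diffie-Hellman, which provides forward secrecy.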

1. **ClientHello** — client sends its TLS version, a random nonce (`Client Random`), and a list of supported cipher suites.
2. **ServerHello** — server responds with the chosen cipher suite, its own random nonce (`Server Random`), and its **CA-signed digital certificate** (which contains the server's public key).
3. **Client key exchange** — after verifying the certificate, the client generates a third random value (the `pre-master secret`), encrypts it with the server's public key, and sends it. The client also signals that it is switching to encrypted communication.
4. **Server finished** — the server decrypts the `pre-master secret` using its private key. Both sides now derive the same **session key** from the three random values and switch to symmetric encryption.
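
The last step, where both sides turn three random values into the same key material, can be sketched with the TLS 1.2 pseudorandom function (RFC 5246, section 5) using SHA-256. This is a teaching sketch and not a TLS implementation: the function names are ours, and a real stack goes on to expand the master secret into separate encryption and MAC keys.

```javascript
const crypto = require('node:crypto');

// P_SHA256 expansion: chain HMACs until enough output bytes exist.
function prf(secret, label, seed, length) {
  const labelAndSeed = Buffer.concat([Buffer.from(label, 'ascii'), seed]);
  const hmac = (data) => crypto.createHmac('sha256', secret).update(data).digest();
  let a = hmac(labelAndSeed); // A(1)
  const chunks = [];
  let produced = 0;
  while (produced < length) {
    const block = hmac(Buffer.concat([a, labelAndSeed]));
    chunks.push(block);
    produced += block.length;
    a = hmac(a); // A(i+1)
  }
  return Buffer.concat(chunks).subarray(0, length);
}

// master secret = PRF(pre-master, "master secret", client random + server random), 48 bytes
function deriveMasterSecret(preMaster, clientRandom, serverRandom) {
  return prf(preMaster, 'master secret', Buffer.concat([clientRandom, serverRandom]), 48);
}

// Stand-in values with the sizes TLS 1.2 uses.
const clientRandom = crypto.randomBytes(32);
const serverRandom = crypto.randomBytes(32);
const preMaster = crypto.randomBytes(48);

const clientSide = deriveMasterSecret(preMaster, clientRandom, serverRandom);
const serverSide = deriveMasterSecret(preMaster, clientRandom, serverRandom);
console.log(clientSide.equals(serverSide)); // true
```

Only the pre-master value is secret; the two randoms travel in the clear and exist to make every session's keys unique.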

### Certificate Chain Verification

When your browser receives a server certificate it verifies the chain of trust:

1. Compute a hash of the certificate contents (H1).
2. Use the CA's public key (pre-installed in the OS or browser) to decrypt the certificate's signature, obtaining H2.
3. If H1 == H2 the certificate is authentic; otherwise the browser shows a warning.

Intermediate CAs sit between the root CA and the leaf certificate. Root CAs are kept offline precisely because a compromised root invalidates the entire trust chain.
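
The hash-and-compare check above is what a signature verification call does internally. The sketch below uses Node's `crypto.sign` and `crypto.verify` with a freshly generated RSA key as a stand-in CA. The JSON "certificate body" is a simplification, since real certificates are DER-encoded X.509 structures.

```javascript
const crypto = require('node:crypto');

// A stand-in CA key pair; real CAs publish the public half in a root certificate.
const ca = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

// The signed portion of a certificate, reduced to a JSON string for this sketch.
const certBody = Buffer.from(JSON.stringify({ subject: 'example.com', notAfter: '2030-01-01' }));

// Issuance: the CA hashes the body with SHA-256 and signs the hash.
const signature = crypto.sign('sha256', certBody, ca.privateKey);

// Verification: recompute the hash and check it against the signature
// using only the CA's public key.
function isAuthentic(body, sig, caPublicKey) {
  return crypto.verify('sha256', body, caPublicKey, sig);
}

console.log(isAuthentic(certBody, signature, ca.publicKey)); // true

// Changing a single field invalidates the signature.
const tampered = Buffer.from(certBody.toString().replace('example.com', 'evil.example'));
console.log(isAuthentic(tampered, signature, ca.publicKey)); // false
```

A browser repeats this check for each link in the chain, from the leaf certificate up to a root it already trusts.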

### Common TLS Errors

| Error                          | Typical Cause                               |
| ------------------------------ | ------------------------------------------- |
| `ERR_CERT_AUTHORITY_INVALID`   | Self-signed cert or missing intermediate CA |
| `ERR_CERT_DATE_INVALID`        | Certificate expired or system clock wrong   |
| `ERR_CERT_COMMON_NAME_INVALID` | Hostname does not match CN or SAN           |
| `SSL_ERROR_RX_RECORD_TOO_LONG` | HTTPS request sent to a plain-HTTP port     |

***

## Security

### CSRF

**Cross-Site Request Forgery (CSRF)** tricks an authenticated user's browser into sending a forged request to a target site. Because the browser automatically attaches cookies, the target server sees a legitimate-looking request.

Three standard defenses:

1. **CSRF token** — embed a per-session random token in every form (hidden field) or request header. The server rejects any request where the token is absent or wrong. Because attackers cannot read the token from a cross-origin page, they cannot forge a valid request.

2. **`SameSite` cookie attribute** — set `SameSite=Strict` to block cookies on all cross-site requests, or `SameSite=Lax` to allow safe navigations (GET) but block cross-site POSTs.

3. **Referer validation** — check the `Referer` header to ensure requests originate from your own domain. This is weaker because the header can be suppressed.

The recommended pattern is `HttpOnly` + `SameSite=Strict` cookies combined with a CSRF token for state-changing endpoints:

```js theme={null}
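// Backend (Express-style `res`): store the access token in an HttpOnly cookie.
// `accessToken` is whatever your login handler issued.
res.cookie('token', accessToken, {
  httpOnly: true,     // page JavaScript cannot read it (limits XSS token theft)
  secure: true,       // sent over HTTPS only
  sameSite: 'Strict', // never attached to cross-site requests
  maxAge: 1000 * 60 * 60 * 24 * 7 // 7 days, in milliseconds
})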
```
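
The CSRF token half of that pattern can be sketched with Node's built-in `crypto` module. The function names are hypothetical, and where the token is stored (a server-side session in this sketch) depends on your framework.

```javascript
const crypto = require('node:crypto');

// Create an unpredictable token to keep in the user's server-side session
// and embed in forms or a custom request header.
function issueCsrfToken() {
  return crypto.randomBytes(32).toString('hex');
}

// Compare the submitted token with the session's token in constant time.
function isValidCsrfToken(sessionToken, submittedToken) {
  if (typeof sessionToken !== 'string' || typeof submittedToken !== 'string') return false;
  const expected = Buffer.from(sessionToken);
  const actual = Buffer.from(submittedToken);
  // timingSafeEqual throws on a length mismatch, so check the length first.
  return expected.length === actual.length && crypto.timingSafeEqual(expected, actual);
}

const token = issueCsrfToken();
console.log(isValidCsrfToken(token, token));     // true
console.log(isValidCsrfToken(token, 'guessed')); // false
console.log(isValidCsrfToken(token, undefined)); // false: token missing from the request
```

The constant-time comparison avoids leaking how many leading characters of a guess were correct.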

### JWT

A **JSON Web Token** is a stateless credential that lets the server avoid storing session state. It consists of three Base64URL-encoded sections separated by dots:

```
header.payload.signature
```

* **Header**: algorithm and token type (`{"alg":"HS256","typ":"JWT"}`).
* **Payload**: claims such as user ID, roles, and expiry. **Do not store sensitive data here—it is not encrypted, only signed.**
* **Signature**: `HMAC_SHA256(base64url(header) + "." + base64url(payload), secret)`, itself Base64URL-encoded. Prevents tampering.
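
The three parts can be assembled and checked by hand with Node's `crypto` module, as in the sketch below. It assumes a Node version that supports the `base64url` encoding, and the helper names are ours; in production a maintained JWT library is the safer choice.

```javascript
const crypto = require('node:crypto');

const b64url = (value) => Buffer.from(JSON.stringify(value)).toString('base64url');

function signJwt(payload, secret) {
  const signingInput = `${b64url({ alg: 'HS256', typ: 'JWT' })}.${b64url(payload)}`;
  const signature = crypto.createHmac('sha256', secret).update(signingInput).digest('base64url');
  return `${signingInput}.${signature}`;
}

// Returns the claims if the signature is valid and the token is unexpired, else null.
function verifyJwt(token, secret, nowSeconds = Math.floor(Date.now() / 1000)) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [header, payload, signature] = parts;
  const expected = crypto.createHmac('sha256', secret).update(`${header}.${payload}`).digest();
  const actual = Buffer.from(signature, 'base64url');
  if (actual.length !== expected.length || !crypto.timingSafeEqual(actual, expected)) return null;
  // Only parse the contents after the signature has been verified.
  if (JSON.parse(Buffer.from(header, 'base64url').toString()).alg !== 'HS256') return null;
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString());
  if (typeof claims.exp === 'number' && claims.exp <= nowSeconds) return null;
  return claims;
}

const jwt = signJwt({ sub: 'user-42', exp: Math.floor(Date.now() / 1000) + 3600 }, 'demo-secret');
console.log(jwt.split('.').length);                 // 3
console.log(verifyJwt(jwt, 'demo-secret') !== null); // true
console.log(verifyJwt(jwt, 'wrong-secret'));         // null
```

Anyone holding the token can decode the payload with a Base64URL decoder, which is why it must not carry secrets.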

| Aspect      | JWT                                                  | Session-Cookie                                         |
| ----------- | ---------------------------------------------------- | ------------------------------------------------------ |
| State       | Stateless — server holds no session data             | Stateful — server must store/retrieve session          |
| Scalability | Horizontal scaling is easy (no shared session store) | Requires a shared store (e.g., Redis)                  |
| Revocation  | Hard — token is valid until expiry                   | Easy — delete the session record                       |
| Best for    | Distributed systems, microservices, APIs             | Traditional monoliths where instant revocation matters |

Always transmit JWTs over HTTPS. Store them in `HttpOnly` cookies (not `localStorage`) to prevent XSS theft.

### CORS

**Cross-Origin Resource Sharing** is the mechanism that lets a server relax the browser's same-origin policy, which by default stops JavaScript from reading responses from a different origin (scheme + hostname + port). Servers opt in by returning:

```
Access-Control-Allow-Origin: https://app.example.com
Access-Control-Allow-Methods: GET, POST
Access-Control-Allow-Headers: Content-Type, Authorization
```

For **preflight requests** (non-simple methods or custom headers), the browser first sends an `OPTIONS` request. The server must respond with the appropriate `Access-Control-Allow-*` headers before the browser sends the real request. CORS is enforced by the browser, not the server—a curl command ignores it entirely.
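
On the server side, answering a preflight mostly means deciding whether to echo the caller's origin. A minimal sketch follows; the allowlist and the `corsHeaders` helper are hypothetical, and the methods and headers mirror the example response above.

```javascript
// Hypothetical allowlist; replace with the origins your frontend is served from.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

// Build CORS response headers for a request's Origin header.
// Returns an empty object for unknown origins, so the browser blocks the read.
function corsHeaders(origin) {
  if (!ALLOWED_ORIGINS.has(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, POST',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
    Vary: 'Origin', // caches must not reuse this response for another origin
  };
}

console.log(corsHeaders('https://app.example.com'));
console.log(corsHeaders('https://evil.example')); // {}
```

Echoing a specific allowed origin, rather than `*`, is required if the request carries credentials such as cookies.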
