gRPC vs REST
Both gRPC and REST are request/response approaches to inter-service communication: gRPC is an RPC framework, REST an architectural style over HTTP. Choosing between them depends on your audience (internal vs. external) and performance requirements.
Comparison table
| Dimension | gRPC | REST |
|---|---|---|
| Schema / contract | .proto file — strict, version-controlled | OpenAPI / ad hoc — flexible but inconsistent |
| Encoding | Protocol Buffers (binary, compact) | JSON (text, human-readable) |
| Transport | HTTP/2 (multiplexed, streaming) | Typically HTTP/1.1 (one request in flight per connection); can run over HTTP/2 |
| Streaming | Unary, client stream, server stream, bidirectional | Not native (long polling, SSE, or WebSockets as workarounds) |
| Code generation | Stubs generated from .proto for any supported language | Manual or tooling-dependent |
| Browser support | Limited | First-class |
| Debugging | Harder (binary payload) | Easier (readable JSON) |
| Best fit | Internal service-to-service calls | Public APIs, browser clients |
When to use each
Use gRPC for internal service-to-service communication where you control both ends. The binary encoding is smaller on the wire, HTTP/2 multiplexing reduces connection overhead, and generated stubs eliminate entire classes of integration bugs. Use REST when exposing a public API, integrating with external partners, or serving browser clients. JSON is universally understood and easy to debug with standard tools.
Protobuf basics
Every gRPC service is defined in a .proto file. The compiler (protoc) generates client stubs and server interfaces for your target language:
- Define the API in a .proto file.
- Compile it to stub files for each language (Go, Java, Python, etc.).
- The server implements the generated interface and starts listening.
- The client calls stub methods as if they were local functions; gRPC handles serialization and transport.
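The workflow above begins with the service definition. A minimal .proto sketch (the Greeter service, package, and field names here are illustrative, not from a real API):

```proto
syntax = "proto3";

package demo;
option go_package = "example.com/demo";

service Greeter {
  // Unary RPC: one request, one response.
  rpc SayHello (HelloRequest) returns (HelloReply);
  // Server-streaming RPC: one request, a stream of responses.
  rpc StreamHellos (HelloRequest) returns (stream HelloReply);
}

message HelloRequest {
  string name = 1;
}

message HelloReply {
  string message = 1;
}
```

Running protoc with the Go plugins (protoc-gen-go and protoc-gen-go-grpc) would generate the client stub and the server interface from this file.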
Service Discovery with etcd
etcd is a distributed, strongly consistent key-value store written in Go. It uses the Raft consensus algorithm to replicate data across a cluster of nodes. Common use cases include service registration and discovery, distributed locks, configuration sharing, and leader election. etcd’s architecture is layered:
- Client layer — SDK with built-in load balancing and automatic failover.
- API network layer — v3 API uses gRPC; v3 also exposes an HTTP/1.x gateway for non-gRPC clients.
- Raft layer — handles leader election, log replication, and consistency guarantees.
- Logic layer — KV store, MVCC, leases, authentication.
- Storage layer — write-ahead log (WAL) for crash safety, boltdb for persistent data.
Basic etcdctl operations
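The v3 API maps to a handful of etcdctl commands. A quick tour, assuming a reachable local cluster and `ETCDCTL_API=3` (key names are illustrative):

```shell
# Write and read a key
etcdctl put /config/db_host "10.0.0.5"
etcdctl get /config/db_host

# Read every key under a prefix
etcdctl get /config/ --prefix

# Watch a prefix for changes (blocks; Ctrl-C to stop)
etcdctl watch /config/ --prefix

# Create a 60-second lease, then attach a key to it
etcdctl lease grant 60
etcdctl put /services/api/instance1 "10.0.0.6:8080" --lease=<leaseID>

# Delete a key
etcdctl del /config/db_host
```

`<leaseID>` is the hex ID printed by `lease grant`; a key attached to a lease disappears when the lease expires.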
Service registration in Go
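A sketch of lease-based registration using the official Go client (go.etcd.io/etcd/client/v3). The endpoint, key path, and address are illustrative, and a reachable etcd at 127.0.0.1:2379 is assumed:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()

	// Grant a 10-second lease; the key below lives only as long as the lease.
	lease, err := cli.Grant(ctx, 10)
	if err != nil {
		log.Fatal(err)
	}

	// Register this instance under the lease.
	_, err = cli.Put(ctx, "/services/order/instance-1", "10.0.0.6:8080",
		clientv3.WithLease(lease.ID))
	if err != nil {
		log.Fatal(err)
	}

	// Keep the lease alive while the process runs. If the process dies,
	// renewals stop, the lease expires, and etcd deletes the key.
	ch, err := cli.KeepAlive(ctx, lease.ID)
	if err != nil {
		log.Fatal(err)
	}
	for range ch { // drain keep-alive acknowledgements
	}
}
```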
Registration puts the instance’s key under a lease: if the instance dies, keep-alives stop, the lease expires, and etcd removes the key automatically.
Service discovery in Go
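A matching discovery sketch with the same client library, watching the same hypothetical prefix (again assuming a reachable local etcd):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	var mu sync.Mutex
	instances := map[string]string{} // key -> host:port

	ctx := context.Background()

	// Seed the local map with the instances currently under the prefix.
	resp, err := cli.Get(ctx, "/services/order/", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		instances[string(kv.Key)] = string(kv.Value)
	}

	// Watch the prefix and apply PUT/DELETE events as they arrive.
	for wresp := range cli.Watch(ctx, "/services/order/", clientv3.WithPrefix()) {
		mu.Lock()
		for _, ev := range wresp.Events {
			switch ev.Type {
			case clientv3.EventTypePut:
				instances[string(ev.Kv.Key)] = string(ev.Kv.Value)
			case clientv3.EventTypeDelete:
				delete(instances, string(ev.Kv.Key))
			}
		}
		fmt.Println("instances:", instances)
		mu.Unlock()
	}
}
```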
The discovery side watches a key prefix and maintains a local map of available instances.
Raft consensus overview
etcd’s consistency guarantee comes from the Raft algorithm. Every write goes through a single leader, which replicates the log entry to a majority of followers before acknowledging the client. Key properties:
- Leader election — followers start an election if they don’t hear from the leader within a randomized timeout. A candidate wins by collecting votes from a majority of the cluster.
- Log replication — the leader appends the client’s command to its log and sends it to all followers in parallel. Once a majority acknowledge, the entry is committed and applied.
- Safety — a node only votes for a candidate whose log is at least as up-to-date as its own, so a winning candidate’s log is as current as a majority of the cluster’s, preventing stale data from becoming authoritative.
- Random election timeouts — prevent split-vote deadlocks by ensuring followers start elections at different times.
CAP Theorem in Practice
The CAP theorem states that a distributed system can guarantee at most two of the following three properties simultaneously:
- C — Consistency: every read returns the most recently written value or an error.
- A — Availability: every request receives a (possibly stale) response; the system never refuses.
- P — Partition tolerance: the system keeps operating even when network partitions split nodes into groups that cannot communicate.
| Type | Behavior | Examples |
|---|---|---|
| CP | Rejects or delays requests when it cannot guarantee consistency | etcd, ZooKeeper, distributed relational DBs |
| AP | Returns potentially stale data rather than refusing | Redis, Cassandra, DynamoDB |
Real-world tradeoffs
CP systems (like etcd) make writes wait for a majority of nodes to acknowledge. If the cluster loses quorum, writes fail. This is the right choice for service discovery, leader election, and configuration that must be correct.
AP systems (like Redis) return the latest value a node has, even if replication hasn’t caught up. This is the right choice for caches, session stores, and real-time counters where a brief inconsistency is acceptable.
Choosing C or A doesn’t mean completely abandoning the other property. A CP system still tries to serve reads from the leader as fast as possible; an AP system still replicates writes in the background. It’s a sliding scale, not a binary switch.
Distributed Transactions
When a business operation spans multiple services with independent databases, you cannot use a local ACID transaction. Two coordination patterns are common, with eventual consistency as a pragmatic third option:
Two-phase commit (2PC)
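The two phases can be sketched with in-memory participants (the `Participant` interface and `account` type here are illustrative, not a real library):

```go
package main

import (
	"errors"
	"fmt"
)

// Participant is the minimal 2PC contract: Prepare must lock resources and
// promise a later Commit will succeed; Rollback undoes a successful Prepare.
type Participant interface {
	Prepare() error
	Commit()
	Rollback()
}

// twoPhaseCommit drives the protocol: phase 1 prepares everyone; only if all
// succeed does phase 2 commit. Otherwise every prepared participant rolls back.
func twoPhaseCommit(ps []Participant) error {
	prepared := []Participant{}
	for _, p := range ps {
		if err := p.Prepare(); err != nil {
			for _, q := range prepared {
				q.Rollback()
			}
			return fmt.Errorf("aborted: %w", err)
		}
		prepared = append(prepared, p)
	}
	for _, p := range ps {
		p.Commit()
	}
	return nil
}

// account is a toy participant that refuses to prepare an overdraft.
type account struct {
	balance, delta int
}

func (a *account) Prepare() error {
	if a.balance+a.delta < 0 {
		return errors.New("insufficient funds")
	}
	return nil
}
func (a *account) Commit()   { a.balance += a.delta }
func (a *account) Rollback() {}

func main() {
	from := &account{balance: 100, delta: -30}
	to := &account{balance: 0, delta: 30}
	fmt.Println(twoPhaseCommit([]Participant{from, to}), from.balance, to.balance)
	// An impossible transfer aborts without touching either side.
	bad := &account{balance: 10, delta: -50}
	fmt.Println(twoPhaseCommit([]Participant{bad, to}))
}
```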
A coordinator asks all participants to prepare (lock resources and guarantee they can commit), then, once all confirm, sends a commit message to all. Strengths: strong consistency and atomicity across participants. Weaknesses: the coordinator is a single point of failure; if it crashes after prepare but before commit, participants are stuck holding locks indefinitely. The two round-trips add latency. Not well suited to microservices at scale.
SAGA pattern
A SAGA is a sequence of local transactions, each paired with a compensating transaction that can undo its effect. If step N fails, the system executes the compensating transactions for steps N-1, N-2, …, 1 in reverse order.
Eventual consistency
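Idempotent event handling, which makes re-delivery harmless, can be sketched with a deduplicating consumer (the `event` and `handler` types are illustrative; a real service would persist the seen-set transactionally with its data):

```go
package main

import "fmt"

// event carries a unique ID so re-delivered copies can be detected.
type event struct {
	ID     string
	Amount int
}

// handler applies each event at most once by remembering processed IDs.
type handler struct {
	seen  map[string]bool
	total int
}

func (h *handler) apply(e event) {
	if h.seen[e.ID] {
		return // duplicate delivery: ignore
	}
	h.seen[e.ID] = true
	h.total += e.Amount
}

func main() {
	h := &handler{seen: map[string]bool{}}
	// The queue re-delivers evt-1; the handler stays correct.
	for _, e := range []event{{"evt-1", 10}, {"evt-2", 5}, {"evt-1", 10}} {
		h.apply(e)
	}
	fmt.Println(h.total) // 15, not 25
}
```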
Most microservice systems accept eventual consistency for non-critical data. A write propagates to other services asynchronously via events or message queues. Services design their reads to tolerate brief staleness, and idempotent operations ensure that re-delivered events don’t cause double-writes.
SpringCloud Stack
SpringCloud provides production-ready implementations of the most common microservice patterns for Java services.
Service discovery: Nacos
Nacos is a popular service registry and configuration center widely used with SpringCloud. Each service registers itself on startup and deregisters on shutdown. Consumers query Nacos for the instance list and load-balance locally.
OpenFeign for HTTP service calls
OpenFeign generates an HTTP client from an annotated Java interface, eliminating manual RestTemplate wiring: you declare the target service and its endpoints as interface methods, and Feign implements them at runtime.
API Gateway: Spring Cloud Gateway
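Routing is typically configured declaratively. An illustrative application.yml sketch (the service name and paths are hypothetical):

```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service     # resolve via the registry, client-side load balancing
          predicates:
            - Path=/api/order/**
          filters:
            - StripPrefix=2           # /api/order/list -> /list
```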
The gateway is a single entry point that routes requests to the appropriate microservice, enforces authentication, rate-limits traffic, and rewrites paths.
Circuit breakers
A circuit breaker monitors failures on an outbound call. If the failure rate exceeds a threshold, the circuit opens and subsequent calls fail fast (or return a fallback) without waiting for the downstream timeout. This prevents a slow service from cascading failures across the system. SpringCloud integrates with Resilience4j for circuit-breaker, rate-limiter, and retry policies.
Distributed Locks
In a single-process application you use synchronized or ReentrantLock. In a distributed system those primitives are local to one JVM instance. You need a lock whose state is visible to all instances.
Redis-based locks (SETNX)
Redis’s SET key value NX PX milliseconds command sets a key only if it does not exist, with an expiry time. This implements a basic distributed lock:
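In redis-cli terms (the key name, token value, and TTL are illustrative):

```shell
# Acquire: succeeds only if the key does not already exist; auto-expires in 30 s
SET lock:order:42 "a1b2c3-uuid" NX PX 30000

# Release (naive): delete the key unconditionally
DEL lock:order:42
```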
This naive scheme has two known problems:
- Lock expiry under load — if the holder takes longer than the TTL, the lock expires, another instance acquires it, and the original holder still thinks it owns the lock.
- Wrong-owner release — if instance A’s lock expires and instance B acquires it, then A finishes and calls DEL, deleting B’s lock.
Redisson (production-ready Redis locking)
Redisson addresses both problems:
- Watchdog / auto-renewal — a background thread extends the lease every 10 seconds (one-third of the default 30-second TTL) as long as the holder is still running.
- Lock identity — the lock value is UUID + threadId, so only the exact owner can release it.
- Reentrant support — the lock tracks a reentry count so the same thread can lock again without deadlocking.
etcd-based locks
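The Go client ships a ready-made mutex in its concurrency package. A sketch (the lock key is illustrative, and a reachable local etcd is assumed):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session owns a lease that the client keeps alive in the background;
	// if this process dies, the lease expires and the lock is released.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	mutex := concurrency.NewMutex(session, "/locks/inventory")
	ctx := context.Background()
	if err := mutex.Lock(ctx); err != nil { // blocks until acquired
		log.Fatal(err)
	}
	fmt.Println("holding lock; doing critical-section work")
	if err := mutex.Unlock(ctx); err != nil {
		log.Fatal(err)
	}
}
```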
etcd’s lease mechanism provides a naturally expiring distributed lock. Comparing the two approaches:
| | Redis (Redisson) | etcd |
|---|---|---|
| Performance | Very high (sub-millisecond) | Moderate |
| Consistency | AP (strong with Redlock across replicas) | CP (Raft-backed) |
| Setup | Simple | Requires etcd cluster |
| Best for | High-throughput locks (flash sales, rate limiting) | Infrastructure-level locks (leader election, config writes) |