Microservices Architecture
Overview
Microservices architecture structures an application as a collection of loosely coupled, independently deployable services, each organized around a business capability. Each service owns its data, runs in its own process, and communicates over the network.
The canonical reference is Sam Newman's Building Microservices (O'Reilly, 2nd edition 2021), supplemented by Monolith to Microservices (Newman, 2019) for migration strategies.
The First Rule of Microservices: Don't start with microservices. Start with a monolith, understand your domain, and decompose when you have evidence that the benefits outweigh the operational costs. (See dev/architecture/monoliths for the monolith-first approach.)
Service Decomposition
By Business Capability
Align services to what the business does (e.g., Order Management, Inventory, Payments). This creates stable boundaries because business capabilities change less frequently than technical layers.
By Subdomain (DDD-Aligned)
Use Domain-Driven Design bounded contexts as service boundaries (see dev/architecture/domain-driven-design):
- Core subdomains -- Your competitive advantage; build custom services.
- Supporting subdomains -- Necessary but not differentiating; simpler services or libraries.
- Generic subdomains -- Commodity; buy or use off-the-shelf (auth, email, payments).
Decomposition Heuristics
| Heuristic | Description |
|---|---|
| Single Responsibility | Each service does one thing well |
| Data ownership | Each service owns its data; no shared databases |
| Independent deployability | Changing one service does not require deploying another |
| Team alignment | One team can own and operate the service end-to-end |
| Bounded context boundary | Service boundaries align with DDD bounded contexts |
Inter-Service Communication
Synchronous Communication
| Pattern | Protocol | When to Use |
|---|---|---|
| Request/Response (REST) | HTTP/JSON | Simple CRUD, external APIs, broad tooling support |
| Request/Response (gRPC) | HTTP/2 + Protobuf | Internal service-to-service; high throughput, strong typing, streaming |
| GraphQL | HTTP/JSON | Client-driven queries; aggregating multiple services for a frontend |
Asynchronous Communication
| Pattern | Mechanism | When to Use |
|---|---|---|
| Event Notification | Message broker (topic/pub-sub) | Decoupled notification; consumers decide what to do |
| Event-Carried State Transfer | Message broker with payload | Reduce synchronous callbacks; consumer has needed data |
| Command Message | Message broker (queue) | Tell a specific service to do something |
| Async Request/Response | Correlation ID + reply queue | Need a response but don't want to block |
Rule of thumb: Prefer asynchronous communication between services. Use synchronous calls only when the caller needs an immediate response (e.g., a user-facing request/response).
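The Async Request/Response row above can be illustrated with a minimal, single-process sketch. The two `queue.Queue` objects stand in for broker queues, and every name here (`send_request`, `pricing_service_step`, `collect_replies`, `UNIT_PRICE_CENTS`) is hypothetical, chosen only for illustration. The sender returns as soon as the request is enqueued and later matches replies to requests by correlation ID.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins for a broker's request and reply queues.
request_queue = queue.Queue()
reply_queue = queue.Queue()

# Requester-side bookkeeping: correlation ID -> reply (None while outstanding).
pending = {}

UNIT_PRICE_CENTS = 250  # assumed price, for illustration only


def send_request(quantity):
    """Enqueue a request and return immediately with its correlation ID."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = None
    request_queue.put({"correlation_id": correlation_id, "quantity": quantity})
    return correlation_id


def pricing_service_step():
    """Responder: handle one request and echo its correlation ID in the reply."""
    msg = request_queue.get_nowait()
    reply_queue.put({
        "correlation_id": msg["correlation_id"],
        "total_cents": UNIT_PRICE_CENTS * msg["quantity"],
    })


def collect_replies():
    """Requester: drain the reply queue and match replies to pending requests."""
    while True:
        try:
            reply = reply_queue.get_nowait()
        except queue.Empty:
            return
        if reply["correlation_id"] in pending:
            pending[reply["correlation_id"]] = reply
```

In a real deployment the responder would run in another process and replies could arrive in any order, which is why the match is done on the correlation ID rather than on queue position.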
Communication Anti-Patterns
- Distributed monolith -- Services are "microservices" in name only; they deploy together, share databases, or cannot function independently.
- Chatty interfaces -- Excessive synchronous calls between services creating latency chains.
- Shared database -- Multiple services reading/writing the same tables destroys independent deployability.
API Gateway
An API gateway sits between external clients and internal services, providing:
- Request routing -- Routes client requests to the appropriate microservice
- Protocol translation -- External REST to internal gRPC, for example
- Authentication/Authorization -- Centralized security enforcement
- Rate limiting and throttling -- Protect services from traffic spikes
- Response aggregation -- Combine responses from multiple services for a single client call
Common implementations: Kong, AWS API Gateway, Azure API Management, Envoy, NGINX, Ocelot (.NET).
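To make three of the responsibilities listed above concrete (request routing, rate limiting, response aggregation), here is a toy in-process sketch. It is not how any of the listed products work internally; the `Gateway` class, the handler functions, and the per-client call budget are all assumptions made for illustration, and backend calls are plain function calls rather than network requests.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str], dict]


def order_service(path: str) -> dict:
    """Hypothetical backend; a real one would be reached over the network."""
    return {"service": "orders", "path": path}


def inventory_service(path: str) -> dict:
    """Hypothetical backend."""
    return {"service": "inventory", "path": path}


class Gateway:
    def __init__(self, routes: Dict[str, Handler], limit_per_client: int):
        self.routes = routes
        self.limit = limit_per_client
        self.calls: Dict[str, int] = {}

    def _admit(self, client_id: str) -> bool:
        # Rate limiting: a fixed call budget per client.
        # (A production gateway would reset this per time window.)
        used = self.calls.get(client_id, 0)
        if used >= self.limit:
            return False
        self.calls[client_id] = used + 1
        return True

    def _route(self, path: str) -> Tuple[int, dict]:
        # Request routing: the longest matching path prefix wins.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return 200, self.routes[prefix](path)
        return 404, {"error": "no route"}

    def handle(self, client_id: str, path: str) -> dict:
        if not self._admit(client_id):
            return {"status": 429, "body": {"error": "rate limit exceeded"}}
        status, body = self._route(path)
        return {"status": status, "body": body}

    def handle_aggregate(self, client_id: str, paths: List[str]) -> dict:
        # Response aggregation: one client call fans out to several services.
        if not self._admit(client_id):
            return {"status": 429, "body": {"error": "rate limit exceeded"}}
        return {"status": 200, "body": {p: self._route(p)[1] for p in paths}}
```

A gateway built as `Gateway({"/orders": order_service, "/inventory": inventory_service}, limit_per_client=3)` would route `/orders/42` to the order handler and reject a client's fourth call with status 429.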
Service Mesh
A service mesh handles service-to-service networking concerns transparently via sidecar proxies:
┌──────────────────────┐    ┌──────────────────────┐
│  Service A           │    │  Service B           │
│  ┌────────────────┐  │    │  ┌────────────────┐  │
│  │ App Container  │  │    │  │ App Container  │  │
│  └───────┬────────┘  │    │  └───────▲────────┘  │
│          │           │    │          │           │
│  ┌───────▼────────┐  │    │  ┌───────┴────────┐  │
│  │ Sidecar Proxy  │──┼────┼─▶│ Sidecar Proxy  │  │
│  │ (Envoy)        │  │    │  │ (Envoy)        │  │
│  └────────────────┘  │    │  └────────────────┘  │
└──────────────────────┘    └──────────────────────┘
          Control Plane (Istio / Linkerd)
Capabilities: Mutual TLS, traffic management, retries, circuit breaking, observability (distributed tracing, metrics), canary deployments.
Implementations: Istio, Linkerd, Consul Connect, AWS App Mesh.
Saga Pattern -- Distributed Transactions
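Because each microservice owns its data, a single ACID transaction cannot span services, and distributed transactions via two-phase commit (2PC) are usually impractical: they hold locks across services and tie availability to every participant. The saga pattern instead maintains consistency through a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails.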
Choreography (Event-Driven)
Each service publishes events that trigger the next step. No central coordinator.
Order Service ──(OrderCreated)──▶ Payment Service
Payment Service ──(PaymentProcessed)──▶ Inventory Service
Inventory Service ──(InventoryReserved)──▶ Shipping Service
On failure:
Inventory Service ──(ReservationFailed)──▶ Payment Service (refund)
Payment Service ──(RefundProcessed)──▶ Order Service (cancel)
Pros: Simple, decoupled, no single point of failure. Cons: Hard to understand the overall flow; debugging is difficult; risk of cyclic dependencies.
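The choreographed flow above, including its failure path, can be sketched with an in-memory event bus. This is a single-process illustration under stated assumptions: `EventBus` and `wire_services` are hypothetical names, handlers run synchronously, and each "service" is just a function that reacts to one event and publishes the next.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical in-memory pub/sub; a real system would use a message broker."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []  # event types in publication order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


def wire_services(bus, stock):
    order_status = {}  # owned by the Order Service
    shipments = []     # owned by the Shipping Service

    def payment_on_order_created(order):
        bus.publish("PaymentProcessed", order)   # local transaction: charge

    def inventory_on_payment_processed(order):
        if stock.get(order["sku"], 0) >= order["qty"]:
            stock[order["sku"]] -= order["qty"]  # local transaction: reserve
            bus.publish("InventoryReserved", order)
        else:
            bus.publish("ReservationFailed", order)

    def shipping_on_inventory_reserved(order):
        shipments.append(order["id"])

    def payment_on_reservation_failed(order):
        bus.publish("RefundProcessed", order)    # compensating action: refund

    def order_on_refund_processed(order):
        order_status[order["id"]] = "CANCELLED"  # compensating action: cancel

    bus.subscribe("OrderCreated", payment_on_order_created)
    bus.subscribe("PaymentProcessed", inventory_on_payment_processed)
    bus.subscribe("InventoryReserved", shipping_on_inventory_reserved)
    bus.subscribe("ReservationFailed", payment_on_reservation_failed)
    bus.subscribe("RefundProcessed", order_on_refund_processed)

    def create_order(order):
        order_status[order["id"]] = "PENDING"
        bus.publish("OrderCreated", order)

    return create_order, order_status, shipments
```

Note that no function in the sketch knows the whole flow; it only emerges from the subscriptions, which is the source of both the decoupling and the debugging difficulty.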
Orchestration (Central Coordinator)
A saga orchestrator (process manager) coordinates the steps explicitly.
┌───────────────────────────┐
│     Saga Orchestrator     │
│       (Order Saga)        │
└──┬──────────┬──────────┬──┘
   │          │          │
   ▼          ▼          ▼
Payment   Inventory   Shipping
Service    Service    Service
Pros: Clear flow, easier to understand and debug, centralized compensation logic. Cons: Orchestrator is a coupling point; risk of becoming a "god service."
Guidance: Use choreography for simple sagas (2-3 steps). Use orchestration for complex flows (4+ steps or complex compensation).
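A minimal sketch of the orchestration approach follows, assuming each step is a pair of callables (an action and its compensating action). The `run_saga` function and the step names are hypothetical; a production orchestrator would also persist saga state so it can resume after a crash, which this sketch omits.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception as exc:
            ctx["error"] = f"{name}: {exc}"
            for _done_name, undo in reversed(completed):
                undo(ctx)
            return False
        completed.append((name, compensate))
    return True


def make_order_saga(shipping_available: bool) -> List[Step]:
    """Hypothetical Order Saga: payment, then inventory, then shipping."""

    def charge(ctx):
        ctx["log"].append("charge")

    def refund(ctx):
        ctx["log"].append("refund")

    def reserve(ctx):
        ctx["log"].append("reserve")

    def release(ctx):
        ctx["log"].append("release")

    def schedule(ctx):
        if not shipping_available:
            raise RuntimeError("no carrier available")
        ctx["log"].append("schedule")

    def cancel_shipment(ctx):
        ctx["log"].append("cancel_shipment")

    return [
        ("payment", charge, refund),
        ("inventory", reserve, release),
        ("shipping", schedule, cancel_shipment),
    ]
```

The compensation logic lives in one place (`run_saga`), which is what makes the flow easier to follow than the choreographed variant, at the cost of the orchestrator knowing every participant.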
Distributed Data Management
| Pattern | Description |
|---|---|
| Database per Service | Each service has its own database; no shared access |
| API Composition | Query multiple services and aggregate results |
| CQRS | Separate read and write models so each can be optimized independently (see dev/architecture/event-driven) |
| Event Sourcing | Store state changes as events; derive current state (see dev/architecture/event-driven) |
| Saga | Manage distributed transactions through compensating actions |
| Outbox Pattern | Reliably publish events by writing them to a local outbox table in the same transaction as the business data change; a separate relay then publishes the outbox rows to the broker |
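
The Outbox Pattern row can be sketched with SQLite from the Python standard library. The table layout, the `place_order` and `relay_outbox` functions, and the `publish` callback are assumptions made for illustration; the point is only that the business row and the event row are committed in one local transaction, and a separate relay publishes unpublished rows afterwards.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a business table and an outbox table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER)")
    conn.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "event_type TEXT, payload TEXT, published INTEGER DEFAULT 0)"
    )
    return conn


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    """Write the order and its event atomically: both rows commit or neither does."""
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", payload),
        )


def relay_outbox(conn: sqlite3.Connection, publish) -> int:
    """Publish unpublished outbox rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after `publish` but before marking the row, the event is sent again on the next run, so this sketch gives at-least-once delivery and consumers would need to tolerate duplicates.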
Service Discovery
Services need to find each other in a dynamic environment where instances come and go.
| Approach | Examples | Mechanism |
|---|---|---|
| Client-side discovery | Netflix Eureka, Consul | Client queries registry, picks instance |
| Server-side discovery | AWS ALB, Kubernetes Services | Load balancer/proxy routes to available instance |
| DNS-based | Consul DNS, Kubernetes CoreDNS | Resolve service name to IP(s) via DNS |
In Kubernetes environments, server-side discovery via Services and DNS is the default and usually sufficient.
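For contrast with the Kubernetes default, client-side discovery can be sketched as a registry plus a client that picks an instance itself. This is not the Eureka or Consul API; `ServiceRegistry`, `RoundRobinClient`, and the addresses used with them are hypothetical, and the registry here is an in-process dictionary rather than a networked service.

```python
from typing import Dict, List


class ServiceRegistry:
    """Hypothetical in-memory registry mapping service names to instance addresses."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        instances = self._instances.get(service, [])
        if address in instances:
            instances.remove(address)

    def lookup(self, service: str) -> List[str]:
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client-side discovery: query the registry, then choose an instance locally."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self.registry = registry
        self._counters: Dict[str, int] = {}

    def pick(self, service: str) -> str:
        instances = self.registry.lookup(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        n = self._counters.get(service, 0)
        self._counters[service] = n + 1
        return instances[n % len(instances)]
```

With server-side discovery the `pick` logic moves out of the client and into the load balancer or proxy, so the client only needs a stable name.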
Resilience Patterns
| Pattern | Purpose |
|---|---|
| Circuit Breaker | Stop calling a failing service; fail fast and allow recovery |
| Retry with Backoff | Retry transient failures with exponential backoff and jitter |
| Bulkhead | Isolate failures to prevent cascading (separate thread pools / connections) |
| Timeout | Set explicit timeouts on all remote calls; never wait forever |
| Fallback | Provide degraded but functional response when a service is unavailable |
| Health Check | Expose liveness and readiness endpoints for orchestrators |
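
Two rows of the table, Circuit Breaker and Retry with Backoff, can be sketched in a few lines. This is a simplified illustration rather than the behavior of any particular resilience library: the class and function names, the thresholds, and the injectable `clock` and `sleep` parameters are assumptions, and the retry helper retries every exception, whereas real code would retry only failures known to be transient.

```python
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result


def backoff_delays(count, base=0.1, cap=5.0, rng=random.random):
    """Exponential backoff with full jitter: delay i is in [0, min(cap, base * 2**i))."""
    return [rng() * min(cap, base * 2 ** i) for i in range(count)]


def retry(func, attempts=3, base=0.1, cap=5.0, sleep=time.sleep):
    """Call func up to `attempts` times, sleeping a jittered delay between tries."""
    delays = backoff_delays(attempts - 1, base, cap)
    for i in range(attempts):
        try:
            return func()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last failure
            sleep(delays[i])
```

The jitter matters when many clients fail at once: without it they would all retry at the same instants and hit the recovering service in synchronized waves.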
When NOT to Use Microservices
Microservices introduce significant operational complexity. Do not use them when:
- Your team is small (fewer than roughly 8-10 developers) -- The overhead exceeds the benefit.
- Your domain is not well understood -- You will draw the wrong boundaries and create a distributed monolith.
- You lack operational maturity -- You need CI/CD, monitoring, distributed tracing, container orchestration, and on-call practices before microservices are viable.
- Latency is critical -- Every network hop adds latency, whereas calls inside a monolith are in-process and incur no network overhead.
- Strong consistency is required everywhere -- Microservices embrace eventual consistency across service boundaries; if your domain requires ACID transactions spanning data that would live in different services, a monolith may be simpler.
- You are building an MVP or prototype -- Speed of iteration matters more than scalability at this stage.
Tradeoffs Summary
| Benefit | Cost |
|---|---|
| Independent deployability | Operational complexity (CI/CD per service, monitoring, tracing) |
| Technology heterogeneity | Polyglot overhead; harder to maintain standards |
| Team autonomy | Coordination overhead; contract management |
| Scalability per service | Network latency; serialization/deserialization cost |
| Fault isolation | Distributed failure modes (partial failures, network partitions) |
| Organizational alignment | Requires mature DevOps culture |
Best Practices
- Design for failure from day one: circuit breakers, retries, timeouts, bulkheads.
- Own your data: one database per service, no shared database access.
- Make inter-service communication observable: distributed tracing (OpenTelemetry), centralized logging, metrics.
- Use consumer-driven contract testing (Pact, Spring Cloud Contract) to prevent breaking changes.
- Prefer asynchronous communication; use synchronous calls only when necessary.
- Keep services small enough to be owned by a single team, but large enough to justify the operational overhead.
- Deploy independently, test independently, fail independently.
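The idea behind consumer-driven contract testing, mentioned in the list above, can be sketched without any framework. This is deliberately not the Pact or Spring Cloud Contract API; the `contract_violations` function and the example contract are hypothetical. The consumer states which fields and types it relies on, and the provider's test suite checks its responses against that statement.

```python
from typing import Dict, List


def contract_violations(contract: Dict[str, type], response: dict) -> List[str]:
    """Return the ways a provider response breaks what the consumer relies on."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems


# Hypothetical contract published by a consumer of the Order service.
ORDER_CONTRACT = {"order_id": str, "total_cents": int}
```

Extra fields in the response are ignored, so a provider can add data freely; renaming or retyping a field the consumer depends on is reported before the change is deployed.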