Microservices Architecture: From Monolith to Production-Ready Services
Key Takeaways
- Most teams extract services too early, before proving that their boundaries work inside a monolith. They then discover hidden coupling and regret the network calls and operational complexity.
- Extract reactively when you hit concrete signals: independent scaling (search CPU 80%, order CPU 20%), deploy bottlenecks (teams blocking each other), technology mismatch (need Python + GPU), fault isolation, or team autonomy. Do not extract because microservices are trendy.
- Each module owns its database tables; a repository never calls a handler; services never talk directly to HTTP. Enforce this inside a monolith first, or the coupling will be harder to see across networks.
- Sagas orchestrate distributed transactions; the outbox pattern ensures reliable event publishing (no message loss on crash); circuit breakers with bulkheads prevent cascade failures. These are three patterns you cannot skip.
Most teams that adopt microservices do it too early. They split a 20-endpoint application into 12 services before they have 12 engineers, then spend the next year debugging network failures and deployment pipelines slower than the monolith ever was.
The monolith only becomes a problem when specific signals appear: deploys take 45 minutes, two teams block each other in the codebase, or you need to scale search independently because it consumes 10x the CPU.
Most microservices migrations fail because teams extract services too early without proving their boundaries work inside a monolith first [Beyer et al., 2016]. Start by building a modular monolith with strict data ownership. Extract reactively when you hit concrete signals like independent scaling needs or team autonomy bottlenecks. Use sagas for distributed transactions, the outbox pattern for reliable events [Apache Kafka Docs], and circuit breakers for resilience.
- Start modular monolith; prove boundaries before extracting services
- Extract only when you hit concrete signals, not because of hype
- Sagas + outbox + circuit breakers are the core resilience trio
graph TD
Start[Considering microservices?] --> Q1{Hit a concrete<br/>extraction signal?}
Q1 -->|No| Mono[Stay modular monolith ✓]
Q1 -->|Yes — which?| Sig{Signal type}
Sig -->|Scaling imbalance| Ext[Extract that domain]
Sig -->|Deploy bottleneck<br/>team blocking| Ext
Sig -->|Tech mismatch<br/>Python ML, GPU, etc.| Ext
Sig -->|Fault isolation<br/>blast radius| Ext
Ext --> Test{Boundary clean<br/>inside monolith?}
Test -->|No, hidden coupling| Fix[Fix coupling first<br/>still inside monolith]
Fix --> Test
Test -->|Yes| Net[Lift to network:<br/>gRPC behind feature flag]
Net --> Resil[Add resilience trio:<br/>saga + outbox + circuit breaker]
Resil --> Done[Production-ready service]
style Mono fill:#efe
style Done fill:#efe
style Fix fill:#fee
The diagram is the extraction decision tree: don't move to the network until you've proved the boundary is clean inside a single deployment. Most failed microservices migrations are teams that skipped the "fix coupling inside the monolith" step — extracting to gRPC just made the coupling harder to see.
The modular monolith: boundaries without the network
Before extracting services, prove that your boundaries are correct inside a single deployment. A modular monolith organizes code into domain modules that communicate through explicit interfaces, not direct database access. This step eliminates the biggest class of failed microservices migrations: teams that extract services only to discover they're tightly coupled and still change together.
// Each module exposes an interface, not its internals
public interface OrderModule {
    OrderDTO createOrder(CreateOrderCommand cmd);
    OrderDTO getOrder(UUID orderId);
}

public interface InventoryModule {
    boolean reserveStock(UUID sku, int quantity);
    void releaseStock(UUID sku, int quantity);
}

// Modules interact through interfaces, never through shared tables
@Service
class OrderModuleImpl implements OrderModule {
    private final InventoryModule inventory; // injected interface
    private final OrderRepository orders;    // private to this module

    // Constructor injection: the final fields must be assigned here to compile
    OrderModuleImpl(InventoryModule inventory, OrderRepository orders) {
        this.inventory = inventory;
        this.orders = orders;
    }

    @Override
    public OrderDTO createOrder(CreateOrderCommand cmd) {
        if (!inventory.reserveStock(cmd.sku(), cmd.quantity())) {
            throw new InsufficientStockException(cmd.sku());
        }
        return OrderDTO.from(orders.save(Order.from(cmd)));
    }

    // getOrder(UUID) omitted for brevity
}

The critical rule: each module owns its database tables. The OrderModuleImpl accesses the orders table. The InventoryModule accesses the inventory table. No module reads another module's tables directly. This enforcement inside a monolith is the whole point: you learn whether your boundaries actually work before paying the operational cost of running separate services.
If you cannot enforce this discipline inside a monolith, extracting to separate services will not fix it. The coupling will just be harder to see, slower to debug, and more expensive to operate.
Before extracting, replace in-process calls with gRPC behind a feature flag. If nothing breaks, the boundary is clean. If hidden coupling surfaces (shared transactions, cross-module joins, circular dependencies), fix it inside the monolith first. The cost is far lower than fixing it across network boundaries.
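One way to picture the feature-flag check is a constructor that picks between the in-process module and a remote client behind the same interface. The sketch below is hypothetical: `Inventory`, `localInventory`, `remoteInventory`, and the `useRemote` flag are illustrative names, and the remote side is a stand-in function rather than a generated gRPC stub.

```go
package main

import (
	"errors"
	"fmt"
)

// Inventory is the module boundary; callers never see which implementation they hold.
type Inventory interface {
	ReserveStock(sku string, qty int) (bool, error)
}

// localInventory is the in-process implementation used inside the monolith.
type localInventory struct{ stock map[string]int }

func (l *localInventory) ReserveStock(sku string, qty int) (bool, error) {
	if l.stock[sku] < qty {
		return false, nil
	}
	l.stock[sku] -= qty
	return true, nil
}

// remoteInventory stands in for a generated gRPC client (assumption: call
// would wrap the real stub). Unlike the local version it can fail for
// reasons that have nothing to do with stock levels.
type remoteInventory struct {
	call func(sku string, qty int) (bool, error)
}

func (r *remoteInventory) ReserveStock(sku string, qty int) (bool, error) {
	if r.call == nil {
		return false, errors.New("inventory: remote client not configured")
	}
	return r.call(sku, qty)
}

// NewInventory selects the implementation from a feature flag, so the
// network hop can be switched on and rolled back without a code change.
func NewInventory(useRemote bool, local *localInventory, remote *remoteInventory) Inventory {
	if useRemote {
		return remote
	}
	return local
}

func main() {
	local := &localInventory{stock: map[string]int{"sku-1": 5}}
	// The stand-in remote delegates to the same logic, as a real server would.
	remote := &remoteInventory{call: local.ReserveStock}

	for _, useRemote := range []bool{false, true} {
		inv := NewInventory(useRemote, local, remote)
		ok, err := inv.ReserveStock("sku-1", 2)
		fmt.Println(useRemote, ok, err)
	}
}
```

Callers depend only on `Inventory`, so any behavior that changes when the flag flips (a shared transaction, a join, a cyclic call) points at coupling that still has to be fixed inside the monolith.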
Extract when signals justify it, not because it's trendy
These are the signals that justify extraction:
| Signal | Example | Cost of Staying Monolith |
|---|---|---|
| Independent scaling | Search CPU 80%, order CPU 20%; 4× resource imbalance | Overpaying for components that don't need it |
| Deploy bottleneck | Teams waiting 3+ days for unrelated PRs to merge | Velocity cut by coordination overhead |
| Technology mismatch | Need Python + GPU for ML; rest is Java | Forcing different problems into one runtime |
| Fault isolation | One bug brings down the entire system | Cascading failures; unbounded blast radius |
| Team autonomy | Two 6+ person teams blocking each other daily | Org/arch mismatch; friction |
If none apply, stay with the monolith. Every service boundary adds network latency, partial failures, distributed tracing overhead, and on-call rotation. A team of 10 with one monolith moves faster than a team of 10 split across 5 services.
Service boundaries: business capabilities, not technical layers
[Transactional outbox] Split by business capability, not technical layer. A "database service" and an "API service" must change together for every feature, which defeats the purpose of microservices.
Good boundary test: Can one team develop, test, deploy, and run this service without coordinating with other teams for 80%+ of changes? If no, you have a distributed monolith.
Data duplication across services is not a bug — it's the price of independent availability. When CustomerService changes an address, it publishes an event. OrderService ignores it (keeps historical snapshot); ShippingService consumes it for future deliveries. Result: CustomerService can be down without affecting orders and shipments.
graph TD
GW["API Gateway"] -->|gRPC| CS["CustomerService<br/>owns: profiles, addresses"]
GW -->|gRPC| OS["OrderService<br/>owns: carts, orders, payments"]
GW -->|gRPC| SS["ShippingService<br/>owns: shipments, tracking"]
CS -->|"address.changed event"| Kafka["Kafka"]
OS -->|"order.placed event"| Kafka
Kafka -->|consumes| SS
Kafka -->|consumes| OS
CS --- CDB[(Customer DB)]
OS --- ODB[(Order DB)]
SS --- SDB[(Shipping DB)]
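The address example can be sketched as two consumers that react differently to the same event. This is a minimal in-memory illustration; the `AddressChanged` type, the handler names, and the maps standing in for each service's database are assumptions, not part of any real API.

```go
package main

import "fmt"

// AddressChanged is a hypothetical payload for the "address.changed" event.
type AddressChanged struct {
	CustomerID string
	NewAddress string
}

// OrderService stores the address as it was when each order was placed.
type OrderService struct {
	shipTo map[string]string // orderID -> address snapshot
}

// OnAddressChanged deliberately does nothing: placed orders keep their snapshot.
func (o *OrderService) OnAddressChanged(AddressChanged) {}

// ShippingService keeps its own copy of the current address per customer.
type ShippingService struct {
	current map[string]string // customerID -> address for future shipments
}

// OnAddressChanged updates the local copy, so later shipments use the new
// address without a call to CustomerService.
func (s *ShippingService) OnAddressChanged(e AddressChanged) {
	s.current[e.CustomerID] = e.NewAddress
}

func main() {
	orders := &OrderService{shipTo: map[string]string{"order-1": "1 Old Street"}}
	shipping := &ShippingService{current: map[string]string{"cust-1": "1 Old Street"}}

	evt := AddressChanged{CustomerID: "cust-1", NewAddress: "2 New Road"}
	orders.OnAddressChanged(evt)
	shipping.OnAddressChanged(evt)

	fmt.Println("order-1 ships to:", orders.shipTo["order-1"])
	fmt.Println("next shipment for cust-1:", shipping.current["cust-1"])
}
```

Each service answers from its own copy of the data, which is what lets orders and shipments keep working while CustomerService is unavailable.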
Communication patterns: sync vs. async
Every inter-service call is synchronous (caller waits) or asynchronous (caller publishes and moves on).
| Pattern | When | Trade-off |
|---|---|---|
| Sync (gRPC, HTTP) | Queries, immediate confirmations | Blocks if callee is slow; easier consistency |
| Async (Kafka, SQS) | Notifications, replication, workflows | Eventual consistency; ordering concerns |
For internal calls where you control both ends and expect latency under 100 ms, gRPC is the default: the schema is the contract, clients and servers are code-generated, and latency is low enough that blocking is less likely to cascade. HTTP/REST works too, but gRPC's binary framing and HTTP/2 multiplexing make it the more efficient choice.
// Inventory service: gRPC server
func (s *InventoryServer) ReserveStock(
	ctx context.Context, req *pb.ReserveRequest,
) (*pb.ReserveResponse, error) {
	err := s.store.Reserve(ctx, req.Sku, int(req.Quantity))
	if err != nil {
		if errors.Is(err, ErrInsufficientStock) {
			return &pb.ReserveResponse{Success: false}, nil
		}
		return nil, status.Errorf(codes.Internal, "reserve: %v", err)
	}
	return &pb.ReserveResponse{
		Success:       true,
		ReservationId: uuid.New().String(),
	}, nil
}

When the caller does not need an immediate response (notifications, data replication, or workflows that span multiple services), publish an event. The producing service does not know or care who consumes it. New consumers can be added without modifying the producer.
// Order service publishes an event after creating an order
type OrderCreatedEvent struct {
	OrderID    string    `json:"order_id"`
	CustomerID string    `json:"customer_id"`
	Total      int64     `json:"total_cents"`
	CreatedAt  time.Time `json:"created_at"`
}

func (s *OrderService) CreateOrder(ctx context.Context, cmd CreateOrderCmd) error {
	order := Order{ID: uuid.New().String()}
	// Write order + outbox event in the same DB transaction
	return s.repo.WithTx(ctx, func(tx *sql.Tx) error {
		if err := s.repo.InsertOrder(ctx, tx, &order); err != nil {
			return err
		}
		return s.repo.InsertOutboxEvent(ctx, tx, OutboxEvent{
			AggregateID: order.ID,
			EventType:   "order.created",
			Payload:     OrderCreatedEvent{OrderID: order.ID},
		})
	})
}

The critical constraint: never do synchronous-only across services (you'll cascade failures when any service is slow) and never do asynchronous-only for queries (the client still needs the data).
Never publish an event and then update the database (or vice versa) as two separate operations. If the second fails, your systems are permanently inconsistent — you'll have published events for orders that don't exist, or orders that exist with no published events. Always use the outbox pattern: write the event to your database in the same transaction as the state change, then relay it to Kafka from there.
Saga + outbox: distributed transactions you can actually debug
[Transactional outbox] Two patterns carry the distributed-transaction story. Sagas break a multi-service operation ("create order, reserve inventory, charge payment") into steps, each with a compensating action that undoes it on downstream failure. Outbox solves the dual-write problem by writing the event into the same database transaction as the state change; a separate relay polls the outbox table (using FOR UPDATE SKIP LOCKED for concurrent relay instances) and publishes to Kafka.
The minimal outbox schema + relay-polling loop:
-- Outbox table: written in the same TX as business state
CREATE TABLE outbox_events (
    id           BIGSERIAL PRIMARY KEY,
    aggregate    TEXT NOT NULL,                      -- e.g. 'order'
    aggregate_id TEXT NOT NULL,                      -- e.g. order UUID
    event_type   TEXT NOT NULL,                      -- e.g. 'order.created'
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at TIMESTAMPTZ                         -- NULL until relay publishes
);
CREATE INDEX outbox_unpublished_idx
    ON outbox_events (id) WHERE published_at IS NULL;

// Relay loop: runs on multiple replicas concurrently;
// FOR UPDATE SKIP LOCKED makes parallel relays safe.
func (r *Relay) tick(ctx context.Context) error {
	return pgx.BeginTxFunc(ctx, r.db, pgx.TxOptions{}, func(tx pgx.Tx) error {
		rows, err := tx.Query(ctx, `
			SELECT id, event_type, payload FROM outbox_events
			WHERE published_at IS NULL
			ORDER BY id
			LIMIT 100
			FOR UPDATE SKIP LOCKED
		`)
		if err != nil {
			return err
		}
		// Drain the result set first: a pgx connection cannot run the
		// UPDATE below while rows from this query are still open.
		type event struct {
			id      int64
			typ     string
			payload []byte
		}
		var batch []event
		for rows.Next() {
			var e event
			if err := rows.Scan(&e.id, &e.typ, &e.payload); err != nil {
				rows.Close()
				return err
			}
			batch = append(batch, e)
		}
		rows.Close()
		if err := rows.Err(); err != nil {
			return err
		}
		for _, e := range batch {
			if err := r.kafka.Publish(ctx, e.typ, e.payload); err != nil {
				return err
			}
			if _, err := tx.Exec(ctx,
				`UPDATE outbox_events SET published_at = now() WHERE id = $1`, e.id); err != nil {
				return err
			}
		}
		return nil
	})
}

Two production rules visible in the code: (1) SKIP LOCKED lets multiple relay replicas run safely without a leader-election dance; (2) the publish-then-mark-published order means a crash between the Kafka publish and the DB UPDATE results in a duplicate publish, which is acceptable only because consumers are idempotent.
The two compose: your service applies business state and writes to the outbox in one transaction, the relay publishes the event, and the saga orchestrator reacts. Because both writes commit or roll back together, a committed state change always has its event queued, and no event is queued for a change that rolled back. Two-phase commit (2PC) is a non-starter across services: it blocks resources and fails catastrophically when the coordinator crashes.
For the full Go orchestrator skeleton, compensation-on-failure code, outbox SQL schema, and relay-polling loop, see Event-Driven Microservices in Go: Kafka, Sagas, and the Outbox Pattern. The non-negotiables for sagas:
- Persist saga state to a database. An in-memory saga orchestrator that crashes mid-flight leaves orphaned work (reserved inventory, pending charges) with no recovery path.
- Retry compensations with exponential backoff, dead-letter after N attempts, and alert a human. Compensations fail too.
- Idempotent consumers are mandatory. At-least-once delivery + relay retries means every consumer sees duplicates.
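The compensation flow behind these rules can be sketched in a few lines. This is a simplified illustration, not the orchestrator from the linked article: `Step`, `StateStore`, `memoryStore`, and `Run` are hypothetical names, the store only records transitions in memory, and retry with backoff is reduced to a comment.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDeclined simulates a downstream failure in the demo.
var ErrDeclined = errors.New("card declined")

// Step pairs a forward action with the compensation that undoes it.
type Step struct {
	Name       string
	Do         func() error
	Compensate func() error
}

// StateStore is a hypothetical persistence hook. A real orchestrator would
// write to a database so a restarted process can resume or compensate.
type StateStore interface {
	Record(sagaID, step, status string)
}

// memoryStore keeps transitions in memory; it is for illustration only.
type memoryStore struct{ log []string }

func (m *memoryStore) Record(sagaID, step, status string) {
	m.log = append(m.log, sagaID+":"+step+":"+status)
}

// Run executes steps in order. On failure it compensates the completed
// steps in reverse order and returns the original error.
func Run(sagaID string, steps []Step, store StateStore) error {
	for i, s := range steps {
		store.Record(sagaID, s.Name, "started")
		if err := s.Do(); err != nil {
			store.Record(sagaID, s.Name, "failed")
			for j := i - 1; j >= 0; j-- {
				c := steps[j]
				if cerr := c.Compensate(); cerr != nil {
					// Real code would retry with backoff, then dead-letter and alert.
					store.Record(sagaID, c.Name, "compensation_failed")
					continue
				}
				store.Record(sagaID, c.Name, "compensated")
			}
			return fmt.Errorf("saga %s: step %s: %w", sagaID, s.Name, err)
		}
		store.Record(sagaID, s.Name, "done")
	}
	return nil
}

func main() {
	noop := func() error { return nil }
	steps := []Step{
		{Name: "create-order", Do: noop, Compensate: noop},
		{Name: "reserve-inventory", Do: noop, Compensate: noop},
		{Name: "charge-payment", Do: func() error { return ErrDeclined }, Compensate: noop},
	}
	store := &memoryStore{}
	fmt.Println(Run("saga-1", steps, store))
	for _, entry := range store.log {
		fmt.Println(entry)
	}
}
```

Recording every transition before and after each step is what makes recovery possible: after a crash, the persisted log shows which steps completed and therefore which compensations are still owed.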
Resilience: fail fast, isolate blast radius
Every cross-service call needs three patterns: circuit breaker (fail fast when a dependency is down rather than waiting on a timeout), bulkhead (separate connection pools per downstream so one slow service cannot exhaust all your workers), and timeouts (context deadline + per-call ceiling)[Beyer et al., 2016]. Together they prevent a single slow dependency from cascading into every caller. Only retry idempotent operations — retrying CreateOrder without an idempotency key[Stripe idempotency] duplicates the order.
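As a rough illustration of how a breaker and a bulkhead wrap a call, here is a minimal sketch. It is not the implementation from the linked article: `Breaker`, `Bulkhead`, and their thresholds are hypothetical, a single mutex guards all breaker state, and timeouts are assumed to be enforced inside the wrapped function (for example through a context deadline).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	ErrOpen = errors.New("circuit breaker open")
	ErrFull = errors.New("bulkhead full")
)

// Breaker opens after maxFailures consecutive failures. Once cooldown has
// elapsed, calls are let through again: a failure re-opens the breaker and a
// success closes it. A production breaker would also cap concurrent trials.
type Breaker struct {
	mu          sync.Mutex
	maxFailures int
	cooldown    time.Duration
	failures    int
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is open, in which case it fails fast.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	open := b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown
	b.mu.Unlock()
	if open {
		return ErrOpen
	}

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0
	return nil
}

// Bulkhead caps concurrent calls to one downstream with a buffered channel.
type Bulkhead struct{ slots chan struct{} }

func NewBulkhead(n int) *Bulkhead { return &Bulkhead{slots: make(chan struct{}, n)} }

// Do rejects immediately instead of queueing when every slot is taken.
func (b *Bulkhead) Do(fn func() error) error {
	select {
	case b.slots <- struct{}{}:
		defer func() { <-b.slots }()
		return fn()
	default:
		return ErrFull
	}
}

func main() {
	breaker := NewBreaker(2, 30*time.Second)
	bulkhead := NewBulkhead(10)
	failing := func() error { return errors.New("inventory timeout") }

	for i := 0; i < 3; i++ {
		err := bulkhead.Do(func() error { return breaker.Call(failing) })
		fmt.Println(err)
	}
}
```

In the demo the first two calls reach the failing dependency; the third returns `ErrOpen` without invoking it, which is the fail-fast behavior described above.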
Implementation details, the race condition most circuit-breaker code gets wrong (read-lock → write-lock transition), and the bulkhead semaphore pattern in Go are in Building Resilient Distributed Systems with Go: Circuit Breakers, Bulkheads, and Timeout Patterns.
API Gateway and service mesh patterns
[Kubernetes docs] An API gateway sits between external clients and internal services, handling cross-cutting concerns: authentication, rate limiting, request routing. Keep it thin: route traffic, verify tokens, enforce limits. Do not aggregate responses in the gateway (a slow service will slow all requests). Use a Backend-for-Frontend (BFF) service for aggregation; it's a regular microservice that stitches responses together.
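A BFF's aggregation step might look like the sketch below: each backend is called concurrently under its own deadline, and slow or failed backends are simply left out of the response. `Fetcher`, `Aggregate`, and the simulated backends are hypothetical; a real BFF would call its services over gRPC or HTTP.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Fetcher is a hypothetical call to one backend service.
type Fetcher func(ctx context.Context) (string, error)

// Aggregate calls every backend concurrently, each under its own deadline,
// and returns whatever arrived in time. A missing key means that backend
// failed or was too slow, so one slow service degrades a field, not the page.
func Aggregate(ctx context.Context, perCall time.Duration, backends map[string]Fetcher) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]string)
	)
	for name, fetch := range backends {
		wg.Add(1)
		go func(name string, fetch Fetcher) {
			defer wg.Done()
			cctx, cancel := context.WithTimeout(ctx, perCall)
			defer cancel()
			v, err := fetch(cctx)
			if err != nil {
				return
			}
			mu.Lock()
			out[name] = v
			mu.Unlock()
		}(name, fetch)
	}
	wg.Wait()
	return out
}

// after simulates a backend that answers with v after delay d and gives up
// when the context is cancelled first.
func after(d time.Duration, v string) Fetcher {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(d):
			return v, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// demo aggregates one fast and one slow backend with a 50 ms per-call budget.
func demo() map[string]string {
	backends := map[string]Fetcher{
		"profile": after(0, "alice"),
		"orders":  after(2*time.Second, "3 open orders"),
	}
	return Aggregate(context.Background(), 50*time.Millisecond, backends)
}

func main() {
	fmt.Println(demo())
}
```

Keeping this logic in a dedicated service means the gateway stays a thin router, and the per-call deadline bounds how long any single backend can hold up the combined response.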
Production gotchas
Saga state in database. Persist saga state and run background retries. In-memory state is lost on crash.
Idempotency keys. Payment service maps (customer_id, key) → result; returns cached result on replay to prevent duplicate charges.
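Probe window < breaker window. Keep the health-probe interval shorter than the breaker's trip window. If probes run every 30s but breakers trip after 10s, service discovery keeps reporting the instance as healthy while clients are already hitting open breakers, which can lead to cascading failures.
Graceful degradation. If payment is down, choose one behavior explicitly: reject the request immediately so the client sees the error, or accept the order as pending and retry the charge in the background.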
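The idempotency-key gotcha above can be sketched as a store that maps (customer_id, key) to the first result and replays it on retries. This is an in-memory illustration with hypothetical names (`Payments`, `Result`, `Charge`); a real payment service would persist the mapping, typically behind a unique constraint, and would not hold a lock across the charge itself.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a hypothetical charge outcome.
type Result struct {
	ChargeID string
	Status   string
}

// idemKey scopes the client-supplied key to one customer.
type idemKey struct{ customerID, key string }

// Payments caches the result of each (customer_id, key) pair.
type Payments struct {
	mu      sync.Mutex
	results map[idemKey]Result
	charges int // number of real charges performed
}

func NewPayments() *Payments {
	return &Payments{results: make(map[idemKey]Result)}
}

// Charge performs the charge once per (customerID, key). A replay with the
// same pair returns the stored result instead of charging again.
func (p *Payments) Charge(customerID, key string, amountCents int64) Result {
	p.mu.Lock()
	defer p.mu.Unlock()

	k := idemKey{customerID, key}
	if r, ok := p.results[k]; ok {
		return r // replay: no second charge
	}
	p.charges++ // stands in for the call to the payment provider
	r := Result{ChargeID: fmt.Sprintf("ch_%d", p.charges), Status: "succeeded"}
	p.results[k] = r
	return r
}

func main() {
	p := NewPayments()
	first := p.Charge("cust-1", "key-abc", 4200)
	retry := p.Charge("cust-1", "key-abc", 4200) // client retried after a timeout
	fmt.Println(first == retry, p.charges)
}
```

Scoping the key by customer keeps two customers who happen to send the same key from seeing each other's results.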
Pre-extraction checklist
- Boundaries proven in monolith (gRPC behind feature flag, nothing breaks)
- One concrete extraction signal (scaling, autonomy, fault isolation, deploy bottleneck)
- Sagas persist state to database; background job retries; dead-letter after N failures
- Outbox relay runs; events in same txn as state change
- Circuit breakers, bulkheads, timeouts all configured
- Distributed tracing wired (trace ID propagated, business context on spans)
- Health checks meaningful (not just process liveness)
- Each service owns its tables (no cross-service joins)
When to use each model
| Factor | Monolith | Modular Monolith | Microservices |
|---|---|---|---|
| Team size | 1-15 engineers | 10-40 engineers | 30+ engineers |
| Deploy frequency | Daily | Daily | Per-service, daily |
| Scaling | Entire application | Entire application | Individual services |
| Data consistency | ACID | ACID within modules | Eventual consistency |
| Debugging | Stack traces | Stack traces | Distributed tracing |
| Operational cost | One pipeline | One pipeline | N pipelines, N dashboards |
| Latency | Nanoseconds | Nanoseconds | Milliseconds |
| Blast radius | System down | Module down | Service down |
The safest path: start monolith → modular monolith → extract one service → repeat only when signals justify. Some modules never leave the monolith, and that's fine. A "residual monolith" is cheaper than 3 services with 3 CI/CD pipelines, 3 health dashboards, and 3 on-call rotations.
Teams that succeed can explain exactly why each boundary exists. If the answer is "best practice," it was premature. Extract only after proving boundaries work inside a monolith, setting up distributed tracing, and hiring engineers who understand resilience patterns.
The four cross-cutting code patterns every microservice fleet shares
Distributed tracing context propagation — the W3C traceparent header, set once at the edge and threaded through every downstream call. Without it, "the request was slow" becomes unfalsifiable in a 12-service mesh:
import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
)

// Note: otelhttp reads and writes traceparent through the global propagator,
// which is a no-op until otel.SetTextMapPropagator(propagation.TraceContext{})
// is called at startup.

// Wrap the inbound HTTP server: extracts traceparent from headers,
// starts a span named svc (the operation name passed to NewHandler),
// and attaches it to ctx for handlers.
func TraceMiddleware(svc string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return otelhttp.NewHandler(next, svc,
			otelhttp.WithTracerProvider(otel.GetTracerProvider()))
	}
}

// Wrap the outbound HTTP client: injects traceparent into outgoing requests.
func TracedClient(base *http.Transport) *http.Client {
	return &http.Client{Transport: otelhttp.NewTransport(base)}
}

Idempotent event consumer: the dedupe table the saga article mentions but rarely shows. Without it, a Kafka rebalance during deploy reprocesses up to one batch worth of events:
// ProcessOnce runs handler exactly once per (topic, key, offset) combination.
// The dedupe row is inserted in the SAME transaction as the side effect,
// so the handler's writes and the dedupe entry commit atomically.
// Requires a unique constraint on (topic, partition_key, offset_id);
// without one, ON CONFLICT never fires and nothing is deduplicated.
func ProcessOnce(ctx context.Context, db *sql.DB, e Event, handler func(*sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback() // no-op after a successful Commit
	res, err := tx.ExecContext(ctx, `
		INSERT INTO event_dedupe (topic, partition_key, offset_id, processed_at)
		VALUES ($1, $2, $3, now())
		ON CONFLICT DO NOTHING
	`, e.Topic, e.Key, e.Offset)
	if err != nil {
		return err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return err
	}
	if n == 0 {
		return nil // already processed; the deferred Rollback discards the empty tx
	}
	if err := handler(tx); err != nil {
		return err
	}
	return tx.Commit()
}

A health endpoint that knows the difference between liveness and readiness. The most common production bug is one endpoint that mixes them, so a flaky downstream restarts your healthy pod:
// /healthz: process-only check. Does NOT verify dependencies. The kubelet
// uses this to decide if the process is wedged and needs restart.
func Healthz(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write([]byte("ok"))
}

// /readyz: checks every critical dependency. Failure removes the pod from
// the Service endpoints (so traffic stops flowing in) but does NOT restart
// the process. All checks share one 3-second budget, so a hung dependency
// cannot stall the probe indefinitely.
func Readyz(deps ...func(context.Context) error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 3*time.Second)
		defer cancel()
		for _, dep := range deps {
			if err := dep(ctx); err != nil {
				http.Error(w, err.Error(), http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

A service-mesh sidecar config that wires it all together. Istio's DestinationRule sets per-service circuit-breaker thresholds, paired with a retry policy that won't amplify cascades:
# istio-destinationrule.yaml: outbound resilience config for one service
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: orders-service
spec:
  host: orders.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 100
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5   # eject host after 5 consecutive 5xx
      interval: 30s
      baseEjectionTime: 30s     # cool-off before re-adding
      maxEjectionPercent: 50    # never eject >50% of upstream pods
---
# Retries are configured on a VirtualService route, not on the DestinationRule
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: orders-service
spec:
  hosts:
    - orders.svc.cluster.local
  http:
    - route:
        - destination:
            host: orders.svc.cluster.local
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: "5xx,reset,connect-failure"

The four pieces compose: tracing tells you where time went, dedupe makes consumers safe to replay, separated probes prevent restart cascades, and the destination rule ensures one slow upstream can't take everything down.
Frequently Asked Questions
Why extract one service instead of splitting into 12 from day one?
Because you'll get the boundaries wrong. A modular monolith forces you to prove boundaries work inside a single deployment before paying for separate CI/CD, monitoring, on-call rotation, and network latency. If you cannot enforce data ownership and clean interfaces in a monolith, microservices will not save you — the coupling will just be distributed, slower to debug, and more expensive to operate.
Sync or async between services?
Sync (gRPC) for queries and immediate confirmations. Async (Kafka) for notifications, data replication, and workflows where eventual consistency is fine. Never sync-only — you'll cascade failures. Never async-only for queries — clients still need answers.
Orchestrated vs. choreographed sagas?
Orchestration: central coordinator, easier to reason about. Choreography: services publish and react, simpler for 2-3 steps, spaghetti beyond that. Use orchestration for production.
What breaks first when you scale microservices?
Teams run out of on-call rotation. Infrastructure complexity compounds faster than team size. A 20-person team scaling from 1 monolith to 8 services adds 7 health dashboards, 7 alerting configs, 7 on-call schedules. The operational overhead grows faster than the independence benefit. Only extract when the pain from staying monolith exceeds the pain of the new operational cost.
Keep Reading
- Event-Driven Microservices in Go: Kafka, Sagas, and the Outbox Pattern — The full implementation of the event-driven patterns introduced here: saga orchestration, outbox relay, and idempotent consumers
- REST vs gRPC vs GraphQL: A Production Decision Guide — How to choose the right protocol for each communication pattern in your microservices architecture
- Building Resilient Distributed Systems with Go — Circuit breakers, bulkheads, and graceful degradation patterns that keep microservices running when dependencies fail
Engineering Team
A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.