When should we stop using a monolith and extract services?

Extract a service when you hit one of these concrete signals: independent scaling needs (some components consume 10x the resources of others), deploy frequency bottlenecks (teams blocking each other in the codebase), technology mismatch (you need Python for ML but everything else is Java), fault isolation (one bug brings down the system), or team autonomy (two teams of more than six engineers each stepping on each other daily). If none of these apply, stay with the monolith.

Should we use synchronous (gRPC) or asynchronous (Kafka) communication between services?

Use synchronous gRPC for queries and commands that need immediate responses, where both sides are under your control and latency is tolerable (milliseconds). Use asynchronous Kafka/RabbitMQ for cross-service workflows, data replication, and notifications where eventual consistency is acceptable. Never do synchronous-only across services — you'll cascade failures. Never do asynchronous-only for queries — you'll make the client's job impossible.

What's the difference between orchestrated and choreographed sagas?

Orchestration: a central saga coordinator calls each service in sequence and handles compensation on failure. Easier to reason about, easier to monitor. Choreography: each service publishes events and reacts to others' events, no central coordinator. Simpler for 2-3 step workflows, becomes spaghetti beyond that. For production systems, use orchestration.

How do we avoid cascading failures when one service is slow or down?

Use three patterns together: circuit breaker (fail fast, avoid thundering herd), bulkhead (isolate connection pools per downstream service), and timeouts (never block forever). Without all three, a single slow service can exhaust all your threads/connections and take down everything that depends on it.

Is the modular monolith worth the effort, or should we just split into services from the start?

The modular monolith is absolutely worth it. It forces you to prove your boundaries are correct inside a single deployment before adding network calls, separate CI/CD, and operational complexity. If you cannot enforce clean boundaries in a monolith, microservices will not save you — the coupling will just be harder to see and fix.

#microservices #system-design #distributed-systems #kafka #kubernetes

Microservices Architecture: From Monolith to Production-Ready Services

BackendBytes Engineering Team

Mar 2, 2026


Part of Series: Microservices Architecture

Lesson 5 of 5

→Most teams extract services too early before proving boundaries work inside a monolith — they then discover hidden coupling and regret the network calls and operational complexity
→Extract reactively when you hit concrete signals: independent scaling (search CPU 80%, order CPU 20%), deploy bottlenecks (teams blocking each other), technology mismatch (need Python + GPU), fault isolation, or team autonomy — not because microservices are trendy
→Each module owns its database tables; a repository never calls a handler; services never talk directly to HTTP — enforce this inside a monolith first or the coupling will be harder to see across networks
→Sagas orchestrate distributed transactions; the outbox pattern ensures reliable event publishing (no message loss on crash); circuit breakers with bulkheads prevent cascade failures — three patterns you cannot skip

Most teams that adopt microservices do it too early. They split a 20-endpoint application into 12 services before they have 12 engineers, then spend the next year debugging network failures and deployment pipelines slower than the monolith ever was.

The monolith only becomes a problem when specific signals appear: deploys take 45 minutes, two teams block each other in the codebase, or you need to scale search independently because it consumes 10x the CPU.

TL;DR

Most microservices migrations fail because teams extract services too early without proving their boundaries work inside a monolith first^{[Beyer et al., 2016]}. Start by building a modular monolith with strict data ownership. Extract reactively when you hit concrete signals like independent scaling needs or team autonomy bottlenecks. Use sagas for distributed transactions, the outbox pattern for reliable events^{[Apache Kafka Docs]}, and circuit breakers for resilience.

Start modular monolith; prove boundaries before extracting services
Extract only when you hit concrete signals, not because of hype
Sagas + outbox + circuit breakers are the core resilience trio

graph TD
    Start[Considering microservices?] --> Q1{Hit a concrete<br/>extraction signal?}
    Q1 -->|No| Mono[Stay modular monolith ✓]
    Q1 -->|Yes — which?| Sig{Signal type}
    Sig -->|Scaling imbalance| Ext[Extract that domain]
    Sig -->|Deploy bottleneck<br/>team blocking| Ext
    Sig -->|Tech mismatch<br/>Python ML, GPU, etc.| Ext
    Sig -->|Fault isolation<br/>blast radius| Ext
    Ext --> Test{Boundary clean<br/>inside monolith?}
    Test -->|No, hidden coupling| Fix[Fix coupling first<br/>still inside monolith]
    Fix --> Test
    Test -->|Yes| Net[Lift to network:<br/>gRPC behind feature flag]
    Net --> Resil[Add resilience trio:<br/>saga + outbox + circuit breaker]
    Resil --> Done[Production-ready service]
    style Mono fill:#efe
    style Done fill:#efe
    style Fix fill:#fee

The diagram is the extraction decision tree: don't move to the network until you've proved the boundary is clean inside a single deployment. Most failed microservices migrations are teams that skipped the "fix coupling inside the monolith" step — extracting to gRPC just made the coupling harder to see.

The modular monolith: boundaries without the network

Before extracting services, prove that your boundaries are correct inside a single deployment. A modular monolith organizes code into domain modules that communicate through explicit interfaces, not direct database access. This step eliminates the biggest class of failed microservices migrations: teams that extract services only to discover they're tightly coupled and still change together.

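// Each module exposes an interface, not its internals
public interface OrderModule {
    OrderDTO createOrder(CreateOrderCommand cmd);
    OrderDTO getOrder(UUID orderId);
}
 
public interface InventoryModule {
    boolean reserveStock(UUID sku, int quantity);
    void releaseStock(UUID sku, int quantity);
}
 
// Modules interact through interfaces, never through shared tables
@Service
class OrderModuleImpl implements OrderModule {
    private final InventoryModule inventory; // injected interface
    private final OrderRepository orders;    // private to this module
 
    // Constructor injection: the final fields must be assigned here
    OrderModuleImpl(InventoryModule inventory, OrderRepository orders) {
        this.inventory = inventory;
        this.orders = orders;
    }
 
    @Override
    public OrderDTO createOrder(CreateOrderCommand cmd) {
        if (!inventory.reserveStock(cmd.sku(), cmd.quantity())) {
            throw new InsufficientStockException(cmd.sku());
        }
        return OrderDTO.from(orders.save(Order.from(cmd)));
    }
 
    // Assumption: OrderRepository is a Spring Data style repository whose
    // findById returns Optional<Order>; OrderNotFoundException is hypothetical.
    @Override
    public OrderDTO getOrder(UUID orderId) {
        return orders.findById(orderId)
                .map(OrderDTO::from)
                .orElseThrow(() -> new OrderNotFoundException(orderId));
    }
}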

The critical rule: each module owns its database tables. The OrderModuleImpl accesses the orders table. The InventoryModule accesses the inventory table. No module reads another module's tables directly. This enforcement inside a monolith is the whole point — you learn if your boundaries actually work before paying the operational cost of running separate services.

If you cannot enforce this discipline inside a monolith, extracting to separate services will not fix it. The coupling will just be harder to see, slower to debug, and more expensive to operate.

Test boundaries before extracting

Before extracting, replace in-process calls with gRPC behind a feature flag. If nothing breaks, the boundary is clean. If hidden coupling surfaces (shared transactions, cross-module joins, circular dependencies), fix it inside the monolith first. Fixing it there costs far less than fixing it across network boundaries.

Extract when signals justify it, not because it's trendy

These are the signals that justify extraction:

Signal	Example	Cost of Staying Monolith
Independent scaling	Search CPU 80%, order CPU 20%; 4× resource imbalance	Overpaying for components that don't need it
Deploy bottleneck	Teams waiting 3+ days for unrelated PRs to merge	Velocity cut by coordination overhead
Technology mismatch	Need Python + GPU for ML; rest is Java	Forcing different problems into one runtime
Fault isolation	One bug brings down the entire system	Cascading failures; unbounded blast radius
Team autonomy	Two 6+ person teams blocking each other daily	Org/arch mismatch; friction

If none apply, stay with the monolith. Every service boundary adds network latency, partial failures, distributed tracing overhead, and on-call rotation. A team of 10 with one monolith moves faster than a team of 10 split across 5 services.

Service boundaries: business capabilities, not technical layers

^{[Transactional outbox]}

Split by business capability, not technical layer. A "database service" and "API service" must change together for every feature, defeating microservices entirely.

Good boundary test: Can one team develop, test, deploy, and run this service without coordinating with other teams for 80%+ of changes? If no, you have a distributed monolith.

Data duplication across services is not a bug — it's the price of independent availability. When CustomerService changes an address, it publishes an event. OrderService ignores it (keeps historical snapshot); ShippingService consumes it for future deliveries. Result: CustomerService can be down without affecting orders and shipments.
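As a rough illustration of that duplication, the sketch below keeps a local address copy in the shipping side and an immutable snapshot on the order. The types `AddressChanged`, `Order`, and `ShippingService` and their fields are assumptions made for this example, not the article's actual schema, and the event is delivered by a direct method call instead of Kafka.

```go
package main

import "fmt"

// AddressChanged is a hypothetical payload for the address.changed event.
type AddressChanged struct {
	CustomerID string
	Address    string
}

// Order keeps the address it was placed with as a historical snapshot.
type Order struct {
	ID, CustomerID, ShipTo string
}

// ShippingService holds its own copy of each customer's current address.
type ShippingService struct {
	addresses map[string]string // customerID -> latest known address
}

func NewShippingService() *ShippingService {
	return &ShippingService{addresses: make(map[string]string)}
}

// OnAddressChanged updates the local copy. It never calls CustomerService.
func (s *ShippingService) OnAddressChanged(e AddressChanged) {
	s.addresses[e.CustomerID] = e.Address
}

// AddressFor answers from local data, so it keeps working while
// CustomerService is unavailable.
func (s *ShippingService) AddressFor(customerID string) (string, bool) {
	addr, ok := s.addresses[customerID]
	return addr, ok
}

func main() {
	shipping := NewShippingService()
	shipping.OnAddressChanged(AddressChanged{CustomerID: "c1", Address: "1 Old Road"})
	order := Order{ID: "o1", CustomerID: "c1", ShipTo: "1 Old Road"}

	// The customer moves. The existing order keeps its snapshot, while
	// future shipments use the new address.
	shipping.OnAddressChanged(AddressChanged{CustomerID: "c1", Address: "2 New Street"})
	addr, _ := shipping.AddressFor("c1")
	fmt.Println(order.ShipTo, "|", addr)
}
```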

graph TD
    GW["API Gateway"] -->|gRPC| CS["CustomerService<br/>owns: profiles, addresses"]
    GW -->|gRPC| OS["OrderService<br/>owns: carts, orders, payments"]
    GW -->|gRPC| SS["ShippingService<br/>owns: shipments, tracking"]

    CS -->|"address.changed event"| Kafka["Kafka"]
    OS -->|"order.placed event"| Kafka
    Kafka -->|consumes| SS
    Kafka -->|consumes| OS

    CS --- CDB[(Customer DB)]
    OS --- ODB[(Order DB)]
    SS --- SDB[(Shipping DB)]

Communication patterns: sync vs. async

Every inter-service call is synchronous (caller waits) or asynchronous (caller publishes and moves on).

Pattern	When	Trade-off
Sync (gRPC, HTTP)	Queries, immediate confirmations	Blocks if callee is slow; easier consistency
Async (Kafka, SQS)	Notifications, replication, workflows	Eventual consistency; ordering concerns

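For internal calls where you control both sides and the latency budget is under 100 ms, gRPC is a sensible default: the schema is the contract, clients and servers are generated from it, and round trips are short enough that blocked callers rarely pile up. HTTP/REST also works; gRPC's binary framing and HTTP/2 multiplexing mainly make it more efficient, and its generated types surface contract drift at compile time.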

// Inventory service — gRPC server
func (s *InventoryServer) ReserveStock(
    ctx context.Context, req *pb.ReserveRequest,
) (*pb.ReserveResponse, error) {
    err := s.store.Reserve(ctx, req.Sku, int(req.Quantity))
    if err != nil {
        if errors.Is(err, ErrInsufficientStock) {
            return &pb.ReserveResponse{Success: false}, nil
        }
        return nil, status.Errorf(codes.Internal, "reserve: %v", err)
    }
    return &pb.ReserveResponse{
        Success:       true,
        ReservationId: uuid.New().String(),
    }, nil
}

When the caller does not need an immediate response — for notifications, data replication, or workflows that span multiple services — publish an event. The producing service does not know or care who consumes it. New consumers can be added without modifying the producer.

// Order service publishes an event after creating an order
type OrderCreatedEvent struct {
    OrderID    string    `json:"order_id"`
    CustomerID string    `json:"customer_id"`
    Total      int64     `json:"total_cents"`
    CreatedAt  time.Time `json:"created_at"`
}
 
func (s *OrderService) CreateOrder(ctx context.Context, cmd CreateOrderCmd) error {
    order := Order{ID: uuid.New().String()}
 
    // Write order + outbox event in the same DB transaction
    return s.repo.WithTx(ctx, func(tx *sql.Tx) error {
        if err := s.repo.InsertOrder(ctx, tx, &order); err != nil {
            return err
        }
        return s.repo.InsertOutboxEvent(ctx, tx, OutboxEvent{
            AggregateID: order.ID,
            EventType:   "order.created",
            Payload:     OrderCreatedEvent{OrderID: order.ID},
        })
    })
}

The critical constraint: never do synchronous-only across services (you'll cascade failures when any service is slow) and never do asynchronous-only for queries (the client still needs the data).

The Dual-Write Trap

Never publish an event and then update the database (or vice versa) as two separate operations. If the second fails, your systems are permanently inconsistent — you'll have published events for orders that don't exist, or orders that exist with no published events. Always use the outbox pattern: write the event to your database in the same transaction as the state change, then relay it to Kafka from there.

Saga + outbox: distributed transactions you can actually debug

^{[Transactional outbox]}

Two patterns carry the distributed-transaction story. Sagas break a multi-service operation ("create order, reserve inventory, charge payment") into steps, each with a compensating action that undoes it on downstream failure. Outbox solves the dual-write problem by writing the event into the same database transaction as the state change; a separate relay polls the outbox table (using FOR UPDATE SKIP LOCKED for concurrent relay instances) and publishes to Kafka.

The minimal outbox schema + relay-polling loop:

-- Outbox table — written in the same TX as business state
CREATE TABLE outbox_events (
    id          BIGSERIAL PRIMARY KEY,
    aggregate   TEXT NOT NULL,           -- e.g. 'order'
    aggregate_id TEXT NOT NULL,           -- e.g. order UUID
    event_type  TEXT NOT NULL,           -- e.g. 'order.created'
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at TIMESTAMPTZ           -- NULL until relay publishes
);
CREATE INDEX outbox_unpublished_idx
    ON outbox_events (id) WHERE published_at IS NULL;

// Relay loop: runs on multiple replicas concurrently;
// FOR UPDATE SKIP LOCKED makes parallel relays safe.
func (r *Relay) tick(ctx context.Context) error {
    return pgx.BeginTxFunc(ctx, r.db, pgx.TxOptions{}, func(tx pgx.Tx) error {
        rows, err := tx.Query(ctx, `
            SELECT id, event_type, payload FROM outbox_events
            WHERE published_at IS NULL
            ORDER BY id
            LIMIT 100
            FOR UPDATE SKIP LOCKED
        `)
        if err != nil { return err }
 
        // Drain the result set first: pgx cannot run tx.Exec on a connection
        // whose Rows are still open (the call fails with "conn busy").
        type event struct { id int64; typ string; payload []byte }
        var batch []event
        for rows.Next() {
            var e event
            if err := rows.Scan(&e.id, &e.typ, &e.payload); err != nil {
                rows.Close()
                return err
            }
            batch = append(batch, e)
        }
        rows.Close()
        if err := rows.Err(); err != nil { return err }
 
        // Publish, then mark published. A crash in between causes a re-publish.
        for _, e := range batch {
            if err := r.kafka.Publish(ctx, e.typ, e.payload); err != nil { return err }
            if _, err := tx.Exec(ctx, `UPDATE outbox_events SET published_at = now() WHERE id = $1`, e.id); err != nil {
                return err
            }
        }
        return nil
    })
}

Two production rules visible in the code: (1) SKIP LOCKED lets multiple relay replicas run safely without a leader-election dance; (2) the publish-then-mark-published order means a crash between Kafka publish and DB UPDATE results in a duplicate publish — which is fine because consumers are idempotent.

The two compose: your service applies business state and writes to the outbox in one transaction, the relay publishes the event, and the saga orchestrator reacts. The state change and the event row commit or roll back together, so one is never permanently missing without the other; the event may simply reach Kafka a little later. Two-phase commit (2PC) is a non-starter across services: participants hold locks while they wait, and they can stay blocked indefinitely when the coordinator crashes.

For the full Go orchestrator skeleton, compensation-on-failure code, outbox SQL schema, and relay-polling loop, see Event-Driven Microservices in Go: Kafka, Sagas, and the Outbox Pattern. The non-negotiables for sagas:

Persist saga state to a database. An in-memory saga orchestrator that crashes mid-flight leaves orphaned work (reserved inventory, pending charges) with no recovery path.
Retry compensations with exponential backoff, dead-letter after N attempts, and alert a human. Compensations fail too.
Idempotent consumers are mandatory. At-least-once delivery + relay retries means every consumer sees duplicates.

Resilience: fail fast, isolate blast radius

Every cross-service call needs three patterns: circuit breaker (fail fast when a dependency is down rather than waiting on a timeout), bulkhead (separate connection pools per downstream so one slow service cannot exhaust all your workers), and timeouts (context deadline + per-call ceiling)^{[Beyer et al., 2016]}. Together they prevent a single slow dependency from cascading into every caller. Only retry idempotent operations — retrying CreateOrder without an idempotency key^{[Stripe idempotency]} duplicates the order.

Implementation details, the race condition most circuit-breaker code gets wrong (read-lock → write-lock transition), the bulkhead semaphore pattern in Go, and configurations tuned to your latencies are in Building Resilient Distributed Systems with Go: Circuit Breakers, Bulkheads, and Timeout Patterns.

API Gateway and service mesh patterns

^{[Kubernetes docs]}

An API gateway sits between external clients and internal services, handling cross-cutting concerns: authentication, rate limiting, request routing. Keep it thin — route traffic, verify tokens, enforce limits. Do not aggregate responses in the gateway (a slow service will slow all requests). Use a Backend-for-Frontend (BFF) service for aggregation — it's a regular microservice that stitches responses together.

Production gotchas

Saga state in database. Persist saga state and run background retries. In-memory state is lost on crash.

Idempotency keys. Payment service maps (customer_id, key) → result; returns cached result on replay to prevent duplicate charges.

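Probe window < breaker window. Health probes should notice a failure sooner than the circuit breakers do. If probes run every 30s but breakers trip after 10s, service discovery still reports the instance as healthy while clients are already hitting open breakers, and the failures spread to their callers.

Graceful degradation. Decide per flow what happens when payment is down: reject immediately if the client needs to see the error now, or accept the order as pending and queue the charge if you can retry in the background.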
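The idempotency-key rule above can be sketched as a lookup keyed by (customer_id, key) that returns the stored result on replay. This is an in-memory illustration under stated assumptions: `PaymentService`, `ChargeResult`, and the charge ID format are hypothetical, and a real service would back the map with a database table and a unique constraint so the guarantee survives restarts.

```go
package main

import (
	"fmt"
	"sync"
)

// idemKey scopes a client-supplied key to one customer.
type idemKey struct{ customerID, key string }

// ChargeResult is a hypothetical response shape.
type ChargeResult struct {
	ChargeID    string
	AmountCents int64
}

// PaymentService remembers the result of every (customer, key) pair.
type PaymentService struct {
	mu      sync.Mutex
	results map[idemKey]ChargeResult
	charges int // number of real charges performed
}

func NewPaymentService() *PaymentService {
	return &PaymentService{results: make(map[idemKey]ChargeResult)}
}

// Charge performs the charge once per (customerID, key) and returns the
// cached result on any replay of the same pair.
func (p *PaymentService) Charge(customerID, key string, amountCents int64) ChargeResult {
	p.mu.Lock()
	defer p.mu.Unlock()

	k := idemKey{customerID, key}
	if r, ok := p.results[k]; ok {
		return r // replay: no second charge
	}
	p.charges++
	r := ChargeResult{ChargeID: fmt.Sprintf("ch_%d", p.charges), AmountCents: amountCents}
	p.results[k] = r
	return r
}

func main() {
	pay := NewPaymentService()
	first := pay.Charge("cust-1", "order-42", 1999)
	retry := pay.Charge("cust-1", "order-42", 1999) // client retried after a timeout
	fmt.Println(first.ChargeID == retry.ChargeID, pay.charges)
}
```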

Pre-extraction checklist

Boundaries proven in monolith (gRPC behind feature flag, nothing breaks)
One concrete extraction signal (scaling, deploy bottleneck, technology mismatch, fault isolation, team autonomy)
Sagas persist state to database; background job retries; dead-letter after N failures
Outbox relay runs; events in same txn as state change
Circuit breakers, bulkheads, timeouts all configured
Distributed tracing wired (trace ID propagated, business context on spans)
Health checks meaningful (not just process liveness)
Each service owns its tables (no cross-service joins)

When to use each model

Factor	Monolith	Modular Monolith	Microservices
Team size	1-15 engineers	10-40 engineers	30+ engineers
Deploy frequency	Daily	Daily	Per-service, daily
Scaling	Entire application	Entire application	Individual services
Data consistency	ACID	ACID within modules	Eventual consistency
Debugging	Stack traces	Stack traces	Distributed tracing
Operational cost	One pipeline	One pipeline	N pipelines, N dashboards
Latency	Nanoseconds	Nanoseconds	Milliseconds
Blast radius	System down	Module down	Service down

The safest path: start monolith → modular monolith → extract one service → repeat only when signals justify. Some modules never leave the monolith, and that's fine. A "residual monolith" is cheaper than 3 services with 3 CI/CD pipelines, 3 health dashboards, and 3 on-call rotations.

Teams that succeed can explain exactly why each boundary exists. If the answer is "best practice," it was premature. Extract only after proving boundaries work inside a monolith, setting up distributed tracing, and hiring engineers who understand resilience patterns.

The four cross-cutting code patterns every microservice fleet shares

Distributed tracing context propagation — the W3C traceparent header, set once at the edge and threaded through every downstream call. Without it, "the request was slow" becomes unfalsifiable in a 12-service mesh:

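import (
    "net/http"
 
    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)
 
// Call once at startup. The default global propagator is a no-op, so without
// this otelhttp neither reads nor writes the traceparent header.
func InitPropagation() {
    otel.SetTextMapPropagator(propagation.TraceContext{})
}
 
// Wrap the inbound HTTP server: extracts traceparent from headers, starts a
// server span named by the operation argument (here the service name), and
// attaches it to ctx for handlers.
func TraceMiddleware(svc string) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return otelhttp.NewHandler(next, svc,
            otelhttp.WithTracerProvider(otel.GetTracerProvider()))
    }
}
 
// Wrap the outbound HTTP client: injects traceparent into outgoing requests.
// Pass the request context (http.NewRequestWithContext) so the span is linked.
func TracedClient(base *http.Transport) *http.Client {
    return &http.Client{Transport: otelhttp.NewTransport(base)}
}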

Idempotent event consumer — the dedupe table the saga article mentions but rarely shows. Without it, a Kafka rebalance during deploy reprocesses up to one batch worth of events:

// ProcessOnce applies the handler's database writes once per
// (topic, key, offset) combination.
// The dedupe row is inserted in the SAME transaction as the side effect,
// so the handler's writes and the dedupe entry commit atomically.
// Requires a unique constraint on (topic, partition_key, offset_id);
// without one, ON CONFLICT DO NOTHING never detects a duplicate.
func ProcessOnce(ctx context.Context, db *sql.DB, e Event, handler func(*sql.Tx) error) error {
    tx, err := db.BeginTx(ctx, nil)
    if err != nil { return err }
    defer tx.Rollback() // returns sql.ErrTxDone after a successful Commit; safe to ignore
 
    res, err := tx.ExecContext(ctx, `
        INSERT INTO event_dedupe (topic, partition_key, offset_id, processed_at)
        VALUES ($1, $2, $3, now())
        ON CONFLICT DO NOTHING
    `, e.Topic, e.Key, e.Offset)
    if err != nil { return err }
 
    n, err := res.RowsAffected()
    if err != nil { return err }
    if n == 0 {
        return nil // already processed; skip handler, the deferred Rollback discards the empty tx
    }
    if err := handler(tx); err != nil { return err }
    return tx.Commit()
}

A health endpoint that knows the difference between liveness and readiness — the most common production bug is one endpoint that mixes them, causing a flaky downstream to restart your healthy pod:

// /healthz: process-only check. Does NOT verify dependencies. The Kubelet
// uses this to decide if the process is wedged and needs restart.
func Healthz(w http.ResponseWriter, _ *http.Request) {
    w.WriteHeader(http.StatusOK)
    _, _ = w.Write([]byte("ok"))
}
 
// /readyz: checks every critical dependency. Failure removes the pod from
// the Service endpoints (so traffic stops flowing in) but does NOT restart
// the process. Critical: each check has its own short timeout, so one slow
// dependency cannot eat the budget of the checks after it.
func Readyz(deps ...func(context.Context) error) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        for _, dep := range deps {
            ctx, cancel := context.WithTimeout(r.Context(), time.Second)
            err := dep(ctx)
            cancel()
            if err != nil {
                http.Error(w, err.Error(), http.StatusServiceUnavailable)
                return
            }
        }
        w.WriteHeader(http.StatusOK)
    }
}

A service-mesh sidecar config that wires it all together: Istio's DestinationRule sets per-service circuit-breaker thresholds, and a VirtualService carries the retry policy (retries are not a DestinationRule field), bounded so it won't amplify cascades:

# istio-orders-resilience.yaml: outbound resilience config for one service
# The host assumes the service lives in the "default" namespace.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: orders-service
spec:
  host: orders.default.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 100
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutive5xxErrors: 5      # eject host after 5 consecutive 5xx
      interval: 30s
      baseEjectionTime: 30s        # cool-off before re-adding
      maxEjectionPercent: 50       # never eject >50% of upstream pods
---
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: orders-service
spec:
  hosts:
    - orders.default.svc.cluster.local
  http:
    - route:
        - destination:
            host: orders.default.svc.cluster.local
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: "5xx,reset,connect-failure"

The four pieces compose: tracing tells you where time went, dedupe makes consumers safe to replay, separated probes prevent restart cascades, and the destination rule ensures one slow upstream can't take everything down.

Frequently Asked Questions

Why extract one service instead of splitting into 12 from day one?

Because you'll get the boundaries wrong. A modular monolith forces you to prove boundaries work inside a single deployment before paying for separate CI/CD, monitoring, on-call rotation, and network latency. If you cannot enforce data ownership and clean interfaces in a monolith, microservices will not save you — the coupling will just be distributed, slower to debug, and more expensive to operate.

Sync or async between services?

Sync (gRPC) for queries and immediate confirmations. Async (Kafka) for notifications, data replication, and workflows where eventual consistency is fine. Never sync-only — you'll cascade failures. Never async-only for queries — clients still need answers.

Orchestrated vs. choreographed sagas?

Orchestration: central coordinator, easier to reason about. Choreography: services publish and react, simpler for 2-3 steps, spaghetti beyond that. Use orchestration for production.

What breaks first when you scale microservices?

Teams run out of people to staff on-call rotations. Infrastructure complexity compounds faster than team size. A 20-person team scaling from 1 monolith to 8 services adds 7 health dashboards, 7 alerting configs, 7 on-call schedules. The operational overhead grows faster than the independence benefit. Only extract when the pain of staying on the monolith exceeds the new operational cost.

Keep Reading

Event-Driven Microservices in Go: Kafka, Sagas, and the Outbox Pattern — The full implementation of the event-driven patterns introduced here: saga orchestration, outbox relay, and idempotent consumers
REST vs gRPC vs GraphQL: A Production Decision Guide — How to choose the right protocol for each communication pattern in your microservices architecture
Building Resilient Distributed Systems with Go — Circuit breakers, bulkheads, and graceful degradation patterns that keep microservices running when dependencies fail
