
TCP vs UDP vs QUIC: Protocol Selection Under Production Load

BackendBytes Engineering Team

Key Takeaways

  • TCP head-of-line blocking: a single lost packet holds back every later packet on the connection, including all streams multiplexed over it, until the lost one is retransmitted. On lossy networks, one dropped packet can stall 5+ fresh prices for 50ms
  • UDP has zero handshake overhead (0-RTT) but zero delivery guarantees — the application is responsible for sequence numbers, deduplication, and congestion control
  • QUIC (HTTP/3) multiplexes independent streams over UDP so packet loss on stream 3 doesn't block streams 1, 2, and 4 — the key fix for mobile networks with 1-5% packet loss
  • BBR congestion control estimates bottleneck bandwidth instead of waiting for loss; on high-latency WAN links (100ms+), BBR recovers 2-25% faster than CUBIC's loss-based approach

The classic real-time-feed-on-TCP production incident. A trading or telemetry feed runs over WebSocket (TCP). During high-loss conditions — congested mobile networks, market-volatility traffic spikes — TCP's head-of-line blocking surfaces: a single lost packet stalls every subsequent packet on the same connection until retransmission completes, adding 200 ms to 2 seconds of staleness to every following price. We debugged this on multiple production low-latency feeds.

The fix is surgical: move time-sensitive feeds off TCP. For a price feed, a lost packet means "skip to the next one", not "retransmit and delay everything else." Move to UDP [RFC 768] with application-level sequence numbers (drop out-of-order packets, no retransmit), or to QUIC [RFC 9000, 2021], which keeps reliability per-stream so loss on one stream doesn't block others.

Bottom Line

TCP [RFC 9293] guarantees delivery and ordering but blocks on lost packets; UDP [RFC 768] is instant but lossy; QUIC [RFC 9000, 2021] takes the best of both via UDP with stream-level reliability. Pick TCP for correctness (payments, orders), UDP for latency (gaming, feeds), and HTTP/3 (QUIC) for mobile-heavy web traffic.

  • TCP: Best for APIs, chat, payments — correctness before speed
  • UDP: Best for real-time feeds, gaming, VoIP — speed before reliability
  • QUIC/HTTP/3: Best for web traffic — modern mobile + concurrent requests

The quick start

Choose your transport based on what the data requires, not what feels default:

| Use case | Protocol | Handshake | Loss handling | Ordering | Best for |
|---|---|---|---|---|---|
| Payments, orders, auth | TCP | 1 RTT + TLS | Retransmit until success | Strict | Correctness non-negotiable |
| Real-time feeds, gaming, VoIP | UDP | 0 RTT | Drop & move on | Optional (you add) | Latency > delivery |
| Web APIs, chat, databases | TCP | 1 RTT + TLS | Retransmit until success | Strict | Standard middleware |
| Mobile web, concurrent HTTP | QUIC/HTTP/3 | 1 RTT (0-RTT return) | Retransmit per-stream | Per-stream | Modern browsers + lossy links |

TCP: Correctness at the Cost of Speed

[RFC 9293]

TCP's five guarantees — ordered delivery, reliability, flow control, congestion control, checksums — come with latency overhead. You get a promise: every byte you send will arrive exactly once, in order, and the sender will slow down if the network can't keep up.

Setup overhead: The three-way handshake (SYN, SYN-ACK, ACK) burns one RTT before data moves. Add TLS 1.3 and you're at 2 RTTs (40ms on a link with a 20ms cross-datacenter RTT). For file transfers and databases, this cost is negligible. For latency-sensitive clients that reconnect often, it is paid again on every new connection.

sequenceDiagram
    participant C as Client
    participant S as Server

    Note over C,S: TCP — 1 RTT before data flows
    C->>S: SYN
    S-->>C: SYN-ACK
    C->>S: ACK + data
    S-->>C: response

    Note over C,S: UDP — 0 RTT, fire and forget
    C->>S: datagram
    Note right of S: no ack
    C->>S: datagram
    C->>S: datagram
    Note right of S: packet 2 lost?<br/>App decides what to do.

The gap matters most on short-lived connections: TCP spends the first RTT negotiating, while UDP sends the first byte immediately. On a long-lived connection, such as a ticker pushing 100 updates/sec, the one-RTT setup is paid once and amortises over millions of packets. For real-time voice, gaming, or telemetry, UDP wins for a different reason: you'd rather lose a packet than wait for its retransmit.

Head-of-line blocking: TCP buffers out-of-order packets until the missing one arrives. If packet B is lost and C, D arrive, TCP holds them until B retransmits — even though C and D are already there. For a price feed sending 100 ticks/sec, a 50ms retransmit delay stalls 5 fresh prices. By the time they deliver, they're worthless.
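To make the buffering behaviour concrete, here is a minimal sketch of an in-order reorder buffer in Go. It is not how any kernel implements TCP; the `reorderBuffer` type and its method names are hypothetical and only illustrate the rule that nothing is released until every earlier packet has been released.

```go
package main

import "fmt"

// reorderBuffer models in-order delivery: a packet is released to the
// application only after every lower-numbered packet has been released.
type reorderBuffer struct {
	next    int          // next sequence number the application expects
	pending map[int]bool // packets that arrived early and are being held
}

func newReorderBuffer() *reorderBuffer {
	return &reorderBuffer{next: 1, pending: make(map[int]bool)}
}

// receive records one arriving packet and returns the sequence numbers
// that become deliverable as a result, in order.
func (b *reorderBuffer) receive(seq int) []int {
	if seq < b.next || b.pending[seq] {
		return nil // duplicate of something already seen
	}
	b.pending[seq] = true
	var delivered []int
	for b.pending[b.next] {
		delete(b.pending, b.next)
		delivered = append(delivered, b.next)
		b.next++
	}
	return delivered
}

func main() {
	b := newReorderBuffer()
	// Packet 2 is lost on first transmission and arrives last as a retransmit.
	for _, seq := range []int{1, 3, 4, 5, 2} {
		fmt.Printf("arrived %d -> delivered %v\n", seq, b.receive(seq))
	}
}
```

Packets 3, 4, and 5 arrive on time but deliver nothing; only when the retransmitted packet 2 shows up does the buffer release 2 through 5 in one burst. That burst is the stall a price feed sees.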

Congestion control tuning matters: Linux defaults to CUBIC and also ships BBR. CUBIC is loss-based: it ramps the congestion window until a packet drops, then backs off. It's conservative and fair to other flows but ramps slowly on high-latency WAN links (can take minutes to recover after loss). BBR is model-based: it estimates bottleneck bandwidth and minimum RTT, then paces packets to match. BBR fills the pipe faster and recovers quicker but can starve CUBIC flows on shared links. For internal traffic on paths with a high bandwidth-delay product (cross-region replication, for example), enable BBR:

sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

When to use TCP: APIs, chat, payments, order execution, state mutations. Any message where a dropped packet means a broken guarantee. The latency cost is the price of correctness.

For the full request lifecycle with TCP handshake and TLS negotiation, see What Happens When You Type a URL.

UDP: Speed Over Reliability

[RFC 768]

UDP is the opposite bargain: send a packet to an IP:port, that's it. No connection, no ACKs, no retransmissions, no ordering. If it arrives, fine. If it gets lost in a congested router buffer, gone. The application is responsible for everything.

What you gain:

  • 0-RTT startup: No handshake before data flows
  • No blocking on loss: Each datagram is independent — lose packet 5, and packet 6 arrives instantly
  • You control send rate: No congestion window slowing you down
  • Minimal overhead: 8-byte header vs TCP's 20+ bytes

What you lose:

  • No retransmission: Lost packets are gone
  • No ordering guarantee: Datagrams may arrive out of order
  • No flow control: You can overwhelm a slow receiver
  • You implement reliability: Sequence numbers, ACKs, and congestion control are your problem
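Since UDP leaves send-rate control to the application, one common starting point is a token-bucket pacer. The sketch below is an assumption about how you might do it, not a full congestion controller: it does not react to loss or RTT, and the `pacer` type, its fields, and the chosen rate are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// pacer is a token bucket measured in bytes: tokens refill at rate bytes/sec
// up to burst, and a datagram may be sent only if enough tokens remain.
type pacer struct {
	rate   float64       // refill rate in bytes per second
	burst  float64       // bucket capacity in bytes
	tokens float64       // bytes that may be sent right now
	last   time.Duration // monotonic offset of the last refill
}

func newPacer(bytesPerSec, burst float64) *pacer {
	return &pacer{rate: bytesPerSec, burst: burst, tokens: burst}
}

// allow reports whether a datagram of size bytes may be sent at offset now
// (for example time.Since(start)) and spends the tokens if so. When it
// returns false the caller drops or delays the datagram.
func (p *pacer) allow(now time.Duration, size int) bool {
	if now > p.last {
		p.tokens += (now - p.last).Seconds() * p.rate
		if p.tokens > p.burst {
			p.tokens = p.burst
		}
		p.last = now
	}
	need := float64(size)
	if p.tokens < need {
		return false
	}
	p.tokens -= need
	return true
}

func main() {
	p := newPacer(3200, 64) // 100 datagrams/sec of 32 bytes, burst of 2
	for _, at := range []time.Duration{0, 0, 0, 20 * time.Millisecond} {
		fmt.Printf("t=%v send allowed: %v\n", at, p.allow(at, 32))
	}
}
```

Taking the clock as a parameter keeps the pacer deterministic under test. For a latest-value feed, a refused send usually means "skip this tick", since the next tick supersedes it anyway.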

The key insight: UDP is right when stale data is worse than no data. A game state update that has already been superseded is useless; you want the latest. A price tick from 50ms ago is worthless; you want now. For DNS lookups, VoIP, multiplayer games, and real-time telemetry, UDP is the natural default.

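Here's a sketch of a UDP price-feed publisher and receiver in Go with sequence-based deduplication to discard stale packets (socket setup, client registration, and main are omitted):

package main

import (
    "bytes"
    "encoding/binary"
    "errors"
    "math"
    "net"
    "sync"
    "sync/atomic"
    "time"
)

// PriceUpdate mirrors the 32-byte wire format: sequence, symbol, price, timestamp.
type PriceUpdate struct {
    Sequence    uint64
    Symbol      [8]byte
    Price       float64
    TimestampMs int64
}

// PriceFeedServer fans each update out to every registered client address.
type PriceFeedServer struct {
    seq     uint64       // global sequence; accessed atomically, kept first for 64-bit alignment
    conn    *net.UDPConn // unconnected socket used for all sends
    clients sync.Map     // values are *net.UDPAddr
}

// PriceFeedClient tracks the highest sequence number seen per symbol.
type PriceFeedClient struct {
    conn    *net.UDPConn // connected socket (net.DialUDP)
    lastSeq sync.Map     // symbol (string) -> uint64
}

// PublishPrice encodes one update and sends it to every client, best-effort.
func (s *PriceFeedServer) PublishPrice(symbol string, price float64) {
    seq := atomic.AddUint64(&s.seq, 1)
    update := PriceUpdate{
        Sequence:    seq,
        Price:       price,
        TimestampMs: time.Now().UnixMilli(),
    }
    copy(update.Symbol[:], symbol) // symbols longer than 8 bytes are truncated
    buf := make([]byte, 32)
    binary.BigEndian.PutUint64(buf[0:8], update.Sequence)
    copy(buf[8:16], update.Symbol[:])
    binary.BigEndian.PutUint64(buf[16:24], math.Float64bits(update.Price))
    binary.BigEndian.PutUint64(buf[24:32], uint64(update.TimestampMs))
    s.clients.Range(func(_, value any) bool {
        // Send errors are ignored on purpose: a failed send is just a lost tick.
        _, _ = s.conn.WriteToUDP(buf, value.(*net.UDPAddr))
        return true
    })
}

// Receive reads updates until the socket fails, passing each fresh tick to
// handler along with its age in milliseconds. Stale and duplicate ticks are dropped.
func (c *PriceFeedClient) Receive(handler func(string, float64, int64)) error {
    buf := make([]byte, 32)
    for {
        c.conn.SetReadDeadline(time.Now().Add(5 * time.Second))
        n, err := c.conn.Read(buf)
        if err != nil {
            var ne net.Error
            if errors.As(err, &ne) && ne.Timeout() {
                continue // quiet feed: keep waiting
            }
            return err // closed socket or fatal error: stop instead of spinning
        }
        if n < 32 {
            continue // truncated datagram
        }
        seq := binary.BigEndian.Uint64(buf[0:8])
        symbol := string(bytes.TrimRight(buf[8:16], "\x00"))
        lastSeq, _ := c.lastSeq.LoadOrStore(symbol, uint64(0))
        if seq <= lastSeq.(uint64) {
            continue // older than or equal to what we already have
        }
        c.lastSeq.Store(symbol, seq)
        price := math.Float64frombits(binary.BigEndian.Uint64(buf[16:24]))
        ageMs := time.Now().UnixMilli() - int64(binary.BigEndian.Uint64(buf[24:32]))
        handler(symbol, price, ageMs)
    }
}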

What it doesn't do: retransmissions, ACKs, or ordering guarantees. Dropped packets mean skipped ticks. For order execution or any message you cannot afford to lose, use TCP or a durable queue like Kafka with TCP backing.

TCP Production Tuning

Connection pooling amortizes handshake cost. Key settings:

transport := &http.Transport{
    MaxIdleConnsPerHost:  100,      // Default: 2
    IdleConnTimeout:      90 * time.Second,
    DialContext: (&net.Dialer{
        Timeout:   5 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
    TLSHandshakeTimeout:    5 * time.Second,
    ResponseHeaderTimeout:  10 * time.Second,
}
client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

Server-side kernel tuning for high connection counts:

sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_fastopen=3           # 0-RTT on returns
sudo sysctl -w net.core.rmem_max=16777216        # 16MB socket buffers
sudo sysctl -w net.ipv4.tcp_tw_reuse=1           # Reuse TIME_WAIT sockets for new outbound connections
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

QUIC and HTTP/3: The Best of Both

QUIC (RFC 9000) is UDP with TCP's reliability features, but without head-of-line blocking. Each stream is independent — lose a packet on stream 1, and stream 2 keeps flowing unaffected. QUIC was built for the internet's actual conditions: high latency, mobile handoffs (IP changes), and packet loss.

The head-of-line-blocking comparison in one picture — the entire reason QUIC exists:

graph TB
    subgraph TCP[TCP — head-of-line blocking]
        T1[stream A: packets 1, 2, 3] --> Tlost[packet 2 lost]
        Tlost --> Tstall[ALL streams stall<br/>waiting for retransmit]
        T2[stream B: packets 1, 2, 3] -.->|blocked too —<br/>shares the connection| Tstall
        Tstall --> Trecv[receive in order<br/>after retransmit]
    end
    subgraph QUIC[QUIC — stream-independent reliability]
        Q1[stream A: packets 1, 2, 3] --> Qlost[packet 2 lost on stream A]
        Qlost --> Qa[stream A retransmits<br/>independently]
        Q2[stream B: packets 1, 2, 3] -->|unaffected| Qb[stream B keeps<br/>flowing]
    end
    style Tstall fill:#fdd
    style Qb fill:#dfd
    style Qa fill:#ffd

The diagram captures the mechanism behind HTTP/3's advantage over HTTP/2 under packet loss: TCP serialises everything behind the lost packet, while QUIC lets unaffected streams continue.
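The difference in the diagram can be counted directly. The following Go sketch is a deliberately simplified model, assuming each packet carries data for exactly one stream (real QUIC packets can bundle frames from several streams); `packet` and `heldBehindLoss` are hypothetical names.

```go
package main

import "fmt"

// packet is one arrival on the wire, tagged with the stream it belongs to.
type packet struct {
	stream string // stream the payload belongs to
	seq    int    // position within that stream
}

// heldBehindLoss counts packets that arrive after the lost packet but cannot
// be delivered until it is retransmitted. With perStream=false every later
// packet is held (connection-level ordering, as with TCP); with
// perStream=true only later packets on the same stream are held (as with QUIC).
func heldBehindLoss(arrivals []packet, lost int, perStream bool) int {
	if lost < 0 || lost >= len(arrivals) {
		return 0
	}
	held := 0
	for _, p := range arrivals[lost+1:] {
		if !perStream || p.stream == arrivals[lost].stream {
			held++
		}
	}
	return held
}

func main() {
	wire := []packet{{"A", 1}, {"B", 1}, {"A", 2}, {"B", 2}, {"A", 3}, {"B", 3}}
	lost := 2 // stream A, packet 2
	fmt.Println("connection-level ordering holds:", heldBehindLoss(wire, lost, false))
	fmt.Println("per-stream ordering holds:", heldBehindLoss(wire, lost, true))
}
```

With two interleaved streams, connection-level ordering holds three packets behind the loss while per-stream ordering holds one. The gap widens as more streams share the connection.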

Key wins over TCP:

  • Handshake: 1-RTT initial (vs TCP 1-RTT + TLS 1-RTT = 2 RTTs)
  • Return connections: 0-RTT with session resumption
  • Connection migration: Phone switches WiFi to LTE without breaking the connection (QUIC uses Connection ID, not 4-tuple)
  • Stream independence: Packet loss on one stream doesn't block others
  • Built-in encryption: TLS 1.3 is integral, not layered on top
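The handshake numbers in this list turn into wall-clock delay once you multiply by the path RTT. The Go sketch below encodes only the round-trip counts stated above; the function names are hypothetical, and treating a resumed TCP + TLS 1.3 connection as a full 2 RTTs (no TLS early data, no TCP Fast Open) is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"time"
)

// setupRTTs returns how many round trips pass before the first request byte
// can be sent on a new or resumed connection.
func setupRTTs(transport string, resumed bool) (int, error) {
	switch transport {
	case "tcp+tls1.3":
		// 1 RTT TCP handshake + 1 RTT TLS 1.3. Resumption shortcuts are
		// not modelled here (assumption).
		return 2, nil
	case "quic":
		if resumed {
			return 0, nil // 0-RTT with session resumption
		}
		return 1, nil // transport and TLS 1.3 handshake combined
	default:
		return 0, fmt.Errorf("unknown transport %q", transport)
	}
}

// setupLatency converts the round-trip count into wall-clock delay.
func setupLatency(transport string, resumed bool, rtt time.Duration) (time.Duration, error) {
	n, err := setupRTTs(transport, resumed)
	if err != nil {
		return 0, err
	}
	return time.Duration(n) * rtt, nil
}

func main() {
	rtt := 20 * time.Millisecond
	cases := []struct {
		transport string
		resumed   bool
	}{{"tcp+tls1.3", false}, {"quic", false}, {"quic", true}}
	for _, c := range cases {
		d, err := setupLatency(c.transport, c.resumed, rtt)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-11s resumed=%-5v setup=%v\n", c.transport, c.resumed, d)
	}
}
```

On a 20ms path this gives 40ms for TCP + TLS 1.3, 20ms for a fresh QUIC connection, and 0 for a resumed one.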

Under 1% simulated packet loss, HTTP/3 over QUIC shows 20–40% lower latency than HTTP/2 for concurrent requests. The practical play: enable HTTP/3 at your CDN (Cloudflare, Fastly, AWS CloudFront all support it) and let the CDN handle QUIC termination and multiplexing. Building a QUIC server from scratch is rarely justified unless you're a CDN or VPN provider managing high-concurrency public traffic. [RFC 9000, 2021]

Debugging and Observability

When things go wrong, understand where:

# Show all listening sockets with protocol
ss -tuln
 
# Spot TCP connection state issues (TIME_WAIT buildup)
ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn
 
# Capture TCP handshakes in real time
tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn) != 0' -n
 
# Capture UDP traffic on port
tcpdump -i eth0 udp port 9000
 
# Check interface packet loss
ip -s link show eth0  # Look for RX errors, dropped, overrun
 
# Measure latency to a host
ping -c 100 example.com | tail -1
 
# Test TCP throughput
iperf3 -s                    # server
iperf3 -c server_ip -t 30    # client
 
# Test UDP throughput
iperf3 -c server_ip -u -b 100M -t 30
 
# Simulate packet loss to test protocol behavior
sudo tc qdisc add dev eth0 root netem loss 2%
sudo tc qdisc del dev eth0 root  # Clean up

[iproute2 docs]

To check if HTTP/3 negotiation is working:

curl -v --http3 https://example.com 2>&1 | grep -E "alt-svc|HTTP/"
# Look for: < alt-svc: h3=":443"; ma=86400
# And:      Using HTTP/3

Production Checklist

When architecting a transport layer, work through this decision tree:

Protocol selection:

  • Use TCP for payments, orders, authentication, state mutations — any workload where a dropped or reordered packet breaks the guarantee
  • Use UDP for real-time feeds, gaming, VoIP, sensor telemetry — where stale data is worse than no data, and you can tolerate loss
  • Use QUIC/HTTP/3 for mobile-heavy web traffic with concurrent requests — enable at your CDN and let it handle QUIC termination
  • Hybrid approach: Use both (e.g., UDP for price feeds, TCP for order execution as in the opening example)

TCP tuning (if applicable):

  • Client-side: Raise MaxIdleConnsPerHost to match concurrency (default 2 is often too low)
  • Server-side: Enable tcp_fastopen=3 for 0-RTT on returning clients
  • Kernel: Increase socket buffers (rmem_max, wmem_max to 16MB+) for high-throughput datacenter transfers
  • Kernel: Set tcp_tw_reuse=1 to avoid ephemeral-port exhaustion from TIME_WAIT on hosts that open many short-lived outbound connections (the setting does not affect inbound connections)

UDP safety:

  • Implement sequence numbers if you need to discard stale or duplicate packets
  • Implement your own congestion control — pacing at the application level — to avoid saturating shared links
  • Monitor packet loss under load — use tc qdisc to simulate loss and verify graceful degradation

Observability:

  • Monitor TCP connection states — TIME_WAIT accumulation indicates connection churn
  • Profile handshake overhead — measure cumulative time spent on 3-way handshakes under realistic load
  • Test protocol behavior under loss — use synthetic packet loss to validate choices before they break in production

The diagnostic toolkit: kernel knobs, loss simulation, latency probe

A sysctl block that captures the production-tested defaults for high-RPS TCP servers — paste it into your AMI bake or DaemonSet so kernel tuning travels with the deploy:

# /etc/sysctl.d/99-tcp-tuning.conf — production defaults
# Apply with: sysctl -p /etc/sysctl.d/99-tcp-tuning.conf
 
# Larger socket buffers for high-bandwidth × latency datacenter transfers.
# Raises BDP ceiling — a 10 Gbit link with 1ms RTT = 1.25 MB BDP.
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
 
# TIME_WAIT sockets eat ephemeral ports under high connection churn.
# tcp_tw_reuse=1 lets the kernel reuse TIME_WAIT slots SAFELY (RFC 6191).
# Do NOT enable tcp_tw_recycle — it breaks NAT and was removed in 4.12.
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65535
 
# 0-RTT for returning clients (fastopen=3 = both client and server).
# Lets data ride in the SYN on warm reconnects, saving one RTT; ~20-50ms per connection.
net.ipv4.tcp_fastopen = 3
 
# BBR congestion control — meaningfully better than CUBIC on lossy WAN paths.
# Flip with caution; some middleboxes punish unrecognised cwnd patterns.
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
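
The buffer ceilings above are sized against the bandwidth-delay product (BDP), the amount of data in flight on a full pipe. As a small sketch, the hypothetical helper below computes BDP in bytes as bandwidth in bits per second divided by 8, times RTT, so you can check whether 16 MB is enough for your own paths.

```go
package main

import (
	"fmt"
	"time"
)

// bdpBytes returns the bandwidth-delay product in bytes for a link of
// bitsPerSec and a round-trip time of rtt.
func bdpBytes(bitsPerSec float64, rtt time.Duration) float64 {
	return bitsPerSec / 8 * rtt.Seconds()
}

func main() {
	paths := []struct {
		name string
		bps  float64
		rtt  time.Duration
	}{
		{"10 Gbit, 1ms (same region)", 10e9, time.Millisecond},
		{"1 Gbit, 100ms (cross region)", 1e9, 100 * time.Millisecond},
	}
	const bufMax = 16777216 // net.core.rmem_max from the block above
	for _, p := range paths {
		bdp := bdpBytes(p.bps, p.rtt)
		fmt.Printf("%-30s BDP=%.2f MB fits in buffer: %v\n", p.name, bdp/1e6, bdp <= bufMax)
	}
}
```

A 16777216-byte buffer covers 1 Gbit/s up to roughly 134ms of RTT; faster or longer paths need a higher ceiling to keep the pipe full from a single connection.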

Loss simulation with tc qdisc — adds 5% packet loss + 50ms latency to a single interface so a UDP-based service can be tested under real-world WAN conditions: [iproute2 (ss/ip)]

# Inject loss + latency on eth0 outbound. Run inside a netns or test pod.
tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms distribution normal
 
# Verify with iperf3 or ping:
ping -c 100 8.8.8.8 | tail -3
# rtt min/avg/max/mdev = 51.2/57.8/89.4/8.1 ms (jitter shows in mdev;
# loss shows in the "% packet loss" summary line just above it)
 
# Clean up:
tc qdisc del dev eth0 root

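A Go latency probe for TCP and UDP against the same target, usable as a starting point for an SRE dashboard that checks whether the protocol-choice assumption holds under your network conditions (QUIC needs a separate HTTP/3 client and is not covered here):

package probe

import (
    "context"
    "net"
    "time"
)

// MeasureRoundTrip returns an approximate RTT to addr (host:port) over "tcp" or "udp".
// TCP: time for the three-way handshake to complete, about one RTT.
// UDP: time from sending a "ping" datagram to receiving any reply, so the
// target must echo datagrams for the UDP case to work.
// For QUIC, time an HTTP/3 request with a QUIC client library instead.
func MeasureRoundTrip(ctx context.Context, network, addr string) (time.Duration, error) {
    var d net.Dialer
    start := time.Now()
    conn, err := d.DialContext(ctx, network, addr)
    if err != nil {
        return 0, err
    }
    defer conn.Close()

    if network != "udp" {
        return time.Since(start), nil
    }

    // UDP has no handshake, so time an explicit send→reply round trip.
    // The clock starts before the write so the outbound leg is included.
    if err := conn.SetDeadline(time.Now().Add(3 * time.Second)); err != nil {
        return 0, err
    }
    buf := make([]byte, 64)
    start = time.Now()
    if _, err := conn.Write([]byte("ping")); err != nil {
        return 0, err
    }
    if _, err := conn.Read(buf); err != nil {
        return 0, err // timeout or ICMP error: report it instead of a bogus RTT
    }
    return time.Since(start), nil
}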

Pair the probe with a 1-minute Prometheus recording rule so the protocol RTT becomes a graphable signal — when "TCP feels slower than UDP" turns into a real customer report, you have data instead of guesses:

# probe-rules.yml — recording rules for transport-layer latency dashboards
groups:
  - name: transport-probes
    interval: 60s
    rules:
      - record: probe_rtt_seconds:tcp:p95
        expr: histogram_quantile(0.95, sum(rate(probe_rtt_seconds_bucket{network="tcp"}[5m])) by (le, target))
      - record: probe_rtt_seconds:udp:p95
        expr: histogram_quantile(0.95, sum(rate(probe_rtt_seconds_bucket{network="udp"}[5m])) by (le, target))

The ratio tcp:p95 / udp:p95 is a rough "is TCP overhead worth it on this path?" signal: under 1.3 means the TCP overhead is negligible; over 2 suggests a packet-loss problem that's amplifying retransmits, where a UDP-based protocol would likely be measurably faster.
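If you want the same thresholds in an alerting hook or a CLI, they fit in a few lines of Go. This is a sketch: `classifyOverhead` is a hypothetical name, and the "inconclusive" label for ratios between 1.3 and 2 is an assumption, since the thresholds above leave that band undefined.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// classifyOverhead applies the 1.3 and 2.0 thresholds to the ratio of
// TCP p95 RTT to UDP p95 RTT.
func classifyOverhead(tcpP95, udpP95 time.Duration) (string, error) {
	if tcpP95 <= 0 || udpP95 <= 0 {
		return "", errors.New("p95 values must be positive")
	}
	ratio := float64(tcpP95) / float64(udpP95)
	switch {
	case ratio < 1.3:
		return "negligible", nil
	case ratio > 2:
		return "loss-amplified", nil
	default:
		return "inconclusive", nil // band not defined by the thresholds (assumption)
	}
}

func main() {
	verdict, err := classifyOverhead(130*time.Millisecond, 52*time.Millisecond)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("verdict:", verdict)
}
```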

QUIC Connection Migration: Surviving the IP Change

The single most underrated QUIC feature is connection migration. TCP identifies a connection by the 4-tuple of source IP, source port, destination IP, destination port. Change any one of those — and a phone walking out of a cafe does that the moment it switches from WiFi to LTE — and the kernel tears the socket down. Every in-flight HTTP request fails, the application gets a connection reset, and the user sees a spinner while the browser reopens TCP, redoes TLS, and replays the request. On a typical mobile session, this is a 200-400ms stall that happens every time the radio handoff fires.

QUIC sidesteps the 4-tuple entirely. Each connection carries opaque Connection IDs (up to 20 bytes each in QUIC v1) that the peers exchange during the handshake. Packets are routed by Connection ID, not by IP, so when the client's IP changes, the server still recognises the connection from the ID and continues delivering data on the existing streams. The handoff cost drops from a full handshake-and-retry to a single round-trip path validation.
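The lookup difference can be shown with two maps. The Go sketch below is a toy model, not how any real TCP stack or QUIC library stores connections; the `session`, `tupleTable`, and `idTable` types, the ID value, and the port numbers are all hypothetical.

```go
package main

import "fmt"

// session stands in for per-connection state (streams, keys, and so on).
type session struct {
	name   string
	remote string // last peer address seen, as "ip:port"
}

// tupleTable models TCP-style lookup: the key includes the client address.
type tupleTable map[string]*session

// idTable models QUIC-style lookup: the key is the Connection ID alone.
type idTable map[string]*session

func tupleKey(clientAddr, serverAddr string) string {
	return clientAddr + "->" + serverAddr
}

// route finds the session for an incoming packet by Connection ID and
// records the peer address, reporting whether it changed.
func (t idTable) route(connID, from string) (s *session, migrated, ok bool) {
	s, ok = t[connID]
	if !ok {
		return nil, false, false
	}
	migrated = s.remote != from
	s.remote = from
	return s, migrated, true
}

func main() {
	const server = "203.0.113.10:443"
	wifi, lte := "192.168.1.42:50000", "100.64.7.81:41000"

	s := &session{name: "video", remote: wifi}
	byTuple := tupleTable{tupleKey(wifi, server): s}
	byID := idTable{"a3f7": s}

	// The client's address changes from WiFi to LTE.
	_, found := byTuple[tupleKey(lte, server)]
	fmt.Println("4-tuple lookup finds session:", found)

	_, migrated, ok := byID.route("a3f7", lte)
	fmt.Println("Connection ID lookup finds session:", ok, "migrated:", migrated)
}
```

The 4-tuple lookup misses because the key itself changed; the Connection ID lookup hits and simply notes the new address, which is the point at which a real server would start path validation.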

Walk through the IP-change scenario step by step. The user is mid-stream watching a video on HTTP/3 when they leave WiFi:

T0: WiFi connection established. Connection ID = 0xa3f7...
    Client IP = 192.168.1.42, Server IP = 203.0.113.10
    Streams 0, 4, 8 carrying video segments; QPACK table warm.
    Server has already issued spare Connection IDs
    via NEW_CONNECTION_ID.

T1: WiFi signal drops; radio switches to LTE.
    Client IP changes: 192.168.1.42 -> 100.64.7.81 (carrier NAT).
    Client OS surfaces an "IP changed" event to the QUIC stack.

T2: Client sends a PATH_CHALLENGE frame on the new path.
    It uses one of the spare Connection IDs (RFC 9000 forbids
    reusing an ID from a different local address) and the
    new source IP 100.64.7.81.
    Frame contains 8 random bytes the server must echo.

T3: Server receives the packet, maps the Connection ID to the
    existing connection, but treats the new path as
    UNVALIDATED: it caps what it sends to 3x the bytes received
    on this path (anti-amplification limit, RFC 9000 §8).

T4: Server replies with PATH_RESPONSE echoing the 8 bytes and
    sends its own PATH_CHALLENGE, which the client echoes back.
    Round trip 1 complete; the path is validated.

T5: Server lifts the amplification cap, resumes full pacing.
    Streams 0, 4, 8 keep flowing: no streams reset, no
    HTTP requests replayed, no TLS handshake. The video
    stalls for one RTT (~50ms on LTE) instead of 300ms.

The anti-amplification rule at T3 is what stops attackers from using QUIC servers as DDoS reflectors. Without it, a spoofed source IP could trick the server into firing a flood at an unsuspecting victim. The 3x cap means the most an attacker can redirect at a victim is three times the bytes they actually sent. [RFC 9000, 2021]
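The accounting behind the 3x cap is small enough to sketch in Go. This is an illustration of the rule only, assuming byte counters per path; `pathState` and its methods are hypothetical names rather than any QUIC library's API.

```go
package main

import "fmt"

// pathState tracks bytes exchanged with a peer address that may not be
// validated yet.
type pathState struct {
	received  int  // bytes received from the peer on this path
	sent      int  // bytes sent to the peer on this path
	validated bool // set once path validation succeeds
}

func (p *pathState) onReceive(n int) { p.received += n }

// trySend reports whether n more bytes may be sent and records them if so.
// Before validation, total bytes sent may not exceed 3x bytes received.
func (p *pathState) trySend(n int) bool {
	if !p.validated && p.sent+n > 3*p.received {
		return false
	}
	p.sent += n
	return true
}

func main() {
	var p pathState
	p.onReceive(1200) // one datagram arrives from the new address
	for i := 1; i <= 4; i++ {
		fmt.Printf("send #%d of 1200 bytes allowed: %v\n", i, p.trySend(1200))
	}
	p.validated = true // the peer answered the path challenge
	fmt.Println("after validation, send allowed:", p.trySend(1200))
}
```

After one 1200-byte datagram from the new address, three 1200-byte replies fit under the cap and the fourth is refused until validation completes.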

Connection IDs are also rotated. Both peers can issue NEW_CONNECTION_ID frames to introduce fresh IDs, then retire old ones with RETIRE_CONNECTION_ID. This prevents a passive observer who saw the original handshake from linking a user's WiFi traffic to their LTE traffic — the Connection ID changes across the migration, so to a network observer, the LTE flow looks unrelated. Privacy by construction.

To watch migration in production, capture a QUIC trace and grep for path frames:

# Capture QUIC traffic on the wire (UDP/443 by default).
sudo tcpdump -i any -w quic.pcap 'udp port 443'
 
# Decode with Wireshark — needs SSLKEYLOGFILE export from the client
# so Wireshark can decrypt the QUIC payloads.
SSLKEYLOGFILE=/tmp/quic.keys curl --http3 https://example.com/
wireshark -o tls.keylog_file:/tmp/quic.keys quic.pcap
 
# Filter in Wireshark for migration events:
#   quic.frame_type == 0x1a   -- PATH_CHALLENGE
#   quic.frame_type == 0x1b   -- PATH_RESPONSE
#   quic.frame_type == 0x18   -- NEW_CONNECTION_ID

The quic-go library exposes a Tracer hook so server-side code can record migrations directly without packet capture — log the event with the new remote address and increment a Prometheus counter so the dashboard reflects how often clients are actually migrating in your traffic mix. On a mobile-heavy property, expect 10-30 percent of long-lived connections to migrate at least once per session — every one of those is a stall HTTP/2 over TCP would have suffered and HTTP/3 over QUIC absorbed silently.

Frequently Asked Questions

What is head-of-line blocking in TCP?

Head-of-line blocking occurs when TCP holds back newer data in the stream while waiting for a lost packet to be retransmitted. All subsequent packets queue behind the lost one, even if they have already arrived, causing latency spikes for time-sensitive applications.

When should you use UDP instead of TCP?

Use UDP when data is time-sensitive and stale data is worthless — real-time video, game state updates, live price feeds, and DNS lookups. UDP drops lost packets instead of retransmitting them, so the receiver always gets the latest data.

How does QUIC solve TCP's head-of-line blocking problem?

QUIC multiplexes independent streams over a single UDP connection. A lost packet only blocks the stream it belongs to, while other streams continue delivering data. QUIC also integrates TLS 1.3 into its handshake, so new connections are ready in 1 RTT and resumed connections can send data in 0-RTT.

What is the difference between CUBIC and BBR congestion control?

CUBIC is loss-based — it increases the congestion window until a packet drops, then backs off. BBR is model-based — it estimates bottleneck bandwidth and minimum RTT to pace packets without waiting for loss. BBR recovers faster but can be less fair to CUBIC flows.

Keep Reading

  • What Happens When You Type a URL — The full request lifecycle from DNS resolution through TCP/TLS handshake to application response, with debugging tools for each layer
  • HTTP Protocol Evolution Guide — HTTP/1.1 keep-alive through HTTP/2 multiplexing to HTTP/3 over QUIC, and what each version changed at the transport layer
  • DNS Records Production Guide — A, AAAA, CNAME, SRV, and the TTL tradeoffs that affect every TCP connection before it starts

Engineering Team

A multidisciplinary team of backend engineers, architects, and DevOps practitioners shipping deep dives into distributed systems and production infrastructure.
