Every SQL query you send to PostgreSQL goes through a sophisticated pipeline before a single row is read from disk. Understanding this pipeline is the key to writing performant queries and designing effective schemas.
PostgreSQL processes queries in four stages: parsing, rewriting, planning/optimizing, and execution. The planner is where the magic happens -- it estimates the cost of many candidate execution plans and picks the cheapest one. Consider a typical query:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id) as order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.name
ORDER BY order_count DESC
LIMIT 10;

The output of EXPLAIN ANALYZE is your roadmap. Each line represents a plan node: a specific operation like a sequential scan, index scan, hash join, or sort. The indentation shows parent-child relationships, and the numbers tell you exactly how long each step took.
The planner assigns a cost to each possible execution plan and picks the cheapest one. Costs are measured in arbitrary units that roughly correspond to disk page fetches. The two key cost parameters are seq_page_cost (default 1.0) and random_page_cost (default 4.0).
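These parameters can be inspected and adjusted per session. A common tweak (a sketch, not something the article prescribes) is lowering random_page_cost on SSD-backed storage, where random reads cost nearly the same as sequential ones:

```sql
-- Inspect the current cost parameters
SHOW seq_page_cost;
SHOW random_page_cost;

-- On SSDs, random I/O is nearly as cheap as sequential I/O,
-- so a lower value often produces better plans (tune for your hardware)
SET random_page_cost = 1.1;

-- Re-run EXPLAIN on your query to see whether the chosen plan changes
```

Because SET only affects the current session, this is a safe way to experiment before changing anything server-wide.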
Understanding the cost model explains why PostgreSQL sometimes chooses a sequential scan over an index scan. If a query returns more than roughly 10-15% of the table, a sequential scan reading pages in order is faster than an index scan jumping randomly between pages.
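One way to watch this crossover in practice (a sketch -- the users table, and an index on created_at, are assumptions carried over from the example above):

```sql
-- Narrow predicate: few rows match, so an index scan is likely
EXPLAIN SELECT * FROM users WHERE created_at > '2024-12-01';

-- Wide predicate: most of the table matches, so a sequential scan is likely
EXPLAIN SELECT * FROM users WHERE created_at > '2000-01-01';
```

The same index is available in both cases; only the estimated selectivity changes the planner's choice.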
Pro Tip: Use pg_stat_statements to identify your most expensive queries in production. Optimizing the top 5 queries often yields more improvement than any infrastructure change.
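A typical query against that view might look like this (a sketch; the module must be listed in shared_preload_libraries before the extension can be created, and the timing columns shown are named total_exec_time and mean_exec_time as of PostgreSQL 13):

```sql
-- Enable once per database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 5 queries by total execution time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```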
PostgreSQL supports three join algorithms: Nested Loop, Hash Join, and Merge Join. The planner chooses based on table sizes, available indexes, and sort orders.
Nested Loop is best for small tables or when an index exists on the inner table. It iterates over the outer table and for each row, looks up matching rows in the inner table.
Hash Join builds a hash table from the smaller table, then probes it for each row of the larger table. It excels when neither table has a useful index on the join column.
Merge Join requires both inputs to be sorted on the join column. It is efficient for large tables that are already sorted, either through an index or a preceding sort operation.
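To see which algorithm the planner picked for a given join, look at the join node in the plan output (reusing the users and orders tables from the earlier example):

```sql
EXPLAIN
SELECT u.name, o.id
FROM users u
JOIN orders o ON o.user_id = u.id;
-- The plan shows "Nested Loop", "Hash Join", or "Merge Join" as the join node
```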
-- Force a specific join strategy for testing
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN ANALYZE SELECT ...; -- Will use a nested loop
RESET enable_hashjoin;  -- Restore the defaults when done
RESET enable_mergejoin;

The planner relies heavily on table statistics to estimate the number of rows each operation will return. Running ANALYZE keeps these statistics current. The pg_stats view shows what the planner knows about each column, including the most common values, null fraction, and number of distinct values.
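Both steps can be run directly (a sketch, using the users table assumed above):

```sql
-- Refresh statistics for one table
ANALYZE users;

-- What the planner knows about each column
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'users';
```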
When statistics are stale or inaccurate, the planner makes bad choices. Stale statistics are among the most common causes of query performance problems in production PostgreSQL deployments.
- Run EXPLAIN (ANALYZE, BUFFERS) on slow queries
- Run ANALYZE on tables with stale statistics

The query planner is one of PostgreSQL's greatest strengths. By understanding how it makes decisions, you can write queries that work with the planner rather than against it. Always let the planner do its job -- but give it the indexes and statistics it needs to do it well.
System Architecture Group
Experts in distributed systems, scalability, and high-performance computing.