Abstract data flow representing database performance
Database

Mastering PostgreSQL Performance: Indexing, Query Optimization, and Beyond

A deep dive into PostgreSQL performance tuning — from indexing strategies to query analysis to connection pooling — to keep your database blazing fast.

Protize Team
#postgresql #database #performance #sql #optimization

Your application can have the cleanest architecture in the world — perfect separation of concerns, beautiful TypeScript types, elegant API design — but if your database queries are slow, your users will suffer. A 500ms database query can make an otherwise snappy application feel sluggish. A 5-second query can make it feel broken.

PostgreSQL is extraordinarily powerful, but most developers use only a fraction of its capabilities. In this deep-dive, we’ll cover everything you need to go from “the database is probably fine” to genuine confidence in your database performance.


Understanding Query Execution

Before you can optimize anything, you need to understand what PostgreSQL is actually doing when it runs your queries. The EXPLAIN ANALYZE command is your X-ray machine:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
  u.name,
  u.email,
  COUNT(p.id) as post_count,
  MAX(p.published_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.author_id
WHERE u.created_at > '2025-01-01'
  AND p.published = true
GROUP BY u.id, u.name, u.email
ORDER BY post_count DESC
LIMIT 10;

The output looks intimidating at first, but you only need to understand a few key nodes:

Node TypeWhat It MeansGood or Bad?
Seq ScanReading entire table row by row🔴 Usually bad on large tables
Index ScanUsing a B-tree index✅ Good
Index Only ScanAll data comes from the index✅ Best
Bitmap Heap ScanUsing index, then fetching rows🔶 OK
Hash JoinJoining using a hash table✅ Good for large datasets
Nested LoopFor each row in A, scan B⚡ Great with indexes
SortSorting in memory or disk🔶 Check if index can sort

Indexing Strategies

Indexes are the single biggest performance lever in PostgreSQL. But indexes are not free — they consume disk space and slow down writes. The goal is to add indexes strategically.

Rule of Thumb

Add an index when:

  1. The column appears in WHERE, JOIN ON, or ORDER BY clauses
  2. The table has more than 10,000 rows
  3. The query is called frequently
  4. The column has high cardinality (many distinct values)

1. Basic B-Tree Index

The default and most common index type:

-- Index for email lookups (login)
CREATE INDEX idx_users_email ON users(email);

-- Index for filtering published posts
CREATE INDEX idx_posts_published ON posts(published);

-- Index for sorting by date (DESC for newest-first queries)
CREATE INDEX idx_posts_published_at ON posts(published_at DESC);

2. Composite Indexes

When you always filter on multiple columns together, one composite index beats two single indexes:

-- Query: WHERE author_id = $1 AND published = true ORDER BY published_at DESC
-- A composite index handles all three in one shot:
CREATE INDEX idx_posts_author_published_date
ON posts(author_id, published, published_at DESC);

Column order matters. The leftmost column must match the first filter in your query. PostgreSQL can use a composite index on (author_id, published) for queries filtering on author_id alone, but NOT for queries filtering on published alone.

-- ✅ This query uses idx_posts_author_published_date
WHERE author_id = 5 AND published = true

-- ✅ This also uses it (leftmost prefix match)
WHERE author_id = 5

-- ❌ This does NOT use it (doesn't start with author_id)
WHERE published = true

3. Partial Indexes

A partial index only indexes rows matching a condition. If you have 10 million posts but only 50,000 are published, a partial index on published posts is tiny and fast:

-- Much smaller index — only covers published posts
CREATE INDEX idx_posts_published_only
ON posts(published_at DESC, author_id)
WHERE published = true;

-- This query uses the partial index instantly
SELECT * FROM posts
WHERE published = true
ORDER BY published_at DESC
LIMIT 20;

4. Full-Text Search Index

PostgreSQL has built-in full-text search — no external service needed:

-- Add a computed tsvector column for efficient full-text search
ALTER TABLE posts ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(excerpt, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(content, '')), 'C')
  ) STORED;

-- Create a GIN index for fast full-text queries
CREATE INDEX idx_posts_search ON posts USING GIN(search_vector);

-- Full-text search query with ranking
SELECT
  id,
  title,
  excerpt,
  ts_rank(search_vector, query) AS rank
FROM posts, to_tsquery('english', 'javascript & performance & nextjs') query
WHERE search_vector @@ query
  AND published = true
ORDER BY rank DESC
LIMIT 10;

5. Expression Indexes

Index the result of a function — useful for case-insensitive searches:

-- Allow case-insensitive email lookups without full-table scan
CREATE INDEX idx_users_email_lower ON users(lower(email));

-- Now this query is fast even without exact case matching
SELECT * FROM users WHERE lower(email) = lower('User@Example.COM');

Query Optimization Techniques

Avoid SELECT *

Always specify the columns you need. SELECT * fetches all columns including large text fields you may not need:

-- ❌ Bad — fetches entire content column on every row
SELECT * FROM posts ORDER BY published_at DESC LIMIT 20;

-- ✅ Good — fetches only what's needed for the listing
SELECT id, title, slug, excerpt, cover_image, published_at
FROM posts
WHERE published = true
ORDER BY published_at DESC
LIMIT 20;

Use CTEs for Readability Without Sacrificing Performance

-- Find the top 5 most active authors in the last 30 days
WITH recent_posts AS (
  SELECT author_id, COUNT(*) as post_count
  FROM posts
  WHERE published_at > NOW() - INTERVAL '30 days'
    AND published = true
  GROUP BY author_id
),
ranked_authors AS (
  SELECT
    u.id,
    u.name,
    u.email,
    rp.post_count,
    RANK() OVER (ORDER BY rp.post_count DESC) as rank
  FROM users u
  JOIN recent_posts rp ON u.id = rp.author_id
)
SELECT * FROM ranked_authors WHERE rank <= 5;

Use RETURNING to Avoid Extra Queries

-- ❌ Bad — two queries to insert and then fetch the new record
INSERT INTO posts (title, content, author_id) VALUES ($1, $2, $3);
SELECT * FROM posts WHERE id = lastval();

-- ✅ Good — one query does both
INSERT INTO posts (title, content, author_id)
VALUES ($1, $2, $3)
RETURNING id, title, slug, created_at;

Batch Inserts

-- ❌ Bad — N separate INSERT queries in a loop
INSERT INTO post_tags (post_id, tag) VALUES (1, 'javascript');
INSERT INTO post_tags (post_id, tag) VALUES (1, 'react');
INSERT INTO post_tags (post_id, tag) VALUES (1, 'performance');

-- ✅ Good — one INSERT with multiple rows
INSERT INTO post_tags (post_id, tag)
VALUES (1, 'javascript'), (1, 'react'), (1, 'performance');

With Prisma, use createMany:

await prisma.postTag.createMany({
  data: tags.map(tag => ({ postId: post.id, tag })),
});

Connection Pooling with PgBouncer

Every PostgreSQL connection consumes ~5-10MB of RAM. Without connection pooling, a Node.js app with 50 concurrent users can exhaust your database’s connection limit and start throwing errors.

PgBouncer is a lightweight connection pooler that sits between your app and PostgreSQL:

Node.js App → PgBouncer (port 6432) → PostgreSQL (port 5432)
              [1000 client connections]   [20 DB connections]
sudo apt-get install pgbouncer
sudo nano /etc/pgbouncer/pgbouncer.ini
[databases]
; Forward all connections for "myapp" to PostgreSQL
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; Transaction pooling — best for ORMs like Prisma
pool_mode = transaction

; Allow up to 1000 app connections
max_client_conn = 1000

; Maintain only 20 real PostgreSQL connections
default_pool_size = 20

; Log slow queries
log_stats = 1
stats_period = 60

Update your DATABASE_URL to point to PgBouncer:

# Before: Direct to PostgreSQL
DATABASE_URL="postgresql://user:pass@localhost:5432/myapp"

# After: Through PgBouncer
DATABASE_URL="postgresql://user:pass@localhost:6432/myapp"

Monitoring Query Performance

Install the pg_stat_statements extension to find your slowest queries:

-- Enable the extension (requires superuser)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the 10 slowest queries
SELECT
  LEFT(query, 100) AS query_preview,
  calls,
  ROUND(mean_exec_time::numeric, 2) AS avg_ms,
  ROUND(total_exec_time::numeric, 2) AS total_ms,
  rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Find queries with the most sequential scans
SELECT
  schemaname,
  tablename,
  seq_scan,
  seq_tup_read,
  idx_scan,
  idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
  AND n_live_tup > 10000  -- Only large tables
ORDER BY seq_scan DESC;

Regular Maintenance

PostgreSQL needs occasional maintenance to stay fast:

-- VACUUM removes dead rows left by updates/deletes
-- ANALYZE updates statistics the query planner uses
VACUUM ANALYZE posts;

-- Full VACUUM — reclaims disk space (locks the table briefly)
VACUUM FULL posts;

-- View table bloat
SELECT
  tablename,
  pg_size_pretty(pg_total_relation_size(tablename::regclass)) AS total_size,
  pg_size_pretty(pg_relation_size(tablename::regclass)) AS table_size,
  pg_size_pretty(pg_total_relation_size(tablename::regclass) -
    pg_relation_size(tablename::regclass)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(tablename::regclass) DESC;

Enable autovacuum in postgresql.conf (usually on by default):

autovacuum = on
autovacuum_vacuum_scale_factor = 0.1   # Vacuum when 10% of rows are dead
autovacuum_analyze_scale_factor = 0.05 # Analyze when 5% of rows change

Conclusion

PostgreSQL performance is a deep topic, but even applying the basics — adding strategic indexes, using EXPLAIN ANALYZE to understand query plans, batching operations, and adding connection pooling with PgBouncer — can turn a sluggish application into a responsive one.

Start by identifying your slowest queries with pg_stat_statements, run EXPLAIN ANALYZE on each one, look for sequential scans on large tables, and add the appropriate index. Most performance problems are solved with just a few well-placed indexes.

Enjoyed this article?

Subscribe to get more insights like this delivered to your inbox.