Database Optimisation

Database performance is one of those areas where small improvements can have outsized impact on the overall user experience. A query that takes two seconds instead of fifty milliseconds does not just make one page slow. It ties up a database connection, increases server response times, and under load, creates a cascading effect that degrades performance across the entire application. At CodeVault Labs, database optimisation is a regular part of our development process, not a reaction to performance complaints. This article covers the strategies we use most frequently to keep databases fast and reliable.

Understanding Query Execution with EXPLAIN

Before optimising any query, you need to understand how the database is executing it. Every major relational database provides an EXPLAIN command that shows the query execution plan, the sequence of operations the database will perform to produce the result. In PostgreSQL, EXPLAIN ANALYZE runs the query and shows actual execution times alongside the plan, giving you concrete data about where time is being spent.
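In PostgreSQL, the usage looks like this (the orders table and its columns are illustrative, not from a specific schema):

```sql
-- EXPLAIN shows the planner's chosen plan without running the query:
EXPLAIN SELECT id, total FROM orders WHERE customer_id = 42;

-- EXPLAIN ANALYZE actually executes the query and reports real
-- timings and row counts alongside the planner's estimates:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC
LIMIT 20;
```

Note that EXPLAIN ANALYZE runs the statement for real, so wrap data-modifying statements in a transaction you roll back.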

Learning to read execution plans is one of the most valuable skills a developer can acquire. Look for sequential scans on large tables, which indicate missing indexes. Watch for nested loop joins on large datasets, which might benefit from hash or merge joins. Identify sort operations that could be eliminated with an index that matches the sort order. Pay attention to the estimated versus actual row counts, as large discrepancies suggest the database's statistics are out of date and a manual ANALYZE might be needed.

We make it a habit to EXPLAIN queries during development, not just when performance problems arise. A query that performs well on a development database with a thousand rows might perform terribly on a production database with ten million rows. Understanding the execution plan early helps you design indexes and structure queries to scale gracefully from day one.

Indexing Strategies That Matter

Indexes are the most powerful tool for improving query performance, but they are not free. Each index consumes storage space and adds overhead to write operations, because the database must update every relevant index whenever a row is inserted, updated, or deleted. The goal is to create indexes that benefit your most important queries without creating unnecessary overhead for writes.

Start by indexing columns that appear in WHERE clauses, JOIN conditions, and ORDER BY clauses of your most frequent queries. Primary keys are indexed automatically, and foreign keys should almost always be indexed to support efficient joins. Beyond these basics, composite indexes that cover multiple columns can be extremely effective. An index on (status, created_at) allows the database to efficiently find all pending orders sorted by date, using a single index scan instead of filtering and sorting separately.

The order of columns in a composite index matters significantly. The database can use a composite index for queries that filter on the leftmost columns but not for queries that only filter on later columns. An index on (country, city, suburb) supports queries filtering by country alone, by country and city, or by all three. But it does not help a query that filters only by city or only by suburb. Design your composite indexes around the queries you actually run, not around the table structure.
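As a sketch of the leftmost-column rule, using the (status, created_at) example from above (names are illustrative):

```sql
-- One composite index can serve several query shapes, provided
-- the filter starts from the leftmost column:
CREATE INDEX idx_orders_status_created
    ON orders (status, created_at);

-- Served by the index: filter on the leading column and sort on
-- the second, in a single index scan.
SELECT id, total
FROM orders
WHERE status = 'pending'
ORDER BY created_at DESC;

-- NOT served by this index: the leading column is absent, so the
-- planner falls back to another index or a sequential scan.
SELECT id, total
FROM orders
WHERE created_at > now() - interval '1 day';
```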

Partial indexes, which index only rows matching a specific condition, can be remarkably effective for common query patterns. If ninety percent of your queries filter for active records, an index on (created_at) WHERE status = 'active' will be smaller, faster to scan, and cheaper to maintain than an index on the full table. PostgreSQL's support for partial indexes is excellent and we use them extensively in production systems.
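A minimal partial-index sketch for the active-records pattern described above (schema names are hypothetical):

```sql
-- Index only the rows that most queries actually touch:
CREATE INDEX idx_orders_active_created
    ON orders (created_at)
    WHERE status = 'active';

-- A query whose WHERE clause implies the index predicate is
-- eligible to use the partial index:
SELECT id
FROM orders
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 50;
```

Because the index excludes inactive rows entirely, it stays small even as historical data accumulates.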

Query Design and Anti-Patterns

How you write your queries affects performance as much as your indexing strategy. One of the most common anti-patterns is the N+1 query problem, where your application loads a list of items and then executes a separate query for each item to load related data. Loading twenty orders and then running twenty individual queries to fetch the customer for each order results in twenty-one database round trips when a single query with a JOIN would have accomplished the same result in one round trip.
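The difference is easiest to see side by side (orders and customers are illustrative tables):

```sql
-- N+1 pattern: one query for the list...
SELECT id, customer_id, total FROM orders LIMIT 20;
-- ...then one query per order, executed twenty times:
SELECT name, email FROM customers WHERE id = $1;

-- Single round trip with a JOIN instead:
SELECT o.id, o.total, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
LIMIT 20;
```

Most ORMs offer eager-loading options that generate the JOIN (or a batched IN query) for you; the fix is usually one line of configuration.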

SELECT * is another anti-pattern that persists despite being well-known. Fetching all columns when you only need three wastes bandwidth between the database and your application server, prevents the database from using covering indexes that contain only the needed columns, and can cause issues when columns are added to the table in the future. Always specify the columns you need.
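One concrete payoff of naming columns is the index-only scan. A covering index (sketched below with PostgreSQL's INCLUDE clause, available in version 11 and later; names are illustrative) can answer the query from the index alone:

```sql
-- The included columns are stored in the index, so the query can
-- be satisfied without visiting the table at all.
CREATE INDEX idx_orders_status_cover
    ON orders (status)
    INCLUDE (created_at, total);

-- Eligible for an index-only scan; SELECT * would not be, because
-- it references columns the index does not contain.
SELECT created_at, total
FROM orders
WHERE status = 'pending';
```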

Subqueries in WHERE clauses can sometimes be replaced with JOINs for better performance, though modern query planners are increasingly good at optimising subqueries automatically. Correlated subqueries, where the inner query references the outer query, are particularly expensive because they must be re-executed for each row of the outer query. If you find yourself writing a correlated subquery, consider whether a JOIN, a window function, or a CTE (Common Table Expression) could achieve the same result more efficiently.
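As an illustration of the rewrite, here is a correlated subquery and an equivalent window-function version (a hypothetical query for each customer's largest order):

```sql
-- Correlated subquery: the inner query is logically re-executed
-- for every row of the outer query.
SELECT o.id, o.total
FROM orders o
WHERE o.total = (SELECT MAX(o2.total)
                 FROM orders o2
                 WHERE o2.customer_id = o.customer_id);

-- Window-function rewrite: one pass over the table.
SELECT id, total
FROM (SELECT id, total,
             MAX(total) OVER (PARTITION BY customer_id) AS max_total
      FROM orders) ranked
WHERE total = max_total;
```

Whether the rewrite actually wins depends on the planner and the data; EXPLAIN ANALYZE both versions before committing to one.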

Caching Strategies

The fastest database query is the one you never execute. Caching frequently accessed data in memory eliminates database load entirely for cached requests and can reduce response times from milliseconds to microseconds. Redis is our preferred caching solution for web applications due to its speed, data structure support, and built-in expiration mechanisms.

The challenge with caching is keeping cached data consistent with the database. The simplest approach is time-based expiration, where cached data is automatically invalidated after a set period. This works well for data that does not need to be perfectly current, such as product listings, blog posts, or configuration settings. For data that must be current, you need active invalidation, where your application removes or updates cached data whenever the underlying database record changes.

Cache-aside is the most common pattern, where the application checks the cache first and falls back to the database on a cache miss, then stores the result in the cache for future requests. Write-through caching updates the cache at the same time as the database, ensuring consistency but adding latency to write operations. Choose your caching strategy based on your consistency requirements and read-to-write ratio. Applications with high read volumes and tolerance for slightly stale data benefit enormously from aggressive caching with time-based expiration.
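The cache-aside flow with active invalidation can be sketched in a few lines of Python. A plain dict stands in for Redis here; in production the get/set calls would go to a Redis client with a server-side expiry, and all names below are illustrative:

```python
import time

# In-process stand-in for a Redis cache: key -> (value, expiry time).
_cache: dict = {}
TTL_SECONDS = 60

def fetch_user(user_id, db_lookup):
    """Cache-aside read: check the cache first; on a miss, query the
    database and store the result with a time-based expiry."""
    entry = _cache.get(user_id)
    if entry is not None:
        value, expires_at = entry
        if time.monotonic() < expires_at:
            return value              # cache hit
        del _cache[user_id]           # entry has expired
    value = db_lookup(user_id)        # cache miss: hit the database
    _cache[user_id] = (value, time.monotonic() + TTL_SECONDS)
    return value

def invalidate_user(user_id):
    """Active invalidation: call this whenever the underlying
    database record changes."""
    _cache.pop(user_id, None)
```

Within the TTL, repeated reads for the same key never touch the database; an update to the record simply calls invalidate_user so the next read repopulates the cache.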

Schema Design for Performance

Good schema design is the foundation of database performance. Normalisation reduces data redundancy and ensures data integrity, but excessive normalisation can require complex joins that degrade query performance. We typically normalise to third normal form as a starting point, then selectively denormalise specific fields when query performance demands it. For example, storing a customer's name directly on an order record eliminates a join when displaying order lists, at the cost of needing to update multiple records if the customer's name changes.

Data types matter more than most developers realise. Using TEXT for a field that will always be a two-character country code wastes storage and slows comparisons. Using BIGINT for a field that will never exceed a few thousand values wastes six bytes per row compared to SMALLINT. On a table with millions of rows, these differences add up to significant storage and performance impacts. Choose the smallest data type that accommodates your data with room for reasonable growth.
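A sketch of right-sized column choices (the table is hypothetical):

```sql
CREATE TABLE shipments (
    id           BIGSERIAL   PRIMARY KEY,          -- keys genuinely can grow large
    country_code CHAR(2)     NOT NULL,             -- fixed two-letter code, not TEXT
    priority     SMALLINT    NOT NULL,             -- small bounded range, not BIGINT
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

The primary key keeps a wide type because identifiers are the one place growth is hard to predict; the other columns take the smallest type that fits the domain.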

Table partitioning can dramatically improve performance on very large tables. By dividing a table into smaller partitions based on a key like date or region, the database can skip entire partitions when executing queries that filter on the partition key. A query for orders from the current month only needs to scan the current month's partition, ignoring years of historical data. PostgreSQL's declarative partitioning makes this relatively straightforward to implement.
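PostgreSQL's declarative partitioning (version 10 and later) looks like this; table and column names are illustrative:

```sql
-- Parent table partitioned by month on the partition key.
CREATE TABLE orders (
    id         BIGINT      NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    total      NUMERIC     NOT NULL
) PARTITION BY RANGE (created_at);

-- One child table per month; the upper bound is exclusive.
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- A query that filters on the partition key scans only the
-- matching partition:
SELECT sum(total)
FROM orders
WHERE created_at >= '2025-02-01' AND created_at < '2025-03-01';
```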

Monitoring and Ongoing Maintenance

Database optimisation is not a one-time activity. Query patterns change as applications evolve, data volumes grow, and user behaviour shifts. We set up monitoring on every production database to track slow queries, connection pool utilisation, cache hit rates, and index usage. PostgreSQL's pg_stat_statements extension provides invaluable data about which queries consume the most total time, helping you focus optimisation efforts where they will have the greatest impact.
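A typical query against pg_stat_statements, once the extension is enabled, looks like this (the timing columns are named total_exec_time and mean_exec_time from PostgreSQL 13 onward; earlier versions use total_time and mean_time):

```sql
-- The ten statements consuming the most total execution time.
SELECT query,
       calls,
       total_exec_time,
       mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by total rather than mean time surfaces the cheap-but-frequent queries that quietly dominate overall load.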

Regular maintenance tasks like updating statistics, rebuilding bloated indexes, and vacuuming dead tuples are essential for sustained performance. PostgreSQL's autovacuum handles much of this automatically, but high-traffic tables may need more aggressive vacuum settings or manual maintenance during low-traffic periods. Monitor your table and index sizes over time and investigate unexpected growth, which can indicate bloat or design issues that need attention.

Key Takeaways

Database performance depends on understanding your query patterns and designing your indexes, queries, and schema to support them efficiently. Use EXPLAIN to understand execution plans before optimising. Create targeted indexes that serve your most important queries without excessive write overhead. Avoid common anti-patterns like N+1 queries and SELECT *. Implement caching to eliminate unnecessary database load. Design your schema with performance in mind from the start. And monitor your database continuously, because performance is not something you achieve once and forget about. It requires ongoing attention as your application and data grow.

Is Your Database Holding You Back?

We audit and optimise database performance for web applications of all sizes. Let us find and fix the bottlenecks slowing down your application.

Get a Database Audit