Solving Speed Issues After Migrating PostgreSQL to RDS

Database performance can take a hit when switching to Amazon RDS. Here’s how we fixed the slow queries.

Network Ninja software engineers recently migrated a PostgreSQL (“Postgres”) database that was hosted on a web server to an Amazon RDS service. The unexpected result was slower “large” queries on RDS. Load testing showed significant performance issues when multiple users were doing query-heavy actions at the same time.

This problem was solved with two PostgreSQL optimizations:

  • Increasing work_mem, which decreased the disk swapping and reduced our minimum query times by ~20%, and
  • Increasing max_parallel_workers, which reduced our maximum query times by ~50%.

These two steps reduced the average execution time of two large query sets by ~50%. With these optimizations, we concluded that switching to RDS would achieve ~30% performance improvement over our local implementation, while also improving the execution time consistency on complex queries.

We’re Hiring!

Our 100% remote (work from home) company is looking for talented people to join our diverse team. Check out the current job openings.

Steps to Fix Slower Large Queries on RDS

Note: this is what worked for us. Your mileage may vary.

Step 1: Set Up a Basic Load Testing Environment

Tech Infrastructure

To run our optimization tests, we used an environment with:

  • Postgres 12.7
  • Localhost: EC2 Ubuntu m4.10xlarge 160GiB RAM 40 CPUs
  • RDS: db.r6g.8xlarge 256GiB RAM, 32 vCPUs

Create a .psqlrc file in your home directory and add “\timing on” to the file - this will cause all queries to display query times.

echo "\timing on" > ~/.psqlrc

Create a .pgpass file in your home directory and add an entry for each database in the format HOST:PORT:DB_NAME:USER:PASSWORD - this password file provides Postgres your database host & authentication details.

echo "your_host:your_port:your_db_name:your_user:your_password" > ~/.pgpass

Set the permissions on your .pgpass file to disallow any access to world or group - using less strict permissions will cause the .pgpass file to be ignored.

chmod 600 ~/.pgpass

Create a file for each of your test queries containing a single query (e.g. query1.sql, query2.sql, etc.).

echo "your 1st large sql query goes here" > ~/query1.sql

Create the test file, where each sql file you created above will be run. Use the \i command in the test file to execute each file (if you’re running more than one file, separate the files by a newline). Note: the same sql file can be specified multiple times if you want to run it multiple times.

echo "\i ~/query1.sql" > ~/mytestfile.txt

Execute the test file and indicate how many queries you want to execute in parallel - this example will run 20 parallel queries via the “-P 20” argument.

xargs -d "\n" -n 1 -P 20 psql -U {DB_USER} -h {DB_HOST} -d {DB_NAME} -a -c < ~/mytestfile.txt

The output of the command above will provide a baseline of execution times to improve upon with optimization.

Step 2: Optimize PostgreSQL to Reduce Disk Swapping

Less Swapping, More Speed

Memory speeds are faster than disk speeds. From the PostgreSQL wiki:

If you do a lot of complex sorts, and have a lot of memory, then increasing the work_mem parameter allows PostgreSQL to do larger in-memory sorts which, unsurprisingly, will be faster than disk-based equivalents.

First, Locate your PostgreSQL configuration file (postgresql.conf) and enable the log_temp_files parameter - setting this value to 0 (zero) will ensure that a log entry is made for each temporary file created (including sorts, which we are particularly interested in here).

postgresql.conf

log_temp_files = 0

Restart PostgreSQL. Now you can see which queries are swapping to disk. Log output should be similar to:

2021-11-01 14:09:02.640 CDT [24737] LOG: temporary file: path “base/pgsql_tmp/pgsql_tmp24737.0”, size 111263744

…where the size is in bytes and the number at the end of the temp file name is the index when multiple temp files are required for a single query.

Next, run the EXPLAIN ANALYZE command on your test queries to show you if a SORT function is using the disk. Review the output of this query, and if you see “Sort Method: external merge Disk: [Number]kB”, then your query is indeed using swap disk for sorting.

psql "EXPLAIN ANALYZE the rest of your query goes here"
-------------------------------------------------------------------------- 
… 
Sort Method: external merge  Disk: [Number]kB 
… 

To prevent disk swapping for sorts (slower), and use memory instead (faster), increase the work_mem parameter in postgresql.conf - we increased ours from 20MB to 40MB and saw immediate speed improvements.

postgresql.conf

# work_mem = 20MB
work_mem = 40MB

Once you’ve updated the parameter, restart PostgreSQL.

Next, run your EXPLAIN ANALYZE queries again to confirm they’re using system memory for sorts.

psql "EXPLAIN ANALYZE the rest of your query goes here"
--------------------------------------------------------------------------------
…
Sort Method: quicksort   Memory: [Number]kB
…

This method reduced our minimum query times by ~20%.

Step 3: Optimize PostgreSQL to Increase Parallel Workers

The final step is to override max_parallel_workers in your postgresql.conf from “engine-default” to match the local database.

postgresql.conf

# max_parallel_workers = engine-default
max_parallel_workers = (integer)

Substitute the value from your local database you’re migrating from for (integer) above.

This method reduced our maximum query times by ~50%.

Conclusion & Final Outcomes

We solved the slow large queries issue after migrating PostgreSQL from a web server to Amazon RDS by eliminating disk swap usage and increasing parallel workers. Check out the final results below.

Query Set 1 (5 simultaneous operations)

  • Local: Min 13s, Max 17s, Avg 15s
  • Original RDS: Min 12s, Max 25s, Avg 18s
  • Final RDS: Min 10s, Max 10s, Avg 10s

33% improvement vs. Local, 44% improvement vs. Original RDS

Query Set 2 (5 simultaneous operations):

  • Local: Min 12s, Max 13s, Avg 12s
  • Original RDS: Min 10s, Max 20s, Avg 15s
  • Final RDS: Min 7s, Max 8s, Avg 7s

42% improvement vs. Local, 53% improvement vs. Original RDS

Related Resources

Date

Reading Time

6 minutes

Category

Network Ninja

Are you a developer? We’re hiring! Join our team of thoughtful, talented people.