PG Casts

New in Postgres 9.6: Parallel Queries

Episode #21 hosted by Jake Worth

In this episode, we'll be looking at a new feature in Postgres 9.6, parallel queries.

Postgres Version 9.6

Transcript

Parallel queries provide parallel execution of sequential scans, joins, and aggregates.

To see this feature in action, let's set up a test database. And to make the performance gains obvious, we'll need a lot of data. I made a financial ledger with fifty million random dollar amounts and dates going back ten years; the setup is included at the bottom of the transcript.

Next, we'll write a read-only query that uses a sequential scan. The aggregate functions are great candidates for this. We'll use sum to add up the amounts in our ledger, and tack on explain analyze for benchmarking and explanation.

explain analyze select sum(amount) from ledger;

Reading the output, we can see that Postgres has chosen to run this query sequentially. Great, just what we need.

Parallel queries are not enabled by default. To turn them on, we need to increase a config param called max_parallel_workers_per_gather.

show max_parallel_workers_per_gather;

Let's raise it to four, which happens to be the number of cores on this workstation.

set max_parallel_workers_per_gather to 4;

Explaining the query again, we can see that Postgres is now choosing a parallel query. And it's about four times faster.

explain analyze sum(amount) from ledger;

To sum up, for sequential scans on Postgres 9.6, turn this feature on and enjoy the speed boost.

As I mentioned before, only sequential scans can be parallelized. Let's create a scenario where this feature is unavailable, by adding an index to the table. The indexed column changes things, as we shall see: 

create index on ledger(date);

With the indexed column in the query, we don't get to use parallelism: 

explain analyze sum(amount) from ledger where date = (current_date - 1);

To enable this feature permanently, add the parameter to your Postgres config file.

That's all for today's episode; thanks for watching.

Setup

create table ledger (
  id serial primary key,
  date date not null,
  amount decimal(12,2) not null
);

insert into ledger (date, amount)
select current_date - (random() * 3650)::integer,
(random() * 1000000)::decimal(12,2) - 50000
from generate_series(1,50000000);

Resources