PostgreSQL Data Partitioning

Krzysztof Słomka
Jan 17, 2023

One of the features that make PostgreSQL stand out is its built-in support for data partitioning. Data partitioning is a technique used to divide a large table into smaller, more manageable pieces called partitions. This can improve the performance, scalability, and maintainability of your database. In this post, we will explore how to partition data in PostgreSQL with examples.

Since version 10, PostgreSQL supports declarative partitioning: a table is declared as partitioned with a PARTITION BY clause and is split into multiple smaller tables called partitions. Older releases emulated this with table inheritance, CHECK constraints, and triggers. Each partition can have its own indexes, constraints, and storage parameters.

One of the most common uses of partitioning in PostgreSQL is range partitioning. This method divides a large table into smaller, more manageable pieces based on a range of values. The most frequent use case for this type of partitioning is to partition a table of time-based data, such as a table of sales data or a table of log data, by a date or timestamp column.

-- assumes sales was created with PARTITION BY RANGE (date)
CREATE TABLE sales_2022_Q1 (LIKE sales);
CREATE TABLE sales_2022_Q2 (LIKE sales);
...
ALTER TABLE sales ATTACH PARTITION sales_2022_Q1 FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
ALTER TABLE sales ATTACH PARTITION sales_2022_Q2 FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
...

After partitioning, the original table acts as a parent table that stores no rows itself. Each inserted row is routed to the partition whose bounds match its partition key, and queries that filter on the partition key only scan the matching partitions (partition pruning).
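
Routing and pruning can be observed directly. The snippet below is a minimal sketch, not part of the article's schema: `sales_demo` and its partitions are hypothetical names introduced only for this illustration, and it assumes PostgreSQL 11 or later with default settings.

```sql
-- Hypothetical demo table, partitioned by range on the date column.
CREATE TABLE sales_demo (
    id       INTEGER NOT NULL,
    date     DATE NOT NULL,
    quantity INTEGER
) PARTITION BY RANGE (date);

-- Two quarterly partitions; the upper bound is exclusive.
CREATE TABLE sales_demo_2022_q1 PARTITION OF sales_demo
    FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
CREATE TABLE sales_demo_2022_q2 PARTITION OF sales_demo
    FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');

-- The row is inserted through the parent and routed by its date.
INSERT INTO sales_demo (id, date, quantity) VALUES (1, '2022-02-15', 3);

-- tableoid::regclass shows which partition physically stores the row.
SELECT tableoid::regclass AS stored_in, * FROM sales_demo;

-- With a filter on the partition key, the plan should only reference
-- the Q1 partition; the Q2 partition is pruned.
EXPLAIN SELECT * FROM sales_demo
WHERE date >= '2022-02-01' AND date < '2022-03-01';
```

Comparing this plan with the plan of the same query without the WHERE clause is a quick way to check that pruning applies to your own queries.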

After this quick overview of the most common use case, we can dive into more detail. PostgreSQL offers three main types of partitioning: range partitioning, list partitioning, and hash partitioning.

Range partitioning

This type of partitioning divides a table into partitions based on a range of values. You can partition a table of sales data by date (see above), with each partition containing data for a specific month or quarter. For example, a large e-commerce company might have a table that contains all sales data for the last 10 years. To improve query performance and make the data easier to maintain, they could partition the table by year, creating a separate partition for each year. Queries that only need data for a specific year then touch only the matching partition rather than scanning the entire table. It is also easy to drop or archive old partitions that are no longer needed.

-- create the partitioned sales table
CREATE TABLE sales (
    id SERIAL,
    date DATE NOT NULL,
    product VARCHAR(255),
    quantity INTEGER,
    price DECIMAL(10,2),
    -- a primary key on a partitioned table must include the partition key
    PRIMARY KEY (id, date)
) PARTITION BY RANGE (date);

-- create the partition tables
CREATE TABLE sales_2022_Q1 (LIKE sales);
CREATE TABLE sales_2022_Q2 (LIKE sales);
CREATE TABLE sales_2022_Q3 (LIKE sales);
CREATE TABLE sales_2022_Q4 (LIKE sales);

-- attach the partition tables to the sales table (the upper bound is exclusive)
ALTER TABLE sales ATTACH PARTITION sales_2022_Q1 FOR VALUES FROM ('2022-01-01') TO ('2022-04-01');
ALTER TABLE sales ATTACH PARTITION sales_2022_Q2 FOR VALUES FROM ('2022-04-01') TO ('2022-07-01');
ALTER TABLE sales ATTACH PARTITION sales_2022_Q3 FOR VALUES FROM ('2022-07-01') TO ('2022-10-01');
ALTER TABLE sales ATTACH PARTITION sales_2022_Q4 FOR VALUES FROM ('2022-10-01') TO ('2023-01-01');
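
Dropping or archiving an old partition, as mentioned above, might look like the following. This is a sketch that assumes `sales` is a declaratively partitioned table (created with PARTITION BY RANGE (date)) and that `sales_2022_Q1` is one of its partitions.

```sql
-- Detach the oldest quarter. The data stays in place, but the table
-- becomes an ordinary standalone table that sales no longer sees.
ALTER TABLE sales DETACH PARTITION sales_2022_Q1;

-- At this point the detached table can be archived, for example by
-- dumping it with pg_dump from the shell (not shown here).

-- Remove the data once it is no longer needed.
DROP TABLE sales_2022_Q1;
```

Detaching first keeps the archive step separate from the removal step; dropping a whole partition is typically much cheaper than a DELETE that removes the same rows from one large table.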

List partitioning

List partitioning is useful when you have a table that contains categorical data, and you want to partition the table based on the values in a specific column. For example, you can partition a table of customers by country, with each partition containing data for a specific country or group of countries.

-- create the partitioned customers table
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(255),
    country VARCHAR(255) NOT NULL,
    email VARCHAR(255),
    -- a primary key on a partitioned table must include the partition key
    PRIMARY KEY (id, country)
) PARTITION BY LIST (country);

-- create the partition tables
CREATE TABLE customers_us (LIKE customers);
CREATE TABLE customers_eu (LIKE customers);
CREATE TABLE customers_ru (LIKE customers);
...

-- attach the partition tables to the customers table
ALTER TABLE customers ATTACH PARTITION customers_us FOR VALUES IN ('US');
ALTER TABLE customers ATTACH PARTITION customers_eu FOR VALUES IN ('AT', 'BE', 'BG', 'HR', 'CY', 'CZ', 'DK', 'EE', 'FI', 'FR', 'DE', 'GR', 'HU', 'IE', 'IT', 'LV', 'LT', 'LU', 'MT', 'NL', 'PL', 'PT', 'RO', 'SK', 'SI', 'ES', 'SE');
ALTER TABLE customers ATTACH PARTITION customers_ru FOR VALUES IN ('RU');
...

With this setup, any INSERT or SELECT statement that specifies a country of ‘US’, ‘RU’, or one of the listed EU countries is directed to the matching partition. An INSERT with a country that no partition covers fails unless the table also has a DEFAULT partition.
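
To see the routing in action, the sketch below adds a catch-all partition and inserts a few rows. It assumes `customers` is partitioned with PARTITION BY LIST (country), that PostgreSQL 11 or later is used (DEFAULT partitions), and that no other partition lists 'JP'. The partition name `customers_other` and the sample rows are made up for illustration.

```sql
-- Catch-all partition for countries not listed in any other partition.
CREATE TABLE customers_other PARTITION OF customers DEFAULT;

-- Rows are inserted through the parent table and routed by country.
INSERT INTO customers (name, country, email) VALUES
    ('Alice', 'US', 'alice@example.com'),
    ('Jan',   'PL', 'jan@example.com'),
    ('Yuki',  'JP', 'yuki@example.com');

-- tableoid::regclass reports the partition that physically stores each row.
SELECT tableoid::regclass AS stored_in, name, country
FROM customers
ORDER BY id;
```

Under these assumptions the 'US' row should land in customers_us, the 'PL' row in customers_eu, and the 'JP' row in customers_other.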

Hash partitioning

This type of partitioning divides a table into partitions based on a hash of the partition key. It is useful when you want to distribute data evenly across partitions, but the data has no natural range or list of values to partition on.

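-- Create the partitioned main table
CREATE TABLE orders (
    id SERIAL,
    order_number INTEGER,
    customer_id INTEGER,
    total_amount DECIMAL(10,2),
    -- a primary key on a partitioned table must include the partition key
    PRIMARY KEY (id, order_number)
) PARTITION BY HASH (order_number);

-- Create the partition tables
CREATE TABLE orders_p1 (LIKE orders);
CREATE TABLE orders_p2 (LIKE orders);
CREATE TABLE orders_p3 (LIKE orders);

-- Attach the partition tables to the main table: one remainder per partition
ALTER TABLE orders ATTACH PARTITION orders_p1 FOR VALUES WITH (MODULUS 3, REMAINDER 0);
ALTER TABLE orders ATTACH PARTITION orders_p2 FOR VALUES WITH (MODULUS 3, REMAINDER 1);
ALTER TABLE orders ATTACH PARTITION orders_p3 FOR VALUES WITH (MODULUS 3, REMAINDER 2);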
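
Whether the rows really end up evenly distributed can be checked with a quick experiment. The sketch below assumes `orders` is partitioned with PARTITION BY HASH (order_number) into three partitions covering remainders 0, 1, and 2 of modulus 3; the generated order numbers and amounts are placeholder data.

```sql
-- Insert 9000 synthetic orders through the parent table.
INSERT INTO orders (order_number, customer_id, total_amount)
SELECT n, 1 + n % 100, 9.99
FROM generate_series(1, 9000) AS n;

-- Count the rows stored in each partition.
SELECT tableoid::regclass AS stored_in, count(*) AS row_count
FROM orders
GROUP BY tableoid
ORDER BY tableoid;
```

The counts should be roughly, though not exactly, equal, since each row is placed according to the hash of its order_number rather than the value itself.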

Disadvantages

There are several potential disadvantages to using partitioning in PostgreSQL:

  • Complexity
    Partitioning can make your database more complex, as it involves creating and managing multiple tables and partitions. This can make it more difficult to understand and maintain the database.
  • Overhead
    Partitioning can add overhead to your database, as it requires more resources to manage the multiple partitions. This can lead to slower performance, especially when inserting or updating data.
  • Query complexity
    Queries that do not filter on the partition key cannot be pruned and have to scan every partition, and each additional partition adds planning work. Such queries can end up slower than on a single table.
  • Data consistency
    Primary keys and unique constraints on a partitioned table must include the partition key columns, so uniqueness of any other column cannot be enforced across partitions. This can make it harder to maintain constraints and ensure data integrity.
  • Data migration
    If you need to change the partitioning strategy or move data between partitions, it can be a complex and time-consuming process.
  • Lack of flexibility
    Partitioning strategies are usually based on a specific column or set of columns, and it can be difficult to change or adapt them if your data or requirements change.
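
To give a feel for the data migration point, here is one possible way to split a quarterly partition into monthly ones. It is a sketch under assumptions: `sales` is partitioned with PARTITION BY RANGE (date), `sales_2022_Q2` is the partition covering April to June 2022, and the monthly partition names are made up. On a large table the copy step can take a long time while locks are held, so test it before running anything similar in production.

```sql
BEGIN;

-- Take the quarterly partition out of the parent; it keeps its rows.
ALTER TABLE sales DETACH PARTITION sales_2022_Q2;

-- Create the new, finer-grained partitions for the same date range.
CREATE TABLE sales_2022_04 PARTITION OF sales
    FOR VALUES FROM ('2022-04-01') TO ('2022-05-01');
CREATE TABLE sales_2022_05 PARTITION OF sales
    FOR VALUES FROM ('2022-05-01') TO ('2022-06-01');
CREATE TABLE sales_2022_06 PARTITION OF sales
    FOR VALUES FROM ('2022-06-01') TO ('2022-07-01');

-- Copy the rows back through the parent so they are routed by date.
INSERT INTO sales SELECT * FROM sales_2022_Q2;

-- The old quarterly table is no longer needed.
DROP TABLE sales_2022_Q2;

COMMIT;
```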

It’s worth noting that partitioning is not always the best solution for every use case and it’s important to analyze the trade-offs and decide if it’s the right approach for your specific scenario. Additionally, it’s important to carefully plan and test the partitioning strategy before implementing it in production.

In this post, I explained the different types of partitioning available in PostgreSQL and provided examples of how to implement them. I also showed the benefits of using partitioning in Postgres, such as improving performance and making it easier to manage large data sets.

What to remember?

PostgreSQL supports three main types of partitioning: range partitioning, list partitioning, and hash partitioning. Each type of partitioning is useful in different scenarios.

Range partitioning is used to divide a table into partitions based on a range of values. For example, you can partition a table of sales data by date, with each partition containing data for a specific month or quarter.

List partitioning is used to divide a table into partitions based on a list of values. For example, you can partition a table of customers by country, with each partition containing data for a specific country.

Hash partitioning is used to divide a table into partitions based on a hash of the partition key. This type of partitioning is useful when you want to distribute data evenly across partitions, but the data has no natural range or list of values to partition on.
