SoFunction
Updated on 2025-04-08

Partitioning an Existing Data Table in PostgreSQL: A Detailed Walkthrough

Background: A table created early in the business has 20 fields, none of them very long or large. As the system ran, the data volume reached nearly 130 million rows, about 60 GB. The table is stable for now, but considering the continued growth of the data, we plan to partition it. The main factor in favor of partitioning is that this table is mostly written by inserts and queried for the most recently inserted data; only a small number of full-table queries are expected in the future.

1. Back up existing data

Create a backup table and copy all data into `t_test_back`. Backing up this way only copies the table structure and data; it does not include indexes or constraints. It also involves no locking, so it executes very quickly: the 60 GB backup completed in about 5 minutes.

create table public.t_test_back as (select * from public.t_test);
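Before dropping anything, it is worth confirming the backup is complete. A quick sanity check (a sketch; run it before proceeding to the next step) compares the row counts of the two tables:

```sql
-- Row counts should match before the original table is dropped
select
    (select count(*) from public.t_test)      as original_rows,
    (select count(*) from public.t_test_back) as backup_rows;
```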

2. Delete the original table

Before dropping the original table, remember to save its DDL as well as the original indexes and constraints. The original table definition is as follows:

CREATE TABLE public.t_test (
    id bigserial NOT NULL,
    column1 int8 NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NULL,
    create_time timestamptz NULL,
    CONSTRAINT t_test_pkey PRIMARY KEY (id)
);

CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);

Delete the original table with a `drop` statement:

drop table public.t_test;

When I executed it, there was no response for about ten minutes. Checking the `pg_stat_activity` view, I found that there were still active SQL statements running against `public.t_test`, which blocked the drop.

The following statements interrupt all SQL still executing against the table:

-- Find the queries still executing against the table
select pid, query, state from pg_stat_activity where state = 'active' and query like '%t_test%';
-- Cancel the query (substitute the pid found above)
select pg_cancel_backend(pid);
-- Or forcibly terminate the session
select pg_terminate_backend(pid);
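If several sessions are involved, the cancel call can be driven directly from `pg_stat_activity` instead of copying pids by hand. A sketch:

```sql
-- Cancel every active query touching t_test, excluding this session itself
select pg_cancel_backend(pid)
from pg_stat_activity
where state = 'active'
  and query like '%t_test%'
  and pid <> pg_backend_pid();
```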

After cancelling the running queries, the `drop` operation completed quickly; under normal conditions it should finish in about 5 minutes.

3. Create a partition table

Create the partitioned table based on the original table's DDL. The partition key must be part of the primary key; I use the time field `create_time` for a range partition:

CREATE TABLE public.t_test (
    id bigserial NOT NULL,
    column1 int8 NOT NULL,
    column2 varchar NOT NULL,
    column3 varchar NULL,
    create_time timestamptz NOT null DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT t_test_pkey PRIMARY KEY (id, create_time)
) partition by range(create_time);

My data volume is about 80 million rows per year, so I create one partition per half year, plus a DEFAULT partition to hold any data that falls outside the defined ranges. Note that the upper bound of a range partition is exclusive, so each partition should end exactly where the next one begins:

create table public.t_test_2022_02 partition of public.t_test for values from ('2022-06-01 00:00:00') to ('2023-01-01 00:00:00');

create table public.t_test_2023_01 partition of public.t_test for values from ('2023-01-01 00:00:00') to ('2023-07-01 00:00:00');
create table public.t_test_2023_02 partition of public.t_test for values from ('2023-07-01 00:00:00') to ('2024-01-01 00:00:00');

create table public.t_test_2024_01 partition of public.t_test for values from ('2024-01-01 00:00:00') to ('2024-07-01 00:00:00');
create table public.t_test_2024_02 partition of public.t_test for values from ('2024-07-01 00:00:00') to ('2025-01-01 00:00:00');

create table public.t_test_default partition of public.t_test DEFAULT;
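Once the partitions are created, the layout can be verified from the system catalogs. A sketch listing every partition attached to the parent, together with its bounds:

```sql
-- List all partitions of public.t_test and their bound expressions
select c.relname as partition_name,
       pg_get_expr(c.relpartbound, c.oid) as bounds
from pg_inherits i
join pg_class c on c.oid = i.inhrelid
where i.inhparent = 'public.t_test'::regclass;
```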

I recreate the original indexes directly on the parent of the partitioned table; PostgreSQL automatically creates matching indexes on each partition:

CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);
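Because indexes created on a partitioned table cascade to every partition, the per-partition copies can be checked in `pg_indexes`. A sketch:

```sql
-- Each partition should show its own copy of column1_idx / column2_idx
select tablename, indexname
from pg_indexes
where schemaname = 'public'
  and tablename like 't_test%'
order by tablename, indexname;
```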

Since the `id` field uses a PostgreSQL `serial`-type auto-increment sequence, the sequence's current value must be advanced beyond the maximum id in the old table to prevent primary-key conflicts:

alter sequence public.t_test_id_seq restart 340000000;
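Rather than hard-coding a value such as 340000000, the sequence can be set from the actual maximum id in the backup table; `setval` makes the next `nextval` return max(id) + 1. A sketch:

```sql
-- Advance the id sequence just past the largest id already used
select setval('public.t_test_id_seq', (select max(id) from public.t_test_back));
```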

4. Data recovery

Since the table mostly receives incremental inserts, the service can be brought back online as soon as the sequence is set past the maximum id of the old data. Finally, use SQL to copy the backed-up data into the new table:

insert into public.t_test (select * from public.t_test_back);
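After the restore, two checks are useful: that the total row count matches the backup, and that no rows unexpectedly landed in the DEFAULT partition (which would indicate a gap in the partition bounds). A sketch:

```sql
-- Total rows should equal the backup; the default partition should normally be empty
select count(*) from public.t_test;
select count(*) from public.t_test_default;
```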

This concludes the detailed walkthrough of partitioning an existing data table in PostgreSQL.