Background: A table created in the early days of the business has 20 fields, none of them particularly long or large. As the system has run, the data volume has grown to nearly 130 million rows, about 60 GB. Performance is still acceptable for now, but given the expected continued growth, we plan to partition the table. The deciding factor is the access pattern: data is mainly inserted, queries mostly target the most recently inserted rows, and only a small number of full-table queries are expected in the future.
1. Back up existing data
Create a backup table and copy all of the data into t_test_back. A backup created this way copies only the table structure and data, not the indexes or constraints, and it takes no blocking locks, so it runs very fast: backing up the 60 GB of data took about 5 minutes.
create table public.t_test_back as (select * from public.t_test);
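As a quick sanity check (not part of the original procedure, just a habit worth having), compare row counts to confirm the copy is complete:

-- The two counts should match before the original table is touched
select
  (select count(*) from public.t_test)      as original_rows,
  (select count(*) from public.t_test_back) as backup_rows;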
2. Delete the original table
Before dropping the original table, remember to save its DDL, including the original indexes and constraints. The original table definition is as follows:
CREATE TABLE public.t_test (
  id bigserial NOT NULL,
  column1 int8 NOT NULL,
  column2 varchar NOT NULL,
  column3 varchar NULL,
  create_time timestamptz NULL,
  CONSTRAINT t_test_pkey PRIMARY KEY (id)
);
CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);
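If the original DDL was not saved anywhere, it can be recovered from the live database before the drop with pg_dump's schema-only mode (mydb below is a placeholder for the actual database name):

pg_dump -s -t public.t_test mydb > t_test_schema.sql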
Drop the original table directly with a drop statement:
drop table public.t_test;
When I executed it, there was no response for about ten minutes. I eventually checked pg_stat_activity and found that there were still active SQL statements running against public.t_test.
The statements still executing can be interrupted as follows:
-- Find the queries still executing against the table
select pid, query, state
from pg_stat_activity
where state = 'active'
  and query like '%t_test%';

-- Cancel a query gracefully (using the pid found above)
select pg_cancel_backend(pid);

-- Or forcibly terminate the session
select pg_terminate_backend(pid);
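If it is not obvious which session is holding the lock that blocks the drop, pg_blocking_pids() (available since PostgreSQL 9.6) reports the blockers directly; a sketch:

-- List blocked sessions together with the pids blocking them
select pid, pg_blocking_pids(pid) as blocked_by, state, query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;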
After the running queries were canceled, the drop completed quickly; under normal circumstances it should finish in about 5 minutes.
3. Create a partition table
Create the partitioned table based on the original table's DDL. The partition key must be part of the primary key; I use the time field create_time for range partitioning:
CREATE TABLE public.t_test (
  id bigserial NOT NULL,
  column1 int8 NOT NULL,
  column2 varchar NOT NULL,
  column3 varchar NULL,
  create_time timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT t_test_pkey PRIMARY KEY (id, create_time)
) PARTITION BY RANGE (create_time);
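As an aside, the composite key is not optional: PostgreSQL rejects a unique constraint that does not cover the partition key. A scratch-table demonstration (scratch_part is a throwaway name; the exact error wording may vary by server version):

-- Demonstration only: this CREATE TABLE fails
create table scratch_part (
  id bigserial,
  created timestamptz not null,
  primary key (id)   -- does not include the partition column "created"
) partition by range (created);
-- ERROR: unique constraint on partitioned table must include all partitioning columns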
I accumulate roughly 80 million rows a year, so I create one partition per half year, plus a DEFAULT partition to catch any rows that fall outside the defined ranges. Note that the upper bound of a range partition is exclusive, so each range should end exactly where the next one begins; bounding a partition at '23:59:59' instead would leave a one-second gap whose rows silently land in the default partition:
create table public.t_test_2022_02 partition of public.t_test
  for values from ('2022-06-01 00:00:00') to ('2023-01-01 00:00:00');
create table public.t_test_2023_01 partition of public.t_test
  for values from ('2023-01-01 00:00:00') to ('2023-07-01 00:00:00');
create table public.t_test_2023_02 partition of public.t_test
  for values from ('2023-07-01 00:00:00') to ('2024-01-01 00:00:00');
create table public.t_test_2024_01 partition of public.t_test
  for values from ('2024-01-01 00:00:00') to ('2024-07-01 00:00:00');
create table public.t_test_2024_02 partition of public.t_test
  for values from ('2024-07-01 00:00:00') to ('2025-01-01 00:00:00');
create table public.t_test_default partition of public.t_test default;
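Once data starts arriving (step 4), two quick checks I find useful, both standard PostgreSQL, sketched here: the hidden tableoid column shows which partition each row landed in, and EXPLAIN confirms that a query on recent data scans only the matching partition.

-- Row distribution across partitions
select tableoid::regclass as partition, count(*)
from public.t_test
group by 1
order by 1;

-- Partition pruning: only t_test_2024_02 should appear in the plan
explain select * from public.t_test where create_time >= '2024-08-01';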
I create the original indexes directly on the parent of the partitioned table:
CREATE INDEX column1_idx ON public.t_test USING btree (column1);
CREATE INDEX column2_idx ON public.t_test USING btree (column2);
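On a partitioned parent, CREATE INDEX cascades automatically: each existing partition (and any partition added later) gets its own matching index. This can be verified in pg_indexes, e.g.:

-- Each partition carries its own copy of the parent's indexes
select tablename, indexname
from pg_indexes
where schemaname = 'public' and tablename like 't_test%'
order by tablename, indexname;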
Since the id field uses a PostgreSQL serial (here bigserial) auto-increment sequence, the sequence's current value must be advanced past the maximum id in the old table to prevent primary key conflicts:
alter sequence public.t_test_id_seq restart 340000000;
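Rather than hard-coding the restart value, it can be derived from the backup table with setval(), which advances the sequence so that the next nextval() returns max(id) + 1 (a sketch):

-- Advance the sequence just past the highest existing id
select setval('public.t_test_id_seq', (select max(id) from public.t_test_back));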
4. Data recovery
Since the table mostly receives newly inserted (incremental) data, the service can be brought back online as soon as the partitioned table is ready and the sequence has been advanced past the old maximum id; new writes then flow straight into the new partitions. Finally, backfill the remaining historical data into the new table with SQL:
insert into public.t_test (select * from public.t_test_back);
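For 130 million rows this is a single long transaction; if that is a concern, the backfill can be split into time-range batches instead (a sketch, assuming every backup row has create_time populated, since the new column is NOT NULL):

-- Backfill in half-year chunks to keep each transaction short
insert into public.t_test
select * from public.t_test_back
where create_time >= '2022-06-01' and create_time < '2023-01-01';
-- ...then repeat with the next half-year range until all rows are copied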
That concludes this walkthrough of partitioning an existing PostgreSQL data table. I hope it is helpful to anyone facing the same migration.