Author Profile:
One of the earliest PostgreSQL users in China, he has been using PostgreSQL since 2001. From the end of 2003 until 2014 he served as a moderator of the PostgreSQL section of the PGSQL China community forum, taking part in PostgreSQL discussions and publishing more than 7,000 posts. He has 15 years of experience in ERP design, development and implementation; the open-source MRP system PostMRP, a professional manufacturing-management system built on PostgreSQL, is his work. He currently works at China First Logistics Holdings Co., Ltd. / Yunlibao (Beijing) Technology Co., Ltd. as the company's R&D manager.
1. Test environment
Operating system: CentOS 6.4
PostgreSQL version: 10 (development snapshot)
CPU: Intel(R) Xeon(R) CPU E5-2407 v2 @ 2.40GHz, 4 cores / 4 threads
Memory: 32 GB
Hard disk: 2 TB SAS, 7200 rpm
2. Compile and install PostgreSQL 10
- Compile, install and initialize

[root@ad source]# git clone git:///git/
[root@ad source]# cd postgresql
[root@ad postgresql]# ./configure --prefix=/usr/local/pgsql10
[root@ad postgresql]# gmake -j 4
[root@ad postgresql]# gmake install
[root@ad postgresql]# su postgres
[postgres@ad postgresql]$ /usr/local/pgsql10/bin/initdb --no-locale -E utf8 -D /home/postgres/data10/ -U postgres
- Modify some parameters
listen_addresses = '*'
port = 10000
shared_buffers = 8096MB
maintenance_work_mem = 512MB
effective_cache_size = 30GB
log_destination = 'csvlog'
logging_collector = on
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S'
log_file_mode = 0600
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_line_prefix = '%m %h %a %u %d %x [%p] '
log_statement = 'none'
log_timezone = 'PRC'
track_activity_query_size = 4096
max_wal_size = 32GB
min_wal_size = 2GB
checkpoint_completion_target = 0.5
- Add an access entry to pg_hba.conf
host all all 192.168.1.0/24 trust
- Start the service

[postgres@ad data10]$ /usr/local/pgsql10/bin/pg_ctl start -D /home/postgres/data10/

- Connect to the database

[postgres@ad data10]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres
psql (10devel)
Type "help" for help.
postgres=#
3. Partitioned tables

Partitioned tables in PostgreSQL 10 work like those in previous versions under the hood: you still create a parent table first and then create child tables that use the inheritance machinery, but there is no longer any need to write the routing rules by hand, which is a welcome improvement. Currently range and list partitioning are supported; whether other methods will be added by the time version 10 is officially released remains to be seen.
range partition table
1. Partition main table
create table order_range(id bigserial not null,userid integer,product text, createdate date) partition by range ( createdate );
The partition parent table cannot have global constraints. partition by range ( xxx ) declares the partitioning method; xxx can be one or more columns or expressions. For details see /docs/devel/static/
2. Partition subtable
create table order_range(id bigserial not null,userid integer,product text, createdate date not null) partition by range ( createdate );
create table order_range_201701 partition of order_range(id primary key,userid,product, createdate) for values from ('2017-01-01') to ('2017-02-01');
create table order_range_201702 partition of order_range(id primary key,userid,product, createdate) for values from ('2017-02-01') to ('2017-03-01');
- Notes:
- The parent table must be specified when creating a partition.
- The partition's columns must match the parent table's columns exactly.
- A partition may add its own default values or constraints on individual columns.
- When a row is inserted into the parent table, the system automatically routes it to the matching partition; if no matching partition is found, an error is thrown.
- The bounds given for a partition (range bounds, list values) must not overlap with an existing partition; creating an overlapping partition fails.
- The partition key columns must be set NOT NULL; if this is not set when creating the parent table, the system adds it automatically.
- A range partition covers values >= its lower bound and < its upper bound.
- Moving a row from one partition to another via UPDATE is not supported and raises an error; to move a row, DELETE the original record and then INSERT a new one.
- Renaming a column on the parent table automatically renames it in all partitions.
- TRUNCATE on the parent table clears the records of all partitions; to clear a single partition, run TRUNCATE on that partition.
- DROP on the parent table drops all partitions with it; to drop a single partition, run DROP on that partition.
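The tuple-routing rules above (lower bound inclusive, upper bound exclusive, error when no partition matches) can be illustrated outside the database. A minimal Python sketch, assuming the two monthly partitions defined above; the helper name route is ours for illustration, not a PostgreSQL API:

```python
from datetime import date

# Mirror of the monthly range partitions created above:
# (partition name, lower bound inclusive, upper bound exclusive).
PARTITIONS = [
    ("order_range_201701", date(2017, 1, 1), date(2017, 2, 1)),
    ("order_range_201702", date(2017, 2, 1), date(2017, 3, 1)),
]

def route(createdate):
    """Pick the partition a row belongs to, like the server's tuple routing."""
    for name, lower, upper in PARTITIONS:
        if lower <= createdate < upper:  # range partitions are >= FROM and < TO
            return name
    # No matching partition: the server raises an error in this situation.
    raise ValueError("no partition accepts createdate=%s" % createdate)

print(route(date(2017, 1, 17)))  # order_range_201701
print(route(date(2017, 2, 1)))   # the shared boundary lands in order_range_201702
```

Note how the boundary value '2017-02-01' belongs to the later partition, matching the >= FROM / < TO semantics listed above.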
- Use psql to view the detailed definition of partition tables.
postgres=# \d+ order_range
                                        Table "public.order_range"
   Column   |  Type   | Collation | Nullable |                 Default                 | Storage  | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id         | bigint  |           | not null | nextval('order_range_id_seq'::regclass) | plain    |              |
 userid     | integer |           |          |                                         | plain    |              |
 product    | text    |           |          |                                         | extended |              |
 createdate | date    |           | not null |                                         | plain    |              |
Partition key: RANGE (createdate)
Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'),
            order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01')
postgres=#
list partition table
1. Partition main table
create table order_list(id bigserial not null,userid integer,product text,area text, createdate date) partition by list( area );
2. Partition subtable
create table order_list_gd partition of order_list(id primary key,userid,product,area,createdate) for values in ('Guangdong');
create table order_list_bj partition of order_list(id primary key,userid,product,area,createdate) for values in ('Beijing');
Multi-level partition table
Partition by region first, then by date
1. Main table
create table order_range_list(id bigserial not null,userid integer,product text,area text, createdate date) partition by list ( area );
2. First-level partition table
create table order_range_list_gd partition of order_range_list for values in ('Guangdong') partition by range(createdate);
create table order_range_list_bj partition of order_range_list for values in ('Beijing') partition by range(createdate);
3. Secondary partition table
create table order_range_list_gd_201701 partition of order_range_list_gd(id primary key,userid,product,area,createdate) for values from ('2017-01-01') to ('2017-02-01');
create table order_range_list_gd_201702 partition of order_range_list_gd(id primary key,userid,product,area,createdate) for values from ('2017-02-01') to ('2017-03-01');
create table order_range_list_bj_201701 partition of order_range_list_bj(id primary key,userid,product,area,createdate) for values from ('2017-01-01') to ('2017-02-01');
create table order_range_list_bj_201702 partition of order_range_list_bj(id primary key,userid,product,area,createdate) for values from ('2017-02-01') to ('2017-03-01');
Operating on a partition directly is still subject to its partition constraint
postgres=# insert into order_range_201702 (id,userid,product,createdate) values(1,
(random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'));
ERROR: new row for relation "order_range_201702" violates partition constraint
DETAIL: Failing row contains (1, 322345, 51a9357a78416d11a018949a42dd2f8d, 2017-01-01).
The INSERT reports a partition constraint violation
postgres=# update order_range_201701 set createdate='2017-02-01' where createdate='2017-01-17';
ERROR: new row for relation "order_range_201701" violates partition constraint
DETAIL: Failing row contains (1, 163357, 7e8fbe7b632a54ba1ec401d969f3259a, 2017-02-01).
The UPDATE reports a partition constraint violation
If the partition is a foreign table, the constraint is not enforced; this will be covered later
Use ALTER TABLE xxx ATTACH [DETACH] PARTITION to add or remove partitions
1. Detach a partition
Insert two rows of test data
postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*31)::integer));
INSERT 0 1
Time: 25.006 ms
postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*31)::integer));
INSERT 0 1
Time: 7.601 ms
postgres=# select * from order_range;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17
  2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27
(2 rows)
Detach the partition
postgres=# alter table order_range detach partition order_range_201701;
ALTER TABLE
Time: 14.129 ms
Check the partition is gone
postgres=# \d+ order_range;
                                        Table "public.order_range"
   Column   |  Type   | Collation | Nullable |                 Default                 | Storage  | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id         | bigint  |           | not null | nextval('order_range_id_seq'::regclass) | plain    |              |
 userid     | integer |           |          |                                         | plain    |              |
 product    | text    |           |          |                                         | extended |              |
 createdate | date    |           | not null |                                         | plain    |              |
Partition key: RANGE (createdate)
Partitions: order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01')
postgres=#
The data can no longer be found through the parent table
postgres=# select * from order_range;
 id | userid | product | createdate
----+--------+---------+------------
(0 rows)

Time: 0.505 ms
But the detached table and its data are still there
postgres=# select * from order_range_201701;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17
  2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27
(2 rows)

Time: 0.727 ms
2. Attach a partition
postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
ERROR:  column "createdate" in child table must be marked NOT NULL
Time: 0.564 ms
When attaching a table, its constraints must be consistent with the parent table
postgres=# alter table order_range_201701 alter column createdate set not null;
ALTER TABLE
Time: 17.345 ms
postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-01-15');
ERROR:  partition constraint is violated by some row
Time: 1.276 ms
When attaching, you may specify different bounds than before, but all existing rows must fall within the new bounds.
postgres=# alter table order_range attach partition order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01');
ALTER TABLE
Time: 18.407 ms
The partition is attached again
postgres=# \d+ order_range
                                        Table "public.order_range"
   Column   |  Type   | Collation | Nullable |                 Default                 | Storage  | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id         | bigint  |           | not null | nextval('order_range_id_seq'::regclass) | plain    |              |
 userid     | integer |           |          |                                         | plain    |              |
 product    | text    |           |          |                                         | extended |              |
 createdate | date    |           | not null |                                         | plain    |              |
Partition key: RANGE (createdate)
Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'),
            order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01')
postgres=# select * from order_range;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17
  2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27
(2 rows)

Time: 0.627 ms
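ATTACH/DETACH lends itself to scripted monthly rotation: detach the oldest partition (its data is kept, as shown above) and attach the next month's table. A minimal sketch that only generates the SQL text; the helper names month_range and rotate_sql are ours, and the table being attached must already exist with a NOT NULL partition key, as demonstrated above:

```python
def month_range(year, month):
    """[lower, upper) bounds of one calendar month as ISO date strings."""
    y, m = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}-{month:02d}-01", f"{y:04d}-{m:02d}-01"

def rotate_sql(parent, old, new):
    """SQL to detach the partition for month `old` and attach one for month `new`."""
    (oy, om), (ny, nm) = old, new
    lo, hi = month_range(ny, nm)
    return [
        f"alter table {parent} detach partition {parent}_{oy:04d}{om:02d};",
        f"alter table {parent} attach partition {parent}_{ny:04d}{nm:02d} "
        f"FOR VALUES FROM ('{lo}') TO ('{hi}');",
    ]

for stmt in rotate_sql("order_range", (2017, 1), (2017, 3)):
    print(stmt)
```

The statements would then be run through psql; nothing here talks to the server, it only mirrors the naming scheme used in this article.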
Using foreign tables as partitions
- Create a new database to hold the table that will back the foreign table
[postgres@ad root]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres
psql (10devel)
Type "help" for help.

#Create a database
postgres=# create database postgres_fdw;
CREATE DATABASE
postgres_fdw=# create table order_range_fdw(id bigserial not null,userid integer,product text, createdate date not null);
CREATE TABLE
#Insert one row of test data
postgres_fdw=# insert into order_range_fdw (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date- (random()*31)::integer));
INSERT 0 1
postgres_fdw=# select * from order_range_fdw;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22
(1 row)
- Create the foreign table order_range_fdw in the postgres database
[postgres@ad root]$ /usr/local/pgsql10/bin/psql -p 10000 -U postgres -h 127.0.0.1 -d postgres
psql (10devel)
Type "help" for help.

#Add the postgres_fdw module
postgres=# create extension postgres_fdw;
CREATE EXTENSION
#Create a foreign server
postgres=# CREATE SERVER foreign_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '192.168.1.10', port '10000', dbname 'postgres_fdw');
CREATE SERVER
#Create the user mapping for the foreign server
postgres=# CREATE USER MAPPING FOR postgres
postgres-# SERVER foreign_server
postgres-# OPTIONS (user 'postgres', password '');
CREATE USER MAPPING
#Create the foreign table
postgres=# CREATE FOREIGN TABLE order_range_fdw (
postgres(# id bigint not null,
postgres(# userid integer,
postgres(# product text,
postgres(# createdate date not null
postgres(# )
postgres-# SERVER foreign_server
postgres-# OPTIONS (schema_name 'public', table_name 'order_range_fdw');
CREATE FOREIGN TABLE
#Query the data
postgres=# select * from order_range_fdw;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22
(1 row)

- Attach the foreign table as a partition of order_range

#Attach the partition
postgres=# alter table order_range attach partition order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01');
ALTER TABLE
#View all partitions under order_range
postgres=# \d+ order_range
                                        Table "public.order_range"
   Column   |  Type   | Collation | Nullable |                 Default                 | Storage  | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id         | bigint  |           | not null | nextval('order_range_id_seq'::regclass) | plain    |              |
 userid     | integer |           |          |                                         | plain    |              |
 product    | text    |           |          |                                         | extended |              |
 createdate | date    |           | not null |                                         | plain    |              |
Partition key: RANGE (createdate)
Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'),
            order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01'),
            order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01')
#Query the data
postgres=# select * from order_range where createdate<'2017-01-01';
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-22
(1 row)
#View the execution plan
postgres=# explain select * from order_range where createdate<'2017-01-01';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Append  (cost=100.00..131.79 rows=379 width=48)
   ->  Foreign Scan on order_range_fdw  (cost=100.00..131.79 rows=379 width=48)
(2 rows)
#Test whether the data can be modified
postgres=# insert into order_range (userid,product,createdate) values((random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date- (random()*31)::integer));
ERROR:  cannot route inserted tuples to a foreign table
postgres=# update order_range set createdate='2016-12-01' where createdate='2016-12-22';
UPDATE 1
postgres=# select * from order_range where createdate<'2017-01-01';
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  2 | 300686 | 55956a07742d6aebdef7ebb78c2400d7 | 2016-12-01
(1 row)
postgres=# delete from order_range where createdate='2016-12-01';
DELETE 1
postgres=# select * from order_range where createdate<'2017-01-01';
 id | userid | product | createdate
----+--------+---------+------------
(0 rows)
postgres=#
When inserting through the parent table, rows cannot be routed to a foreign-table partition. Why is that? The source code offers only the comment /* We do not yet have a way to insert into a foreign partition */
There is simply no way to do this yet; presumably a later version will implement it.
Now some problems with using foreign tables as partitions
1. The partition constraint cannot prevent out-of-range rows from being inserted directly into the foreign table, as shown below
postgres=# \d+ order_range
                                        Table "public.order_range"
   Column   |  Type   | Collation | Nullable |                 Default                 | Storage  | Stats target | Description
------------+---------+-----------+----------+-----------------------------------------+----------+--------------+-------------
 id         | bigint  |           | not null | nextval('order_range_id_seq'::regclass) | plain    |              |
 userid     | integer |           |          |                                         | plain    |              |
 product    | text    |           |          |                                         | extended |              |
 createdate | date    |           | not null |                                         | plain    |              |
Partition key: RANGE (createdate)
Partitions: order_range_201701 FOR VALUES FROM ('2017-01-01') TO ('2017-02-01'),
            order_range_201702 FOR VALUES FROM ('2017-02-01') TO ('2017-03-01'),
            order_range_fdw FOR VALUES FROM ('1900-01-01') TO ('2017-01-01')
postgres=#
postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'));
INSERT 0 1
postgres=# select * from order_range;
 id | userid |             product              | createdate
----+--------+----------------------------------+------------
  1 | 163357 | 7e8fbe7b632a54ba1ec401d969f3259a | 2017-01-17
  2 | 349759 | 8095c9036295d3c800dace9069f9c102 | 2017-01-27
  1 | 621895 | 5546c6e2a7006b52b5c2df55e19b3759 | 2017-02-01
  4 | 313019 | 445316004208e09fb4e7eda2bf5b0865 | 2017-01-01
  1 | 505836 | 6e9232c4863c82a2e97b9157996572ea | 2017-01-01
(5 rows)
postgres=# select * from order_range where createdate ='2017-01-01';
 id | userid | product | createdate
----+--------+---------+------------
(0 rows)
An operation like this makes query results inconsistent: the rows show up in a full scan of the parent table, but disappear once the partition constraint is used for pruning.
2. SQL execution is not pushed down
When a query runs through the parent table, execution (e.g. of aggregate functions) is not pushed down to the foreign server, which can be a serious performance problem; pay special attention to this when using it, as shown below
postgres=# delete from order_range_fdw;
DELETE 1
postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2016-01-01'));
INSERT 0 1
postgres=# insert into order_range_fdw (id,userid,product,createdate) values(1, (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2016-02-01'));
INSERT 0 1
#Going through order_range, the plan is
postgres=# explain analyze select count(1) from order_range where createdate<'2017-01-01';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=178.27..178.28 rows=1 width=8) (actual time=0.656..0.656 rows=1 loops=1)
   ->  Append  (cost=100.00..175.42 rows=1138 width=0) (actual time=0.647..0.649 rows=2 loops=1)
         ->  Foreign Scan on order_range_fdw  (cost=100.00..175.42 rows=1138 width=0) (actual time=0.647..0.648 rows=2 loops=1)
 Planning time: 0.267 ms
 Execution time: 1.122 ms
(5 rows)
#Accessing the foreign table directly
postgres=# explain analyze select count(1) from order_range_fdw where createdate<'2017-01-01';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Foreign Scan  (cost=102.84..155.54 rows=1 width=8) (actual time=0.661..0.662 rows=1 loops=1)
   Relations: Aggregate on (public.order_range_fdw)
 Planning time: 0.154 ms
 Execution time: 1.051 ms
(4 rows)
3. When a query against the partitioned table touches both foreign-table partitions and ordinary partitions, Parallel Seq Scan cannot be used, as shown below
#Insert 1,000,000 rows into the partitioned table
postgres=# insert into order_range (userid,product,createdate) SELECT (random()::numeric(7,6)*1000000)::integer,md5(random()::text),('2017-01-01'::date+ (random()*58)::integer) from generate_series(1,1000000);
INSERT 0 1000000
#Access all partitions
postgres=# explain select count(1) from order_range;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate  (cost=24325.22..24325.23 rows=1 width=8)
   ->  Append  (cost=0.00..21558.23 rows=1106797 width=0)
         ->  Seq Scan on order_range_201701  (cost=0.00..11231.82 rows=580582 width=0)
         ->  Seq Scan on order_range_201702  (cost=0.00..10114.02 rows=522802 width=0)
         ->  Foreign Scan on order_range_fdw  (cost=100.00..212.39 rows=3413 width=0)
(5 rows)
#Access only the ordinary partitions
postgres=# explain select count(1) from order_range where createdate>='2017-01-01';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=17169.84..17169.85 rows=1 width=8)
   ->  Gather  (cost=17169.62..17169.83 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=16169.62..16169.63 rows=1 width=8)
               ->  Append  (cost=0.00..15803.52 rows=146440 width=0)
                     ->  Parallel Seq Scan on order_range_201701  (cost=0.00..8449.86 rows=80636 width=0)
                           Filter: (createdate >= '2017-01-01'::date)
                     ->  Parallel Seq Scan on order_range_201702  (cost=0.00..7353.66 rows=65804 width=0)
                           Filter: (createdate >= '2017-01-01'::date)
(9 rows)
postgres=#
Application scenarios for foreign-table partitions
Move all the cold data that is no longer modified off the business database to another node, then attach it as a foreign-table partition. This keeps the business database as small as possible and does not intrude on the business logic, which is very friendly. But be aware that SQL execution is not pushed down, and that Parallel Seq Scan cannot be used.
If later versions solve the insert-routing problem for fdw partitions and the push-down of SQL execution, this could also serve as an OLAP setup.
4. Create the test business tables
The following is a user income/expense transaction table
- Non-partitioned table
create table t_pay_all (id serial not null primary key,userid integer not null,pay_money float8 not null,createdate date not null);
create index t_pay_all_userid_idx on t_pay_all using btree(userid);
create index t_pay_all_createdate_idx on t_pay_all using btree(createdate);
- Partitioned table
Create 12 partitions, one table per month
create table t_pay (id serial not null,userid integer not null,pay_money float8 not null,createdate date not null) partition by range (createdate);
create table t_pay_201701 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-01-01') to ('2017-02-01');
create index t_pay_201701_createdate_idx on t_pay_201701 using btree(createdate);
create index t_pay_201701_userid_idx on t_pay_201701 using btree(userid);
create table t_pay_201702 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-02-01') to ('2017-03-01');
create index t_pay_201702_createdate_idx on t_pay_201702 using btree(createdate);
create index t_pay_201702_userid_idx on t_pay_201702 using btree(userid);
create table t_pay_201703 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-03-01') to ('2017-04-01');
create index t_pay_201703_createdate_idx on t_pay_201703 using btree(createdate);
create index t_pay_201703_userid_idx on t_pay_201703 using btree(userid);
create table t_pay_201704 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-04-01') to ('2017-05-01');
create index t_pay_201704_createdate_idx on t_pay_201704 using btree(createdate);
create index t_pay_201704_userid_idx on t_pay_201704 using btree(userid);
create table t_pay_201705 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-05-01') to ('2017-06-01');
create index t_pay_201705_createdate_idx on t_pay_201705 using btree(createdate);
create index t_pay_201705_userid_idx on t_pay_201705 using btree(userid);
create table t_pay_201706 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-06-01') to ('2017-07-01');
create index t_pay_201706_createdate_idx on t_pay_201706 using btree(createdate);
create index t_pay_201706_userid_idx on t_pay_201706 using btree(userid);
create table t_pay_201707 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-07-01') to ('2017-08-01');
create index t_pay_201707_createdate_idx on t_pay_201707 using btree(createdate);
create index t_pay_201707_userid_idx on t_pay_201707 using btree(userid);
create table t_pay_201708 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-08-01') to ('2017-09-01');
create index t_pay_201708_createdate_idx on t_pay_201708 using btree(createdate);
create index t_pay_201708_userid_idx on t_pay_201708 using btree(userid);
create table t_pay_201709 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-09-01') to ('2017-10-01');
create index t_pay_201709_createdate_idx on t_pay_201709 using btree(createdate);
create index t_pay_201709_userid_idx on t_pay_201709 using btree(userid);
create table t_pay_201710 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-10-01') to ('2017-11-01');
create index t_pay_201710_createdate_idx on t_pay_201710 using btree(createdate);
create index t_pay_201710_userid_idx on t_pay_201710 using btree(userid);
create table t_pay_201711 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-11-01') to ('2017-12-01');
create index t_pay_201711_createdate_idx on t_pay_201711 using btree(createdate);
create index t_pay_201711_userid_idx on t_pay_201711 using btree(userid);
create table t_pay_201712 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2017-12-01') to ('2018-01-01');
create index t_pay_201712_createdate_idx on t_pay_201712 using btree(createdate);
create index t_pay_201712_userid_idx on t_pay_201712 using btree(userid);
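The monthly DDL above is completely mechanical, so a script can emit it instead of typing it out. A small sketch that matches the naming scheme used above; the generator function names are ours for illustration:

```python
def month_bounds(year, month):
    """[lower, upper) bounds of one calendar month as ISO date strings."""
    y, m = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}-{month:02d}-01", f"{y:04d}-{m:02d}-01"

def t_pay_partition_ddl(year):
    """CREATE TABLE / CREATE INDEX statements for 12 monthly partitions of t_pay."""
    stmts = []
    for month in range(1, 13):
        lo, hi = month_bounds(year, month)
        p = f"t_pay_{year:04d}{month:02d}"
        stmts.append(
            f"create table {p} partition of t_pay"
            f"(id primary key,userid,pay_money,createdate) "
            f"for values from ('{lo}') to ('{hi}');"
        )
        stmts.append(f"create index {p}_createdate_idx on {p} using btree(createdate);")
        stmts.append(f"create index {p}_userid_idx on {p} using btree(userid);")
    return stmts

print("\n".join(t_pay_partition_ddl(2017)))
```

The parent table is still created by hand; the generator only covers the repetitive per-month statements.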
5. Performance Testing
Data import
- Generate 10,000,000 rows of test data (distributed as evenly as possible)
postgres=# copy (select (random()::numeric(7,6)*1000000)::integer as userid,round((random()*100)::numeric,2) as pay_money,('2017-01-01'::date+ (random()*364)::integer) as createtime from generate_series(1,10000000)) to '/home/pg/';
COPY 10000000
Time: 42674.548 ms (00:42.675)
- Non-partitioned table import test

postgres=# copy t_pay_all(userid,pay_money,createdate) from '/home/pg/';
COPY 10000000
Time: 114258.743 ms (01:54.259)

- Partitioned table import test

postgres=# copy t_pay(userid,pay_money,createdate) from '/home/pg/';
COPY 10000000
Time: 186358.447 ms (03:06.358)
postgres=#
Conclusion: importing into the partitioned table takes roughly twice as long, so when importing large volumes of data it is best to import directly into the individual partitions, and then operate on the partitions directly.
Query one day's data (served entirely from cache)
- Non-partitioned table

postgres=# explain (analyze,buffers) select * from t_pay_all where createdate ='2017-06-01';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_pay_all  (cost=592.06..50797.88 rows=27307 width=20) (actual time=14.544..49.039 rows=27384 loops=1)
   Recheck Cond: (createdate = '2017-06-01'::date)
   Heap Blocks: exact=22197
   Buffers: shared hit=22289
   ->  Bitmap Index Scan on t_pay_all_createdate_idx  (cost=0.00..585.24 rows=27307 width=0) (actual time=7.121..7.121 rows=27384 loops=1)
         Index Cond: (createdate = '2017-06-01'::date)
         Buffers: shared hit=92
 Planning time: 0.153 ms
 Execution time: 51.583 ms
(9 rows)

Time: 52.272 ms
- Partitioned table

postgres=# explain (analyze,buffers) select * from t_pay where createdate ='2017-06-01';
                                         QUERY PLAN
----------------------------------------------------------------------------------------------
 Append  (cost=608.92..6212.11 rows=27935 width=20) (actual time=4.880..27.032 rows=27384 loops=1)
   Buffers: shared hit=5323
   ->  Bitmap Heap Scan on t_pay_201706  (cost=608.92..6212.11 rows=27935 width=20) (actual time=4.879..21.990 rows=27384 loops=1)
         Recheck Cond: (createdate = '2017-06-01'::date)
         Heap Blocks: exact=5226
         Buffers: shared hit=5323
         ->  Bitmap Index Scan on t_pay_201706_createdate_idx  (cost=0.00..601.94 rows=27935 width=0) (actual time=3.399..3.399 rows=27384 loops=1)
               Index Cond: (createdate = '2017-06-01'::date)
               Buffers: shared hit=97
 Planning time: 0.521 ms
 Execution time: 30.061 ms
(11 rows)
Conclusion: the partitioned table's Planning time is clearly higher than the non-partitioned table's, but compared to the Execution time it is basically negligible.
Query data in a certain time range
1. The time range falls in the same partition
- Non-partitioned table

postgres=# explain (analyze,buffers)select * from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_pay_all  (cost=19802.01..95862.00 rows=819666 width=20) (actual time=115.210..459.547 rows=824865 loops=1)
   Recheck Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
   Heap Blocks: exact=63701
   Buffers: shared read=66578
   ->  Bitmap Index Scan on t_pay_all_createdate_idx  (cost=0.00..19597.10 rows=819666 width=0) (actual time=101.453..101.453 rows=825865 loops=1)
         Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
         Buffers: shared read=2877
 Planning time: 0.166 ms
 Execution time: 504.297 ms
(9 rows)

Time: 505.021 ms
postgres=# explain (analyze,buffers)select count(1) from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01';
                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=90543.96..90543.97 rows=1 width=8) (actual time=335.334..335.335 rows=1 loops=1)
   Buffers: shared hit=351 read=66593
   ->  Gather  (cost=90543.74..90543.95 rows=2 width=8) (actual time=334.988..335.327 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=351 read=66593
         ->  Partial Aggregate  (cost=89543.74..89543.75 rows=1 width=8) (actual time=330.796..330.797 rows=1 loops=3)
               Buffers: shared read=66578
               ->  Parallel Bitmap Heap Scan on t_pay_all  (cost=19802.01..88689.92 rows=341528 width=0) (actual time=124.126..303.125 rows=274955 loops=3)
                     Recheck Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
                     Heap Blocks: exact=25882
                     Buffers: shared read=66578
                     ->  Bitmap Index Scan on t_pay_all_createdate_idx  (cost=0.00..19597.10 rows=819666 width=0) (actual time=111.233..111.233 rows=825865 loops=1)
                           Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
                           Buffers: shared read=2877
 Planning time: 0.213 ms
 Execution time: 344.013 ms
(17 rows)

Time: 344.759 ms
postgres=#
- Partitioned table

postgres=# explain (analyze,buffers)select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01';
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Append  (cost=0.00..17633.97 rows=824865 width=20) (actual time=0.020..272.926 rows=824865 loops=1)
   Buffers: shared hit=5261
   ->  Seq Scan on t_pay_201706  (cost=0.00..17633.97 rows=824865 width=20) (actual time=0.019..170.128 rows=824865 loops=1)
         Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
         Buffers: shared hit=5261
 Planning time: 0.779 ms
 Execution time: 335.351 ms
(7 rows)

Time: 336.676 ms
postgres=# explain (analyze,buffers)select count(1) from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01';
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=12275.86..12275.87 rows=1 width=8) (actual time=144.023..144.023 rows=1 loops=1)
   Buffers: shared hit=5429
   ->  Gather  (cost=12275.64..12275.85 rows=2 width=8) (actual time=143.966..144.016 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=5429
         ->  Partial Aggregate  (cost=11275.64..11275.65 rows=1 width=8) (actual time=140.230..140.230 rows=1 loops=3)
               Buffers: shared hit=5261
               ->  Append  (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.022..106.973 rows=274955 loops=3)
                     Buffers: shared hit=5261
                     ->  Parallel Seq Scan on t_pay_201706  (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.020..68.952 rows=274955 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
                           Buffers: shared hit=5261
 Planning time: 0.760 ms
 Execution time: 145.289 ms
(15 rows)

Time: 146.610 ms
For queries confined to a single partition, the partitioned table's advantage is obvious.
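The table definitions themselves are not shown in this section. A minimal sketch of the assumed setup, reconstructed from the object names in the plans above (column types are guesses), might look like this:

```sql
-- Assumed DDL (not from the original article): a parent table range-partitioned
-- by month on createdate, alongside a conventional table t_pay_all holding the
-- same rows for comparison.
CREATE TABLE t_pay (
    id         serial,
    userid     integer,
    pay_money  numeric(10,2),
    createdate date
) PARTITION BY RANGE (createdate);

CREATE TABLE t_pay_201706 PARTITION OF t_pay
    FOR VALUES FROM ('2017-06-01') TO ('2017-07-01');
CREATE INDEX t_pay_201706_createdate_idx ON t_pay_201706 USING btree (createdate);
CREATE INDEX t_pay_201706_userid_idx ON t_pay_201706 USING btree (userid);
```

Note that in PostgreSQL 10 indexes must be created on each partition individually; they are not inherited from the parent.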
2. Query spanning multiple partitions
-Not partition table
postgres=# explain (analyze,buffers) select count(1) from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-12-01';
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=132593.42..132593.43 rows=1 width=8) (actual time=717.848..717.848 rows=1 loops=1)
   Buffers: shared hit=33571 read=30446 dirtied=9508 written=4485
   ->  Gather  (cost=132593.20..132593.41 rows=2 width=8) (actual time=717.782..717.841 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=33571 read=30446 dirtied=9508 written=4485
         ->  Partial Aggregate  (cost=131593.20..131593.21 rows=1 width=8) (actual time=714.096..714.097 rows=1 loops=3)
               Buffers: shared hit=33319 read=30446 dirtied=9508 written=4485
               ->  Parallel Seq Scan on t_pay_all  (cost=0.00..126330.64 rows=2105024 width=0) (actual time=0.059..545.016 rows=1675464 loops=3)
                     Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                     Rows Removed by Filter: 1661203
                     Buffers: shared hit=33319 read=30446 dirtied=9508 written=4485
 Planning time: 0.178 ms
 Execution time: 721.822 ms
(14 rows)

Time: 722.521 ms
-Partition table
postgres=# explain (analyze,buffers) select count(1) from t_pay where createdate >='2017-06-01' AND createdate<'2017-12-01';
                                   QUERY PLAN
------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=69675.98..69675.99 rows=1 width=8) (actual time=714.560..714.560 rows=1 loops=1)
   Buffers: shared hit=27002 read=5251
   ->  Gather  (cost=69675.77..69675.98 rows=2 width=8) (actual time=714.426..714.551 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=27002 read=5251
         ->  Partial Aggregate  (cost=68675.77..68675.78 rows=1 width=8) (actual time=710.416..710.416 rows=1 loops=3)
               Buffers: shared hit=26774 read=5251
               ->  Append  (cost=0.00..63439.94 rows=2094330 width=0) (actual time=0.023..536.033 rows=1675464 loops=3)
                     Buffers: shared hit=26774 read=5251
                     ->  Parallel Seq Scan on t_pay_201706  (cost=0.00..10416.41 rows=343694 width=0) (actual time=0.021..67.935 rows=274955 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared hit=5261
                     ->  Parallel Seq Scan on t_pay_201707  (cost=0.00..10728.06 rows=354204 width=0) (actual time=0.007..54.999 rows=283363 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared hit=5415
                     ->  Parallel Seq Scan on t_pay_201708  (cost=0.00..10744.08 rows=354738 width=0) (actual time=0.007..55.117 rows=283791 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared hit=5423
                     ->  Parallel Seq Scan on t_pay_201709  (cost=0.00..10410.71 rows=343714 width=0) (actual time=0.007..53.402 rows=274971 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared hit=5255
                     ->  Parallel Seq Scan on t_pay_201710  (cost=0.00..10737.41 rows=354494 width=0) (actual time=0.007..55.475 rows=283595 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared hit=5420
                     ->  Parallel Seq Scan on t_pay_201711  (cost=0.00..10403.29 rows=343486 width=0) (actual time=0.036..57.635 rows=274789 loops=3)
                           Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-12-01'::date))
                           Buffers: shared read=5251
 Planning time: 1.217 ms
 Execution time: 718.372 ms
(30 rows)
For a cross-partition query touching half of the partitions, the two tables perform comparably (721.822 ms vs. 718.372 ms).
Query one user's data for a given month (data served directly from cache)
1. All of the data falls within the target partition, and the amount of data is very small.
-Not partition table
postgres=# explain (analyze,buffers) select * from t_pay_all where createdate>='2017-06-01' AND createdate<'2017-07-01' and userid=268460;
                                   QUERY PLAN
--------------------------------------------------------------------------------------------
 Index Scan using t_pay_all_userid_idx on t_pay_all  (cost=0.43..48.68 rows=1 width=20) (actual time=0.053..0.071 rows=7 loops=1)
   Index Cond: (userid = 268460)
   Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
   Rows Removed by Filter: 10
   Buffers: shared hit=20
 Planning time: 0.149 ms
 Execution time: 0.101 ms
(7 rows)

Time: 0.676 ms
-Partition table
postgres=# explain (analyze,buffers) select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=268460;
                                   QUERY PLAN
------------------------------------------------------------------------------------------
 Append  (cost=0.42..12.47 rows=2 width=20) (actual time=0.019..0.032 rows=7 loops=1)
   Buffers: shared hit=10
   ->  Index Scan using t_pay_201706_userid_idx on t_pay_201706  (cost=0.42..12.47 rows=2 width=20) (actual time=0.018..0.029 rows=7 loops=1)
         Index Cond: (userid = 268460)
         Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
         Buffers: shared hit=10
 Planning time: 0.728 ms
 Execution time: 0.064 ms
(8 rows)

Time: 1.279 ms
When very few rows are returned, the non-partitioned table has the edge, because the partitioned table's planning-time overhead is larger (0.728 ms vs. 0.149 ms here).
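If such point queries run frequently, the planning cost can in principle be paid once with a prepared statement. A generic sketch (not part of the original test; the statement name is made up):

```sql
-- Hypothetical example: prepare once, then execute repeatedly so the
-- partitioned table's higher planning cost is amortized across calls.
PREPARE pay_by_user (integer, date, date) AS
    SELECT * FROM t_pay
    WHERE userid = $1 AND createdate >= $2 AND createdate < $3;

EXECUTE pay_by_user (268460, '2017-06-01', '2017-07-01');
```

One caveat: in PostgreSQL 10 partition exclusion happens at plan time and needs constant bounds, so a generic plan with parameterized dates cannot prune partitions; the server may therefore keep replanning custom plans, which limits the savings.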
2. The user's data also falls in other partitions, and its total volume is relatively large.
-Not partition table
postgres=# explain (analyze,buffers) select * from t_pay_all where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=302283;
                                   QUERY PLAN
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_pay_all  (cost=19780.69..22301.97 rows=683 width=20) (actual time=91.778..91.803 rows=2 loops=1)
   Recheck Cond: ((userid = 302283) AND (createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
   Heap Blocks: exact=9
   Buffers: shared hit=2927
   ->  BitmapAnd  (cost=19780.69..19780.69 rows=683 width=0) (actual time=91.767..91.767 rows=0 loops=1)
         Buffers: shared hit=2918
         ->  Bitmap Index Scan on t_pay_all_userid_idx  (cost=0.00..183.00 rows=8342 width=0) (actual time=0.916..0.916 rows=11013 loops=1)
               Index Cond: (userid = 302283)
               Buffers: shared hit=41
         ->  Bitmap Index Scan on t_pay_all_createdate_idx  (cost=0.00..19597.10 rows=819666 width=0) (actual time=90.837..90.837 rows=825865 loops=1)
               Index Cond: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
               Buffers: shared hit=2877
 Planning time: 0.172 ms
 Execution time: 91.851 ms
(14 rows)

Time: 92.534 ms
-Partition table
postgres=# explain (analyze,buffers) select * from t_pay where createdate >='2017-06-01' AND createdate<'2017-07-01' and userid=302283;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------
 Append  (cost=0.42..12.47 rows=2 width=20) (actual time=0.042..0.046 rows=2 loops=1)
   Buffers: shared hit=7
   ->  Index Scan using t_pay_201706_userid_idx on t_pay_201706  (cost=0.42..12.47 rows=2 width=20) (actual time=0.041..0.045 rows=2 loops=1)
         Index Cond: (userid = 302283)
         Filter: ((createdate >= '2017-06-01'::date) AND (createdate < '2017-07-01'::date))
         Buffers: shared hit=7
 Planning time: 0.818 ms
 Execution time: 0.096 ms
(8 rows)

Time: 1.499 ms
This is where partitioned tables show their biggest advantage; the performance improvement is dramatic (91.851 ms vs. 0.096 ms).
Index maintenance
-Not partition table
postgres=# REINDEX INDEX t_pay_all_createdate_idx;
REINDEX
Time: 11827.344 ms (00:11.827)
-Partition table
postgres=# REINDEX INDEX t_pay_201706_createdate_idx;
REINDEX
Time: 930.439 ms
postgres=#
This is another advantage of partitioned tables: the index of a single partition can be rebuilt on its own.
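Since each partition carries its own indexes, they can be enumerated from the catalog before maintenance, for example via the `pg_indexes` view:

```sql
-- List the indexes belonging to one partition.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 't_pay_201706';
```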
Delete the entire partition data
-Not partition table
postgres=# delete from t_pay_all where createdate >='2017-06-01' and createdate<'2017-07-01';
DELETE 824865
Time: 5775.545 ms (00:05.776)
-Partition table
postgres=# truncate table t_pay_201706;
TRUNCATE TABLE
Time: 177.809 ms
This too is an advantage of partitioned tables: an entire partition can simply be truncated (177.809 ms vs. 5775.545 ms for the DELETE).
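Besides TRUNCATE, PostgreSQL 10 can also detach a partition, turning it into a standalone table, which is handy for archiving a month instead of deleting it:

```sql
-- Detach the partition: the rows stay in t_pay_201706,
-- but they no longer belong to t_pay.
ALTER TABLE t_pay DETACH PARTITION t_pay_201706;

-- After archiving (or for a replacement table), it can be re-attached:
-- ALTER TABLE t_pay ATTACH PARTITION t_pay_201706
--     FOR VALUES FROM ('2017-06-01') TO ('2017-07-01');
```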
Full table scan
-Partition table
postgres=# explain analyze select count(1) from t_pay;
                                   QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=107370.96..107370.97 rows=1 width=8) (actual time=971.561..971.561 rows=1 loops=1)
   ->  Gather  (cost=107370.75..107370.96 rows=2 width=8) (actual time=971.469..971.555 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=106370.75..106370.76 rows=1 width=8) (actual time=967.378..967.378 rows=1 loops=3)
               ->  Append  (cost=0.00..96800.40 rows=3828141 width=0) (actual time=0.019..698.882 rows=3061712 loops=3)
                     ->  Parallel Seq Scan on t_pay_201701  (cost=0.00..8836.14 rows=349414 width=0) (actual time=0.017..48.716 rows=279531 loops=3)
                     ->  Parallel Seq Scan on t_pay_201702  (cost=0.00..8119.94 rows=321094 width=0) (actual time=0.007..33.072 rows=256875 loops=3)
                     ->  Parallel Seq Scan on t_pay_201703  (cost=0.00..9079.47 rows=359047 width=0) (actual time=0.006..37.153 rows=287238 loops=3)
                     ->  Parallel Seq Scan on t_pay_201704  (cost=0.00..8672.67 rows=342968 width=0) (actual time=0.006..35.317 rows=274374 loops=3)
                     ->  Parallel Seq Scan on t_pay_201705  (cost=0.00..8975.23 rows=354923 width=0) (actual time=0.006..36.571 rows=283938 loops=3)
                     ->  Parallel Seq Scan on t_pay_201706  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.000..0.000 rows=0 loops=3)
                     ->  Parallel Seq Scan on t_pay_201707  (cost=0.00..8957.04 rows=354204 width=0) (actual time=0.006..36.393 rows=283363 loops=3)
                     ->  Parallel Seq Scan on t_pay_201708  (cost=0.00..8970.38 rows=354738 width=0) (actual time=0.006..37.015 rows=283791 loops=3)
                     ->  Parallel Seq Scan on t_pay_201709  (cost=0.00..8692.14 rows=343714 width=0) (actual time=0.006..35.187 rows=274971 loops=3)
                     ->  Parallel Seq Scan on t_pay_201710  (cost=0.00..8964.94 rows=354494 width=0) (actual time=0.006..36.566 rows=283595 loops=3)
                     ->  Parallel Seq Scan on t_pay_201711  (cost=0.00..8685.86 rows=343486 width=0) (actual time=0.006..35.198 rows=274789 loops=3)
                     ->  Parallel Seq Scan on t_pay_201712  (cost=0.00..8826.59 rows=349059 width=0) (actual time=0.006..36.523 rows=279247 loops=3)
 Planning time: 0.706 ms
 Execution time: 977.364 ms
(20 rows)

Time: 978.705 ms
postgres=#
-Not partition table
postgres=# explain analyze select count(1) from t_pay_all;
                                   QUERY PLAN
-------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=116900.63..116900.64 rows=1 width=8) (actual time=644.093..644.093 rows=1 loops=1)
   ->  Gather  (cost=116900.42..116900.63 rows=2 width=8) (actual time=644.035..644.087 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=115900.42..115900.43 rows=1 width=8) (actual time=640.587..640.587 rows=1 loops=3)
               ->  Parallel Seq Scan on t_pay_all  (cost=0.00..105473.33 rows=4170833 width=0) (actual time=0.344..371.965 rows=3061712 loops=3)
 Planning time: 0.164 ms
 Execution time: 645.438 ms
(8 rows)

Time: 646.027 ms
The partitioned table lags on a full-table scan (977.364 ms vs. 645.438 ms), but the gap is acceptable.
Add new partitions and import data
- Generate new partition data
copy (select userid,pay_money,createdate+31 as createdate from t_pay_201712) to '/home/pg/';
- Create a new partition
create table t_pay_201801 partition of t_pay(id primary key,userid,pay_money,createdate) for values from ('2018-01-01') to ('2018-02-01');
create index t_pay_201801_createdate_idx on t_pay_201801 using btree(createdate);
create index t_pay_201801_userid_idx on t_pay_201801 using btree(userid);
-Not partition table
postgres=# copy t_pay_all(userid,pay_money,createdate) from '/home/pg/';
COPY 837741
Time: 18105.024 ms (00:18.105)
-Partition table
postgres=# copy t_pay(userid,pay_money,createdate) from '/home/pg/';
COPY 837741
Time: 13864.950 ms (00:13.865)
postgres=#
The partitioned table keeps its advantage when importing data into the new partition (13.9 s vs. 18.1 s).
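Note that the COPY above targets the parent table: PostgreSQL 10 routes each row to the matching partition automatically, and single-row inserts work the same way (illustrative values only):

```sql
-- Tuple routing: an INSERT into the parent lands in the matching partition,
-- here t_pay_201801 for a January 2018 date.
INSERT INTO t_pay (userid, pay_money, createdate)
VALUES (268460, 9.99, '2018-01-15');
```

One thing to watch in PostgreSQL 10: a row whose createdate falls outside every defined partition is rejected with an error, since default partitions only arrived in a later release.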
This concludes the detailed walkthrough and performance test of PostgreSQL 10 partitioned tables. For more on PostgreSQL 10 partitioned-table performance, see my earlier articles or the related articles below.