Keywords: batch data import, data loading, large-scale insertion, speed up, increase speed
Many things in life are like this in an era of abundant choice: there is no one fixed way that works for everything. As the saying goes, it does not matter whether a cat is black or white; a cat that catches mice is a good cat, and the one that suits you is the best.
The same is true for improving batch data import: no single method is the best. Before applying any of them, weigh the pros and cons and choose according to your actual situation.
Whatever method you choose, be sure to make a corresponding backup before bulk importing the data.
After the import completes, run ANALYZE on the affected tables so that the query optimizer can generate correct execution plans based on up-to-date statistics.
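For example (tablename is a placeholder for the imported table):

-- refresh planner statistics after the bulk load
ANALYZE tablename;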
The following introduces ten methods to improve the performance of batch data import.
Method 1: Disable autocommit.
In psql:
\set AUTOCOMMIT off

Or wrap the import in an explicit transaction:

BEGIN;
-- perform batch data import
COMMIT;
Method 2: Set the table to UNLOGGED.
Before importing data, change the table to UNLOGGED mode, and after importing, change it back to LOGGED mode.
ALTER TABLE tablename SET UNLOGGED;
-- perform batch data import
ALTER TABLE tablename SET LOGGED;
Advantage:
Writes to an UNLOGGED table are not recorded in the WAL, which greatly reduces I/O and speeds up the import.
Disadvantages:
1. In a replication environment, a table cannot be set to UNLOGGED mode.
2. If a crash (for example, a power outage) occurs during the import, the database cannot recover UNLOGGED tables from WAL; during crash recovery the data of all UNLOGGED tables in the database is truncated and lost.
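To check a table's current persistence before and after switching, you can query pg_class (tablename is a placeholder; relpersistence = 'u' means unlogged, 'p' means permanent):

SELECT relname, relpersistence FROM pg_class WHERE relname = 'tablename';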
Method 3: Rebuild the index.
Before importing data, drop the indexes on the relevant table and recreate them after the import completes.
DROP INDEX indexname;
-- perform batch data import
CREATE INDEX ...;
To query the index definitions on a table:
SELECT * FROM pg_indexes WHERE tablename = 'tablename' AND schemaname = 'schemaname';
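A minimal sketch of the whole workflow, assuming a hypothetical table orders with an index idx_orders_customer on customer_id; save the definition from pg_indexes before dropping so you can recreate it exactly:

-- save the index definition first
SELECT indexdef FROM pg_indexes WHERE tablename = 'orders';
DROP INDEX idx_orders_customer;
-- perform batch data import
CREATE INDEX idx_orders_customer ON orders (customer_id);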
Method 4: Rebuild foreign keys.
Before importing data, drop the foreign keys on the relevant table and recreate them after the import completes.
ALTER TABLE ... DROP CONSTRAINT ...;
-- perform batch data import
ALTER TABLE ... ADD CONSTRAINT ... FOREIGN KEY ... REFERENCES ...;
For related information, please query pg_constraint.
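For example, this query (tablename is a placeholder) lists the foreign-key constraints on a table along with their full definitions, which you can save and replay after the import:

SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'tablename'::regclass AND contype = 'f';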
Method 5: Disable triggers.
Before importing data, DISABLE the triggers on the relevant table, then re-ENABLE them after the import completes.
ALTER TABLE tablename DISABLE TRIGGER ALL;
-- perform batch data import
ALTER TABLE tablename ENABLE TRIGGER ALL;
For related information, please query pg_trigger.
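For example, this query (tablename is a placeholder) lists the user-defined triggers on a table and whether they are enabled (tgenabled = 'D' means disabled). Note that DISABLE TRIGGER ALL also disables the internal triggers that enforce foreign keys, so rows loaded while they are off are not checked against those constraints:

SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'tablename'::regclass AND NOT tgisinternal;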
Method 6: Change INSERT to COPY.
COPY is optimized for batch data loading.
COPY ... FROM 'xxx';
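A concrete sketch, assuming a hypothetical table orders and a CSV file /tmp/orders.csv with a header row. COPY ... FROM reads the file on the server; the psql \copy variant reads it on the client instead:

COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv, HEADER true);
-- or, from the client side in psql:
\copy orders FROM '/tmp/orders.csv' WITH (FORMAT csv, HEADER true)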
Method 7: Change single-row INSERTs to multi-row INSERTs.
One statement inserting many rows reduces the time spent on SQL parsing.
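A minimal sketch, assuming the same hypothetical foo table (int, text, bool, numeric columns) used in Method 8 below; a single parsed statement carries several rows:

INSERT INTO foo VALUES
    (1, 'Hunter Valley', 't', 200.00),
    (2, 'Napa Valley', 'f', 300.00),
    (3, 'Barossa Valley', 't', 150.00);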
Method 8: Change INSERT to PREPARE.
Using PREPARE to create a prepared statement reduces parsing overhead: the statement is parsed once and then executed repeatedly with EXECUTE.
PREPARE fooplan (int, text, bool, numeric) AS
    INSERT INTO foo VALUES ($1, $2, $3, $4);
EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);
Method 9: Modify parameters.
Increase maintenance_work_mem and max_wal_size.
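For example (the values are illustrative; tune them to your hardware): maintenance_work_mem can be raised per session before recreating indexes, while max_wal_size is an instance-wide setting that can be changed with ALTER SYSTEM followed by a configuration reload:

SET maintenance_work_mem = '2GB';
ALTER SYSTEM SET max_wal_size = '10GB';
SELECT pg_reload_conf();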
Method 10: Turn off archive mode and lower the WAL level.
Set the archive_mode parameter to off to disable archiving, and lower wal_level to minimal to reduce the amount of WAL information recorded.
This method requires restarting the database, so plan for downtime. In addition, any standby (replica) must be taken into account, because streaming replication cannot run with wal_level set to minimal.
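A sketch of the corresponding postgresql.conf entries (example values; a restart is required for them to take effect):

archive_mode = off
wal_level = minimal
max_wal_senders = 0    # wal_level = minimal requires zero WAL senders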
This concludes this article on ten methods to improve the performance of batch data import. For more on PostgreSQL batch data import, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!