SoFunction
Updated on 2025-04-07

VACUUM command usage description in PostgreSQL

Whenever a row in a PostgreSQL table is updated or deleted, a dead row version is left behind. VACUUM removes these so the space can be reused. If a table is never vacuumed, it becomes bloated: it wastes disk space, slows down sequential table scans, and, to a lesser extent, slows down index scans as well.

The VACUUM command can only remove row versions (also known as tuples) that are no longer needed. A tuple is no longer needed if the transaction ID of the deleting transaction (stored in the xmax system column) is older than the oldest transaction still active in the PostgreSQL database (or, for shared tables, in the whole cluster); this threshold is called the xmin horizon.
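To see how many dead row versions a table currently holds, you can query the statistics views before deciding to vacuum. This is a minimal sketch; `mytable` is a placeholder for one of your own tables:

```sql
-- Dead vs. live tuples per table, plus when each table was last vacuumed
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

-- Reclaim the dead tuples in a single table, with progress output
VACUUM (VERBOSE) mytable;
```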

The following three situations can hold back the xmin horizon in a PostgreSQL cluster:

1. Find long-running transactions

We can find long-running transactions and then use the pg_terminate_backend() function to terminate the database sessions that are blocking the VACUUM command.
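A sketch of both steps, using the pg_stat_activity view; the 10-minute threshold and the pid value are assumptions you would adapt to your own workload:

```sql
-- Sessions whose transaction has been open longer than 10 minutes
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND xact_start < now() - interval '10 minutes'
ORDER BY xact_start;

-- Terminate an offending session by its pid (12345 is an example)
SELECT pg_terminate_backend(12345);
```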

2. Find stale replication slots

A replication slot is a data structure that keeps the PostgreSQL server from discarding information that a standby server still needs. If replication lags or the standby server is shut down, the replication slot prevents the VACUUM command from deleting old row versions.

Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows that could cause a recovery conflict, even when a standby is disconnected.

A replication slot retains only the number of WAL segments known to be needed, not more.

Replication slots thus solve a problem earlier methods could not: without a slot, no protection is provided for the periods when the standby is not connected.

We can use the pg_drop_replication_slot() function to drop replication slots that are no longer wanted.
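A sketch of inspecting and dropping a slot; 'standby_slot' is a placeholder name for a slot in your own cluster:

```sql
-- List all replication slots; an inactive slot with an old restart_lsn holds back VACUUM
SELECT slot_name, slot_type, active, restart_lsn
FROM pg_replication_slots;

-- Drop a slot that is no longer needed
SELECT pg_drop_replication_slot('standby_slot');
```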

This situation only occurs with physical replication when the hot_standby_feedback parameter is set to on. With logical replication there is a similar hazard, but it only affects the system catalogs.

3. Find prepared transactions

Two-phase commit (2PC) is an atomic commitment protocol: a distributed algorithm that coordinates all processes participating in a distributed atomic transaction and decides whether to commit or abort (roll back) the transaction.

During a two-phase commit, a distributed transaction first uses PREPARE TRANSACTION to prepare the current transaction for the second phase. If the PREPARE TRANSACTION command fails for any reason, it becomes a ROLLBACK and the current transaction is cancelled.

Then we use COMMIT PREPARED to commit a transaction that was previously prepared for two-phase commit.

Once a transaction has been prepared, it stays in this in-doubt state until it is committed or aborted. Normally a transaction does not remain prepared for long, but sometimes errors occur and the transaction must be removed manually by an administrator.

We can also use ROLLBACK PREPARED to cancel a transaction that was previously prepared for two-phase commit.
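A sketch of finding and resolving orphaned prepared transactions; 'tx1' is a placeholder global identifier:

```sql
-- List prepared transactions that are still pending
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;

-- Commit or abort one by its global identifier
COMMIT PREPARED 'tx1';
-- or:
ROLLBACK PREPARED 'tx1';
```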

Supplement: PostgreSQL VACUUM operations

In PostgreSQL database administration, regular vacuuming is an important task.

Effects of VACUUM

1.1 Reclaim the disk space occupied by updated or deleted rows so it can be reused.

1.2 Update the statistics used by the PostgreSQL query planner (when run with the ANALYZE option).

1.3 Prevent the loss of very old data due to transaction ID wraparound.

The first point matters because UPDATE and DELETE in PostgreSQL do not physically remove old row versions from the data files. If that space is not released from time to time, the table accumulates too much dead data and query speed drops sharply.

The second point matters because, when processing a query, PostgreSQL chooses an execution plan based on statistics in order to improve query speed. If the statistics are not updated in time, the query may not perform as expected.

The third point matters because every transaction in PostgreSQL is assigned a transaction ID, and this number has an upper limit. When the transaction ID reaches its maximum value, it wraps around and starts again from the minimum. If old rows are not frozen in time, they then appear to lie in the future and the old data is effectively lost.

Although newer versions of PostgreSQL have autovacuum, heavy batch I/O can make the automatic runs too slow to keep up, so you may need to combine them with manual runs and your own cleanup script.

1. vacuumdb is a wrapper around the SQL command VACUUM

So cleaning the database with either vacuumdb or VACUUM works; the effect is the same.

2. Several important vacuumdb options

You can list them all with vacuumdb --help.

-a/--all vacuum all databases

-d dbname vacuum only the database dbname

-f/--full perform a full vacuum

-t table vacuum only the specified table

-z/--analyze calculate statistics for use by the optimizer

3. Switch to the postgres user

Run vacuumdb -d yourdbname -f -z -v to clean up your database.

Or add a line like this to crontab: 15 1 * * * postgres vacuumdb -d mydb -f -z -v >> /tmp/

This runs the cleanup at 01:15 every day.

4. To check whether your XIDs are approaching the critical value, run:

select age(datfrozenxid) from pg_database;

or:

select max(age(datfrozenxid)) from pg_database;

5. What we really care about, though, is which large tables actually need a VACUUM

SELECT relname, age(relfrozenxid) AS xid_age, pg_size_pretty(pg_table_size(oid)) AS table_size
FROM pg_class
WHERE relkind = 'r' AND pg_table_size(oid) > 1073741824
ORDER BY age(relfrozenxid) DESC LIMIT 20;

This query sorts tables by their oldest XID and shows the top 20 tables that are larger than 1 GB.

Here is an example:

relname | xid_age | table_size
------------------------+-----------+------------
postgres_log | 199785216 | 12 GB
statements | 4551790 | 1271 MB
normal_statement_times | 31 | 12 GB

Then you can vacuum each table separately:

vacuumdb --analyze --verbose --table 'postgres_log' mydb

The above is based on personal experience. I hope it gives you a useful reference, and I hope you will continue to support this site. If there are any mistakes or omissions, corrections are very welcome.