SoFunction
Updated on 2025-04-14

Analysis and Solution of the PostgreSQL Table Bloat Problem

1. Definition

Table bloat (also called table inflation) is the continuous growth of the file system space occupied by a table's data and indexes while the amount of live data stays roughly the same. The relation files fill up with holes left by obsolete row versions, which wastes disk space.

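2. Causes

PostgreSQL keeps no separate UNDO area. Old row versions, which play the role that UNDO data plays in other databases (rolling back transactions and providing a consistent view to concurrent transactions), are stored in the same files as the live table data. Table bloat follows from this mixed storage. The specific causes include the following: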

1. MVCC (multi-version concurrency control) mechanism:

  • Old version data retention: PostgreSQL uses MVCC to handle concurrent access, so that reads do not block writes and writes do not block reads. When a row is updated or deleted, the original version is not removed from disk immediately. It is only marked as obsolete, so that the change can be rolled back and transactions holding older snapshots can still read it. If these old versions are not cleaned up in time, they occupy disk space and cause table bloat.
  • Dead tuples: Over time, a large number of "dead" rows (row versions that are no longer visible to any transaction) accumulate in the table. Until they are cleaned up, they continue to occupy disk space.
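The way MVCC leaves old versions behind can be illustrated with a toy model. The Python below is only a sketch, not PostgreSQL's implementation: `TupleVersion` and `ToyHeap` are hypothetical names, every transaction is assumed to commit, and any superseded version is counted as dead without modelling snapshots.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TupleVersion:
    key: int
    value: str
    xmin: int                   # ID of the transaction that created this version
    xmax: Optional[int] = None  # ID of the transaction that superseded it, if any


@dataclass
class ToyHeap:
    """Hypothetical append-only heap; every transaction is assumed to commit."""
    versions: List[TupleVersion] = field(default_factory=list)
    next_xid: int = 1

    def _new_xid(self) -> int:
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def _current(self, key: int) -> TupleVersion:
        for version in self.versions:
            if version.key == key and version.xmax is None:
                return version
        raise KeyError(key)

    def insert(self, key: int, value: str) -> None:
        self.versions.append(TupleVersion(key, value, xmin=self._new_xid()))

    def update(self, key: int, value: str) -> None:
        old = self._current(key)
        xid = self._new_xid()
        old.xmax = xid  # the old version is marked, not removed
        self.versions.append(TupleVersion(key, value, xmin=xid))

    def delete(self, key: int) -> None:
        self._current(key).xmax = self._new_xid()

    def live_count(self) -> int:
        return sum(1 for v in self.versions if v.xmax is None)

    def dead_count(self) -> int:
        return len(self.versions) - self.live_count()


heap = ToyHeap()
heap.insert(1, "v0")
for i in range(1, 6):
    heap.update(1, f"v{i}")
print(heap.live_count(), heap.dead_count())  # 1 live version, 5 dead versions
```

One logical row updated five times occupies six slots in this model, which is the growth pattern that bloats a real table until the dead versions are cleaned up.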

2. Frequent update and deletion operations:

  • Dead tuple accumulation: Every UPDATE writes a new row version and leaves the old one behind, and every DELETE leaves the deleted version in place, so frequent updates and deletes directly produce large numbers of dead rows. Bloat is particularly severe in workloads with high update and delete rates.

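3. Long-running and uncommitted transactions:

  • Blocked cleanup: A transaction that stays open for a long time without committing or rolling back keeps its snapshot alive. VACUUM cannot remove any row version that such a transaction might still need to see, so dead rows accumulate for as long as the transaction remains open, and the table bloats.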
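Why one open transaction holds back cleanup can be sketched with a simplified rule: a dead row version may be removed only if it was superseded before the oldest transaction that is still running. The function name `removable_dead_tuples` is hypothetical, and real PostgreSQL works with snapshot horizons that are more involved than a plain minimum of transaction IDs.

```python
from typing import Iterable, List


def removable_dead_tuples(dead_xmax: Iterable[int],
                          running_xids: Iterable[int]) -> List[int]:
    """Return the xmax values of dead versions that cleanup may remove.

    Simplified rule (an assumption of this sketch): a version is removable
    when the transaction that superseded it is older than every transaction
    that is still running.
    """
    dead = list(dead_xmax)
    running = list(running_xids)
    if not running:
        return dead  # nothing is running, so every dead version can go
    horizon = min(running)
    return [xmax for xmax in dead if xmax < horizon]


dead = [101, 105, 110, 120]
print(removable_dead_tuples(dead, []))     # [101, 105, 110, 120]
print(removable_dead_tuples(dead, [103]))  # [101]
```

With transaction 103 still open, three of the four dead versions must stay on disk no matter how often cleanup runs.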

4. Fillfactor settings:

  • Free space: The table's fillfactor setting also affects table size. A lower fillfactor leaves more free space in each data page, so that the new version of an updated row can usually be stored on the same page as the old one. The reserved space means the same number of rows needs more pages, so the table occupies more disk space. Conversely, a higher fillfactor (the default for tables is 100) packs pages tightly, and an updated row often does not fit on its original page and must be written to a different one.
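The space cost of a lower fill factor can be estimated with simple arithmetic. The sketch below is a rough approximation, not PostgreSQL's exact page layout: `pages_needed` is a hypothetical helper, `row_bytes` is assumed to already include per-row overhead, and the 8192-byte block size and 24-byte page header are the usual defaults.

```python
import math


def pages_needed(row_count: int, row_bytes: int, fillfactor: int = 100,
                 block_size: int = 8192, page_header: int = 24) -> int:
    """Estimate how many pages a table needs at a given fill factor."""
    # Only fillfactor percent of each page is filled on initial insert.
    usable = (block_size - page_header) * fillfactor // 100
    rows_per_page = usable // row_bytes
    if rows_per_page == 0:
        raise ValueError("row does not fit in the usable part of a page")
    return math.ceil(row_count / rows_per_page)


print(pages_needed(1_000_000, 100, fillfactor=100))  # 12346
print(pages_needed(1_000_000, 100, fillfactor=80))   # 15385
```

Under these assumptions, dropping the fill factor from 100 to 80 makes a freshly loaded table about 25% larger. That is the price paid for leaving room for updates on the same page.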

5. The autovacuum mechanism is insufficient:

  • Untimely cleaning: PostgreSQL's autovacuum mechanism removes dead rows automatically, but under heavy write concurrency or in the presence of long-running transactions it may fall behind, so dead tuples accumulate faster than they are cleaned and the table bloats.

6. Other factors:

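  • Stale replication slots: An inactive or abandoned replication slot can hold back the oldest transaction ID that VACUUM must preserve, so dead tuples cannot be removed even though autovacuum runs.
  • Index bloat: Indexes accumulate entries that point to dead tuples. VACUUM has to scan the table's indexes to remove them, so heavily indexed tables take longer to clean.
  • Disk I/O performance: Poor disk I/O performance slows VACUUM down, so dead tuples are not cleaned in time.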

3. Impact

Table bloating has a significant impact on the performance and stability of the database, including the following aspects:

1. Increased storage cost:

Inflated tables take up more disk space and increase storage costs.

2. Query performance deteriorates:

  • Dataset enlargement: Scans have to read more pages, including pages that hold mostly dead rows, so query execution time increases.
  • Reduced index efficiency: Indexes bloat along with the table, so index scans touch more pages to return the same rows.

3. Backup and recovery take longer:

A larger table takes correspondingly longer to back up and to restore.

4. System resource consumption increases:

  • CPU, memory, and I/O resources: Handling bloated tables requires more CPU, memory, and I/O resources.

5. Data fragmentation:

Table bloat can lead to data fragmentation, further affecting performance and increasing the complexity of database management.

4. Solution

Solving table inflation problems usually involves the following steps:

1. Perform VACUUM operations regularly:

  • Plain VACUUM: Removes dead tuples and marks their space as reusable. The table file is not reorganized and the space is generally not returned to the operating system, but subsequent inserts and updates reuse it through the free space map.
  • VACUUM FULL: Rewrites the table into a new file and returns the freed space to the operating system. It takes an ACCESS EXCLUSIVE lock, which blocks all reads and writes on the table for the duration, and it needs enough free disk space for the new copy. It suits tables that have accumulated heavy bloat after large-scale updates or deletes, and it should be executed during off-peak periods, not during business peaks.
  • Manual VACUUM: Old row versions can be cleaned more aggressively by passing options to a manual run, such as VACUUM (FULL, FREEZE).
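The difference between the two modes can be shown with a toy page model. This is only a sketch: pages are lists of slots, `plain_vacuum` and `vacuum_full` are hypothetical names, and the model ignores the fact that a real plain VACUUM can truncate completely empty pages at the end of the file.

```python
from typing import List

Page = List[str]  # each slot is "live", "dead", or "free"


def plain_vacuum(pages: List[Page]) -> List[Page]:
    """Turn dead slots into free slots; the number of pages is unchanged."""
    return [["free" if slot == "dead" else slot for slot in page]
            for page in pages]


def vacuum_full(pages: List[Page], slots_per_page: int) -> List[Page]:
    """Rewrite the live rows into as few new pages as possible."""
    live = sum(page.count("live") for page in pages)
    new_pages: List[Page] = []
    while live > 0:
        used = min(live, slots_per_page)
        new_pages.append(["live"] * used + ["free"] * (slots_per_page - used))
        live -= used
    return new_pages


table = [
    ["live", "dead", "dead", "dead"],
    ["dead", "live", "dead", "dead"],
    ["dead", "dead", "dead", "live"],
]
print(len(plain_vacuum(table)))    # 3 pages, dead space now reusable
print(len(vacuum_full(table, 4)))  # 1 page, file shrinks
```

After the plain pass the table still spans three pages, but nine slots are available for new rows. Only the full rewrite shrinks the file, and in the real system that rewrite is what requires the exclusive lock.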

2. Enable and configure the autovacuum mechanism:

  • Make sure autovacuum is on: PostgreSQL's autovacuum mechanism triggers vacuum operations automatically once the number of dead rows in a table exceeds a threshold.
  • Adjust autovacuum parameters: Tune settings such as autovacuum_vacuum_cost_delay and autovacuum_naptime so that the autovacuum workers can clean up dead rows in a timely manner.
  • Monitor autovacuum effects: Regularly check when autovacuum last ran on each table and how many dead rows remain, to confirm that it keeps up.
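The threshold that triggers autovacuum can be worked out by hand. According to the PostgreSQL documentation, a table is vacuumed once its dead rows exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × (number of rows), with defaults of 50 and 0.2. These two parameters are not discussed elsewhere in this article, and the helper names below are hypothetical. The sketch only reproduces the arithmetic.

```python
def autovacuum_threshold(reltuples: float, base_threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Dead-row count above which autovacuum vacuums the table."""
    return base_threshold + scale_factor * reltuples


def needs_autovacuum(n_dead_tup: int, reltuples: float,
                     base_threshold: int = 50,
                     scale_factor: float = 0.2) -> bool:
    return n_dead_tup > autovacuum_threshold(reltuples, base_threshold,
                                             scale_factor)


# A table with 10 million rows and 1.5 million dead rows:
print(needs_autovacuum(1_500_000, 10_000_000))                     # False
print(needs_autovacuum(1_500_000, 10_000_000, scale_factor=0.01))  # True
```

With the defaults, a 10-million-row table collects roughly two million dead rows before autovacuum starts on it, which is why large, frequently updated tables are often given a smaller scale factor.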

3. Use the pg_repack or pg_reorg tool:

  • Online reorganization: For severely bloated tables, a tool such as pg_repack (or its predecessor, pg_reorg) can rebuild the table and its indexes to reclaim space. These tools hold an exclusive lock only briefly at the start and at the end of the run, so the table stays readable and writable for most of the process and the impact on production is much smaller than with VACUUM FULL.
  • Execution process
    • Preparation phase: Reserve enough free disk space for a full copy of the table and its indexes, and adjust database parameters (such as idle_in_transaction_session_timeout).
    • Execution phase: Create the new table, copy the data, build the indexes, and swap the new table in for the old one.
    • Monitoring and logging: Monitor the reorganization process and keep logs for troubleshooting.

4. Reasonably design databases and queries:

  • Avoid unnecessary updates and deletes: This reduces the accumulation of dead rows.
  • Use partitioned tables: For large tables that are updated frequently, consider partitioning to limit the size and the bloat of any single table.
  • Set the fill factor reasonably: Choose the fillfactor according to the table's update frequency and data volume, to reduce the likelihood of bloat.

5. Monitoring and early warning:

  • Establish a monitoring system: Monitor table bloat continuously and set threshold alarms, so that bloat can be responded to and handled as soon as it appears.
  • Regular analysis: Regularly review how much each table has bloated and why, and take corresponding optimization measures.
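A threshold alarm of this kind can be sketched in a few lines. The example assumes that live and dead row counts per table have already been collected (for instance from the n_live_tup and n_dead_tup columns of pg_stat_user_tables). The function name, the 20% limit and the minimum table size are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def tables_over_threshold(stats: Dict[str, Tuple[int, int]],
                          max_dead_ratio: float = 0.2,
                          min_rows: int = 1000) -> List[Tuple[str, float]]:
    """Return (table, dead_ratio) pairs that exceed the alarm threshold.

    stats maps a table name to (live_rows, dead_rows). Tables with fewer
    than min_rows row versions are skipped to avoid noisy alerts.
    """
    alerts = []
    for name, (live, dead) in stats.items():
        total = live + dead
        if total < min_rows:
            continue
        ratio = dead / total
        if ratio > max_dead_ratio:
            alerts.append((name, round(ratio, 3)))
    # Worst offenders first
    return sorted(alerts, key=lambda item: item[1], reverse=True)


sample = {
    "orders": (800_000, 400_000),
    "users": (50_000, 1_000),
    "tmp": (10, 90),
}
print(tables_over_threshold(sample))  # [('orders', 0.333)]
```

The dead-row ratio is a cheaper signal than a full bloat estimate, so it suits frequent checks. The page-based estimate can then be run on the tables it flags.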

6. Other optimization measures:

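  • Configure REDO logs: PostgreSQL's REDO log is the WAL, which is already stored separately from table data, while old row versions stay in the table files. WAL settings therefore do not remove bloat, but where possible, placing the WAL on separate storage keeps its I/O from competing with VACUUM.
  • Database maintenance best practices: Regular maintenance activities, such as rebuilding bloated indexes and refreshing statistics with ANALYZE, also help keep old row versions under control.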

5. Implementation Instructions

1. Enable and configure the autovacuum mechanism:

Make sure autovacuum is on

ALTER SYSTEM SET autovacuum = on;
SELECT pg_reload_conf();

Adjust autovacuum parameters:

-- Values with units must be quoted. The numbers here are examples, not recommendations.
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '20ms';
ALTER SYSTEM SET autovacuum_naptime = '1min';
SELECT pg_reload_conf();

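Monitor autovacuum effects (pg_stat_user_tables records the dead-row count and the time of the last autovacuum run for each table)

SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables ORDER BY n_dead_tup DESC;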

2. Perform VACUUM operations regularly:

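Manually execute VACUUM (VACUUM FULL locks the table exclusively, so run it only in a maintenance window)

VACUUM FULL tablename;

Set up timing tasks (a crontab entry that runs at 02:00 every day)

0 2 * * * psql -d yourdatabase -c "VACUUM FULL tablename"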

3. Use the pg_repack tool:

Install pg_repack extension (the pg_repack package must already be installed on the database server)

CREATE EXTENSION pg_repack;

Execute pg_repack

pg_repack -h your_host -p your_port -d your_database -t your_table

Monitor the reorganization process

SELECT * FROM pg_stat_activity WHERE query LIKE '%pg_repack%';

4. Reasonably design databases and queries:

Using partition table

CREATE TABLE your_table (
    id serial PRIMARY KEY,
    data text
) PARTITION BY RANGE (id);

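-- The upper bound of a range partition is exclusive, so consecutive
-- partitions must share a boundary value to avoid gaps.
CREATE TABLE your_table_partition1 PARTITION OF your_table
FOR VALUES FROM (1) TO (1000001);

CREATE TABLE your_table_partition2 PARTITION OF your_table
FOR VALUES FROM (1000001) TO (2000001);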

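Set the fill factor reasonably (the new value applies to pages written from now on; existing pages are repacked only when the table is rewritten)

ALTER TABLE your_table SET (fillfactor = 80);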

5. Monitoring and early warning:

Establish a monitoring system

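CREATE OR REPLACE FUNCTION check_table_bloat() RETURNS void AS $$
DECLARE
   r RECORD;
BEGIN
   FOR r IN
       SELECT schemaname, tablename, bloat
       FROM (
           SELECT
               schemaname,
               tablename,
               ROUND(CASE WHEN otta = 0 OR relpages = 0 OR relpages = otta THEN 0.0 ELSE relpages / otta::numeric END, 2) AS bloat
           FROM (
               SELECT
                   nn.nspname AS schemaname,
                   cc.relname AS tablename,
                   COALESCE(cc.reltuples, 0) AS reltuples,
                   COALESCE(cc.relpages, 0) AS relpages,
                   COALESCE(ce.expected_reltuples, 0) AS expected_reltuples,
                   -- otta: ideal page count = estimated bytes / block size
                   CASE WHEN ce.total_bytes > 0 THEN
                       CEIL(ce.total_bytes / (SELECT setting FROM pg_settings WHERE name = 'block_size')::numeric)
                   ELSE
                       0
                   END AS otta
               FROM
                   pg_class cc
                   JOIN pg_namespace nn ON cc.relnamespace = nn.oid
                   LEFT JOIN (
                       SELECT
                           c.oid AS relid,
                           s.n_live_tup AS expected_reltuples,
                           -- rellen + 24 approximates the row size plus its tuple header
                           (s.n_live_tup * (w.rellen + 24))::bigint AS total_bytes
                       FROM
                           pg_class c
                           LEFT JOIN pg_stat_all_tables s ON s.relid = c.oid
                           -- Assumption: average row width (rellen) taken from ANALYZE statistics
                           LEFT JOIN (
                               SELECT schemaname, tablename, SUM(avg_width) AS rellen
                               FROM pg_stats WHERE NOT inherited
                               GROUP BY schemaname, tablename
                           ) w ON w.schemaname = s.schemaname AND w.tablename = s.relname
                       WHERE
                           s.schemaname NOT IN ('pg_catalog', 'information_schema')
                           AND c.relkind = 'r'
                   ) ce ON ce.relid = cc.oid
               WHERE
                   nn.nspname NOT IN ('pg_catalog', 'information_schema')
                   AND cc.relkind = 'r'
           ) a
       ) b
       WHERE bloat > 1.0
   LOOP
       RAISE NOTICE 'Schema: %, Table: %, Bloat: %', r.schemaname, r.tablename, r.bloat;
   END LOOP;
END;
$$ LANGUAGE plpgsql;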

The check_table_bloat function checks whether tables in the PostgreSQL database are bloated, that is, whether the disk space a table occupies exceeds the space required for the data it actually stores. The function calculates a bloat ratio for each table through a series of nested queries and uses RAISE NOTICE to report every table whose ratio is greater than 1.0.

Explanation

  1. Outer query: Iterates over all tables whose calculated bloat ratio is greater than 1.0 and outputs their schemaname, tablename and bloat.

  2. Inner query: Calculates the bloat ratio of each table. Several nested subqueries are used here:

    • The first subquery (aliased as a) calculates the ratio of each table's actual page count (relpages) to its ideal page count (otta), which is the bloat ratio. The ideal page count is derived from the number of rows in the table (reltuples), the size of each row (rellen) and the block size (read from the block_size setting in pg_settings).
    • The second subquery (aliased as ce) calculates the expected number of rows and the total bytes for each table, which are then used to calculate the ideal page count.
  3. Filter conditions: Excludes the system schemas (pg_catalog and information_schema) and anything that is not an ordinary table (relkind other than 'r').

  4. Function definition: The CREATE OR REPLACE FUNCTION statement defines a function named check_table_bloat that takes no parameters and returns void, meaning that no value is returned. The function body is written in PL/pgSQL.

  5. Loop and output: A FOR ... IN ... LOOP statement iterates over the query results, and RAISE NOTICE outputs the bloat information for each row.

Please make appropriate adjustments and optimizations to the functions according to your actual database environment and needs. This function can be run regularly as part of database maintenance to check and handle table bloat problems.
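The ratio that the function reports can also be checked by hand. The Python below is a minimal sketch of the same arithmetic, assuming the default 8192-byte block size and a 24-byte per-row overhead. `estimate_bloat` is a hypothetical helper, and like the SQL version it ignores alignment padding and page headers, so small ratios above 1.0 do not necessarily indicate a problem.

```python
import math


def estimate_bloat(relpages: int, row_count: int, avg_row_bytes: int,
                   block_size: int = 8192, tuple_overhead: int = 24) -> float:
    """Ratio of actual pages to the ideal page count (otta)."""
    if relpages == 0 or row_count == 0:
        return 0.0
    total_bytes = row_count * (avg_row_bytes + tuple_overhead)
    otta = math.ceil(total_bytes / block_size)  # ideal number of pages
    return round(relpages / otta, 2)


# One million rows averaging 76 bytes need about 12208 pages.
print(estimate_bloat(12208, 1_000_000, 76))  # 1.0
print(estimate_bloat(30000, 1_000_000, 76))  # 2.46
```

A table that occupies 30000 pages for data that should fit in about 12208 is carrying roughly one and a half times its own size in reclaimable space.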

Summary

Table bloat is a common problem in PostgreSQL: the space occupied by table data and indexes keeps growing while the amount of live data does not change significantly. It is mainly caused by the MVCC mechanism, frequent updates and deletes, long-running or uncommitted transactions, fillfactor settings and autovacuum that cannot keep up. Bloated tables lead to higher storage costs, slower queries, longer backup and recovery times and higher system resource consumption. To address these problems, run VACUUM regularly, enable and tune autovacuum, use pg_repack or pg_reorg for online reorganization, design tables and queries sensibly, and set up monitoring and early warning. In particular, a function such as check_table_bloat can be created and run regularly to check table bloat, so that measures can be taken in time to protect the performance and stability of the database.
