background
A while ago, a customer reported that his RDS PostgreSQL could not be written, and the error message was as follows:
postgres=# select * from test;
id
----
(0 rows)postgres=# insert into test select 1;
ERROR: database is not accepting commands to avoid wraparound data loss in database "xxxx"
HINT: Stop the postmaster and vacuum that database in single-user mode.
You might also need to commit or roll back old prepared transactions.
After the RDS engineer intervened and dealt with it, the problem was immediately resolved.
Basic principles of XID
XID definition
XID (Transaction ID) is the transaction number inside PostgreSQL. Each transaction will be assigned an XID, incremented in sequence. Each tuple header in PostgreSQL data will store the XID of the tuple (Transaction ID) of the tuple (Transaction ID), and then the kernel constructs the database consistent reading through this XID. When the transaction isolation level is repeatable read, assuming that if there are two transactions, xid1=200 and xid2=201, then only tuples of t_xmin <= 200 are seen in xid1, and tuples of t_xmin > 200 are not seen.
typedef uint32 TransactionId; /* Transaction number definition, 32-bit unsigned integer */ typedef struct HeapTupleFields { TransactionId t_xmin; /* Transaction number that inserts the tuple */ TransactionId t_xmax; /* Delete or lock the transaction number of the tuple */ /*** Other attributes omitted ***/ } HeapTupleFields; struct HeapTupleHeaderData { union { HeapTupleFields t_heap; DatumTupleFields t_datum; } t_choice; /*** Other attributes omitted ***/ };
XID Issuing Mechanism
From the above structure we can see that XID is a 32-bit unsigned integer, that is, the range of XID is 0 to 2^32-1; then what should I do if the transaction exceeds 2^32-1? In fact, XID is a ring, and after exceeding 2^32-1, it will be allocated from scratch. The above conclusion is also proved through the source code:
// Invalid transaction number#define InvalidTransactionId ((TransactionId) 0) // Boot transaction number, used in the database initialization process (BKI execution)#define BootstrapTransactionId ((TransactionId) 1) // Freeze transaction numbers are used to represent very old tuples, which are earlier than all normal transaction numbers (that is, visible)#define FrozenTransactionId ((TransactionId) 2) // The first normal transaction number#define FirstNormalTransactionId ((TransactionId) 3) // Generate xid with the lower 32 bits of FullTransactionId as an unsigned integer#define XidFromFullTransactionId(x) ((uint32) (x).value) static inline void FullTransactionIdAdvance(FullTransactionId *dest) { dest->value++; while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId) dest->value++; } FullTransactionId GetNewTransactionId(bool isSubXact) { /*** Omit ***/ full_xid = ShmemVariableCache->nextFullXid; xid = XidFromFullTransactionId(full_xid); /*** Omit ***/ FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid); /*** Omitted *** return full_xid; } static void AssignTransactionId(TransactionState s) { /*** Omit ***/ s->fullTransactionId = GetNewTransactionId(isSubXact); if (!isSubXact) XactTopFullTransactionId = s->fullTransactionId; /*** Omit ***/ } TransactionId GetTopTransactionId(void) { if (!FullTransactionIdIsValid(XactTopFullTransactionId)) AssignTransactionId(&TopTransactionStateData); return XidFromFullTransactionId(XactTopFullTransactionId); }
As you can see, the new transaction number is saved in the shared variable cache: ShmemVariableCache->nextFullXid. After each transaction number is issued, adjust its value upwards and skip the above three special values. The three special components are 0, 1 and 2 respectively. The functions can be found in the code comments above.
XID rollback mechanism
As mentioned earlier, XID is a ring, which is allocated to 2^32-1 and then started from 3. So how does the kernel compare the sizes of two transactions? For example, xid has gone through such a process 3-> 2^32-1 -> 5, so how does the kernel know that 5 This transaction is behind 2^32-1? Let's look at the code again:
/* * TransactionIdPrecedes --- is id1 logically < id2? */ bool TransactionIdPrecedes(TransactionId id1, TransactionId id2) { /* * If either ID is a permanent XID then we can just do unsigned * comparison. If both are normal, do a modulo-2^32 comparison. */ int32 diff; if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2)) return (id1 < id2); diff = (int32) (id1 - id2); return (diff < 0); }
As you can see, the kernel uses a tricky method: (int32) (id1 - id2) < 0, the value range of 32-bit signed integers is -2^31 to 231-1, and the value obtained by 5-(232-1) is larger than 2^31-1, so converting to int32 will turn into a negative number. However, there is a problem here. The "Latest Transaction Number-Oldest Transaction Number" must be less than 2^31. Once it is greater, a rewind will occur, resulting in the data generated by the old transaction being invisible to the new transaction.
XID rollback prevention
As mentioned earlier, "Latest Transaction Number - Oldest Transaction Number" must be less than 2^31, otherwise a rewind will occur, causing the data generated by the old transaction to be invisible to the new transaction. How does the kernel avoid this problem? The kernel handles it like this: by regularly updating the XID of the tuple generated by the old transaction to FrozenTransactionId, that is, to 2, to recycle the XID, and the tuple with XID of 2 is visible to all transactions. This process is called XID freeze. In this way, XID can be recycled to ensure |Latest transaction number - Oldest transaction number | < 2^31.
In addition to the kernel's automatic freezing and recycling XID, we can also manually perform xid freezing and recycling through commands or sql.
- Query the age of the database or table. The database age refers to: "Latest transaction number - the oldest transaction number in the database", and the table age refers to: "Latest transaction number - the oldest transaction number in the table"
# Check the age of each librarySELECT datname, age(datfrozenxid) FROM pg_database; #1 library age sorting for each tableSELECT ::regclass as table_name, greatest(age(),age()) as age FROM pg_class c LEFT JOIN pg_class t ON = WHERE IN ('r', 'm') order by age desc; # Check the age of 1 tableselect oid::regclass,age(relfrozenxid) from pg_class where oid='schema name.table name'::regclass::oid;
Manually freeze sql to recycle tuples of a table:
vacuum freeze Table name;
Manually freeze the command to recycle all tables in a library xid:
vacuumdb -d Library name --freeze --jobs=30 -h Connection string -p Port number -U LibraryOwner
The freeze recycling process is an IO-repeated operation. The kernel will describe all pages of the table and then update the t_xmin field of the tuple that meets the requirements to 2. Therefore, this process needs to be carried out at the business peak to avoid affecting the business.
There are three kernel parameters related to freezing and recycling: vacuum_freeze_min_age, vacuum_freeze_table_age and autovacuum_freeze_max_age. Since the author does not have a deep understanding of these three parameters, he will not show off his skills here. Interested students can find information to learn about it by themselves.
Solution
Problem analysis
Based on the above principle analysis, we know that "Latest transaction number - Oldest transaction number" = 2^31-1000000, that is, when there is only one million available xid left, the kernel will prohibit instance writing and report an error: database is not accepting commands to avoid wraparound data loss in database. At this time, you must connect to "xxxx" in the prompt to freeze the table to recycle more XIDs.
void SetTransactionIdLimit(TransactionId oldest_datfrozenxid, Oid oldest_datoid) { TransactionId xidVacLimit; TransactionId xidWarnLimit; TransactionId xidStopLimit; TransactionId xidWrapLimit; TransactionId curXid; Assert(TransactionIdIsNormal(oldest_datfrozenxid)); /* * xidWrapLimit = the oldest transaction number + 0x7FFFFFFFF. Once the current transaction number reaches xidWrapLimit, a volume back will occur. */ xidWrapLimit = oldest_datfrozenxid + (MaxTransactionId >> 1); if (xidWrapLimit < FirstNormalTransactionId) xidWrapLimit += FirstNormalTransactionId; /* * Once the current transaction number reaches xidStopLimit, the instance will not be written, and 1000000 xid is reserved for vacuum * Each vacuum table needs to occupy one xid */ xidStopLimit = xidWrapLimit - 1000000; if (xidStopLimit < FirstNormalTransactionId) xidStopLimit -= FirstNormalTransactionId; /* * Once the current transaction number reaches xidWarnLimit, it will be received continuously * WARNING: database "xxxx" must be vacuumed within 2740112 transactions */ xidWarnLimit = xidStopLimit - 10000000; if (xidWarnLimit < FirstNormalTransactionId) xidWarnLimit -= FirstNormalTransactionId; /* * Force autovacuums will be triggered once the current transaction number reaches xidVacLimit */ xidVacLimit = oldest_datfrozenxid + autovacuum_freeze_max_age; if (xidVacLimit < FirstNormalTransactionId) xidVacLimit += FirstNormalTransactionId; /* Grab lock for just long enough to set the new limit values */ LWLockAcquire(XidGenLock, LW_EXCLUSIVE); ShmemVariableCache->oldestXid = oldest_datfrozenxid; ShmemVariableCache->xidVacLimit = xidVacLimit; ShmemVariableCache->xidWarnLimit = xidWarnLimit; ShmemVariableCache->xidStopLimit = xidStopLimit; ShmemVariableCache->xidWrapLimit = xidWrapLimit; ShmemVariableCache->oldestXidDB = oldest_datoid; curXid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid); LWLockRelease(XidGenLock); /* Log the info */ ereport(DEBUG1, (errmsg("transaction ID wrap limit is %u, limited by database with OID %u", xidWrapLimit, oldest_datoid))); /* * If the current transaction number >= the oldest transaction number + autovacuum_freeze_max_age * Trigger autovacuum to clean up the oldest database. If multiple databases meet the requirements, clean up in the oldest order. * Tag the current autovacuum by setting the flag bit and do it again after the autovacuum is finished. */ if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) && IsUnderPostmaster && !InRecovery) SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER); /* Give an immediate warning if past the wrap warn point */ if (TransactionIdFollowsOrEquals(curXid, xidWarnLimit) && !InRecovery) { char *oldest_datname; if (IsTransactionState()) oldest_datname = get_database_name(oldest_datoid); else oldest_datname = NULL; if (oldest_datname) ereport(WARNING, (errmsg("database \"%s\" must be vacuumed within %u transactions", oldest_datname, xidWrapLimit - curXid), errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); else ereport(WARNING, (errmsg("database with OID %u must be vacuumed within %u transactions", oldest_datoid, xidWrapLimit - curXid), errhint("To avoid a database shutdown, execute a database-wide VACUUM in that database.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); } } bool TransactionIdFollowsOrEquals(TransactionId id1, TransactionId id2) { int32 diff; if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2)) return (id1 >= id2); diff = (int32) (id1 - id2); return (diff >= 0); } FullTransactionId GetNewTransactionId(bool isSubXact) { /*** Omit ***/ full_xid = ShmemVariableCache->nextFullXid; xid = XidFromFullTransactionId(full_xid); if (TransactionIdFollowsOrEquals(xid, ShmemVariableCache->xidVacLimit)) { TransactionId xidWarnLimit = ShmemVariableCache->xidWarnLimit; TransactionId xidStopLimit = ShmemVariableCache->xidStopLimit; TransactionId xidWrapLimit = ShmemVariableCache->xidWrapLimit; Oid oldest_datoid = ShmemVariableCache->oldestXidDB; /*** Omit ***/ if (IsUnderPostmaster && TransactionIdFollowsOrEquals(xid, xidStopLimit)) { char *oldest_datname = get_database_name(oldest_datoid); /* complain even if that DB has disappeared */ if (oldest_datname) ereport(ERROR, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("database is not accepting commands to avoid wraparound data loss in database \"%s\"", oldest_datname), errhint("Stop the postmaster and vacuum that database in single-user mode.\n" "You might also need to commit or roll back old prepared transactions, or drop stale replication slots."))); /*** Omit ***/ } /*** Omit ***/ } /*** Omit ***/ }
Problem positioning
# Check the age of each librarySELECT datname, age(datfrozenxid) FROM pg_database; #1 library age sorting for each tableSELECT ::regclass as table_name, greatest(age(),age()) as age FROM pg_class c LEFT JOIN pg_class t ON = WHERE IN ('r', 'm') order by age desc; # Check the age of 1 tableselect oid::regclass,age(relfrozenxid) from pg_class where oid='schema name.table name'::regclass::oid;
Problem solving
- Through the first sql above, find the oldest database. The database age refers to: |Latest transaction number - the oldest transaction number in the database |
- Through the second sql above, find the oldest table, and then execute the table in turn: vacuum freeze table name, freeze the old transaction number in the table to recycle the table. The table age refers to: | Latest transaction number - the oldest transaction number in the table|
- Operation and maintenance scripts
Single process shell script
# Vacuum freeze on the top 50 oldest tables in the specified database for cmd in `psql -Uusername -pPort number -hConnection string -dDatabase name -c "SELECT 'vacuum freeze '||::regclass||';' as vacuum_cmd FROM pg_class c LEFT JOIN pg_class t ON = WHERE IN ('r', 'm') order by greatest(age(),age()) desc offset 50 limit 50;" | grep -v vacuum_cmd | grep -v row | grep vacuum`; do psql -U username -p port number -h connection string -d database name -c "$cmd" done
Multi-process Python scripts
from multiprocessing import Pool import psycopg2 args = dict(host='', port=5432, dbname='Database Name', user='username', password='password') def vacuum_handler(sql): sql_str = "SELECT ::regclass as table_name, greatest(age(),age()) as age FROM pg_class c LEFT JOIN pg_class t ON = WHERE IN ('r', 'm') order by age desc limit 10; " try: conn = (**args) cur = () (sql) () cur = () (sql_str) print () () except Exception as e: print str(e) # Conduct vacuum freeze on the top 1000 oldest tables in the specified database, and 32 processes execute concurrentlydef multi_vacuum(): pool = Pool(processes=32) sql_str = "SELECT 'vacuum freeze '||::regclass||';' as vacuum_cmd FROM pg_class c LEFT JOIN pg_class t ON = WHERE IN ('r', 'm') order by greatest(age(),age()) desc limit 1000;"; try: conn = (**args) cur = () (sql_str) rows = () for row in rows: cmd = row['vacuum_cmd'] pool.apply_async(vacuum_handler, (cmd, )) () () () except Exception as e: print str(e) multi_vacuum()
Friendly Tips
vacuum freeze will scan all pages of the table and update it. It is an IO-repeated operation. During the operation, you must control the concurrency number, otherwise it will be very easy to hang the instance.
Author information
Xie Guiqi (window name: Yuan Qing) joined Alibaba Cloud after graduation in 2020 and has been engaged in RDS PostgreSQL-related work, and is good at solving various online RDS PostgreSQL operation, maintenance and control problems.
Summarize
This is the end of this article about PostgreSQL transaction rewinding. For more related PostgreSQL transaction rewinding content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!