Analysis of PostgreSQL long transaction concept

We should have heard of the dangers of long-term affairs in many places, such as long-term affairs that can cause table inflation. So what is a long transaction in PostgreSQL?

First of all, there is no definition of "long transaction" in the official documents of PostgreSQL. It seems that everyone agrees that a transaction that has been executed for a long time but has not been submitted is considered a "long transaction". The definitions of long transactions are often different in different databases. So what is a long transaction in PostgreSQL?

For example, as shown below, I start a transaction through begin in a session, and then execute a simple query statement and delay committing it. Is this considered a long transaction?

bill=# begin;
BEGIN
bill=*# select 1;
?column?
----------
1
(1 row)

bill=*#

To figure out this issue, we might as well think about why we mentioned long-term affairs. This is because long transactions in pg will affect garbage collection in tables, causing the table to grow older and cannot be freeze. Will the transactions opened in our above session have any impact? Actually, it doesn't. We can observe it through the pg_stat_activity view:

bill=# select * from pg_stat_activity where pid = 26192;
-[ RECORD 1 ]----+------------------------------
datid | 16385
datname | bill
pid | 26192
leader_pid |
usesysid | 16384
usename | bill
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2022-03-02 11:49:49.433165+08
xact_start | 2022-03-02 14:34:04.494416+08
query_start | 2022-03-02 14:34:06.946754+08
state_change | 2022-03-02 14:34:06.947207+08
wait_event_type | Client
wait_event | ClientRead
state | idle in transaction
backend_xid |
backend_xmin |
query | select 1;
backend_type | client backend

The main reason why it causes table bloating is that the two fields backend_xid and backend_xmin are both empty.

/* ----------
 * LocalPgBackendStatus
 *
 * When we build the backend status array, we use LocalPgBackendStatus to be
 * able to add new values to the struct when needed without adding new fields
 * to the shared memory. It contains the backend status as a first member.
 * ----------
 */
typedef struct LocalPgBackendStatus
{
  /*
   * Local version of the backend status entry.
   */
  PgBackendStatus backendStatus;
  /*
   * The xid of the current transaction if available, InvalidTransactionId
   * if not.
   */
  TransactionId backend_xid;
  /*
   * The xmin of the current session if available, InvalidTransactionId if
   * not.
   */
  TransactionId backend_xmin;
} LocalPgBackendStatus;

backend_xid indicates a transaction that has applied for a transaction number, such as transactions such as adding, deleting, modifying, and DLL operations. backend_xid continues from the start of the transaction number to the end of the transaction.

backend_xmin represents the snapshot during SQL execution, the largest committed transaction visible.

And what is the reason for the surface expansion? When there are unfinished SQL statements or unfinished transactions holding transaction IDs in the database, garbage is generated during this transaction, or within this SQL execution time range, the garbage cannot be recycled, causing the database to bloat.

That is to judge the minimum values of backend_xid and backend_xmin in the current database. Any garbage generated by transactions exceeding this minimum value cannot be recycled.

Therefore, how should we write if we want to monitor the long-term transaction? Take a long transaction that exceeds 1 hour as an example:

select count(*) from pg_stat_activity where state <> 'idle' 
and (backend_xid is not null or backend_xmin is not null) 
and now()-xact_start > interval '3600 sec'::interval;

Therefore, for transactions, only after some DML or DDL operations are executed can it be considered as what we usually call a long transaction. Otherwise, it can only be regarded as what we often call long connections. Of course, long connections also have many disadvantages, such as occupancy of memory, CPU and other resources.

In practical applications, we should monitor long-term transactions and prevent them from happening as much as possible. For example, some batch operations may easily lead to long transactions, and we can try to arrange them to be executed during the peak period of business. At the same time, if automatic submission is turned off in our application, submission must be added after execution.

This is the end of this article about the analysis of PostgreSQL long transaction concept. For more related PostgreSQL long transaction content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!