SoFunction
Updated on 2025-04-08

Postgresql switch log implementation

os: centos 7.4

db: postgresql 9.6

Switch log:

select pg_rotate_logfile();

Switch xlog:

select pg_switch_xlog();

Supplement: Postgresql stream copy master-slave switching + resetxlog

Lord 192.168.3.46

From 192.168.3.47

Phenomenon: Since the xlog log of the 46 main library reaches more than 7T, the disk space is insufficient, so the log needs to be cleared through the pg_resetxlog command.

solve:

1. Modify postgresql parameters

checkpoint_timeout = 60min   # range 30s-1d
max_wal_size = 4GB 
min_wal_size = 1GB 
checkpoint_completion_target = 0.9
wal_keep_segments = 50

2.pg_resetxlog command clear log

Operation process:

1. Stop the host

#46
pg_ctl stop -D /data/postgresql/data

2. Improve from node

#47
pg_ctl promote -D /data/postgresql/data

After completing the above two steps, I found that I could not connect to pgpool, and there are the following prompts:

11370 Apr 10 21:04:00 usvr47 pgpool[21766]: [11-1] 2019-04-10 21:04:00: pid 21766: LOG: new connection received
11371 Apr 10 21:04:00 usvr47 pgpool[21766]: [11-2] 2019-04-10 21:04:00: pid 21766: DETAIL: connecting host= port=26224
11372 Apr 10 21:04:00 usvr47 pgpool[21766]: [12-1] 2019-04-10 21:04:00: pid 21766: LOG: failed to connect to PostgreSQL server on "192.168.3.46:5432", getsockopt() detected error "Connection refused"
11373 Apr 10 21:04:00 usvr47 pgpool[21766]: [13-1] 2019-04-10 21:04:00: pid 21766: LOG: degenerate backend request for node_id: 0 from pid [21766] is canceled because failover is disallowed on the node
11374 Apr 10 21:04:00 usvr47 pgpool[21766]: [14-1] 2019-04-10 21:04:00: pid 21766: FATAL: failed to create a backend connection
11375 Apr 10 21:04:00 usvr47 pgpool[21766]: [14-2] 2019-04-10 21:04:00: pid 21766: DETAIL: executing failover on backend

Line 3 is pgpool's health check health_check (the configuration file does not have regular checks), and it is found that the main library cannot be connected;

Line 4 is that pgool triggers failover, which will be promoted from the library to the main library (the configuration file does not have automatic failover configured, so the command pg_ctl premote to switch manually);

However, the configuration of backend_flag1 = 'DISALLOW_TO_FAILOVER' in pgpool means that failover is not allowed, so it cannot be logged in through psql -p 9999 -h 192.168.3.45 -U postgres; after changing the command to backend_flag0 = 'ALLOW_TO_FAILOVER' and reloading, you can log in.

This parameter causes pgpool to be unable to log in if the main library cannot be connected, resulting in the client being unable to connect to the library.

At this time, you can see that the status of the main library (node ​​id is 0) is down through the show pool_nodes command.

3. Clear the original master node xlog

#46
#View oid xidpg_controldata -D /data/postgresql/data
#Qing Logpg_resetxlog -o 24579 -x 58288 -f /data/postgresql/data

7T space, cleaning time is about 30 minutes.

4. Resync data

46 as 47's slave, resync data

pg_basebackup -D /data/postgresql/data -Fp -Xs -v -P -h 192.168.3.47 -p 5432 -U repl

By default, the main library checkpoint will be waiting for the main library to be synchronized. To avoid long-term waiting, checkpoint manually in the main control.

After synchronization is completed:

mv  
pg_ctl start /data/postgresql/data

At this time, the new master and slave can be established.

5. Check out pgpool

[postgres@usvr-3-46 ~]$ psql -p 9999 -h 192.168.3.45 -U postgres
Password for user postgres: 
psql (9.6.3)
Type "help" for help.
postgres=# show pool_nodes;
 node_id |  hostname  | port | status | lb_weight | role  | select_cnt | load_balance_node | replication_delay 
---------+--------------+------+--------+-----------+---------+------------+-------------------+-------------------
 0    | 192.168.3.46 | 5432 |down   | 0.500000 | standby | 66014   | false       | 72
 1    | 192.168.3.47 | 5432 | up   | 0.500000 | primary | 66559   | true       | 0
(2 rows)

At this time, it is found that 3.46 is still down in pgpool status, and the invalid node needs to be re-joined to the cluster through the pcp command.

pcp_attach_node -U postgres -h 192.168.3.45 -p 9898 -n 0

where 0 is node_id.

Notice:

The file read by the password of the pcp command is. If you forget the password, you can perform the following operations:

[root@usvr47 ~]# pg_md5 postgres
e8a48653851e28c69d0506508fb27fc5
[root@usvr47 ~]# echo "postgres:e8a48653851e28c69d0506508fb27fc5" >> 

Instead of generating a password through the command pg_md5 -m -u postgres -p, put it in the pool_passwd file. The password of this file is the password to connect to postgresql, so be sure to pay attention.

The above is personal experience. I hope you can give you a reference and I hope you can support me more. If there are any mistakes or no complete considerations, I would like to give you advice.