PostgreSQL streaming replication with HA master-standby switching
Environment description and host planning
Operating System | Host Name | IP Address | Role | Port
---|---|---|---|---
CentOS 7 | master | 10.0.0.11 | PG-Master | 54321
CentOS 7 | slave | 10.0.0.12 | PG-Slave | 54321
CentOS 7 | pool | 10.0.0.13 | pgpool | 54321
Basic environment configuration (all host operations)
Configure HOSTS
echo -e "10.0.0.11 master\n10.0.0.12 slave\n10.0.0.13 pool" >> /etc/hosts # Execute once
Configure time synchronization (skip if already configured)
yum install -y ntpdate && ntpdate <ntp_server>   # <ntp_server>: substitute your NTP server; the original address was lost
echo -e "# sync time\n5 * * * * /usr/sbin/ntpdate <ntp_server> > /dev/null 2>&1" >> /var/spool/cron/root   # Write the cron job; execute once
Create postgres user
useradd postgres && echo "your_password" | passwd --stdin postgres
Configure passwordless SSH login
su - postgres
ssh-keygen -t rsa -f /home/postgres/.ssh/id_rsa -P ""
cd ~/.ssh/
ssh-copy-id postgres@master                  # Execute on all three hosts
scp authorized_keys postgres@slave:~/.ssh    # Execute only on the master host
scp authorized_keys postgres@pool:~/.ssh     # Execute only on the master host
Install Postgresql database (PG9.6)
yum install -y https://download.postgresql.org/pub/repos/yum/9.6/redhat/rhel-7-x86_64/pgdg-centos96-9.6-3.noarch.rpm
yum install -y postgresql96-server postgresql96-contrib postgresql96 postgresql96-libs
Create a unified directory structure
mkdir /data1/pg_{data,bin,logs} -p
chown -R postgres:postgres /data1/
Modify system variables
vi /etc/profile
# Add the following content:
export PGHOME=/usr/pgsql-9.6/
export PGDATA=/data1/pg_data
export PGPORT=54321
export PATH=$PATH:$PGHOME/bin
# Make it effective:
source /etc/profile
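Before continuing, a quick sanity check can confirm the variables resolve as expected (a minimal sketch; it simply mirrors the exports above):

```shell
# Mirror the exports from /etc/profile and confirm they resolve
export PGHOME=/usr/pgsql-9.6/
export PGDATA=/data1/pg_data
export PGPORT=54321
export PATH=$PATH:$PGHOME/bin
echo "PGHOME=$PGHOME PGDATA=$PGDATA PGPORT=$PGPORT"
```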
PostgreSQL stream replication structure (master and slave host operations)
Master Host Operation
Initialize the system
/usr/pgsql-9.6/bin/postgresql96-setup initdb
vi /usr/lib/systemd/system/postgresql-9.6.service
Modify postgresql-9.6.service; the content is as follows:
# Include the default config:
.include /usr/lib/systemd/system/postgresql-9.6.service
[Service]
Environment=PGDATA=/data1/pg_data
Restart PG service
systemctl daemon-reload
su - postgres -c '/usr/pgsql-9.6/bin/initdb -D /data1/pg_data'
systemctl restart postgresql-9.6
systemctl enable postgresql-9.6
Modify the system configuration (the following is the postgres user operation)
cp /data1/pg_data/pg_hba.conf{,.bak}
cat > /data1/pg_data/pg_hba.conf <<EOF
local   all         all                         trust
host    all         all         10.0.0.11/32    trust
host    all         all         10.0.0.12/32    trust
host    all         all         0.0.0.0/0       md5
host    all         all         ::1/128         trust
host    replication stream_replication 0.0.0.0/0 md5
EOF
# The last entry is for the streaming replication user
Configuration for a host with 64 GB of memory
cp /data1/pg_data/postgresql.conf{,.bak}
cat > /data1/pg_data/postgresql.conf <<EOF
listen_addresses = '*'
port = 54321
max_connections = 256
shared_buffers = 16GB
effective_cache_size = 48GB
work_mem = 64MB
maintenance_work_mem = 2GB
min_wal_size = 2GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
wal_level = hot_standby
wal_log_hints = on
max_wal_senders = 1
hot_standby = on
logging_collector = on
log_directory = 'pg_log'
EOF
# Remember to restart after the change: pg_ctl restart
Configuration for a host with 128 GB of memory
listen_addresses = '*'
port = 54321
max_connections = 256
shared_buffers = 32GB
effective_cache_size = 96GB
work_mem = 128MB
maintenance_work_mem = 2GB
min_wal_size = 2GB
max_wal_size = 4GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
wal_level = hot_standby
wal_log_hints = on
max_wal_senders = 1
hot_standby = on
logging_collector = on
log_directory = 'pg_log'
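Both profiles follow the common rule of thumb of roughly 25% of RAM for shared_buffers and 75% for effective_cache_size (an observation about the numbers above, not a statement from the original article). A quick sketch of the arithmetic:

```shell
# ~25% of RAM for shared_buffers, ~75% for effective_cache_size
for ram_gb in 64 128; do
  echo "${ram_gb}GB host: shared_buffers=$((ram_gb / 4))GB effective_cache_size=$((ram_gb * 3 / 4))GB"
done
```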
Create the streaming replication user (stream_replication) and the PGPool health-check user (srcheck) on the primary
CREATE USER stream_replication replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';
CREATE USER srcheck replication LOGIN CONNECTION LIMIT 5 ENCRYPTED PASSWORD 'your_password';
Make sure the primary's pg_hba.conf contains the streaming replication entry (already included in the cat > pg_hba.conf block above):
host replication stream_replication 0.0.0.0/0 md5
Slave Host Operation
Initialize the system
/usr/pgsql-9.6/bin/postgresql96-setup initdb
vi /usr/lib/systemd/system/postgresql-9.6.service
Modify postgresql-9.6.service; the content is as follows:
# Include the default config:
.include /usr/lib/systemd/system/postgresql-9.6.service
[Service]
Environment=PGDATA=/data1/pg_data
Restart PG service
systemctl daemon-reload
Copy the basic backup to the backup server
rm -rf /data1/pg_data   # Safe only if there is no important data here; the directory will be rebuilt from the primary
su - postgres -c 'pg_basebackup -D $PGDATA --format=p -h master -p 54321 -U stream_replication -W'
Modify the standby configuration
cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf
vi $PGDATA/recovery.conf
Add the following content:
standby_mode = 'on'
primary_conninfo = 'host=master port=54321 user=stream_replication password=your_password'
restore_command = ''
recovery_target_timeline = 'latest'
# Restart the PG service
systemctl restart postgresql-9.6
systemctl enable postgresql-9.6
Verification
Master node execution
create table test (id int4, create_time timestamp(0) without time zone);
insert into test values (1, now());
select * from test;
Standby node execution
select * from test;
Other queries
Connect to the test database and run the command below: it returns f on the primary and t on the standby.
select pg_is_in_recovery();
Run the following command to view the transaction snapshot on both the primary and the standby; each additional write on the primary advances the value by 1.
select txid_current_snapshot();
Run the following command to view the main and backup synchronization status.
select * from pg_stat_replication;
The state field shows the synchronization status: startup (connecting), catchup (catching up), or streaming (streaming). The sync_state field shows the mode: async (asynchronous), sync (synchronous), or potential (currently asynchronous, but able to be promoted to synchronous).
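A small hypothetical helper that interprets such a state/sync_state pair; the sample value is hard-coded here for illustration, whereas in practice it would come from something like `psql -At -c "select state, sync_state from pg_stat_replication"`:

```shell
# Interpret the state/sync_state pair from pg_stat_replication (sample hard-coded)
sample="streaming|async"
state=${sample%%|*}
sync_state=${sample##*|}
case $state in
  startup)   echo "standby is connecting" ;;
  catchup)   echo "standby is catching up" ;;
  streaming) echo "standby is streaming ($sync_state mode)" ;;   # → standby is streaming (async mode)
esac
```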
Main and backup switch
Suppose the primary crashes: how does the standby switch from read-only to read-write? Set hot_standby to off on the standby, delete recovery.conf, and restart the instance; it can then serve read-write traffic.
PGPool2 (pool host operation)
Install PGPool2
yum install -y http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm
yum -y install pgpool-II-pg96 pgpool-II-pg96-debuginfo pgpool-II-pg96-devel pgpool-II-pg96-extensions
systemctl enable pgpool   # Enable automatic startup
Add Pgpool-II to run the user
useradd postgres   # Skip if the user already exists
chown -R postgres:postgres /etc/pgpool-II
chown -R postgres:postgres /var/run/pgpool/
Configure pool_hba.conf
cp /etc/pgpool-II/pool_hba.conf{,.bak}
vi /etc/pgpool-II/pool_hba.conf
Add content
host all all 0.0.0.0/0 md5
Configure pool_passwd
Execute after logging in:
postgres=# select rolname, rolpassword from pg_authid;
      rolname       |             rolpassword
--------------------+-------------------------------------
 pg_signal_backend  |
 srcheck            | md5662c10f61b27a9ab38ce69157186b25f
 postgres           | md5d3612d57ee8d4c147cf27b11e3a0974d
 stream_replication | md59279ef6b904bc483e4f85e6d44cfc0ed
(4 rows)
vi /etc/pgpool-II/pool_passwd
Add entries from the SQL results, one per line in the form $rolname:$rolpassword. For example:
srcheck:md5662c10f61b27a9ab38ce69157186b25f
or:
pg_md5 -u postgres your_password
vi /etc/pgpool-II/pcp.conf   # Add a line: postgres:<output of the previous command>
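For reference, each md5 value above is "md5" followed by the MD5 of the password concatenated with the username (this is PostgreSQL's stored md5-auth format). A local sketch of how such a pool_passwd entry is derived, using the article's placeholder credentials:

```shell
# Derive a pool_passwd-style entry locally: stored value = "md5" + md5(password || username).
# srcheck / your_password are the placeholder credentials used throughout this article.
user=srcheck
password=your_password
hash=$(printf '%s%s' "$password" "$user" | md5sum | awk '{print $1}')
echo "$user:md5$hash"
```

This is the same format `pg_md5 -u srcheck your_password` would produce.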
Configure pgpool.conf
cp /etc/pgpool-II/pgpool.conf{,.bak}
vi /etc/pgpool-II/pgpool.conf
The content is as follows:
# CONNECTIONS
listen_addresses = '*'
port = 54321
socket_dir = '/var/run/pgpool'
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/pgpool'
# - Backend Connection Settings -
backend_hostname0 = 'master'
backend_port0 = 54321
backend_weight0 = 1
backend_data_directory0 = '/data1/pg_data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'slave'
backend_port1 = 54321
backend_weight1 = 1
backend_data_directory1 = '/data1/pg_data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
# - Authentication -
enable_pool_hba = on
pool_passwd = 'pool_passwd'
# FILE LOCATIONS
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/data1/pg_logs'
replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'srcheck'
sr_check_password = '123456'
sr_check_database = 'postgres'
# HEALTH CHECK
health_check_period = 10
health_check_timeout = 20
health_check_user = 'srcheck'
health_check_password = '123456'
health_check_database = 'postgres'
# FAILOVER AND FAILBACK
failover_command = '/data1/pg_bin/failover_stream.sh %H'
failover_stream.sh script
vim /data1/pg_bin/failover_stream.sh
chmod 777 /data1/pg_bin/failover_stream.sh
chmod u+s /sbin/ifconfig
chmod u+s /usr/sbin
pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &   # Start pgpool
pgpool -m fast stop                                   # Stop pgpool
failover_stream.sh content:
#!/bin/sh
# Failover command for streaming replication.
# Arguments: $1: new master hostname.
new_master=$1
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"
# Promote the standby database.
/usr/bin/ssh -T $new_master $trigger_command
exit 0
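To see what the failover script does without touching a live cluster, here is a dry-run sketch in which `ssh` is stubbed out by a local shell function (the stub and the hard-coded values are purely illustrative, not part of the article's setup):

```shell
# Dry-run of the failover command construction; `ssh` is stubbed locally so
# nothing is executed remotely.
PGHOME=/usr/pgsql-9.6
PGDATA=/data1/pg_data
new_master=slave
trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA"
ssh() { host=$1; shift; echo "would run on $host: $*"; }   # illustrative stub
ssh "$new_master" $trigger_command
# → would run on slave: /usr/pgsql-9.6/bin/pg_ctl promote -D /data1/pg_data
```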
Login settings
Once pgpool is running (pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &), you can view the cluster status:
[postgres@pool pgpool-II]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------
       0 | master   | 54321 | up     | 0.500000  | primary | 0          | false             | 0
       1 | slave    | 54321 | up     | 0.500000  | standby | 0          | true              | 0
(2 rows)
If a node is missing from the cluster status, register the master and slave nodes from the pool host:
[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0
[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 1
# For option details, consult the pcp_attach_node command
HA switch
Simulate master host downtime
Master side:
[postgres@master ~]$ pg_ctl stop
waiting for server to shut down.... done
server stopped
Current cluster status
[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
psql (9.6.1)
Type "help" for help.
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
       0 | master   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
       1 | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)
Note that the master is now reported in the standby role with status down.
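A small hypothetical helper (the sample output is hard-coded for illustration; in practice it would come from `show pool_nodes`) that counts how many backends are down:

```shell
# Count "down" backends in `show pool_nodes` output; the status is the 4th
# pipe-separated column. Sample output is hard-coded for illustration.
pool_nodes=' 0 | master | 5432 | down | 0.500000 | standby
 1 | slave  | 5432 | up   | 0.500000 | primary'
down_count=$(printf '%s\n' "$pool_nodes" | awk -F'|' '{gsub(/ /,"",$4); if ($4 == "down") n++} END {print n+0}')
echo "down nodes: $down_count"   # → down nodes: 1
```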
Modify the master and start
After the master host goes down, the slave's PG instance becomes the primary. To bring the old master back, simply configure it as a standby of slave:
[postgres@master ~]$ vim $PGDATA/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=slave port=54321 user=stream_replication password=your_password'
restore_command = ''
recovery_target_timeline = 'latest'
Synchronize the timeline
# If a timeline conflict is reported, stop the PG service first and then synchronize the
# timeline; if there is no conflict, simply check the status.
[postgres@master ~]$ pg_rewind --target-pgdata=/data1/pg_data --source-server='host=slave port=54321 user=postgres dbname=postgres'
servers diverged at WAL position 0/5000098 on timeline 1
rewinding from last common checkpoint at 0/5000028 on timeline 1
Done!
# Restart the database
[postgres@master ~]$ pg_ctl start
Check the current status again
[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
       0 | master   | 5432 | down   | 0.500000  | standby | 0          | false             | 0
       1 | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)
# Although the master has been restarted, it is still shown as down: it must be re-attached
# to pgpool manually. The master's node_id is 0, hence -n 0.
[postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0
# When prompted, enter the pcp management password
# Check the status again
[postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres
postgres=# show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------
       0 | master   | 5432 | up     | 0.500000  | standby | 0          | false             | 0
       1 | slave    | 5432 | up     | 0.500000  | primary | 0          | true              | 0
(2 rows)
Now both nodes are up.
Master-slave two node pgpool health check script (pgpool_check.sh)
Note: this script assumes pgpool is installed only on the master and slave hosts. Once a pgpool process is running on the master host, execute sh pgpool_check.sh & on the slave host.
#!/bin/bash
# Check the master host's pgpool process
while true
do
    pgcount=$(nmap 10.0.0.11 | egrep '9898|9999' | wc -l)
    if [ $pgcount -eq 2 ]; then
        echo 'Master host pgpool is GOOD!!!' > /dev/null 2>&1
    else
        echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
        echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
        echo -e "Master host pgpool is \033[31m BAD!!! \033[0m"
        echo -e "SYSTEM WILL DO THE SHELL : \033[34m su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &' \033[0m"
        su - postgres -c "pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &"
        pgport=$(netstat -lntup | egrep '9898|9999' | wc -l)
        [ $pgport -gt 0 ] && echo -e "Slave host pgpool is \033[32m RUNNING!!! \033[0m"
        exit 0
    fi
done
The above is based on my personal experience; I hope it serves as a useful reference. If there are any mistakes or points I have not fully considered, corrections and advice are welcome.