pgpool replication and load balancing operations

Introduction

pgpool-II is a middleware located between the postgresql database and the client. It can provide the following functions:

1. Connection pool

pgpool-II keeps connections that have been connected to the postgresql database and reuses them when connecting in with the same parameters, reducing connection overhead and increasing the overall system throughput.

2. Copy

pgpool-II can manage multiple postgresql databases, activate the replication function and enable a real-time backup in 2 or more postgresql nodes to be established as possible, so that if one of the nodes fails, the service can not be continued by the terminal.

3. Load balancing

If the database is replicated, executing a select query in any server will return the same result. pgpool-II utilizes the replication function to reduce the load on each postgresql server. It enhances the overall throughput of the system by querying all available servers by distributing select. Under ideal conditions, the read performance should be proportional to the number of postgresql. The load balancing function works best in scenarios where a large number of users perform many read queries at the same time.

4. Limit connections that exceed the limit

Postgresql will limit the current maximum number of connections, and when this number is reached, the new connection will be rejected. Increasing this number of connections will increase resource consumption and have a certain negative impact on the global performance of the system. pgpool-II also supports limiting the maximum number of connections, but her approach is to put the connection into the queue instead of returning an error immediately.

In addition, pgpool-II also supports parallel query, and the data is divided into multiple servers, so a query can be executed simultaneously on multiple servers to reduce the overall execution time. Parallel queries are very effective when querying large-scale data.

Test environment

IP	application	Version
10.10.10.56	postgresql	9.2.15
10.10.10.57	postgresql	9.2.15
10.10.10.56	pgpool-II	3.5

Environment description: Two postgresqls, pgpool and one postgresql are on a server.

Test requirements: Test pgpool to implement postgresql replication and load balancing

Install pgpool

1. Source code installation

wget /?f=pgpool-II-3.5.
tar -zxvf pgpool-II-3.5. 
cd pgpool-II-3.5.3
./configure --prefix=/usr/local/pgpool --with-openssl
#Error configuration: error: libpq is not installed or libpq is old#Installing yum install postgresql-develmake &amp;&amp; make install

2. Install pgpool-regclass

Using PostgreSQL 8.0 to PostgreSQL 9.3, it is highly recommended to install the pgpool_regclass function in PostgreSQL that needs access because it is used internally by pgpool-II. If you don't do this, there will be problems with handling the same table names in different schemas (no temporary tables); while PostgreSQL 9.4 or later does not require it

cd pgpool-II-3.5.3/src/sql/pgpool-regclass
make &amp;&amp; make install

-bash-4.2$ psql -f  template1
or
psql template1
=# CREATE EXTENSION pgpool_regclass;

It should be executed in each database accessed through pgpool-II or CREATE EXTENSION. You don't need to do this in the database you created after you execute "psql -f template1" or CREATE EXTENSION, because this template database will be cloned into a newly created database.

3. Create insert_lock table

If you use insert_lock in replication mode, it is highly recommended to create the pgpool_catalog.insert_lock table for mutual exclusion. Until now, insert_lock can still work. However, in this case, pgpool-II needs to lock the inserted target table. This behavior is similar to the pgpool-II 2.2 and 2.3 series. Because the table lock conflicts with VACUUM, the INSERT operation may therefore wait for a long time.

cd pgpool-II-3.5.3/src/sql/
psql -f insert_lock.sql tempalte1

insert_lock.sql should be executed in each database accessed through pgpool-II. You don't need to do this in the database you created after you execute "psql -f insert_lock.sql template1", because this template database will be cloned into a newly created database.

4. Install pgpool_recovery

If you use online reply, you need some functions: pgpool_recovery, pgpool_remote_start, pgpool_switch_xlog.

In addition, the included tool pgpoolAdmin controls pgpool-II to start and stop and reconnect to the PostgreSQL nodes on the backend. It requires the function pgpool_pgctl; and pgpoolAdmin requires the function pgpool_pgctl to stop/restart/reload PostgreSQL parameters.

Four functions can be installed using the same method as installing pgpool_regclass. However, unlike pgpool_regclass, you only need to install these functions in the template1 database.

cd pgpool-II-3.5.3/src/sql/pgpool-recovery
make &amp;&amp; make install
psql -f  template1
or
psql template1
=# CREATE EXTENSION pgpool_recovery

Configure pgpool

1. Configuration

pgpool provides a pcp interface, which can view and manage the status of pgpool, and can remotely operate pgpool, which is used to authenticate pcp-related commands.

cd /usr/local/pgpool
cp etc/ etc/
bin/pg_md5 -m -u pgpool -p
Will be generated automaticallypool_passwd
postgres:md533c14731dfdfsdfsdf91e8d10c4bff5
cat etc/
pgpool:md533c14731dfdfsdfsdf91e8d10c4bff5

2. Configure pool_hba.conf

Relevant permission settings can be performed through pool_hba.conf, similar to postgresql's pg_hba.conf

cd /usr/local/pgpool
cp etc/pool_hba. etc/pool_hba.conf
bin/pg_md5 -m -u postgres -p
Will be generated automaticallypool_passwd
postgres:md533c14731dfdfsdfsdf91e8d10c4bff5

**Notice:

1. The encryption method in pool_hba.conf must be the same as that in pg_hba.conf, both of which are mds, trust or other methods. If it is different, an error will be reported.

2. The user in pool_hba.conf must exist in the postgresql database

3. Configuration

cd /usr/local/pgpool
cp etc/ etc/
#Create related directoriesmkdir -p /var/run/pgpool
mkdir -p /usr/local/pgpool/logs
vim etc/
listen_addresses = '*'
port = 9999
socket_dir = '/tmp'
listen_backlog_multiplier = 2
serialize_accept = off
pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/tmp'
backend_hostname0 = '10.10.10.56'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data/pgsql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '10.10.10.57'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data/pgsql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = on
pool_passwd = 'pool_passwd'
debug_level = 0
pid_file_name = '/var/run/pgpool/'
logdir = '/usr/local/pgpool/logs'
replication_mode = on
load_balance_mode = on
master_slave_mode = off

in:

replication_mode = on replication mode is turned on, and real-time backup can be performed on two postgresql databases

load_balance_mode = on load balancing mode is turned on, load balancing can be achieved

3. Start pgpool

/usr/local/pgpool/bin/pgpool

4. Connect to pgpool

psql -U postgres -p 9999 -h 10.10.10.56
Password for user postgres: 
psql (9.2.15)
Type "help" for help.
postgres=#

test

1. Create a benchmark_replication database

psql -U postgres -p 9999 -h 10.10.10.56
Password for user postgres: 
psql (9.2.15)
Type "help" for help.
postgres=#create database bench_replication;

We can log in to postgresql on 56 and 57 to view it separately

psql -U postgres -p 5432 -h 10.10.10.56
Password for user postgres: 
psql (9.2.15)
Type "help" for help.

postgres=# \list
          List of databases
  Name  | Owner | Encoding | Collate | Ctype | Access privileges 
-------------------+----------+----------+-------------+-------------+-----------------------
 bench_replication | postgres | UTF8  | en_US.UTF-8 | en_US.UTF-8 | 
 postgres   | postgres | UTF8  | en_US.UTF-8 | en_US.UTF-8 | 
 template0   | postgres | UTF8  | en_US.UTF-8 | en_US.UTF-8 | =c/postgres   +
     |   |   |    |    | postgres=CTc/postgres
 template1   | postgres | UTF8  | en_US.UTF-8 | en_US.UTF-8 | =c/postgres   +
     |   |   |    |    | postgres=CTc/postgres
(4 rows)

You can see that both postgresqls have created a benchmark_replication database, so the replication of pgpool is successful.

test

#Installing pgbenchyum install postgresql-contrib
pgbench -i -h 10.10.10.56 -U postgres -p 9999 bench_replication

In all pgsql nodes, the tables and data listed below are created, indicating that the replication is running normally.

Table name	Number of rows
branches	1
tellers	10
accounts	100000
history	0

Summarize

The above just briefly introduces the replication and load balancing of pgpool-II as a starter, and there are still many functions that we have not introduced, so we need to continue to explore.

Supplement: Summary of pitfalls encountered during use of pgpool

1. The copy mode has low reliability

The earliest use of replication mode, the data was entered into pgpool and then pgpool was written into n postgres respectively. It was found that data inconsistency often occurred, resulting in only one database in the end.

2，online recovery

PIRT-based online recovery configuration complex

3. Master and standby mode based on stream replication

This uses the new feature of postgres9. The early configuration tests are very easy, and failover is also very useful. However, when the service connects to pgpool, transactions often report errors. Postgres error: failed to read kind from backend. I mentioned this in my previous article, but it has not been solved yet.

4. Disturbance of connection number

num_init_children was originally understood as the size of a pool. If it exceeds it, it will automatically amplify, but in fact it is often not enough. To be precise, this value is also the maximum number of concurrent connections initiated from the client supported by pgpool-II.

Therefore, try to match this value as large as possible, and you must restart pgpool for changes to this value.

5. Client_idle_limit does not configure

When a client is idle to client_idle_limit seconds after executing the last query, the connection to this client will be disconnected. The connection should not be disconnected by pgpool, but the application should actively disconnect. If pgpool is disconnected, the client will be unavailable.

Of course, pgpool also has an advantage, it can quickly find connected applications. Because each connection is a separate process, after starting, num_init_children processes can accept connections.

Use # ps -ef |grep pgpool to see

pgpool: wait for connection request The process is an empty process, waiting for connection.

pgpool: postgres dbtest 10.115.53.167(51883) idle These processes are processes in use, and you can see which machine it comes from, what user it is, and what database it is connected to.

Of course, you can also check the connection status using select * from pg_stat_activity

The above is personal experience. I hope you can give you a reference and I hope you can support me more. If there are any mistakes or no complete considerations, I would like to give you advice.