Postgresql backup and incremental recovery solution

Preface

The database used at work has been Postgresql, an open source database that anyone can use for commercial purposes. When using Postgresql, the most obvious feeling for me is that this database is really good. Although the installation package of the database is really small, its performance and operation are not inferior to that of other large databases in commercial industries. In addition, it feels really good to operate the database directly under the command line interface. When using databases, we as database administrators in small companies have a job that cannot be avoided, that is, data backup and recovery issues. Although PostgreSQL has a lot of things in all aspects, it does not support incremental backup in terms of database backup, which is really a pity. However, the flaws do not conceal the merits. In general, this is a good database software.

Before, we were inPostgresql master-slave asynchronous stream replication schemeIn the section 》, the master-slave asynchronous stream replication environment of Postgresql is deployed. The purpose of master-slave replication is to achieve data backup, achieve high availability and fault-tolerant data. The following is a brief introduction to the scene backup and recovery plan when we operate and maintain the Postgresql database.

Incremental backup

When PostgreSQL is writing, any modification information made to the data file will be first written to the WAL log (write-pre-log), and then physically modified the data file. When the database server is restarted, PostgreSQL will first read the WAL log and restore the data file when it is started. Therefore, theoretically speaking, if we have a basic backup of the database (also known as full backup), and combined with the WAL log, we can restore the database to any point in time.

The above knowledge points are very important, because to put it bluntly, the incremental backup of our scenario is redone and restored through basic backup + incremental WAL logs.

Incremental backup settings

To demonstrate the relevant functions, we are based onPostgresql master-slave asynchronous stream replication schemeCreate relevant management directory on the environment pghost1 server in the section 》

Switch to postgres user

mkdir -p /data/pg10/backups
mkdir -p /data/pg10/archive_wals

The backups directory can be used to store basic backups

The archive_wals directory is naturally used to store archives

Next, we modify the relevant settings of our files

wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'

The default value of the archive_command parameter is an empty string, and its value can be a shell command or a complex shell script.

In the shell command or script of archive_command, %p can be used to represent the file name of the WAL file to be archived with full path information, and %f can be used to represent the file name of the WAL file that does not contain path information.

Modifying wal_level and archive_mode parameters requires restarting the database before it can take effect. Modifying archive_command does not require restarting, just reload, for example:

postgres=# SELECT pg_reload_conf();

postgres=# show archive_command ;

Create a basic backup

We use the pg_basebackup command introduced before to create basic backups. Basic backup is very important, and our data recovery cannot be without it. It is recommended that we generate our basic backup periodically based on relevant business strategies.

$ pg_basebackup -Ft -Pv -Xf -z -Z5 -p 25432 -D /data/pg10/backups/

In this way, we successfully generate our basic data backup

Set restore point

Generally, we need to create a restore point based on when important events occur, and restore to the state before the event occurs through basic backup and archive.

The system function to create a restore point is: pg_create_restore_point, which is defined as follows:

postgres=# SELECT pg_create_restore_point('domac-201810141800');

Restore to the specified restore point

Next, we use an example to restore our data to the restore point we set.

First, we create a test table:

CREATE TABLE test_restore(
 id SERIAL PRIMARY KEY,
 ival INT NOT NULL DEFAULT 0,
 description TEXT,
 created_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

Initialize some test data as the basic data, as shown below:

postgres=# INSERT INTO test_restore (ival) VALUES (1);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (2);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (3);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (4);
INSERT 0 1

postgres=# select * from test_restore;
 id | ival | description |   created_time
----+------+-------------+-------------------------------
 1 | 1 |    | 2018-10-14 11:13:41.57154+00
 2 | 2 |    | 2018-10-14 11:13:44.250221+00
 3 | 3 |    | 2018-10-14 11:13:46.311291+00
 4 | 4 |    | 2018-10-14 11:13:48.820479+00
(4 rows)

And create a basic backup as described above. If it is a test, one thing you need to note is that since the WAL file is only archived if it is written 16MB, there may be very little writing in the test stage. You can manually switch WAL after performing the basic backup. For example:

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/1D01B858
(1 row)

Or by setting the archive_timeout parameter, force switch to the new WAL segment when the timeout threshold is reached.

Next, create a restore point like this:

postgres=# select pg_create_restore_point('domac-1014');
 pg_create_restore_point
-------------------------
 0/1E0001A8
(1 row)

Next we make some changes to the data, and we delete all data in test_restore:

postgres=# delete from test_restore;
DELETE 4

The following is an experiment that restores to the restore point named "domac-1014", as shown below:

Stop the database

$ pg_ctl stop -D /data/pg10/db

Remove old data directory

$ rm -rf /data/pg10/db

$ mkdir db && chmod 0700 db

$ tar -xvf /data/pg10/backups/ -C /data/pg10/db

cp $PGHOME/share/ /pgdata/10/data/

chmod 0600 /pgdata/10/data/

Modify , modify the following configuration information:

restore_command = '/usr/bin/lz4 -d /data/pg10/archive_wals/%f.lz4 %p'
recovery_target_name = 'domac-1014

Then start the database and enter the recovery state and observe the log as shown below:

bash-4.2$ pg_ctl start -D /data/pg10/db
waiting for server to start....2018-10-14 11:26:56.949 UTC [8397] LOG: listening on IPv4 address "0.0.0.0", port 25432
2018-10-14 11:26:56.949 UTC [8397] LOG: listening on IPv6 address "::", port 25432
2018-10-14 11:26:56.952 UTC [8397] LOG: listening on Unix socket "/tmp/..25432"
2018-10-14 11:26:56.968 UTC [8398] LOG: database system was interrupted; last known up at 2018-10-14 09:26:59 UTC
2018-10-14 11:26:57.049 UTC [8398] LOG: starting point-in-time recovery to "domac-1014"
/data/pg10/archive_wals/.lz4: No such file or directory
2018-10-14 11:26:57.052 UTC [8398] LOG: restored log file "" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.077 UTC [8398] LOG: restored log file "000000020000000000000016" from archive
2018-10-14 11:26:57.191 UTC [8398] LOG: redo starts at 0/16000060
2018-10-14 11:26:57.193 UTC [8398] LOG: consistent recovery state reached at 0/16000130
2018-10-14 11:26:57.193 UTC [8397] LOG: database system is ready to accept read only connections
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.217 UTC [8398] LOG: restored log file "000000020000000000000017" from archive
 done
server started
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.384 UTC [8398] LOG: restored log file "000000020000000000000018" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.513 UTC [8398] LOG: restored log file "000000020000000000000019" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.699 UTC [8398] LOG: restored log file "00000002000000000000001A" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.805 UTC [8398] LOG: restored log file "00000002000000000000001B" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.982 UTC [8398] LOG: restored log file "00000002000000000000001C" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.116 UTC [8398] LOG: restored log file "00000002000000000000001D" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.310 UTC [8398] LOG: restored log file "00000002000000000000001E" from archive
2018-10-14 11:26:58.379 UTC [8398] LOG: recovery stopping at restore point "domac-1014", time 2018-10-14 11:17:20.680941+00
2018-10-14 11:26:58.379 UTC [8398] LOG: recovery has paused
2018-10-14 11:26:58.379 UTC [8398] HINT: Execute pg_wal_replay_resume() to continue.

After restarting, we query the test_restore table to see if the data is recovered normally:

postgres=# select * from test_restore;
 id | ival | description |   created_time
----+------+-------------+-------------------------------
 1 | 1 |    | 2018-10-14 11:13:41.57154+00
 2 | 2 |    | 2018-10-14 11:13:44.250221+00
 3 | 3 |    | 2018-10-14 11:13:46.311291+00
 4 | 4 |    | 2018-10-14 11:13:48.820479+00
(4 rows)

You can see that the data has been restored to the specified restore point: domac-1014.

At this time, it can be removed to avoid the next time the data is restarted and the data is restored to the restore point again

Summarize

Backup and recovery are very important tasks in database management. In daily operations and maintenance, we need to backup relevant policies as needed and conduct periodic recovery tests to ensure the security of the data.

Okay, the above is the entire content of this article. I hope that the content of this article has a certain reference value for everyone's study or work. If you have any questions, you can leave a message to communicate. Thank you for your support.