In Oracle, if you drop a table or delete table records by mistake, the flashback feature lets you recover the data perfectly without downtime; there are also other recovery tools such as ODU and GDUL that can retrieve all the data. PostgreSQL currently has no flashback feature, so how do you restore accidentally deleted data without shutting down the database? Fortunately, it can be done if you have a complete hot backup.
The method described in this article uses the hot backup to recover the data on another server and then import it into the production environment, so database operations are not affected. (The same approach also works for Oracle recovery.) Several conditions must be met:
1. You have a complete backup of the base data files and the archived WAL files. This is why backups are so important.
2. You have another server with the same version of the PostgreSQL software installed.
Example walkthrough
The process simulates accidentally dropping the table tbl_lottu_drop, followed by further DML/DDL operations to show that the production database keeps working normally. On another server, a database-level PITR recovery is performed to recover the data of table tbl_lottu_drop.
1. Create a valid backup
Postgres201: online database server
Postgres202: recovery server

postgres=# select pg_start_backup(now()::text);
 pg_start_backup
-----------------
 0/F000060
(1 row)

[postgres@Postgres201 ~]$ rsync -acvz -L --exclude "pg_xlog" --exclude "pg_log" $PGDATA /data/backup/20180428

postgres=# select pg_stop_backup();
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------
 0/F000168
(1 row)
2. Simulate the erroneous operation
2.1 Create the table to be recovered, tbl_lottu_drop, and insert 1000 records. Also make sure the data is flushed from the buffer cache to disk.
lottu=> create table tbl_lottu_drop (id int);
CREATE TABLE
lottu=> insert into tbl_lottu_drop select generate_series(1,1000);
INSERT 0 1000
lottu=> \c lottu postgres
You are now connected to database "lottu" as user "postgres".
2.2 Record a timestamp to use for the subsequent database-level PITR recovery. (In a real incident you would only remember an approximate time, which is often inaccurate and may be after the misoperation. The extension section below describes how to determine the exact time point to restore to.)
lottu=# select now();
              now
-------------------------------
 2018-04-28 20:47:31.617808+08
(1 row)
lottu=# checkpoint;
CHECKPOINT
lottu=# select pg_xlogfile_name(pg_switch_xlog());
     pg_xlogfile_name
--------------------------
 000000030000000000000010
(1 row)
2.3 Drop the table
lottu=# drop table tbl_lottu_drop;
DROP TABLE
2.4 Perform follow-up DML/DDL operations; this shows that the production database is still working normally.
lottu=# create table tbl_lottu_log (id int);
CREATE TABLE
lottu=# insert into tbl_lottu_log values (1),(2);
INSERT 0 2
lottu=# checkpoint;
CHECKPOINT
lottu=# select pg_xlogfile_name(pg_switch_xlog());
     pg_xlogfile_name
--------------------------
 000000030000000000000011
(1 row)
3. Recovery operation
3.1 Copy the backup to the Postgres202 server
[postgres@Postgres201 20180428]$ cd /data/backup/20180428
[postgres@Postgres201 20180428]$ ll
total 4
drwx------. 18 postgres postgres 4096 Apr 28 20:42 data
[postgres@Postgres201 20180428]$ rsync -acvz -L data [email protected]:/data/postgres
3.2 Delete unnecessary files
[postgres@Postgres202 data]$ cd $PGDATA
[postgres@Postgres202 data]$ rm backup_label.old tablespace_map.old
3.3 Restore the tablespace soft link from the backup
[postgres@Postgres202 data]$ cat tablespace_map
16385 /data/pg_data/lottu
[postgres@Postgres202 data]$ mkdir -p /data/pg_data
[postgres@Postgres202 data]$ cd pg_tblspc/
[postgres@Postgres202 pg_tblspc]$ mv 16385/ /data/pg_data/lottu
[postgres@Postgres202 pg_tblspc]$ ln -s /data/pg_data/lottu ./16385
[postgres@Postgres202 pg_tblspc]$ ll
total 0
lrwxrwxrwx. 1 postgres postgres 19 Apr 28 23:12 16385 -> /data/pg_data/lottu
3.4 Copy the WAL logs to the pg_xlog directory on Postgres202. Which log file do we start copying from?
[postgres@Postgres202 data]$ mkdir -p pg_xlog/archive_status
[postgres@Postgres202 data]$ cat backup_label
START WAL LOCATION: 0/F000060 (file 00000003000000000000000F)
CHECKPOINT LOCATION: 0/F000098
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2018-04-28 20:42:15 CST
LABEL: 2018-04-28 20:42:13.244358+08
From backup_label we know we need everything from 00000003000000000000000F up to the WAL segment currently being written.
[postgres@Postgres202 pg_xlog]$ ll
total 65540
-rw-------. 1 postgres postgres 16777216 Apr 28 20:42 00000003000000000000000F
-rw-------. 1 postgres postgres      313 Apr 28 20:42 00000003000000000000000F.
-rw-------. 1 postgres postgres 16777216 Apr 28 20:48 000000030000000000000010
-rw-------. 1 postgres postgres 16777216 Apr 28 20:50 000000030000000000000011
-rw-------. 1 postgres postgres 16777216 Apr 28 20:55 000000030000000000000012
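The mapping in backup_label between an LSN such as 0/F000060 and the segment file name 00000003000000000000000F can also be computed by hand. Here is a minimal sketch; the function name wal_segment_name is hypothetical, and the default 16 MB (0x1000000) WAL segment size is assumed:

```shell
#!/bin/bash
# Derive the 24-character WAL segment file name from a timeline ID and an
# LSN (format hi/lo, as printed by pg_start_backup or shown in backup_label).
wal_segment_name() {
  local tli=$1 lsn=$2
  local hi=${lsn%/*} lo=${lsn#*/}
  # name = 8 hex digits each of: timeline, high word, low word / segment size
  printf '%08X%08X%08X\n' "$tli" $((16#$hi)) $(( 16#$lo / 16#1000000 ))
}

wal_segment_name 3 0/F000060   # the START WAL LOCATION above, timeline 3
```

Running it with the values from backup_label reproduces the file name 00000003000000000000000F, confirming which segment the copy must start from.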
3.5 Edit the recovery.conf file
[postgres@Postgres202 data]$ vi
restore_command = 'cp /data/arch/%f %p'    # e.g. 'cp /mnt/server/archivedir/%f %p'
recovery_target_time = '2018-04-28 20:47:31.617808+08'
recovery_target_inclusive = false
recovery_target_timeline = 'latest'
3.6 Start the database and verify the data
[postgres@Postgres202 data]$ pg_start
server starting
[postgres@Postgres202 data]$ ps -ef | grep postgres
root      1098  1083  0 22:32 pts/0    00:00:00 su - postgres
postgres  1099  1098  0 22:32 pts/0    00:00:00 -bash
root      1210  1195  0 22:55 pts/1    00:00:00 su - postgres
postgres  1211  1210  0 22:55 pts/1    00:00:00 -bash
postgres  1442     1  1 23:16 pts/0    00:00:00 /opt/pgsql96/bin/postgres
postgres  1450  1442  0 23:16 ?        00:00:00 postgres: checkpointer process
postgres  1451  1442  0 23:16 ?        00:00:00 postgres: writer process
postgres  1459  1442  0 23:16 ?        00:00:00 postgres: wal writer process
postgres  1460  1442  0 23:16 ?        00:00:00 postgres: autovacuum launcher process
postgres  1461  1442  0 23:16 ?        00:00:00 postgres: archiver process last was
postgres  1462  1442  0 23:16 ?        00:00:00 postgres: stats collector process
postgres  1464  1099  0 23:16 pts/0    00:00:00 ps -ef
postgres  1465  1099  0 23:16 pts/0    00:00:00 grep postgres
[postgres@Postgres202 data]$ psql
psql (9.6.0)
Type "help" for help.
postgres=# \c lottu lottu
You are now connected to database "lottu" as user "lottu".
lottu=> \dt
 List of relations
 Schema |      Name      | Type  | Owner
--------+----------------+-------+-------
 public | pitr_test      | table | lottu
 public | tbl_lottu_drop | table | lottu
lottu=> select count(1) from tbl_lottu_drop;
 count
-------
  1000
(1 row)
At this point the data has been recovered; the step of copying it back to the online database is omitted here.
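The omitted final step can be sketched as a pg_dump piped into the production server. The host addresses and connection options below are assumptions for illustration; adjust to your environment (and make sure the table no longer exists, or exists empty, on production first). Building the command as a string lets you review it before executing:

```shell
#!/bin/bash
# Sketch: move the recovered table from the restore server (Postgres202)
# back to the production server (Postgres201).
RESTORE_HOST=192.168.1.202   # hypothetical address of Postgres202
PROD_HOST=192.168.1.201      # hypothetical address of Postgres201
TABLE=tbl_lottu_drop
DB=lottu

# dump only the recovered table and replay it on production
CMD="pg_dump -h $RESTORE_HOST -d $DB -t $TABLE | psql -h $PROD_HOST -d $DB"
echo "$CMD"                  # review first, then run: eval "$CMD"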
Extension point
The following explains how to find the time of the misoperation, i.e. how the time point recovery_target_time = '2018-04-28 20:47:31.617808+08' used above was determined.
1. Use pg_xlogdump to parse the WAL logs.
[postgres@Postgres201 pg_xlog]$ pg_xlogdump -b 00000003000000000000000F 000000030000000000000012
pg_xlogdump: FATAL: error in WAL record at 0/12000648: invalid record length at 0/12000680: wanted 24, got 0
2. The following records can be found in the dump output. (The FATAL at the end is expected; it simply means pg_xlogdump reached the end of the WAL that has been written.)
rmgr: Transaction len (rec/tot): 8/ 34, tx: 1689, lsn: 0/100244A0, prev 0/10024460, desc: COMMIT 2018-04-28 20:45:49.736013 CST
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/100244C8, prev 0/100244A0, desc: RUNNING_XACTS nextXid 1690 latestCompletedXid 1689 oldestRunningXid 1690
rmgr: Heap len (rec/tot): 3/ 3130, tx: 1690, lsn: 0/10024500, prev 0/100244C8, desc: INSERT off 9 blkref #0: rel 16385/16386/2619 fork main blk 15 (FPW); hole: offset: 60, length: 5116
rmgr: Btree len (rec/tot): 2/ 7793, tx: 1690, lsn: 0/10025140, prev 0/10024500, desc: INSERT_LEAF off 385 blkref #0: rel 16385/16386/2696 fork main blk 1 (FPW); hole: offset: 1564, length: 452
rmgr: Heap len (rec/tot): 2/ 184, tx: 1690, lsn: 0/10026FD0, prev 0/10025140, desc: INPLACE off 16 blkref #0: rel 16385/16386/1259 fork main blk 0
rmgr: Transaction len (rec/tot): 88/ 114, tx: 1690, lsn: 0/10027088, prev 0/10026FD0, desc: COMMIT 2018-04-28 20:46:37.718442 CST; inval msgs: catcache 49 catcache 45 catcache 44 relcache 32784
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027100, prev 0/10027088, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027138, prev 0/10027100, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/10027170, prev 0/10027138, desc: CHECKPOINT_ONLINE redo 0/10027138; tli 3; prev tli 3; fpw true; xid 0:1691; oid 40976; multi 1; offset 0; oldest xid 1668 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1691; online
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/100271E0, prev 0/10027170, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/10027218, prev 0/100271E0, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 0/10027250, prev 0/10027218, desc: CHECKPOINT_ONLINE redo 0/10027218; tli 3; prev tli 3; fpw true; xid 0:1691; oid 40976; multi 1; offset 0; oldest xid 1668 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 1691; online
rmgr: XLOG len (rec/tot): 0/ 24, tx: 0, lsn: 0/100272C0, prev 0/10027250, desc: SWITCH
rmgr: Standby len (rec/tot): 24/ 50, tx: 0, lsn: 0/11000028, prev 0/100272C0, desc: RUNNING_XACTS nextXid 1691 latestCompletedXid 1690 oldestRunningXid 1691
rmgr: Standby len (rec/tot): 16/ 42, tx: 1691, lsn: 0/11000060, prev 0/11000028, desc: LOCK xid 1691 db 16386 rel 32784
rmgr: Heap len (rec/tot): 8/ 2963, tx: 1691, lsn: 0/11000090, prev 0/11000060, desc: DELETE off 16 KEYS_UPDATED blkref #0: rel 16385/16386/1247 fork main blk 8 (FPW); hole: offset: 88, length: 5288
From the records referencing rel 32784 we can see that table tbl_lottu_drop had its 1000 records inserted and committed at 2018-04-28 20:46:37.718442 (so the recovery time point 2018-04-28 20:47:31.617808+08 is indeed safe); that is, the insert ran in transaction 1690, and the drop was performed in transaction 1691.
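Picking the commit times out of a long pg_xlogdump listing is easier with a small filter. The sketch below reuses two records from the dump above as sample input; in practice you would pipe the real pg_xlogdump output into the same awk program:

```shell
#!/bin/bash
# Sketch: list transaction IDs and their commit timestamps from pg_xlogdump
# output, to narrow down when the misoperation happened.
commits=$(awk '/desc: COMMIT/ {
    # grab the value after the "tx:" field and strip its trailing comma
    for (i = 1; i <= NF; i++) if ($i == "tx:") { tx = $(i+1); sub(/,$/, "", tx) }
    # extract "COMMIT <date> <time>" and print just the timestamp
    if (match($0, /COMMIT [0-9-]+ [0-9:.]+/))
        print "tx " tx " committed at " substr($0, RSTART+7, RLENGTH-7)
}' <<'EOF'
rmgr: Transaction len (rec/tot): 8/ 34, tx: 1689, lsn: 0/100244A0, prev 0/10024460, desc: COMMIT 2018-04-28 20:45:49.736013 CST
rmgr: Transaction len (rec/tot): 88/ 114, tx: 1690, lsn: 0/10027088, prev 0/10026FD0, desc: COMMIT 2018-04-28 20:46:37.718442 CST; inval msgs: catcache 49
EOF
)
echo "$commits"
```

This prints one line per COMMIT record, making it easy to spot the last good transaction (here 1690) and choose a recovery target just after it.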
So the recovery configuration above can also be rewritten as:
restore_command = 'cp /data/arch/%f %p'    # e.g. 'cp /mnt/server/archivedir/%f %p'
recovery_target_xid = '1690'
recovery_target_inclusive = true    # include the commit of transaction 1690 itself
recovery_target_timeline = 'latest'
Supplement: analysis of several PostgreSQL recovery scenarios
PostgreSQL archive recovery example analysis (the timeline mechanism)
This section introduces PostgreSQL archive recovery and the meaning of timelines through examples.
1 Parameter configuration
sed -ir "s/#*max_wal_senders.*/max_wal_senders = 10/" $PGDATA/
sed -ir "s/#*wal_level.*/wal_level = replica/" $PGDATA/
sed -ir "s/#*archive_mode.*/archive_mode = on/" $PGDATA/
sed -ir "s/#*archive_command.*/archive_command = 'test ! -f \${PGHOME}\/archive\/%f \&\& cp %p \${PGHOME}\/archive\/%f'/" $PGDATA/
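For reference, after the sed edits above the relevant lines of postgresql.conf should look roughly like this (assuming the server's environment defines PGHOME and that ${PGHOME}/archive exists and is writable by the postgres user):

```ini
# postgresql.conf -- archiving-related settings produced by the edits above
wal_level = replica
archive_mode = on
max_wal_senders = 10
archive_command = 'test ! -f ${PGHOME}/archive/%f && cp %p ${PGHOME}/archive/%f'
```

Note that changing wal_level or archive_mode requires a server restart, while archive_command alone can be changed with a reload.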
2 Data operations
date;psql -c "create table test00 (id int primary key, info text)"
Sat Apr 1 10:09:55 CST 2017
date;psql -c "insert into test00 values(generate_series(1,50000), repeat(md5(random()::text), 1000))"
Sat Apr 1 10:10:10 CST 2017
date;psql -c "create table test01 (id int primary key, info text)"
Sat Apr 1 10:10:48 CST 2017
date;psql -c "insert into test01 values(generate_series(1,50000), repeat(md5(random()::text), 1000))"
Sat Apr 1 10:10:53 CST 2017
3 Make a base backup
sed -ir "s/#*max_wal_senders.*/max_wal_senders = 10/" $PGDATA/
Configure replication access in pg_hba.conf
pg_basebackup -Fp -P -x -D ~/bin/data/pg_root21 -l basebackup21
4 Continue business operations on the database (simulating that business keeps running after the backup, and then a failure occurs)
date;psql -c "create table test02 (id int primary key, info text)"
Sat Apr 1 10:15:59 CST 2017
date;psql -c "insert into test02 values(generate_series(1,100000), repeat(md5(random()::text), 1000))"
Sat Apr 1 10:16:09 CST 2017
Timeline (in the third row of the diagram, C stands for Create and I for Insert)
-10:09:55----10:10:10-----10:10:48-----10:10:53------10:15:59----10:16:09--
----|-----------|------------|------------|-------------|-----------|------
C test00-----I test00-----C test01-----I test01-----C test02-----I test02--
Situation 1
archive_timeout is not set; the entire data directory is deleted.
5 The data directory is removed with rm -rf (simulated erroneous operation)
rm -rf pg_root20/
6 Archive recovery
cp -r pg_root21 pg_root20
Modify pg_hba.conf to prevent users from connecting
cp $PGHOME/share/ ./
sed -ir "s/#*restore_command.*/restore_command = 'cp \${PGHOME}\/archive\/%f %p'/" $PGDATA/
7 Recovery results
test02 exists, but its rows are treated as belonging to an uncommitted transaction, so the table is empty (the entire contents of the last, unarchived xlog file are lost).
Situation 2
archive_timeout is set; the entire data directory is deleted. With an archive timeout of 60s, the xlog switches and is archived automatically after the data in the test02 table is loaded.
(sed -ir "s/#*archive_timeout.*/archive_timeout = 60/" $PGDATA/)
5 The data directory is removed with rm -rf (simulated erroneous operation)
rm -rf pg_root20/
6 Archive recovery
cp -r pg_root21 pg_root20
Modify pg_hba.conf to prevent users from connecting
cp $PGHOME/share/ ./
sed -ir "s/#*restore_command.*/restore_command = 'cp \${PGHOME}\/archive\/%f %p'/" $PGDATA/
7 Recovery results
test02 exists, and its data exists as well (because archive_timeout forces a timed switch, the last xlog segment was archived).
Situation 3 (Important)
archive_timeout is set; we try restoring multiple times based on estimated time points, since we cannot determine exactly which point we want to restore to. The archive timeout is 60s.
(sed -ir "s/#*archive_timeout.*/archive_timeout = 60/" $PGDATA/)
5 The data directory is removed with rm -rf (simulated erroneous operation)
rm -rf pg_root20/
6 Archive recovery
cp -r pg_root21 pg_root20
Modify pg_hba.conf to prevent users from connecting
cp $PGHOME/share/ ./
sed -ir "s/#*restore_command.*/restore_command = 'cp \${PGHOME}\/archive\/%f %p'/" $PGDATA/
(1) recovery_target_time = '2017-4-1 10:09:47' (before the base backup time)
here------------------------------------------------------------------------
-10:09:55----10:10:10-----10:10:48-----10:10:53------10:15:59----10:16:09--
----|-----------|------------|------------|---basebackup---|--------|------
C test00-----I test00-----C test01-----I test01-----C test02-----I test02--
result:
The recovery target time is before the base backup, so recovery stops at the earliest possible point: the base backup point.
LOG: recovery stopping before commit of transaction 1175, time 2017-04-01 10:15:59.597495+08
Note: it is impossible to restore to a point before the base backup, so make sure your data is consistent when taking the base backup.
(2) recovery_target_time = '2017-4-1 10:10:00' (before the base backup time)
---------here---------------------------------------------------------------
-10:09:55----10:10:10-----10:10:48-----10:10:53------10:15:59----10:16:09--
----|-----------|------------|------------|---basebackup---|--------|------
C test00-----I test00-----C test01-----I test01-----C test02-----I test02--
result:
The recovery target time is before the base backup, so again recovery stops at the earliest possible point: the base backup point.
LOG: recovery stopping before commit of transaction 1175, time 2017-04-01 10:15:59.597495+08
Note: it is impossible to restore to a point before the base backup, so make sure your data is consistent when taking the base backup.
(3) recovery_target_time = '2017-4-1 10:16:00'
-------------------------------------------------------------here-----------
-10:09:55----10:10:10-----10:10:48-----10:10:53------10:15:59----10:16:09--
----|-----------|------------|------------|---basebackup---|--------|------
C test00-----I test00-----C test01-----I test01-----C test02-----I test02--
result:
Table test02 exists, but contains no data (the insert committed later, at 10:16:09). This shows that any point in time after the base backup can be restored to. A new timeline is created after recovery.
LOG: last completed transaction was at log time 2017-04-01 10:15:59.597495+08
(3.1) Continue restoring on top of (3): recovery_target_time = '2017-4-1 10:17:00'
------------------------------------------------------------------------here
-10:09:55----10:10:10-----10:10:48-----10:10:53------10:15:59----10:16:09--
----|-----------|------------|------------|---basebackup---|--------|------
C test00-----I test00-----C test01-----I test01-----C test02-----I test02--
result:
Same as (3), except this recovery creates a new timeline 3. If yet another recovery is attempted on this timeline, the database looks in the archive for the xlog files of timeline 2, but the archive directory only holds timeline-1 logs, so it fails with a file-not-found error:
cp: cannot stat 'pathto/archive/00000002000000000000000A': No such file or directory
Note: given the conclusion above, use the original archive files for every restore attempt; that is, if you need to try again, restore from the base backup again. Do not continue on top of a previous recovery, or the archive files will not be found because of the timeline switch.
other
Compressing archived logs
sed -ir "s/#*archive_command.*/archive_command = 'gzip -c %p > \${PGHOME}\/archive\/%f'/" $PGDATA/
sed -ir "s/#*restore_command.*/restore_command = 'gzip -d -c \${PGHOME}\/archive\/%f > %p'/" $PGDATA/
1. (Several important parameters)
From the PostgreSQL 9.6 manual (translated by Peng Yuwei)
restore_command (string)
A local shell command to obtain an archived segment of the WAL file series. This parameter is required for archive recovery, but optional for streaming replication. Any %f in the string is replaced by the name of the file to retrieve from the archive, and any %p is replaced by the copy destination path name on the server (relative to the current working directory, i.e. the cluster's data directory). Any %r is replaced by the name of the file containing the last valid restart point, i.e. the earliest file that must be kept to make a recovery restartable, so this information can be used to truncate the archive to the minimum needed to restart from the current recovery. %r is typically only used in warm standby configurations (see Section 26.2). To embed a literal % character, write %%. It is important that the command return a zero exit status only on success. The command will be asked for file names that do not exist in the archive, and it must return non-zero when so asked.
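As a concrete illustration of the exit-status requirement, here is a minimal recovery.conf fragment in the style used earlier in this article (the /data/arch path is this article's archive directory; pg_archivecleanup ships with PostgreSQL):

```ini
# recovery.conf -- cp naturally exits non-zero when %f is absent from the
# archive, which is how the server learns it has replayed all archived WAL
restore_command = 'cp /data/arch/%f %p'
# optional: remove archived WAL no longer needed to restart recovery;
# %r is the earliest file that must be kept
archive_cleanup_command = 'pg_archivecleanup /data/arch %r'
```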
recovery_target_time (timestamp)
This parameter specifies the timestamp up to which recovery will proceed.
recovery_target_xid (string)
This parameter specifies the transaction ID up to which recovery will proceed. Keep in mind that although transaction IDs are assigned sequentially at transaction start, transactions can complete in a different numeric order. The transactions that will be recovered are those committed before (and optionally including) the specified one. The precise stopping point is also affected by recovery_target_inclusive.
recovery_target_timeline (string)
Specifies recovering into a particular timeline. The default is to recover along the timeline that was current when the base backup was taken. Setting this parameter to latest recovers to the latest timeline found in the archive, which is useful in a standby server. Other than that, you only need to set this parameter in complex re-recovery situations, where you need to return to a state that was itself reached after a point-in-time recovery. See Section 25.3.5 for discussion.
2. About the timeline
Postgresql9.6 manual (translated by Peng Yuwei)
The ability to restore a database to a previous point in time brings some complexity, somewhat like the science-fiction notions of time travel and parallel universes. For example, in the original history of the database, suppose you dropped a critical table at 5:15 pm on Tuesday, but did not realize your mistake until noon on Wednesday. Unfazed, you pull out your backup, restore to the point 5:14 pm Tuesday, and are up and running. In this history of the database universe, you never dropped the table. But suppose you later realize this was not such a great idea and want to return to some point Wednesday morning in the original history. You cannot, because while your database was running online it overwrote some of the WAL segment files that would have led to the time you now wish to return to. Thus, to avoid this, you need to distinguish the series of WAL records generated after a point-in-time recovery from those generated in the original database history.
To solve this problem, PostgreSQL has a timeline concept. Whenever an archive recovery is completed, a new timeline is created to identify the WAL record sequence generated after the recovery. The timeline ID number is part of the WAL segment file name, so a new timeline does not rewrite the WAL data generated by the previous timeline. There are actually many different timelines that can be archived. While this may seem like a useless feature, it often plays a life-saving role. Given that you are not quite sure which point in time you need to recover to, you may have to do multiple point-in-time recovery attempts and errors until you finally find the best place to branch out of the old history. Without a timeline, the processing will quickly generate a bunch of unmanageable chaos. And with the timeline, you can return to any previous state, including the state in the timeline branch you gave up earlier.
Each time a new timeline is created, PostgreSQL creates a "timeline history" file that shows which timeline it branched off from and when. These history files are necessary to allow the system to pick the right WAL segment files when recovering from an archive that contains multiple timelines. Therefore, like WAL segment files, they are archived into the WAL archive area. History files are just small text files, so it is cheap and appropriate to keep them around indefinitely (unlike the segment files, which are large). You can, if you like, add comments to a history file to record how and why this particular timeline was created. Such comments are especially valuable when you have a thicket of different timelines as a result of experimentation.
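A history file such as 00000002.history in the archive can be read directly. The sketch below parses a hypothetical sample (the content is made up for illustration, but it follows the real one-line-per-branch format: parent timeline ID, switch-point LSN, reason):

```shell
#!/bin/bash
# Sketch: summarize a timeline history file -- each non-comment line records
# the parent timeline, the LSN at which the branch happened, and a reason.
history=$(awk '!/^#/ && NF { print "branched from TLI " $1 " at " $2 }' <<'EOF'
1	0/100272C0	before 2018-04-28 20:47:31.617808+08
EOF
)
echo "$history"
```

In a real archive you would run the awk program against ${PGHOME}/archive/0000000N.history to see where each timeline diverged.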
The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you want to recover into a child timeline (that is, return to a state that was itself generated after a recovery attempt), you need to specify the target timeline ID in recovery_target_timeline. You cannot recover into a timeline that branched off earlier than the base backup.
The above is my personal experience. I hope it serves as a useful reference, and I welcome your support. If there are any mistakes or points I have not fully considered, corrections are appreciated.