SoFunction
Updated on 2025-04-13

Simple data backup using Python

Data backup principle

Data backup means copying data from one location to another so that a copy survives if the original is lost or corrupted. A backup process usually consists of the following core parts:

  • Select data: Determine which data needs to be backed up.
  • Select storage media: Choose where the backup data is stored, such as a hard disk or cloud storage.
  • Perform a backup: Copy the data to the storage medium.
  • Verify backup: Check that the backup data is complete and can be restored.
  • Periodic updates: Run backups regularly so the copies stay up to date.

Select data

Deciding what to back up is the first step. This usually includes important documents, databases, and configuration files.
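In practice, selecting data often means filtering a folder by file type. The following is a minimal sketch under that assumption; the `select_files` helper and its default extension list are hypothetical and not part of the tool built later in this article.

```python
from pathlib import Path

def select_files(source_folder, extensions=(".txt", ".csv", ".conf", ".db")):
    """Hypothetical helper: return files under source_folder whose suffix is in extensions."""
    wanted = {ext.lower() for ext in extensions}
    selected = []
    # rglob("*") walks the folder recursively, similar to os.walk
    for path in Path(source_folder).rglob("*"):
        # Compare suffixes case-insensitively so "notes.TXT" also matches ".txt"
        if path.is_file() and path.suffix.lower() in wanted:
            selected.append(path)
    return sorted(selected)
```

For example, `select_files("/path/to/source/folder", extensions=(".conf", ".ini"))` would list only the configuration files in that folder and its subfolders.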

Select a storage medium

Choosing where to store the backup data is a key decision. Common storage media include:

  • External hard drive: Easy to use; suitable for small backups.
  • Network-attached storage (NAS): Suitable for medium-sized backups; provides centralized storage.
  • Cloud storage: Suitable for large backups; offers high availability and scalability.
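Whatever medium you choose, it must have room for the data. As a minimal sketch, the standard `shutil.disk_usage` function can compare the size of the source folder with the free space on the target medium. The `folder_size` and `has_enough_space` helpers and the 10% safety margin are illustrative assumptions.

```python
import os
import shutil

def folder_size(folder):
    """Total size in bytes of all regular files under folder."""
    total = 0
    for root, dirs, files in os.walk(folder):
        for file in files:
            path = os.path.join(root, file)
            # Skip broken symlinks and other non-regular entries
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total

def has_enough_space(source_folder, destination_folder, margin=0.1):
    """Hypothetical check: can the destination's disk hold the source plus a safety margin?"""
    needed = folder_size(source_folder) * (1 + margin)
    # disk_usage() reports total, used and free bytes for the filesystem holding the path
    free = shutil.disk_usage(destination_folder).free
    return free >= needed
```

The destination folder must already exist, because `shutil.disk_usage` needs an existing path to find the right filesystem.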

Perform a backup

Performing a backup means copying the data to the storage medium. In Python, you can use the shutil library to copy files; shutil.copy2 also preserves file metadata such as the modification time.

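import shutil
import os

def backup_files(source_folder, destination_folder):
    # Walk the source tree and mirror its folder structure in the destination,
    # so files with the same name in different subfolders do not overwrite each other
    for root, dirs, files in os.walk(source_folder):
        relative_dir = os.path.relpath(root, source_folder)
        target_dir = os.path.join(destination_folder, relative_dir)
        os.makedirs(target_dir, exist_ok=True)
        for file in files:
            source_file = os.path.join(root, file)
            destination_file = os.path.join(target_dir, file)
            # copy2 copies the file contents plus metadata such as the modification time
            shutil.copy2(source_file, destination_file)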

Verify backup

Verifying the backup ensures that the backup data is complete and can be restored. You can use the filecmp library to compare each source file with its backup copy.

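import filecmp
import os

def verify_backup(source_folder, destination_folder):
    for root, dirs, files in os.walk(source_folder):
        relative_dir = os.path.relpath(root, source_folder)
        for file in files:
            source_file = os.path.join(root, file)
            destination_file = os.path.join(destination_folder, relative_dir, file)
            # A missing copy counts as a failure; shallow=False compares the actual
            # file contents instead of only the os.stat() signatures
            if not os.path.exists(destination_file) or not filecmp.cmp(
                source_file, destination_file, shallow=False
            ):
                print(f"Backup verification failed for file: {source_file}")
                return False
    print("Backup verification successful.")
    return True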
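`filecmp.cmp` needs both copies to be readable from the same machine. When the backup lives elsewhere, a common alternative is to compare checksums. A minimal sketch using the standard `hashlib` module; the `file_sha256` and `same_content` names and the 1 MiB chunk size are arbitrary choices for illustration.

```python
import hashlib

def file_sha256(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read() until it returns b"" (end of file)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(source_file, backup_file):
    """True if both files have identical SHA-256 digests."""
    return file_sha256(source_file) == file_sha256(backup_file)
```

The digest of each source file can be recorded at backup time and recomputed later on the machine that holds the copy, so the two files never need to be in the same place.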

Periodic updates

Running backups regularly keeps the backup copies up to date. You can schedule the backup task with the third-party schedule library (install it with pip install schedule).

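import schedule
import time

def schedule_backup(source_folder, destination_folder, interval=24):
    """Back up and verify source_folder every `interval` hours."""
    def backup_task():
        print("Starting backup...")
        backup_files(source_folder, destination_folder)
        verify_backup(source_folder, destination_folder)
    # Run once right away, then repeat every `interval` hours
    backup_task()
    schedule.every(interval).hours.do(backup_task)
    while True:
        # Run any job that is due, then check again one second later
        schedule.run_pending()
        time.sleep(1)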
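Because the task runs every interval hours, copying every file on every run can be wasteful. One common refinement, shown here as a sketch rather than as part of the original tool, is an incremental copy that skips files whose size and modification time already match the backup. The `needs_copy` and `incremental_backup` names are hypothetical; the sketch keeps the source's subfolder structure in the destination.

```python
import os
import shutil

def needs_copy(source_file, destination_file):
    """A file needs copying if the backup is missing or its size or mtime differs."""
    if not os.path.exists(destination_file):
        return True
    src, dst = os.stat(source_file), os.stat(destination_file)
    # copy2 preserves mtime, so an unchanged file keeps the same size and timestamp;
    # compare whole seconds because some filesystems store coarser timestamps
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)

def incremental_backup(source_folder, destination_folder):
    """Hypothetical helper: copy only new or changed files; return their relative paths."""
    copied = []
    for root, dirs, files in os.walk(source_folder):
        relative_dir = os.path.relpath(root, source_folder)
        target_dir = os.path.join(destination_folder, relative_dir)
        os.makedirs(target_dir, exist_ok=True)
        for file in files:
            source_file = os.path.join(root, file)
            destination_file = os.path.join(target_dir, file)
            if needs_copy(source_file, destination_file):
                shutil.copy2(source_file, destination_file)
                copied.append(os.path.normpath(os.path.join(relative_dir, file)))
    return copied
```

Size and modification time are a cheap heuristic; a file edited without changing either would be missed, which is why a content check such as filecmp or a checksum is still useful after the copy.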

Complete data backup tool

Now we can combine the above parts to create a complete data backup tool.

import shutil
import os
import filecmp
import schedule
import time

def backup_files(source_folder, destination_folder):
    """Copy every file under source_folder into destination_folder, keeping subfolders."""
    for root, dirs, files in os.walk(source_folder):
        relative_dir = os.path.relpath(root, source_folder)
        target_dir = os.path.join(destination_folder, relative_dir)
        os.makedirs(target_dir, exist_ok=True)
        for file in files:
            shutil.copy2(os.path.join(root, file), os.path.join(target_dir, file))

def verify_backup(source_folder, destination_folder):
    """Return True if every source file has an identical copy in destination_folder."""
    for root, dirs, files in os.walk(source_folder):
        relative_dir = os.path.relpath(root, source_folder)
        for file in files:
            source_file = os.path.join(root, file)
            destination_file = os.path.join(destination_folder, relative_dir, file)
            if not os.path.exists(destination_file) or not filecmp.cmp(
                source_file, destination_file, shallow=False
            ):
                print(f"Backup verification failed for file: {source_file}")
                return False
    print("Backup verification successful.")
    return True

def schedule_backup(source_folder, destination_folder, interval=24):
    """Back up and verify right away, then repeat every `interval` hours."""
    def backup_task():
        print("Starting backup...")
        backup_files(source_folder, destination_folder)
        verify_backup(source_folder, destination_folder)
    backup_task()
    schedule.every(interval).hours.do(backup_task)
    while True:
        schedule.run_pending()
        time.sleep(1)

# Usage example
source_folder = "/path/to/source/folder"
destination_folder = "/path/to/destination/folder"
schedule_backup(source_folder, destination_folder, interval=24)

In the above code, the schedule_backup function takes the source folder, the destination folder, and the backup interval in hours as parameters. Each run of the backup task copies the files and then verifies the copies, and the schedule library repeats the task at the given interval. Because the function ends in an infinite loop, the script keeps running until you stop it.

Advanced features

Compressed backup

To save storage space and reduce the amount of data to transfer, backups are often compressed. You can use the zipfile library to create a compressed backup file.

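import os
import zipfile

def compress_backup(source_folder, destination_zip):
    # ZIP_DEFLATED enables compression (the default ZIP_STORED only stores files)
    with zipfile.ZipFile(destination_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
        for root, dirs, files in os.walk(source_folder):
            for file in files:
                file_path = os.path.join(root, file)
                # Store paths relative to source_folder instead of full absolute paths
                zipf.write(file_path, arcname=os.path.relpath(file_path, source_folder))

def backup_files_compressed(source_folder, destination_zip):
    compress_backup(source_folder, destination_zip)
    print(f"Backup completed and compressed to: {destination_zip}")

# Example of using compressed backups
destination_zip = "/path/to/destination/"
backup_files_compressed(source_folder, destination_zip)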
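The standard library also offers `shutil.make_archive`, which builds the whole archive in one call. The sketch below puts the date and time into the file name so that periodic runs keep older archives instead of overwriting them. The `timestamped_archive` name and the `backup_YYYYmmdd_HHMMSS` naming scheme are assumptions for illustration.

```python
import os
import shutil
from datetime import datetime

def timestamped_archive(source_folder, archive_dir):
    """Hypothetical helper: zip source_folder into archive_dir/backup_<timestamp>.zip."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    base_name = os.path.join(archive_dir, f"backup_{stamp}")
    # make_archive appends ".zip" itself and returns the archive's path;
    # root_dir makes the paths stored inside the archive relative to source_folder
    return shutil.make_archive(base_name, "zip", root_dir=source_folder)
```

Keep archive_dir outside source_folder; otherwise later archives would include earlier ones.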

Off-site backup

Keeping a copy off-site protects the data against local incidents such as hardware failure, theft, or fire. You can upload the backup to a remote server over SFTP using the third-party paramiko library.

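import os
import posixpath
import paramiko

def remote_backup(source_zip, remote_host, remote_user, remote_password, remote_folder):
    ssh = paramiko.SSHClient()
    # AutoAddPolicy trusts unknown host keys automatically; convenient for testing,
    # but loading known host keys (ssh.load_system_host_keys()) is safer
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(remote_host, username=remote_user, password=remote_password)
    try:
        sftp = ssh.open_sftp()
        try:
            # Remote paths use "/" regardless of the local OS, hence posixpath
            remote_path = posixpath.join(remote_folder, os.path.basename(source_zip))
            sftp.put(source_zip, remote_path)
        finally:
            sftp.close()
    finally:
        ssh.close()

# Example of using off-site backup
remote_host = ""
remote_user = "username"
remote_password = "password"
remote_folder = "/path/to/remote/backup/folder"
remote_backup(destination_zip, remote_host, remote_user, remote_password, remote_folder)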
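Hard-coding a password in the script, as in the example above, exposes it to anyone who can read the file. A common alternative is to read the connection settings from environment variables. This is a minimal sketch; the variable names `BACKUP_SSH_HOST`, `BACKUP_SSH_USER`, `BACKUP_SSH_PASSWORD` and `BACKUP_REMOTE_FOLDER` are hypothetical.

```python
import os

# Hypothetical environment variable names; choose your own
REQUIRED_VARS = ("BACKUP_SSH_HOST", "BACKUP_SSH_USER", "BACKUP_SSH_PASSWORD", "BACKUP_REMOTE_FOLDER")

def load_remote_settings(environ=os.environ):
    """Read remote backup settings from environment variables, failing if any are missing."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    # Keys match the keyword parameters of remote_backup
    return {
        "remote_host": environ["BACKUP_SSH_HOST"],
        "remote_user": environ["BACKUP_SSH_USER"],
        "remote_password": environ["BACKUP_SSH_PASSWORD"],
        "remote_folder": environ["BACKUP_REMOTE_FOLDER"],
    }
```

Because the keys match the parameter names, the settings can be passed as `remote_backup(destination_zip, **load_remote_settings())`. Where possible, SSH key authentication (paramiko's `connect` accepts a `key_filename` argument) avoids storing a password at all.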

Multi-platform support

To make the backup tool run on multiple platforms, you need to account for differences between operating systems, such as path conventions and default folder locations. You can use the platform module to detect the current operating system and adjust the code accordingly.

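import platform

def get_platform():
    # platform.system() returns e.g. "Windows", "Darwin" (macOS) or "Linux"
    return platform.system()

current_platform = get_platform()
if current_platform == "Windows":
    pass  # Windows-specific code
elif current_platform == "Darwin":
    pass  # macOS-specific code
else:
    pass  # Linux-specific code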
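As an illustration of such platform-specific code, the sketch below picks a default backup location per operating system. The folder locations are assumptions chosen for this example, not conventions the tool above relies on; the function accepts the system name as a parameter so it can be tested on any machine.

```python
import os
import platform
from pathlib import Path

def default_backup_dir(system=None):
    """Hypothetical helper: return a platform-dependent default backup folder."""
    system = system or platform.system()
    home = Path.home()
    if system == "Windows":
        # %LOCALAPPDATA% is normally set on Windows; fall back to the home folder otherwise
        base = Path(os.environ.get("LOCALAPPDATA", home))
        return base / "Backups"
    elif system == "Darwin":
        return home / "Library" / "Application Support" / "Backups"
    else:
        # Linux and other Unix-like systems: use the XDG data directory if it is set
        base = Path(os.environ.get("XDG_DATA_HOME", home / ".local" / "share"))
        return base / "backups"
```

Calling `default_backup_dir()` with no argument uses the current operating system as reported by `platform.system()`.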

Summary

A backup tool is an important part of protecting data. By combining shutil, filecmp, schedule, zipfile, paramiko and related libraries, we can build a practical data backup tool. This article covered the principles of data backup, how to implement each step, and concrete code examples; I hope it is helpful to you.

Remember that data backups may involve privacy and security issues. When using data backup tools, please make sure to comply with relevant laws and regulations and obtain the necessary permissions and consents.
