A detailed explanation of the deserialization vulnerability in Python Pickle

1. Introduction

Hello everyone, today we will talk about deserialization attacks in Python. Let's first look at what serialization and deserialization are. Simply put, serialization converts a data structure into a byte stream so that we can save the data to a file or transmit it over the network. Deserialization converts that byte stream back into the original data structure.

In Python, one commonly used module for this is Pickle. It makes serialization and deserialization easy: for example, you can serialize and save a complex Python object and deserialize it again when needed.

Overview of deserialization attacks

The deserialization process has a weakness: if we deserialize data from an untrusted source, it may trigger a deserialization attack. An attacker can embed malicious code in the serialized data. When you deserialize this data, the malicious code is executed, which may lead to data breaches, system crashes, or even remote control of your system by the attacker.

2. Python Pickle module overview

Basic features of Pickle

The Pickle module is part of the Python standard library and is mainly used to serialize and deserialize Python objects. You can use Pickle to save most Python objects (including complex, nested data structures) as a byte stream and load them back when needed. A few kinds of objects, such as lambda functions and open file handles, cannot be pickled.

How Pickle works

Pickle works very simply. When serializing, it will convert the Python object into a byte stream, and when deserializing, it will restore the byte stream into a Python object. Let’s take a look at some specific examples below.

Serialization and deserialization of Pickle

Serialization

Serialization converts Python objects into byte streams. We can use pickle.dump and pickle.dumps to do this. pickle.dump serializes the object and writes it to a file; pickle.dumps returns the byte stream as a bytes object.

import pickle

# Create an object
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}

# Serialize the object and write it to a file
# ('data.pkl' is a placeholder file name)
with open('data.pkl', 'wb') as file:
    pickle.dump(data, file)

# Or return a byte stream
data_bytes = pickle.dumps(data)

Deserialization

Deserialization restores a byte stream to a Python object. We can use pickle.load and pickle.loads to do this. pickle.load reads the byte stream from a file and deserializes it; pickle.loads deserializes a bytes object directly.

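import pickle

# Deserialize an object from a file
# ('data.pkl' is a placeholder file name)
with open('data.pkl', 'rb') as file:
    data = pickle.load(file)

# Or directly deserialize a byte stream
# (data_bytes is the output of pickle.dumps from the previous example)
data = pickle.loads(data_bytes)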
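
As a quick check that the two directions are inverses of each other, the following sketch round-trips a nested structure entirely in memory. The sample values are illustrative assumptions and are not taken from the examples above.

```python
import datetime
import pickle

# Sample values are illustrative only
original = {
    'name': 'Alice',
    'scores': [90, 85.5],
    'tags': {'admin', 'staff'},
    'joined': datetime.date(2024, 1, 15),
}

# Serialize to bytes, then restore an equal object
data_bytes = pickle.dumps(original)
restored = pickle.loads(data_bytes)

print(type(data_bytes).__name__)  # bytes
print(restored == original)       # True
print(restored is original)       # False: loads() builds a new object
```

Note that the restored object is an equal copy, not the same object, which is also why Pickle is sometimes used for deep copying.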

3. Principles of deserialization attack

Attack mechanism

Now let's take a look at what a deserialization attack is. An attacker can embed malicious code in the serialized data. When you deserialize this data, the malicious code is executed. In other words, deserializing data from untrusted sources is equivalent to giving the attacker the opportunity to execute code on your system.

What can an attacker do

An attacker can use a deserialization vulnerability to execute arbitrary commands and to modify or steal data.

Sample code

To illustrate the problem more clearly, let's look at a simple example of a deserialization attack.

import pickle
import os

# Construct malicious code
class Malicious:
    def __reduce__(self):
        # (callable, args): the unpickler will call os.system('echo Hacked!')
        return (os.system, ('echo Hacked!',))

# Serialize the malicious object
malicious_data = pickle.dumps(Malicious())

# The command is executed during deserialization
pickle.loads(malicious_data)

In this example, we create a class called Malicious. Its __reduce__ method returns a tuple: the first element is the callable os.system, and the second element is a tuple of arguments, here the command to be executed. When we deserialize this object, os.system('echo Hacked!') is executed and the output is "Hacked!".

Detailed explanation

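Construct malicious code: We define a Malicious class whose __reduce__ method specifies the callable to invoke (os.system) and its arguments (the command).
Serialize malicious objects: We use pickle.dumps to serialize this object. __reduce__ is called at this point, and the callable and its arguments are recorded in the byte stream.
Deserialize malicious objects: When we use pickle.loads to deserialize the data, Pickle calls the recorded callable with the recorded arguments, so the command is executed. The Malicious class itself does not need to exist on the victim's side.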
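
To observe these phases without running a shell command, the sketch below swaps os.system for the harmless built-in len (a substitution made purely for illustration) and uses the standard pickletools.dis function to list the opcodes in the payload without executing it. The Demo class and its calls list are hypothetical names.

```python
import io
import pickle
import pickletools


class Demo:
    calls = []  # class-level log of when __reduce__ runs

    def __reduce__(self):
        self.calls.append('reduce')
        # Harmless stand-in for os.system: loads() will call len('Hacked!')
        return (len, ('Hacked!',))


payload = pickle.dumps(Demo())       # __reduce__ runs here, at pickling time
calls_after_dump = list(Demo.calls)

# Disassemble the payload; this reads the opcodes but does not run them
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())            # the listing contains a REDUCE opcode

result = pickle.loads(payload)       # the recorded callable is invoked here
print(result)                        # 7
print(Demo.calls == calls_after_dump)  # True: loads() did not call __reduce__
```

The REDUCE opcode in the listing is the instruction that tells the unpickler to call the recorded callable, so its presence next to an unexpected global is a useful warning sign when inspecting a suspicious payload.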

4. How to prevent Pickle deserialization attacks

The principle of safe deserialization

The first principle for preventing deserialization attacks is: avoid deserializing data from untrusted sources. Deserialize only when you fully trust the source of the data.
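
When pickled data has to pass through storage or a channel that others might modify (a cache or a queue, for example), one way to check that it still comes from you is to sign it with the standard hmac module and verify the signature before loading. The sketch below assumes a shared secret that the attacker cannot read; SECRET_KEY, sign_dumps, and verify_loads are hypothetical names.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b'replace-with-a-real-secret'  # assumption: unknown to the attacker
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_dumps(obj):
    """Pickle obj and prepend an HMAC-SHA256 tag."""
    payload = pickle.dumps(obj)
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return tag + payload


def verify_loads(blob):
    """Check the tag first; unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError('signature mismatch: refusing to unpickle')
    return pickle.loads(payload)


blob = sign_dumps({'name': 'Alice'})
print(verify_loads(blob))  # {'name': 'Alice'}

# Flip the last byte to simulate tampering
tampered = blob[:-1] + bytes([blob[-1] ^ 0xFF])
try:
    verify_loads(tampered)
except ValueError as exc:
    print(exc)  # signature mismatch: refusing to unpickle
```

A valid signature only shows that the data was produced by someone who holds the key. It does not make Pickle safe for data that other parties are allowed to create.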

Actual defense method

Let's look at some specific defense methods and code examples.

Safe deserialization code examples

If you have to deserialize with Pickle, consider overriding find_class in a subclass of pickle.Unpickler to restrict which types can be deserialized:

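import builtins
import io
import pickle

# Built-in types that are allowed to be looked up by name
SAFE_BUILTINS = {"str", "list", "dict", "set", "int", "float", "bool"}

# Custom Unpickler that limits the types that can be deserialized
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow the whitelisted names from the builtins module
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Reject everything else, including os.system
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(s):
    # Same interface as pickle.loads, but uses the restricted unpickler
    return RestrictedUnpickler(io.BytesIO(s)).load()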

In this example, we define a custom RestrictedUnpickler class that only allows a small set of safe built-in types to be looked up during deserialization; any other global is rejected.
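
As a usage sketch, the block below repeats a restricted unpickler of this kind so that it runs on its own, then feeds it both ordinary data and the malicious payload from section 3. The SAFE_BUILTINS name is an assumption introduced here; the payload is rejected when its global is looked up, before os.system can be called.

```python
import builtins
import io
import os
import pickle

SAFE_BUILTINS = {'str', 'list', 'dict', 'set', 'int', 'float', 'bool'}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


class Malicious:
    def __reduce__(self):
        return (os.system, ('echo Hacked!',))


# Plain data loads normally
print(restricted_loads(pickle.dumps({'name': 'Alice', 'age': 25})))

# The payload is rejected before the command can run
try:
    restricted_loads(pickle.dumps(Malicious()))
except pickle.UnpicklingError as exc:
    print('blocked:', exc)
```

The exact module name in the error message depends on the platform, because os.system is recorded under the name of the underlying platform module. A whitelist like this is only as safe as the names it contains, so it should be kept as small as possible.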

Use other secure serialization modules (such as JSON)

A safer approach is to use JSON instead of Pickle for serialization and deserialization. JSON only supports basic data types and does not execute arbitrary code, so it is safer.

import json

# Serialize an object to a JSON string
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}
data_json = json.dumps(data)

# Deserialize the JSON string back into a Python object
data = json.loads(data_json)
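
The trade-off is that JSON is more limited than Pickle. The sketch below, which uses only the standard json module and made-up sample values, illustrates three consequences: some types change shape on the way back, unsupported types are refused, and input is parsed as data rather than evaluated as code.

```python
import json

# A tuple is written as a JSON array and comes back as a list
record = {'name': 'Alice', 'point': (1, 2)}
text = json.dumps(record)
print(text)              # {"name": "Alice", "point": [1, 2]}
print(json.loads(text))  # {'name': 'Alice', 'point': [1, 2]}

# Types that JSON does not know, such as set, are refused
try:
    json.dumps({'tags': {'a', 'b'}})
except TypeError as exc:
    print('not serializable:', exc)

# Python code in the input is not evaluated; it is simply invalid JSON
try:
    json.loads("__import__('os').system('echo Hacked!')")
except json.JSONDecodeError as exc:
    print('rejected:', exc)
```

Custom objects therefore have to be converted to dictionaries and lists explicitly before serialization and rebuilt explicitly afterwards, which also means the receiving code decides which types get constructed.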

Summary

This article explained what serialization and deserialization are and introduced the Pickle module in Python. It also explained the principle of the deserialization attack in detail with an attack code example. Finally, it discussed how to prevent Pickle deserialization attacks and gave some specific defense methods.
