1. Introduction
Hello everyone, today we will talk about deserialization attacks in Python. Let’s first look at what serialization and deserialization are. Simply put, serialization converts a data structure into a byte stream so that the data can be saved to a file or transmitted over the network. Deserialization converts that byte stream back into the original data structure.
In Python, a commonly used module for this is pickle, which makes serialization and deserialization easy. For example, you can serialize a complex Python object, save it, and deserialize it back when needed.
Overview of deserialization attacks
The deserialization process can be a source of vulnerabilities: if we deserialize data from an untrusted source, we may trigger a deserialization attack. An attacker can embed malicious code in the serialized data, and that code is executed when you deserialize it. This may lead to data breaches or system crashes, and can even let the attacker take remote control of your system.
2. Python Pickle module overview
Basic features of Pickle
The pickle module is part of Python's standard library and is mainly used to serialize and deserialize Python objects. You can use it to save most Python objects (including complex, nested data structures) as a byte stream and load them back when needed. Not everything is picklable, though: lambdas, open files, and thread locks, for example, are refused.
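To make "most Python objects" concrete, here is a small sketch that round-trips nested built-in containers and standard-library objects (datetime.date, fractions.Fraction) and then checks two objects that pickle refuses. The helper is_picklable is a hypothetical name introduced only for this illustration; because the exact exception raised for an unpicklable object varies by object type and Python version, it catches the common ones.

```python
import datetime
import pickle
import threading
from fractions import Fraction


def is_picklable(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exception type depends on the object and the Python version.
        return False
    return True


# Nested containers plus standard-library objects round-trip unchanged.
original = {
    "when": datetime.date(2024, 1, 31),
    "ratio": Fraction(1, 3),
    "tags": {"a", "b"},
    "nested": [(1, 2), [3, [4]]],
}
restored = pickle.loads(pickle.dumps(original))
print(restored == original, restored is original)  # True False

# A lambda has no importable name, and a lock wraps an OS resource.
print(is_picklable(lambda x: x + 1))   # False
print(is_picklable(threading.Lock()))  # False
```

The restored object is an equal but independent copy, which is what "load it back" means in practice.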
How Pickle works
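The idea behind pickle is simple: when serializing, it converts a Python object into a byte stream, and when deserializing, it restores the byte stream into a Python object. Internally, that byte stream is a sequence of opcodes that a small stack-based virtual machine executes to rebuild the object. Let’s look at some concrete examples below.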
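The opcode stream inside a pickle can be inspected with the standard-library pickletools module, which decodes the bytes without executing them. A minimal sketch follows; protocol 2 is chosen only to keep the listing short, since newer default protocols add extra framing opcodes.

```python
import io
import pickle
import pickletools

data = {"name": "Alice", "age": 25}
payload = pickle.dumps(data, protocol=2)

# Human-readable disassembly; nothing in the stream is executed.
listing = io.StringIO()
pickletools.dis(payload, out=listing)
print(listing.getvalue())

# The same stream as a plain list of opcode names.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(payload)]
print(opcode_names)
```

The listing starts with PROTO, builds the dictionary with EMPTY_DICT followed by the keys and values and a SETITEMS instruction, and ends with STOP. This instruction-driven design is what later makes pickle dangerous, because some opcodes import and call functions.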
Serialization and deserialization with Pickle
Serialization
Serialization converts Python objects into byte streams. We can use pickle.dump and pickle.dumps to do this:
- pickle.dump: serializes the object and writes it to a file.
- pickle.dumps: returns the serialized byte stream.
import pickle

# Create an object
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}

# Serialize the object and write it to a file (binary mode is required)
with open('', 'wb') as file:
    pickle.dump(data, file)

# Or get the byte stream directly
data_bytes = pickle.dumps(data)
Deserialization
Deserialization restores a byte stream to a Python object. We can use pickle.load and pickle.loads to do this:
- pickle.load: reads the byte stream from a file and deserializes it.
- pickle.loads: deserializes a byte stream directly.
import pickle

# Deserialize the object from a file
with open('', 'rb') as file:
    data = pickle.load(file)

# Or deserialize a byte stream directly
data = pickle.loads(data_bytes)
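The snippets above leave the file path empty, so here is a self-contained variant that writes to a temporary directory and checks both round trips. The file name data.pkl is hypothetical; any path you can write to would do.

```python
import os
import pickle
import tempfile

data = {"name": "Alice", "age": 25, "city": "New York"}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")  # hypothetical file name

    # Files must be opened in binary mode: pickle reads and writes bytes.
    with open(path, "wb") as file:
        pickle.dump(data, file)
    with open(path, "rb") as file:
        from_file = pickle.load(file)

# In-memory round trip through a bytes object.
data_bytes = pickle.dumps(data)
from_bytes = pickle.loads(data_bytes)

print(from_file == data, from_bytes == data)  # True True
print(type(data_bytes).__name__)              # bytes
```

Opening the file in text mode ('w' instead of 'wb') fails, because pickle.dump writes bytes rather than str.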
3. Principles of deserialization attacks
Attack mechanism
Now let's look at how a deserialization attack works. An attacker can embed malicious code in serialized data, and that code is executed when you deserialize the data. This is possible because a pickle stream is not just passive data: it can instruct the unpickler to import a callable and call it with arguments. In other words, deserializing data from an untrusted source is equivalent to giving the attacker the opportunity to execute code on your system.
What can an attacker do
An attacker can use deserialization vulnerabilities to execute arbitrary commands and to modify or steal data.
Sample code
To illustrate the problem more clearly, let's look at a simple example of a deserialization attack.
import os
import pickle

# Construct the malicious class
class Malicious:
    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling os.system('echo Hacked!')
        return (os.system, ('echo Hacked!',))

# Serialize the malicious object
malicious_data = pickle.dumps(Malicious())

# The command runs during deserialization
pickle.loads(malicious_data)
In this example, we create a class named Malicious. Its __reduce__ method returns a tuple whose first element is os.system and whose second element is a tuple holding the command to execute. When we deserialize this object, os.system('echo Hacked!') is executed and prints "Hacked!".
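To see the mechanism without running a shell command, the sketch below swaps os.system for the harmless built-in len; Harmless is a hypothetical class name used only for this illustration. Pay attention to the timing: pickle.dumps calls __reduce__, while pickle.loads calls the callable that __reduce__ returned, and loads returns that call's result instead of a Harmless instance.

```python
import pickle


class Harmless:
    """Stand-in for Malicious: __reduce__ names a harmless callable."""

    def __reduce__(self):
        # (callable, args): the unpickler will call len("abc")
        return (len, ("abc",))


payload = pickle.dumps(Harmless())  # __reduce__ runs here, at dump time
result = pickle.loads(payload)      # len("abc") runs here, at load time
print(result)  # 3
```

Whoever produces the bytes chooses the callable and its arguments; the code that calls pickle.loads has no say.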
Detailed explanation
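- Construct malicious code: We define a Malicious class and specify the command to execute in its __reduce__ method.
- Serialize the malicious object: We use pickle.dumps to serialize this object. pickle.dumps calls __reduce__ and stores a reference to os.system together with its arguments in the byte stream; nothing is executed yet.
- Deserialize the malicious object: When we use pickle.loads to deserialize this object, the unpickler imports os.system and calls it with the stored arguments, so the specified command is executed.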
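The serialization step can be checked directly by looking inside the payload with pickletools, which decodes opcodes without executing them. This is a sketch for understanding the attack; treat such listings as a debugging aid rather than a reliable filter for malicious pickles.

```python
import os
import pickle
import pickletools


class Malicious:
    def __reduce__(self):
        return (os.system, ("echo Hacked!",))


# Serializing calls __reduce__ but does NOT call os.system;
# the function is stored by module and name, next to its arguments.
payload = pickle.dumps(Malicious())

# Inspect the stream without executing it (pickle.loads is never called).
for opcode, arg, pos in pickletools.genops(payload):
    print(f"{pos:3} {opcode.name:<18} {arg!r}")
```

On Linux the listing names the module posix (the platform module that provides os.system), then system, then the command string, followed by REDUCE: the opcode that performs the call at load time.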
4. How to prevent Pickle deserialization attacks
The principle of safe deserialization
The first principle for preventing deserialization attacks is: never deserialize data from untrusted sources. Only unpickle data when you fully trust its source and can be sure it has not been tampered with along the way.
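When pickled data has to pass through storage or a network you do not fully control, one common way to make sure it really came from you is to sign it with an HMAC and verify the signature before loading, as the Python documentation suggests. The sketch below assumes a shared secret key; the key value and the names signed_dumps and verified_loads are hypothetical. Signing only proves the data was produced by a key holder; it does not make it safe to unpickle data written by anyone else.

```python
import hashlib
import hmac
import pickle

# Hypothetical secret; in practice load it from configuration, never hard-code it.
SECRET_KEY = b"example-secret-key"
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def signed_dumps(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verified_loads(blob, key=SECRET_KEY):
    """Unpickle only if the tag matches; otherwise refuse without loading."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = signed_dumps({"name": "Alice", "age": 25})
print(verified_loads(blob))
```

Any modified byte, or a different key, makes verified_loads raise before pickle.loads ever sees the payload.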
Practical defense methods
Let's look at some specific defense methods and code examples.
Safe deserialization code examples
If you have to deserialize with pickle, you can override find_class in a pickle.Unpickler subclass to restrict which object types may be deserialized:
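import io
import pickle

# Custom Unpickler that restricts which types can be deserialized
class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Only allow a few safe built-in types
        if module == "builtins" and name in {"str", "list", "dict", "set", "int", "float", "bool"}:
            return getattr(__import__(module), name)
        # Refuse every other global (for example os.system)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(s):
    # Drop-in replacement for pickle.loads(s)
    return RestrictedUnpickler(io.BytesIO(s)).load()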
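In this example, we define a custom RestrictedUnpickler class that only allows certain safe built-in types to be deserialized. Any other global that the stream tries to load, such as os.system, raises an UnpicklingError instead of being imported.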
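A variant of the same idea keys the allowlist on (module, name) pairs, which makes it easy to permit specific classes your application actually needs, and delegates to the base find_class only for allowed names. The allowlist contents and the names AllowlistUnpickler and allowlist_loads are hypothetical. The sketch also replays the Malicious payload from section 3 to show that it is rejected before os.system is ever imported. Plain dicts, lists, strings, and numbers are encoded with dedicated opcodes and never pass through find_class.

```python
import collections
import io
import os
import pickle

# Hypothetical allowlist of (module, name) pairs this application expects.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def allowlist_loads(data):
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Malicious:
    def __reduce__(self):
        return (os.system, ("echo Hacked!",))


# Allowed: an OrderedDict and plain built-in containers.
print(allowlist_loads(pickle.dumps(collections.OrderedDict(a=1))))
print(allowlist_loads(pickle.dumps({"a": [1, 2.5, "x"]})))

# Rejected: the payload asks for os.system, which is not on the list.
try:
    allowlist_loads(pickle.dumps(Malicious()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Keeping the allowlist as small as possible matters: every class on it is something an attacker can instantiate.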
Use other secure serialization modules (such as JSON)
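A safer approach is to use JSON instead of pickle for serialization and deserialization. JSON only supports basic data types (objects, arrays, strings, numbers, booleans, and null), and json.loads only builds those values without executing code, so parsing untrusted input cannot run arbitrary commands the way unpickling can.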
import json

# Serialize the object
data = {'name': 'Alice', 'age': 25, 'city': 'New York'}
data_json = json.dumps(data)

# Deserialize the object
data = json.loads(data_json)
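The trade-off is that JSON will not serialize your own classes automatically; you convert them to plain data yourself and rebuild them explicitly, which is also where you can validate untrusted input. A minimal sketch, assuming a hypothetical User dataclass and hypothetical helpers user_to_json and user_from_json:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    """Hypothetical application type used for illustration."""
    name: str
    age: int
    city: str


def user_to_json(user):
    # Convert explicitly to a plain dict; json cannot encode arbitrary objects.
    return json.dumps(asdict(user))


def user_from_json(text):
    # json.loads only builds dict, list, str, int, float, bool and None;
    # we then coerce the fields and construct the object ourselves.
    raw = json.loads(text)
    return User(name=str(raw["name"]), age=int(raw["age"]), city=str(raw["city"]))


alice = User("Alice", 25, "New York")
text = user_to_json(alice)
print(text)                          # {"name": "Alice", "age": 25, "city": "New York"}
print(user_from_json(text) == alice)  # True

try:
    json.dumps(alice)  # no implicit object serialization
except TypeError as exc:
    print(exc)
```

Because reconstruction is written by hand, the data can only ever become the types your own code decides to create.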
Summary
This article explained what serialization and deserialization are and introduced Python's pickle module. It then explained the principle behind deserialization attacks in detail and gave an attack code example. Finally, we discussed how to prevent pickle deserialization attacks and provided some specific defense methods.