Practical environment
protoc-25.
Download address:
/protocolbuffers/protobuf/releases
/protocolbuffers/protobuf/releases/download/v25.4/protoc-25.
protobuf 5.27.2
pip install protobuf==5.27.2
Python 3.9.13
Problem domain
The example that this article will use is a very simple "address book" application that can read and write people's contact information from files. Everyone in the address book has a name, an ID, an email address and a contact number.
How to serialize and retrieve such structured data? There are several ways to solve this problem:
Use Python pickle. This is the default method because it is built into the language, but it doesn't handle schema evolution well, and it doesn't work well if you need to share data with applications written in C++ or Java.
You can invent a special way to encode data items into a single string, for example encode 4 integers as "12:3:-23:67". This is a simple and flexible approach, although it does require writing one-time encoding and parsing code, and the runtime cost of parsing is small. This is best for encoding very simple data.
Serialize data to XML. This approach is very attractive because XML is (to some extent) human readable and has binding libraries for many languages. This may be a good choice if you want to share data with other applications/projects. However, XML is well known to be space-intensive, and encoding/decoding it can impose a large performance penalty on the application. In addition, navigating an XML DOM tree is considerably more complicated than accessing simple fields in a class.
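As a point of comparison, the ad hoc string encoding from the second option can be sketched in a few lines. The function names here are hypothetical and chosen only for illustration. The sketch works for a flat list of integers, but every new data shape would need its own encoder and parser.

```python
def encode_ints(values):
    """Join integers into one colon-separated string."""
    return ":".join(str(v) for v in values)


def decode_ints(text):
    """Parse a colon-separated string back into a list of integers."""
    if text == "":
        return []
    return [int(part) for part in text.split(":")]


encoded = encode_ints([12, 3, -23, 67])
print(encoded)               # 12:3:-23:67
print(decode_ints(encoded))  # [12, 3, -23, 67]
```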
Protocol buffers can be used instead of these options. Protocol buffers are a flexible, efficient and automated solution to this problem. With protocol buffers, you write a .proto description of the data structure you want to store. From that file, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data in an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and handles the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports extending the format over time in such a way that the code can still read data encoded in the old format.
Define the protocol format (write a proto file)
To create the address book application, you need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, and then specify a name and a type for each field in the message.
Example:
syntax = "proto2"; // proto2 specifies the version of the protocol buffer syntax

package tutorial;

message Person {
  optional string name = 1;
  optional int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    PHONE_TYPE_UNSPECIFIED = 0;
    PHONE_TYPE_MOBILE = 1;
    PHONE_TYPE_HOME = 2;
    PHONE_TYPE_WORK = 3;
  }

  message PhoneNumber {
    optional string number = 1;
    optional PhoneType type = 2 [default = PHONE_TYPE_HOME];
  }

  repeated PhoneNumber phones = 4; // phones is a repeated field, so it can hold multiple phone numbers
}

message AddressBook {
  repeated Person people = 1;
}
Notes:
The .proto file above starts with a package declaration, which helps prevent naming conflicts between different projects. In Python, packages are normally determined by directory structure, so the package defined in the .proto file has no effect on the generated code. You should still declare one, however, to avoid name collisions in the protocol buffers namespace as well as in non-Python languages.
Next come the message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double and string. You can also add further structure to your messages by using other message types as field types: in the example above, the Person message contains PhoneNumber messages, and the AddressBook message contains Person messages. You can even define message types nested inside other messages: as shown above, the PhoneNumber type is defined inside Person. You can also define an enum type if you want one of your fields to take one of a predefined list of values. Here, a phone number's type can be one of the following:
PHONE_TYPE_MOBILE
PHONE_TYPE_HOME
PHONE_TYPE_WORK
The "= 1" and "= 2" markers on each element identify the unique field number (the "tag") that the field uses in the binary encoding, so that every field can be identified correctly during serialization and deserialization. On the wire, a field is identified by this number together with a wire type rather than by its name, which is why each number must be unique within a message. Field numbers 1 to 15 take one byte less to encode than higher numbers, so as an optimization you can use them for commonly used or repeated elements and leave numbers 16 and higher for less commonly used optional elements. Each element in a repeated field requires re-encoding the field number, so repeated fields are particularly good candidates for this optimization.
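The one-byte saving for field numbers 1 to 15 follows from how the key in front of each field is built: the field number is shifted left by three bits, combined with the wire type, and written as a varint. The following is a minimal sketch of that calculation in plain Python, with helper names that are hypothetical and not part of the protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protocol buffer varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def field_key(field_number, wire_type):
    """Build the key that precedes a field's value on the wire."""
    return encode_varint((field_number << 3) | wire_type)


WIRE_VARINT = 0  # wire type used by int32, bool and enum values

# Print the key size in bytes for a few field numbers.
for number in (1, 15, 16, 2047, 2048):
    print(number, len(field_key(number, WIRE_VARINT)))
```

Field numbers 1 and 15 produce a one-byte key, 16 and 2047 a two-byte key, and 2048 a three-byte key.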
Each field must be annotated with one of the following modifiers:
- optional: The field may or may not be set. If an optional field value is not set, a default value is used. For simple types, you can specify your own default value, as is done for the phone number type in the example. Otherwise a system default is used: zero for numeric types, the empty string for strings, and false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field that has not been explicitly set always returns that field's default value.
- repeated: The field may be repeated any number of times (including zero). Think of repeated fields as dynamically sized arrays; the order of the repeated values is preserved in the protocol buffer.
- required: A value for the field must be provided, otherwise the message is considered "uninitialized". Serializing an uninitialized message raises an exception, and parsing an uninitialized message fails. Other than that, a required field behaves exactly like an optional field.
Important
required is permanent. Be very careful about marking fields as required. If at some point you want to stop writing or sending a required field, changing it to an optional field is problematic: old readers will consider messages without the field incomplete and may reject or drop them unintentionally. Consider writing application-specific custom validation routines for your protocol buffers instead. Within Google, required fields are strongly disfavored; most messages defined in proto2 syntax use only optional and repeated. (Proto3 does not support required fields at all.)
Compile protocol buffer
Now that you have a .proto file, the next step is to generate the classes needed to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, run the protocol buffer compiler, protoc, on the .proto file:
1. Download protoc, unzip it, and add the bin directory that contains protoc to the system PATH environment variable. Then check that the compiler can be found:
>protoc --version
libprotoc 25.4
2. Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you do not provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto file, as follows:
protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/
Because a Python class is wanted here, the --python_out option is used; similar options are provided for the other supported languages.
protoc can also generate Python stubs (.pyi files) with the --pyi_out option.
This generates xxxx_pb2.py in the destination directory you specified.
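If you prefer to drive the compiler from a script, the command line described above can be assembled as an argument list. This is only a sketch: the helper name and its parameters are hypothetical, and it uses nothing beyond the -I, --python_out and --pyi_out options already mentioned.

```python
import shlex


def protoc_command(src_dir, dst_dir, proto_file, with_stubs=False):
    """Return the protoc argument list for generating Python code."""
    args = ["protoc", f"-I={src_dir}", f"--python_out={dst_dir}"]
    if with_stubs:
        # Also emit .pyi type stubs next to the generated module.
        args.append(f"--pyi_out={dst_dir}")
    args.append(f"{src_dir}/{proto_file}")
    return args


cmd = protoc_command(".", ".", "addressbook.proto2", with_stubs=True)
print(shlex.join(cmd))
# To actually run it (requires protoc on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```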
Practice: open a cmd console, change to the directory that contains addressbook.proto2, and run the following command:
protoc --python_out=. addressbook.proto2
After the command succeeds, a directory named after the part of the file name before .proto2 (addressbook in this example) is created in the current directory, and the Python file is generated inside it (proto2_pb2.py in this example). Copy that file to the directory that contains addressbook.proto2 and rename it addressbook_pb2.py. (Had the definition file been named addressbook.proto, protoc would have written addressbook_pb2.py directly.)
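The copy-and-rename step can also be scripted. The sketch below assumes the layout described above (addressbook/proto2_pb2.py under the working directory); the helper name is hypothetical, and the demonstration at the end uses a temporary directory with a placeholder file so that it runs without protoc.

```python
import shutil
import tempfile
from pathlib import Path


def install_generated_module(work_dir, generated="addressbook/proto2_pb2.py",
                             target="addressbook_pb2.py"):
    """Copy the generated file next to the .proto2 file under a new name."""
    work_dir = Path(work_dir)
    source = work_dir / generated
    if not source.is_file():
        raise FileNotFoundError(f"run protoc first: {source} is missing")
    destination = work_dir / target
    shutil.copyfile(source, destination)
    return destination


# Demonstration with a placeholder file standing in for protoc's output.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "addressbook"
    out_dir.mkdir()
    (out_dir / "proto2_pb2.py").write_text("# placeholder for generated code\n")
    print(install_generated_module(tmp).name)
```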
Protocol Buffer API
Unlike when you generate Java and C++ protocol buffer code, the Python protocol buffer compiler does not generate the data access code for you directly. Instead (as you will see if you look at addressbook_pb2.py), it generates special descriptors for all your messages, enums and fields, and some mysteriously empty classes, one for each message type.
# -*- coding: utf-8 -*-
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: addressbook.proto2
# Protobuf Python Version: 4.25.4
"""Generated protocol buffer code."""
from google.protobuf import descriptor as _descriptor
from google.protobuf import descriptor_pool as _descriptor_pool
from google.protobuf import symbol_database as _symbol_database
from google.protobuf.internal import builder as _builder
# @@protoc_insertion_point(imports)

_sym_db = _symbol_database.Default()


DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(b'\n\x12\x61\x64\x64ressbook.proto2\x12\x08tutorial\"\xa3\x02\n\x06Person\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\n\n\x02id\x18\x02 \x01(\x05\x12\r\n\x05\x65mail\x18\x03 \x01(\t\x12,\n\x06phones\x18\x04 \x03(\x0b\x32\x1c.tutorial.Person.PhoneNumber\x1aX\n\x0bPhoneNumber\x12\x0e\n\x06number\x18\x01 \x01(\t\x12\x39\n\x04type\x18\x02 \x01(\x0e\x32\x1a.tutorial.Person.PhoneType:\x0fPHONE_TYPE_HOME\"h\n\tPhoneType\x12\x1a\n\x16PHONE_TYPE_UNSPECIFIED\x10\x00\x12\x15\n\x11PHONE_TYPE_MOBILE\x10\x01\x12\x13\n\x0fPHONE_TYPE_HOME\x10\x02\x12\x13\n\x0fPHONE_TYPE_WORK\x10\x03\"/\n\x0b\x41\x64\x64ressBook\x12 \n\x06people\x18\x01 \x03(\x0b\x32\x10.tutorial.Person')

_globals = globals()
_builder.BuildMessageAndEnumDescriptors(DESCRIPTOR, _globals)
_builder.BuildTopDescriptorsAndMessages(DESCRIPTOR, 'addressbook.proto2_pb2', _globals)
if _descriptor._USE_C_DESCRIPTORS == False:
  DESCRIPTOR._options = None
  _globals['_PERSON']._serialized_start=33
  _globals['_PERSON']._serialized_end=324
  _globals['_PERSON_PHONENUMBER']._serialized_start=130
  _globals['_PERSON_PHONENUMBER']._serialized_end=218
  _globals['_PERSON_PHONETYPE']._serialized_start=220
  _globals['_PERSON_PHONETYPE']._serialized_end=324
  _globals['_ADDRESSBOOK']._serialized_start=326
  _globals['_ADDRESSBOOK']._serialized_end=373
# @@protoc_insertion_point(module_scope)
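The important line in each class is the __metaclass__ = assignment, which names the GeneratedProtocolMessageType metaclass. Metaclasses can be thought of as templates for creating classes. At load time, the GeneratedProtocolMessageType metaclass uses the specified descriptors to create all the Python methods you need to work with each message type and adds them to the relevant classes. You can then use the fully populated classes in your code. (The file generated by protoc 25.4 shown above has no such explicit line per class; there the classes appear to be built by the _builder.BuildTopDescriptorsAndMessages call.)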
The end effect of all this is that you can use the Person class as if it defined each field of the Message base class as a regular field. For example:
import addressbook_pb2

person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = addressbook_pb2.Person.PHONE_TYPE_HOME
Note that these assignments do not simply add arbitrary new fields to a generic Python object. If you try to assign a field that is not defined in the .proto file, an AttributeError is raised. If you assign a value of the wrong type to a field, a TypeError is raised. Also, reading the value of a field before it has been set returns the default value.
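To make the idea of a metaclass that builds checked fields from a description more concrete, here is a toy stand-in written in plain Python. It is not how the protobuf library is implemented; ToyMessageType, FIELDS and the helper function are hypothetical names invented for this sketch. It only mimics the three behaviors described above: defaults before assignment, TypeError on a wrong type, and AttributeError on an undefined field.

```python
def _checked_property(field, py_type, default):
    """Build a property that type-checks writes and falls back to a default."""
    slot = "_" + field

    def getter(self):
        # An unset slot raises AttributeError internally, so the default is used.
        return getattr(self, slot, default)

    def setter(self, value):
        if not isinstance(value, py_type):
            raise TypeError(
                f"{field} expects {py_type.__name__}, got {type(value).__name__}")
        setattr(self, slot, value)

    return property(getter, setter)


class ToyMessageType(type):
    """Toy metaclass: turns a FIELDS table into checked attributes."""

    def __new__(mcls, name, bases, namespace):
        fields = namespace.get("FIELDS", {})
        # __slots__ removes the instance __dict__, so unknown names are rejected.
        namespace["__slots__"] = tuple("_" + field for field in fields)
        for field, (py_type, default) in fields.items():
            namespace[field] = _checked_property(field, py_type, default)
        return super().__new__(mcls, name, bases, namespace)


class Person(metaclass=ToyMessageType):
    # Hypothetical field table: name -> (Python type, default value)
    FIELDS = {"name": (str, ""), "id": (int, 0), "email": (str, "")}


person = Person()
print(repr(person.name))      # default value before the field is set
person.id = 1234
try:
    person.id = "1234"        # wrong type
except TypeError as exc:
    print("TypeError:", exc)
try:
    person.no_such_field = 1  # field not in the description
except AttributeError as exc:
    print("AttributeError:", exc)
```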
Enums
The metaclass expands an enum into a set of symbolic constants with integer values. So, for example, the constant addressbook_pb2.Person.PHONE_TYPE_WORK has the value 3.
Standard message method
Each message class also contains many other methods that allow you to check or manipulate the entire message, including:
- IsInitialized(): Checks whether all required fields have been set.
- __str__(): Returns a human-readable representation of the message, particularly useful for debugging. (Usually invoked as str(message) or print(message).)
- CopyFrom(other_msg): Overwrites the message with the given message's values.
- Clear(): Clears all elements back to the empty state.
These methods implement the Message interface. For more information, see the complete API documentation for Message.
Parsing and serialization
Each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:
- SerializeToString(): Serializes the message and returns it as a bytes object. Note that the bytes are binary, not text; despite the method's name, the result is only a convenient container for the encoded data.
- ParseFromString(data): Parses a message from the given bytes.
These are just some of the options used for parsing and serialization. Again, see the Message API reference for the complete list.
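To give a feel for what the binary format looks like, the sketch below hand-encodes the name (field 1) and id (field 2) of a Person without using the protobuf library, then decodes the bytes again. It is a simplified illustration with hypothetical function names: it handles only non-negative varints and length-delimited fields, whereas SerializeToString() and ParseFromString() cover the full format.

```python
def encode_varint(value):
    """Encode a non-negative integer as a varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data, pos):
    """Return (value, next_position) for the varint starting at pos."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_person(name, person_id):
    """Encode Person.name (field 1, string) and Person.id (field 2, int32)."""
    raw_name = name.encode("utf-8")
    return (
        encode_varint((1 << 3) | 2) + encode_varint(len(raw_name)) + raw_name
        + encode_varint((2 << 3) | 0) + encode_varint(person_id)
    )


def decode_fields(data):
    """Decode varint and length-delimited fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:      # varint
            value, pos = decode_varint(data, pos)
        elif wire_type == 2:    # length-delimited (strings, bytes, messages)
            length, pos = decode_varint(data, pos)
            value = data[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[number] = value
    return fields


payload = encode_person("John", 1234)
print(payload.hex(" "))        # 0a 04 4a 6f 68 6e 10 d2 09
print(decode_fields(payload))  # {1: b'John', 2: 1234}
```

The payload contains no field names, only field numbers and raw values, which is why the result is binary data rather than text.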
Important
Protocol Buffers and Object-Oriented Design: Protocol buffer classes are basically data holders (like structs in C) that provide no additional functionality; they do not make good first-class citizens in an object model. If you want to add richer behavior to a generated class, the best way is to wrap the generated protocol buffer class in an application-specific class. Wrapping protocol buffers is also a good idea if you do not control the design of the .proto file (for example, if you are reusing one from another project). In that case, you can use the wrapper class to craft an interface better suited to the unique environment of your application: hiding some data and methods, exposing convenience functions, and so on. You should never add behavior to the generated classes by inheriting from them. Doing so breaks internal mechanisms and is not good object-oriented practice anyway.
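A wrapper of the kind recommended above might look like the following sketch. The Contact class and its members are hypothetical. To keep the snippet runnable on its own, a SimpleNamespace object stands in for an addressbook_pb2.Person message; with the generated module available you would pass a real Person instead.

```python
from types import SimpleNamespace


class Contact:
    """Application-level wrapper that holds a Person-like message."""

    def __init__(self, person):
        self._person = person  # composition, not inheritance

    @property
    def display_name(self):
        """Name, followed by the e-mail address when one is present."""
        if self._person.email:
            return f"{self._person.name} <{self._person.email}>"
        return self._person.name

    def phone_numbers(self):
        """Return just the number strings, hiding the PhoneNumber messages."""
        return [phone.number for phone in self._person.phones]


# Stand-in for addressbook_pb2.Person so the sketch runs without generated code.
person = SimpleNamespace(
    name="John Doe",
    email="",
    phones=[SimpleNamespace(number="555-4321")],
)
contact = Contact(person)
print(contact.display_name)     # John Doe
print(contact.phone_numbers())  # ['555-4321']
```

The wrapper exposes only what the application needs and leaves the generated class untouched.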
Write a message
Suppose the first thing you want your address book application to do is write personal details to the address book file. To do this, you need to create and populate instances of your protocol buffer classes and then write them to an output stream.
This sample program reads an AddressBook from a file, adds one new Person to it based on user input, and writes the new AddressBook back out to the file again.
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
import addressbook_pb2
import os


def PromptForAddress(person):
    '''Fill in a Person message based on user input'''
    person.id = int(input('Enter person ID number: '))
    person.name = input('Enter name: ')
    email = input('Enter email address (blank for none): ')
    if email != '':
        person.email = email
    while True:
        number = input('Enter a phone number (or leave blank to finish): ')
        if number == '':
            break
        phone_number = person.phones.add()
        phone_number.number = number
        phone_type = input('Is this a mobile, home, or work phone? ')
        if phone_type == 'mobile':
            phone_number.type = addressbook_pb2.Person.PHONE_TYPE_MOBILE
        elif phone_type == 'home':
            phone_number.type = addressbook_pb2.Person.PHONE_TYPE_HOME
        elif phone_type == 'work':
            phone_number.type = addressbook_pb2.Person.PHONE_TYPE_WORK
        else:
            print('Unknown phone type; leaving as default value.')


address_book = addressbook_pb2.AddressBook()

# Read the existing address book, if there is one
if os.path.exists('my_addressbook.db'):
    with open('my_addressbook.db', 'rb') as f:
        address_book.ParseFromString(f.read())

# Add an entry to the address book
PromptForAddress(address_book.people.add())

# Write the address book back to disk
with open('my_addressbook.db', 'wb') as f:
    f.write(address_book.SerializeToString())
After running the program, enter the requested values at the prompts, for example:
Enter person ID number: 1
Enter name: shouke
Enter email address (blank for none): shouke@
Enter a phone number (or leave blank to finish): 15813735565
Is this a mobile, home, or work phone? mobile
Enter a phone number (or leave blank to finish):
Read the message
This example reads the file created by the above example and prints all the information in it
# -*- coding:utf-8 -*-
import addressbook_pb2


def ListPeople(address_book):
    '''Iterate over all people in the address book and print their information'''
    for person in address_book.people:
        print('Person ID: ', person.id)
        print('Name: ', person.name)
        if person.HasField('email'):
            print('E-mail address: ', person.email)
        for phone_number in person.phones:
            if phone_number.type == addressbook_pb2.Person.PHONE_TYPE_MOBILE:
                print('Mobile phone #: ', end='')
            elif phone_number.type == addressbook_pb2.Person.PHONE_TYPE_HOME:
                print('Home phone #: ', end='')
            elif phone_number.type == addressbook_pb2.Person.PHONE_TYPE_WORK:
                print('Work phone #: ', end='')
            print(phone_number.number)


address_book = addressbook_pb2.AddressBook()

# Read the existing address book
with open('my_addressbook.db', 'rb') as f:
    address_book.ParseFromString(f.read())

ListPeople(address_book)
Run output:
Person ID:  1
Name:  shouke
E-mail address:  shouke@
Mobile phone #: 15813735565
Another example
This example defines a message named Device with four fields: name, price, type and labels.
syntax = "proto3";

message Device {
  string name = 1;
  int32 price = 2;
  string type = 3;
  map<string, string> labels = 15;
}
Generate the Python file from the definition file:
protoc --python_out=.
This automatically creates a device directory and the file device/proto3_pb2.py under the current directory.
Use the generated file (copy the generated file above, rename it to device_pb2.py, and store it in the same directory as the following file):
my_test.py
# -*- coding:utf-8 -*-
import device_pb2

# Create a Device object and set its field values
device = device_pb2.Device()
device.name = 'Lenovo Xiaoxing'
device.price = 3999
device.type = 'Notebook'
device.labels['color'] = 'red'
device.labels['outlook'] = 'fashionable'

# Serialize the Device object to a binary string
serialized_device = device.SerializeToString()
print(f"Serialized data:{serialized_device}")

# Deserialize the binary string into a new Device object
new_device = device_pb2.Device()
new_device.ParseFromString(serialized_device)

# Output the field values of the new Device object
print(type(new_device.labels))  # <class 'google._upb._message.ScalarMapContainer'>
for label, value in new_device.labels.items():
    print(label, value)  # prints lines such as: color red
print(new_device.labels)  # {'color': 'red', 'outlook': 'fashionable'}
print(f'Deserialized data:Device name={new_device.name}, price={new_device.price}, type={new_device.type}, Label={new_device.labels}')
# Output: Deserialized data:Device name=Lenovo Xiaoxing, price=3999, type=Notebook, Label={'color': 'red', 'outlook': 'fashionable'}