Four Ways to Generate MySQL Primary Keys, Compared

Preface

In database design, the choice of primary key is crucial. It is not only the unique identifier of each row; it also directly affects query efficiency, storage, and even the scalability of the system architecture. Common ways to generate primary keys in MySQL include auto-increment IDs, UUIDs, and the Snowflake algorithm. Each has its own suitable scenarios, advantages, and disadvantages. Below we analyze four common methods of primary key generation.

1. Auto-increment ID

What is an auto-increment ID?

An auto-increment ID is MySQL's built-in way of generating primary keys. With the AUTO_INCREMENT keyword, the primary key value is assigned automatically each time a row is inserted, increasing by 1 by default.

Case: Order Table

CREATE TABLE orders (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT,
    amount DECIMAL(10,2)
);

When inserting data, there is no need to specify the id field:

INSERT INTO orders (user_id, amount) VALUES (1001, 99.99);

Advantages

Simple and efficient: The database generates the value automatically, so development cost is low.
Good insert and query performance: Keys increase in order, which reduces index fragmentation and keeps the B+ tree compact.
Small storage footprint: Usually INT (4 bytes) or BIGINT (8 bytes).

Disadvantages

Hard to shard: When data is split across databases and tables, each instance counts independently, so auto-increment IDs can repeat in a distributed system.
Exposes business information: Sequential IDs let outsiders infer business volume (for example, the number of orders).
Troublesome data migration: IDs easily conflict when merging tables from different databases.
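The collision problem described above can be illustrated with a small simulation. This is only a sketch: AutoIncrementCounter is a hypothetical helper that mimics a per-database counter, not a MySQL API. The second call shows one commonly used mitigation, giving each instance a different starting offset and a shared step (MySQL exposes this idea through the auto_increment_offset and auto_increment_increment system variables); it is shown here for illustration and is not part of the comparison in this article.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simulates one database's auto-increment counter (hypothetical helper, not a MySQL API). */
final class AutoIncrementCounter {
    private long next;
    private final long step;

    AutoIncrementCounter(long offset, long step) {
        this.next = offset;
        this.step = step;
    }

    /** Returns the current value and advances the counter by the step. */
    long nextId() {
        long id = next;
        next += step;
        return id;
    }

    /** Generates `count` IDs from a fresh counter with the given offset and step. */
    static List<Long> generate(long offset, long step, int count) {
        AutoIncrementCounter counter = new AutoIncrementCounter(offset, step);
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(counter.nextId());
        }
        return ids;
    }

    /** Counts IDs present in both lists, i.e. conflicts that would occur when merging the tables. */
    static int countCollisions(List<Long> a, List<Long> b) {
        Set<Long> seen = new HashSet<>(a);
        int collisions = 0;
        for (Long id : b) {
            if (seen.contains(id)) {
                collisions++;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Two independent databases, both starting at 1 with step 1: every ID collides.
        System.out.println(countCollisions(generate(1, 1, 5), generate(1, 1, 5))); // prints 5
        // Staggered counters: A yields 1,3,5,... and B yields 2,4,6,...: no collisions.
        System.out.println(countCollisions(generate(1, 2, 5), generate(2, 2, 5))); // prints 0
    }
}
```

Staggering only works while the number of instances stays fixed, which is one reason the distributed schemes in the following sections are usually preferred for sharded systems.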

2. UUID

What is UUID?

A UUID is a 128-bit identifier, usually written as a 36-character string (e.g. 550e8400-e29b-41d4-a716-446655440000), and is unique worldwide for all practical purposes.

Case: User table

CREATE TABLE users (
    uuid CHAR(36) PRIMARY KEY,
    name VARCHAR(50)
);

Generate UUID when inserting data:

INSERT INTO users (uuid, name) VALUES (UUID(), 'Zhang San');

Advantages

Globally unique: There is no need to worry about ID conflicts in distributed systems.
Harder to infer: IDs are not sequential, so they do not reveal business volume. Note that MySQL's UUID() function produces time-based (version 1) values, so they should not be treated as unpredictable secrets.

Disadvantages

Large storage footprint: A 36-character string takes far more space than an integer; even when stored in binary form it still needs 16 bytes.
Poor insert performance: Unordered values land at random positions in the primary key index, causing frequent page splits and slower inserts.
Poor readability: Long strings are hard to remember and unintuitive when debugging.
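The storage difference mentioned above (36 characters as text versus 16 bytes as binary) can be seen with a short sketch. UuidBinary is a hypothetical helper name; the byte order used here is simply the textual order of the UUID, which is an assumption about how the value would be stored in a BINARY(16) column.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Hypothetical helper that converts between a UUID and its 16-byte binary form. */
final class UuidBinary {
    /** Packs a UUID into 16 bytes, most significant bits first. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    /** Rebuilds the UUID from the 16 bytes produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("550e8400-e29b-41d4-a716-446655440000");
        byte[] packed = toBytes(uuid);
        System.out.println(uuid.toString().length());       // prints 36
        System.out.println(packed.length);                  // prints 16
        System.out.println(fromBytes(packed).equals(uuid)); // prints true
    }
}
```

The binary form roughly halves the key size compared with CHAR(36), but it does not change the ordering problem: the values are still scattered across the index.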

3. Snowflake algorithm

What is the Snowflake algorithm?

Snowflake is a distributed ID generation algorithm open-sourced by Twitter. It generates a 64-bit long integer with the following structure:
Sign bit (1 bit) + timestamp (41 bits) + machine ID (10 bits) + sequence number (12 bits).
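The bit layout above can be sketched as plain shift-and-mask arithmetic. SnowflakeLayout is a hypothetical class written for illustration, not the code of any particular library; the timestamp argument is assumed to be milliseconds relative to whatever epoch the generator uses.

```java
/** Hypothetical illustration of the 1 + 41 + 10 + 12 bit layout described above. */
final class SnowflakeLayout {
    static final int MACHINE_BITS = 10;
    static final int SEQUENCE_BITS = 12;
    static final int TIMESTAMP_SHIFT = MACHINE_BITS + SEQUENCE_BITS; // 22
    static final long MAX_TIMESTAMP = (1L << 41) - 1;
    static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1;     // 1023
    static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;      // 4095

    /** Combines the three fields into one 64-bit ID; the sign bit stays 0. */
    static long pack(long timestamp, long machineId, long sequence) {
        if (timestamp < 0 || timestamp > MAX_TIMESTAMP
                || machineId < 0 || machineId > MAX_MACHINE_ID
                || sequence < 0 || sequence > MAX_SEQUENCE) {
            throw new IllegalArgumentException("field out of range");
        }
        return (timestamp << TIMESTAMP_SHIFT) | (machineId << SEQUENCE_BITS) | sequence;
    }

    /** Extracts the timestamp field (milliseconds since the generator's epoch). */
    static long timestampOf(long id) {
        return id >>> TIMESTAMP_SHIFT;
    }

    /** Extracts the 10-bit machine ID. */
    static long machineIdOf(long id) {
        return (id >>> SEQUENCE_BITS) & MAX_MACHINE_ID;
    }

    /** Extracts the 12-bit sequence number. */
    static long sequenceOf(long id) {
        return id & MAX_SEQUENCE;
    }

    public static void main(String[] args) {
        long id = pack(1000, 5, 7);
        System.out.println(id);
        System.out.println(timestampOf(id) + " " + machineIdOf(id) + " " + sequenceOf(id));
    }
}
```

Because the timestamp occupies the highest bits, IDs generated later compare as larger, and the creation time can be recovered by shifting the ID right by 22 bits and adding the epoch.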

Case: Distributed log table

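// Java example using the Hutool library (imports: cn.hutool.core.util.IdUtil, cn.hutool.core.lang.Snowflake)
Snowflake snowflake = IdUtil.getSnowflake(1, 1); // workerId = 1, datacenterId = 1
long id = snowflake.nextId(); // Generates an ID such as 6726434627886811136

Use the BIGINT type when creating the table: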

CREATE TABLE logs (
    id BIGINT PRIMARY KEY,
    content TEXT
);

Advantages

Distributed-friendly: IDs generated on different machines do not repeat, provided each machine has a distinct machine ID.
Balanced performance and storage: BIGINT needs only 8 bytes, and the IDs trend upward over time.
Time-ordered: The creation time can be calculated directly from the ID.

Disadvantages

Depends on the server clock: If the clock is set backwards (clock rollback), duplicate IDs may be generated, so this case needs special handling.
Machine IDs must be allocated: With 10 bits there are at most 1024 distinct machine IDs, so they need to be planned in advance.
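One simple way to handle clock rollback is to refuse to generate IDs while the clock is behind the last timestamp used. The sketch below assumes that policy; SimpleSnowflake is a hypothetical class, the injected LongSupplier clock exists only to make the behavior easy to demonstrate, and the EPOCH constant is an assumption (the value used in Twitter's original implementation). Production libraries may wait, use extra state, or handle rollback differently.

```java
import java.util.function.LongSupplier;

/** Hypothetical minimal generator that rejects clock rollback; a sketch, not a library API. */
final class SimpleSnowflake {
    private static final long EPOCH = 1288834974657L; // assumption: custom epoch in milliseconds
    private static final long MAX_MACHINE_ID = 1023;  // 10 bits
    private static final long MAX_SEQUENCE = 4095;    // 12 bits

    private final long machineId;
    private final LongSupplier clock;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SimpleSnowflake(long machineId, LongSupplier clock) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be between 0 and 1023");
        }
        this.machineId = machineId;
        this.clock = clock;
    }

    synchronized long nextId() {
        long now = clock.getAsLong();
        if (now < lastTimestamp) {
            // Clock rollback: generating now could repeat an ID that was already issued.
            throw new IllegalStateException(
                    "Clock moved backwards by " + (lastTimestamp - now) + " ms");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence numbers used in this millisecond: wait for the next one.
                while ((now = clock.getAsLong()) <= lastTimestamp) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (machineId << 12) | sequence;
    }

    public static void main(String[] args) {
        SimpleSnowflake generator = new SimpleSnowflake(1, System::currentTimeMillis);
        System.out.println(generator.nextId());
        System.out.println(generator.nextId());
    }
}
```

Throwing an exception keeps the sketch short; whether to fail fast, wait for the clock to catch up, or fall back to another node is a design choice that depends on how much rollback the deployment can expect.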

4. Business field combination

What is a business field combination?

Directly use business-related fields as primary keys (such as "User ID + Product ID").

Case: Student course selection

CREATE TABLE course_selection (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id)
);

Advantages

Intuitive and clear: The primary key directly reflects the business meaning.
No redundant column: No additional primary key column is required.

Disadvantages

High maintenance cost: The primary key may need to be modified when business rules change.
Large storage footprint: A composite primary key may take up more space, and in InnoDB it is also stored in every secondary index.
Higher query complexity: Queries and joins must match on several columns, which may affect performance.

5. How to choose the primary key generation method?

Scenario	Recommended method	Reason
Single-instance MySQL	Auto-increment ID	Simple and efficient, suitable for small systems
Distributed systems	Snowflake algorithm	Balances performance and global uniqueness, suitable for sharded databases and tables
High security requirements	UUID	Avoids ID guessing, suitable for anonymous data
Strong business association	Business field combination	For example, an order number that contains the date, user ID, and other business information

Conclusion

When choosing a primary key generation method, weigh system size, scalability, performance, and business requirements together.

Auto-increment IDs suit traditional single-instance applications but cope poorly with distributed scenarios.
UUIDs solve the global uniqueness problem but sacrifice performance and storage.
The Snowflake algorithm performs well in distributed systems but must deal with the clock rollback problem.
Business field combinations are simple and effective in specific scenarios but are less flexible.

In practice, these methods are often combined. For example, in an e-commerce system the order ID can use the Snowflake algorithm to guarantee uniqueness across nodes, while the order detail table uses an auto-increment ID to keep inserts efficient. Choosing flexibly according to the characteristics of the business is how you find the best fit.
