SoFunction
Updated on 2025-04-07

MySQL library sub-table id primary key processing method

MySQL library sub-table id primary key processing

1 Problem Analysis

How to deal with the id primary key after the library is divided into different tables?

In fact, this is a problem you must face after dividing databases and tables, which is how to generate id? Because if each table is divided into multiple tables, each table is accumulated from 1, it will definitely be wrong, one is neededGlobally uniqueid to support it. So these are all issues that you must consider in your actual production environment.

2 Answers to interview questions

2.1 Database-based implementation solution

Database self-increase id

This means that every time you get an id in your system, you insert a data that has no business meaning into a table in a library, and then get an id that increases the self-increase of a database. After getting this id, write it to the corresponding database and sub-table.

The advantage of this solution is that it is convenient and simple, and anyone can use it;The disadvantage is single library generationIf the id is increased automatically, there will be bottlenecks; if you want to improve it, then open a special service. This service will get the current maximum id value every time, then increment several ids by itself, return a batch of ids at a time, and then modify the current maximum id value to a value after incrementing several ids;But it is based on a single database anyway.

Suitable scenarios: There are two reasons for you to divide the database and table, either the single database is too concurrency, or the single database data volume is too large; unless it is youConcurrency is not high, but the data volume is too largeYou can use this solution to create a library and table expansion, because the maximum concurrency per second may be at most a few hundred, so just use a separate library and table to generate a self-increment primary key.

2.2 UUID

The advantage is that local generation is not based on the database; the disadvantage is that UUID is too long and takes up a large space, and its performance as a primary key is too poor; more importantly, UUID is not orderly, which will lead to too many random write operations for the B+ tree index when writing (continuous IDs can produce partial sequential writing). In addition, since sequential append operations cannot be generated when writing, and insert operations are required, the entire B+ tree node will be read into memory. After inserting this record, the entire node will be written back to disk. This operation will significantly degrade when the record takes up a lot of space.

Suitable scenarios: If you want to randomly generate file names, numbers, etc., you can use UUID, but you cannot use UUID as the primary key.

().toString().replace(“-”, “”) -> sfsdf23423rr234sfdaf

2.3 Get the current time of the system

This is to get the current time, but the problem isWhen the concurrency is very high, for example, several thousand are sent in one second,There will be duplication, this is definitely not suitable. Basically, there is no need to consider it.

Suitable scenarios: Generally, if you use this solution, you will splice the current time with many other business fields. As an id, if you think it is acceptable in the business, then it is also OK. You can splice other business field values ​​with the current time to form a globally unique number.

2.4 snowflake algorithm

The snowflake algorithm is a distributed id generation algorithm that is open source on Twitter. It is implemented in Scala language. It uses a 64-bit long-type id. One bit is not used. 41 bits are used as the number of milliseconds, 10 bits are used as the working machine id, and 12 bits are used as the serial number.

  • 1 bit: No, why? Because if the first bit in the binary is 1, then it is all negative, but the ids we generate are all positive, so the first bit is uniformly 0.
  • 41 bit: represents a timestamp, and the unit is milliseconds. 41 bits can represent up to digits2^41 - 1, that is, it can be identified2^41 - 1The value of milliseconds is converted into adulthood, which means 69 years.
  • 10 bit: record the working machine id, which means that this service can be deployed on up to 2^10 machines, that is, 1024 machines. But 5 bits in 10 bits represent the computer room id, and 5 bits represent the machine id. It means the most representative2^5Each computer room can represent2^5One machine (32 machines).
  • 12 bit: This is used to record different ids generated in the same millisecond. The largest positive integer that 12 bit can represent is2^12 - 1 = 4096, that is, you can use the number represented by this 12-bit to distinguish itWithin the same millisecond4096 different ids.

0 | 0001100 10100010 10111110 10001001 01011100 00 | 10001 | 1 1001 | 0000 00000000

public class IdWorker {

    private long workerId;
    private long datacenterId;
    private long sequence;

    public IdWorker(long workerId, long datacenterId, long sequence) {
        // sanity check for workerId
        // I just checked it here, the requirement is that the computer room id and machine id you pass in cannot exceed 32, and cannot be less than 0        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(
                    ("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(
                    ("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        (
                "worker starting. timestamp left shift %d, datacenter id bits %d, worker id bits %d, sequence bits %d, workerid %d",
                timestampLeftShift, datacenterIdBits, workerIdBits, sequenceBits, workerId);

         = workerId;
         = datacenterId;
         = sequence;
    }

    private long twepoch = 1288834974657L;

    private long workerIdBits = 5L;
    private long datacenterIdBits = 5L;

    // This is a binary operation, which means that 5 bits can only have up to 31 numbers, which means that the machine id can only be within 32 at most.    private long maxWorkerId = -1L ^ (-1L << workerIdBits);

    // This means that 5 bits can only have up to 31 numbers, and the computer room id can only be within 32.    private long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
    private long sequenceBits = 12L;

    private long workerIdShift = sequenceBits;
    private long datacenterIdShift = sequenceBits + workerIdBits;
    private long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
    private long sequenceMask = -1L ^ (-1L << sequenceBits);

    private long lastTimestamp = -1L;

    public long getWorkerId() {
        return workerId;
    }

    public long getDatacenterId() {
        return datacenterId;
    }

    public long getTimestamp() {
        return ();
    }

    public synchronized long nextId() {
        // This is to get the current timestamp, in milliseconds        long timestamp = timeGen();

        if (timestamp < lastTimestamp) {
            ("clock is moving backwards.  Rejecting requests until %d.", lastTimestamp);
            throw new RuntimeException((
                    "Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        if (lastTimestamp == timestamp) {
            // This means that there can only be up to 4096 numbers in a millisecond            // No matter how many times you pass in, this bit operation is always within the range of 4096, avoiding you passing a sequence that exceeds the range of 4096            sequence = (sequence + 1) & sequenceMask;
            if (sequence == 0) {
                timestamp = tilNextMillis(lastTimestamp);
            }
        } else {
            sequence = 0;
        }

        // Here we record the timestamp of the last generated id, in milliseconds        lastTimestamp = timestamp;

        // This is to move the timestamp left and put it on 41 bit;        // Move the computer room id left to 5 bits;        // Move the machine id left to 5 bits; place the serial number last 12 bits;        // Finally, it is spliced ​​into a 64-bit binary number, and converted into a decimal system to be a long type        return ((timestamp - twepoch) << timestampLeftShift) | (datacenterId << datacenterIdShift)
                | (workerId << workerIdShift) | sequence;
    }

    private long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    private long timeGen() {
        return ();
    }

    // ---------------test---------------    public static void main(String[] args) {
        IdWorker worker = new IdWorker(1, 1, 1);
        for (int i = 0; i < 30; i++) {
            (());
        }
    }

}

How to say it, it probably means that 41 bit is a timestamp of the current millisecond unit, that's it; then 5 bit is the id of the computer room you passed in (but the maximum can be within 32), and the other 5 bits are the id of the machine you passed in (but the maximum can be within 32). The remaining 12 bit serial number is that if the time you generated the id in the last time is still within one millisecond, the order will be accumulated for you, with a maximum of 4096 serial numbers.

So you use this tool class yourself, set up a service yourself, and then initialize this thing for each machine in each computer room. At the beginning, the serial number of the machine in this computer room was 0. Then every time you receive a request, saying that the machine in this computer room wants to generate an id, you will find the corresponding Worker generation.

Using this snowflake algorithm, you can develop your own company's services, and even for the computer room id and machine id, you have reserved 5 bit + 5 bits for you, and you can change it to something with business meaning.

This snowflake algorithm is relatively reliable, so if you really do distributed id generation, if it is high concurrency or something, then the performance should be better when using this. Generally, a scenario with tens of thousands of concurrency per second is enough for you to use.

Summarize

The above is personal experience. I hope you can give you a reference and I hope you can support me more.