Snowflak algorithm (snowflak) generates ordered Java implementation code with no repetition ID

1. Introduction

The Snowflake Algorithm is a method of generating unique IDs in a distributed system, originally used internally by Twitter. It generates a 64-bit long integer number consisting of the following parts:

The highest bit is the sign bit, usually 0, because the ID is usually a positive number.
41 bits are used to store millisecond-level timestamps. This part is not the timestamp that stores the current time, but the difference of the timestamp (current timestamp - start timestamp), which can support about 69 years.
10 bits are used to store machine code and can support up to 1024 machines. If multiple requests arrive at the same machine within the same millisecond, the machine code can be used to distinguish different requests.
12-bits are used to store serial numbers, used for multiple requests within the same millisecond, and each machine can generate up to 4096 (0~4095) IDs per millisecond.

Advantages of the snowflake algorithm include:

In a highly concurrency distributed system, the uniqueness of the ID can be guaranteed.
Based on timestamps, the ID is basically incremented in an orderly manner.
It does not rely on third-party libraries or middleware, reducing system complexity.
Generating IDs is very efficient.

2. Snowflake algorithm diagram

The ID generated using the 64-bit long type is the following decomposition structure of a long type binary, as follows:

|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|0000|
|-111|1111|1111|1111|1111|1111|1111|1111|1111|1111|11--|----|----|----|----|----|
|----|----|----|----|----|----|----|----|----|----|--11|1111|1111|----|----|----|
|----|----|----|----|----|----|----|----|----|----|----|----|----|1111|1111|1111|

It is convenient to distinguish the meanings of each paragraph and express each paragraph independently in different lines:

The first line: represents a long type, with the initial value of 0L;
- Because the ID is usually a positive number, the highest bit in java is the sign bit, and 0 means a positive number 1 means a negative number, so here is 0.
The second line: 41 bits are used to store millisecond-level timestamps;
- The normal timestamp has more than 41 bits. In order to represent a longer time with a fixed number of digits, the time stamp length needs to be shortened. Here we use the difference in the timestamp (current timestamp - start timestamp);
- The maximum number that 41 bits can represent is 2^41-1=2,199,023,255,552, and the number of milliseconds in a year is: 3600x1000x24x365=31,536,000,000;
- Use 2,199,023,255,552/31,536,000,000=69.73, so the 41 millisecond-level timestamp can represent up to 69.73 years;
- The start timestamp is set to the system online time, and this ID can be used continuously for 69.73 years, which can meet most business system requirements;
The third line: 10 bits are used to store machine code;
- It can support 1024 machines with numbers from 0 to 1023. If multiple requests arrive at the same machine within the same millisecond, the machine code can be used to distinguish different requests.
Line 4: 12 bits are used to store the serial number;
- For multiple requests within the same millisecond, each machine can generate up to 4096 (numbered from 0 to 4095) IDs per millisecond.

3. Calculation of 41-bit millisecond-level timestamps

The algorithm supports time callback within 1.5 seconds. Here, the logic when the millisecond sequence number overflows, that is, the getTimestamp() method. When referring to the algorithm written on the Internet, this method only writes a wait and does not return a value. After the wait is completed, the current variable is not assigned, resulting in duplication of the generated ID. ~~~The logic problems are the hardest to check -_-!!!

/** The time when the business system was launched 2024-10-01 0:0:0, 41 bits can represent up to about 69.7 years */
private static final long twepoch = 1727712000000L;
/**
 * Generate the next unique ID
 *
 * @return Next unique ID
 * @throws RuntimeException If the system clock falls back, a RuntimeException exception is thrown
 */
public synchronized long nextId() {
    long now = getTimestamp(); // Get the timestamp    // Clock fallback processing: If the current time is less than the time stamp generated by the last ID    if (now &lt; lastTimestamp) {
        //Support callbacks within 1.5 seconds at most (1500 milliseconds), otherwise an exception will be thrown        long offset = lastTimestamp - now;
        if(offset&lt;=1500) {
            try {
                offset = offset&lt;&lt;2;//Wait 2 times the time                (offset);
                now = getTimestamp();
                //It's still small, throw exception                if (now &lt; lastTimestamp) {
                    throw new RuntimeException(("Clock callback, ID %d millionseconds cannot be generated", lastTimestamp - now));
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
    // If it is generated at the same time, a sequence within milliseconds will be performed    if (lastTimestamp == now) {
        // millisecond-level sequence number, use mask 4095 to get the lower 12 digits, limiting the self-increment value between 1 and 4095 (mask 4095 represents the value where both 12 digits of binary are 1, that is: 1111 1111 1111)        sequence = (sequence + 1) &amp; 4095;
        //overflow        if (sequence == 0) {
            //The sequence overflows within milliseconds, wait until the next millisecond before continuing            now = getNextMillis(now);
        }
    } else {
        //Before setting 0, the serial number will be increased to this point after concurrent at the same time. The time is different, so the version number is set 0        sequence = 0;
    }
    lastTimestamp = now;
    /*
     * Length 64 bits, where:
     * 1-bit sign bit, 0 positive number, 1 negative number
     * 41-bit millisecond-level timestamp, 411111111111111111111111111111111111
     * 10-bit machine ID, 11 1111 1111
     * 12-bit serial number, 1111 1111 1111
     * */
    long id = ((now - twepoch) &lt;&lt; 22) | (workerId &lt;&lt; 12) | sequence;
    return id;
}

IV. Generation of 10 machine codes

For the 10-bit machine code in the middle, some algorithms are divided into 2 segments. The first 5 digits are the data center ID and the last 5 digits are the machine code. It can only represent 31*31=961 machines at most.
If you use 10 bits to identify the machine code, you can represent up to 1024 machines from 0~1023, which can represent more machines and reduce the complexity of logic, so I used the 10 bit machine code form.
Moreover, there are some high-concurrency business scenarios. In order to ensure that there are multiple live deployment modes in other places, 31 machines in one computer room are really not enough.

There is a way to really make machine code generation:

Use sequential nodes in the ZooKeeper data model as ID encoding;
Use Redis to encode IDs;
Id encoding based on database tables;
Locally based on IP address bit ID encoding, the following example adopts this method;

/**
  * workId is generated using IP
  * @return workId
  */
private int getWorkId() {
    try {
        String hostAddress = ();
        int[] ints = (hostAddress);
        int sums = 0;
        for (int b : ints) {
            sums = sums + b;
        }
        return (sums % 1024);
    } catch (UnknownHostException ex) {
        ();
        // Generate randomly if it fails        return (0, 1024);
    }
}

V. Generation of 12-bit serial number

This algorithm is mainly used to generate 12-bit serial numbers, which can represent 4096 numbers in total from 0 to 4095, or represent a maximum of 4096 concurrencies in millisecond level.

Using 4095 as a mask and doing and operating the sequence number, you can get a lower 12-digit value.

Because qequence comes up +1, so if the value is 0, it means the value overflows.

After overflow, you need to wait for the next millisecond to start numbering again from 0.

long now = getTimestamp(); // Get the timestamp// If it is generated at the same time, a sequence within milliseconds will be performedif (lastTimestamp == now) {
    // millisecond-level sequence number, use mask 4095 to get the lower 12 digits, limiting the self-increment value between 1 and 4095 (mask 4095 represents the value where both 12 digits of binary are 1, that is: 1111 1111 1111)    sequence = (sequence + 1) &amp; 4095;
    //overflow    if (sequence == 0) {
        //The sequence overflows within milliseconds, wait until the next millisecond before continuing        now = getNextMillis(now);
    }
} else {
    //Before setting 0, the serial number will be increased to this point after concurrent at the same time. The time is different, so the version number is set 0    sequence = 0;
}
lastTimestamp = now;

6. The final assembly of the snowflake algorithm ID

The bit-by-bit left-shift operation is used, and the three values of the timestamp difference, machine code, sequence number, are finally merged into a long.

One advantage of this algorithm is that the ID can be decoded and the time, machine code and sequence number can be obtained.

/*
 * Length 64 bits, where:
 * 1-bit sign bit, 0 positive number, 1 negative number
 * 41-bit millisecond-level timestamp, 411111111111111111111111111111111111
 * 10-bit machine ID, 11 1111 1111
 * 12-bit serial number, 1111 1111 1111
 * */
long id = ((now - twepoch) &lt;&lt; 22) | (workerId &lt;&lt; 12) | sequence;

7. Snowflake Algorithm ID Decoding

The bit-wise right-shift operation is used to split the timestamp difference, machine code, sequence number, and three values from long.

The output result is: id:7778251575992320 -> time:1854479688 req:0 wid:584 2024-10-22 11:07:59.688

/** The time when the business system was launched 2024-10-01 0:0:0, 41 bits can represent up to about 69.7 years */
private static final long twepoch = 1727712000000L;
/**
  * Decode the long integer ID into string format
  *
  * @param id The long integer ID that needs to be decoded
  * @return The decoded string, format is "Timestamp\tSerial number\tWorker ID\tCenter ID"
  */
public static String idDecode(long id) {
    long sequence = id &amp; 4095; //Take the number of the lower 12 digits    long workerId = (id &gt;&gt; 10) &amp; 1023;//Turn the number of 10 digits lower after moving left    long time = (id &gt;&gt; 22); //Turn the number of 41 digits lower after moving left    return ("time:{0,number,#}\treq:{1}\twid:{2}\t{3}"
            , time
            , sequence
            , workerId
            , getDataTime(time));
}
private static String getDataTime(long timeInterval) {
    var timestamp = twepoch+timeInterval;
    var date = new Date(timestamp);
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:");
    var dtStr = (date);
    return dtStr;
}

8. Complete ID generation class

import .;
import .;
import org.;
import org.;

import ;
import ;
import ;
import ;
import ;

public class SnowflakeIdUtil {
    private static Logger logger = (());
    /** The time when the business system was launched 2024-10-01 0:0:0, 41 bits can represent up to about 69.7 years */
    private static final long twepoch = 1727712000000L;
    /** Sequence within milliseconds */
    private long sequence = 0L;
    /** Machine ID */
    private int workerId;
    /** The timestamp of the last ID generated */
    private long lastTimestamp = -1L;
    private volatile static SnowflakeIdUtil instance = null;

    public void setWorkerId(int workerId) {
        if (workerId &gt; 1023 || workerId &lt; 0)
            throw new IllegalArgumentException("workerId must be between 0 and 1023");
         = workerId;
    }


    /**
      * SnowflakeIdUtil class constructor
      *
      * @throws IllegalArgumentException This exception is thrown if the incoming workerId or datacenterId is not in the range of 0 to 31
      */
    private SnowflakeIdUtil() {
        workerId = getWorkId();
    }


    /**
      * Gets the singleton object of SnowflakeIdUtil.
      * This method first gets the working machine ID and the data center ID, and then uses these two IDs to call another getInstance method to get the singleton object of SnowflakeIdUtil.
      * @return Returns the singleton object of SnowflakeIdUtil.
      */
    public static SnowflakeIdUtil getInstance() {
        if (instance == null) {
            synchronized () {
                if (instance == null) {
                    instance = new SnowflakeIdUtil();
                }
            }
        }
        return instance;
    }

    /**
      * workId is generated using IP
      * @return workId
      */
    private int getWorkId() {
        try {
            String hostAddress = ();
            int[] ints = (hostAddress);
            int sums = 0;
            for (int b : ints) {
                sums = sums + b;
            }
            return (sums % 1024);
        } catch (UnknownHostException ex) {
            ();
            // Generate randomly if it fails            return (0, 1024);
        }
    }

    /**
      * Generate the next unique ID
      *
      * @return Next unique ID
      * @throws RuntimeException If the system clock falls back, a RuntimeException exception is thrown
      */
    public synchronized long nextId() {
        long now = getTimestamp(); // Get the timestamp        // Clock fallback processing: If the current time is less than the time stamp generated by the last ID        if (now &lt; lastTimestamp) {
            //Support callbacks within 1.5 seconds at most (1500 milliseconds), otherwise an exception will be thrown            long offset = lastTimestamp - now;
            if(offset&lt;=1500) {
                try {
                    offset = offset&lt;&lt;2;//Wait 2 times the time                    (offset);
                    now = getTimestamp();
                    //It's still small, throw exception                    if (now &lt; lastTimestamp) {
                        throw new RuntimeException(("Clock callback, ID %d millionseconds cannot be generated", lastTimestamp - now));
                    }
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        }
        // If it is generated at the same time, a sequence within milliseconds will be performed        if (lastTimestamp == now) {
            // millisecond-level sequence number, use mask 4095 to get the lower 12 digits, limiting the self-increment value between 1 and 4095 (mask 4095 represents the value where both 12 digits of binary are 1, that is: 1111 1111 1111)            sequence = (sequence + 1) &amp; 4095;
            //overflow            if (sequence == 0) {
                //The sequence overflows within milliseconds, wait until the next millisecond before continuing                now = getNextMillis(now);
            }
        } else {
            //Before setting 0, the serial number will be increased to this point after concurrent at the same time. The time is different, so the version number is set 0            sequence = 0;
        }
        lastTimestamp = now;
        /*
         * Length 64 bits, where:
         * 1-bit sign bit, 0 positive number, 1 negative number
         * 41-bit millisecond-level timestamp, 411111111111111111111111111111111111
         * 10-bit machine ID, 11 1111 1111
         * 12-bit serial number, 1111 1111 1111
         * */
        long id = ((now - twepoch) &lt;&lt; 22) | (workerId &lt;&lt; 12) | sequence;
        return id;
    }

    /**
      * Decode the long integer ID into string format
      *
      * @param id The long integer ID that needs to be decoded
      * @return The decoded string, format is "Timestamp\tSerial number\tWorker ID\tCenter ID"
      */
    public static String idDecode(long id) {
        long sequence = id &amp; 4095; //Take the number of the lower 12 digits        long workerId = (id &gt;&gt; 10) &amp; 1023;//Turn the number of 10 digits lower after moving left        long time = (id &gt;&gt; 22); //Turn the number of 41 digits lower after moving left        return ("time:{0,number,#}\treq:{1}\twid:{2}\t{3}"
                , time
                , sequence
                , workerId
                , getDataTime(time));
    }

    private static String getDataTime(long timeInterval) {
        var timestamp = twepoch+timeInterval;
        var date = new Date(timestamp);
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:");
        var dtStr = (date);
        return dtStr;
    }


    protected long getTimestamp() {
        return ();
    }

    // Wait for the next millisecond until a new timestamp is obtained    protected long getNextMillis(long lastTimestamp) {
        //("wait until next millis : "+lastTimestamp);
        long timestamp = getTimestamp();
        while (timestamp &lt;= lastTimestamp) {
            timestamp = getTimestamp();
        }
        return timestamp;
    }
}

9. Multi-threaded test cases

import .;
import ;
import ;
import ;

import ;
import ;
import ;
import ;
import ;
import ;
import ;
import ;

public class IdUtilTest {
    /**
      * Test the concurrency performance of SnowflakeId generator
      *
      * @throws InterruptedException If the thread is interrupted while waiting, an InterruptedException exception is thrown
      */
    @Test
    public void snowflakTest() throws InterruptedException {
        var trehadCount = 30;
        var loopCount = 100000;
        var debug = true;
        var unique = new ConcurrentHashMap&lt;Long,String&gt;();
        var duplicates  = new TreeMap&lt;Long,String&gt;();
        ("Thread:"+trehadCount+"\tNumber of loops per thread:"+loopCount+"");
        Runnable runnable = () -&gt; {
            var start = ();
            for(int i = 0; i &lt; loopCount; i++) {
                var id = ().nextId();
                if(debug) {
                    if ((id)) {
                        (id, ().getName());
                    } else {
                        (id, ().getName());
                    }
                }
            }
            var timecost = () - start;
            (timecost+"\t"+().getName());
        };
        List&lt;Thread&gt; threads = new ArrayList&lt;&gt;();
        for(int i = 0; i &lt; trehadCount; i++) {
            Thread thread = new Thread(runnable);
            (thread);
        }
        for(Thread thread : threads) {
            ();
            ();
        }
        ("------------------------------------------------------------------------------------------------------------------------------);
        ("Plan to generate:"+trehadCount*loopCount);
        ("No repeating the number of IDs:"+());
        ("Number of duplicate IDs:"+());
        ("------------------------------------------------------------------------------------------------------------------------------);
        for(var id : ()) {
            (("id:{0}\t-&gt;\t| DECODE:{1}\t| thread:{2}\t{3}"
                    ,id
                    ,(id)
                    ,(id)
                    ,(id)));
        }
        (() == 0, "The number of duplicate IDs is not 0");
    }
    @Test
    public void snowflakIdDecodTest(){
        for(var i=0;i&lt;100;i++){
            var id = ().nextId();
            var idDecode = (id);
            ("id:" + id+"\t-&gt;\t"+idDecode);
        }
    }
}

10. Check the test results

30 and 3 million IDs were generated, which took 1356 milliseconds, and the performance was better than the generation of 300 UUIDs.

Thread：30	每个Thread循环次数：100000
185	Thread-0
63	Thread-1
26	Thread-2
57	Thread-3
25	Thread-4
26	Thread-5
24	Thread-6
103	Thread-7
55	Thread-8
26	Thread-9
35	Thread-10
25	Thread-11
25	Thread-12
25	Thread-13
26	Thread-14
135	Thread-15
25	Thread-16
25	Thread-17
42	Thread-18
27	Thread-19
25	Thread-20
26	Thread-21
25	Thread-22
40	Thread-23
49	Thread-24
50	Thread-25
27	Thread-26
75	Thread-27
32	Thread-28
27	Thread-29
---------------------------- Statistical results
Plan to generate the number：3000000
No repetitionIDNumber of：3000000
repeatIDNumber of：0
---------------------------- repeatID

Summarize

In backend systems, using 64-bit long-type IDs usually won't encounter any problems. However, considering that most services are currently web applications, interaction with JavaScript has become extremely common. JavaScript has an important limitation when processing integers: the maximum integer value it can accurately represent is 53 bits. When the value exceeds this range, JavaScript will have a problem of accuracy loss.

Therefore, when designing the system, we must ensure that the ID length does not exceed 53 bits so that JavaScript can handle these values directly and accurately. If the ID length exceeds 53 bits, we must convert these values to string format so that it can be handled correctly in JavaScript. This transformation will undoubtedly increase the complexity of the API interface, so we need to carefully consider this when designing and developing the system.

In order to pass the Long type ID to the front end without conversion, we can use the 53-bit snowflake algorithm. This algorithm divides the ID into three parts: a 32-bit second-level timestamp, a 16-bit auto-value-added and a 5-bit machine identifier. Such a combination can support 32 machines to generate 65,535 serial numbers per second, thus meeting the needs of most systems.

If you still need to use a 63-bit ID, we can save the ID as a varchar(64) type string in the database, or add a string type ID field in the entity object. Before returning the data to the front-end, we can directly provide this string ID value to avoid the accuracy problem when JavaScript handles integers. This design not only ensures data integrity, but also simplifies the complexity of front-end processing.

This is the article about the implementation of the Java implementation of snowflak algorithm that generates an orderly and non-repeat ID. For more related Java content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!