
Detailed introduction to Golang distributed lock

When a stand-alone program modifies a global variable concurrently or in parallel, the modification has to be protected by a lock to form a critical section. Why is the lock needed? Take a look at the following code:

package main

import (
    "sync"
)

// Global variable
var counter int

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            counter++
        }()
    }
    wg.Wait()
    println(counter)
}

Multiple runs will get different results:

❯❯❯ go run local_lock.go
945
❯❯❯ go run local_lock.go
937
❯❯❯ go run local_lock.go
959

In-process locking

If you want to get the correct result, add the lock to the counter operation code:

// ... The preceding part is omitted
var wg sync.WaitGroup
var l sync.Mutex
for i := 0; i < 1000; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        l.Lock()
        counter++
        l.Unlock()
    }()
}
wg.Wait()
println(counter)
// ... The following part is omitted

This way, the calculation results can be obtained stably:

❯❯❯ go run local_lock.go
1000

trylock

package main

import (
    "sync"
)

// Lock is a try lock
type Lock struct {
    c chan struct{}
}

// NewLock generates a try lock
func NewLock() Lock {
    var l Lock
    l.c = make(chan struct{}, 1)
    l.c <- struct{}{}
    return l
}

// Lock tries to acquire the lock and returns the result
func (l Lock) Lock() bool {
    lockResult := false
    select {
    case <-l.c:
        lockResult = true
    default:
    }
    return lockResult
}

// Unlock releases the try lock
func (l Lock) Unlock() {
    l.c <- struct{}{}
}

var counter int

func main() {
    var l = NewLock()
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            if !l.Lock() {
                // log error
                println("lock failed")
                return
            }
            counter++
            println("current counter", counter)
            l.Unlock()
        }()
    }
    wg.Wait()
}

Because our logic requires each goroutine to continue with the subsequent work only after Lock succeeds, the channel inside the Lock struct is guaranteed to be empty when Unlock is called, so unlocking will neither block nor fail.

In a stand-alone system, trylock is not a good choice, because a large number of goroutines fighting over the lock can waste CPU resources to no purpose. There is a proper term for this kind of lock-grabbing scenario: livelock.

A livelock means the program appears to be running normally, but in fact the CPU cycles are spent grabbing the lock rather than executing tasks, so the overall efficiency of the program is low. Livelock problems are also much harder to track down. For these reasons, this kind of lock is not recommended in a single-machine scenario.
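
To make the livelock concrete, here is a minimal, self-contained sketch (not part of the original example): goroutines that fail to grab a channel-based trylock retry in a tight loop instead of giving up, so CPU time goes into spinning on the lock rather than into useful work.

package main

import "sync"

var counter int

func main() {
    lock := make(chan struct{}, 1)
    lock <- struct{}{} // the "token": whoever receives it holds the lock

    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for {
                select {
                case <-lock: // try to grab the token
                    counter++
                    lock <- struct{}{} // release
                    return
                default:
                    // lock failed: retry immediately with no backoff —
                    // this busy loop is where the CPU cycles are wasted
                }
            }
        }()
    }
    wg.Wait()
    println(counter)
}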

Redis-based setnx

package main

import (
    "fmt"
    "sync"
    "time"

    "github.com/go-redis/redis"
)

func incr() {
    client := redis.NewClient(&redis.Options{
        Addr:     "localhost:6379",
        Password: "", // no password set
        DB:       0,  // use default DB
    })

    var lockKey = "counter_lock"
    var counterKey = "counter"

    // lock
    resp := client.SetNX(lockKey, 1, time.Second*5)
    lockSuccess, err := resp.Result()
    if err != nil || !lockSuccess {
        fmt.Println(err, "lock result: ", lockSuccess)
        return
    }

    // counter ++
    getResp := client.Get(counterKey)
    cntValue, err := getResp.Int64()
    if err == nil {
        cntValue++
        resp := client.Set(counterKey, cntValue, 0)
        _, err := resp.Result()
        if err != nil {
            // log err
            println("set value error!")
        }
    }
    println("current counter is ", cntValue)

    delResp := client.Del(lockKey)
    unlockSuccess, err := delResp.Result()
    if err == nil && unlockSuccess > 0 {
        println("unlock success!")
    } else {
        println("unlock failed", err)
    }
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            incr()
        }()
    }
    wg.Wait()
}

Check out the running results:

❯❯❯ go run redis_setnx.go
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
<nil> lock result:  false
current counter is  2028
unlock success!

From the code and the execution result we can see that the remote setnx call behaves much like a stand-alone trylock: if acquiring the lock fails, the related task logic should not continue to execute.

setnx is well suited to competing for "unique" resources in high-concurrency scenarios. For example, in an order-matching system a seller places an order and multiple buyers compete for it concurrently. In this scenario we cannot rely on timestamps to decide the order, because neither the clocks of user devices nor the clocks of the machines in a distributed system can guarantee a correct ordering once merged; even within a cluster in the same machine room, the system time of different machines may differ slightly.

Therefore, we have to rely on the order in which these requests arrive at the Redis node to perform the lock-grabbing correctly. Users with a poor network environment can only hope for the best.
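
As a hedged illustration of the "unique resource" idea, the sketch below (the key name and buyer ids are made up, and the same go-redis client as in the example above is assumed) lets several buyers attempt to claim a single order via SetNX; only the first attempt to reach the Redis node succeeds.

package main

import (
    "fmt"
    "time"

    "github.com/go-redis/redis"
)

func main() {
    client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    orderKey := "order_1234_owner" // hypothetical key for a single sell order
    for _, buyer := range []string{"buyer_a", "buyer_b", "buyer_c"} {
        // Only the first SetNX to arrive at the Redis node succeeds; the order
        // of arrival at Redis, not client timestamps, decides the winner.
        ok, err := client.SetNX(orderKey, buyer, time.Minute).Result()
        if err != nil {
            fmt.Println("redis error:", err)
            continue
        }
        fmt.Println(buyer, "claimed the order:", ok)
    }
}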

Based on zk

package main

import (
    "time"

    "github.com/samuel/go-zookeeper/zk"
)

func main() {
    c, _, err := zk.Connect([]string{"127.0.0.1"}, time.Second) //*10)
    if err != nil {
        panic(err)
    }
    l := zk.NewLock(c, "/lock", zk.WorldACL(zk.PermAll))
    err = l.Lock()
    if err != nil {
        panic(err)
    }
    println("lock succ, do your business logic")

    time.Sleep(time.Second * 10)

    // do some thing
    l.Unlock()
    println("unlock succ, finish business logic")
}

The difference between the zk-based lock and the Redis-based lock is that it blocks until it succeeds, which is very similar to the in-process mutex we used in the single-machine scenario.

The principle is based on ephemeral sequence nodes and the watch API. In this example we use the /lock node. Lock creates a node carrying its own value in the list of children under /lock. Whenever the children of this node change, all programs watching the node are notified. Each program then checks whether the id of the smallest child node under /lock matches its own; if it does, locking has succeeded.
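
The following is a simplified sketch of that principle, not the implementation inside zk.NewLock (which additionally handles watch races, retries and session expiry). It assumes the /lock parent node already exists: create an ephemeral sequence node, then repeatedly list and watch the children until our own node is the smallest one.

package main

import (
    "sort"
    "time"

    "github.com/samuel/go-zookeeper/zk"
)

func main() {
    c, _, err := zk.Connect([]string{"127.0.0.1"}, time.Second)
    if err != nil {
        panic(err)
    }

    // Ephemeral + sequence: the node disappears if we crash, and it gets a
    // zero-padded, monotonically increasing suffix such as /lock/lock-0000000007.
    me, err := c.Create("/lock/lock-", nil,
        zk.FlagEphemeral|zk.FlagSequence, zk.WorldACL(zk.PermAll))
    if err != nil {
        panic(err)
    }

    for {
        children, _, ch, err := c.ChildrenW("/lock")
        if err != nil {
            panic(err)
        }
        sort.Strings(children) // zero-padded suffixes sort in creation order
        if "/lock/"+children[0] == me {
            break // our node is the smallest child: lock acquired
        }
        <-ch // wait for any change under /lock, then re-check
    }

    println("lock succ, do your business logic")
    // ... business logic ...

    // Deleting our node releases the lock and wakes up the watchers.
    c.Delete(me, -1)
}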

This kind of distributed blocking lock is well suited to distributed task scheduling scenarios, but not to high-frequency scenarios where the lock is held only briefly. According to Google's Chubby paper, locks built on strong consensus protocols are suited to coarse-grained locking operations, where coarse-grained means the lock is held for a relatively long time. When adopting such a lock, we should also consider whether it really fits our business scenario.

Based on etcd

package main

import (
    "log"

    "github.com/zieckey/etcdsync"
)

func main() {
    m, err := etcdsync.New("/lock", 10, []string{"http://127.0.0.1:2379"})
    if m == nil || err != nil {
        log.Printf("etcdsync.New failed")
        return
    }
    err = m.Lock()
    if err != nil {
        log.Printf("etcdsync.Lock failed")
        return
    }
    log.Printf("etcdsync.Lock OK")
    log.Printf("Get the lock. Do something here.")

    err = m.Unlock()
    if err != nil {
        log.Printf("etcdsync.Unlock failed")
    } else {
        log.Printf("etcdsync.Unlock OK")
    }
}

There are no sequence nodes in etcd like there are in zookeeper, so its lock implementation differs from the zookeeper-based one. The Lock flow of the etcdsync library used in the example above is roughly as follows (a sketch of the same flow against the official etcd clientv3 API is given after the list):

  1. First check whether there is a value under the /lock path. If there is, the lock has already been grabbed by someone else.
  2. If there is no value, write our own value. If the write succeeds and returns, locking has succeeded. If another node wrote to the path while we were writing, locking fails and we go to step 3.
  3. Watch the events under /lock; the process blocks here.
  4. When an event occurs on the /lock path, the current process is woken up. Check whether the event is a delete event (the holder actively released the lock) or an expire event (the lock expired). If so, go back to step 1 and run the lock-grabbing process again.
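
For comparison, here is a rough sketch of the same flow written against the official etcd clientv3 API rather than etcdsync (the import path may differ between etcd versions, and in the v3 API a lease expiry also surfaces as a delete event): steps 1 and 2 become a transaction that writes /lock only if it does not exist yet, and steps 3 and 4 become a watch that blocks until the key is deleted and then retries.

package main

import (
    "context"
    "log"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx := context.Background()
    for {
        // Steps 1 and 2: write /lock only if nobody has written it yet.
        resp, err := cli.Txn(ctx).
            If(clientv3.Compare(clientv3.CreateRevision("/lock"), "=", 0)).
            Then(clientv3.OpPut("/lock", "owner")).
            Commit()
        if err != nil {
            log.Fatal(err)
        }
        if resp.Succeeded {
            break // lock acquired
        }

        // Steps 3 and 4: block on a watch until /lock is deleted, then retry.
        // (A real implementation would watch from resp.Header.Revision so a
        // delete that happens before the watch starts is not missed.)
        for wresp := range cli.Watch(ctx, "/lock") {
            deleted := false
            for _, ev := range wresp.Events {
                if ev.Type == clientv3.EventTypeDelete {
                    deleted = true
                }
            }
            if deleted {
                break
            }
        }
    }

    log.Println("Get the lock. Do something here.")

    // Unlock by deleting the key, which wakes up the other watchers.
    if _, err := cli.Delete(ctx, "/lock"); err != nil {
        log.Fatal(err)
    }
}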

redlock

package main

import (
    "fmt"
    "time"

    "github.com/garyburd/redigo/redis"
    "gopkg.in/redsync.v1"
)

func newPool(server string) *redis.Pool {
    return &redis.Pool{
        MaxIdle:     3,
        IdleTimeout: 240 * time.Second,

        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", server)
            if err != nil {
                return nil, err
            }
            return c, err
        },

        TestOnBorrow: func(c redis.Conn, t time.Time) error {
            _, err := c.Do("PING")
            return err
        },
    }
}

func newPools(servers []string) []redsync.Pool {
    pools := []redsync.Pool{}
    for _, server := range servers {
        pool := newPool(server)
        pools = append(pools, pool)
    }
    return pools
}

func main() {
    pools := newPools([]string{"127.0.0.1:6379", "127.0.0.1:6378", "127.0.0.1:6377"})
    rs := redsync.New(pools)
    m := rs.NewMutex("/lock")

    err := m.Lock()
    if err != nil {
        panic(err)
    }
    fmt.Println("lock success")
    unlockRes := m.Unlock()
    fmt.Println("unlock result: ", unlockRes)
}

redlock is also a blocking lock. The operation on a single node corresponds to a set nx px command, and the lock as a whole is considered acquired only if more than half of the nodes return success.
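
As a hedged sketch of what that per-node operation looks like (using the go-redis client from the earlier setnx example rather than the redigo pool above; the key name and token are illustrative), acquiring the lock on one node is just SET with NX and an expiry. redsync additionally stores a random token as the value and releases through a Lua script so a client only deletes a lock it still owns.

package main

import (
    "time"

    "github.com/go-redis/redis"
)

func main() {
    client := redis.NewClient(&redis.Options{Addr: "127.0.0.1:6379"})

    // Equivalent to SET /lock random-token NX PX 8000:
    // succeeds only if the key does not exist yet.
    ok, err := client.SetNX("/lock", "random-token", 8*time.Second).Result()
    if err != nil || !ok {
        println("this node refused the lock")
        return
    }
    // redlock considers the lock acquired only if a majority of nodes said ok
    // within a small time window; here we only talk to a single node.

    // ... business logic ...

    client.Del("/lock")
    println("released the lock on this node")
}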

The design of redlock once caused a war of words in the community, with distributed-systems experts weighing in on both sides. However, that is not what we are going to discuss here; the relevant links are given in the reference material.

How to choose

When the business is still on the order of magnitude that can be handled by a stand-alone machine, then you can use any stand-alone lock solution according to your needs.

If the system has developed to the distributed-service stage but the business scale is still small, say qps < 1000, it makes little difference which lock solution you use. If the company already has a zk/etcd/redis cluster available, try to meet the business needs with what is there rather than introducing a new technology stack.

If the business grows to a certain scale, the choice needs to be considered from several angles. The first is whether your lock must never lose data under any adverse conditions. If data loss is not acceptable, do not use the simple setnx lock of redis.

If you want to use redlock, then you need to consider whether your company's redis cluster solution can directly expose the ip+port of the corresponding redis instance to developers. If not, it won't work.

If the reliability requirement for the lock data is extremely high, you can only use etcd or zk, which guarantee data reliability through consensus protocols. But reliability usually comes at the cost of lower throughput and higher latency, so you need to stress-test against the actual order of magnitude of the business to make sure the etcd/zk cluster backing the distributed lock can withstand the real request pressure. Note that neither etcd nor zk can improve its performance by adding nodes; to scale horizontally you can only build multiple clusters to support more requests, which further raises the requirements for operations and monitoring. Multiple clusters may require a proxy; without one, the business has to shard requests by business id, and if the business is already online, dynamic data migration also has to be considered. None of this is easy.

When choosing a specific plan, you still need to think more and estimate the risks early.
