Introduction
Dubbo is a distributed service framework that avoids single points of failure and supports horizontal scaling of services. A service is usually deployed as multiple instances, so choosing one provider from the cluster for each call is the job of a load balancing strategy.
The responsibility of load balancing is to distribute network requests (or other forms of load) across different service nodes, so that no node in the cluster is overloaded and starved for resources while other nodes sit relatively idle. With a reasonable load balancing algorithm, each service node receives a load that matches its processing capacity, achieving a sensible allocation of capacity and traffic. Load balancing is commonly divided into software load balancing (such as Nginx, used in everyday work) and hardware load balancing (mainly F5, Array, NetScaler, etc., which application developers rarely work with directly).
Dubbo provides five load balancing implementations:
- ConsistentHashLoadBalance, based on consistent hashing;
- RandomLoadBalance, based on a weighted random algorithm;
- LeastActiveLoadBalance, based on the least-active-calls algorithm;
- RoundRobinLoadBalance, based on a weighted round-robin algorithm;
- ShortestResponseLoadBalance, based on the shortest response time.
LoadBalance is an extension interface, and the default extension implementation is RandomLoadBalance. Its definition is shown below. The @Adaptive annotation takes the parameter loadbalance, meaning the dynamically generated adapter selects the extension implementation class according to the loadbalance parameter in the URL.
```java
@SPI(RandomLoadBalance.NAME)
public interface LoadBalance {

    @Adaptive("loadbalance")
    <T> Invoker<T> select(List<Invoker<T>> invokers, URL url, Invocation invocation) throws RpcException;
}
```
The core job of the select() method in the LoadBalance interface is to pick one Invoker from the Invoker collection, based on the incoming URL and Invocation together with its own load balancing algorithm, and return it.
ConsistentHashLoadBalance (consistent Hash algorithm)
Its underlying implementation uses the consistent Hash algorithm to achieve load balancing.
Consistent Hash load balancing routes requests with the same parameters to the same service node every time. When some Provider nodes go offline, this strategy spreads their traffic over the remaining Providers without causing drastic fluctuations.
Suppose there are three Provider nodes, 1, 2, and 3, serving requests, and 100 requests arrive at the same time. If we want the requests spread across these three Provider nodes as evenly as possible, the simplest approach is Hash modulo: hash(request parameters) % 3. If the Hash is computed over all request parameters, requests with the same parameters always land on the same Provider node. But if a Provider node suddenly fails, we must switch to modulo 2, and requests get reassigned to different Providers; in the extreme case, the processing node for every request changes, which causes large fluctuations.
To avoid one Provider node's downtime reshuffling a large number of requests, we can use a consistent Hash algorithm. The consistent Hash algorithm is also a modulo algorithm; the difference is that plain Hash modulo takes the result modulo the number of Provider nodes, while the consistent Hash algorithm takes it modulo 2^32.
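To see this churn concretely, here is a small illustrative sketch (not Dubbo code) showing how many of 100 keys get remapped when a plain modulo scheme shrinks from 3 nodes to 2. The key format `request-i` is just an example:

```java
// Illustrative sketch: with plain modulo hashing, shrinking the cluster
// from 3 nodes to 2 remaps a large share of keys to a different node.
public class ModuloChurnDemo {
    static int node(String requestKey, int nodeCount) {
        // Math.abs guards against negative hashCode values
        return Math.abs(requestKey.hashCode()) % nodeCount;
    }

    static int remappedCount(int totalKeys) {
        int moved = 0;
        for (int i = 0; i < totalKeys; i++) {
            String key = "request-" + i;
            if (node(key, 3) != node(key, 2)) {
                moved++; // this request would hit a different provider after the failure
            }
        }
        return moved;
    }
}
```

Roughly two thirds of keys move on average, since a key keeps its node only when `hash % 3 == hash % 2`.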
The consistent Hash algorithm takes both the Provider address and the request parameters modulo 2^32:
hash(Provider address) % 2^32
hash(request parameter) % 2^32
Both results, taken modulo 2^32, fall on the same Hash ring.
(Figure: consistent Hash nodes distributed evenly on the ring)
Each request is then assigned, moving clockwise around the ring, to the nearest Provider. This way, when a Provider node goes down or a new Provider node joins, only the requests adjacent to that node on the ring are affected.
Ideally, the consistent Hash algorithm distributes the three Provider nodes evenly on the ring, so requests are also distributed evenly across them. In practice, however, the hashed values of the three Provider addresses may cluster together, causing a large number of requests to land on one Provider node. This is the problem of data skew: because the nodes are not spread out enough, one node receives a large share of the requests while the others receive only a few.
To solve the data skew problem of the consistent Hash algorithm, the concept of virtual nodes (Hash slots) was introduced.
The idea is this: since the problem is caused by the uneven distribution of Provider nodes on the ring, we can create n groups of virtual nodes for P1, P2, and P3, so that many points per Provider are spread relatively evenly around the ring. As shown in the figure below, nodes with the same shading belong to the same Provider node; for example, P1-1, P1-2 ... P1-99 all represent Provider node P1. With virtual nodes, each Provider is scattered around the ring, avoiding data skew.
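The ring described above can be sketched with a sorted map: virtual nodes are inserted at their hash positions, and selection walks clockwise to the first node at or after the request's hash. This is a simplified illustration in the spirit of Dubbo's ConsistentHashLoadBalance, not its actual code (the real class hashes MD5 digests of invoker addresses and defaults to 160 replicas):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified consistent Hash ring with virtual nodes.
public class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public HashRingSketch(String[] providers, int replicas) {
        for (String p : providers) {
            for (int i = 0; i < replicas; i++) {
                // each provider contributes `replicas` virtual nodes, e.g. P1#0, P1#1 ...
                ring.put(hash(p + "#" + i), p);
            }
        }
    }

    // Walk clockwise: the first virtual node at or after the key's hash wins;
    // wrap around to the first entry if we fall off the end of the ring.
    public String select(String requestKey) {
        SortedMap<Long, String> tail = ring.tailMap(hash(requestKey));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            // take 4 digest bytes as an unsigned 32-bit value: a point on the 2^32 ring
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the hash of the request key is deterministic, the same parameters always select the same provider, which is exactly the property the text describes.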
RandomLoadBalance (weighted random algorithm)
RandomLoadBalance is a simple and efficient load balancing implementation, and the one Dubbo uses by default.
Suppose, for example, that Providers A, B, and C have weights 5, 3, and 2, dividing the range [0, 10) into the intervals [0, 5), [5, 8), and [8, 10). We then generate a random number in [0, 10) and determine which interval it falls into. If, say, 4 is generated, it falls into Provider A's interval, and RandomLoadBalance returns the node Provider A.
Next, let's look at the implementation of the doSelect() method in RandomLoadBalance. Its core logic has three key points:
- compute each Invoker's weight and the total weight;
- when the Invokers' weights are not all equal, determine which Invoker's interval the random number falls into and return that Invoker;
- when all Invokers have the same weight, return a random Invoker.
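The interval lookup in the unequal-weight case can be sketched as follows. This is an illustration of the technique, not Dubbo's doSelect() itself; the class and method names are made up for the example:

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of weighted random selection: a random number in [0, totalWeight)
// falls into one of the weight intervals, and that interval's index wins.
public class WeightedRandomSketch {
    // Core interval lookup, separated out so it can be tested deterministically.
    // Subtract each weight in turn; the index where offset drops below 0
    // is the interval the random number landed in.
    static int indexFor(int[] weights, int offset) {
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i;
            }
        }
        return weights.length - 1; // unreachable when offset < sum of weights
    }

    public static int select(int[] weights) {
        int totalWeight = 0;
        for (int w : weights) {
            totalWeight += w;
        }
        return indexFor(weights, ThreadLocalRandom.current().nextInt(totalWeight));
    }
}
```

With weights {5, 3, 2}, offsets 0-4 select index 0, offsets 5-7 select index 1, and offsets 8-9 select index 2, matching the interval example above.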
Over many requests, RandomLoadBalance distributes calls across the Provider nodes in proportion to their weights.
LeastActiveLoadBalance (least-active load balancing algorithm)
LeastActiveLoadBalance uses the least-active algorithm. The assumption is that the fewer active requests a Provider has, the more processing capacity it has left and the more efficiently it is handling requests, so it can process more requests per unit of time and should be given priority.
LeastActiveLoadBalance must be used together with ActiveLimitFilter, which records the number of active requests for each interface method. When balancing load, LeastActiveLoadBalance only selects among the Invokers with the fewest active requests.
The implementation first narrows the candidates down to the Invokers with the smallest active-request count; the remaining logic is exactly the same as RandomLoadBalance, selecting the final Invoker according to the candidates' weights.
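The filtering step can be sketched like this. It is an illustration of the idea, not Dubbo's actual LeastActiveLoadBalance code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the least-active filtering step: keep only the invokers whose
// active-request count equals the minimum; Dubbo then applies the same
// weighted-random logic as RandomLoadBalance to the survivors.
public class LeastActiveSketch {
    public static List<Integer> leastActive(int[] activeCounts) {
        int min = Integer.MAX_VALUE;
        for (int c : activeCounts) {
            min = Math.min(min, c);
        }
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < activeCounts.length; i++) {
            if (activeCounts[i] == min) {
                candidates.add(i); // ties all stay in the running
            }
        }
        return candidates;
    }
}
```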
RoundRobinLoadBalance (weighted polling load balancing algorithm)
Round-robin means assigning requests to each Provider in turn. For example, with three Provider nodes A, B, and C, plain round-robin assigns the first request to Provider A, the second to Provider B, the third to Provider C, the fourth to Provider A again, and so on.
Round-robin is a stateless load balancing algorithm that is simple to implement and suitable when all Provider nodes in the cluster perform similarly. In reality that is hard to guarantee: plain round-robin gives the best and the worst nodes the same traffic, which can leave an underpowered node starved for resources, or even unable to respond in time, while a powerful node stays relatively idle. Weighted round-robin addresses this by reducing the traffic assigned to the weaker Provider nodes.
After weighting, the share of traffic each Provider node receives approaches or equals its share of the total weight. For example, if nodes A, B, and C have weights in the ratio 5:1:1, then out of 7 requests node A receives 5, node B receives 1, and node C receives 1.
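Dubbo's RoundRobinLoadBalance implements a "smooth" variant of weighted round-robin (the same idea used by Nginx), so that a high-weight node's requests are interleaved with the others rather than sent in one burst. A minimal sketch of that idea, not Dubbo's actual code:

```java
// Smooth weighted round-robin: each node's current value grows by its weight
// on every request, the node with the largest current value wins, and the
// winner is penalized by the total weight so high-weight nodes do not
// monopolize consecutive slots.
public class SmoothRoundRobinSketch {
    private final int[] weights;
    private final long[] current;
    private final int totalWeight;

    public SmoothRoundRobinSketch(int[] weights) {
        this.weights = weights.clone();
        this.current = new long[weights.length];
        int sum = 0;
        for (int w : weights) {
            sum += w;
        }
        this.totalWeight = sum;
    }

    public int select() {
        int best = 0;
        for (int i = 0; i < weights.length; i++) {
            current[i] += weights[i];          // every node earns its weight
            if (current[i] > current[best]) {
                best = i;
            }
        }
        current[best] -= totalWeight;          // penalize the winner
        return best;
    }
}
```

With weights {5, 1, 1}, any 7 consecutive calls select node 0 five times and nodes 1 and 2 once each, matching the 5:1:1 example above, but with node 0's turns spread out instead of back to back.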
ShortestResponseLoadBalance (shortest response time load balancing algorithm)
ShortestResponseLoadBalance is a LoadBalance implementation added after Dubbo version 2.7. It implements the shortest-response-time algorithm: from multiple Provider nodes, select the one with successful calls and the shortest response time. Since several Provider nodes may tie on this criterion, a random pick among them determines the Provider that is finally called.
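The selection rule can be sketched as follows. The estimate combines average elapsed time with the active-request count; the method and parameter names here are illustrative, not Dubbo's actual internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of shortest-response selection: estimate each provider's outstanding
// work as averageElapsedMillis * activeRequests, keep the providers with the
// smallest estimate, and break ties randomly.
public class ShortestResponseSketch {
    public static int select(long[] avgElapsedMillis, int[] activeRequests) {
        long shortest = Long.MAX_VALUE;
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < avgElapsedMillis.length; i++) {
            long estimate = avgElapsedMillis[i] * activeRequests[i];
            if (estimate < shortest) {
                shortest = estimate;   // new best: restart the candidate list
                candidates.clear();
                candidates.add(i);
            } else if (estimate == shortest) {
                candidates.add(i);     // tie: both stay in the running
            }
        }
        // several providers may tie, so fall back to a random pick among them
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }
}
```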
That concludes this detailed look at the principles behind Dubbo's load balancing. For more on Dubbo load balancing, please see my other related articles!