Analysis of the Principles of MySQL JOIN Algorithms

In MySQL, the JOIN operation combines data from multiple tables. To execute JOINs efficiently, MySQL implements several join algorithms. The sections below explain the principles of the most common ones in detail.

1. Nested-Loop Join (NLJ)

Principle

The nested-loop join is the most basic JOIN algorithm: it joins tables by means of two or more levels of nested loops. Given two tables A and B, the basic steps of the NLJ algorithm are as follows:

The outer loop iterates over every row in table A.
For each row in table A, the inner loop iterates over every row in table B and checks whether the pair of rows satisfies the JOIN condition. If it does, the two rows are combined and added to the result set.
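
The two steps above can be sketched in Python as a pair of nested loops. This is only an illustration of the idea, not MySQL's implementation: rows are modeled as dictionaries, and the function name, the sample data, and the column names id and a_id are assumptions made for this example.

```python
def nested_loop_join(outer_rows, inner_rows, outer_key, inner_key):
    """Compare every outer row with every inner row and keep the matching pairs."""
    matches = []
    for a in outer_rows:                          # outer loop: every row of table A
        for b in inner_rows:                      # inner loop: every row of table B
            if a[outer_key] == b[inner_key]:      # the JOIN condition
                matches.append((a, b))            # combine the pair into the result
    return matches


# Hypothetical sample data; the column names are illustrative only.
table_a = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "carol"},
]
table_b = [
    {"a_id": 1, "item": "pen"},
    {"a_id": 1, "item": "ink"},
    {"a_id": 3, "item": "cup"},
]

nlj_result = nested_loop_join(table_a, table_b, "id", "a_id")
for a, b in nlj_result:
    print(a["name"], b["item"])
```

Every row of table_b is visited once for each row of table_a, which is where the cost of the algorithm comes from.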

Sample code explanation

SELECT * 
FROM tableA 
JOIN tableB 
ON tableA.<column> = tableB.<column>;

For this query, MySQL may use the nested-loop join algorithm: it takes a row from tableA, scans tableB row by row to find the rows that satisfy the join condition, then combines each matching pair and outputs it.

Complexity analysis

Time complexity: O(M × N), where M is the number of rows in tableA and N is the number of rows in tableB. The algorithm is therefore inefficient when both tables are large.

2. Index Nested-Loop Join (INLJ)

Principle

The index nested-loop join is an optimized version of the nested-loop join. When the driven table (usually the inner-loop table) has an index on the column used in the JOIN condition, MySQL uses that index to look up matching rows instead of performing a full table scan. The basic steps are as follows:

The outer loop iterates over every row in the driving table (usually the table with fewer rows).
For each row in the driving table, the index on the driven table is used to locate the rows that satisfy the JOIN condition directly, without scanning the driven table row by row.
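
A rough Python sketch of these steps follows. It assumes the index can be approximated by a sorted list of key values searched with binary search (the standard bisect module), which stands in for a B-tree index; the function names and sample data are hypothetical and this is not how MySQL stores its indexes internally.

```python
from bisect import bisect_left


def build_sorted_index(rows, key):
    """Build a sorted (keys, positions) pair, a simplified stand-in for a B-tree index."""
    entries = sorted((row[key], pos) for pos, row in enumerate(rows))
    keys = [k for k, _ in entries]
    positions = [p for _, p in entries]
    return keys, positions


def index_nested_loop_join(driving_rows, driven_rows, driving_key, index):
    """For each driving row, locate matching driven rows through the index."""
    keys, positions = index
    matches = []
    for a in driving_rows:                        # outer loop over the driving table
        target = a[driving_key]
        i = bisect_left(keys, target)             # binary search instead of a full scan
        while i < len(keys) and keys[i] == target:
            matches.append((a, driven_rows[positions[i]]))
            i += 1
    return matches


# Hypothetical sample data; tableB.a_id plays the role of the indexed column.
table_a = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "carol"}]
table_b = [
    {"a_id": 3, "item": "cup"},
    {"a_id": 1, "item": "pen"},
    {"a_id": 1, "item": "ink"},
]

a_id_index = build_sorted_index(table_b, "a_id")
inlj_result = index_nested_loop_join(table_a, table_b, "id", a_id_index)
for a, b in inlj_result:
    print(a["name"], b["item"])
```

The inner full scan is replaced by a binary search, so the driven table is never read from start to finish.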

Sample code explanation

SELECT * 
FROM tableA 
JOIN tableB 
ON tableA.<column> = tableB.a_id;

If the a_id column of tableB is indexed, MySQL can use the index nested-loop join algorithm: it takes a row from tableA, then uses the index on tableB.a_id to quickly find the rows that satisfy the join condition.

Complexity analysis

Time complexity: O(M × log N), where M is the number of rows in the driving table and N is the number of rows in the driven table. Because each lookup goes through the index instead of a full scan, matching is significantly faster.

3. Block Nested-Loop Join (BNLJ)

Principle

When the driven table has no usable index, MySQL uses the block nested-loop join algorithm to reduce the number of times the inner loop runs. The idea is to split the driving table's rows into blocks, load one block at a time into an in-memory buffer (the join buffer), and then scan the driven table row by row, checking each driven-table row against every row in the buffer for the JOIN condition. The basic steps are as follows:

Split the rows of the driving table into blocks; the size of each block is controlled by the join_buffer_size parameter.
Load one block at a time into the join buffer.
Scan the driven table row by row and, for each row, check whether it satisfies the JOIN condition with any row in the join buffer.
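
The following Python sketch illustrates the three steps. It makes one loud simplification: the hypothetical block_rows argument counts rows per block, whereas the real join_buffer_size is a size in bytes. The function name and sample data are likewise assumptions for illustration.

```python
def block_nested_loop_join(driving_rows, driven_rows, driving_key, driven_key, block_rows):
    """Join block by block and report how many times the driven table was scanned."""
    matches = []
    driven_scans = 0
    for start in range(0, len(driving_rows), block_rows):
        # Load one block of driving rows into the (simulated) join buffer.
        join_buffer = driving_rows[start:start + block_rows]
        driven_scans += 1                         # one full scan of the driven table per block
        for b in driven_rows:
            for a in join_buffer:                 # compare against every buffered row
                if a[driving_key] == b[driven_key]:
                    matches.append((a, b))
    return matches, driven_scans


# Hypothetical sample data.
table_a = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}, {"id": 3, "name": "carol"}]
table_b = [
    {"a_id": 1, "item": "pen"},
    {"a_id": 1, "item": "ink"},
    {"a_id": 3, "item": "cup"},
]

bnl_result, scans = block_nested_loop_join(table_a, table_b, "id", "a_id", block_rows=2)
print(len(bnl_result), "matches,", scans, "scans of the driven table")
```

With three driving rows and two rows per block, the driven table is scanned twice instead of three times; a larger buffer reduces the number of scans further.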

Sample code explanation

SELECT * 
FROM tableA 
JOIN tableB 
ON tableA.some_column = tableB.some_column;

If tableB has no index on the column used in the JOIN condition, MySQL may use the block nested-loop join algorithm: it loads the rows of tableA into the join buffer block by block and, for each block, scans tableB to check whether each of its rows matches a row in the join buffer.

Complexity analysis

Time complexity: O(M × N), where M and N are the row counts of the driving and driven tables. This is the same as the nested-loop join in terms of row comparisons, but performance improves because the driven table is scanned once per block rather than once per driving row.

4. Hash Join

Principle

The hash join is a JOIN algorithm suited to large data sets. It is used for JOIN operations in MySQL 8.0.18 and later. Its basic steps are as follows:

Build phase: choose the smaller table as the build table, iterate over each of its rows, compute a hash value from the columns in the JOIN condition, and insert the row into the corresponding hash bucket.
Probe phase: iterate over each row of the larger table (the probe table), compute the hash value from the same JOIN columns, and look up matching rows in the hash table.
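
The build and probe phases can be sketched in Python with a dictionary acting as the hash table. This is a minimal in-memory illustration under the assumption that the build table fits in memory; the function name and sample rows are hypothetical, and MySQL's actual implementation handles details such as spilling to disk that are not shown here.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Build a hash table on the smaller input, then probe it with the larger one."""
    hash_table = defaultdict(list)
    for row in build_rows:                        # build phase
        hash_table[row[build_key]].append(row)

    matches = []
    for row in probe_rows:                        # probe phase
        for build_row in hash_table.get(row[probe_key], []):
            matches.append((row, build_row))
    return matches


# Hypothetical sample data.
small_table = [{"key": 1, "label": "a"}, {"key": 2, "label": "b"}]
large_table = [
    {"key": 2, "value": 10},
    {"key": 1, "value": 20},
    {"key": 2, "value": 30},
    {"key": 9, "value": 40},
]

hj_result = hash_join(small_table, large_table, "key", "key")
for probe_row, build_row in hj_result:
    print(probe_row["value"], build_row["label"])
```

Each table is read only once: the build table to fill the hash table, and the probe table to look up matches.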

Sample code explanation

SELECT * 
FROM large_table 
JOIN small_table 
ON large_table.key = small_table.key;

In this query, if small_table is the smaller table, MySQL uses it as the build table to construct the hash table, then iterates over large_table in the probe phase to find matching rows.

Complexity analysis

Time complexity: O(M + N), where M and N are the row counts of the two tables. Hash joins are highly efficient when processing large data sets.
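
To get a feel for how far apart these algorithms are, the small Python calculation below compares rough operation counts. It assumes the usual textbook costs of M × N for a nested-loop join, M × log2(N) for an index nested-loop join, and M + N for a hash join; the function name and the table sizes are made up for illustration, and real query costs depend on many other factors.

```python
import math


def estimated_row_operations(m, n):
    """Rough operation counts for joining m driving rows with n driven rows."""
    return {
        "nested_loop": m * n,                                  # every pair is compared
        "index_nested_loop": m * math.ceil(math.log2(n)),      # one index lookup per driving row
        "hash_join": m + n,                                    # one pass over each table
    }


# Hypothetical sizes: 1,000 driving rows and 1,000,000 driven rows.
costs = estimated_row_operations(1_000, 1_000_000)
for algorithm, operations in costs.items():
    print(f"{algorithm}: {operations:,}")
```

Under these assumptions the plain nested-loop join needs on the order of a billion comparisons, while the index-based and hash-based variants stay in the thousands to around a million.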
