Some optimizations that MySQL can make in complex association situations

Yesterday, I dealt with an optimization of complex associated SQL. This type of SQL often considers the following four points:

First. The result set returned by the query is usually very few, so it is confident to optimize it;

Second, the selection of driver table is crucial. By viewing the execution plan, you can see the driver table selected by the optimizer. The rows in the execution plan can roughly reflect the problem;

Third. Clarify the relationship between each table and pay attention to whether there are appropriate indexes on the correlation field;

Fourth. Use straight_join keyword to force the association order between tables, which can facilitate us to verify certain conjectures;

SQL:
Execution time:

mysql> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> ,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

1 row in set (0.75 sec)

This SQL query actually only returns one line of data, but the execution took 750ms to view the execution plan:

mysql> explain
-> select c.yh_id,
-> c.yh_dm,
-> c.yh_mc,
-> ,
-> c.yh_lx,
-> a.jg_id,
-> a.jg_dm,
-> a.jg_mc,
-> a.jgxz_dm,
-> d.js_dm yh_js
-> from a, b, c
-> left join d on d.yh_id = c.yh_id
-> where a.jg_id = b.jg_id
-> and b.yh_id = c.yh_id
-> and a.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yx_bj = ‘Y'
-> and c.sc_bj = ‘N'
-> and c.yh_dm = '006939748XX' ;

+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | .JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | .YH_ID | 1 | Using where |
| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
+—-+————-+——-+——–+——————+———+———+————–+——-+————-+

You can see that there are two more conspicuous performance bottlenecks in the execution plan:

| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |

| 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |

Since d is a left join table, the driver table will not select table d. Let's take a look at the size of the three tables a, b, and c:

mysql> select count(*) from c;
+———-+
| count(*) |
+———-+
| 53731 |
+———-+

mysql> select count(*) from a;
+———-+
| count(*) |
+———-+
| 53335 |
+———-+

mysql> select count(*) from b;
+———-+
| count(*) |
+———-+
| 105809 |
+———-+

Since the amount of data in table b is larger than the other two tables, and there are basically no query filtering conditions on table b, it is possible to exclude the driver table selection B;

The optimizer actually chose table a as the driver table, but why not table c as the driver table? Let's analyze it:

Phase 1: Table a is used as a driver table
a–>b–>c–>d:
(1):a.jg_id=b.jg_id—>(b index:PRIMARY KEY (`JG_ID`,`YH_ID`) )

(2):b.yh_id=c.yh_id—>(c index:PRIMARY KEY (`YH_ID`))

(3):c.yh_id=d.yh_id—>(d index:PRIMARY KEY (`JS_DM`,`YH_ID`))
Since there is no index of yh_id on the d table, the index adds an index on the d table:

alter table d add index ind_yh_id(yh_id);

Implementation plan:

+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | .JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | .YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | .YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+

Execution time:

1 row in set (0.77 sec)

After adding an index on table d, the number of scanned rows in table d drops to 272 rows (start: 54584)

| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | .YH_ID | 272 | Using index |

Stage 2: Table c is used as driver table

d
^
|
c–>b–>a
Since there are filtering conditions with high filterability on the c table, we create an index on the yh_dm:

mysql> select count(*) from c where yh_dm = '006939748XX';
+———-+
| count(*) |
+———-+
| 2 |
+———-+

Add index:

alter table c add index ind_yh_dm(yh_dm)

View execution plan:

+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where |
| 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | .JG_ID | 1 | Using index |
| 1 | SIMPLE | c | eq_ref | PRIMARY,ind_yh_dm | PRIMARY | 98 | .YH_ID | 1 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | .YH_ID | 272 | Using index |
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+

Execution time:

1 row in set (0.74 sec)

After adding an index on table c, the index still did not move up, and the execution plan still uses table a as the driver table, so let's analyze why table a is still used as the driver table?

1):c.yh_id=b.yh_id—>( PRIMARY KEY (`JG_ID`,`YH_ID`) )

a. If table c is used as the driver table, when table c and table b are associated, since there is no index of yh_id field in table b, and because table b is large in amount of data, the optimizer believes that if table c is used as the driver table here, it will have a larger association with table b (here can be used straight_join to force the use of table c as the driver table);
b. If table a is used as the driving table, when table a is associated with table b, since there is an index of the jg_id field on table b, the optimizer believes that the cost of using a as the driving table is less than the cost of using c as the driving board;
So if we want to use C table as the driver table, we just need to add the index of yh_id on b:

alter table b add index ind_yh_id(yh_id);

2):b.jg_id=a.jg_id—>( PRIMARY KEY (`JG_ID`) )

3):c.yh_id=d.yh_id—>( KEY `ind_yh_id` (`YH_ID`) )
Implementation plan:

+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
| 1 | SIMPLE | c | ref | PRIMARY,ind_yh_dm | ind_yh_dm | 57 | const | 2 | Using where |
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | .YH_ID | 272 | Using index |
| 1 | SIMPLE | b | ref | PRIMARY,ind_yh_id | ind_yh_id | 98 | .YH_ID | 531 | Using index |
| 1 | SIMPLE | a | eq_ref | PRIMARY,INDEX_JG | PRIMARY | 98 | .JG_ID | 1 | Using where |
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+

Execution time:

1 row in set (0.00 sec)

You can see that the rows in the execution plan have been greatly reduced, and the execution time has also been reduced from the original 750ms to the 0ms level;