
Solving the MySQL Deep Paging Problem

1. Problem description

In day-to-day development, paginated data queries come up constantly. Pagination is usually implemented with MySQL's LIMIT clause, driven by the page size and page number sent from the front end. With small data volumes this works fine, but on large tables deep pages make the query increasingly inefficient and can time out the interface.

2. Problem analysis

The query statements themselves are generally fine: the joins are optimized, the queries are simple, and the key condition and sort columns are indexed. The real problem is the page-by-page access pattern: the deeper the page, the more rows MySQL has to scan, and the slower the query becomes.

The first few pages are very fast; LIMIT 200, 25, for example, returns instantly. But the further back you go the slower it gets, and past a million rows the query all but hangs. Why? Let's first look at what the SQL looks like when we page deep into the result set:

select * from t_name where c_name1='xxx' order by c_name2 limit 2000000,25;

This kind of query is slow because of the large offset after LIMIT.
LIMIT 2000000, 25 forces the database to scan 2,000,025 rows, discard the first 2,000,000, and return only the remaining 25 to the user. That is obviously wasteful.
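
To make the cost visible, EXPLAIN can be run on such a query. This is a sketch using the placeholder names above; the exact plan depends on the instance:

EXPLAIN select * from t_name where c_name1='xxx' order by c_name2 limit 2000000,25;
-- Look at the "rows" estimate: it is on the order of offset + 25, because
-- MySQL materializes all 2000025 rows and discards the first 2000000.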

3. Verification test

3.1 Create two tables

-- Create two tables: the employee table and the department table
-- Department table; drop it if it already exists
drop table if EXISTS dep;
create table dep(
    id int unsigned primary key auto_increment,
    depno mediumint unsigned not null default 0,
    depname varchar(20) not null default "",
    memo varchar(200) not null default ""
);

-- Employee table; drop it if it already exists
drop table if EXISTS emp;
create table emp(
    id int unsigned primary key auto_increment,
    empno mediumint unsigned not null default 0,
    empname varchar(20) not null default "",
    job varchar(9) not null default "",
    mgr mediumint unsigned not null default 0,
    hiredate datetime not null,
    sal decimal(7,2) not null,
    comn decimal(7,2) not null,
    depno mediumint unsigned not null default 0
);

Notes

  • mediumint is an integer type in MySQL, smaller than INT and larger than SMALLINT.
  • Its signed range is -8388608 to 8388607; the unsigned range is 0 to 16777215.
  • It is a medium-size integer occupying 3 bytes.

3.2 Create two functions

-- Create two functions: one generates random strings, the other random numbers
-- Function that produces a random string
delimiter $ 
drop FUNCTION if EXISTS rand_string;
CREATE FUNCTION rand_string(n INT) RETURNS VARCHAR(255)
BEGIN
    DECLARE chars_str VARCHAR(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    DECLARE return_str VARCHAR(255) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    WHILE i < n DO
    SET return_str = CONCAT(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
    SET i = i+1;
    END WHILE;
    RETURN return_str;
END $
delimiter ;

-- Function that generates a random department number
delimiter $ 
drop FUNCTION if EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
BEGIN
    DECLARE i INT DEFAULT 0;
    SET i = FLOOR(100+RAND()*10);
    RETURN i;
END $
delimiter ;

Notes
-- Creating these functions can fail with: "This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled".
-- With bin-log enabled, MySQL requires every function to declare DETERMINISTIC, NO SQL, or READS SQL DATA so it can judge whether replaying the function is safe; ours modify no data.
-- Workaround when creating functions in MySQL: set global log_bin_trust_function_creators=TRUE;
set global log_bin_trust_function_creators=TRUE;
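
Alternatively, instead of loosening the global setting, a characteristic can be declared on the function itself. A minimal sketch for rand_num (NO SQL fits here, since its body executes no SQL statements):

delimiter $
drop FUNCTION if EXISTS rand_num;
CREATE FUNCTION rand_num() RETURNS INT(5)
NO SQL
BEGIN
    DECLARE i INT DEFAULT 0;
    SET i = FLOOR(100+RAND()*10);
    RETURN i;
END $
delimiter ;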

3.3 Writing the employee stored procedure

-- Write a stored procedure to simulate 5,000,000 (500W) rows of employee data.
-- Create the stored procedure that inserts data into the emp table
 DELIMITER $
 drop PROCEDURE if EXISTS insert_emp;
 CREATE PROCEDURE insert_emp(IN START INT(10),IN max_num INT(10))
 BEGIN
     DECLARE i INT DEFAULT 0;
     /* set autocommit = 0: turn off autocommit so all inserts commit as one batch */
     SET autocommit = 0;
     REPEAT
     SET i = i + 1;
     INSERT INTO emp(empno,empname,job,mgr,hiredate,sal,comn,depno) VALUES ((START+i),rand_string(6),'SALESMAN',0001,now(),2000,400,rand_num());
     UNTIL i = max_num
     END REPEAT;
     COMMIT;
 END $
 DELIMITER ;
 
-- Insert 5,000,000 rows. This takes a while (about 1409s), so be patient.
 call insert_emp(0,5000000);

-- Query the employee list
select * from emp LIMIT 1,10;
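
A quick sanity check that the load completed; this should return 5000000:

-- Verify the row count after the load
select count(*) from emp;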

3.4 Writing the department stored procedure

-- Write a stored procedure to simulate 120 rows of department data.
-- Create the stored procedure that inserts data into the dep table
 DELIMITER $
 drop PROCEDURE if EXISTS insert_dept;
 CREATE PROCEDURE insert_dept(IN START INT(10),IN max_num INT(10))
 BEGIN
     DECLARE i INT DEFAULT 0;
     SET autocommit = 0;
     REPEAT
     SET i = i+1;
     INSERT  INTO dep( depno,depname,memo) VALUES((START+i),rand_string(10),rand_string(8));
     UNTIL i = max_num
     END REPEAT;
     COMMIT;
 END $
 DELIMITER ;
 
-- Insert 120 rows
 call insert_dept(1,120);

-- Query the department list
select * from dep;

3.5 Creating an index

-- Create indexes on the key fields. Building them after loading means the index creation takes a while, but the data load itself runs faster.
-- Index the key fields: the sort column and the condition columns
CREATE INDEX idx_emp_id ON emp(id);
CREATE INDEX idx_emp_depno ON emp(depno);
CREATE INDEX idx_dep_depno ON dep(depno); 
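
To confirm the indexes are in place before timing anything, SHOW INDEX lists them:

-- Verify the indexes were created
SHOW INDEX FROM emp;
SHOW INDEX FROM dep;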

3.6 Verification Test

-- Verification test
-- Offset 100, fetch 25. Time: 0.011s
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 100,25;

-- Offset 4800000, fetch 25. Time: 10.242s
SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 4800000,25;
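
Running EXPLAIN on the slow variant shows why. Row estimates vary per instance; this is a sketch of what to look for:

EXPLAIN SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
from emp a left join dep b on a.depno = b.depno order by a.id desc limit 4800000,25;
-- For emp the rows estimate approaches the full table: every row up to the
-- offset is fetched and thrown away before the 25 requested rows are returned.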

4. Solution

4.1 Covering index + subquery optimization

Since the primary key id is indexed, we can first locate the id at the start position by walking the index alone, and then fetch the row data starting from that id.

-- Subquery locates the id at offset 100, then fetches 25 rows from that position. Time: 0.04s
 SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
 from emp a left join dep b on a.depno = b.depno
 where a.id >= (select id from emp order by id limit 100,1)
 order by a.id limit 25;

-- Subquery locates the id at offset 4800000, then fetches 25 rows from that position. Time: 1.549s
 SELECT a.empno,a.empname,a.job,a.sal,b.depno,b.depname
 from emp a left join dep b on a.depno = b.depno
 where a.id >= (select id from emp order by id limit 4800000,1)
 order by a.id limit 25;
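
The inner query is cheap because it walks only the primary-key index and never touches full rows. EXPLAIN on it alone should show a covering scan ("Using index" in the Extra column):

EXPLAIN select id from emp order by id limit 4800000,1;
-- Expect "Using index": only the index is read, no row lookups.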

4.2 Redefining the start position

Remember the primary key of the last row of the previous page's result and start from there, avoiding the OFFSET entirely.

This is the most efficient approach: no matter how deep the page, the time stays essentially constant, because after applying the condition MySQL scans only the 25 requested rows.

There is a catch, though: it only suits sequential page-by-page browsing, where the last id of the previous page can be remembered. If the user jumps between pages it breaks; for example, jumping straight from page 25 to page 35 returns wrong data, because the cursor for page 34 was never seen. The natural fit is scroll-wheel infinite loading, where content is pulled in continuously as the user scrolls down; this lazy loading also keeps the retrieved data from skipping or repeating rows.

-- The last row of the previous page had id 100, so skip those 100 rows and scan from id 101. Time: 0.006s
 SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
 from emp a left join dep b on a.depno = b.depno
 where a.id > 100 order by a.id limit 25;

-- The last row of the previous page had id 4800000, so skip those rows and scan from id 4800001. Time: 0.046s
 SELECT a.id,a.empno,a.empname,a.job,a.sal,b.depno,b.depname
 from emp a left join dep b on a.depno = b.depno
 where a.id > 4800000
 order by a.id limit 25;

4.3 Downgrade strategy

One plan shared online by an Alibaba DBA: configure a maximum value for the LIMIT offset, and return an empty result set whenever a request exceeds it.
The reasoning is that anyone past that point is no longer paging but scraping data; a user who genuinely wants to find something should narrow the scope with filter conditions rather than paging through it.
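
A minimal sketch of such a guard, assuming a hypothetical cap of 100000 and a hypothetical wrapper procedure page_emp (in practice the check usually lives in application code):

DELIMITER $
drop PROCEDURE if EXISTS page_emp;
CREATE PROCEDURE page_emp(IN p_offset INT, IN p_size INT)
BEGIN
    IF p_offset > 100000 THEN
        -- Beyond the threshold: return an empty result instead of scanning
        SELECT * FROM emp WHERE 1 = 0;
    ELSE
        SELECT * FROM emp ORDER BY id LIMIT p_offset, p_size;
    END IF;
END $
DELIMITER ;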

5. Summary

Deep paging is a real scenario in theory, but in practice it often lacks a concrete business case to justify it: which business would ever fetch 25 rows starting 4.8 million rows in? A user looking for a particular record will almost always filter by conditions instead.

Each solution has its own advantages and disadvantages, and the choice depends on the specific business scenario. If the business does not actually need deep paging, adopt the downgrade strategy and cap the paging parameters. If deep paging is genuinely required, use either the covering index + subquery optimization or the remembered-last-id approach: prefer the last-id scheme when the previous page's final id is available, and fall back to covering index + subquery otherwise.

Deep paging or not, rate limiting and degradation deserve consideration, as does protection against repeated calls in a short window: capping executions per second guards against user error and against the data-security risks of frequent scraping calls.
