
Five Common Methods for Deep Pagination in MySQL

When data volumes are very large, deep pagination queries become common. Deep pagination forces MySQL to scan and discard a large number of preceding rows, which is inefficient. For example, with LIMIT 100000, 10, MySQL must read the first 100,000 rows just to return the 10 rows of page 10,001.
To solve deep pagination in MySQL, you can apply the following five optimization solutions:

Solution 1: Deferred Join

Principle: first fetch only the primary keys through a subquery, then join back to the original table to get the complete rows

Querying full rows directly at a deep offset is slow. Instead, we can query the primary key column first, since it can be read straight from an index and is therefore fastest, and then fetch the matching rows by the primary keys obtained.

SELECT t.*
FROM user t
INNER JOIN (
    SELECT id
    FROM user
    ORDER BY sort_field
    LIMIT 100000, 10
) AS tmp ON t.id = tmp.id;
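
The subquery is fast because it is a covering query: it reads only id via the index. A minimal sketch of the supporting index (assuming sort_field is an ordinary column on user):

CREATE INDEX idx_user_sort_field ON user (sort_field);

-- With this index, the subquery above is answered from the index alone:
-- InnoDB stores the primary key in every secondary index entry, so scanning
-- idx_user_sort_field yields (sort_field, id) pairs without touching full rows.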

Solution 2: Order-based Pagination

Requirement: the table has an ordered unique key (such as an auto-increment ID)

The idea is that after each page is fetched, we record the key value of the last row on that page. The next page is then queried as a range starting from that value, so MySQL seeks directly in the index and avoids scanning all the preceding rows.

-- Assume the id of the last record on the previous page is known to be 12345
SELECT * 
FROM user 
WHERE id > 12345 
ORDER BY id 
LIMIT 10;

Solution 3: Bookmark Pagination

Principle: record the sort-field value of the last row on the previous page

-- Assume sorting by create_time, and the create_time of the last record on the previous page is '2023-01-01 12:00:00'
SELECT * 
FROM user 
WHERE create_time > '2023-01-01 12:00:00' 
ORDER BY create_time 
LIMIT 10;
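
One caveat (my addition, not part of the original scheme): create_time may not be unique, so rows sharing the boundary timestamp can be skipped. A sketch of a tie-breaking condition, assuming id is the primary key and the previous page ended at (create_time = '2023-01-01 12:00:00', id = 12345):

-- A composite index on (create_time, id) keeps this efficient
SELECT *
FROM user
WHERE create_time > '2023-01-01 12:00:00'
   OR (create_time = '2023-01-01 12:00:00' AND id > 12345)
ORDER BY create_time, id
LIMIT 10;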

Solution 4: Approximate Pagination

Applicable scenario: approximate paging where some error is acceptable

Suitable for tables with a huge amount of data: instead of paginating by exact offset, first estimate the approximate row range, then fetch the actual rows by primary key. (This scheme can return slightly inaccurate pages and should be chosen according to the scenario.)

-- Get the estimated offset first
SELECT COUNT(*)
FROM user
WHERE sort_field < {target_value};

-- Use a deferred join to fetch the exact rows
SELECT t.*
FROM user t
INNER JOIN (
    SELECT id
    FROM user
    WHERE sort_field < {target_value}
    ORDER BY sort_field
    LIMIT 10
) AS tmp ON t.id = tmp.id;
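
If even the COUNT(*) above is too slow, an approximate row count can be read from table statistics instead, which fits the error tolerance of this solution (a sketch; TABLE_ROWS is only an estimate for InnoDB tables):

SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'user';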

Solution 5: Caching Optimization

Applicable scenario: high-frequency access with fixed sort orders

  • Pre-generate paging results for common sort orders
  • Cache intermediate results in Redis or a similar store
  • Serve queries from the cache first, falling back to the database on a miss

Performance comparison (test on 1 million rows)

Scheme                     Traditional LIMIT   Deferred join   Ordered unique key   Bookmark pagination
Time to query page 1,000   2.3s                420ms           8ms                  12ms
Memory usage               High                Medium          Low                  Low

Best Practice Recommendations

1. Prefer ordered unique-key pagination (such as an auto-increment ID); it reduces the cost of skipping rows from O(n) to O(1)

2. Add indexes on the sort fields used by high-frequency queries

3. Choose a solution based on the business scenario:

  • High real-time requirements → Solutions 2/3
  • Extremely large data volume → Solutions 4/5
  • Error tolerance acceptable → Solution 4

4. For pagination deeper than 100,000 rows, consider replacing numbered pages with scroll loading (infinite scroll) instead.

Supplementary Methods

1. Optimizing ordinary pagination

Generally, when the page depth is not very large, most pagination problems can be solved with the following two methods.

Add a primary-key sort, for example: ORDER BY id.

If you need to sort by time, add an index on the commonly queried fields that includes the time field, for example: ORDER BY create_time. A sketch of such an index follows below.
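
For example (a sketch; t_xxx, the status filter column, and the index name are illustrative), a composite index ending with the time field lets MySQL serve both the filter and the sort from the index, avoiding a filesort on shallow pages:

CREATE INDEX idx_status_create_time ON t_xxx (status, create_time);

-- Rows are read via an index scan already ordered by create_time within status = 1,
-- so no filesort is needed
SELECT * FROM t_xxx WHERE status = 1 ORDER BY create_time DESC LIMIT 0, 20;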

These two methods solve most pagination problems. However, once the offset becomes very deep, say starting from row 1,000,000, the SQL becomes very slow. This is because when the MySQL optimizer estimates that the query would touch more than a certain proportion of the table's rows, it switches to a full table scan instead of using the index. You can verify this yourself with generated test data.

What is deep pagination in MySQL?

When a paginated query has to skip a very large offset, we call it deep pagination. For example, the following SQL statement is a typical deep-pagination scenario.

SELECT * FROM t_xxx ORDER BY id LIMIT 1000000, 20

2. Optimization solutions for deep pagination

FORCE INDEX (not recommended)

At first I considered using FORCE INDEX to force a specific index (its syntax is shown for reference after this list), but my lead told me that FORCE INDEX is not recommended for SQL optimization, mainly because of the following drawbacks:

  • It overrides the optimizer's choice: forcing an index prevents the database engine from picking the most selective index, which can degrade query performance
  • It slows down update operations: once an index is forced, writes must keep that index up to date, increasing the time of update operations
  • It reduces query flexibility: hard-coding an index choice makes the SQL rigid and inconvenient to maintain later
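
For reference only, the syntax looks like this (a sketch; idx_id is an assumed index name on t_xxx):

-- Forces the optimizer to use idx_id; avoid in production for the reasons above
SELECT * FROM t_xxx FORCE INDEX (idx_id) ORDER BY id LIMIT 1000000, 20;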

ID range query

If you don't need page numbers, for example with sliding loading (message lists) or site pagination that only offers previous/next buttons, we can optimize with WHERE id > #{id of the last record in the previous query}

# Query the data within a specified ID range
SELECT * FROM t_xxx WHERE id > 1000000 AND id <= 1000020 ORDER BY id
# Or query the next page by recording the ID of the last record of the previous result
# (ORDER BY id is required so that "the last record" is well defined)
SELECT * FROM t_xxx WHERE id > 1000000 ORDER BY id LIMIT 20

Subquery + INNER JOIN

First query only the ids, sorted by a time field (create_time) or by id, for example:

SELECT id FROM t_xxx ORDER BY create_time DESC LIMIT 1000000,20

This subquery runs first and produces a derived table of ids; the main table is then joined to it to pull in the remaining columns of t_xxx that the query needs. This also works well. The final SQL statement looks like this:

SELECT * FROM t_xxx INNER JOIN (SELECT id FROM t_xxx WHERE name = 'xxx' ORDER BY id LIMIT 1000000,20) AS t_temp ON t_xxx.id = t_temp.id

Subquery + ID filtering

It can also be optimized through a subquery plus ID filtering, for example:

SELECT * FROM t_xxx WHERE name = 'xxx' AND id > (SELECT id FROM t_xxx WHERE name = 'xxx' ORDER BY id LIMIT 1000000,1) ORDER BY id LIMIT 20

That concludes this roundup of five common methods for deep pagination in MySQL. I hope it helps.