MySQL deep paging optimization method


The deep paging problem in MySQL refers to the sharp drop in performance when we page through query results with a LIMIT clause and move to pages far from the beginning.

For example, to fetch page 1000 with 10 rows per page, the server has to skip the first 9990 rows before it reaches the records it needs, which is very inefficient on large data sets.

The traditional implementation pages directly with OFFSET and LIMIT:

-- Page 1000 at 10 rows per page: skip 9990 rows, return the next 10
SELECT * FROM your_table
ORDER BY some_column
LIMIT 9990, 10;

This forces the database to read at least 10000 rows and throw away the first 9990 just to return the 10 it actually needs.
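The following is a minimal sketch of the page-number arithmetic. It uses Python's built-in sqlite3 module, with SQLite standing in for MySQL, because `LIMIT ... OFFSET ...` means the same thing in both. The table `items` and the helper names are hypothetical. The offset for page p with n rows per page is (p - 1) * n, and the database still has to walk past every skipped row.

```python
import sqlite3

# SQLite stands in for MySQL; the items table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 20_001)),
)

def page_offset(page, page_size):
    """Rows to skip before a 1-based page: (page - 1) * page_size."""
    return (page - 1) * page_size

def fetch_page(page, page_size):
    # Same shape as LIMIT 9990, 10 in MySQL: skip page_offset rows, return page_size.
    return conn.execute(
        "SELECT id, payload FROM items ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page_offset(page, page_size)),
    ).fetchall()

rows = fetch_page(1000, 10)
print(page_offset(1000, 10), rows[0][0], rows[-1][0])  # 9990 9991 10000
```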

How delayed association (deferred join) works

Delayed association, more commonly called a deferred join, improves performance by splitting the query into two steps:

Quick positioning: first, run a fast query that uses only the index to locate the rows needed. This step does not fetch every column, only the primary key or the columns used for sorting.
Precise retrieval: then use the primary keys (or the few columns) returned by the first step to run a second query that fetches all the required columns for just those rows.
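For paging, the two steps are usually written as a single statement. A derived table pages over the index and returns only IDs, and the outer query joins back to fetch the full rows.

The sketch below again uses SQLite through Python's sqlite3 module as a stand-in for MySQL. The `items` table is hypothetical, and its `some_column` has a secondary index. The sketch only checks that the deferred join returns the same page as the plain query, not how fast it is.

The derived-table JOIN form is used because MySQL rejects LIMIT inside an `IN (...)` subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a small indexed sort column plus a bulky payload column.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, some_column INTEGER NOT NULL, payload TEXT)"
)
conn.execute("CREATE INDEX idx_items_some_column ON items (some_column)")
N = 20_000
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    # (i * 37) % N visits every value 0..N-1 once, so some_column is unique but shuffled.
    ((i, (i * 37) % N, "x" * 200) for i in range(1, N + 1)),
)
PAGE_SIZE, OFFSET = 10, 9990

# Plain deep paging over whole rows (the form the deferred join replaces).
plain = conn.execute(
    "SELECT * FROM items ORDER BY some_column LIMIT ? OFFSET ?",
    (PAGE_SIZE, OFFSET),
).fetchall()

# Deferred join: the derived table pages over the index and returns only ids,
# then the outer query fetches full rows for just those ids.
deferred = conn.execute(
    """
    SELECT i.*
    FROM items AS i
    JOIN (SELECT id FROM items ORDER BY some_column LIMIT ? OFFSET ?) AS page
      ON page.id = i.id
    ORDER BY i.some_column
    """,
    (PAGE_SIZE, OFFSET),
).fetchall()

print(len(deferred), deferred == plain)  # 10 True
```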

Example: a posts table and a comments table.

-- Step 1: collect the IDs of posts that carry a specific tag
CREATE TEMPORARY TABLE temp_post_ids AS
SELECT post_id
FROM posts
WHERE tags LIKE '%Specific Tag%';

-- Step 2: join the IDs back to fetch full post rows and their comments
SELECT p.*, c.*
FROM temp_post_ids t
JOIN posts p ON t.post_id = p.post_id
LEFT JOIN comments c ON p.post_id = c.post_id;
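To check the two steps end to end, the sketch below runs them on SQLite via Python's sqlite3 module. SQLite accepts the same `CREATE TEMPORARY TABLE ... AS SELECT` form as MySQL. The schema and sample rows are made up for illustration, and the tag value 'mysql' replaces the placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data.
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT, tags TEXT)")
conn.execute("CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    (1, "Deep paging", "mysql,performance"),
    (2, "Rust tips", "rust"),
    (3, "Index basics", "mysql"),
])
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    (1, 1, "Great"),
    (2, 1, "Thanks"),
    (3, 2, "Nice"),
])

# Step 1: keep only the matching post IDs in a temporary table.
conn.execute("""
    CREATE TEMPORARY TABLE temp_post_ids AS
    SELECT post_id FROM posts WHERE tags LIKE '%mysql%'
""")

# Step 2: join the IDs back to the full rows; LEFT JOIN keeps posts with no comments.
rows = conn.execute("""
    SELECT p.post_id, p.title, c.body
    FROM temp_post_ids t
    JOIN posts p ON t.post_id = p.post_id
    LEFT JOIN comments c ON p.post_id = c.post_id
    ORDER BY p.post_id, c.comment_id
""").fetchall()
print(rows)
# [(1, 'Deep paging', 'Great'), (1, 'Deep paging', 'Thanks'), (3, 'Index basics', None)]
```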

Why it improves performance

Less data scanned: the first step reads only the index, so the rows skipped by the offset are skipped in the index instead of in the full table. Index entries are usually much smaller than complete rows, and an index on the sort column is already ordered, so sorting and paging on it is more efficient.
Fewer I/O operations: complete rows are fetched only in the second step, and only for the rows actually returned. This reduces I/O, especially when the table has many large columns (such as TEXT or BLOB).
Full use of indexes: the first-step query can usually be answered from the index alone (a covering index), which maximizes query efficiency.
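One way to see the "index only" behaviour of the first step is to ask the query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module on a hypothetical `items` table. When the first-step query needs only `id` and `some_column`, SQLite can answer it from the index and reports USING COVERING INDEX. MySQL's EXPLAIN signals the same thing with `Using index` in the Extra column. Plan text varies between SQLite versions, so the check only looks for that phrase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id is the primary key, some_column has a secondary index.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, some_column INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_items_some_column ON items (some_column)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable description.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# First step of the deferred join: only the key is selected, so the index suffices.
step1 = plan("SELECT id FROM items ORDER BY some_column LIMIT 10 OFFSET 9990")
# Plain query: payload is not in the index, so rows must be read from the table.
full = plan("SELECT * FROM items ORDER BY some_column LIMIT 10 OFFSET 9990")
print(step1)
print(full)
```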

Maximum ID query method

The maximum ID method relies on the fact that IDs in the database are usually auto-incrementing (or at least ordered).

The application records the ID of the last record returned by the previous query. The next query then only needs to select records whose ID is greater than that value, which avoids scanning and skipping all the earlier records.
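The following is a minimal sketch of that loop, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The `items` table and `fetch_after` helper are hypothetical. Each request sends back the last ID it saw and asks for rows after it, so no rows are skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with auto-increment-style ids 1..100.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", ((i, f"row {i}") for i in range(1, 101))
)

def fetch_after(last_id, page_size):
    # Seek directly past the previous page through the primary key; nothing is skipped.
    return conn.execute(
        "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

pages = []
last_id = 0  # assumes ids are positive, so 0 sorts before the first row
while True:
    page = fetch_after(last_id, 30)
    if not page:
        break
    pages.append(page)
    last_id = page[-1][0]  # the value the client sends back for the next page

print([len(p) for p in pages])  # [30, 30, 30, 10]
```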

Advantages:

Performance improvement: the method reduces load on the database, especially for large data sets, because it reads only the rows it needs instead of scanning and discarding a large number of useless ones.
Scalability: as the data grows, the traditional OFFSET method slows down on later pages. The maximum ID method does not degrade noticeably, because each page is a range lookup on the ID index, which makes it suitable for large data volumes.
Simple and effective: it is simple to implement yet can significantly improve paging performance.
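A rough way to compare the two approaches without timing noise is to count how much work SQLite does for each query. The sketch below uses sqlite3's `set_progress_handler` to count virtual-machine instructions. It fetches page 2000 (10 rows per page) of a hypothetical 20000-row table twice: once with OFFSET and once with the maximum ID condition.

The counts are SQLite-specific and only indicate relative effort. The two results match here because the IDs have no gaps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with gap-free ids 1..20000.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", ((i, f"row {i}") for i in range(1, 20_001))
)

def run_counting_steps(sql, params):
    """Run one query and count the SQLite VM instructions it executed."""
    steps = 0

    def tick():
        nonlocal steps
        steps += 1
        return 0  # a zero return value lets the query keep running

    conn.set_progress_handler(tick, 1)  # call tick() after every instruction
    try:
        rows = conn.execute(sql, params).fetchall()
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler again
    return rows, steps

# Page 2000 at 10 rows per page, fetched both ways.
offset_rows, offset_steps = run_counting_steps(
    "SELECT id, payload FROM items ORDER BY id LIMIT 10 OFFSET ?", (19_990,)
)
keyset_rows, keyset_steps = run_counting_steps(
    "SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT 10", (19_990,)
)
print(offset_rows == keyset_rows, offset_steps, keyset_steps)
```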

Disadvantages:

Relies on an ordered ID: the method depends on an ordered ID such as an auto-increment ID, and does not work if the table has no ordered, monotonically increasing column.
Not suitable for complex sorting: if results must be sorted by other columns, such as a timestamp or another non-incremental column, the maximum ID method cannot be used directly.
Deletions and updates: deleted records leave gaps in the ID sequence, so IDs can no longer be used to work out where a given page begins. If the ID of an existing record is changed between requests, the record can cross the saved boundary and be skipped or returned twice.
Uneven page sizes: this affects pages defined as fixed ID ranges (for example, id BETWEEN 1 AND 10). Heavy deletion leaves large gaps between IDs, so such pages can hold different numbers of rows. With WHERE id > last_id ORDER BY id LIMIT n, every page except the last still returns n rows. This is rarely a serious problem, but it can affect the user experience in some applications.
Changing first-page data: if the application must always show the latest state of the data, the maximum ID method may not display newly added records immediately. For example, new records may be added to the first page while the user is browsing page 2. The pages reached from the saved ID do not include them, and if earlier pages are requested again with previously saved IDs instead of being re-queried from the start, the user may not see them even after returning to the first page.
Not suitable for random access: when users need to jump straight to a given page (for example, page 100), the maximum ID method is hard to apply. The starting ID of page 100 cannot be known without reading the pages before it, unless you maintain an additional mapping table of each page's starting ID.
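The sketch below illustrates the uneven-page and random-access points on a hypothetical `items` table in SQLite, via Python's sqlite3 module. After every ID divisible by 3 is deleted, a fixed ID range page comes back short while a maximum ID page is still full.

A small page-start map, the kind of extra mapping table mentioned above (kept here as a Python dict), turns "jump to page N" into a maximum ID query. The map is built once and would have to be rebuilt whenever rows are inserted or deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ids 1..50, then every id divisible by 3 is deleted to leave gaps.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", ((i, f"row {i}") for i in range(1, 51))
)
conn.execute("DELETE FROM items WHERE id % 3 = 0")
PAGE_SIZE = 10

def keyset_page(last_id):
    # Maximum ID paging: returns PAGE_SIZE rows until the data runs out.
    cur = conn.execute(
        "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?", (last_id, PAGE_SIZE)
    )
    return [row[0] for row in cur]

def id_range_page(page):
    # Fixed ID ranges (1-10, 11-20, ...): deleted ids make these pages short.
    lo = (page - 1) * PAGE_SIZE + 1
    cur = conn.execute(
        "SELECT id FROM items WHERE id BETWEEN ? AND ? ORDER BY id",
        (lo, lo + PAGE_SIZE - 1),
    )
    return [row[0] for row in cur]

# Hypothetical page-start map: the first id of every page, built in one ordered pass.
ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
page_start = {n + 1: ids[i] for n, i in enumerate(range(0, len(ids), PAGE_SIZE))}

def jump_to(page):
    # Integer ids assumed: start just below the page's first id.
    return keyset_page(page_start[page] - 1)

print(len(id_range_page(1)), len(keyset_page(0)))  # 7 10
print(page_start)  # {1: 1, 2: 16, 3: 31, 4: 46}
```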

Summary

The above is based on personal experience and is offered as a reference.