
Example implementation of a cursor-based paging interface

Preface

Paging interfaces are extremely common in server-side development: the tables of the PC era, and the feed streams and timelines of the mobile era.

To control traffic and keep the user experience smooth, a large data set is not returned to the client all at once; instead the client requests it in batches through a pagination interface.

The most commonly used pagination interface definition is roughly like this:

router.get('/list', async ctx => {
  const { page, size } = ctx.query

  // ...

  ctx.body = {
    data: []
  }
})

// > curl /list?page=1&size=10

The client passes the page number and the number of items per page. My personal guess is that this habit comes from the databases we are exposed to as beginners. Most people I know first encountered MySQL, SQL Server, and the like, where a query is typically paginated with a condition like this:

SELECT <column> FROM <table> LIMIT <offset>, <rows>

The equivalent operation for a zset in Redis looks similar:

> ZRANGE <key> <start> <stop>
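
To make the mapping concrete, here is a minimal sketch (not from the original interface code; the helper names are made up for illustration) of how page and size translate into the parameters of those two commands:

// Minimal illustrative sketch: mapping page/size onto LIMIT and ZRANGE parameters
// (helper names are invented for this example)
function toSqlLimit (page, size) {
  const offset = (page - 1) * size
  return `LIMIT ${offset}, ${size}` // page 2, size 10 -> LIMIT 10, 10
}

function toZrange (key, page, size) {
  const start = (page - 1) * size
  const stop = start + size - 1 // ZRANGE's stop index is inclusive
  return `ZRANGE ${key} ${start} ${stop}` // page 2, size 10 -> ZRANGE key 10 19
}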

So you may habitually design your pagination interface the same way, asking the client to provide two parameters: page and size.

There is nothing wrong with this approach; the data can be displayed neatly in a table on PC or a list on mobile.

However, it is a fairly generic paging method, best suited to data without any dynamic filtering conditions.

If the data has strong real-time requirements, involves a lot of filtering conditions, or needs to be cross-checked against other data sources, this approach starts to feel awkward.

Problems with the page number + page size pagination interface

To give a simple example: our company has a live-streaming business, so naturally there is an interface that returns a list of live rooms.

Data like this is highly time-sensitive. Lists such as the popular list and the newcomer list come from offline computation, but those jobs generally only store the user's identity or the ID of the live room. Details such as the number of viewers, the duration of the broadcast, and its popularity have to be fresh; they cannot be produced by an offline script and can only be fetched at the moment the interface is requested.

Moreover, some checks are required when the client makes a request, for example:

  • Make sure the anchor is broadcasting live
  • Ensure live broadcast content compliance
  • Check the blocking relationship between users and anchors

None of this can be done while the offline script runs, because these things change from moment to moment, and the data may not even live in the same place: the list may come from MySQL, the filtering data may have to be fetched from Redis, and the user profile data may sit in yet another database. These operations cannot be solved with a single table query; they have to be performed at the interface layer, fetching several pieces of data and combining them.

At this point, the page + size paging scheme described above runs into an awkward problem.

Perhaps the user calling the interface is in a bad mood and has blocked every anchor on the first page, in which case the interface would actually return 0 items, which is pretty terrible.

let data = [] // length: 10
data = data.filter(filterBlackList)
return data // length: 0

In this case, should the client show "no data", or should it immediately request the second page?

So in some scenarios this pagination design cannot meet our needs. It was at this point that a Redis command came to mind: scan.

Implementation of a cursor + count pagination interface

The scan command is used to iterate over the keys in a Redis database. The total number of keys cannot be determined up front (running keys directly against a production instance is a good way to get yourself killed), and the set of keys keeps changing while you iterate: some may be deleted and new ones may appear in the meantime.

So scan requires a cursor to be passed in; on the first call it is enough to pass 0. scan returns two values: the first is the cursor to use for the next iteration, and the second is a collection of all the keys returned by this iteration.

scan also accepts a match pattern to iterate only over keys that satisfy a rule, for example all keys starting with temp_: scan 0 match temp_*. But scan does not first collect every key matching your rule and then hand them all back; it does not guarantee that a single iteration returns N items, and a single iteration may well return no data at all.

If we explicitly need N items, we simply keep calling it with the returned cursor until we have enough.

// A simple recursive implementation that fetches ten matching keys
async function getKeys (pattern, oldCursor = 0, res = []) {
  const [ cursor, data ] = await redis.scan(oldCursor, 'MATCH', pattern)

  res = res.concat(data)
  if (res.length >= 10) return res.slice(0, 10)
  else return getKeys(pattern, cursor, res)
}

await getKeys('temp_*') // length: 10

This usage pattern gave me some ideas, and I decided to implement the pagination interface in a similar way.

However, putting this logic in the client would make it painful to adjust later: every tweak would require a new release, and compatibility between old and new versions would constrain later changes.

So the logic lives on the server side, and the client only needs to carry the cursor returned by the previous response on its next request.
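
For illustration, a hypothetical client-side sketch (assuming a fetch-based client and the /list response shape built later in this article) might look like this:

// Hypothetical client-side sketch: remember the cursor from the previous
// response and send it along with the next request
let cursor = null

async function loadNextPage (size = 10) {
  const query = cursor ? `cursor=${cursor}&size=${size}` : `size=${size}`
  const res = await fetch(`/list?${query}`).then(r => r.json())

  // Keep the cursor for the next request; if it is missing, there is no more data
  cursor = res.data.cursor || null

  return res.data.list
}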

The general structure

For the client, this is just storing a cursor and sending it back.

But the logic of the server is a little more complicated:

  • First, we need to have a function to get the data
  • Secondly, there is a function for data filtering
  • There is a function that checks the length of the data and truncates it

function getData () {
  // Get data
}

function filterData () {
  // Filter data
}

function generatedData () {
  // Merge, generate, and return the data
}

Implementation

Node.js 10 has become LTS, so the sample code will use some of the new features in Node 10.

The list itself is most likely stored as a collection, something like a set of user identifiers, i.e. a set or zset in Redis.

If the data source is Redis, my suggestion is to cache the complete list globally, refresh it periodically, and then slice it at the interface level to get the portion needed for each request.

The following example code assumes that the list stores a collection of unique IDs, and that the corresponding detailed data is fetched from other databases using those IDs.

redis> SMEMBERS list
 > 1
 > 2
 > 3

mysql> SELECT * FROM user_info
+-----+--------+-----+--------+
| uid | name   | age | gender |
+-----+--------+-----+--------+
|   1 | Niko   |  18 |      1 |
|   2 | Bellic |  20 |      2 |
|   3 | Jarvis |  22 |      2 |
+-----+--------+-----+--------+

List data is cached globally

// The complete list cached globally
let globalList = null

async function updateGlobalData () {
  globalList = await redis.smembers('list')
}

updateGlobalData()
setInterval(updateGlobalData, 2000) // Refresh every 2s

Implementing the functions that fetch and filter data

The scan example above uses recursion, which is not very readable, so here we can use a generator to achieve the same thing:

// Function to get data
async function * getData (list, size) {
  const count = Math.ceil(list.length / size)

  let index = 0

  do {
    const start = index * size
    const end = start + size
    const piece = list.slice(start, end)

    // Query MySQL for the corresponding detailed user data
    const results = await mysql.query(`
      SELECT * FROM user_info
      WHERE uid in (${piece})
    `)

    // The filtering function used here is listed below
    yield filterData(results)
  } while (++index < count)
}

We also need a function to filter the data. Such functions may pull data from other sources to verify that an item in the list is still valid. For example, user A has blacklisted users B and C; when user A calls the interface, B and C have to be filtered out.
Or we may need to check the status of an item, for example whether the anchor has closed the live room and whether the stream is healthy, which may require calls to other interfaces.

// Function for filtering data
async function filterData (list) {
  const validList = await Promise.all(list.map(async item => {
    const [
      isLive,
      inBlackList
    ] = await Promise.all([
      // e.g. an internal interface that returns the live status
      http.get(`/live?target=${item.uid}`),
      // e.g. a blacklist stored as a Redis set
      redis.sismember(`XXX:black:list`, item.uid)
    ])

    // Keep only items in a valid state
    if (isLive && !inBlackList) {
      return item
    }
  }))

  // Filter out invalid data
  return validList.filter(i => i)
}

The function that assembles the data at the end

With those two key functions implemented, we need one more function that checks and assembles the data.

It decides when to return data to the client and when to go fetch more:

async function generatedData ({
  cursor,
  size,
}) {
  let list = globalList

  // If a cursor is passed in, slice the list starting just after the cursor
  if (cursor) {
    // The purpose of the + 1 is explained below
    list = list.slice(list.indexOf(cursor) + 1)
  }

  let results = []

  // Note that this is a for-await loop, not map, forEach or the like
  for await (const res of getData(list, size)) {
    results = results.concat(res)

    if (results.length >= size) {
      const list = results.slice(0, size)
      return {
        list,
        // If there may still be more data, a cursor has to be returned this time.
        // We return the ID of the last item in the list as the cursor,
        // which explains the + 1 in the indexOf at the entry of this function
        cursor: list[size - 1].uid,
      }
    }
  }

  return {
    list: results,
  }
}

A very simple for loop. The point of using a for loop is to make the fetching process serial: only after the first batch comes back and turns out to be insufficient do we fetch more data to fill the page, which avoids wasting resources on unnecessary fetches.

Once we have enough data we return immediately, the loop terminates, and the remaining generator is destroyed.
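
To see that terminating the loop really does shut the generator down, here is a small standalone sketch (unrelated to the interface code) you can run on its own:

// Standalone sketch: returning or breaking out of a for-await loop calls the
// generator's return(), so its finally block runs and the generator is destroyed
async function * numbers () {
  try {
    let i = 0
    while (true) yield i++
  } finally {
    console.log('generator destroyed')
  }
}

async function main () {
  for await (const n of numbers()) {
    if (n >= 2) break // terminate early -> "generator destroyed" is logged
  }
}

main()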

Finally, put this function into the route handler to complete the whole flow:

router.get('/list', async ctx => {
  const { cursor, size } = ctx.query

  const data = await generatedData({
    cursor,
    size,
  })

  ctx.body = {
    code: 200,
    data,
  }
})

The structure returned is basically a list plus a cursor, much like scan's return value: a cursor and the data.
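
For example, a response body under the sample data above might look roughly like this (the values are illustrative only):

{
  "code": 200,
  "data": {
    "list": [
      { "uid": 1, "name": "Niko", "age": 18, "gender": 1 },
      { "uid": 2, "name": "Bellic", "age": 20, "gender": 2 }
    ],
    "cursor": 2
  }
}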

The client can also pass an optional size to specify how many items it expects the interface to return at a time.

However, compared with ordinary page + size pagination, such requests are inevitably a bit slower: ordinary pagination does a single fetch even if a page comes back short, whereas this approach may fetch data several times internally in order to fill a page.

Still, for interfaces with strong real-time requirements, I personally think this implementation is friendlier to the user.

Comparison between the two

Both are perfectly good paging methods. The first is more common, and the second is no silver bullet, but in some scenarios it works better.

The first approach is probably better suited to B-end products, such as work orders, reports, and archived data.

The second works better on the C-end, since those are products handed directly to users;

On a PC page it would probably be a paginated table; if the first page shows 10 items, the second shows 8, and the third is back to 10, that is simply a disaster for the user experience.

On mobile it is a bit more forgiving, with something like an infinitely scrolling waterfall feed, but users can still end up loading 2 items one time and 8 the next. That is barely acceptable anywhere except the home page, but 2 items showing up on the home page... well.

The second, cursor-based approach can guarantee that every response contains size items, and if it contains fewer, there is simply no more data afterwards.
That is a better experience for the user. (Of course, if the list has no filtering conditions at all and is just plain display, the first approach is recommended; there is no need to add all this logic.)

Summary

Of course, this is only what can be done about pagination on the server side, and it still does not solve every problem. For lists that update very quickly, such as rankings, the data may change every second. It is possible that on the first request user A is in tenth place, and by the second request user A has dropped to eleventh, in which case both responses will contain a record for user A.

In that case the client also needs to deduplicate, but deduplication reduces the number of items actually shown.
That is another big topic which I do not plan to go into here.
A simple way to fool the user is to request 16 items at a time, display 10, and keep the remaining 6 locally to splice in the next time more data needs to be shown.
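
A rough client-side sketch of both ideas, deduplicating by uid and buffering the extra items locally (the function and variable names here are made up for illustration):

// Hypothetical client-side sketch: dedupe by uid and keep surplus items
// in a local buffer for the next render
const seen = new Set()
let buffer = []

function mergePage (incoming, pageSize = 10) {
  // Drop items that have already been rendered
  const fresh = incoming.filter(item => !seen.has(item.uid))
  fresh.forEach(item => seen.add(item.uid))

  // Combine the leftovers from last time with the new items
  const combined = buffer.concat(fresh)

  // Show pageSize items now, keep the rest for next time
  buffer = combined.slice(pageSize)
  return combined.slice(0, pageSize)
}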

If there are any errors in this article, or if you have a better pagination implementation or a favourite approach of your own, feel free to share.

References

Redis | SCAN
