Efficiently Querying Big Data in Redis: A Practice and Optimization Guide

1. Introduction

Redis is a high-performance key-value database that is widely used for caching, leaderboards, counters and similar scenarios. In real business code we often need to query data that meets a specific condition, such as finding key-value pairs whose value is greater than a threshold (for example 10). However, traversing all keys and checking them one by one with the GET command causes performance problems, especially when the data volume is large.

This article walks through how to efficiently query Redis for data that meets a condition, from an initial simple implementation to an optimized solution, with Java code examples to help developers apply best practices for querying Redis data.

2. Problem background

Suppose we have the following requirements:

Redis database 1 (selected with -n 1 in redis-cli) stores a large number of keys matching the pattern flow:count:1743061930:*.

We need to find all key-value pairs whose value is greater than 10 and count them.

Initial implementation plan

The original shell script is as follows:

redis-cli -h 10.206.0.16 -p 6379 -n 1 --scan --pattern "flow:count:1743061930:*" | \
while read key; do
value=$(redis-cli -h 10.206.0.16 -p 6379 -n 1 GET "$key")
if [ "$value" != "1" ]; then
echo "$key: $value"
fi
done | tee /dev/stderr | wc -l | awk '{print "Total count: " $1}'

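Problems with this program:

Too many Redis round trips: every key starts a separate redis-cli process and sends a separate GET, so network and process-startup overhead grows linearly with the number of keys.

Wrong, string-based comparison: [ "$value" != "1" ] is a string comparison, and it tests "not equal to 1" rather than "greater than 10", so values such as 2 or 10 are counted as well. A numeric test such as [ "$value" -gt 10 ] is what the requirement calls for.

Extra pipeline stages: tee, wc and awk each add a process to the pipeline, while a single awk program can filter, print and count in one pass.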
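Once the check is corrected to "greater than 10", it must also be numeric: compared as strings, values with different numbers of digits sort incorrectly. The small stdlib-only Java sketch below (the class and method names are illustrative, not part of the original scripts) compares the same values both ways:

```java
// Illustrative helper: compares counter values as strings and as numbers.
class CompareDemo {

    // Lexicographic comparison, character by character: "9" > "10" because '9' > '1'.
    static boolean greaterAsString(String value, String threshold) {
        return value.compareTo(threshold) > 0;
    }

    // Numeric comparison: parse first, then compare as long integers.
    static boolean greaterAsNumber(String value, long threshold) {
        return Long.parseLong(value.trim()) > threshold;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"9", "10", "11", "100"}) {
            System.out.println(v + ": string > \"10\"? " + greaterAsString(v, "10")
                    + ", number > 10? " + greaterAsNumber(v, 10));
        }
    }
}
```

For "9", greaterAsString returns true while greaterAsNumber returns false, which is exactly the kind of miscount a string test produces.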

3. Optimization plan

3.1 Optimizing Shell Scripts

Optimized version:

redis-cli -h 10.206.0.16 -p 6379 -n 1 --scan --pattern "flow:count:1743061930:*" | \
while read key; do
redis-cli -h 10.206.0.16 -p 6379 -n 1 GET "$key"
done | \
awk '$1 > 10 {count++; print} END {print "Total count: " (count+0)}'

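Optimization points:

Simpler loop: each value goes straight to awk instead of being captured in a shell variable and echoed. Note that this still sends one GET (and starts one redis-cli process) per key, so the number of network round trips is unchanged; section 3.2 addresses that.
Numeric comparison in awk: $1 > 10 compares the values as numbers, which matches the requirement and avoids a shell test per key.
Merged counting logic: awk filters, prints and counts in one pass, replacing the separate tee, wc and awk stages.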
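If you prefer to post-process the values in Java instead of awk, the single-pass filter-print-count logic can be sketched as below. This is an illustrative equivalent, not part of the original scripts: it reads one value per line from standard input, as produced by the GET loop, and skips lines that do not parse as integers, such as the empty line redis-cli prints for a missing key (awk would instead fall back to a string comparison for such lines).

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative Java counterpart of: awk '$1 > 10 {count++; print} END {...}'
class ThresholdCounter {

    // Returns the values that parse as integers and exceed the threshold, in input order.
    static List<Long> filterAbove(List<String> lines, long threshold) {
        List<Long> matches = new ArrayList<>();
        for (String line : lines) {
            try {
                long value = Long.parseLong(line.trim());
                if (value > threshold) {
                    matches.add(value);
                }
            } catch (NumberFormatException e) {
                // Empty line (missing key) or non-integer value: not a match.
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        List<Long> matches = filterAbove(lines, 10);
        matches.forEach(System.out::println);                  // like awk's print
        System.out.println("Total count: " + matches.size()); // like awk's END block
    }
}
```

On Java 11 or later the file can be run directly at the end of the pipeline, for example `... | java ThresholdCounter.java`.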

If you also need the key names in the output:

redis-cli -h 10.206.0.16 -p 6379 -n 1 --scan --pattern "flow:count:1743061930:*" | \
while read key; do
value=$(redis-cli -h 10.206.0.16 -p 6379 -n 1 GET "$key")
echo "$key: $value"
done | \
awk -F': ' '$2 > 10 {count++; print} END {print "Total count: " (count+0)}'
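The -F': ' separator works because the key names contain colons but not a colon followed by a space, so each line splits into the key and the value. If the keyed output is processed in Java instead, the same split can be sketched like this (illustrative class and method names, stdlib only); splitting at the last ": " keeps colons inside key names intact:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative parser for lines of the form "<key>: <value>".
class KeyedLineFilter {

    // Keeps entries whose integer value exceeds the threshold, in input order.
    static Map<String, Long> filterAbove(List<String> lines, long threshold) {
        Map<String, Long> matches = new LinkedHashMap<>();
        for (String line : lines) {
            int sep = line.lastIndexOf(": "); // the value never contains ": "
            if (sep < 0) {
                continue; // not a "key: value" line
            }
            String key = line.substring(0, sep);
            try {
                long value = Long.parseLong(line.substring(sep + 2).trim());
                if (value > threshold) {
                    matches.put(key, value);
                }
            } catch (NumberFormatException e) {
                // Empty value (missing key) or non-integer value: not a match.
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample keys under the pattern from the article.
        List<String> sample = List.of("flow:count:1743061930:a: 7", "flow:count:1743061930:b: 25");
        filterAbove(sample, 10).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

Running main prints only flow:count:1743061930:b: 25.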

3.2 Optimization with Redis Pipeline

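The shell scripts above still issue one GET per key. Instead of sending one command per key, we can fetch values in batches and cut the number of network round trips. Strictly speaking, the solution below batches keys into MGET commands rather than using Redis pipelining, but the goal is the same: fewer round trips.

Optimized Shell + MGET Solution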

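redis-cli -h 10.206.0.16 -p 6379 -n 1 --scan --pattern "flow:count:1743061930:*" | \
xargs -n 100 redis-cli -h 10.206.0.16 -p 6379 -n 1 MGET | \
awk '$1 > 10 {count++; print} END {print "Total count: " (count+0)}'

Here, xargs -n 100 appends up to 100 keys to each redis-cli ... MGET invocation, so 10,000 keys need about 100 MGET requests instead of 10,000 GETs. (With xargs -I {}, xargs would start one redis-cli per key, which gives no batching at all.) MGET returns only the values, in the same order as the requested keys, so this variant does not print key names.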
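Batching means splitting the stream of keys into groups of at most N keys and sending one MGET per group, so K keys need ceil(K / N) requests instead of K GETs. The same splitting, written in Java with only the standard library (class and method names are illustrative), looks like this; each inner list would become the argument list of one MGET call:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper that groups keys for batched MGET calls.
class KeyBatcher {

    // Splits keys into consecutive batches of at most batchSize keys.
    static List<List<String>> batches(List<String> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            result.add(new ArrayList<>(keys.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical key suffixes 0..249 under the article's pattern.
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            keys.add("flow:count:1743061930:" + i);
        }
        for (List<String> batch : batches(keys, 100)) {
            System.out.println("MGET with " + batch.size() + " keys");
        }
    }
}
```

With 250 keys and a batch size of 100, main prints three lines (100, 100 and 50 keys): three requests instead of 250. Larger batches mean fewer round trips but bigger individual requests and replies, so batch sizes in the hundreds are a common compromise.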

4. Java implementation solution

In Java applications, we can use the Jedis or Lettuce client to implement the same SCAN + MGET approach.

4.1 Query with Jedis

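import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.ScanParams;   // Jedis 3.x: redis.clients.jedis.ScanParams
import redis.clients.jedis.resps.ScanResult;    // Jedis 3.x: redis.clients.jedis.ScanResult
import java.util.List;

public class RedisValueFilter {
    public static void main(String[] args) {
        String host = "10.206.0.16";
        int port = 6379;
        int db = 1;
        String pattern = "flow:count:1743061930:*";
        long threshold = 10;

        try (Jedis jedis = new Jedis(host, port)) {
            jedis.select(db);

            // COUNT is a hint for how much work each SCAN call does, not a hard limit
            ScanParams scanParams = new ScanParams().match(pattern).count(100);
            String cursor = "0";
            int totalCount = 0;

            do {
                ScanResult<String> scanResult = jedis.scan(cursor, scanParams);
                List<String> keys = scanResult.getResult();
                cursor = scanResult.getCursor();

                // A SCAN step can return no keys, and MGET with zero keys is an error
                if (keys.isEmpty()) {
                    continue; // jumps to the while condition below
                }

                // Bulk acquisition of values: one MGET round trip per SCAN batch
                List<String> values = jedis.mget(keys.toArray(new String[0]));

                // Filter and count; values.get(i) belongs to keys.get(i)
                for (int i = 0; i < keys.size(); i++) {
                    String key = keys.get(i);
                    String valueStr = values.get(i);
                    if (valueStr != null) { // null: key deleted or expired after SCAN
                        long value = Long.parseLong(valueStr);
                        if (value > threshold) {
                            System.out.println(key + ": " + value);
                            totalCount++;
                        }
                    }
                }
            } while (!cursor.equals("0"));

            System.out.println("Total count: " + totalCount);
        }
    }
}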

Optimization points:

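Use SCAN instead of KEYS: SCAN iterates the keyspace incrementally, whereas KEYS walks it in one command and blocks Redis while it runs.
Use MGET batch queries to reduce network overhead: one round trip per SCAN batch instead of one per key.
Compare values numerically after parsing them, which is correct for counters (a string comparison would rank "9" above "10").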
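One detail the points above leave out: SCAN guarantees that every key present for the whole iteration is returned, but a given key may be returned more than once, so counting every returned key can inflate totalCount. A duplicate-safe count can be sketched with a set of seen keys. In this stdlib-only sketch the class name is illustrative, and the SCAN batches and MGET replies are passed in as plain lists standing in for ScanResult.getResult() and the list returned by jedis.mget:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative counter that ignores keys SCAN returns more than once.
class DistinctThresholdCounter {

    private final long threshold;
    private final Set<String> seen = new HashSet<>();
    private int count;

    DistinctThresholdCounter(long threshold) {
        this.threshold = threshold;
    }

    // Processes one SCAN batch; values.get(i) is the MGET reply for keys.get(i).
    // Keys seen in earlier batches are skipped, and null values never count.
    void accept(List<String> keys, List<String> values) {
        for (int i = 0; i < keys.size(); i++) {
            String value = values.get(i);
            if (!seen.add(keys.get(i)) || value == null) {
                continue;
            }
            if (Long.parseLong(value) > threshold) {
                count++;
            }
        }
    }

    int count() {
        return count;
    }

    public static void main(String[] args) {
        DistinctThresholdCounter counter = new DistinctThresholdCounter(10);
        counter.accept(Arrays.asList("a", "b"), Arrays.asList("15", "3"));
        counter.accept(Arrays.asList("a", "c"), Arrays.asList("15", null)); // "a" returned twice
        System.out.println("Total count: " + counter.count());
    }
}
```

main prints "Total count: 1": the second "a" is ignored and "c" has no value. The trade-off is memory, since the set holds every scanned key until the iteration ends.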

4.2 Using Lettuce (asynchronous non-blocking)

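Lettuce is a high-performance Redis client built on Netty that offers synchronous, asynchronous and reactive APIs. The example below uses the synchronous API for readability; the same commands are available through connection.async() and connection.reactive():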

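import io.lettuce.core.*;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import java.util.List;

public class RedisLettuceQuery {
    public static void main(String[] args) {
        // Database index 1 is selected through the URI path
        RedisURI uri = RedisURI.create("redis://10.206.0.16:6379/1");
        RedisClient client = RedisClient.create(uri);

        try (StatefulRedisConnection<String, String> connection = client.connect()) {
            RedisCommands<String, String> commands = connection.sync();
            String pattern = "flow:count:1743061930:*";
            long threshold = 10;
            int totalCount = 0;

            ScanCursor cursor = ScanCursor.INITIAL;
            ScanArgs scanArgs = ScanArgs.Builder.matches(pattern).limit(100);
            do {
                KeyScanCursor<String> scanResult = commands.scan(cursor, scanArgs);
                List<String> keys = scanResult.getKeys();
                // KeyScanCursor is itself a ScanCursor that carries the next position
                // and the isFinished() flag, so it is passed straight to the next call
                cursor = scanResult;

                // MGET with zero keys is an error, so skip empty SCAN steps
                if (keys.isEmpty()) {
                    continue;
                }

                // Bulk acquisition of values
                List<KeyValue<String, String>> keyValues = commands.mget(keys.toArray(new String[0]));

                for (KeyValue<String, String> kv : keyValues) {
                    if (kv.hasValue()) { // no value: key deleted or expired after SCAN
                        long value = Long.parseLong(kv.getValue());
                        if (value > threshold) {
                            System.out.println(kv.getKey() + ": " + value);
                            totalCount++;
                        }
                    }
                }
            } while (!cursor.isFinished());

            System.out.println("Total count: " + totalCount);
        } finally {
            client.shutdown();
        }
    }
}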

Advantages:

Non-blocking I/O based on Netty, suitable for high-concurrency scenarios.

Supports reactive programming (for example through RedisReactiveCommands).
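What the asynchronous API buys is the ability to keep several requests in flight instead of waiting for each reply in turn. In Lettuce, the async commands (RedisAsyncCommands) return a RedisFuture, which is a CompletionStage, so batches can be composed with the usual future combinators. The stdlib-only sketch below shows that composition pattern with CompletableFuture; fetchBatch is a hypothetical stand-in for an asynchronous MGET and only reads an in-memory map, so nothing here talks to Redis:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of issuing several batch lookups concurrently.
class AsyncBatchCount {

    // Hypothetical stand-in for an async MGET: completes with one value per key
    // (null for missing keys), looked up in a local map on a pool thread.
    static CompletableFuture<List<String>> fetchBatch(Map<String, String> store, List<String> keys) {
        return CompletableFuture.supplyAsync(() -> {
            List<String> values = new ArrayList<>();
            for (String key : keys) {
                values.add(store.get(key));
            }
            return values;
        });
    }

    // Starts every batch before waiting on any of them, then sums the
    // per-batch counts of values above the threshold.
    static int countAbove(Map<String, String> store, List<List<String>> batches, long threshold) {
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (List<String> batch : batches) {
            pending.add(fetchBatch(store, batch).thenApply(values -> {
                int n = 0;
                for (String v : values) {
                    if (v != null && Long.parseLong(v) > threshold) {
                        n++;
                    }
                }
                return n;
            }));
        }
        int total = 0;
        for (CompletableFuture<Integer> f : pending) {
            total += f.join(); // waits only after all batches were started
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> store = Map.of("k1", "5", "k2", "12", "k3", "40", "k4", "11");
        List<List<String>> batches = List.of(List.of("k1", "k2"), List.of("k3", "k4", "k5"));
        System.out.println("Total count: " + countAbove(store, batches, 10));
    }
}
```

main prints "Total count: 3" (k2, k3 and k4; k5 is missing). With a real client, the same shape applies: start all MGET futures first, then join, so the round trips overlap instead of adding up.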

5. Performance comparison

Approach	Query method	Network overhead	Applicable scenarios
Original shell	One GET per key	High	Small data volume
Optimized shell + awk	One GET per key, numeric filtering in awk	High	Small to medium data volume
Shell + MGET (xargs)	SCAN + batched MGET	Low	Large data volume
Java + Jedis	SCAN + MGET	Low	Production environment
Java + Lettuce	SCAN + MGET (sync, async or reactive API)	Low	High-concurrency applications

6. Conclusion

Avoid the KEYS command: use SCAN instead so that Redis is not blocked.
Reduce network requests: batch queries with MGET or pipelining.
Compare numerically: use awk or Java to compare values as numbers, not strings.
Production recommendation: use the Java + Jedis/Lettuce solution for large-scale queries.

With these optimizations, queries over large Redis data sets become considerably more efficient and put less load on the server, which makes the approach suitable for high-concurrency production environments.
