Solve the problem of high mongo database CPU occupancy caused by regular representation matching ($regex)

One day, the usage rate of mongo database CPU was much higher. After checking, I found that it was caused by the following statement:

db.example_collection.find({
 "idField" : 
{ "$regex" : "123456789012345678"
} ,
 "dateField" : 
{ "$regex" : "2019/10/10"
}})

Usually, in this case, my first reaction is to lack the index of the relevant fields, which leads to the full table scanning every time this statement is executed.

However, I analyzed it using the explain( ) statement and found that the two fields idField and dateField mentioned above have indexes, and this statement also uses indexes. The following is the result of explain( ):

mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : { "$regex" : "123456789012345678"} , "dateField" : { "$regex" : "2019/10/10"}}).explain("queryPlanner")
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "example_db.example_collection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "idField" : {
                        "$regex" : "123456789012345678"
                    }
                },
                {
                    "dateField" : {
                        "$regex" : "2019/10/10"
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "filter" : {
                    "$and" : [
                        {
                            "idField" : {
                                "$regex" : "123456789012345678"
                            }
                        },
                        {
                            "dateField" : {
                                "$regex" : "2019/10/10"
                            }
                        }
                    ]
                },
                "keyPattern" : {
                    "idField" : 1,
                    "dateField" : 1
                },
                "indexName" : "idField_1_dateField_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "idField" : [ ],
                    "dateField" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "idField" : [
                        "[\"\", {})",
                        "[/123456789012345678/, /123456789012345678/]"
                    ],
                    "dateField" : [
                        "[\"\", {})",
                        "[/2019/10/10/, /2019/10/10/]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "ok" : 1
}

Looking at the mongo log, it was found that it took 800~900ms to execute this statement once, which is indeed relatively slow. Unless there are many CPU cores in the database, as long as this statement is slightly higher per second, the CPU will be filled soon.

After searching, I found that it might be a problem with regular expressions. It turns out that although this statement does use an index, there is a field "indexBounds" in the output of the explain( ) statement, indicating the index range that needs to be scanned when executing the statement. To be honest, in the output above, I never understood its index range. The above statement performs ordinary regular expression matching for both fields idField and dateField. I guess it should have scanned the entire index tree, so the index does not actually improve the query efficiency of the statement.

I looked at the data in the database and found that there is no need to perform regular matching of the two fields, idField and dateField, just perform normal text matching. After removing the regular matching operation $regex, analyze it again, and the result is as follows:

mgset-11111111:PRIMARY> db.example_collection.find({ "idField" : "123456789012345678", "dateField" : "2019/10/10"}).explain("queryPlanner")
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "example_db.example_collection",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "idField" : {
                        "$eq" : "123456789012345678"
                    }
                },
                {
                    "dateField" : {
                        "$eq" : "2019/10/10"
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "idField" : 1,
                    "dateField" : 1
                },
                "indexName" : "idField_1_dateField_1",
                "isMultiKey" : false,
                "multiKeyPaths" : {
                    "idField" : [ ],
                    "dateField" : [ ]
                },
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 2,
                "direction" : "forward",
                "indexBounds" : {
                    "idField" : [
                        "[\"123456789012345678\", \"123456789012345678\"]"
                    ],
                    "dateField" : [
                        "[\"2019/10/10\", \"2019/10/10\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "ok" : 1
}

As you can see, the index is still used, and the index scan range is limited to only one value.

Later, I confirmed with the developer that there is no need to use regular matches for this statement, so I asked him to remove the regular matches. There was no problem after that, and the statement did not appear in the mongo slow log.

Summarize

The above is the problem of solving the problem of high mongo database CPU usage caused by regular representation matching ($regex). I hope it will be helpful to everyone. If you have any questions, please leave me a message and the editor will reply to everyone in time. Thank you very much for your support for my website!
If you think this article is helpful to you, please reprint it. Please indicate the source, thank you!