SoFunction
Updated on 2025-03-10

MongoDB regular expressions and applications

Regular expressions are used in all languages to search for a pattern or literal within a string. MongoDB also provides regular expression pattern matching on strings through the $regex operator. MongoDB uses PCRE (Perl Compatible Regular Expressions) as its regular expression language.

Unlike text search, regular expressions can be used directly, without any configuration or extra commands.

Consider a posts collection whose documents contain the post text and its tags, with the following structure:

{
 "post_text": "enjoy the mongodb articles on yiibai",
 "tags": [
  "mongodb",
  "yiibai"
 ]
}

Using regular expressions

The following regular expression query searches for all posts containing the string yiibai:

>db.posts.find({post_text:{$regex:"yiibai"}})

The same query can also be written as:

>db.posts.find({post_text:/yiibai/})
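
As a rough illustration of why the string form and the literal form behave the same, the sketch below applies both to an in-memory array in plain JavaScript. It does not talk to a MongoDB server: the posts array and the findByRegex helper are hypothetical stand-ins for the collection and for find(), and JavaScript's own RegExp engine is used in place of PCRE.

```javascript
// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  { post_text: "enjoy the mongodb articles on yiibai", tags: ["mongodb", "yiibai"] },
  { post_text: "a post about something else", tags: ["misc"] },
];

// Hypothetical helper: keep the documents whose field matches the pattern.
function findByRegex(docs, field, pattern) {
  // A string pattern (as passed to $regex) is compiled to a RegExp first.
  const re = typeof pattern === "string" ? new RegExp(pattern) : pattern;
  return docs.filter((doc) => re.test(doc[field]));
}

const viaString = findByRegex(posts, "post_text", "yiibai");
const viaLiteral = findByRegex(posts, "post_text", /yiibai/);

console.log(viaString.length, viaLiteral.length); // 1 1
```

Both calls select the same single document, which is the behavior the two query forms above are meant to share.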

Using case-insensitive regular expressions

To make the search case-insensitive, we use $options with the value i. The following command searches for the string yiibai regardless of case:

>db.posts.find({post_text:{$regex:"yiibai",$options:"i"}})

This query returns the documents that contain the word yiibai in any letter case, such as the following:

{
 "_id" : ObjectId("53493d37d852429c10000004"),
 "post_text" : "hey! this is my post on Yiibai", 
 "tags" : [ "yiibai" ]
} 
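
The effect of the i option can be checked without a database. The following is a minimal sketch in plain JavaScript, assuming a few made-up strings in a hypothetical texts array; it uses JavaScript's RegExp rather than MongoDB's PCRE engine, but the i flag means the same thing in both.

```javascript
// Hypothetical sample texts; only the first two mention the word.
const texts = [
  "hey! this is my post on Yiibai",
  "enjoy the mongodb articles on yiibai",
  "nothing relevant here",
];

// Without the i flag, only the exact lowercase spelling matches.
const caseSensitive = texts.filter((t) => new RegExp("yiibai").test(t));

// With the i flag, "Yiibai" matches as well.
const caseInsensitive = texts.filter((t) => new RegExp("yiibai", "i").test(t));

console.log(caseSensitive.length);   // 1
console.log(caseInsensitive.length); // 2
```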

Using regular expressions on array elements:

We can also apply regular expressions to array fields. This is particularly useful when implementing tag functionality. So, if you want to search for all posts whose tags start with the word tutorial (whether tutorial, tutorials, tutorialjava, or tutorialphp), you can use the following code:

>db.posts.find({tags:{$regex:"^tutorial"}})
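
When a field holds an array, a regular expression query matches a document if at least one element matches. The sketch below imitates that rule in plain JavaScript and also shows the difference the ^ anchor makes. The taggedPosts array and the matchTags helper are hypothetical, and nothing here runs against MongoDB.

```javascript
// Hypothetical documents with an array-valued tags field.
const taggedPosts = [
  { title: "a", tags: ["mongodb", "tutorialjava"] },
  { title: "b", tags: ["video tutorial"] },
  { title: "c", tags: ["news"] },
];

// A document matches when any one of its tags matches the pattern.
function matchTags(docs, re) {
  return docs.filter((doc) => doc.tags.some((tag) => re.test(tag)));
}

// Unanchored: "tutorial" may appear anywhere inside a tag.
const unanchored = matchTags(taggedPosts, /tutorial/).map((d) => d.title);

// Anchored with ^: the tag must start with "tutorial".
const anchored = matchTags(taggedPosts, /^tutorial/).map((d) => d.title);

console.log(unanchored); // [ 'a', 'b' ]
console.log(anchored);   // [ 'a' ]
```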

Optimizing regular expression queries:

If the document field is indexed, the query matches the regular expression against the indexed values. This makes the search much faster than applying the regular expression while scanning the entire collection.

If the regular expression is a prefix expression, every match must start with a given string of characters. For example, if the regular expression is ^tut, the query only has to look at the strings that begin with tut.
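
One way to see why a prefix expression helps is to treat an index as a sorted list of keys: every key that starts with tut sits in one contiguous range, so two binary searches can replace a full scan. The sketch below illustrates only that idea and is not MongoDB's actual index implementation. The keys array, lowerBound, and prefixScan are hypothetical, and the helper assumes a non-empty prefix.

```javascript
// Hypothetical sorted "index" over a string field.
const keys = ["java", "tut", "tutor", "tutorial", "tv", "zebra"].sort();

// Binary search: first position whose key is >= target.
function lowerBound(sortedKeys, target) {
  let lo = 0;
  let hi = sortedKeys.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (sortedKeys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Keys starting with `prefix` lie in [prefix, upper), where upper is the
// prefix with its last character incremented ("tut" -> "tuu").
function prefixScan(sortedKeys, prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return sortedKeys.slice(lowerBound(sortedKeys, prefix), lowerBound(sortedKeys, upper));
}

const viaRange = prefixScan(keys, "tut");
const viaFullScan = keys.filter((k) => /^tut/.test(k));

console.log(viaRange);    // [ 'tut', 'tutor', 'tutorial' ]
console.log(viaFullScan); // [ 'tut', 'tutor', 'tutorial' ]
```

The range lookup touches only a few keys, while the full scan tests every key against the pattern. A pattern without the ^ anchor offers no such range, which is why it cannot benefit in the same way.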

MongoDB regular expression applications

Regular expressions are fully supported in MongoDB. In general, the $regex operator can be used in queries.

db.lnmopy.find( { 'name': /.*/i } )
db.lnmopy.find( { 'name': { $regex: '.*', $options: 'i' } } )

The two queries above are completely equivalent. You can apply either a regular expression literal or the $regex operator directly to a field, that is, the 'name' key in the example above. The option used here is i, which ignores case.
MongoDB's regular expression options differ slightly from the standard options in other languages; it defines its own set.

Possible values for $options

i Ignore case;

m Multi-line matching. If the content contains no newline character (for example, \n) or the pattern contains no anchors (^ for start, $ for end), this option has no effect;

x Whitespace characters in the pattern are ignored unless they are escaped or inside a character class. All characters from an unescaped # outside a character class through the next newline character, inclusive, are also ignored;

s The dot metacharacter (.) matches all characters, including newlines.
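
The m and s options can be tried out with JavaScript's own RegExp, which gives those two flags the same meaning for the simple cases below. This is a sketch outside MongoDB, assuming a made-up two-line string. The x option has no JavaScript flag equivalent, so it is not shown.

```javascript
// Hypothetical two-line sample value.
const text = "first line\nsecond line";

// m: ^ and $ also match at line boundaries, not only at the string ends.
const withoutM = /^second/.test(text);
const withM = /^second/m.test(text);

// s: the dot also matches newline characters.
const withoutS = /first.*second/.test(text);
const withS = /first.*second/s.test(text);

console.log(withoutM, withM); // false true
console.log(withoutS, withS); // false true
```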

Suppose we have a database named mongoDemo

use mongoDemo

The database has a collection called lnmopy:

db.lnmopy.find()

It contains the following data:

{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "website", "tag" : "l,n,m,o,p,y"}
{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "unknown", "tag" : "d,e,m,o"}
{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "website", "tag" : "w,e,l,c,o,m,e"}

MongoDB supports only the i and m options in native JavaScript regular expression literals (such as /.*/i). To use the x and s options, you must use the "$regex" operator and specify the options in "$options".

Update operations using regular expressions:

db.lnmopy.update( { 'name': /.*/i }, { $set: { 'site':'' } } );

This finds the documents in the current collection whose "name" field matches the regular expression in the query and updates only their "site" field to "". By default, the update statement modifies only one document. If you do not use $set, the document is left with only the fields you supplied plus its ObjectId, which amounts to a replacement. To update all matching documents, add these parameters:

db.lnmopy.update( { 'name': /.*/i }, { $set: { 'site':'' } } , false, true);

The parameters are positional: false is the upsert flag (when true, a new document is inserted if nothing matches), and true enables multi-document updates, so every matching document is updated. Alternatively, specify { multi: true } directly:

db.lnmopy.update( { 'name': /.*/i }, { $set: { 'site':'' } } , { multi: true });

This updates the "site" field of every matching document to "".
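
The difference between a $set-style update and a replacement, and between updating one match and all matches, can be sketched in plain JavaScript. The code below is an in-memory imitation under stated assumptions, not MongoDB's implementation: the docs array, applySet, applyReplacement, and update are all hypothetical names, and the sample data is made up.

```javascript
// Hypothetical sample documents; not the article's data.
const docs = [
  { _id: 1, name: "Alpha", site: "old" },
  { _id: 2, name: "beta", site: "old" },
  { _id: 3, name: "xyz", site: "old" },
];

// $set-style change: merge the given fields and keep everything else.
function applySet(doc, fields) {
  return { ...doc, ...fields };
}

// Replacement-style change: only _id survives from the old document.
function applyReplacement(doc, replacement) {
  return { _id: doc._id, ...replacement };
}

// Update the first match only, or every match when multi is true.
function update(all, re, fields, multi) {
  let updated = 0;
  return all.map((doc) => {
    if (!re.test(doc.name) || (!multi && updated > 0)) return doc;
    updated += 1;
    return applySet(doc, fields);
  });
}

const single = update(docs, /a/i, { site: "" }, false).map((d) => d.site);
const multiple = update(docs, /a/i, { site: "" }, true).map((d) => d.site);
const replaced = applyReplacement(docs[0], { site: "" });

console.log(single);   // [ '', 'old', 'old' ]
console.log(multiple); // [ '', '', 'old' ]
console.log(replaced); // { _id: 1, site: '' }
```

Note that the replaced document has lost its name field, which is the behavior the text describes for an update without $set.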

The "tag" field as I designed it has a flaw: each value was originally a single word, but now every letter is separated by ",". Similar problems come up in real work. Because of batch data conversions, misbehaving programs, or changes in business requirements, MongoDB's general update methods sometimes cannot do the job, so JavaScript statements are needed.

Use a regular expression to replace every ',' in the query results with "":

db.lnmopy.find().forEach( function(u) { u.tag = u.tag.replace(/\,/g, ""); db.lnmopy.save(u); } );

Finally, run:

db.lnmopy.find()

The following data is displayed:

{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "", "tag" : "lnmopy"}
{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "", "tag" : "demo"}
{ "_id" : ObjectId("502dd63d16a25b1ff6000000"), "name" : "", "site" : "", "tag" : "welcome"}
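
The string replacement itself can be checked in plain JavaScript. String.prototype.replace with a non-global regular expression replaces only the first match, so the g flag is needed to strip every comma. The tagValues array below simply repeats the three tag values from the sample data.

```javascript
// The three tag values from the sample data.
const tagValues = ["l,n,m,o,p,y", "d,e,m,o", "w,e,l,c,o,m,e"];

// Without g, only the first comma in each value is removed.
const firstOnly = tagValues.map((t) => t.replace(/,/, ""));

// With g, every comma is removed.
const allCommas = tagValues.map((t) => t.replace(/,/g, ""));

console.log(firstOnly); // [ 'ln,m,o,p,y', 'de,m,o', 'we,l,c,o,m,e' ]
console.log(allCommas); // [ 'lnmopy', 'demo', 'welcome' ]
```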

Postscript: JavaScript is a major feature and an advantage of MongoDB. Many complex queries and processing tasks can be implemented in JavaScript. Note, however, that JavaScript is less efficient, and in principle it should be kept out of the main business logic as much as possible. By analogy, JavaScript here plays the role that stored procedures play in Oracle. It is not surprising that 10gen (MongoDB's development team) is derived from Oracle. I'll write about more complex uses of JavaScript later.