SoFunction
Updated on 2025-03-09

Regular expressions for advanced shell learning

Regular Expressions Overview

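A regular expression is a pattern template that Linux tools such as sed and gawk use to filter text. Lines that match the pattern are passed through, and lines that do not are discarded.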

Basic regular expressions

Plain text

[root@node1 ~]# echo "this is a cat" | sed -n '/cat/p'
this is a cat
[root@node1 ~]# echo "this is a cat" | gawk '/cat/{print $0}'
this is a cat

Regular expression matching is strict. In particular, remember that patterns are case sensitive: a pattern matches only text with exactly the same capitalization.
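As a minimal sketch of the case-sensitivity rule, the same pattern can be run against two inputs that differ only in capitalization. The variable names lower and upper are illustrative and not part of the original example.

```shell
# Same pattern, two inputs that differ only in the case of one letter.
lower=$(echo "this is a cat" | sed -n '/cat/p')
upper=$(echo "this is a Cat" | sed -n '/cat/p')

echo "lowercase input matched: '$lower'"
echo "capitalized input matched: '$upper'"   # empty: 'cat' does not match 'Cat'
```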

Special characters

Special characters recognized by regular expressions include:

.*[]^${}\+?|()

If you want to use a special character as a literal text character, you must escape it by placing a backslash (\) in front of it.

[root@node1 ~]# echo "this is a $" | sed -n '/\$/p'
this is a $
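The effect of escaping is easiest to see by comparing an escaped and an unescaped special character. The sketch below uses the dot, which unescaped matches any single character; the sample values 3.14 and 3x14 are made up for illustration.

```shell
# Unescaped dot matches any character, so "3x14" is accepted as well.
loose=$(printf '3.14\n3x14\n' | sed -n '/3.14/p')
# Escaped dot matches only a literal period.
strict=$(printf '3.14\n3x14\n' | sed -n '/3\.14/p')

printf 'unescaped pattern matched:\n%s\n' "$loose"
printf 'escaped pattern matched:\n%s\n' "$strict"
```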

Anchor characters

Two special characters anchor a pattern to the beginning or the end of a line in the data stream.

The caret (^) anchors the pattern to the start of the text line.

The dollar sign ($) anchors the pattern to the end of the line.

[root@node1 ~]# echo "this is a cat" | sed -n '/^this/p'
this is a cat
[root@node1 ~]# echo "this is a cat" | sed -n '/cat$/p'
this is a cat
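Anchors also cause a match to fail when the text is present but in the wrong position. The following sketch, with illustrative variable names, runs anchored patterns that do and do not line up with the sample text.

```shell
text="this is a cat"

start_hit=$(echo "$text" | sed -n '/^this/p')    # "this" is at the start
start_miss=$(echo "$text" | sed -n '/^cat/p')    # "cat" is not at the start
end_miss=$(echo "$text" | sed -n '/this$/p')     # "this" is not at the end

echo "^this  -> '$start_hit'"
echo "^cat   -> '$start_miss'"
echo "this\$  -> '$end_miss'"
```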

The two anchors can also be combined in a single pattern.

1. For example, to find lines that contain only the specified text:

[root@node1 ljy]# more   
this is a dog
what
how
this is a cat
is a dog
[root@node1 ljy]# sed -n '/^is a dog$/p' 
is a dog

2. Combining the two anchors with nothing between them matches empty lines, which can then be deleted:

[root@node1 ljy]# more   
this is a dog
what
how
 
this is a cat
is a dog
[root@node1 ljy]# sed '/^$/d'  
this is a dog
what
how
this is a cat
is a dog
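Note that /^$/ matches only lines that are completely empty. A line holding just spaces or tabs is not empty. The sketch below is an assumption-labeled variant that uses the POSIX class [[:space:]] followed by an asterisk (zero or more repetitions) to remove whitespace-only lines as well; the input is generated with printf rather than read from a file.

```shell
# Sample input: one empty line and one line containing a single space.
input=$(printf 'this is a dog\n\n \nthis is a cat\n')

# Deletes only the truly empty line; the space-only line survives.
only_empty=$(printf '%s\n' "$input" | sed '/^$/d')
# Deletes empty lines and lines made up solely of whitespace.
blank_too=$(printf '%s\n' "$input" | sed '/^[[:space:]]*$/d')

printf '%s\n' "$blank_too"
```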

Dot character

The dot character matches any single character except the newline. A character must be present in that position, so a line containing only "at" does not match /.at/.

[root@node1 ljy]# more 
this is a dog
what
how
this is a cat
is a dog
at
[root@node1 ljy]# sed -n '/.at/p' 
what
this is a cat
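The dot accepts whitespace as well as letters, as long as some character occupies the position. A small sketch with made-up input lines:

```shell
# Three candidate lines: "at", " at" (leading space), and "cat".
hits=$(printf 'at\n at\ncat\n' | sed -n '/^.at$/p')

# The bare "at" is rejected; the space and the "c" both satisfy the dot.
printf '%s\n' "$hits"
```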

Character Group

To match one of several specific characters at a given position, use a character group. Square brackets define the group, and any one of the characters listed inside them satisfies the match.

[root@node1 ljy]# more 
this is a dog
this is a Dog
this is a DoG
this is a cat
[root@node1 ljy]# sed -n '/[dD]og/p' 
this is a dog
this is a Dog
[root@node1 ljy]# sed -n '/[dD]o[gG]/p'  
this is a dog
this is a Dog
this is a DoG

Exclude character groups

To exclude certain characters, put a caret (^) as the first character inside the square brackets. The group then matches any single character that is not listed.

[root@node1 ljy]# sed -n '/[dD]o[gG]/p'  
this is a dog
this is a Dog
this is a DoG
[root@node1 ljy]# sed -n '/[^D]og/p'  
this is a dog
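A negated group still has to match one character; it does not mean "nothing here". The sketch below uses invented sample words to show which lines pass.

```shell
# "og" alone fails: there is no character in front of it for [^D] to match.
hits=$(printf 'dog\nDog\nog\nfog\n' | sed -n '/[^D]og/p')

printf '%s\n' "$hits"
```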

Range

A range, written with a hyphen inside square brackets such as [0-9], matches any single character that falls within that interval.

[root@node1 ljy]# more 
123123
1231
121222222
412345341613
vsdvs
qwer12344123
12345
34211
444444
[root@node1 ljy]# sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' 
12345
34211
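Several ranges can be placed in one pair of brackets. The sketch below matches two-character lowercase hexadecimal strings; the sample input is invented. LC_ALL=C is set as a precaution, on the assumption that letter ranges may otherwise be interpreted according to the current locale's collation order.

```shell
# [0-9a-f] combines a digit range and a letter range in one group.
hits=$(printf '1f\nzz\nA0\n9\n' | LC_ALL=C sed -n '/^[0-9a-f][0-9a-f]$/p')

printf '%s\n' "$hits"
```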

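Extended regular expressions

Extended regular expressions (ERE) add further pattern symbols. The examples in this part use gawk, which supports ERE; sed uses basic regular expressions by default.

Question mark

The question mark means the preceding character may appear zero times or one time, and no more than that.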

[root@node1 ljy]# echo "bat" | gawk '/ba?t/{print $0}' 
bat
[root@node1 ljy]# echo "baat" | gawk '/ba?t/{print $0}'
[root@node1 ljy]# echo "bt" | gawk '/ba?t/{print $0}' 
bt

The question mark can also follow a character group, which makes the whole group optional:

[root@node1 ljy]# echo "bt" | gawk '/b[ae]?t/{print $0}'
bt
[root@node1 ljy]# echo "bat" | gawk '/b[ae]?t/{print $0}'
bat
[root@node1 ljy]# echo "bet" | gawk '/b[ae]?t/{print $0}'
bet
[root@node1 ljy]# echo "baat" | gawk '/b[ae]?t/{print $0}'
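A common use of the question mark is to accept two spellings of one word. In this sketch grep -E stands in for gawk purely as a convenient ERE matcher, and the word list is invented.

```shell
# "u" is optional: zero or one occurrence is accepted, two are not.
hits=$(printf 'color\ncolour\ncolouur\n' | grep -E '^colou?r$')

printf '%s\n' "$hits"
```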

Plus sign

The plus sign means the preceding character must appear one or more times. Zero occurrences do not match.

[root@node1 ljy]# echo "baat" | gawk '/b[ae]+t/{print $0}'
baat
[root@node1 ljy]# echo "bt" | gawk '/b[ae]+t/{print $0}' 
[root@node1 ljy]# echo "bt" | gawk '/ba+t/{print $0}' 
[root@node1 ljy]# echo "bat" | gawk '/ba+t/{print $0}'
bat
[root@node1 ljy]# echo "baat" | gawk '/ba+t/{print $0}'
baat
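The plus sign is often combined with a range to require at least one character of a kind. The following sketch uses grep -E in place of gawk and invented sample lines.

```shell
# [0-9]+ demands at least one digit after "abc".
hits=$(printf 'abc\nabc1\nabc123\n' | grep -E '^abc[0-9]+$')

printf '%s\n' "$hits"
```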

Curly braces

Curly braces in ERE set lower and upper limits on how many times the preceding character or character group may repeat.

{m,n} means the preceding item appears at least m times and at most n times.

[root@node1 ljy]# echo "baat" | gawk '/b[ae]{1,2}t/{print $0}' 
baat
[root@node1 ljy]# echo "baaat" | gawk '/b[ae]{1,2}t/{print $0}'
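Besides {m,n}, ERE also accepts {m} for exactly m repetitions and {m,} for m or more. The sketch below rewrites the earlier five-digit range pattern with an interval; it uses grep -E rather than gawk, and the numbers are a subset of the earlier sample data.

```shell
numbers=$(printf '123123\n1231\n12345\n34211\n')

# Exactly five digits: same result as writing [0-9] five times.
five=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{5}$')
# Five digits or more.
five_plus=$(printf '%s\n' "$numbers" | grep -E '^[0-9]{5,}$')

printf 'exactly five:\n%s\n' "$five"
printf 'five or more:\n%s\n' "$five_plus"
```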

Pipe symbol

The pipe symbol lets you specify two or more patterns in a logical OR relationship. The match succeeds if any one of the patterns matches.
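As a minimal sketch of alternation on its own, the pattern below accepts a line if it contains either word. grep -E is used here as an ERE matcher in place of gawk, and the third input line is invented.

```shell
# A line passes if it contains "cat" or "dog".
hits=$(printf 'this is a cat\nthis is a dog\nthis is a bird\n' | grep -E 'cat|dog')

printf '%s\n' "$hits"
```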

Expression grouping

Parts of a regular expression can be grouped with parentheses. The group is then treated as a single unit, for example to limit the scope of a pipe symbol.

[root@node1 ljy]# echo "bat" | gawk '/b(a|e)t/{print $0}'  
bat
[root@node1 ljy]# echo "baat" | gawk '/b(a|e)t/{print $0}'
[root@node1 ljy]# echo "bet" | gawk '/b(a|e)t/{print $0}' 
bet
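Because a group behaves as one unit, a quantifier placed after the closing parenthesis applies to the whole group. The sketch below, using grep -E instead of gawk and invented input, makes a multi-character suffix optional.

```shell
# (urday)? makes the entire suffix optional, not just the last letter.
hits=$(printf 'Sat\nSaturday\nSatur\n' | grep -E '^Sat(urday)?$')

printf '%s\n' "$hits"
```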

Summary

The above is the entire content of this article. I hope that the content of this article has certain reference value for your study or work. Thank you for your support.