Background
When working on Linux you often need to find files, and the difference between wildcard patterns and regular expressions in commands is easy to blur. It is worth studying carefully.
1 Basics
Wildcards and regular expressions
On the command line, a lot of time goes into locating the files you need with commands such as ls and find. The shell provides a set of string pattern-matching rules built on metacharacters. When the shell encounters these characters in a file name argument, it treats them as special rather than ordinary characters, so you can use them to match file names. These patterns are what we call wildcards.
Wildcards and regular expressions are different things. Simply put, wildcards match file names, while regular expressions match text strings.
Wildcards are built into the shell and are used mostly on file names, with commands such as find, ls, and cp. Regular expressions need the support of a specific command, such as grep, sed, and awk (known as the Three Musketeers of Linux), vi/vim, or perl. These are tools for processing text.
Second, the shell handles the two differently. An unquoted pattern is generally treated as a wildcard: the shell expands it itself before the command runs. A pattern inside quotes ('' or "") is generally a regular expression: the shell does not expand it as a file name pattern and passes it to the command for processing.
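To make the expansion step concrete, here is a minimal sketch. The temporary directory and the file names app.log, db.log, and notes.txt are invented for the demonstration.

```shell
# Work in a throwaway directory so the demo cannot touch real files.
demo_dir=$(mktemp -d)
touch "$demo_dir/app.log" "$demo_dir/db.log" "$demo_dir/notes.txt"

# Unquoted: the shell replaces *.log with the matching names before echo runs.
unquoted=$(cd "$demo_dir" && echo *.log)

# Quoted: the shell leaves the pattern alone and echo receives it literally.
quoted=$(cd "$demo_dir" && echo '*.log')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The same rule explains why patterns handed to grep are normally quoted: the command, not the shell, should be the one to interpret them.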
Wildcard
Common metacharacters:
- * : matches zero or more arbitrary characters
- ? : matches any single character
- [] : matches any one of the characters listed in the brackets, such as [rwc] (note that [r,w,c] also matches a comma)
- [^] or [!] : matches any single character not listed in the brackets
To use the following character classes in a wildcard pattern, wrap them in another pair of brackets, for example [[:digit:]]. The tr command does not need the outer brackets.
- [:digit:] : any digit
- [:lower:] : any lowercase letter
- [:upper:] : any uppercase letter
- [:alpha:] : any letter, uppercase or lowercase
- [:alnum:] : any letter or digit
- [:space:] : any whitespace character
- [:punct:] : any punctuation character
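The difference between the double-bracket form in a wildcard and the single-bracket form in tr can be sketched as follows. The file names 1, a, and B are made up for the example.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/1" "$demo_dir/a" "$demo_dir/B"

# In a wildcard pattern the class name sits inside an extra pair of brackets.
digit_files=$(cd "$demo_dir" && echo [[:digit:]])
upper_files=$(cd "$demo_dir" && echo [[:upper:]])

# tr takes the class name with a single pair of brackets.
shouted=$(printf 'abc\n' | tr '[:lower:]' '[:upper:]')

echo "digit: $digit_files  upper: $upper_files  tr: $shouted"
```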
Regular expressions
I won't introduce regular expressions in detail here; I only want to point out how they differ from wildcards.
First, wildcards have no quantifiers: they cannot say how many times something repeats. In regular expressions:
- * : matches the preceding character zero or more times
- . : matches any single character
- ? : matches the preceding character zero or one time; in basic regular expressions (as in GNU grep) it is written \?
- + : matches the preceding character one or more times; in basic regular expressions (as in GNU grep) it is written \+
- [] : the same as in wildcards
- [^] : the same as in wildcards, but the [!] form is not supported
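The two meanings of * are easy to compare side by side. This is only a sketch: glob_match and regex_match are hypothetical helper names, built on the shell's case statement and on grep.

```shell
# glob_match PATTERN STRING: test STRING against a wildcard pattern via case.
glob_match() {
  case $2 in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

# regex_match PATTERN STRING: test STRING against a basic regular expression.
regex_match() {
  if printf '%s\n' "$2" | grep -q -- "$1"; then echo yes; else echo no; fi
}

glob_star=$(glob_match 'ab*' 'abXYZ')            # wildcard *: any run of characters
regex_star=$(regex_match '^ab*$' 'abXYZ')        # regex *: repeats of b only
regex_dotstar=$(regex_match '^ab.*$' 'abXYZ')    # .* is the regex spelling of the wildcard *

echo "$glob_star $regex_star $regex_dotstar"
```

The regular expressions are anchored with ^ and $ so that, like the wildcard, they have to cover the whole string.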
2 Detailed introduction to wildcard characters
Test data
touch a b c A
*
Represents any number of characters, including none.
- Example: list files ending in .log
ll *.log
?
Represents any single character.
- Example: list only the files a, b, and c (the pattern matches every one-character name, so A is listed too)
ll ?
[]
Represents any one character from those listed between [ and ]. For example, [0-9] matches any digit from 0 to 9, and [a-zA-Z] matches any letter from a to z or A to Z. Letters are case sensitive.
- Example: list only files whose name is a single letter
ll [a-zA-Z]
- Example: list files that end in .log and have exactly two characters before .log, the second of which is a digit
ll ?[0-9].log
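The patterns so far can be tried safely in a scratch directory. In this sketch the .log file names are invented, echo stands in for ll so that only the names are printed, and LC_ALL=C is an assumption made here to keep sorting and the a-z range predictable.

```shell
LC_ALL=C; export LC_ALL   # byte-order sorting and ASCII-only ranges
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c" "$demo_dir/A" \
      "$demo_dir/1.log" "$demo_dir/a1.log" "$demo_dir/ab.log"

all_logs=$(cd "$demo_dir" && echo *.log)          # any name ending in .log
one_char=$(cd "$demo_dir" && echo ?)              # names of exactly one character
one_letter=$(cd "$demo_dir" && echo [a-zA-Z])     # one-character names that are letters
digit_logs=$(cd "$demo_dir" && echo ?[0-9].log)   # two characters, the second a digit

printf '%s\n' "$all_logs" "$one_char" "$one_letter" "$digit_logs"
```

In other locales a range such as [a-z] may cover more than the lowercase ASCII letters, which is why the locale is pinned here.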
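^
Inverts the match. Note that this wildcard is only valid inside [].
- Example: an attempt to list files that do not end in .txt
ll *[^txt]*
Note that [^txt] matches a single character that is not t or x, so this pattern actually lists every file whose name contains at least one character other than t and x. A name such as notes.txt still matches.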
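A bracket expression negates one character, not a whole suffix, so *[^txt]* does not filter out .txt files. The sketch below shows what the pattern really selects and one possible way to skip .txt names with a case filter. The file names are invented, and [!tx] is used as the portable spelling of [^tx].

```shell
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
touch "$demo_dir/app.log" "$demo_dir/notes.txt" "$demo_dir/tx"

# Matches every name containing at least one character other than t and x.
bracket_result=$(cd "$demo_dir" && echo *[!tx]*)

# Filtering with case skips the names that end in .txt.
non_txt=$(cd "$demo_dir" && for f in *; do
  case $f in
    *.txt) ;;                   # skip .txt files
    *) printf '%s\n' "$f" ;;    # keep everything else
  esac
done)

echo "bracket pattern: $bracket_result"
echo "case filter:"
echo "$non_txt"
```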
{}
Expands to each of the comma-separated patterns inside the braces.
- Example: list files ending in .log and files ending in .txt
ll {*.log,*.txt}
Note: . deserves attention because a wildcard pattern treats it as a literal character. If the pattern contains a ., only files whose names contain a . can match.
For example, if the previous ^ example is written as
ll *.[^txt]*
the result is different: only names with a . followed by a character other than t or x are listed.
Delete operation
- Example: delete the files a, b, and c and the files ending in .txt
rm -f {[abc],*.txt}
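Because rm -f gives no second chance, it can help to preview what a pattern expands to before deleting. This is a sketch in a scratch directory with invented file names. Brace expansion is a bash feature, so the commands are run through bash explicitly.

```shell
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c" "$demo_dir/A" \
      "$demo_dir/notes.txt" "$demo_dir/app.log"

# Putting echo in front prints the expanded command instead of running it.
preview=$(cd "$demo_dir" && bash -c 'echo rm -f {[abc],*.txt}')
echo "would run: $preview"

# The real deletion, followed by a look at what is left.
(cd "$demo_dir" && bash -c 'rm -f {[abc],*.txt}')
remaining=$(cd "$demo_dir" && echo *)
echo "left over: $remaining"
```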
Since wildcards work for listing, they work just as well for moving files. When you need to move certain types of files out of a folder that holds many files, wildcard matching is clearly efficient. There are more wildcard techniques than the ones shown here; they are worth studying when you have time.
3 Examples
- * : matches any string in a file name, including the empty string
- ? : matches any single character in a file name
- [...] : matches any one of the characters inside [ ]
- [!...] : matches any one character that is not among those after the exclamation mark inside [ ]; same effect as ^
For example:
- 5* : all strings starting with 5
- *5 : all strings ending with 5
- *5? : strings whose second-to-last character is 5
- [0-9] : any single digit
- [1,2] : 1 or 2 (the comma is also matched literally, so [12] is the cleaner form)
- [!0-9] : any single character that is not a digit
- ls /etc/[!a-n]*.conf : list files in /etc/ that do not start with a letter from a to n and end with .conf
- ls /etc/[a-n]*.conf : list files in /etc/ that start with a letter from a to n and end with .conf
- ls /bin/[ck]* : list files in /bin/ whose names start with c or k
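The two /etc listings can be reproduced without depending on what a particular machine has installed. In this sketch conf_dir is a hypothetical stand-in for /etc, filled with invented file names.

```shell
LC_ALL=C; export LC_ALL
conf_dir=$(mktemp -d)    # hypothetical stand-in for /etc
touch "$conf_dir/apt.conf" "$conf_dir/nginx.conf" "$conf_dir/resolv.conf" \
      "$conf_dir/sysctl.conf" "$conf_dir/motd"

early=$(cd "$conf_dir" && echo [a-n]*.conf)    # first letter from a to n
late=$(cd "$conf_dir" && echo [!a-n]*.conf)    # first character anything else

echo "a-n:     $early"
echo "not a-n: $late"
```

The file motd appears in neither list because both patterns require the .conf suffix.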
4 Detailed introduction to regular expressions
A regular expression (also known as a "regex" or "regexp") is a special syntax used to describe text patterns.
On Linux systems, regular expressions are often used to find patterns in text and to perform search-and-replace operations on text streams, among other things.
Simple string
$ grep bash /etc/passwd
operator:x:11:0:operator:/root:/bin/bash
root:x:0:0::/root:/bin/bash
ftp:x:40:1::/home/ftp:/bin/bash
In the command above, the first argument to grep is a regular expression and the second is a file name. grep reads each line of /etc/passwd and applies the simple substring regular expression bash to it. If a match is found, grep prints the entire line; otherwise it ignores the line.
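The same search can be tried without reading the real /etc/passwd. In this sketch the sample file and its three entries are invented stand-ins, and cut is added only to trim the output to the user names.

```shell
sample=$(mktemp)   # invented stand-in for /etc/passwd
printf '%s\n' \
  'root:x:0:0::/root:/bin/bash' \
  'daemon:x:2:2::/sbin:/sbin/nologin' \
  'ftp:x:40:1::/home/ftp:/bin/bash' > "$sample"

# grep prints whole lines containing "bash"; cut keeps only the first field.
bash_users=$(grep bash "$sample" | cut -d: -f1)
echo "$bash_users"
```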
Understanding simple substrings
Generally speaking, if you are searching for a substring, you can specify the text literally without any "special" characters. You only need to do something special when the substring contains +, ., *, [, ], or \, in which case these characters need to be enclosed in quotes and preceded by a backslash.
Here are a few other examples of simple substring regular expressions:
- tmp (scans for the literal string tmp)
- "\[box\]" (scans for the literal string [box])
- "\*funny\*" (scans for the literal string *funny*)
- "ld\.so" (scans for the literal string ld.so)
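The effect of the backslash is easiest to see by counting matches with and without it. This is a sketch on invented sample lines; grep -F, which treats the whole pattern as a literal string, is shown as an alternative to escaping.

```shell
sample=$(mktemp)
printf '%s\n' 'ld.so' 'ldXso' '*funny*' 'funny' > "$sample"

loose=$(grep -c 'ld.so' "$sample")       # unescaped dot: matches any character
strict=$(grep -c 'ld\.so' "$sample")     # escaped dot: matches a literal dot only
fixed=$(grep -c -F 'ld.so' "$sample")    # -F: the whole pattern is taken literally
starred=$(grep '\*funny\*' "$sample")    # escaped asterisks match the * characters

echo "loose=$loose strict=$strict fixed=$fixed starred=$starred"
```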
Metacharacter
With regular expressions, metacharacters let you perform searches that are much more complex than the examples studied so far. One of these metacharacters is . (dot), which matches any single character:
$ grep dev.hda /etc/fstab
/dev/hda3 reiserfs noatime,ro 1 1
/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2
/dev/hda2 swap sw 0 0
#/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1
In this example, the literal text dev.hda does not appear on any line of /etc/fstab. However, grep is not scanning for a literal string but for a pattern. Remember that . matches any single character. As you can see, the . metacharacter is functionally equivalent to the ? metacharacter in glob expansion.
Using []
If we want to match more specifically than ., we can use [ and ] (square brackets) to specify the set of characters to match:
$ grep dev.hda[12] /etc/fstab
/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2
/dev/hda2 swap swap sw 0 0
In regex flavors that support \u escapes (JavaScript, for example), [\u4e00-\u9fa5] matches any common Chinese character; grep's basic and extended syntax do not support this escape.
As you can see, this syntax works the same way as [] in glob filename expansion.
Using [^]
Following the [ with a ^ reverses the meaning of the square brackets: they then match any character not listed inside them. Again, note that regular expressions use [^], whereas globs use [!]:
$ grep dev.hda[^12] /etc/fstab
/dev/hda3 reiserfs noatime,ro 1 1
/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1
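The dot, the bracket set, and the negated set can be compared on one file. This is a sketch on sample data modeled on the fstab listings above, not a real fstab. It assumes the pattern prefix dev.hda, and the patterns are quoted so the shell cannot expand the brackets as a wildcard.

```shell
fstab=$(mktemp)   # sample data, not a real /etc/fstab
printf '%s\n' \
  '/dev/hda3 / reiserfs noatime,ro 1 1' \
  '/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2' \
  '/dev/hda2 swap swap sw 0 0' \
  '#/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1' > "$fstab"

any_disk=$(grep -c 'dev.hda' "$fstab")                      # . matches the / in /dev/hda
one_or_two=$(grep 'dev.hda[12]' "$fstab" | cut -d' ' -f1)   # device ends in 1 or 2
others=$(grep 'dev.hda[^12]' "$fstab" | cut -d' ' -f1)      # device ends in anything else

echo "lines with dev.hda: $any_disk"
echo "$one_or_two"
echo "$others"
```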
Differing syntax
It is important to note that the syntax inside square brackets is fundamentally different from the syntax in other parts of a regular expression.
For example, a . placed inside square brackets makes the brackets match a literal ., just as 1 and 2 were matched literally in the examples above. In comparison, a . outside square brackets is interpreted as a metacharacter unless it is prefixed with \. We can use this fact to print all lines in /etc/fstab that contain the literal string dev.hda:
$ grep dev[.]hda /etc/fstab
Alternatively, we can enter:
$ grep "dev\.hda" /etc/fstab
Neither of these regular expressions is likely to match any line in your /etc/fstab file.
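To see the two spellings of a literal dot agree, the sketch below uses two invented sample lines, one containing /dev/hda1 and one containing the literal text dev.hda.

```shell
sample=$(mktemp)
printf '%s\n' '/dev/hda1 /boot' 'dev.hda is just text' > "$sample"

as_metachar=$(grep -c 'dev.hda' "$sample")    # . matches both / and .
in_brackets=$(grep 'dev[.]hda' "$sample")     # [.] matches only a literal dot
backslashed=$(grep 'dev\.hda' "$sample")      # \. matches only a literal dot

echo "metachar matches: $as_metachar"
echo "brackets:  $in_brackets"
echo "backslash: $backslashed"
```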
The * metacharacter
Some metacharacters do not match any character themselves but modify the meaning of the preceding character. One such metacharacter is * (asterisk), which matches zero or more repetitions of the preceding character.
Here are some examples:
- ab*c (matches abbbbc but not abqc)
- ab*c (matches abc but not abbqbbc)
- ab*c (matches ac but not cba)
- b[cq]*e (matches bqe but not eb)
- b[cq]*e (matches bccqqe but not bccc)
- b[cq]*e (matches bqqcce but not cqe)
- b[cq]*e (matches bbbeee)
- .* (matches any string)
- foo.* (matches any string that starts with foo)
The line ac matches the regular expression ab*c because the asterisk also allows the preceding expression (b) to occur zero times. Note that the * regular expression metacharacter is interpreted in a fundamentally different way from the * glob character.
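The examples in this list can be checked mechanically. The sketch below writes the test strings to a scratch file, one per line, and asks grep which lines each pattern selects; paste only joins the results onto one line.

```shell
words=$(mktemp)
printf '%s\n' abbbbc abqc abc ac cba bqe eb bccqqe bccc bbbeee > "$words"

# a, then zero or more b, then c
star_b=$(grep 'ab*c' "$words" | paste -sd ' ' -)

# b, then zero or more characters from the set {c, q}, then e
star_set=$(grep 'b[cq]*e' "$words" | paste -sd ' ' -)

echo "ab*c    : $star_b"
echo "b[cq]*e : $star_set"
```

grep looks for the pattern anywhere in the line, which is why bbbeee is selected: the substring be is enough.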
The beginning and end of the line
The last metacharacters we describe here are ^ and $, which match the beginning and end of a line respectively. By placing a ^ at the start of your regular expression, you "anchor" the pattern to the beginning of the line.
In the following example, we use the ^# regular expression to match any line starting with the # character:
$ grep ^# /etc/fstab
# /etc/fstab: static file system information.
#
Full-line regular expressions
^ and $ can be combined to match a complete line.
For example, the following regular expression matches lines that begin with the # character and end with the . character, with any number of other characters in between:
$ grep '^#.*\.$' /etc/fstab
# /etc/fstab: static file system information.
In the example above, we enclosed the regular expression in single quotes to keep the shell from interpreting its special characters, such as $, \, and *. Without the single quotes, the shell would process those characters itself, and grep might never see the pattern we intended.
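Both anchors can be tried on a few sample lines. This is a sketch in which the scratch file stands in for /etc/fstab and the last comment line is invented to show a near miss.

```shell
fstab=$(mktemp)   # sample data, not a real /etc/fstab
printf '%s\n' \
  '# /etc/fstab: static file system information.' \
  '#' \
  '/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2' \
  '# a comment without a trailing period' > "$fstab"

comment_count=$(grep -c '^#' "$fstab")    # lines that start with #
full_line=$(grep '^#.*\.$' "$fstab")      # start with # and end with a literal .

echo "comment lines: $comment_count"
echo "full-line match: $full_line"
```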
Regular expression summary
Metacharacters
Several of the backslash classes below (\x, \a, \l, \u and their uppercase forms) come from Vim; support varies between tools.
- . : matches any character except \n. To match every character including \n, use [\s\S], or use . with the (?s) match mode.
- [abc] : matches any one character in the square brackets. Use - for a character range; for example, [a-z0-9] matches lowercase letters and Arabic numerals.
- [^abc] : a ^ at the start of the brackets matches any character except those in the brackets.
- | : means "or"
- \d : matches an Arabic numeral, equivalent to [0-9]
- \D : matches any character other than an Arabic numeral, equivalent to [^0-9]
- \x : matches a hexadecimal digit, equivalent to [0-9A-Fa-f]
- \X : matches any character that is not a hexadecimal digit, equivalent to [^0-9A-Fa-f]
- \w : matches a word character, equivalent to [0-9A-Za-z_]
- \W : matches any character other than a word character, equivalent to [^0-9A-Za-z_]
- \t : matches the <TAB> character
- \s : matches a whitespace character, equivalent to [ \t]
- \S : matches a non-whitespace character, equivalent to [^ \t]
- \a : matches any alphabetic character, equivalent to [a-zA-Z]
- \l : matches a lowercase letter, equivalent to [a-z]
- \L : matches any character that is not a lowercase letter, equivalent to [^a-z]
- \u : matches an uppercase letter, equivalent to [A-Z]
- \U : matches any character that is not an uppercase letter, equivalent to [^A-Z]
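Since the backslash shorthands are not understood by every tool, the bracket equivalents listed beside them are the safer spelling in grep. This sketch uses invented input strings and assumes grep -E so that + can be written without a backslash.

```shell
LC_ALL=C; export LC_ALL   # keep ranges such as A-Z limited to ASCII

# [0-9] in place of \d: lines made only of digits
digits_only=$(printf '%s\n' 42 4x2 abc | grep -E '^[0-9]+$')

# [0-9A-Fa-f] in place of \x: lines made only of hexadecimal digits
hex_only=$(printf '%s\n' 1f zz FF | grep -E '^[0-9A-Fa-f]+$' | paste -sd ' ' -)

# [0-9A-Za-z_] in place of \w: lines made only of word characters
word_only=$(printf '%s\n' foo_1 'a b' | grep -E '^[0-9A-Za-z_]+$')

echo "$digits_only | $hex_only | $word_only"
```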
Metacharacters for repetition counts
- * : matches 0 or more times
- \+ : matches 1 or more times
- \? : matches 0 or 1 time
- \{n,m\} : matches n to m times
- \{n\} : matches exactly n times
- \{n,\} : matches n or more times
- \{,m\} : matches 0 to m times
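The repetition counts can be checked against a small set of strings that differ only in the number of b characters. In this sketch the interval forms use grep's basic syntax as listed, while + and ? are shown with grep -E, where they are written without a backslash.

```shell
counts=$(mktemp)
printf '%s\n' ac abc abbc abbbc abbbbc > "$counts"

two_to_three=$(grep 'ab\{2,3\}c' "$counts" | paste -sd ' ' -)   # 2 to 3 b's
exactly_two=$(grep 'ab\{2\}c' "$counts")                        # exactly 2 b's
at_least_one=$(grep -E 'ab+c' "$counts" | paste -sd ' ' -)      # 1 or more b's
at_most_one=$(grep -E 'ab?c' "$counts" | paste -sd ' ' -)       # 0 or 1 b

echo "{2,3}: $two_to_three"
echo "{2}:   $exactly_two"
echo "+:     $at_least_one"
echo "?:     $at_most_one"
```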
Escape sequences
- \r, \n : carriage return and line feed
- \\ : matches \
- \^, \$, \. : match ^, $, and . respectively
The following characters usually need to be escaped to match themselves. In practice, depending on the situation, more characters than these may need escaping:
$ ^ { [ ( | ) * + ? \
Symbols representing positions
- $ : matches the end of a line
- ^ : matches the beginning of a line
- \< : matches the start of a word
- \> : matches the end of a word
- \b : matches a word boundary
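Word and line anchors narrow a match without changing the text being searched for. The sketch below assumes GNU grep, where \< and \> are available as extensions, and uses invented sample lines.

```shell
sample=$(mktemp)
printf '%s\n' 'cat' 'concatenate' 'cat food' 'bobcat' 'catalog' > "$sample"

anywhere=$(grep -c 'cat' "$sample")                        # cat as a plain substring
whole_word=$(grep '\<cat\>' "$sample" | paste -sd ',' -)   # cat as a complete word
word_start=$(grep '\<cat' "$sample" | paste -sd ',' -)     # words beginning with cat
line_only=$(grep '^cat$' "$sample")                        # the line is exactly cat

echo "anywhere:   $anywhere"
echo "whole word: $whole_word"
echo "word start: $word_start"
echo "whole line: $line_only"
```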
Summary
The above is my personal experience. I hope it serves as a useful reference.