Linux wildcards and regular expressions: differences and usage

Background

When working on Linux you constantly need to find files, yet the distinction between wildcard patterns and regular expressions in commands is easy to blur. This article looks at the two side by side.

1 Primer

Wildcards and regular expressions

On the command line, a lot of time goes into locating the files you need with commands such as ls and find. The shell provides a set of string pattern-matching rules built on metacharacters. When the shell encounters one of these characters, it treats it as special rather than as an ordinary character in a file name, so you can use it to match file names. These are what we call wildcards.

Wildcards and regular expressions are different things. Simply put, wildcards match file names, while regular expressions match strings in text.

Wildcards are built into the shell and are mostly applied to file names, for example with find, ls, cp, and so on. Regular expressions need support from a specific command, such as grep, sed, and awk (known as the Three Musketeers of Linux), vi/vim, perl, and so on. These are tools for processing text.

The shell also handles the two differently. An unquoted pattern is generally treated as a wildcard: the shell expands it itself before running the command. A quoted pattern is generally a regular expression: the shell does not expand it and passes it to the command for processing. Single quotes pass the text through untouched, while double quotes suppress wildcard expansion but still allow $ variable expansion.
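
The effect of quoting can be checked directly. The following is a minimal sketch, assuming a POSIX-style shell; the scratch directory and the file names a.log, b.log, and notes.txt are made up for the demonstration.

```shell
# Sketch: how quoting changes what a command receives.
# The scratch directory and file names are made up for this demonstration.
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.log b.log notes.txt

# Unquoted: the shell expands the wildcard before printf runs.
unquoted=$(printf '%s ' *.log)

# Quoted: the pattern reaches printf unchanged, in either quote style.
single_quoted=$(printf '%s ' '*.log')
double_quoted=$(printf '%s ' "*.log")

echo "unquoted:      $unquoted"
echo "single-quoted: $single_quoted"
echo "double-quoted: $double_quoted"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Only the unquoted form is turned into file names by the shell; a command such as grep would receive the quoted forms as literal pattern text.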

Wildcards

Common metacharacters:

*: matches zero or more arbitrary characters
?: matches any single character
[]: matches any one of the characters listed in the brackets, such as [rwc]; no separators are needed, so [r,w,c] would also match a comma
[^] or [!]: matches any single character except those listed in the brackets

To use the following character classes in a wildcard pattern, wrap them in another pair of brackets, for example [[:digit:]]. The tr command takes them without the outer brackets, for example tr -d '[:digit:]'. Both forms are standard POSIX behavior.

[:digit:]: any digit
[:lower:]: any lowercase letter
[:upper:]: any uppercase letter
[:alpha:]: any uppercase or lowercase letter
[:alnum:]: any letter or digit
[:space:]: any whitespace character (space, tab, newline, and so on)
[:punct:]: any punctuation character
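
The difference between the two notations can be tried out as follows. This is a small sketch, assuming a POSIX-style shell and tr; the one-character file names are invented for the test.

```shell
# Sketch: character classes in a wildcard pattern versus in tr.
# The scratch directory and the file names are made up for this demonstration.
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch 1 7 a B _

# In a wildcard pattern the class sits inside an outer bracket pair.
digits=$(printf '%s ' [[:digit:]])
lowers=$(printf '%s ' [[:lower:]])
uppers=$(printf '%s ' [[:upper:]])

# tr takes the class name without the outer brackets.
stripped=$(printf 'abc123def\n' | tr -d '[:digit:]')

echo "digit files: $digits"
echo "lower files: $lowers"
echo "upper files: $uppers"
echo "tr result:   $stripped"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```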

Regular expressions

I won't introduce regular expressions in full here; I only want to point out how they differ from wildcards.

First, wildcards have no quantifiers: there is no way to say how many times the preceding character repeats. Regular expressions do:

*: matches the preceding character zero or more times
.: matches any single character
?: matches the preceding character zero or one time; in GNU basic regular expressions it is written \?
+: matches the preceding character one or more times; in GNU basic regular expressions it is written \+
[]: exactly the same as in wildcards
[^]: same as in wildcards, but the [!] form is not available
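
To make the different meaning of * concrete, here is a rough sketch that tests the same pattern text once as a wildcard and once as a regular expression. The helper names glob_match, regex_match, and answer are hypothetical; the wildcard test uses the shell's case statement and the regex test uses grep -E with -x so that the whole string must match.

```shell
# glob_match STRING PATTERN: succeeds if the whole STRING matches the wildcard PATTERN.
glob_match() {
    case $1 in
        $2) return 0 ;;
        *)  return 1 ;;
    esac
}

# regex_match STRING REGEX: succeeds if the whole STRING matches the extended REGEX.
regex_match() {
    printf '%s\n' "$1" | grep -Eqx -- "$2"
}

# answer COMMAND...: prints yes or no depending on the command's exit status.
answer() {
    if "$@"; then echo yes; else echo no; fi
}

glob_ac=$(answer glob_match ac 'ab*c')        # wildcard: b is literal, * is any string
regex_ac=$(answer regex_match ac 'ab*c')      # regex: b* means zero or more b
glob_abxc=$(answer glob_match abxc 'ab*c')
regex_abxc=$(answer regex_match abxc 'ab*c')
glob_any=$(answer glob_match abc 'a?c')       # wildcard ? is one arbitrary character
regex_any=$(answer regex_match abc 'a.c')     # regex . is one arbitrary character

echo "ab*c against ac:   glob=$glob_ac regex=$regex_ac"
echo "ab*c against abxc: glob=$glob_abxc regex=$regex_abxc"
echo "one character:     glob a?c=$glob_any regex a.c=$regex_any"
```

The same text ab*c accepts ac only as a regular expression and accepts abxc only as a wildcard, which is the core of the difference described above.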

2 Wildcards in detail

Test data

touch a b c A

* represents any number of characters, including none

Example: list files ending in .log (ll is commonly an alias for ls -l)

ll *.log

? represents any single character

Example: list files whose names are exactly one character long; with the test data above that is a, b, c, and A

ll ?

[] represents any one character from the set between [ and ]. For example, [0-9] matches any digit from 0 to 9, and [a-zA-Z] matches any letter from a to z or A to Z. Letters are case sensitive.

Example: list files whose names are a single letter

ll [a-zA-Z]

Example: list files ending in .log whose name before .log is exactly two characters, the second being a digit

ll ?[0-9].log
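
These patterns can be tried safely in a scratch directory. The sketch below assumes a POSIX-style shell on a case-sensitive file system; the extra file names (1, app.log, a1.log, b22.log, readme.txt) are invented so that each pattern has something to match, and printf stands in for ll so that the result does not depend on an alias.

```shell
# Sketch: *, ? and [] against a made-up set of files.
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a b c A 1 app.log a1.log b22.log readme.txt

star=$(printf '%s ' *.log)             # any name ending in .log
one_char=$(printf '%s ' ?)             # names exactly one character long
one_letter=$(printf '%s ' [a-zA-Z])    # one-character names that are letters
digit_log=$(printf '%s ' ?[0-9].log)   # two characters, the second a digit, then .log

echo "*.log:      $star"
echo "?:          $one_char"
echo "[a-zA-Z]:   $one_letter"
echo "?[0-9].log: $digit_log"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```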

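^ inverts the match. Note that it must be used inside [].

Example: ll *[^txt]* is sometimes used to list files that do not end in .txt, but that is not what it does. [^txt] matches one character that is not t or x, so the pattern matches every name containing at least one such character, and a.txt still matches.

{} lists several alternatives, separated by commas, and the shell expands each one in turn.

Example: list files ending in .log or .txt

ll {*.log,*.txt}

Note: in a wildcard pattern, . is an ordinary character. If the pattern contains a ., only file names that contain a . can match.

For example, changing the ^ example above to ll *.[^txt]* gives a different result: it matches only names that contain a . followed by a character other than t or x, so a.log matches but a.txt does not.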
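
Because negated brackets are easy to misread, it helps to see what they actually select. The following sketch assumes a POSIX-style shell and uses the portable [!...] spelling (bash also accepts [^...]); the file names are made up, and the case-based loop at the end is just one possible way to list names that really do not end in .txt.

```shell
# Sketch: what negated bracket patterns select, on made-up file names.
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.txt b.log notes tt

# Names containing at least one character other than t or x (a.txt qualifies).
loose=$(printf '%s ' *[!txt]*)

# Names containing a dot followed by a character other than t or x.
after_dot=$(printf '%s ' *.[!txt]*)

# Names that really do not end in .txt, filtered with a case statement.
not_txt=
for name in *; do
    case $name in
        *.txt) ;;                          # skip names ending in .txt
        *) not_txt="$not_txt$name " ;;
    esac
done

echo "*[!txt]*:   $loose"
echo "*.[!txt]*:  $after_dot"
echo "not *.txt:  $not_txt"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```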

Delete operation

For example, delete the files a, b, and c and all files ending in .txt:

rm -f {[abc],*.txt}
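
Since rm gives no second chance, one cautious habit is to let the shell expand the pattern with a harmless command first. This is a sketch of that habit in a scratch directory with invented file names; it passes the two patterns [abc] and *.txt separately, which is what the brace form expands to in bash.

```shell
# Sketch: preview a wildcard before deleting with it.
LC_ALL=C; export LC_ALL
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a b c d notes.txt todo.txt keep.log

# Preview: the shell expands the patterns, printf only prints the result.
preview=$(printf '%s ' [abc] *.txt)
echo "would delete: $preview"

# Delete with the same patterns, then list what is left.
rm -f -- [abc] *.txt
remaining=$(printf '%s ' *)
echo "remaining:    $remaining"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```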

Since wildcards work for listing, they work equally well for moving files. When you need to move certain types of files out of a directory that holds many files, wildcard matching saves a lot of effort. There are more wildcard techniques than those shown here; they are worth studying when you have time.

3 Examples

*: matches any string in a file name, including the empty string.
?: matches any single character in a file name.
[...]: matches any one of the characters listed inside [ ].
[!...]: matches any one character not listed after the exclamation mark inside [ ]. It has the same effect as [^...].

For example:

5*: all strings starting with 5
*5: all strings ending with 5
*5?: all strings with 5 as the second-to-last character
[0-9]: any single digit
[1,2]: 1 or 2 (or a comma; [12] matches only 1 or 2)
[!0-9]: any single character that is not a digit
ls /etc/[!a-n]*.conf: list files in /etc/ whose names do not start with a letter from a to n and end with .conf
ls /etc/[a-n]*.conf: list files in /etc/ whose names start with a letter from a to n and end with .conf
ls /bin/[ck]*: list files in /bin/ whose names start with c or k
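
The string examples above can be checked without touching any files, because the shell's case statement uses the same wildcard rules. The helper name matches and the test strings below are made up for this sketch.

```shell
# matches STRING PATTERN: prints yes if the whole STRING matches the wildcard PATTERN.
matches() {
    case $1 in
        $2) echo yes ;;
        *)  echo no ;;
    esac
}

starts_with_5=$(matches 512 '5*')
ends_with_5=$(matches 125 '*5')
second_last_5=$(matches 1253 '*5?')
last_5_only=$(matches 125 '*5?')      # nothing follows the 5, so ? has no character to match
one_digit=$(matches 7 '[0-9]')
two_digits=$(matches 42 '[0-9]')      # a bracket expression matches exactly one character
not_digit=$(matches x '[!0-9]')
comma_in_set=$(matches , '[1,2]')     # the comma is a member of the set

echo "5*   vs 512:  $starts_with_5"
echo "*5   vs 125:  $ends_with_5"
echo "*5?  vs 1253: $second_last_5"
echo "*5?  vs 125:  $last_5_only"
echo "[0-9] vs 7:   $one_digit"
echo "[0-9] vs 42:  $two_digits"
echo "[!0-9] vs x:  $not_digit"
echo "[1,2] vs ,:   $comma_in_set"
```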

4 Regular expressions in detail

Regular expressions (also known as "regex" or "regexp") are special syntaxes used to describe text patterns.

On Linux systems, regular expressions are often used to find patterns in text, as well as perform "search-replace" operations on text streams, and other functions.

Simple string

$ grep bash /etc/passwd        
operator:x:11:0:operator:/root:/bin/bash        
root:x:0:0::/root:/bin/bash        
ftp:x:40:1::/home/ftp:/bin/bash

In the above command, the first argument to grep is a regular expression and the second is a file name. grep reads each line in /etc/passwd and applies the simple substring regular expression bash to it, looking for a match. If a match is found, grep prints the entire line; otherwise, it ignores the line.

Understand simple substrings

Generally speaking, if you are searching for a substring, you can just specify the text literally without any "special" characters. Only if the substring contains +, ., *, [, ], or \ do you need to do something special: enclose the pattern in quotes and put a backslash before each of these characters.

Here are a few other examples of simple substring regular expressions:

tmp (scans for the text string tmp)
"\[box\]" (scans for the text string [box])
"\*funny\*" (scans for the text string *funny*)
"ld\.so" (scans for the text string ld.so)
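
Whether the escaping matters can be seen by counting matching lines. The sketch below writes a few made-up lines to a temporary file and counts matches with grep -c; the last command uses grep -F (fixed strings), a standard option not covered in this article, as an alternative to escaping.

```shell
# Sketch: escaped versus unescaped special characters, on made-up sample lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[box]
box
ld.so
ldxso
*funny*
EOF

escaped_box=$(grep -c '\[box\]' "$sample")      # only the line [box]
escaped_dot=$(grep -c 'ld\.so' "$sample")       # only the line ld.so
unescaped_dot=$(grep -c 'ld.so' "$sample")      # ld.so and ldxso
escaped_star=$(grep -c '\*funny\*' "$sample")   # only the line *funny*
fixed_string=$(grep -cF '*funny*' "$sample")    # -F treats the pattern as plain text

echo "\\[box\\]:   $escaped_box"
echo "ld\\.so:    $escaped_dot"
echo "ld.so:     $unescaped_dot"
echo "\\*funny\\*: $escaped_star"
echo "-F:        $fixed_string"

rm -f "$sample"
```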

Metacharacters

Using regular expressions, metacharacters can be used to perform searches that are much more complex than the examples we have studied so far. One of these metacharacters is . (dot), which matches any single character:

$ grep dev.hda /etc/fstab
/dev/hda3 reiserfs noatime,ro 1 1
/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2
/dev/hda2 swap swap sw 0 0
#/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1

In this example, the literal text dev.hda does not appear in any line of /etc/fstab. However, grep was not scanning for the literal string dev.hda but for the pattern dev.hda. Remember that . matches any single character. As you can see, the . metacharacter is functionally equivalent to the ? metacharacter in glob expansion.

Using []

If we want to match more specifically than . allows, we can use [ and ] (square brackets) to specify the set of characters to match:

$ grep dev.hda[12] /etc/fstab
/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2        
/dev/hda2 swap swap sw 0 0

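[\u4e00-\u9fa5]: matches any common Chinese character in regex engines that accept \u escapes; this notation is not part of grep's basic or extended syntax.

As you can see, this syntax works the same way as [] in glob file name expansion.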

Using [^]

Placing a ^ immediately after the [ reverses the meaning of the square brackets: they then match any character not listed inside them. Again, note that regular expressions use [^], whereas [!] is the form used in glob patterns:

$ grep dev.hda[^12] /etc/fstab
/dev/hda3 reiserfs noatime,ro 1 1
#/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1

Differing syntax

It is important to note that the syntax inside square brackets is fundamentally different from the syntax in the rest of a regular expression.

For example, a . placed inside square brackets makes the brackets match a literal ., just like the 1 and 2 in the example above. By contrast, a . outside square brackets is interpreted as a metacharacter unless it is prefixed with \. We can use this fact to print all lines in /etc/fstab that contain the literal string dev.hda:

$ grep dev[.]hda /etc/fstab

Alternatively, we can also enter:

$ grep "dev\.hda" /etc/fstab

Neither of these regular expressions is likely to match any line in your /etc/fstab file.
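
Since /etc/fstab differs from machine to machine, the behavior of ., [], [^], and [.] is easier to reproduce on a fixed sample. The sketch below writes made-up fstab-style lines to a temporary file and counts matching lines; the patterns are quoted so that the shell cannot expand the brackets as a wildcard.

```shell
# Sketch: ., [], [^] and [.] on made-up fstab-style lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/hda3 / reiserfs noatime,ro 1 1
/dev/hda1 /boot reiserfs noauto,noatime,notail 1 2
/dev/hda2 swap swap sw 0 0
#/dev/hda4 /mnt/extra reiserfs noatime,rw 1 1
EOF

any_char=$(grep -c 'dev.hda' "$sample")         # . matches the / in /dev/hda
listed=$(grep -c 'dev.hda[12]' "$sample")       # hda1 and hda2
not_listed=$(grep -c 'dev.hda[^12]' "$sample")  # hda3 and hda4
# grep exits non-zero when nothing matches, so keep the pipeline's status at 0.
literal_dot=$(grep -c 'dev[.]hda' "$sample" || true)

echo "dev.hda:      $any_char"
echo "dev.hda[12]:  $listed"
echo "dev.hda[^12]: $not_listed"
echo "dev[.]hda:    $literal_dot"

rm -f "$sample"
```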

The * metacharacter

Some metacharacters do not match any characters themselves but modify the meaning of the preceding character. One such metacharacter is * (asterisk), which matches zero or more repetitions of the preceding character.

Here are some examples:

ab*c (matches abbbbc but not abqc)
ab*c (matches abc but not abbqbbc)
ab*c (matches ac but not cba)
b[cq]*e (matches bqe but not eb)
b[cq]*e (matches bccqqe but not bccc)
b[cq]*e (matches bqqcce but not cqe)
b[cq]*e (matches bbbeee)
.* (matches any string)
foo.* (matches any string that starts with foo)

The line ac matches the regular expression ab*c because the asterisk also allows the preceding expression (b) to appear zero times. Note that the * regular expression metacharacter is interpreted in a fundamentally different way from the * glob character.
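
The examples above can be verified by feeding each string to grep as a one-line input. The helper name line_matches is hypothetical; like grep itself, it reports a match when the pattern is found anywhere in the line.

```shell
# line_matches TEXT REGEX: prints yes if grep finds REGEX anywhere in TEXT.
line_matches() {
    if printf '%s\n' "$1" | grep -q -- "$2"; then echo yes; else echo no; fi
}

r_abbbbc=$(line_matches abbbbc 'ab*c')
r_abqc=$(line_matches abqc 'ab*c')
r_ac=$(line_matches ac 'ab*c')          # zero occurrences of b are allowed
r_cba=$(line_matches cba 'ab*c')
r_bbbeee=$(line_matches bbbeee 'b[cq]*e')   # the substring be is enough
r_bccc=$(line_matches bccc 'b[cq]*e')
r_cqe=$(line_matches cqe 'b[cq]*e')

echo "ab*c:    abbbbc=$r_abbbbc abqc=$r_abqc ac=$r_ac cba=$r_cba"
echo "b[cq]*e: bbbeee=$r_bbbeee bccc=$r_bccc cqe=$r_cqe"
```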

The beginning and end of the line

The last metacharacters we describe in detail here are ^ and $, which match the beginning and end of a line respectively. By putting a ^ at the beginning of the regular expression, you "anchor" the pattern to the beginning of the line.

In the following example, we use the ^# regular expression to match any line starting with the # character:

$ grep ^# /etc/fstab        
# /etc/fstab: static file system information.        
#

Complete line regular expression

^ and $ can be combined to match a complete line.

For example, the following regular expression matches lines that begin with the # character and end with the . character, with any number of other characters in between:

$ grep '^#.*\.$' /etc/fstab
# /etc/fstab: static file system information.

In the example above, we enclosed the regular expression in single quotes to prevent the shell from interpreting $ and the other special characters.

Without the single quotes, the shell would process the pattern before grep had a chance to see it: it would remove the backslash and could treat $ or * specially, so grep would receive a different regular expression.
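
The anchors can be exercised on a small fixed sample instead of the real /etc/fstab. The lines below are made up for this sketch, and every pattern is single-quoted so that the shell leaves it alone.

```shell
# Sketch: ^ and $ anchors on made-up sample lines.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# /etc/fstab: static file system information.
#
/dev/hda1 /boot reiserfs noauto 1 2
# trailing comment without a period
EOF

starts_hash=$(grep -c '^#' "$sample")    # lines beginning with #
ends_dot=$(grep -c '\.$' "$sample")      # lines ending with a literal .
whole_line=$(grep '^#.*\.$' "$sample")   # begins with # and ends with .
only_hash=$(grep -c '^#$' "$sample")     # lines consisting of a single #

echo "start with #:  $starts_hash"
echo "end with .:    $ends_dot"
echo "only a #:      $only_hash"
echo "whole line:    $whole_line"

rm -f "$sample"
```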

Regular expression summary

Metacharacters

Support varies by tool: several of the shorthand classes below (\a, \l, \L, \u, \U, \x, \X) are Vim's, and \d and \D are not part of grep's basic or extended syntax.

.: matches any character except \n. To match every character including \n, use [\s\S], or combine . with the (?s) matching mode, in engines that support them.
[abc]: matches any one character listed in the square brackets. Use - for a character range; for example, [a-z0-9] matches lowercase letters and digits.
[^abc]: a ^ at the start of the square brackets matches any character except those listed.
|: means "or".
\d: matches a digit, equivalent to [0-9].
\D: matches any character other than a digit, equivalent to [^0-9].
\x: matches a hexadecimal digit, equivalent to [0-9A-Fa-f].
\X: matches any character other than a hexadecimal digit, equivalent to [^0-9A-Fa-f].
\w: matches a word character, equivalent to [0-9A-Za-z_].
\W: matches any character other than a word character, equivalent to [^0-9A-Za-z_].
\t: matches the <TAB> character.
\s: matches a whitespace character, equivalent to [ \t].
\S: matches a non-whitespace character, equivalent to [^ \t].
\a: matches any alphabetic character, equivalent to [a-zA-Z].
\l: matches a lowercase letter, equivalent to [a-z].
\L: matches any character other than a lowercase letter, equivalent to [^a-z].
\u: matches an uppercase letter, equivalent to [A-Z].
\U: matches any character other than an uppercase letter, equivalent to [^A-Z].

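Quantifiers (metacharacters that specify a repetition count)

These are written in the backslash form used by basic regular expressions and Vim; in extended regular expressions (grep -E) the backslashes are dropped.

*: matches 0 or more times
\+: matches 1 or more times
\?: matches 0 or 1 time
\{n,m\}: matches n to m times
\{n\}: matches exactly n times
\{n,\}: matches at least n times
\{,m\}: matches 0 to m times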
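
The quantifiers are easiest to compare on a short word list. This sketch counts matching lines in a made-up list (ct, cat, caat, caaat). It sticks to forms defined by POSIX: the backslashed interval in basic syntax, and +, ?, and unescaped braces with grep -E, because \+ and \? in basic syntax are GNU extensions.

```shell
# Sketch: quantifiers in basic and extended syntax, on a made-up word list.
words=$(mktemp)
printf '%s\n' ct cat caat caaat > "$words"

bre_star=$(grep -c 'ca*t' "$words")              # zero or more a
bre_interval=$(grep -c 'ca\{1,2\}t' "$words")    # one or two a, basic syntax
ere_interval=$(grep -cE 'ca{1,2}t' "$words")     # one or two a, extended syntax
ere_plus=$(grep -cE 'ca+t' "$words")             # one or more a
ere_optional=$(grep -cE 'ca?t' "$words")         # zero or one a
ere_exact=$(grep -cE 'ca{3}t' "$words")          # exactly three a
ere_at_least=$(grep -cE 'ca{2,}t' "$words")      # two or more a

echo "ca*t:        $bre_star"
echo "ca\\{1,2\\}t:  $bre_interval"
echo "ca{1,2}t -E: $ere_interval"
echo "ca+t -E:     $ere_plus"
echo "ca?t -E:     $ere_optional"
echo "ca{3}t -E:   $ere_exact"
echo "ca{2,}t -E:  $ere_at_least"

rm -f "$words"
```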

Line breaks and escaped characters

\r, \n: carriage return and line feed
\\: matches \
\^, \$, \.: match a literal ^, $, and .

The following characters usually need to be escaped to match themselves. In practice, depending on the tool and context, more characters than these may need escaping:

$ ^ { [ ( | ) * + ? \

Symbols representing positions

$: matches the end of a line
^: matches the beginning of a line
\<: matches the beginning of a word
\>: matches the end of a word
\b: matches a word boundary
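
The word anchors can be compared on a few sample lines. This sketch assumes GNU grep, where \<, \>, and \b are available in basic syntax; other grep implementations may not support all three. The sample words are made up.

```shell
# Sketch: word and line anchors, assuming GNU grep; sample lines are made up.
sample=$(mktemp)
printf '%s\n' 'cat' 'concatenate' 'catalog' 'bobcat' 'a cat sat' > "$sample"

word_start=$(grep -c '\<cat' "$sample")    # cat begins a word
word_end=$(grep -c 'cat\>' "$sample")      # cat ends a word
whole_word=$(grep -c '\bcat\b' "$sample")  # cat stands alone as a word
whole_line=$(grep -c '^cat$' "$sample")    # the line is exactly cat

echo "\\<cat:     $word_start"
echo "cat\\>:     $word_end"
echo "\\bcat\\b:  $whole_word"
echo "^cat\$:     $whole_line"

rm -f "$sample"
```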

Summary

The above is based on my personal experience. I hope it serves as a useful reference.