JavaScript regular expression analysis

First, ^ and $ match the beginning and the end of a string, respectively. For example:

"^The": The string must begin with "The";

"of despair$": The string must end with "of despair";

So,

"^abc$": The string must both start and end with "abc". In practice, only "abc" itself matches.

"Notice": Matches any string that contains "Notice".

As the last example shows, if you use neither of the two anchor characters, the pattern (regular expression) can appear anywhere in the string being tested; it is not locked to either end.
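
As a quick check of the anchors described above, here is a minimal JavaScript sketch. The sample strings and variable names are made up for illustration; `RegExp.prototype.test` only reports whether a match exists.

```javascript
// Patterns taken from the text; the test strings are illustrative assumptions.
const startsWithThe = /^The/;
const endsWithDespair = /of despair$/;
const exactlyAbc = /^abc$/;
const containsNotice = /Notice/;

const anchorResults = {
  startsWithThe: startsWithThe.test("The end"),            // "The" is at the start
  theInMiddle: startsWithThe.test("At The end"),           // "The" is not at the start
  endsWithDespair: endsWithDespair.test("full of despair"),
  exactlyAbc: exactlyAbc.test("abc"),
  abcTwice: exactlyAbc.test("abcabc"),                     // extra characters break ^...$
  noticeAnywhere: containsNotice.test("Take Notice now"),  // unanchored: anywhere is fine
};

console.log(anchorResults);
```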

Next come '*', '+', and '?'.

They specify how many times a character may appear in a row. They mean:

"zero or more", equivalent to {0,},

"one or more", equivalent to {1,},

"zero or one", equivalent to {0,1}. Here are some examples:

"ab*": Synonymous with ab{0,}; matches an a followed by zero or more b ("a", "ab", "abbb", etc.);

"ab+": Synonymous with ab{1,}; the same as the previous one, but at least one b must be present ("ab", "abbb", etc.);

"ab?": Synonymous with ab{0,1}; there may be no b or exactly one b;

"a?b+$": Matches a string that ends with zero or one a followed by one or more b.

Key point: '*', '+', and '?' apply only to the single character immediately before them.
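
The behaviour of the three quantifiers can be compared side by side with a small JavaScript sketch. The patterns are wrapped in ^...$ here (an addition for the demo, not part of the examples above) so that the whole string must match, which makes the differences visible.

```javascript
// Anchored versions of the patterns from the text, so partial matches do not hide the difference.
const zeroOrMoreB = /^ab*$/;
const oneOrMoreB = /^ab+$/;
const optionalB = /^ab?$/;

const quantifierSamples = ["a", "ab", "abbb"];
const quantifierTable = quantifierSamples.map((s) => ({
  input: s,
  star: zeroOrMoreB.test(s),
  plus: oneOrMoreB.test(s),
  question: optionalB.test(s),
}));
console.log(quantifierTable);

// The quantifier binds only to the single character before it:
// in /^ab+$/ the "+" repeats "b", not "ab", so "abab" is rejected.
const repeatsWholeAb = /^ab+$/.test("abab");
console.log(repeatsWholeAb); // false
```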

You can also give explicit counts in curly braces, for example:

"ab{2}": Requires a followed by two b, neither of which may be missing ("abb");

"ab{2,}": Requires a followed by two or more b ("abb", "abbbb", etc.);

"ab{3,5}": Requires a followed by three to five b ("abbb", "abbbb", or "abbbbb").

Now let's group several characters with parentheses, for example:

"a(bc)*": Matches a followed by zero or more "bc";

"a(bc){1,5}": Matches a followed by one to five "bc".

There is also the character '|', which acts as an OR operator:

"hi|hello": Matches a string containing "hi" or "hello";

"(b|cd)ef": Matches a string containing "bef" or "cdef";

"(a|b)*c": Matches a string containing any number (including zero) of a or b, followed by a c;

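Grouping and alternation can be tried out with the following JavaScript sketch. The patterns use the ASCII '|' character, some are anchored with ^...$ for the demo, and the sample strings are assumptions chosen for illustration.

```javascript
// Parentheses make the quantifier apply to the whole group "bc".
const aThenAnyBc = /^a(bc)*$/;
const aThenOneToFiveBc = /^a(bc){1,5}$/;

// "|" chooses between alternatives; parentheses limit its scope.
const hiOrHello = /hi|hello/;
const befOrCdef = /(b|cd)ef/;
const abRunThenC = /^(a|b)*c$/;

const groupResults = {
  bareA: aThenAnyBc.test("a"),                     // zero repetitions allowed
  twoBc: aThenAnyBc.test("abcbc"),
  halfGroup: aThenAnyBc.test("abcb"),              // incomplete "bc" is rejected
  needsOneBc: aThenOneToFiveBc.test("a"),          // at least one "bc" required
  sixBc: aThenOneToFiveBc.test("a" + "bc".repeat(6)),
  hello: hiOrHello.test("say hello"),
  bef: befOrCdef.test("bef"),
  cdef: befOrCdef.test("xcdef"),
  onlyC: abRunThenC.test("c"),
  mixed: abRunThenC.test("abbac"),
  otherLetter: abRunThenC.test("abdc"),
};
console.log(groupResults);
```
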
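A dot ('.') stands for any single character except "\n".

What if you want to match any single character, including "\n"?

In that case, use a pattern such as '(.|\n)'. Note that '[\n.]' does not work for this, because inside square brackets the dot is a literal character.

"a.[0-9]": An a followed by any one character and then a digit from 0 to 9;

"^.{3}$": A string of exactly three arbitrary characters.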

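The treatment of "\n" by the dot is easy to verify in JavaScript. The sketch below is only an illustration with made-up sample strings; the `s` (dotAll) flag and the `[\s\S]` idiom are JavaScript conveniences that the text itself does not mention.

```javascript
const withNewline = "a\nc";

const dotResults = {
  plainDot: /a.c/.test(withNewline),              // "." does not match "\n"
  dotOrNewline: /a(.|\n)c/.test(withNewline),     // explicit alternative for "\n"
  anyCharClass: /a[\s\S]c/.test(withNewline),     // whitespace or non-whitespace = anything
  dotAllFlag: /a.c/s.test(withNewline),           // "s" flag lets "." match "\n" (ES2018)
  dotInBracketsIsLiteral: /a[\n.]c/.test("axc"),  // inside [] the dot only matches "."
  literalDotInBrackets: /a[\n.]c/.test("a.c"),
  charThenDigit: /a.[0-9]/.test("ax7"),
  exactlyThree: /^.{3}$/.test("abc"),
  fourChars: /^.{3}$/.test("abcd"),
};
console.log(dotResults);
```
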
Content enclosed in square brackets matches only a single character:

"[ab]": Matches a or b (same as "a|b");

"[a-d]": Matches a single character from 'a' to 'd' (same as "a|b|c|d" and "[abcd]"). We generally use [a-zA-Z] to specify an English letter of either case;

"^[a-zA-Z]": Matches a string that starts with an upper- or lower-case letter;

"[0-9]%": Matches a string containing a digit followed by a percent sign, such as "5%";

",[a-zA-Z0-9]$": Matches a string ending with a comma followed by a digit or letter.

You can also list the characters you do not want inside square brackets; just put '^' first inside the brackets. For example, "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter between them.

Key point: when '^' is the first character inside square brackets, it excludes the characters listed there. For PHP to interpret the pattern, you must wrap it in quotes and escape some characters.

Don't forget that characters inside square brackets are an exception to the escaping rule. In POSIX-style bracket expressions, such as those used by ereg(), all special characters, including the backslash ('\'), lose their special meaning, so "[*\+?{}.]" matches a string containing any of these characters. (In JavaScript, the backslash still acts as an escape inside square brackets.)

Also, as the regex manual tells us: "If the list contains ']', it is best to put it first in the list (possibly after '^'). If it contains '-', it is best to put it first or last, or to make it the second endpoint of a range." Written this way, the '-' in the middle of [a-d-0-9] is valid.
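
The character-class examples above translate directly to JavaScript. This is a small sketch with assumed sample strings, not an exhaustive test.

```javascript
const classResults = {
  startsWithLetter: /^[a-zA-Z]/.test("Hello"),
  startsWithDigit: /^[a-zA-Z]/.test("9lives"),
  digitPercent: /[0-9]%/.test("50% off"),
  endsCommaAlnum: /,[a-zA-Z0-9]$/.test("a,b"),
  endsCommaThenBang: /,[a-zA-Z0-9]$/.test("a,b!"),
  nonLetterBetweenPercents: /%[^a-zA-Z]%/.test("%1%"),  // "^" first in [] negates the class
  letterBetweenPercents: /%[^a-zA-Z]%/.test("%a%"),
  specialsAreLiteral: /[*+?{}.]/.test("a+b"),           // no escaping needed inside []
  noSpecialsPresent: /[*+?{}.]/.test("abc"),
};
console.log(classResults);
```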

After the examples above, {n,m} should be clear. Note that neither n nor m can be a negative integer, and n must not be greater than m. The preceding item is then matched at least n times and at most m times. For example, in "pvpppppp" the pattern "p{1,5}" first matches the single p before the v; applied to the run of six p after the v, it matches the first five.
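
To see the counted forms in action, the following JavaScript sketch checks {n}, {n,} and {n,m} on a few assumed sample strings and then searches the "pvpppppp" string from the text. The last check is a JavaScript-specific observation rather than something the text states: without the `u` flag, a brace expression containing a space is read as literal characters, not as a quantifier.

```javascript
// Anchored with ^...$ so the count applies to the whole string.
const exactlyTwoB = /^ab{2}$/;
const atLeastTwoB = /^ab{2,}$/;
const threeToFiveB = /^ab{3,5}$/;

const countResults = {
  twoB: exactlyTwoB.test("abb"),
  threeBWhenTwoWanted: exactlyTwoB.test("abbb"),
  fourB: atLeastTwoB.test("abbbb"),
  oneB: atLeastTwoB.test("ab"),
  twoBInRange: threeToFiveB.test("abb"),
  threeBInRange: threeToFiveB.test("abbb"),
  fiveBInRange: threeToFiveB.test("abbbbb"),
  sixBInRange: threeToFiveB.test("abbbbbb"),
};
console.log(countResults);

// Unanchored search in the sample string from the text.
const firstMatch = "pvpppppp".match(/p{1,5}/);  // leftmost match only
const allMatches = "pvpppppp".match(/p{1,5}/g); // every non-overlapping match
console.log(firstMatch[0], firstMatch.index);   // p 0
console.log(allMatches);                        // [ 'p', 'ppppp', 'p' ]

// With a space inside the braces, the pattern matches the literal text "ab{3, 5}".
const spacedBracesAreLiteral = /^ab{3, 5}$/.test("ab{3, 5}");
console.log(spacedBracesAreLiteral);
```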

Next, let's talk about sequences that start with '\'.

\b: According to the book, it matches a word boundary, that is... For example, 've\b' matches the "ve" in "love" but not the "ve" in "very".

\B is exactly the opposite of \b above. I won't give any examples.

...Suddenly I remembered... You can go to //251 and see other syntaxes that start with '\'.
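
Since no example is given for \B, here is a minimal JavaScript sketch that contrasts it with \b, using the same "love" and "very" words from the \b example.

```javascript
const boundaryResults = {
  veAtWordEnd: /ve\b/.test("love"),     // "ve" is followed by the end of the word
  veInsideWord: /ve\b/.test("very"),    // "ve" is followed by "r", so no boundary
  veNotAtBoundary: /ve\B/.test("very"), // \B requires that there is NO boundary
  veAtEndWithB: /ve\B/.test("love"),
};
console.log(boundaryResults);
```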

OK, let's make an application:

How to build a pattern to match the input of currency quantity

Construct a matching pattern to check whether the input is a number representing an amount of money. We assume there are four ways to write an amount: "10000.00" and "10,000.00", or, without the decimal part, "10000" and "10,000". Now let's start building this matching pattern:

^[1-9][0-9]*$

This requires the value to start with a non-zero digit. But it also means that a single "0" fails the test. Here is the fix:

^(0|[1-9][0-9]*)$

"Only 0 and numbers that do not start with 0 match." We can also allow a minus sign before the number:

^(0|-?[1-9][0-9]*)$

This means: "0, or a number that does not start with 0 and may have a minus sign in front of it." OK, now let's be less strict and allow a leading 0. Let's also drop the minus sign, since we don't need it for amounts of money. Next we specify the pattern to match the decimal part:

^[0-9]+(\.[0-9]+)?$

This implies that the string must start with at least one Arabic numeral. Note that this pattern does not match "10."; only forms such as "10" and "10.2" match. (Do you know why?)
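
The steps so far can be replayed in JavaScript to see which inputs each intermediate pattern accepts. The patterns are the ones built above, written with ASCII parentheses and '|'; the variable names and inputs are assumptions for the demo.

```javascript
const nonZeroStart = /^[1-9][0-9]*$/;
const zeroOrNonZeroStart = /^(0|[1-9][0-9]*)$/;
const optionalMinus = /^(0|-?[1-9][0-9]*)$/;
const optionalDecimals = /^[0-9]+(\.[0-9]+)?$/;

const stepResults = {
  plainNumber: nonZeroStart.test("10"),
  singleZeroRejected: nonZeroStart.test("0"),
  singleZeroAccepted: zeroOrNonZeroStart.test("0"),
  leadingZero: zeroOrNonZeroStart.test("010"),
  negative: optionalMinus.test("-5"),
  negativeZero: optionalMinus.test("-0"),
  integerOnly: optionalDecimals.test("10"),
  withDecimals: optionalDecimals.test("10.2"),
  // "10." fails because the group requires at least one digit after the dot.
  trailingDot: optionalDecimals.test("10."),
};
console.log(stepResults);
```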

^[0-9]+(\.[0-9]{2})?$

Here we specify that there must be exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every three digits) to improve readability, which can be expressed like this:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Don't forget that '+' can be replaced by '*' if you want to allow empty strings to be entered (why?). Also don't forget that the backslash '\' can cause errors inside a PHP string (a very common mistake).

Now that we can validate the string, we remove all commas with str_replace(",", "", $money), treat the result as a double, and can then do arithmetic with it.
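
The validate-then-convert flow might look as follows in JavaScript. This is a sketch under assumptions: `parseMoney` is a hypothetical helper name, `Number()` stands in for PHP's conversion to double, and the plain and comma-grouped patterns from the text are checked separately because the grouped pattern on its own rejects an ungrouped value such as "10000".

```javascript
// The two final patterns from the text: without and with thousands separators.
const plainMoney = /^[0-9]+(\.[0-9]{1,2})?$/;
const groupedMoney = /^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/;

// Hypothetical helper: returns a number, or null when the input is not a valid amount.
function parseMoney(input) {
  if (!plainMoney.test(input) && !groupedMoney.test(input)) {
    return null;
  }
  // Remove every comma (the JavaScript counterpart of str_replace), then convert.
  return Number(input.replace(/,/g, ""));
}

console.log(parseMoney("10,000.00"));   // 10000
console.log(parseMoney("10000"));       // 10000
console.log(parseMoney("1,234,567.5")); // 1234567.5
console.log(parseMoney("10,00"));       // null (groups must have three digits)
console.log(parseMoney("10."));         // null
```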

Another one:

Construct a regular expression for checking email

There are three parts in a complete email address:

1. Username (everything to the left of '@'),

2. '@',

3. Server name (that is, the rest).

Usernames can contain upper- and lower-case letters, Arabic numerals, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rule, except that underscores are not allowed.

A username cannot begin or end with a period, and the same is true for the server name. You also cannot have two consecutive periods; there must be at least one character between them. Now let's see how to write a matching pattern for the username:

^[_a-zA-Z0-9-]+$

This does not allow periods yet. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "Start with at least one valid character (other than '.'), followed by zero or more groups that each start with a dot."

To make this simpler, we can replace ereg() with eregi(). eregi() is not case sensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is handled the same way, but the underscore must be removed:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Good. Now you only need to use "@" to connect the two parts:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete email validation pattern. Just call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to find out whether the string is an email address.
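
For readers following along in JavaScript, the same pattern can be tried with the `i` flag, which plays the role of eregi()'s case insensitivity. This is a sketch of the pattern built above, not a complete email validator; `looksLikeEmail` is a hypothetical helper name and the addresses are made-up examples.

```javascript
// Username part, "@", then server part, exactly as assembled in the text.
const emailPattern = /^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i;

// Hypothetical helper wrapping the pattern.
function looksLikeEmail(candidate) {
  return emailPattern.test(candidate);
}

console.log(looksLikeEmail("john.doe@example.com"));  // true
console.log(looksLikeEmail("John_Doe@Example.COM"));  // true (case-insensitive)
console.log(looksLikeEmail(".john@example.com"));     // false (leading period)
console.log(looksLikeEmail("john..doe@example.com")); // false (consecutive periods)
console.log(looksLikeEmail("john@exa_mple.com"));     // false (underscore in server name)
```

Note that the pattern as written does not require a dot in the server name, so an input such as "john@example" is also accepted.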

Other usages of regular expressions

Extract strings

ereg() and eregi() have a feature that lets you extract part of a string through a regular expression (see the manual for details). For example, suppose we want to extract the file name from a path or URL. The following code is what you need:

ereg("([^\\/]*)$", $pathOrUrl, $regs);

echo $regs[1];
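
A JavaScript counterpart of this extraction could look like the sketch below. `fileNameOf` is a hypothetical helper name and the paths are invented examples; `RegExp.prototype.exec` returns the captured groups, much like the $regs array.

```javascript
// Capture everything after the last "/" or "\" up to the end of the string.
const lastSegment = /([^\\\/]*)$/;

// Hypothetical helper: returns the file name, or "" when the path ends with a separator.
function fileNameOf(pathOrUrl) {
  const match = lastSegment.exec(pathOrUrl);
  return match ? match[1] : "";
}

console.log(fileNameOf("/var/www/index.html"));         // index.html
console.log(fileNameOf("https://example.com/a/b.png")); // b.png
console.log(fileNameOf("C:\\dir\\file.txt"));           // file.txt
```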

Advanced replacement

ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace all whitespace separators with commas:

ereg_replace("[ \n\r\t]+", ",", trim($str));
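
In JavaScript the same replacement can be sketched with `String.prototype.replace` and the `g` flag, which replaces every run of whitespace rather than only the first. `spacesToCommas` is a hypothetical helper name.

```javascript
// Trim the ends first, then turn each run of spaces, newlines or tabs into one comma.
function spacesToCommas(str) {
  return str.trim().replace(/[ \n\r\t]+/g, ",");
}

console.log(spacesToCommas("  a b\t\tc\nd  ")); // a,b,c,d
```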

Finally, here is another regular expression for checking email addresses, for you to analyze as an exercise.

'^[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+'.'@'.'[-!#$%&\'*+\\/0-9=?A-Z^_`a-z{|}~]+\.'.'[-!#$%&\'*+\\./0-9=?A-Z^_`a-z{|}~]+$'

If you can understand it easily, this article has achieved its purpose.