This article explains how to use regular expressions in Java to process text data. A regular expression is itself a string, but unlike an ordinary string it is an abstraction of a whole set of similar strings. Consider the following strings:
a98b c0912d c10b a12345678d ab
Looking closely at these five strings, we can see that they share a common pattern: the first character is 'a' or 'c', the last character is 'b' or 'd', and the middle consists of any number of digits (including none). Abstracting this common pattern gives the regular expression [ac]\d*[bd], which is written as "[ac]\\d*[bd]" in a Java string literal. From this regular expression we can derive infinitely many strings that satisfy the conditions.
There are many ways to use regular expressions in Java. The easiest is to use them with strings: four methods of the String class accept regular expressions, namely matches, split, replaceAll, and replaceFirst.
1. Matches method
The matches method determines whether the entire current string matches the given regular expression. It returns true if the string matches and false otherwise. The matches method is defined as follows:
public boolean matches(String regex)
We can verify the regular expression above with the following program.
String[] ss = new String[]{"a98b", "c0912d", "c10b", "a12345678d", "ab"};
for (String s : ss)
    System.out.println(s.matches("[ac]\\d*[bd]"));
Output result:
true
true
true
true
true
Let's briefly explain the meaning of this regular expression. Readers who have studied lexical analysis in a compilers course will find it easy to understand, because the notation of regular expressions is similar to the notation used there. Square brackets [...] express alternation, like "|": [abcd] is equivalent to a|b|c|d, that is, a or b or c or d. The [ac] at the start of the expression above means that the string must begin with a or c, and the [bd] at the end means that it must end with b or d. The \d in the middle stands for a digit 0-9. Because \ is the escape character in a Java string literal, it has to be written as \\ in source code. The * means zero or more repetitions of the preceding element (this is called the closure, or Kleene star, in lexical analysis). Since the * here follows \d, it means zero or more digits.
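The pieces described above can be checked one at a time. The following is a minimal sketch; the class name RegexPartsDemo and the helper matchesWhole are hypothetical names introduced only for illustration. It also shows that matches tests the whole string, not a substring.

```java
class RegexPartsDemo {
    // Hypothetical helper: true only when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // A character class and the equivalent alternation accept the same character.
        System.out.println(matchesWhole("[abcd]", "c"));            // true
        System.out.println(matchesWhole("a|b|c|d", "c"));           // true
        // \\d* allows zero digits, so "ab" matches.
        System.out.println(matchesWhole("[ac]\\d*[bd]", "ab"));     // true
        // The trailing 'x' is not covered by the pattern, so the whole string fails.
        System.out.println(matchesWhole("[ac]\\d*[bd]", "a98bx"));  // false
        // 'e' is not in [ac], so the first character already fails.
        System.out.println(matchesWhole("[ac]\\d*[bd]", "e98b"));   // false
    }
}
```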
2. Split method
The split method uses a regular expression to split the string and returns the pieces as a String array. split has two overloads, which are defined as follows:
public String[] split(String regex)
public String[] split(String regex, int limit)
For example, the following code uses the first overload of split to split the first line of an HTTP request header:
String s = "GET / HTTP/1.1";
String[] ss = s.split(" +");
for (String str : ss)
    System.out.println(str);
Output result:
GET
/
HTTP/1.1
When using the first overload of split, note that empty strings at the end of the result are discarded. For example, if the regular expression \d is used to split the string a0b1c3456, the resulting array has length 3, not 7.
The second overload of split takes a limit parameter, whose effect falls into three cases:
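The trailing-empty-string behavior is easy to confirm. This is a small sketch; the class name SplitTrailingDemo and the helper splitOnDigits are hypothetical names used only for illustration.

```java
import java.util.Arrays;

class SplitTrailingDemo {
    // Hypothetical helper: split on every single digit using the one-argument overload.
    static String[] splitOnDigits(String input) {
        return input.split("\\d");
    }

    public static void main(String[] args) {
        String[] parts = splitOnDigits("a0b1c3456");
        // The four empty strings produced by the digits 3, 4, 5, 6 are dropped.
        System.out.println(parts.length);            // 3
        System.out.println(Arrays.toString(parts));  // [a, b, c]
    }
}
```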
1. Greater than 0: If the value of limit is n, the regular expression is applied at most n-1 times, so the returned array has at most n elements. For example:
String s = "a0b1c3456";
String[] ss = s.split("\\d", 3);
for (String str : ss)
    System.out.println(str);
Output result:
a
b
c3456
From the output, we can see that the program applies the regular expression to "a0b1c3456" only twice. That is, once the second match (the character '1') has been consumed, the rest of the string is returned as a whole as the last element of the array, regardless of whether it contains further matches.
2. Less than 0: Trailing empty strings are not discarded, and the regular expression is applied as many times as possible. Splitting a0b1c3456 on \d with a negative limit therefore returns an array of length 7, not 3.
3. Equal to 0: This is the default value, equivalent to the first overload of split.
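The three cases can be compared side by side. The sketch below is only illustrative; the class name SplitLimitDemo and the helper splitOnDigits are hypothetical.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: split on every single digit with the given limit.
    static String[] splitOnDigits(String input, int limit) {
        return input.split("\\d", limit);
    }

    public static void main(String[] args) {
        String s = "a0b1c3456";
        // limit 3: pattern applied at most twice, the rest kept whole  -> length 3
        // limit -1: trailing empty strings are kept                    -> length 7
        // limit 0: same as split(regex), trailing empties are dropped  -> length 3
        for (int limit : new int[]{3, -1, 0}) {
            String[] parts = splitOnDigits(s, limit);
            System.out.println("limit=" + limit + " length=" + parts.length
                    + " " + Arrays.toString(parts));
        }
    }
}
```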
3. ReplaceAll and replaceFirst methods
The definitions for the two methods are as follows:
public String replaceAll(String regex, String replacement)
public String replaceFirst(String regex, String replacement)
These two methods return a new string in which the parts of the current string that match regex are replaced with replacement: replaceAll replaces every match, while replaceFirst replaces only the first one. Their usage is straightforward and is not described in detail here. Interested readers can refer to the relevant documentation.
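As a brief illustration of the difference between the two methods, here is a minimal sketch. The class name ReplaceDemo and the helper names are hypothetical, and the input string is reused from the split examples.

```java
class ReplaceDemo {
    // Hypothetical helper: replace every digit with '#'.
    static String maskAllDigits(String input) {
        return input.replaceAll("\\d", "#");
    }

    // Hypothetical helper: replace only the first digit with '#'.
    static String maskFirstDigit(String input) {
        return input.replaceFirst("\\d", "#");
    }

    public static void main(String[] args) {
        String s = "a0b1c3456";
        System.out.println(maskAllDigits(s));           // a#b#c####
        System.out.println(maskFirstDigit(s));          // a#b1c3456
        // With \\d+ each run of digits counts as one match.
        System.out.println(s.replaceAll("\\d+", "#"));  // a#b#c#
    }
}
```

Note that "$" and "\" have special meaning in the replacement string of both methods, so a replacement containing them literally needs escaping, for example with java.util.regex.Matcher.quoteReplacement.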