SoFunction
Updated on 2025-03-08

Detailed explanation of the use of the Pattern.compile function in Java

Using the Pattern.compile function in Java

In addition to Pattern.compile(String regex), the Pattern class provides another version of the compile() method:

Pattern.compile(String regex, int flags), which accepts flags that adjust the matching behavior.

The flags come from the following constants of the Pattern class:

| Compile flag | Effect |
| --- | --- |
| Pattern.CANON_EQ | Two characters are considered a match if and only if their full canonical decompositions match. For example, with this flag the expression a\u030A matches the string "\u00E5" (å). By default, matching does not take canonical equivalence into account. |
| Pattern.CASE_INSENSITIVE (?i) | Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters from the US-ASCII character set are being matched. Combining this flag with UNICODE_CASE enables Unicode-aware case-insensitive matching. The flag can also be enabled with the embedded flag expression (?i); the same applies to the flags below. |
| Pattern.COMMENTS (?x) | In this mode, whitespace in the expression (literal whitespace, not \s) is ignored, and comments starting with # are ignored until the end of the line. Comments mode can also be enabled with the embedded flag expression (?x). |
| Pattern.DOTALL (?s) | In dotall mode, the expression "." matches any character, including a line terminator. By default, "." does not match line terminators. |
| Pattern.MULTILINE (?m) | In multi-line mode, the expressions ^ and $ match at the beginning and end of each line, as well as at the beginning and end of the input. By default, they match only at the beginning and end of the entire input. |
| Pattern.UNICODE_CASE (?u) | When this flag is specified together with CASE_INSENSITIVE, case-insensitive matching is performed in a manner consistent with the Unicode standard. By default, case-insensitive matching assumes that only characters from the US-ASCII character set are being matched. |
| Pattern.UNIX_LINES (?d) | In this mode, only the line terminator \n is recognized in the behavior of ., ^ and $. |
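The examples later in this article cover CASE_INSENSITIVE, MULTILINE, and COMMENTS. As a complement, here is a minimal sketch of how DOTALL and UNICODE_CASE change the result of a match. The class name FlagEffects and the helper matches() are made up for this illustration; only Pattern.compile(String, int) and Matcher.matches() come from the JDK.

```java
import java.util.regex.Pattern;

class FlagEffects {
    // Hypothetical helper: true if the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // DOTALL: "." also matches the line terminator between "a" and "b".
        System.out.println(matches("a.b", 0, "a\nb"));              // false
        System.out.println(matches("a.b", Pattern.DOTALL, "a\nb")); // true

        // UNICODE_CASE: U+00E9 (e with acute) and U+00C9 (its uppercase form)
        // only compare equal when Unicode-aware case folding is enabled.
        int asciiOnly = Pattern.CASE_INSENSITIVE;
        int unicodeAware = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(matches("\u00E9", asciiOnly, "\u00C9"));    // false
        System.out.println(matches("\u00E9", unicodeAware, "\u00C9")); // true
    }
}
```

Passing 0 as the flags argument means "no flags", which behaves the same as the single-argument Pattern.compile(String regex).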

Among these flags, Pattern.CASE_INSENSITIVE (?i), Pattern.MULTILINE (?m), and Pattern.COMMENTS (?x) are the ones demonstrated in the examples that follow.

Examples of use are as follows:

Multiple flags can be combined with the bitwise OR ( | ) operator:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReFlags {
    public static void main(String[] args) {
        /*
         * Combine Pattern.CASE_INSENSITIVE (case-insensitive matching) and
         * Pattern.MULTILINE (multi-line mode) so that "^java" matches at the
         * start of every line, regardless of case.
         */
        Pattern p = Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

        Matcher m = p.matcher("java has regex\nJava has regex\n"
                + "JAVA has pretty good regular expression\n"
                + "Regular expressions are in JavA");
        while (m.find()) {
            System.out.println(m.group()); // Output the matched part
        }
    }
}

Output result:

java

Java

JAVA
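The flag table mentions that flags can also be enabled through embedded flag expressions. As a rough sketch of that equivalence, the snippet below runs the same input through a pattern compiled with the two constants and through one that starts with (?im). The class name EmbeddedFlags and the helper findAll() are illustrative names, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedFlags {
    static final String INPUT = "java has regex\nJava has regex\n"
            + "JAVA has pretty good regular expression\n"
            + "Regular expressions are in JavA";

    // Hypothetical helper: collects every match of p in input, in order.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Flags passed as constants to compile().
        Pattern withConstants =
                Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        // The same flags written inside the expression itself.
        Pattern withEmbedded = Pattern.compile("(?im)^java");

        System.out.println(findAll(withConstants, INPUT)); // [java, Java, JAVA]
        System.out.println(findAll(withEmbedded, INPUT));  // [java, Java, JAVA]
    }
}
```

The embedded form is handy when the regular expression comes from a configuration file or user input, where there is no opportunity to pass a flags argument.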

Example of using Pattern.COMMENTS (?x):

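import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReFlags_Comments {
    public static void main(String[] args) {
        String s = "123";
        /*
         * Comments mode not enabled: the leading space and "#test comments"
         * are treated as literal text
         */
        Pattern p1 = Pattern.compile(" (\\d+)+#test comments");
        Matcher m1 = p1.matcher(s);
        System.out.println(m1.matches()); // false
        /*
         * Enable comments mode with the embedded flag (?x) in the regular expression
         */
        Pattern p2 = Pattern.compile("(?x) (\\d+)+#test comments");
        Matcher m2 = p2.matcher(s);
        System.out.println(m2.matches()); // true
        /*
         * Enable comments mode with the Pattern.COMMENTS flag argument
         */
        Pattern p3 = Pattern.compile("  (\\d+)+#test comments", Pattern.COMMENTS);
        Matcher m3 = p3.matcher(s);
        System.out.println(m3.matches()); // true
    }
}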

Running results:

false

true

true
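Comments mode is most useful when a longer expression is split across several lines, with one annotated piece per line. The following is a sketch under that assumption; the date format, the class name CommentedRegex, and the helper parse() are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedRegex {
    // Hypothetical yyyy-MM-dd pattern. Each "#" comment ends at the "\n"
    // that terminates its line, and the padding spaces are ignored.
    static final Pattern DATE = Pattern.compile(
            "(\\d{4})  # year\n"
          + "-         # separator\n"
          + "(\\d{2})  # month\n"
          + "-         # separator\n"
          + "(\\d{2})  # day",
            Pattern.COMMENTS);

    // Returns {year, month, day}, or null if the input is not in the expected form.
    static String[] parse(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", parse("2025-03-08"))); // 2025/03/08
        System.out.println(parse("2025 03 08") == null);           // true
    }
}
```

Because whitespace in the expression is ignored in this mode, a pattern that needs to match an actual space has to spell it out, for example with \s or \x20.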

Notes on the Pattern.compile function

Function:

Pattern.compile(String regex, int flags)

The possible values of flags are as follows:

  • Pattern.CANON_EQ: Two characters match if and only if their full canonical decompositions are identical. For example, with this flag the expression "a\u030A" matches "\u00E5" (å). By default, canonical equivalence is not considered.
  • Pattern.CASE_INSENSITIVE (?i): This flag makes the expression ignore case when matching. By default, this applies only to characters in the US-ASCII character set. To ignore case for Unicode characters as well, combine UNICODE_CASE with this flag.
  • Pattern.COMMENTS (?x): In this mode, whitespace in the regular expression is ignored when matching (this means literal spaces, tabs, carriage returns, and so on in the expression, not "\\s"). Comments start at # and run to the end of the line. This mode can also be enabled with the embedded flag (?x).
  • Pattern.DOTALL (?s): In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
  • Pattern.MULTILINE (?m): In this mode, '^' and '$' match at the beginning and end of each line. In addition, '^' still matches the beginning of the string, and '$' still matches the end of the string. By default, these two expressions match only at the beginning and end of the whole string.
  • Pattern.UNICODE_CASE (?u): In this mode, if the CASE_INSENSITIVE flag is also enabled, Unicode characters are matched case-insensitively. By default, case-insensitive matching applies only to characters in the US-ASCII character set.
  • Pattern.UNIX_LINES (?d): In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
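CANON_EQ and UNIX_LINES are the two flags from this list that are easiest to overlook, so a small sketch of their effect may help. The class name LineAndCanonFlags and the helper matches() are illustrative; the CANON_EQ case follows the "a\u030A" example given in the list above.

```java
import java.util.regex.Pattern;

class LineAndCanonFlags {
    // Hypothetical helper: true if the whole input matches the regex under the given flags.
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // CANON_EQ: "a" followed by U+030A (combining ring above) is canonically
        // equivalent to the single precomposed character U+00E5.
        System.out.println(matches("a\u030A", 0, "\u00E5"));                // false
        System.out.println(matches("a\u030A", Pattern.CANON_EQ, "\u00E5")); // true

        // UNIX_LINES: only "\n" counts as a line terminator, so "." is
        // allowed to match a carriage return.
        System.out.println(matches("a.b", 0, "a\rb"));                  // false
        System.out.println(matches("a.b", Pattern.UNIX_LINES, "a\rb")); // true
    }
}
```

The Pattern documentation notes that CANON_EQ may impose a performance penalty, so it is usually enabled only when the input can contain both composed and decomposed forms.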

The above is based on my personal experience. I hope it serves as a useful reference, and I appreciate your continued support.