SoFunction
Updated on 2025-03-03

Filtering strings with regular expressions in Java

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;
import org.junit.Test;

// Filter special characters
public static String StringFilter(String str) throws PatternSyntaxException {
    // To allow only letters and digits, use this pattern instead:
    // String regEx = "[^a-zA-Z0-9]";
    // Remove all special characters

    String regEx = "[`~!@#$%^&*()+=|{}':;',\\[\\].<>/?~!@#¥%……&*()——+|{}【】‘;:”“'。,、?]";
    Pattern p = Pattern.compile(regEx);
    Matcher m = p.matcher(str);
    return m.replaceAll("").trim();
}

@Test
public void testStringFilter() throws PatternSyntaxException {
    String str = "*adCVs*34_a _09_b5*[/435^*&city()^$$&*).{}+.|.)%%*(*.China}34{45[]'*&999Below are Chinese characters¥……{}【】。,;'“‘”?";
    System.out.println(str);
    System.out.println(StringFilter(str));
}

The example is run as a JUnit test; you can just as easily call it from a main method.
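
The commented-out pattern in the method above points to a stricter alternative: rather than listing every character to strip, keep only letters and digits. A minimal sketch of that variant follows; the class name WhitelistFilterDemo and the method name keepAlphanumeric are illustrative and do not come from the original example.

```java
import java.util.regex.Pattern;

public class WhitelistFilterDemo {
    // Compile once and reuse; a compiled Pattern is immutable and safe to share.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    // Removes every character that is not an ASCII letter or digit.
    public static String keepAlphanumeric(String input) {
        return NON_ALPHANUMERIC.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepAlphanumeric("*adCVs*34_a _09_b5*")); // adCVs34a09b5
    }
}
```

Note that this variant also removes underscores, spaces, and all non-ASCII letters, which the blacklist pattern above leaves in place.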

Learning Java regular expressions:

Regular expressions are a complex system in their own right, so this article covers only introductory concepts. For more detail, consult a dedicated book or reference and experiment on your own.

\\ Backslash
\t Tab ('\u0009')
\n Line feed ('\u000A')
\r Carriage return ('\u000D')
\d A digit; equivalent to [0-9]
\D A non-digit; equivalent to [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^ \t\n\x0B\f\r]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^a-zA-Z_0-9]
\f Form feed
\e Escape
\b A word boundary
\B A non-word boundary
\G The end of the previous match
^ restricts the match to the beginning
^java The string must begin with "java"
$ restricts the match to the end
java$ The string must end with "java"
. Any single character except a line terminator such as \n
java.. "java" followed by any two characters other than line terminators
Restrict the match to specific characters with 「[]」
[a-z] One character in the range lowercase a to z
[A-Z] One character in the range uppercase A to Z
[a-zA-Z] One character in the range lowercase a to z or uppercase A to Z
[0-9] One digit in the range 0 to 9
[0-9a-z] One character in the range 0 to 9 or a to z
[0-9[a-z]] One character in the range 0 to 9 or a to z (union)
Add ^ inside [] to negate the restriction: 「[^]」
[^a-z] One character outside the range lowercase a to z
[^A-Z] One character outside the range uppercase A to Z
[^a-zA-Z] One character outside the ranges lowercase a to z and uppercase A to Z
[^0-9] One character that is not a digit 0 to 9
[^0-9a-z] One character outside the ranges 0 to 9 and a to z
[^0-9[a-z]] Nested-class spelling intended to mean the same as the line above; prefer the flat form [^0-9a-z]
To allow a specific character 0 or more times, use 「*」
J* Zero or more J
.* Zero or more of any character
J.*D Zero or more of any character between J and D
To require a specific character 1 or more times, use 「+」
J+ One or more J
.+ One or more of any character
J.+D One or more of any character between J and D
To allow a specific character 0 or 1 time, use 「?」
JA? J or JA
To require an exact number of consecutive occurrences, use 「{a}」
J{2} JJ
J{3} JJJ
For a or more occurrences, use 「{a,}」
J{3,} JJJ, JJJJ, JJJJJ, ... (three or more consecutive J)
For at least a and at most b occurrences, use 「{a,b}」
J{3,5} JJJ, JJJJ, or JJJJJ
To match either of two alternatives, use 「|」
J|A J or A
Java|Hello Java or Hello
「()」 defines a group
For example, to extract the text between <a href> and </a> in <a href=\"\">index</a>, you can write <a.*href=\".*\">(.+?)</a>
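
Several of the constructs listed above can be checked directly with String.matches and Matcher. The following is only an illustrative sketch: the class name SyntaxDemo, the helper firstLinkText, and the sample inputs (including index.html) are assumptions made for the demonstration. It also shows the && operator, which is how Java writes a true intersection of character classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyntaxDemo {
    // Returns the text captured by group 1 of the link pattern, or null if nothing matches.
    public static String firstLinkText(String html) {
        Matcher m = Pattern.compile("<a.*href=\".*\">(.+?)</a>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Quantifiers: {3,5} accepts three to five consecutive J characters.
        System.out.println("JJJ".matches("J{3,5}"));        // true
        System.out.println("JJ".matches("J{3,5}"));         // false
        // '?' makes the preceding A optional.
        System.out.println("J".matches("JA?"));             // true
        // A nested class forms a union; && forms an intersection.
        System.out.println("q".matches("[0-9[a-z]]"));      // true
        System.out.println("q".matches("[a-z&&[^m-p]]"));   // true
        System.out.println("n".matches("[a-z&&[^m-p]]"));   // false
        // Alternation.
        System.out.println("Hello".matches("Java|Hello"));  // true
        // Capturing group: group(1) is the part inside the parentheses.
        System.out.println(firstLinkText("<a href=\"index.html\">index</a>")); // index
    }
}
```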
When calling Pattern.compile, you can pass flags that control how the regular expression matches:
Pattern Pattern.compile(String regex, int flags)
The available flag values are as follows:

Pattern.CANON_EQ Two characters match if and only if their full canonical decompositions are identical. With this flag, for example, the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account.

Pattern.CASE_INSENSITIVE (?i) This flag makes the expression ignore case when matching. On its own it applies only to characters in the US-ASCII character set; to match Unicode characters case-insensitively, combine it with UNICODE_CASE.

Pattern.COMMENTS (?x) In this mode, whitespace in the pattern is ignored when matching (translator's note: this means literal spaces, tabs, line breaks, and so on in the pattern, not "\s"). Comments start at # and run to the end of the line. This mode can also be enabled with the embedded flag (?x).

Pattern.DOTALL (?s) In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.

Pattern.MULTILINE (?m) In this mode, '^' and '$' match at the beginning and end of each line, respectively. In addition, '^' still matches at the beginning of the string and '$' still matches at the end of the string. By default, these two expressions match only at the beginning and end of the whole string.

Pattern.UNICODE_CASE (?u) In this mode, if you also enable the CASE_INSENSITIVE flag, case-insensitive matching is applied to Unicode characters. By default, case-insensitive matching applies only to the US-ASCII character set.

Pattern.UNIX_LINES (?d) In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
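
The effect of these flags is easiest to see side by side. The sketch below is a small demonstration, not part of the original article; the class name FlagDemo, the helper countMatches, and the sample text are assumptions chosen for illustration. It shows MULTILINE, combining flags with bitwise OR, and the embedded form of DOTALL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagDemo {
    // Counts how many times the pattern is found in the input.
    public static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "java\nJava\nJAVA";

        // Without flags, ^ and $ refer to the whole input, so nothing matches.
        System.out.println(countMatches(Pattern.compile("^java$"), text)); // 0

        // MULTILINE lets ^ and $ match at each line boundary.
        System.out.println(countMatches(Pattern.compile("^java$", Pattern.MULTILINE), text)); // 1

        // Flags are combined with bitwise OR.
        System.out.println(countMatches(
                Pattern.compile("^java$", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE), text)); // 3

        // The embedded form (?s) has the same effect as passing Pattern.DOTALL.
        System.out.println("a\nb".matches("a.b"));     // false
        System.out.println("a\nb".matches("(?s)a.b")); // true
    }
}
```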

Theory aside, here are a few simple use cases for Java regular expressions:

◆Checking whether a string matches a pattern

// Match strings that begin with "Java" and end with anything

Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a human");
boolean b = matcher.matches();

// true when the condition is satisfied, false otherwise

System.out.println(b);

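◆Splitting a string on several delimiters

// Split on any run of commas, spaces, or | characters
Pattern pattern = Pattern.compile("[, |]+");
String[] strs = pattern.split("Java Hello WorldJava,Hello,,World|Sun");
for (int i = 0; i < strs.length; i++) {
    System.out.println(strs[i]);
}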

◆Text replacement (first occurrence only)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World,regular expression Hello World");

// Replace the first match of the pattern

System.out.println(matcher.replaceFirst("Java"));

◆Text replacement (all occurrences)

Pattern pattern = Pattern.compile("regular expression");

Matcher matcher = pattern.matcher("regular expression Hello World,regular expression Hello World");

// Replace every match of the pattern

System.out.println(matcher.replaceAll("Java"));

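◆Text replacement (match by match)

Pattern pattern = Pattern.compile("regular expression");
Matcher matcher = pattern.matcher("regular expression Hello World, regular expression Hello World");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Append the text before the match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Append the text after the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());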
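
With a constant replacement, the find/appendReplacement loop above gives the same result as replaceAll. It becomes useful when each replacement depends on the match or on its position. The following sketch numbers the replacements; the class name AppendReplacementDemo, the helper numberMatches, and the "Java#n" output format are hypothetical choices for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    // Replaces every occurrence of target with "Java#n", where n counts the matches from 1.
    public static String numberMatches(String input, String target) {
        // Pattern.quote treats the target as literal text, not as a regular expression.
        Matcher matcher = Pattern.compile(Pattern.quote(target)).matcher(input);
        StringBuffer out = new StringBuffer();
        int n = 0;
        while (matcher.find()) {
            n++;
            // quoteReplacement keeps any '$' or '\' in the replacement literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement("Java#" + n));
        }
        matcher.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numberMatches("regex Hello World, regex Hello World", "regex"));
        // prints: Java#1 Hello World, Java#2 Hello World
    }
}
```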
◆Verify that a string is an email address

String str = "ceponline@";
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
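
To see what this pattern accepts and rejects, it helps to run it against a few inputs. The sketch below reuses the expression from the example above; the class name EmailCheckDemo, the helper looksLikeEmail, and the sample addresses are assumptions for illustration. The pattern is a loose plausibility check, not a complete validator of email syntax.

```java
import java.util.regex.Pattern;

public class EmailCheckDemo {
    // Same expression as in the example above, compiled once for reuse.
    private static final Pattern EMAIL = Pattern.compile(
            "[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);

    // matches() succeeds only if the whole input fits the pattern.
    public static boolean looksLikeEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // true
        System.out.println(looksLikeEmail("ceponline@"));       // false: no domain part
        System.out.println(looksLikeEmail("user@localhost"));   // false: the pattern requires a dot in the domain
    }
}
```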

◆Remove HTML tags

Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL);
Matcher matcher = pattern.matcher("<a href=\"\">Home page</a>");
String string = matcher.replaceAll("");
System.out.println(string);

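◆Find a matching substring in HTML

Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"\">Home page</a>");
if (matcher.find()) {
    // group(1) is the text captured by the parentheses
    System.out.println(matcher.group(1));
}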
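
The example above reports only the first match. To collect every href value in a document, the same pattern can be applied in a loop. This is a minimal sketch; the class name HrefDemo, the helper hrefs, and the sample links a.html and b.html are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefDemo {
    private static final Pattern HREF = Pattern.compile("href=\"(.+?)\"");

    // Returns the value of every href attribute found, in order of appearance.
    public static List<String> hrefs(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1)); // the lazy .+? stops at the first closing quote
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(hrefs("<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>"));
        // prints: [a.html, b.html]
    }
}
```

Because the group uses .+? it needs at least one character, so an empty attribute such as href="" is not reported.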

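◆Extract an http:// address

// Extract the URL

Pattern pattern = Pattern.compile("(http://|https://){1}[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds<http://dsds//gfgffdfd>fdf");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    // group() with no argument returns the whole match
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString());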

◆Replace the specified {} placeholders in a string

String str = "Java has been in development from {0} to {1}";
String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
System.out.println(replace(str, object));

public static String replace(final String sourceString, Object[] object) {
    String temp = sourceString;
    for (int i = 0; i < object.length; i++) {
        // Each entry is a pair: {pattern, replacement}
        String[] result = (String[]) object[i];
        Pattern pattern = Pattern.compile(result[0]);
        Matcher matcher = pattern.matcher(temp);
        temp = matcher.replaceAll(result[1]);
    }
    return temp;
}
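
The replace method above expects the caller to escape the braces by hand ("\\{0\\}"). One possible alternative is to let Pattern.quote escape the placeholder and Matcher.quoteReplacement escape the value. The sketch below takes that approach; the class name PlaceholderDemo, the method name fill, and the sample template are hypothetical and not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Replaces {0}, {1}, ... in the template with the corresponding values.
    public static String fill(String template, String... values) {
        String result = template;
        for (int i = 0; i < values.length; i++) {
            // Pattern.quote turns "{0}" into a literal pattern, so the braces need no manual escaping.
            String placeholder = Pattern.quote("{" + i + "}");
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            result = result.replaceAll(placeholder, Matcher.quoteReplacement(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fill("Java: {0} - {1}", "1995", "2007")); // Java: 1995 - 2007
    }
}
```

For purely literal placeholders like these, String.replace would also work without any regular expression; the quoted form is shown here because it stays within the Pattern and Matcher API used throughout the article.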

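◆Find files in a given directory whose names match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the list of matching files
    private final List<File> files = new ArrayList<File>();
    // Holds the directory path
    private String _path;
    // Holds the uncompiled regular expression
    private String _regexp;

    class MyFileFilter implements FileFilter {
        /**
         * Match the file name against the regular expression
         */
        public boolean accept(File file) {
            try {
                Pattern pattern = Pattern.compile(_regexp);
                Matcher match = pattern.matcher(file.getName());
                return match.matches();
            } catch (Exception e) {
                // For example an invalid pattern: accept the file rather than fail
                return true;
            }
        }
    }

    /**
     * Collect the files in path whose names match regexp
     * @param path directory to search
     * @param regexp regular expression applied to each file name
     */
    FilesAnalyze(String path, String regexp) {
        getFileName(path, regexp);
    }

    /**
     * Filter the directory listing and add the matching files
     * @param path directory to search
     * @param regexp regular expression applied to each file name
     */
    private void getFileName(String path, String regexp) {
        // Directory
        _path = path;
        _regexp = regexp;
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        // listFiles returns null if the path is not a directory or cannot be read
        if (filesFile == null) return;
        for (int j = 0; j < filesFile.length; j++) {
            files.add(filesFile[j]);
        }
    }

    /**
     * Print the matching files
     * @param out stream to print to
     */
    public void print(PrintStream out) {
        Iterator<File> elements = files.iterator();
        while (elements.hasNext()) {
            File file = elements.next();
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        // Note: the range A-z also covers [ \ ] ^ _ ` and the | is a literal inside []
        output("C:\\", "[A-z|.]*");
    }
}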
