SoFunction
Updated on 2025-04-10

Using regular expressions to process multi-line text in Go

Problem description

A common problem

text := `first line
second line
third line`

// Looks correct, but does not match: by default . stops at a line break
pattern := "first.*third"
matched, _ := regexp.Match(pattern, []byte(text))
fmt.Println(matched) // false

Cause analysis

  • By default, . does not match line breaks
  • Line breaks differ across platforms (\n vs. \r\n)
  • Multi-line mode (?m) and single-line mode (?s) do different things
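The difference between the two modes is easy to check directly. The following is a minimal sketch, not part of the original article; the helper name matches is made up for illustration. It shows that (?s) changes what . matches, while (?m) only changes where ^ and $ anchor.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper: it reports whether pattern
// matches anywhere in text.
func matches(pattern, text string) bool {
	return regexp.MustCompile(pattern).MatchString(text)
}

func main() {
	text := "first line\nsecond line\nthird line"

	// Default: . stops at \n, so the match cannot span lines.
	fmt.Println(matches(`first.*third`, text)) // false

	// (?s): . also matches \n.
	fmt.Println(matches(`(?s)first.*third`, text)) // true

	// (?m) does not affect . at all.
	fmt.Println(matches(`(?m)first.*third`, text)) // false

	// (?m) makes ^ and $ anchor at every line instead of only
	// at the start and end of the whole text.
	fmt.Println(matches(`^second line$`, text))     // false
	fmt.Println(matches(`(?m)^second line$`, text)) // true
}
```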

Solution

1. Use the (?s) flag (recommended)

// Enable single-line mode (lets . match newline characters)
pattern := `(?s)first.*third`
matched, _ := regexp.Match(pattern, []byte(text))
fmt.Println(matched) // true

2. Use the [\s\S] character class

// Match any character (including line breaks)
pattern := `first[\s\S]*third`
matched, _ := regexp.Match(pattern, []byte(text))
fmt.Println(matched) // true

3. Combine with multi-line mode (?m)

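// (?m) makes ^ and $ match at the beginning and end of every line.
// The pattern is adjusted to fit the sample text above, whose lines read "<word> line".
pattern := `(?m)^\w+ line$`
matches := regexp.MustCompile(pattern).FindAllString(text, -1) // all three lines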
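Multi-line mode is most useful together with capture groups, because each line can then be taken apart on its own. A small sketch under that assumption (the names lineRe and firstWords are invented for this example):

```go
package main

import (
	"fmt"
	"regexp"
)

// lineRe uses (?m) so that ^ and $ anchor at every line,
// not just at the start and end of the whole text.
var lineRe = regexp.MustCompile(`(?m)^(\w+) line$`)

// firstWords returns the first word of every line of the form "<word> line".
func firstWords(text string) []string {
	var words []string
	for _, m := range lineRe.FindAllStringSubmatch(text, -1) {
		words = append(words, m[1]) // m[0] is the whole line, m[1] the group
	}
	return words
}

func main() {
	text := "first line\nsecond line\nthird line"
	fmt.Println(firstWords(text)) // [first second third]
}
```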

Practical examples

1. Extract multi-line comments

func extractComments(code string) []string {
    // (?s) lets . cross line breaks; .*? stops at the first */
    pattern := `(?s)/\*.*?\*/`
    re := regexp.MustCompile(pattern)
    return re.FindAllString(code, -1)
}

// Test
code := `
/* This is a
    Multi-line comment */
func main() {
    /* Another comment */
}
`
comments := extractComments(code)
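If only the text inside the comment is needed, a capture group can drop the /* and */ delimiters. This is a sketch of one possible variant, not the article's own function; commentRe and commentBodies are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commentRe captures everything between /* and */, across line breaks.
var commentRe = regexp.MustCompile(`(?s)/\*(.*?)\*/`)

// commentBodies returns the text inside each /* ... */ comment,
// with surrounding whitespace trimmed.
func commentBodies(code string) []string {
	var bodies []string
	for _, m := range commentRe.FindAllStringSubmatch(code, -1) {
		bodies = append(bodies, strings.TrimSpace(m[1]))
	}
	return bodies
}

func main() {
	code := "\n/* This is a\n    Multi-line comment */\nfunc main() {\n    /* Another comment */\n}\n"
	for i, body := range commentBodies(code) {
		fmt.Printf("%d: %q\n", i, body)
	}
}
```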

2. Process log files

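// LogEntry holds one parsed line (definition assumed; the original omits it)
type LogEntry struct {
    Date    string
    Content string
}

func parseLogEntry(log string) []LogEntry {
    pattern := `(?m)^(\d{4}-\d{2}-\d{2})\s+(.*)$`
    re := regexp.MustCompile(pattern)
    matches := re.FindAllStringSubmatch(log, -1)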
    
    var entries []LogEntry
    for _, match := range matches {
        entries = append(entries, LogEntry{
            Date:    match[1],
            Content: match[2],
        })
    }
    return entries
}
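One detail to watch with log files written on Windows: in (?m) mode $ matches just before \n, so a trailing \r would end up inside the captured content. The sketch below is a variant under stated assumptions: the sample log text is invented, parseLog and logLineRe are illustrative names, and the separator is narrowed to [ \t]+ so that it cannot run across a line break.

```go
package main

import (
	"fmt"
	"regexp"
)

// LogEntry is assumed to have this shape for the example.
type LogEntry struct {
	Date    string
	Content string
}

// logLineRe matches "YYYY-MM-DD <content>" on each line. The lazy (.*?)
// followed by \r?$ keeps a trailing carriage return out of the capture.
var logLineRe = regexp.MustCompile(`(?m)^(\d{4}-\d{2}-\d{2})[ \t]+(.*?)\r?$`)

// parseLog returns one entry per line that starts with a date;
// lines without a leading date are skipped.
func parseLog(log string) []LogEntry {
	var entries []LogEntry
	for _, m := range logLineRe.FindAllStringSubmatch(log, -1) {
		entries = append(entries, LogEntry{Date: m[1], Content: m[2]})
	}
	return entries
}

func main() {
	// Made-up sample with Windows line endings and one undated line.
	log := "2025-04-10 server started\r\n  continuation without a date\r\n2025-04-11 server stopped"
	for _, e := range parseLog(log) {
		fmt.Printf("%s | %q\n", e.Date, e.Content)
	}
}
```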

Performance optimization suggestions

1. Precompile regular expressions

// Good practice: compile once at package level and reuse
var commentRegex = regexp.MustCompile(`(?s)/\*.*?\*/`)

func process(input string) {
    matches := commentRegex.FindAllString(input, -1)
    // ...
}
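A related choice is between regexp.MustCompile and regexp.Compile. A minimal sketch, assuming patterns fixed in source are compiled with MustCompile and patterns that arrive at run time go through Compile; countComments and compileUserPattern are hypothetical helpers.

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once when the package is initialized. MustCompile panics on an
// invalid pattern, which is acceptable for patterns fixed in source code.
var commentRegex = regexp.MustCompile(`(?s)/\*.*?\*/`)

// countComments reuses the precompiled regex on every call.
func countComments(input string) int {
	return len(commentRegex.FindAllString(input, -1))
}

// compileUserPattern is for patterns that are only known at run time:
// Compile returns an error instead of panicking.
func compileUserPattern(p string) (*regexp.Regexp, error) {
	return regexp.Compile(p)
}

func main() {
	fmt.Println(countComments("/* a */ x /* b\nc */")) // 2

	// The unclosed group makes this pattern invalid.
	if _, err := compileUserPattern(`(?s)/\*(`); err != nil {
		fmt.Println("invalid pattern:", err)
	}
}
```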

2. Use appropriate quantifiers

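// Go's regexp engine (RE2) does not backtrack and runs in time linear in the input,
// so the choice here mainly affects how much text each match covers.
// Non-greedy: stops at the first */, one match per comment
nonGreedy := `(?s)/\*.*?\*/`
// Greedy: runs to the last */ in the input and swallows everything in between
greedy := `(?s)/\*.*\*/`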

Common pitfalls and precautions

1. Windows line breaks

// Handle cross-platform line breaks
pattern := `(?s)line1[\r\n]+line2`
// or match \n and \r\n explicitly (Go's regexp does not support \R)
pattern := `(?s)line1(?:\r?\n)+line2`
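Go's regexp package follows RE2 syntax, which has no \R escape; a pattern containing it fails to compile. The sketch below checks that and shows two common workarounds: matching \r?\n explicitly, or normalizing the input first. The helper names are invented for the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// lineBreakRe accepts one or more line breaks, either \n or \r\n.
var lineBreakRe = regexp.MustCompile(`line1(?:\r?\n)+line2`)

// supportsBackslashR reports whether a pattern using \R compiles.
func supportsBackslashR() bool {
	_, err := regexp.Compile(`line1\R+line2`)
	return err == nil
}

// normalizeNewlines converts Windows line breaks to \n so that
// patterns only have to deal with one form.
func normalizeNewlines(s string) string {
	return strings.ReplaceAll(s, "\r\n", "\n")
}

func main() {
	fmt.Println(supportsBackslashR())                       // false
	fmt.Println(lineBreakRe.MatchString("line1\nline2"))   // true
	fmt.Println(lineBreakRe.MatchString("line1\r\nline2")) // true
	fmt.Printf("%q\n", normalizeNewlines("line1\r\nline2")) // "line1\nline2"
}
```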

2. Unicode support

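// No flag is needed: Go's regexp works on UTF-8 text by default.
// Note that (?U) does not mean Unicode in Go; it makes quantifiers non-greedy by default.
pattern := `(?s)first.*third`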

3. Greedy and non-greedy matching

// Non-greedy match: each quoted string is matched separately
nonGreedy := `(?s)".*?"`
// Greedy match: from the first quote to the last quote in the input
greedy := `(?s)".*"`
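The difference shows up as soon as the input contains more than one quoted string. The sketch below compares both forms and adds a third option, a negated character class, which needs no (?s) because [^"] already matches line breaks. The helper quoted and the sample text are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// quoted returns all matches of pattern in text.
func quoted(pattern, text string) []string {
	return regexp.MustCompile(pattern).FindAllString(text, -1)
}

func main() {
	text := "say \"a\" then\n\"b\""

	// Non-greedy: two separate matches.
	fmt.Printf("%q\n", quoted(`(?s)".*?"`, text)) // ["\"a\"" "\"b\""]

	// Greedy: one match from the first quote to the last.
	fmt.Printf("%q\n", quoted(`(?s)".*"`, text)) // ["\"a\" then\n\"b\""]

	// Negated class: same result as non-greedy, without any flag.
	fmt.Printf("%q\n", quoted(`"[^"]*"`, text)) // ["\"a\"" "\"b\""]
}
```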

Best Practice Summary

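1. Regular expression flags

  • (?s): Single-line mode (. also matches \n)
  • (?m): Multi-line mode (^ and $ match at every line)
  • (?i): Ignore case
  • (?U): Ungreedy mode (swaps x* and x*?); it is not a Unicode switch, since Go handles UTF-8 by default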

2. Performance considerations

  • Precompile regular expressions
  • Use non-greedy matches when the shortest match is wanted
  • Avoid overly complex expressions

3. Cross-platform compatibility

  • Account for different line breaks
  • Use \r?\n to match both Unix and Windows line breaks (Go does not support \R)

Debugging Tips

// Print information about the compiled pattern
debug := regexp.MustCompile(pattern)
fmt.Printf("Pattern: %q\n", debug.String())
fmt.Printf("Groups: %d\n", debug.NumSubexp())
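When a pattern has several groups, naming them makes the debug output easier to read. A hedged sketch using (?P<name>...) groups together with SubexpNames; the pattern, the sample line, and the helper namedGroups are illustrative and not taken from the article.

```go
package main

import (
	"fmt"
	"regexp"
)

// logRe is an example pattern with two named groups.
var logRe = regexp.MustCompile(`(?m)^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<content>.*)$`)

// namedGroups returns the named captures of the first match in input,
// or an empty map if nothing matches.
func namedGroups(re *regexp.Regexp, input string) map[string]string {
	groups := make(map[string]string)
	m := re.FindStringSubmatch(input)
	if m == nil {
		return groups
	}
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			groups[name] = m[i]
		}
	}
	return groups
}

func main() {
	fmt.Printf("Pattern: %q\n", logRe.String())
	fmt.Printf("Groups: %d\n", logRe.NumSubexp())
	fmt.Printf("Names: %q\n", logRe.SubexpNames())
	fmt.Println(namedGroups(logRe, "2025-04-10 server started"))
}
```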

Summary

The keys to handling line breaks in Go regular expressions are:

  • Understand what the (?s) flag does
  • Handle cross-platform line breaks correctly
  • Select the right matching mode
  • Pay attention to performance
