How to use regular expressions in Swift

Preface

A regular expression (often abbreviated as regex, regexp, or RE in code) is a concept from computer science. Regular expressions are typically used to find and replace text that conforms to a certain pattern (rule).

Regular expressions (regex) let us perform complex searches and replacements across thousands of documents in seconds, and they have been in wide use for more than 50 years.

Although Swift is a relatively new language, it did not originally ship with dedicated syntax or types for handling regular expressions. So this article uses the long-standing NSRegularExpression class from Foundation for regular expression matching.

In this article, I will explain the basic usage of regular expressions in Swift. We will work from easy to difficult through the most important regular expression syntax and some useful extensions.

NSRegularExpression: How to match regular expressions in a string

The NSRegularExpression class lets us find and replace substrings using regular expressions, which describe text patterns concisely and flexibly. For example, if you want to extract "Taylor Swift" from "My name is Taylor Swift", you can write a regular expression that matches the text "My name is" followed by any text, and then pass it to the NSRegularExpression class.

For details, see the code below. Note that what we want to extract is the range at index 1: the range at index 0 is the entire matched string, and the range at index 1 is the captured "Taylor Swift" part.

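import Foundation

do {
 let input = "My name is Taylor Swift"
 // The parentheses form a capture group around the name.
 let regex = try NSRegularExpression(pattern: "My name is (.*)", options: [])
 let matches = regex.matches(in: input, options: [], range: NSRange(location: 0, length: input.utf16.count))

 if let match = matches.first {
  // Range 0 is the whole match; range 1 is the first capture group.
  let range = match.range(at: 1)
  if let swiftRange = Range(range, in: input) {
   let name = input[swiftRange]
   print(name)
  }
 }
} catch {
 // regex was bad!
}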
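To make the difference between the two ranges concrete, here is a minimal sketch that returns both the whole match and the first capture group. The helper name wholeMatchAndFirstGroup is hypothetical and introduced only for illustration; it is not part of Foundation or of the code above.

```swift
import Foundation

// Hypothetical helper (an assumption for illustration): returns the text of
// range 0 (the whole match) and range 1 (the first capture group).
func wholeMatchAndFirstGroup(pattern: String, in input: String) -> (whole: String, group: String)? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(location: 0, length: input.utf16.count)
    guard let match = regex.firstMatch(in: input, options: [], range: fullRange),
          match.numberOfRanges > 1,
          let wholeRange = Range(match.range(at: 0), in: input),
          let groupRange = Range(match.range(at: 1), in: input) else { return nil }
    return (String(input[wholeRange]), String(input[groupRange]))
}

let extracted = wholeMatchAndFirstGroup(pattern: "My name is (.*)", in: "My name is Taylor Swift")
// extracted?.whole is the full sentence; extracted?.group is "Taylor Swift".
```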

Detailed explanation of regular expressions

Let's start with a few simple examples for readers who are unfamiliar with regular expressions. Regular expressions, or regex for short, let us perform fuzzy searches in strings. For example, we know that "cat" contains "at", but what if we want to search for all 3-letter words ending in "at"?

Regular expressions are designed to solve exactly this problem, although the API feels a little clumsy in Swift because of its Objective-C roots.

1. First, define the string you want to retrieve:

let testString = "hat"

Then create an NSRange instance that covers the entire string:

let range = NSRange(location: 0, length: testString.utf16.count)

utf16.count is used here because NSRange counts UTF-16 code units; this avoids problems with emoji and other characters that need more than one code unit.
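As a small sketch of why the UTF-16 length matters, the following compares the character count with the UTF-16 count for a string containing an emoji. The sample string is an assumption chosen for illustration.

```swift
import Foundation

let emojiString = "hat🎩"

// Swift's count treats the emoji as a single Character...
let characterCount = emojiString.count
// ...but NSRange measures UTF-16 code units, and this emoji needs two.
let utf16Count = emojiString.utf16.count

// Two ways to build an NSRange that covers the whole string.
let manualRange = NSRange(location: 0, length: emojiString.utf16.count)
let convertedRange = NSRange(emojiString.startIndex..<emojiString.endIndex, in: emojiString)
```

Using emojiString.count for the length would produce a range that stops short of the end of the string, so a match near the end could be missed.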

2. Then use regular expression syntax to create an NSRegularExpression instance

let regex = try! NSRegularExpression(pattern: "[a-z]at")

[a-z] is the regular expression syntax for any letter from a to z. In real code you might supply an invalid regular expression, but here the pattern is hard-coded and known to be correct, so try! is safe and there is no need to catch errors.

3. Finally, call firstMatch(in:) on the regular expression you created, passing in the string to search, any special options, and the range of the string to search. If the string contains a match, the match data is returned; otherwise the result is nil. So if you only want to check whether the string matches, compare the result of firstMatch(in:) against nil:

regex.firstMatch(in: testString, options: [], range: range) != nil

NSRange must be used here because this API was designed for NSString and does not bridge well to Swift's String. The Swift String Manifesto may eventually lead to a replacement, but that looks to be a long way off.

The regular expression "[a-z]at" will successfully match "hat", and "cat", "sat", "mat", "bat", etc. - we just focus on what we want to match, and NSRegularExpression will handle it.
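Putting the three steps together, a minimal runnable sketch might look like this. The function name containsAtWord and the extra test words are assumptions added for illustration.

```swift
import Foundation

// Step 2: compile the hard-coded pattern once.
let atWordRegex = try! NSRegularExpression(pattern: "[a-z]at")

// Steps 1 and 3 wrapped in a hypothetical helper: build the range for the
// given string and check whether firstMatch(in:) finds anything.
func containsAtWord(_ testString: String) -> Bool {
    let range = NSRange(location: 0, length: testString.utf16.count)
    return atWordRegex.firstMatch(in: testString, options: [], range: range) != nil
}

for word in ["hat", "cat", "dog", "at"] {
    print(word, containsAtWord(word))
}
```

Note that firstMatch(in:) looks for a match anywhere in the string, so a longer word such as "that" also counts as a match because it contains "hat".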

Make NSRegularExpression easier to use

Next, we will show more regular expression syntax. First, let’s take a look at how to make NSRegularExpression a little easier to use.

Right now we need 3 lines of Swift code to match a simple string:

let range = NSRange(location: 0, length: testString.utf16.count)
let regex = try! NSRegularExpression(pattern: "[a-z]at")
regex.firstMatch(in: testString, options: [], range: range) != nil

We can improve in a number of ways, but the most effective is to extend NSRegularExpression to make creating and matching expressions easier.

First line:

let regex = try! NSRegularExpression(pattern: "[a-z]at")

I mentioned that creating an NSRegularExpression instance can throw an error because an invalid regular expression may be provided, for example [a-zat, where the closing ] has been forgotten.

As a result, NSRegularExpression instances are often created with try!. However, this triggers warnings in lint tools such as SwiftLint. A better approach is to add a convenience initializer that either creates the regular expression correctly or triggers an assertion failure during development.

extension NSRegularExpression {
 convenience init(_ pattern: String) {
  do {
   try self.init(pattern: pattern)
  } catch {
   preconditionFailure("Illegal regular expression: \(pattern).")
  }
 }
}

Note: If your app lets users write their own regular expressions, use NSRegularExpression(pattern:) directly to create them, so that errors can be handled gracefully.
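For user-supplied patterns, one possible approach is to wrap the throwing initializer and return a Result instead of crashing. This is only a sketch; compileUserPattern and isValidPattern are hypothetical names introduced for illustration.

```swift
import Foundation

// Hypothetical helper: compile a pattern typed in by a user without crashing.
func compileUserPattern(_ pattern: String) -> Result<NSRegularExpression, Error> {
    do {
        return .success(try NSRegularExpression(pattern: pattern))
    } catch {
        // The error can be shown to the user instead of terminating the app.
        return .failure(error)
    }
}

// Convenience check built on top of the helper above.
func isValidPattern(_ pattern: String) -> Bool {
    if case .success = compileUserPattern(pattern) { return true }
    return false
}

print(isValidPattern("[a-z]at"))
print(isValidPattern("[a-zat"))
```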

After that:

let range = NSRange(location: 0, length: testString.utf16.count)
regex.firstMatch(in: testString, options: [], range: range) != nil

The first line creates an NSRange covering the entire string, and the second line looks for the first match in the text. This is clumsy: most of the time you want to search the entire input string, and comparing the result of firstMatch(in:) against nil obscures your intent.

So, let's replace it with another extension that wraps this code in a simple matches() method.

extension NSRegularExpression {
 func matches(_ string: String) -> Bool {
  let range = NSRange(location: 0, length: string.utf16.count)
  return firstMatch(in: string, options: [], range: range) != nil
 }
}

If you combine these two extensions, creating and checking regular expressions becomes much easier:

let regex = NSRegularExpression("[a-z]at")
regex.matches("hat")

We can go a step further and use operator overloading to make Swift's pattern-matching operator, ~=, work with regular expressions:

extension String {
 static func ~= (lhs: String, rhs: String) -> Bool {
  guard let regex = try? NSRegularExpression(pattern: rhs) else { return false }
  let range = NSRange(location: 0, length: lhs.utf16.count)
  return regex.firstMatch(in: lhs, options: [], range: range) != nil
 }
}

With the code above, we can put any string on the left-hand side of the operator and a regular expression on the right-hand side:

"hat" ~= "[a-z]at"

Note: Creating an NSRegularExpression instance has a cost, so if you want to use a regular expression repeatedly, it is best to keep the NSRegularExpression instance around and reuse it.
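One way to follow this advice is to store the compiled expression in a static property so that it is created only once. The sketch below assumes a hypothetical WordFilter type; the name and the word list are illustrative only.

```swift
import Foundation

// Hypothetical type that compiles its pattern once and reuses it.
struct WordFilter {
    // Static properties are initialized lazily, a single time.
    private static let atWord = try! NSRegularExpression(pattern: "[a-z]at")

    static func matchingWords(in words: [String]) -> [String] {
        words.filter { word in
            let range = NSRange(location: 0, length: word.utf16.count)
            return atWord.firstMatch(in: word, options: [], range: range) != nil
        }
    }
}

let filtered = WordFilter.matchingWords(in: ["hat", "dog", "cat", "sun"])
print(filtered)
```

By contrast, the ~= operator shown earlier compiles a new NSRegularExpression on every call, which is convenient for one-off checks but wasteful inside a loop.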

A tour of regular expression syntax

We have used [a-z] to represent any letter between "a" and "z"; this is called a character class in regular expressions. It lets you specify a set of letters to match, either as an explicit list of letters or as a range of characters.

Regular expression ranges do not have to cover the entire alphabet: you can use [a-t] to exclude the letters "u" through "z". Also, if you want to allow only a few specific letters, just list them individually like this:

[csm]at

Regular expressions are case-sensitive by default, which means that "Cat" and "Mat" will not be matched by "[a-z]at". If you want to ignore case, you can use "[a-zA-Z]at", or create your own NSRegularExpression object and pass the .caseInsensitive option.
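The following sketch compares the three variants on a capitalized word. The helper hasMatch is a hypothetical name used only for this example.

```swift
import Foundation

let caseSensitive = try! NSRegularExpression(pattern: "[a-z]at")
let bothCases = try! NSRegularExpression(pattern: "[a-zA-Z]at")
let caseInsensitive = try! NSRegularExpression(pattern: "[a-z]at", options: .caseInsensitive)

// Hypothetical helper: true if the regex matches anywhere in the text.
func hasMatch(_ regex: NSRegularExpression, in text: String) -> Bool {
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

print(hasMatch(caseSensitive, in: "Cat"))
print(hasMatch(bothCases, in: "Cat"))
print(hasMatch(caseInsensitive, in: "Cat"))
```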

In addition to upper and lower case letters, character classes can also specify ranges of digits. The most common is [0-9] for any digit, while [A-Za-z0-9] matches any alphanumeric character and [A-Fa-f0-9] matches any hexadecimal digit.
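As a rough illustration of these classes, the sketch below keeps only the characters of a string that belong to a given class. The helper name and the sample strings are assumptions made for this example.

```swift
import Foundation

let alphanumeric = try! NSRegularExpression(pattern: "[A-Za-z0-9]")
let hexDigit = try! NSRegularExpression(pattern: "[A-Fa-f0-9]")

// Hypothetical helper: does a single character belong to the class?
func belongs(_ character: Character, to regex: NSRegularExpression) -> Bool {
    let text = String(character)
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

// Keep only the characters that match each class.
let alphanumericOnly = "Swift 5!".filter { belongs($0, to: alphanumeric) }
let hexOnly = "0xFFg9".filter { belongs($0, to: hexDigit) }
print(alphanumericOnly, hexOnly)
```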

If you want to match a sequence of characters, you also need a concept called a quantifier. A quantifier specifies how many times something should appear.

The most commonly used is the asterisk quantifier, *, which means "match 0 or more". Quantifiers appear after the thing they modify, like this:

let regex = NSRegularExpression("ca[a-z]*d")

This pattern first looks for "ca", then 0 or more letters from "a" to "z", and finally "d". It can match "cad", "card", "camped", etc.

In addition to *, there are two similar quantifiers, + and ?. + means "1 or more", which differs slightly from the "0 or more" of *, and ? means "0 or 1".

These quantifiers are fundamental to regular expressions, so it is worth making sure you really understand how they differ. Consider the following three regular expressions:

ca[a-z]*d
ca[a-z]+d
ca[a-z]?d

Then think about which of them match when given the string "cad" or "camped".
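One way to check your answers is to run each pattern against a handful of sample strings. The strings and the helper name below are assumptions chosen for illustration.

```swift
import Foundation

// Hypothetical helper: true if the pattern matches anywhere in the text.
func patternMatches(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let quantifierPatterns = ["ca[a-z]*d", "ca[a-z]+d", "ca[a-z]?d"]
let sampleWords = ["cd", "cad", "card", "camped"]

for pattern in quantifierPatterns {
    for word in sampleWords {
        print(pattern, word, patternMatches(pattern, word))
    }
}
```

None of the patterns match "cd" because all three require the literal "ca". The * version accepts any number of letters before the "d", the + version needs at least one, and the ? version allows at most one.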

If needed, you can use braces { and } to specify the number of matches more precisely. For example, [a-z]{3} means "match exactly 3 lowercase letters".

Consider a phone number format such as 111-1111. If you want to match this format exactly, [0-9-]+ will not work, because it also accepts any other mix of digits and hyphens. Instead we need the regular expression [0-9]{3}-[0-9]{4}: first 3 digits, then a hyphen, then 4 digits.

In addition, you can specify a range inside the braces, which can be bounded or unbounded. For example, [a-z]{1,3} means "match 1, 2, or 3 lowercase letters", and [a-z]{3,} means "match 3 or more".
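To see what the brace quantifiers actually capture, the sketch below returns the text of the first match rather than a simple true or false. The helper firstMatchText and the sample inputs are assumptions made for illustration.

```swift
import Foundation

// Hypothetical helper: returns the text of the first match, or nil.
func firstMatchText(_ pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return nil }
    let fullRange = NSRange(location: 0, length: text.utf16.count)
    guard let match = regex.firstMatch(in: text, options: [], range: fullRange),
          let swiftRange = Range(match.range, in: text) else { return nil }
    return String(text[swiftRange])
}

// Exactly 3 digits, a hyphen, then exactly 4 digits.
let phone = firstMatchText("[0-9]{3}-[0-9]{4}", in: "Call 111-1111 now")
// The looser class happily accepts a string with no digits at all.
let onlyHyphens = firstMatchText("[0-9-]+", in: "---")
// Bounded range: takes at most 3 letters even though more are available.
let upToThree = firstMatchText("[a-z]{1,3}", in: "abcdef")
// Unbounded range: needs at least 3 letters, so "ab" produces no match.
let threeOrMore = firstMatchText("[a-z]{3,}", in: "abcdef")
let tooShort = firstMatchText("[a-z]{3,}", in: "ab")
```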

Finally, metacharacters are characters with a special meaning in regular expressions. Here are a few of the most frequently used.

The most commonly used, and abused, metacharacter is the . character. It matches any single character except a line break. For example, the regular expression c.t matches "cat", but not "cart". If you combine . with the * quantifier, it means "match 0 or more characters other than line breaks", which is probably the regular expression you will see most often.

The reason it is so common is obvious: without having to design a specific regular expression, .* matches almost anything. The problem, however, is that being specific is the whole point of regular expressions: they let you pick out particular pieces of text and manipulate them. Too many people rely entirely on .* without realizing that it can introduce subtle errors into their expressions.
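To make the behaviour of . and .* concrete, here is a small sketch. The patterns c.t and c.*t and the test words are assumptions chosen for illustration, as is the helper name.

```swift
import Foundation

// Hypothetical helper: true if the pattern matches anywhere in the text.
func dotPatternMatches(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

// A single . stands for exactly one character.
print(dotPatternMatches("c.t", "cat"))
print(dotPatternMatches("c.t", "cart"))

// .* stands for zero or more characters, so it accepts both, and even "ct".
print(dotPatternMatches("c.*t", "cart"))
print(dotPatternMatches("c.*t", "ct"))
```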

To return to the earlier phone number example, we used [0-9]{3}-[0-9]{4} to match phone numbers such as 555-5555. Considering that some people will write "555 5555" or "5555555", we might relax the regular expression and change it to [0-9]{3}.*[0-9]{4}.

But this creates a problem: besides "123-4567", it will also match "123-4567890" or "123-456-789012345". Between the [0-9]{3} and the [0-9]{4}, .* matches as many characters of any kind as it can.

So here we should use character classes and quantifiers instead, such as [0-9]{3}[ -]*[0-9]{4}, which means 3 digits, followed by 0 or more spaces or hyphens, and then 4 digits. Alternatively, use a negated character class, which matches any character that is not in the class. For example, [0-9]{3}[^0-9]+[0-9]{4} allows spaces, hyphens, slashes, and so on between the two groups of digits, but not more digits.
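The three phone number patterns can be compared side by side with a short sketch. The sample inputs, including "555abc5555", and the helper name are assumptions added for illustration.

```swift
import Foundation

// Hypothetical helper: true if the pattern matches anywhere in the text.
func phonePatternMatches(_ pattern: String, _ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return false }
    let range = NSRange(location: 0, length: text.utf16.count)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

let loosePattern = "[0-9]{3}.*[0-9]{4}"        // anything between the digit groups
let separatorPattern = "[0-9]{3}[ -]*[0-9]{4}" // only spaces or hyphens, possibly none
let negatedPattern = "[0-9]{3}[^0-9]+[0-9]{4}" // at least one non-digit separator

for input in ["555-5555", "555 5555", "5555555", "555abc5555"] {
    print(input,
          phonePatternMatches(loosePattern, input),
          phonePatternMatches(separatorPattern, input),
          phonePatternMatches(negatedPattern, input))
}
```

The loose pattern accepts all four inputs, including "555abc5555". The separator pattern rejects that input but still accepts "5555555", while the negated class requires at least one non-digit and therefore rejects "5555555".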

Summary

The above is the entire content of this article. I hope that the content of this article has a certain reference value for everyone's study or work. If you have any questions, you can leave a message to communicate. Thank you for your support.