SoFunction
Updated on 2025-04-11

Swift in Practice: A Crash Caused by a Single Character

Recently I ran into a crash caused by a single character. Because the actual business scenario is hard to describe, I will use a piece of test code to explain it here.

Without further ado, here is the code:

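let testCharacters: Set<Character> = ["!", "\"", "$", "%", "&", "'", "+", ",", "<", "=", ">", "@", "[", "]", "`", "{", "}"]
// The second backtick in testString is not the ASCII backtick. The exact character was lost
// when this article was copied; "\u{1FEF}" is an assumed stand-in that reproduces the crash.
let testString = "@`Hello World\u{1FEF}!"
var result: UInt8 = 0
for character in testString {
    if testCharacters.contains(character) {
        // Force unwrap: traps at runtime when asciiValue is nil.
        result += character.asciiValue!
    }
}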

The code above sums the ASCII values of the characters in testString that belong to testCharacters.

Let's review this code. Experienced readers will immediately spot the code smell: the force unwrap `!` on asciiValue.

So is the force unwrap reasonable here? Every character defined in testCharacters has an ASCII value, so at first glance it looks safe.
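The author's assumption can be checked directly. This sketch (the name allHaveASCII is illustrative, not from the original) confirms that every element of testCharacters has a non-nil asciiValue, which is exactly why the force unwrap looks safe at first glance.

```swift
// Same set as in the test code above.
let testCharacters: Set<Character> = ["!", "\"", "$", "%", "&", "'", "+", ",", "<", "=", ">", "@", "[", "]", "`", "{", "}"]

// Every member is an ASCII character, so each one has a non-nil asciiValue.
let allHaveASCII = testCharacters.allSatisfy { $0.asciiValue != nil }
print(allHaveASCII) // true
```

The check passes, yet it says nothing about the characters that come from testString, which is where the problem hides.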

However, if we actually run it, it crashes on the force unwrap because asciiValue is nil. Why?

The key is that testString contains a full-width character: the second ` in testString is full-width and has no asciiValue.

We can execute the following code in Swift Playgrounds to get the answer:

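import Foundation

// Assumption: the non-ASCII lookalike is written as "\u{1FEF}" because the
// original character did not survive copying.
let halfWidth = "`"
halfWidth.lengthOfBytes(using: .utf8) // 1
halfWidth.first!.isASCII // true
halfWidth.first!.asciiValue // 96

let fullWidth = "\u{1FEF}"
fullWidth.lengthOfBytes(using: .utf8) // 3
fullWidth.first!.isASCII // false
fullWidth.first!.asciiValue // nil

// Character conforms to Equatable, and it considers the two values equal.
halfWidth.first! == fullWidth.first! // true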

These results show that the half-width halfWidth occupies one byte and its ASCII value is 96, whereas the full-width fullWidth occupies three bytes and its asciiValue is nil.

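The contains(_:) method of Swift's Set relies on Character's Hashable and Equatable conformances. As the comparison of halfWidth and fullWidth above shows, Character's Equatable implementation treats the two characters as equal: Swift compares characters by Unicode canonical equivalence rather than by their raw code points, so it does not distinguish these two forms.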
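A minimal sketch of this behavior, under one assumption: the exact lookalike character from the article did not survive copying, so U+1FEF GREEK VARIA is used as a stand-in. Its canonical decomposition is the ASCII backtick U+0060, so it reproduces the results described above. By contrast, U+FF40 FULLWIDTH GRAVE ACCENT is only compatibility-equivalent to U+0060 and would not compare equal in Swift. The names ascii, lookalike and lookup are illustrative.

```swift
// Assumption: U+1FEF GREEK VARIA stands in for the article's lookalike backtick.
let ascii: Character = "`"            // U+0060 GRAVE ACCENT
let lookalike: Character = "\u{1FEF}" // U+1FEF, canonically equivalent to U+0060

// Character equality uses canonical equivalence, so these compare equal...
print(ascii == lookalike) // true

// ...and a Set<Character> containing one reports that it contains the other.
let lookup: Set<Character> = [ascii]
print(lookup.contains(lookalike)) // true

// Comparing the underlying Unicode scalar values exposes the difference.
let asciiScalars = ascii.unicodeScalars.map { $0.value }
let lookalikeScalars = lookalike.unicodeScalars.map { $0.value }
print(asciiScalars == lookalikeScalars) // false

// Only the real ASCII character has an asciiValue.
print(ascii.asciiValue as Any)     // Optional(96)
print(lookalike.asciiValue as Any) // nil
```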

To the naked eye the two characters are indistinguishable, and contains returning true reinforces the impression that the force unwrap is safe. One careless `!` is enough to cause a crash.
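One way to avoid the crash is to drop the force unwrap and skip characters whose asciiValue is nil. This sketch again assumes U+1FEF as the stand-in lookalike; the function name asciiSum is hypothetical, and the UInt32 accumulator is an illustrative choice that leaves more headroom than UInt8 (whose maximum is 255).

```swift
let testCharacters: Set<Character> = ["!", "\"", "$", "%", "&", "'", "+", ",", "<", "=", ">", "@", "[", "]", "`", "{", "}"]
// Assumption: "\u{1FEF}" stands in for the lookalike backtick from the article.
let testString = "@`Hello World\u{1FEF}!"

// Sums the ASCII values of characters in `string` that are in `allowed`,
// without force-unwrapping asciiValue.
func asciiSum(of string: String, matching allowed: Set<Character>) -> UInt32 {
    var total: UInt32 = 0
    for character in string where allowed.contains(character) {
        // A character can compare equal to an ASCII member without being ASCII itself.
        guard let value = character.asciiValue else { continue }
        total += UInt32(value)
    }
    return total
}

print(asciiSum(of: testString, matching: testCharacters)) // 193
```

Because the lookalike is skipped, the result is 64 ("@") + 96 ("`") + 33 ("!"). Checking character.isASCII before calling contains would work equally well.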

Finally, here is some background on full-width and half-width characters, compiled from Wikipedia:

In early computers, systems for English and other Latin-alphabet languages stored each letter or symbol in one byte (one byte consists of 8 bits, giving 256 possible values). Chinese, Japanese and Korean characters number far more than 256, so two bytes were commonly used to store each of them. The distinction was therefore originally a "single-byte" versus "double-byte" issue at the encoding level.

When computers used monospaced fonts (as in DOS and some text editors), the fonts followed this encoding scheme and drew Chinese, Japanese and Korean characters twice as wide as Latin letters and digits. In this way, each character's storage size matched its display width one to one:

  • Single-byte characters were displayed at half width;
  • Double-byte characters were displayed at full width.

As a result, users at that time got used to calling Chinese, Japanese, Korean and similar characters "full-width" characters, and Latin letters or digits "half-width" characters.

Later, however, text encoding changed considerably, and storing a character may take one, two, four or more bytes. Even an English letter that is displayed at half width is not necessarily stored in one byte; that depends on the encoding.
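The point that storage size depends on the encoding can be checked in Swift. This sketch counts the bytes a string occupies in UTF-8, UTF-16 and UTF-32 by multiplying code-unit counts by the unit size (2 bytes for UTF-16, 4 for UTF-32) and ignores byte-order marks; byteCounts is a hypothetical helper name.

```swift
// Returns how many bytes a string occupies in UTF-8, UTF-16 and UTF-32
// (no byte-order mark). UTF-16 code units are 2 bytes, UTF-32 units are 4 bytes.
func byteCounts(_ s: String) -> (utf8: Int, utf16: Int, utf32: Int) {
    return (utf8: s.utf8.count,
            utf16: s.utf16.count * 2,
            utf32: s.unicodeScalars.count * 4)
}

for sample in ["A", "中", "\u{1F600}"] {
    let counts = byteCounts(sample)
    print(sample, counts.utf8, counts.utf16, counts.utf32)
}
// A 1 2 4
// 中 3 2 4
// 😀 4 4 4
```

Even the half-width "A" takes one, two or four bytes depending on the encoding.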

Therefore, there is no one-to-one correspondence between character encoding storage and character display width.

Nevertheless, because of the former correspondence between character encoding and glyph width, many users have kept using the terms "full-width" and "half-width".

As a result, the term "full-width" today may refer to:

  • Characters stored in two bytes;
  • All characters other than ASCII (the so-called half-width English letters and digits);
  • Glyphs displayed in a square cell, with width equal to height.

Summary

A character can look identical to an ASCII character, and even compare equal to it through Character's Equatable conformance, while still having no asciiValue. Force-unwrapping asciiValue in that situation crashes, so handle the nil case instead.