A detailed look at string reversal on iOS

Preface

String reversal is about as basic as algorithm problems get. It comes down to reverse traversal, two-pointer swapping, or recursion, and the code can be written in minutes:

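#include <string.h>

void strrev(char *str) {
 size_t len = strlen(str);
 if (len < 2) {
  return; // nothing to swap; also keeps len - 1 from wrapping around for ""
 }
 size_t start = 0;
 size_t end = len - 1;

 while (start < end) {
  char ch = str[start];
  str[start++] = str[end];
  str[end--] = ch;
 }
}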

OK, the code above would certainly be accepted (AC) on LeetCode, but would it hold up in real-world use? Certainly not! A truly reliable string reversal is hard to even pose as a LeetCode problem.

First of all, strings have encodings, such as UTF-8, which we use most often, and UTF-16, which Windows adopted early on (its APIs with the W suffix use this encoding), and so on. For ASCII characters such as English letters, UTF-8 and ASCII use the same single byte, so the method above works fine. A common Chinese character, however, occupies 3 bytes in UTF-8, and reversing byte by byte produces garbled text.
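To see the corruption concretely, here is a minimal sketch that reverses raw bytes and checks whether a byte is a UTF-8 continuation byte. The helper names reverse_bytes and is_continuation_byte are made up for illustration and do not come from any library.

```c
#include <string.h>

/* Reverse a buffer one byte at a time, ignoring its encoding. */
void reverse_bytes(char *s) {
    size_t n = strlen(s);
    for (size_t i = 0; i < n / 2; i++) {
        char t = s[i];
        s[i] = s[n - 1 - i];
        s[n - 1 - i] = t;
    }
}

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx.
   It can never be the first byte of an encoded character. */
int is_continuation_byte(unsigned char b) {
    return (b & 0xC0) == 0x80;
}
```

The character "中" (U+4E2D) is encoded as the bytes E4 B8 AD. After reverse_bytes the buffer holds AD B8 E4, which starts with a continuation byte, so a UTF-8 decoder rejects it or shows replacement characters.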

So how do we solve this?

The easiest way is to use the mbstowcs function to convert the char * string into a wide string of wchar_t. wchar_t is 4 bytes on Linux and most UNIX-like systems and 2 bytes on Windows. With 4 bytes, characters are stored as UTF-32, so both Chinese characters and Emoji fit in a single element. With 2 bytes, that is UTF-16, Chinese characters in the Basic Multilingual Plane still fit in one element, but characters in the supplementary planes, such as Emoji, need two code units (a surrogate pair), so the method in this article does not work there.
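The 2-byte limitation can be sketched with the standard UTF-16 surrogate arithmetic. The function names utf16_units and utf16_surrogates are hypothetical helpers written for this illustration.

```c
#include <stdint.h>

/* Number of 16-bit code units needed for one code point in UTF-16. */
int utf16_units(uint32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}

/* Split a supplementary-plane code point (>= U+10000) into its
   high and low surrogates. */
void utf16_surrogates(uint32_t cp, uint16_t *high, uint16_t *low) {
    uint32_t v = cp - 0x10000;
    *high = (uint16_t)(0xD800 + (v >> 10));
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));
}
```

U+4E2D ("中") needs one unit, while U+1F600 (an Emoji) becomes the pair D83D DE00. Swapping a 2-byte wchar_t array element by element would put the low surrogate before the high one, which is not valid UTF-16.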

First, let's take a look at the improved string reversal:

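#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

static void strrev2(char *str) {
 // The bare name "UTF-8" is accepted as an LC_CTYPE locale on Apple platforms;
 // other systems may need a full name such as "en_US.UTF-8".
 // Note that setlocale changes process-wide state.
 setlocale(LC_CTYPE, "UTF-8");
 size_t len = mbstowcs(NULL, str, 0);
 if (len == (size_t)-1 || len < 2) {
  return; // invalid multibyte input, or nothing to swap
 }
 wchar_t *wcs = (wchar_t *) calloc(len + 1, sizeof(wchar_t));
 if (wcs == NULL) {
  return;
 }
 mbstowcs(wcs, str, len + 1);

 size_t start = 0;
 size_t end = len - 1;

 while (start < end) {
  wchar_t wc = wcs[start];
  wcs[start++] = wcs[end];
  wcs[end--] = wc;
 }

 // The result has the same byte length as the input, so the terminator
 // already present in str stays in the right place.
 wcstombs(str, wcs, wcstombs(NULL, wcs, 0));
 free(wcs);
}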

Conversion functions such as mbstowcs require the locale's character encoding to be set first; otherwise the function cannot tell how the char * you pass in is encoded. In this article, both the source code and standard I/O in the system environment use UTF-8.

Next we call mbstowcs with NULL as the destination and 0 as the length. In that form the function only computes and returns the number of wchar_t elements required (not counting the terminator), so that we can allocate the memory.
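The two-call pattern can be isolated in a small helper with error checks. This is only a sketch: to_wide is a hypothetical name, it assumes the caller has already selected a suitable LC_CTYPE locale, and it relies on the NULL-destination behavior that POSIX systems provide.

```c
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte string to a newly allocated wide string.
   Returns NULL on invalid input or allocation failure.
   The caller is responsible for freeing the result. */
wchar_t *to_wide(const char *str, size_t *out_len) {
    /* First call: NULL destination, only the required length is computed. */
    size_t len = mbstowcs(NULL, str, 0);
    if (len == (size_t)-1) {
        return NULL;  /* invalid multibyte sequence for the current locale */
    }
    wchar_t *wcs = calloc(len + 1, sizeof *wcs);  /* +1 for the terminator */
    if (wcs == NULL) {
        return NULL;
    }
    /* Second call: perform the actual conversion into the buffer. */
    mbstowcs(wcs, str, len + 1);
    *out_len = len;
    return wcs;
}
```

Checking for (size_t)-1 matters because passing that value on to calloc as len + 1 would request a zero-sized buffer.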

Then we do an ordinary in-place reversal on the wchar_t array, and finally convert the result back to a multibyte string and free the memory.
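If depending on the locale is undesirable, the same code-point-level result can be obtained directly on the UTF-8 bytes. The sketch below is an alternative technique, not the approach used in this article: it first reverses the bytes inside each multibyte sequence and then reverses the whole buffer. The names reverse_range and utf8_reverse are illustrative, and valid UTF-8 input is assumed.

```c
#include <string.h>

/* Reverse the bytes in s[lo..hi], both ends inclusive. */
static void reverse_range(char *s, size_t lo, size_t hi) {
    while (lo < hi) {
        char t = s[lo];
        s[lo++] = s[hi];
        s[hi--] = t;
    }
}

/* Reverse a valid UTF-8 string in place, code point by code point. */
void utf8_reverse(char *s) {
    size_t n = strlen(s);
    if (n < 2) {
        return;
    }
    /* Step 1: reverse the bytes inside every encoded character. */
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        /* Extend over continuation bytes (10xxxxxx). */
        while (j < n && ((unsigned char)s[j] & 0xC0) == 0x80) {
            j++;
        }
        reverse_range(s, i, j - 1);
        i = j;
    }
    /* Step 2: reverse everything. Each character's bytes return to their
       original order while the characters end up in reverse order. */
    reverse_range(s, 0, n - 1);
}
```

Like the wchar_t version, this treats each code point separately, so it has the same limitation with sequences made of several code points.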

Bonus: String reversal in Cocoa development

As an iOS developer, you of course also need a solution in Objective-C.

Plan 1:

Enumerate the string's composed character sequences through the API and insert each one at the front of a new NSMutableString.

- (NSString *)my_stringByReversing {
 NSMutableString *reversed = [NSMutableString stringWithCapacity:self.length];
 NSRange range = NSMakeRange(0, self.length);
 [self enumerateSubstringsInRange:range
        options:NSStringEnumerationByComposedCharacterSequences
       usingBlock:^(NSString * _Nullable substring, NSRange substringRange,
          NSRange enclosingRange, BOOL * _Nonnull stop) {
        [reversed insertString:substring atIndex:0];
       }];
 return [reversed copy];
}

This method gives the best results. It also keeps composed Emoji (such as 👨‍👩‍👧‍👧‍👧) intact. An Emoji of this kind is made up of multiple Unicode code points, so even a 4-byte wchar_t cannot hold it in one element. The disadvantage of this method is its high overhead; a comparison follows below.
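To illustrate why one wchar_t is not enough, the sketch below counts the code points in a shorter family Emoji, 👨‍👩‍👧, which is three Emoji joined by two zero-width joiners (U+200D). The function name utf8_code_points and the family array are assumptions made for this example, and valid UTF-8 input is assumed.

```c
#include <stddef.h>

/* Count code points in a valid UTF-8 string by skipping continuation bytes. */
size_t utf8_code_points(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}

/* U+1F468 MAN, U+200D ZWJ, U+1F469 WOMAN, U+200D ZWJ, U+1F467 GIRL */
static const char family[] =
    "\xF0\x9F\x91\xA8" "\xE2\x80\x8D"
    "\xF0\x9F\x91\xA9" "\xE2\x80\x8D"
    "\xF0\x9F\x91\xA7";
```

The single visible glyph occupies 18 bytes and 5 code points. A reversal that works per code point reorders those five pieces, so the sequence no longer renders as the original glyph.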

Plan 2:

Get the C string through the API, process it with the C function shown earlier (strrev2), and then build a new NSString from the processed C string.

- (NSString *)my_stringByReversing2 {
 NSUInteger length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
 char *buf = calloc(length + 1, 1);
 [self getCString:buf maxLength:length + 1 encoding:NSUTF8StringEncoding];
 strrev2(buf);
 NSString *reversed = [NSString stringWithCString:buf encoding:NSUTF8StringEncoding];
 free(buf);
 return reversed;
}

The advantage of this method is efficiency. In my tests it was more than 100 times faster than the enumeration approach, but it cannot handle composed Emoji.

Weigh the trade-offs between the two methods carefully before choosing one.

Plan 3:

Swift. The basic unit of Swift's String is Character, which is a group of Unicode scalars that together represent one renderable character, including composed Emoji. Moreover, String conforms to BidirectionalCollection and therefore has a reversed method, which makes string reversal easy. One more reminder: because Swift's String is based on Character, operations such as fetching the character at a given position should reuse a previously obtained Index whenever possible. I have seen many people write

str.index(str.startIndex, offsetBy: i)

This is very expensive, because offsetting an Index has to walk the string from the starting point. Each such call costs O(i), so traversing a whole string this way is roughly O(n²) rather than O(n).
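A rough way to see the cost is to count steps on a variable-width encoding. The C sketch below is only an analogy: it walks UTF-8 code points rather than Swift Characters, the function names are hypothetical, and n must not exceed the number of code points in s.

```c
#include <stddef.h>

/* Advance past one code point of a valid UTF-8 string. */
static const char *utf8_next(const char *p) {
    p++;
    while (((unsigned char)*p & 0xC0) == 0x80) {
        p++;
    }
    return p;
}

/* Visit code points 0..n-1, restarting from the beginning for each one. */
size_t steps_restarting(const char *s, size_t n) {
    size_t steps = 0;
    for (size_t i = 0; i < n; i++) {
        const char *p = s;
        for (size_t k = 0; k < i; k++) {  /* walk i positions from the start */
            p = utf8_next(p);
            steps++;
        }
    }
    return steps;
}

/* Visit the same code points while reusing the previous position. */
size_t steps_reusing(const char *s, size_t n) {
    size_t steps = 0;
    const char *p = s;
    for (size_t i = 1; i < n; i++) {
        p = utf8_next(p);
        steps++;
    }
    return steps;
}
```

For n positions the restarting version takes 0 + 1 + ... + (n - 1) = n(n - 1)/2 steps, which grows quadratically, while the version that keeps its position takes n - 1 steps.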

If you are interested, you can benchmark the Swift version yourself.

Summary

That is all for this article. I hope it is a useful reference for your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.