1. Get the Unicode value of each character in the string
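Use the implicit conversion from char to int, or the Convert.ToInt32 method, to obtain the Unicode value of a character. Because a C# char is a single UTF-16 code unit, this value is the code unit, which equals the Unicode code point for every character in the Basic Multilingual Plane.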
Sample code:
using System;

class Program
{
    static void Main()
    {
        string input = "Hello 你好";
        foreach (char c in input)
        {
            int unicodeValue = c; // Implicit conversion yields the UTF-16 code unit value
            Console.WriteLine($"Character: {c}, Unicode Value: {unicodeValue}");
        }
    }
}
Output:
Character: H, Unicode Value: 72
Character: e, Unicode Value: 101
Character: l, Unicode Value: 108
Character: l, Unicode Value: 108
Character: o, Unicode Value: 111
Character: , Unicode Value: 32
Character: 你, Unicode Value: 20320
Character: 好, Unicode Value: 22909
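The sample above only exercises the implicit conversion. As a minimal sketch, the following compares it with Convert.ToInt32 for a single character; the class name CodeUnitDemo and the helper ToCodeUnit are illustrative names, not part of the original article.

```csharp
using System;

class CodeUnitDemo
{
    // Hypothetical helper: returns the UTF-16 code unit value of a char
    // through the Convert.ToInt32(char) overload.
    public static int ToCodeUnit(char c) => Convert.ToInt32(c);

    static void Main()
    {
        char c = 'A';

        int viaImplicit = c;            // implicit char-to-int conversion
        int viaConvert = ToCodeUnit(c); // explicit Convert.ToInt32 call

        // Both approaches yield the same number.
        Console.WriteLine(viaImplicit == viaConvert); // True
        Console.WriteLine(viaConvert);                // 65
    }
}
```

The implicit conversion is usually the shorter choice; Convert.ToInt32 mainly makes the intent explicit to the reader.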
2. Format Unicode values as \u escape sequences
If you need to format Unicode values as escape sequences that begin with \u (for example, \u0041 represents the character A), use ToString("X4") to convert the value to a four-digit hexadecimal string.
Sample code:
using System;

class Program
{
    static void Main()
    {
        string input = "Hello 你好";
        foreach (char c in input)
        {
            int unicodeValue = c;
            string unicodeEscape = $"\\u{unicodeValue:X4}"; // Format as \uHHHH
            Console.WriteLine($"Character: {c}, Unicode escape character: {unicodeEscape}");
        }
    }
}
Output:
Character: H, Unicode escape character: \u0048
Character: e, Unicode escape character: \u0065
Character: l, Unicode escape character: \u006C
Character: l, Unicode escape character: \u006C
Character: o, Unicode escape character: \u006F
Character:  , Unicode escape character: \u0020
Character: 你, Unicode escape character: \u4F60
Character: 好, Unicode escape character: \u597D
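The sample applies the X4 format inside an interpolated string. A minimal sketch of the equivalent explicit ToString("X4") call follows; the class name HexFormatDemo and the helper ToEscape are illustrative names only.

```csharp
using System;

class HexFormatDemo
{
    // Hypothetical helper: formats one char as a \uHHHH escape sequence.
    // "X4" means uppercase hexadecimal, zero-padded to at least four digits.
    public static string ToEscape(char c) => "\\u" + ((int)c).ToString("X4");

    static void Main()
    {
        Console.WriteLine(ToEscape('A'));  // \u0041
        Console.WriteLine(ToEscape('好')); // \u597D

        // A lowercase "x4" specifier produces lowercase hex digits instead.
        Console.WriteLine(((int)'好').ToString("x4")); // 597d
    }
}
```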
3. Convert an entire string to Unicode escape sequences
If you need to convert the entire string to Unicode escape format, iterate over the string and concatenate the results.
Sample code:
using System;
using System.Text;

class Program
{
    static void Main()
    {
        string input = "Hello 你好";
        StringBuilder unicodeBuilder = new StringBuilder();
        foreach (char c in input)
        {
            int unicodeValue = c;
            unicodeBuilder.Append($"\\u{unicodeValue:X4}");
        }
        string unicodeString = unicodeBuilder.ToString();
        Console.WriteLine(unicodeString);
        // Output: \u0048\u0065\u006C\u006C\u006F\u0020\u4F60\u597D
    }
}
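If the conversion is needed in more than one place, the loop can be wrapped in a reusable method. The following is a sketch only: the names UnicodeEscaper, ToUnicodeEscapes, and EscaperDemo are hypothetical, and pre-sizing the StringBuilder is an optional design choice rather than something the original article prescribes.

```csharp
using System;
using System.Text;

static class UnicodeEscaper
{
    // Hypothetical helper: escapes every UTF-16 code unit as \uHHHH.
    public static string ToUnicodeEscapes(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Each char becomes exactly six characters: '\', 'u', and four hex digits,
        // so the final length is known up front.
        var builder = new StringBuilder(input.Length * 6);
        foreach (char c in input)
        {
            builder.Append("\\u").Append(((int)c).ToString("X4"));
        }
        return builder.ToString();
    }
}

class EscaperDemo
{
    static void Main()
    {
        Console.WriteLine(UnicodeEscaper.ToUnicodeEscapes("Hi 你")); // \u0048\u0069\u0020\u4F60
    }
}
```

Like the sample above, this version works on individual char values, so a character outside the Basic Multilingual Plane comes out as two \u escapes, one per surrogate.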
4. Handle surrogate pairs
Some Unicode characters (such as emoji and other characters outside the Basic Multilingual Plane) are represented by two char values, called a surrogate pair. Use char.IsSurrogatePair and char.ConvertToUtf32 to handle them.
Sample code:
using System;
using System.Text;

class Program
{
    static void Main()
    {
        string input = "Hello 😊 你好";
        StringBuilder unicodeBuilder = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsSurrogatePair(input, i))
            {
                // Handle a surrogate pair: combine both chars into one code point
                int codePoint = char.ConvertToUtf32(input, i);
                unicodeBuilder.Append($"\\U{codePoint:X8}"); // \U followed by 8 hex digits
                i++; // Skip the low surrogate
            }
            else
            {
                // Handle ordinary characters
                int unicodeValue = input[i];
                unicodeBuilder.Append($"\\u{unicodeValue:X4}");
            }
        }
        string unicodeString = unicodeBuilder.ToString();
        Console.WriteLine(unicodeString);
        // Output: \u0048\u0065\u006C\u006C\u006F\u0020\U0001F60A\u0020\u4F60\u597D
    }
}
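On newer runtimes, the same result can be reached without manual index bookkeeping. The sketch below assumes .NET Core 3.0 or later, where System.Text.Rune and string.EnumerateRunes are available; the names RuneEscaper and RuneEscaperDemo are hypothetical, and this is an alternative to the approach above, not the article's own method.

```csharp
using System;
using System.Text;

static class RuneEscaper
{
    // Hypothetical helper; assumes System.Text.Rune (.NET Core 3.0 or later).
    public static string ToUnicodeEscapes(string input)
    {
        var builder = new StringBuilder();

        // EnumerateRunes yields one Rune per code point, so surrogate pairs
        // arrive already combined. An unpaired surrogate is reported as U+FFFD.
        foreach (Rune rune in input.EnumerateRunes())
        {
            builder.Append(rune.IsBmp
                ? $"\\u{rune.Value:X4}"    // fits in a single UTF-16 code unit
                : $"\\U{rune.Value:X8}");  // supplementary-plane code point
        }
        return builder.ToString();
    }
}

class RuneEscaperDemo
{
    static void Main()
    {
        Console.WriteLine(RuneEscaper.ToUnicodeEscapes("A😊")); // \u0041\U0001F60A
    }
}
```

One behavioral difference to keep in mind: the index-based loop escapes an unpaired surrogate as its raw \uHHHH value, whereas this version emits \uFFFD for it.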
5. Summary
- Use the implicit char-to-int conversion or Convert.ToInt32 to get the Unicode value of a character.
- Use ToString("X4") to format a Unicode value as a \uHHHH escape sequence.
- For surrogate pairs, use char.ConvertToUtf32 and the \UHHHHHHHH format.
- Iterate over the string and concatenate the results to convert the entire string into Unicode escape format.
With these methods, you can convert C# strings to Unicode values or Unicode escape sequences.