Converting between UTF-8 and GBK does not seem to be possible with the standard C++ language and library alone; it requires a third-party library or operating system APIs. I have to complain that something this important cannot be done by relying on C++ itself. The C++ Standards Committee really lets us down here. Enough complaining; here is the implementation for Windows.
    #include <windows.h>
    #include <stdexcept>
    #include <string>

    std::string Utf8ToGbk(const std::string& utf8Str) {
        // Step 1: Convert UTF-8 to Wide Char (UTF-16).
        // Passing -1 as the input length makes the returned size include the '\0'.
        int wideCharLen = MultiByteToWideChar(CP_UTF8, 0, utf8Str.c_str(), -1, nullptr, 0);
        if (wideCharLen == 0) {
            throw std::runtime_error("Failed to convert from UTF-8 to wide char.");
        }
        std::wstring wideStr(wideCharLen, 0);
        MultiByteToWideChar(CP_UTF8, 0, utf8Str.c_str(), -1, &wideStr[0], wideCharLen);

        // Step 2: Convert Wide Char (UTF-16) to GBK.
        // CP_ACP is the system default code page, which is GBK only on a Chinese system.
        int gbkLen = WideCharToMultiByte(CP_ACP, 0, wideStr.c_str(), -1, nullptr, 0, nullptr, nullptr);
        if (gbkLen == 0) {
            throw std::runtime_error("Failed to convert from wide char to GBK.");
        }
        std::string gbkStr(gbkLen, 0);
        WideCharToMultiByte(CP_ACP, 0, wideStr.c_str(), -1, &gbkStr[0], gbkLen, nullptr, nullptr);

        // Remove the null terminator added by the conversion functions
        gbkStr.pop_back();
        return gbkStr;
    }

    std::string GbkToUtf8(const std::string& gbkStr) {
        // Step 1: Convert GBK to Wide Char (UTF-16)
        int wideCharLen = MultiByteToWideChar(CP_ACP, 0, gbkStr.c_str(), -1, nullptr, 0);
        if (wideCharLen == 0) {
            throw std::runtime_error("Failed to convert from GBK to wide char.");
        }
        std::wstring wideStr(wideCharLen, 0);
        MultiByteToWideChar(CP_ACP, 0, gbkStr.c_str(), -1, &wideStr[0], wideCharLen);

        // Step 2: Convert Wide Char (UTF-16) to UTF-8
        int utf8Len = WideCharToMultiByte(CP_UTF8, 0, wideStr.c_str(), -1, nullptr, 0, nullptr, nullptr);
        if (utf8Len == 0) {
            throw std::runtime_error("Failed to convert from wide char to UTF-8.");
        }
        std::string utf8Str(utf8Len, 0);
        WideCharToMultiByte(CP_UTF8, 0, wideStr.c_str(), -1, &utf8Str[0], utf8Len, nullptr, nullptr);

        // Remove the null terminator added by the conversion functions
        utf8Str.pop_back();
        return utf8Str;
    }
The idea behind this code is simple:
- CP_ACP means the local encoding, that is, the default code page defined by the operating system, which depends on the language and region settings of the current system. In a Simplified Chinese environment it is normally code page 936, i.e. GBK, which is a superset of GB2312. On a system with a different locale, these functions convert to that locale's encoding rather than GBK.
- A wide-character string is needed as an intermediate step. On Windows, std::wstring is made of 16-bit code units, encoded using UTF-16. This is similar to the string types of C# and Java, both of which also use UTF-16.
- MultiByteToWideChar and WideCharToMultiByte are both C interfaces of the operating system. Because -1 is passed as the input length, the input is treated as null-terminated and the output, including its reported length, contains the terminating '\0'. Turning the result into a C++ string therefore requires removing the last '\0' character. This needs attention.
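The last point is easy to get wrong, so here is a small portable sketch of it. It does not call the Windows API: `FillLikeCApi` and `StripTrailingNul` are hypothetical helpers, invented for illustration, that mimic a C-style function writing its terminator into a std::string buffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: imitates a C API that reports and writes a length
// INCLUDING the trailing '\0', as the conversion functions do with -1.
std::string FillLikeCApi(const char* src) {
    assert(src != nullptr);
    const std::size_t lenWithNul = std::strlen(src) + 1;  // what the API would report
    std::string buf(lenWithNul, '\0');
    std::memcpy(&buf[0], src, lenWithNul);
    return buf;  // the '\0' is now a real character of the std::string
}

// Hypothetical helper: drops a single trailing '\0', if there is one.
std::string StripTrailingNul(std::string s) {
    if (!s.empty() && s.back() == '\0') {
        s.pop_back();
    }
    return s;
}
```

With these helpers, `FillLikeCApi("abc")` has a size() of 4 and does not compare equal to "abc" until the trailing '\0' is stripped, even though printing it may look identical.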
I tested both functions without any problems. Testing Utf8ToGbk:

    // string utfStr = u8"This is a Chinese string for testing, check it out";
    // string utfStr = u8"test";
    // Note: since C++20, u8"..." is char8_t-based and no longer initializes std::string directly.
    string utfStr = u8"abcdefg";
    string gbkStr = Utf8ToGbk(utfStr);
    // cout << gbkStr << "-------" << endl;
    // cout << () << endl;
    // cout << gbkStr.c_str() << endl;
    // cout << strlen(gbkStr.c_str()) << endl;
Test GbkToUtf8:

    #ifdef _WIN32
    SetConsoleOutputCP(65001);
    #endif
    // string gbkStr = "test";
    string gbkStr = "This is a Chinese string for testing, check it out";
    // string gbkStr = "abcdefg";
    cout << () << endl;
    string utfStr = GbkToUtf8(gbkStr);
    cout << utfStr << endl;
    cout << () << endl;
The above is the Windows implementation. On Linux a different approach is needed, such as the iconv library.
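As a rough idea of what the iconv route could look like, here is a minimal sketch. It assumes a POSIX iconv, such as the one in glibc, that knows the encoding names "UTF-8" and "GBK". `ConvertEncoding` is a hypothetical helper, not part of iconv, and the two wrappers simply reuse the function names from the Windows version.

```cpp
#include <cassert>
#include <cerrno>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
std::string ConvertEncoding(const std::string& input, const char* from, const char* to) {
    assert(from != nullptr && to != nullptr);
    iconv_t cd = iconv_open(to, from);  // argument order is (tocode, fromcode)
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error("iconv_open failed: conversion not supported.");
    }

    std::string output;
    char buffer[256];
    // iconv takes a non-const char** for the input but does not modify the bytes.
    char* inPtr = const_cast<char*>(input.data());
    size_t inLeft = input.size();

    while (inLeft > 0) {
        char* outPtr = buffer;
        size_t outLeft = sizeof(buffer);
        size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
        int err = errno;  // save errno before anything else can change it
        output.append(buffer, sizeof(buffer) - outLeft);
        // E2BIG only means the chunk buffer is full, so keep looping.
        if (rc == static_cast<size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input.");
        }
    }
    // UTF-8 and GBK are stateless, so no final flush call is made here.
    iconv_close(cd);
    return output;
}

std::string Utf8ToGbk(const std::string& s) { return ConvertEncoding(s, "UTF-8", "GBK"); }
std::string GbkToUtf8(const std::string& s) { return ConvertEncoding(s, "GBK", "UTF-8"); }
```

Unlike the Windows version, this works on explicit byte lengths, so no '\0' has to be removed afterwards. The encoding is also named explicitly instead of depending on the system locale. On platforms where iconv is a separate library, linking with -liconv may be required.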