SoFunction
Updated on 2025-04-06

Converting between UTF-8 and GBK strings in C++

Standard C++ does not provide this conversion itself; you need a third-party library or an operating system API. I have to complain that such an important feature cannot be implemented with the C++ language alone. The C++ Standards Committee is really not doing its job here. Enough complaining, here is the implementation for Windows.

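#include <windows.h>

#include <stdexcept>
#include <string>

// Note: CP_ACP is the system's ANSI code page, so the result is GBK only
// on a system whose ANSI code page is a GBK-family one (e.g. code page 936).
std::string Utf8ToGbk(const std::string& utf8Str) {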
  // Step 1: Convert UTF-8 to Wide Char (UTF-16)
  int wideCharLen =
      MultiByteToWideChar(CP_UTF8, 0, utf8Str.c_str(), -1, nullptr, 0);
  if (wideCharLen == 0) {
    throw std::runtime_error("Failed to convert from UTF-8 to wide char.");
  }
  std::wstring wideStr(wideCharLen, 0);
  MultiByteToWideChar(CP_UTF8, 0, utf8Str.c_str(), -1, &wideStr[0],
                      wideCharLen);
  // Step 2: Convert Wide Char (UTF-16) to GBK
  int gbkLen = WideCharToMultiByte(CP_ACP, 0, wideStr.c_str(), -1, nullptr, 0,
                                   nullptr, nullptr);
  if (gbkLen == 0) {
    throw std::runtime_error("Failed to convert from wide char to GBK.");
  }
  std::string gbkStr(gbkLen, 0);
  WideCharToMultiByte(CP_ACP, 0, wideStr.c_str(), -1, &gbkStr[0], gbkLen,
                      nullptr, nullptr);
  // Remove the null terminator added by the conversion functions
  gbkStr.pop_back();
  return gbkStr;
}

std::string GbkToUtf8(const std::string& gbkStr) {
  // Step 1: Convert GBK to Wide Char (UTF-16)
  int wideCharLen =
      MultiByteToWideChar(CP_ACP, 0, gbkStr.c_str(), -1, nullptr, 0);
  if (wideCharLen == 0) {
    throw std::runtime_error("Failed to convert from GBK to wide char.");
  }
  std::wstring wideStr(wideCharLen, 0);
  MultiByteToWideChar(CP_ACP, 0, gbkStr.c_str(), -1, &wideStr[0], wideCharLen);
  // Step 2: Convert Wide Char (UTF-16) to UTF-8
  int utf8Len = WideCharToMultiByte(CP_UTF8, 0, wideStr.c_str(), -1, nullptr, 0,
                                    nullptr, nullptr);
  if (utf8Len == 0) {
    throw std::runtime_error("Failed to convert from wide char to UTF-8.");
  }
  std::string utf8Str(utf8Len, 0);
  WideCharToMultiByte(CP_UTF8, 0, wideStr.c_str(), -1, &utf8Str[0], utf8Len,
                      nullptr, nullptr);
  // Remove the null terminator added by the conversion functions
  utf8Str.pop_back();
  return utf8Str;
}

The idea behind this code is simple:

  • CP_ACP means the local encoding: the system's default ANSI code page, which depends on the language and region settings of the current operating system. In a Simplified Chinese environment it is a GBK-family Chinese encoding (typically code page 936, i.e. GBK, which is a superset of GB2312).
  • A wide-character string is needed as the intermediate step. On Windows, std::wstring holds 16-bit (2-byte) code units encoded as UTF-16. This is similar to string in C# and String in Java, both of which also use UTF-16.
  • MultiByteToWideChar and WideCharToMultiByte are both C interfaces of the operating system. When the input length is passed as -1, the input is treated as a '\0'-terminated string, and the output (and the length reported for it) also includes the terminating '\0'. Converting to a C++ string therefore requires removing that last '\0' character. This is easy to overlook.
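To make the last point concrete, here is a small portable sketch of why the trailing '\0' has to be removed. It does not call the Windows API: CopyWithTerminator and ToStdString are hypothetical stand-ins that only mimic the "query the size, then fill the buffer" pattern, with the reported size including the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that writes a '\0'-terminated result
// and reports the required size INCLUDING the terminator.
std::size_t CopyWithTerminator(const char* src, char* dst,
                               std::size_t dstSize) {
  std::size_t needed = std::strlen(src) + 1;
  if (dst == nullptr || dstSize == 0) return needed;  // size query only
  if (dstSize < needed) return 0;                     // buffer too small
  std::memcpy(dst, src, needed);
  return needed;
}

// Fills a std::string with the two-call pattern; optionally strips the
// terminator that the C-style API counted as part of the result.
std::string ToStdString(const char* src, bool stripTerminator) {
  std::size_t len = CopyWithTerminator(src, nullptr, 0);
  std::string out(len, '\0');
  std::size_t written = CopyWithTerminator(src, &out[0], len);
  assert(written == len);
  (void)written;  // avoids an unused-variable warning when NDEBUG is set
  if (stripTerminator) out.pop_back();
  return out;
}
```

Without the pop_back() call, the resulting std::string carries an extra '\0' as its last character, so its size is one too large and it no longer compares equal to the expected text.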

I tested both functions and found no problems. Testing Utf8ToGbk:

  // string utfStr = u8"This is a Chinese string for testing, check it out";
  // string utfStr = u8"test";
  string utfStr = u8"abcdefg";
  string gbkStr = Utf8ToGbk(utfStr);
  // cout << gbkStr << "-------" << endl;
  // cout << gbkStr.length() << endl;
  // cout << gbkStr.c_str() << endl;
  // cout << strlen(gbkStr.c_str()) << endl;

Testing GbkToUtf8:

#ifdef _WIN32
  SetConsoleOutputCP(65001);
#endif
  // string gbkStr = "test";
  string gbkStr = "This is a Chinese string for testing, check it out";
  // string gbkStr = "abcdefg";
  cout << gbkStr.length() << endl;
  string utfStr = GbkToUtf8(gbkStr);
  cout << utfStr << endl;
  cout << utfStr.length() << endl;

The above is the Windows implementation. On Linux you need a different approach, such as the iconv library.
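As a rough illustration of the iconv route, the following is a minimal sketch, not a tested production implementation. The helper names ConvertEncoding, Utf8ToGbkIconv and GbkToUtf8Iconv are made up for this example, and it assumes the system's iconv supports the encoding names "UTF-8" and "GBK" (glibc does; other platforms may need to link with -liconv).

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
std::string ConvertEncoding(const std::string& input, const char* from,
                            const char* to) {
  assert(from != nullptr && to != nullptr);
  iconv_t cd = iconv_open(to, from);  // argument order: target first, source second
  if (cd == reinterpret_cast<iconv_t>(-1)) {
    throw std::runtime_error("iconv_open failed: unsupported conversion.");
  }

  std::string output;
  char buffer[256];
  // iconv takes a non-const pointer but does not modify the input bytes.
  char* inPtr = const_cast<char*>(input.data());
  std::size_t inLeft = input.size();

  while (inLeft > 0) {
    char* outPtr = buffer;
    std::size_t outLeft = sizeof(buffer);
    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    int err = errno;  // save errno before any other call can change it
    output.append(buffer, sizeof(buffer) - outLeft);
    // E2BIG only means the temporary buffer is full, so loop again.
    if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
      iconv_close(cd);
      throw std::runtime_error("iconv failed: invalid or incomplete input.");
    }
  }
  iconv_close(cd);
  return output;
}

std::string Utf8ToGbkIconv(const std::string& utf8Str) {
  return ConvertEncoding(utf8Str, "UTF-8", "GBK");
}

std::string GbkToUtf8Iconv(const std::string& gbkStr) {
  return ConvertEncoding(gbkStr, "GBK", "UTF-8");
}
```

Unlike the Windows version, this works on explicit byte counts, so no '\0' terminator ends up in the result and no pop_back() is needed. UTF-8 and GBK are both stateless encodings, so the sketch skips the final flush call that stateful encodings would require.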
