SoFunction
Updated on 2025-04-09

Explaining character encoding conversion problems with the PHP iconv function

In PHP, the iconv function library converts between character sets and is an indispensable basic library in PHP programming. Sometimes, however, iconv fails to transcode part of the data for no apparent reason. For example, an error occurs when converting the character "—" to gb2312.

Let’s take a look at the usage of this function together.

The simplest use is converting gb2312 to utf-8:

$text=iconv("GB2312","UTF-8",$text);

When using $text=iconv("UTF-8","GB2312",$text), if the text contains certain special characters, such as "—" or the "." in English names, the conversion breaks off. The text after these characters is not converted.

To work around this problem, you can use the following code:

$text=iconv("UTF-8","GBK",$text);

You read that right, it's that simple: specify GBK instead of gb2312. GBK is a superset of GB2312, so it can represent characters that GB2312 cannot.

Another method is to append //IGNORE to the second parameter so that conversion errors are ignored, as follows:

iconv("UTF-8","GB2312//IGNORE",$data);

I have not compared these two methods in detail, but I feel that the first one (GBK instead of gb2312) is better.
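To see how the two methods behave on your own system, a minimal sketch like the following may help. The helper convert_or_null() is hypothetical (it is not part of PHP or of the original article), and the exact result for the "—" character depends on the iconv implementation PHP was built against, so no output is shown here.

```php
<?php
// Hypothetical helper (an assumption for illustration): convert UTF-8 text
// to $target and return null on failure instead of false.
function convert_or_null(string $text, string $target): ?string
{
    // @ suppresses the notice iconv raises when it meets an unconvertible character.
    $out = @iconv('UTF-8', $target, $text);
    return $out === false ? null : $out;
}

// "中文" followed by an em dash (U+2014) and some ASCII text.
$text = "\u{4E2D}\u{6587}\u{2014}test";

// Method 1: GBK as the target; method 2: GB2312 with //IGNORE.
foreach (['GBK', 'GB2312//IGNORE'] as $target) {
    $result = convert_or_null($text, $target);
    echo $target, ': ', $result === null ? 'conversion failed' : bin2hex($result), PHP_EOL;
}
```

Printing the bytes as hex makes it easy to check whether the characters after the em dash survived the conversion.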

Description of iconv() in the PHP manual:

iconv

(PHP 4 >= 4.0.5, PHP 5)
iconv – Convert string to requested character encoding
Description
string iconv ( string in_charset, string out_charset, string str )
Performs a character set conversion on the string str from in_charset to out_charset. Returns the converted string or FALSE on failure.
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.

When using this function to convert string encodings, note that converting utf-8 to gb2312 may truncate the string. This can be solved as follows:

$str=iconv('utf-8',"gb2312//TRANSLIT",file_get_contents($filepath));

That is, append //TRANSLIT to the second parameter. It indicates that if a character from the source encoding has no match in the target encoding, a similar character is substituted. You can also use //IGNORE here to indicate that characters that cannot be converted are skipped.

//IGNORE means that errors during conversion are ignored. Without it, everything after the first unconvertible character is lost.
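The suffixes described above can be combined into a fallback chain. The following is only a sketch: convert_with_fallback() is a hypothetical helper, the order of attempts is an assumption, and how //TRANSLIT and //IGNORE behave varies between iconv implementations.

```php
<?php
// Hypothetical helper (an assumption for illustration): try a strict
// conversion first, then //TRANSLIT, then //IGNORE, and return the first
// result that iconv accepts. Returns null if every attempt fails.
function convert_with_fallback(string $utf8, string $charset = 'GB2312'): ?string
{
    foreach (['', '//TRANSLIT', '//IGNORE'] as $suffix) {
        // @ hides the notice or warning raised by a failed attempt.
        $out = @iconv('UTF-8', $charset . $suffix, $utf8);
        if ($out !== false) {
            return $out;
        }
    }
    return null;
}

// "中文" plus an em dash (U+2014), which plain GB2312 may reject.
$converted = convert_with_fallback("\u{4E2D}\u{6587}\u{2014}");
echo $converted === null ? 'all attempts failed' : bin2hex($converted), PHP_EOL;
```

Checking the return value against false matters because iconv signals failure that way, and a false result written to a file or database loses the whole string.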

iconv is not a core PHP function but an extension module, and it may need to be installed or enabled before you can use it.

On Windows 2000 with PHP, edit the configuration file (php.ini) and remove the ";" in front of extension=php_iconv.dll. You also need to copy the file from the original PHP installation into winnt/system32 (if your DLL path points to that directory). On Linux, with a static build, add the extra option --with-iconv when running configure; the iconv item then shows up in phpinfo. (Linux7.3+Apache4.06+php4.3.2).
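A quick way to confirm that the extension is active, without reading through the phpinfo output, is a check like the sketch below. The function name iconv_available() is hypothetical; extension_loaded(), function_exists() and the ICONV_IMPL and ICONV_VERSION constants are provided by PHP and the iconv extension.

```php
<?php
// Hypothetical helper (an assumption for illustration): report whether
// the iconv extension is usable in this PHP build.
function iconv_available(): bool
{
    return extension_loaded('iconv') && function_exists('iconv');
}

if (iconv_available()) {
    // These constants are defined by the iconv extension itself.
    echo 'iconv is enabled: ', ICONV_IMPL, ' ', ICONV_VERSION, PHP_EOL;
} else {
    echo 'iconv is not enabled; check php.ini or the configure options', PHP_EOL;
}
```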

Introduction to mb_convert_encoding and iconv functions

The mb_convert_encoding function is used to convert between encodings. I used to not understand the concept of character encoding in programs, but now it is starting to make sense. English text generally has no encoding problems; it is mainly Chinese data that runs into this. For example, when you write programs in Zend Studio or EditPlus, you use GBK encoding. If the data has to go into a database whose encoding is utf8, you must convert it first, otherwise it will be garbled once stored.

A GBK to UTF-8 example:

<?php 
header("content-Type: text/html; charset=Utf-8"); 
echo mb_convert_encoding("You are my friend", "UTF-8", "GBK"); 
?>

And another one, GB2312 to Big5:

<?php 
header("content-Type: text/html; charset=big5"); 
echo mb_convert_encoding("You are my friend", "big5", "GB2312"); 
?>

However, to use the function above, you must first enable the mbstring extension library.

string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )
You need to enable the mbstring extension library first by removing the ";" in front of extension=php_mbstring.dll in the PHP configuration file. mb_convert_encoding can take several candidate input encodings and identifies the right one automatically from the content, but it runs much slower than iconv.
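As a rough illustration of passing several candidate input encodings, the sketch below builds GBK bytes from a UTF-8 string and converts them back while letting mbstring choose between two candidates. The candidate list and variable names are assumptions made for this example; detection is a guess based on content and can pick the wrong encoding for short or ambiguous input.

```php
<?php
// Build GBK-encoded bytes from a UTF-8 string so the example is self-contained.
$utf8 = "\u{4F60}\u{597D}"; // "你好"
$gbkBytes = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

// Pass a list of candidate source encodings; mbstring picks one based on
// the content. These bytes are not valid UTF-8, so GBK should be chosen.
$back = mb_convert_encoding($gbkBytes, 'UTF-8', ['UTF-8', 'GBK']);

echo $back === $utf8 ? 'round trip ok' : 'detection picked another encoding', PHP_EOL;
```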

string iconv ( string in_charset, string out_charset, string str )
Note: besides naming the target encoding, the second parameter also accepts two suffixes, //TRANSLIT and //IGNORE. //TRANSLIT automatically turns characters that cannot be converted into one or more approximate characters, and //IGNORE skips characters that cannot be converted. The default behavior is to truncate at the first illegal character.

In general, use iconv. Use the mb_convert_encoding function only when you cannot determine what the original encoding is, or when the text does not display correctly after conversion with iconv.

$content = iconv("GBK", "UTF-8", $content);
$content = mb_convert_encoding($content, "UTF-8", "
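
The rule of thumb above (iconv when the source encoding is known, mb_convert_encoding when it is not) could be sketched as follows. The helper to_utf8() and its default candidate list are assumptions made for illustration, not the original author's code. Note that the sketch cannot detect the "does not display correctly" case, which still needs a human to look at the output.

```php
<?php
// Hypothetical helper (an assumption for illustration): convert $content
// to UTF-8. If the source encoding is known, try iconv first; otherwise,
// or if iconv fails, let mbstring choose among the candidate encodings.
function to_utf8(string $content, ?string $known = null, array $candidates = ['UTF-8', 'GBK', 'BIG5']): ?string
{
    if ($known !== null) {
        // @ hides the notice raised when iconv cannot convert the input.
        $out = @iconv($known, 'UTF-8', $content);
        if ($out !== false) {
            return $out;
        }
    }
    // Fall back to content-based detection over the candidate list.
    $out = @mb_convert_encoding($content, 'UTF-8', $candidates);
    return is_string($out) ? $out : null;
}

// GBK bytes for "中文", converted with the source encoding given explicitly.
echo to_utf8("\xD6\xD0\xCE\xC4", 'GBK'), PHP_EOL;
```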

Summary

That is all for this article. I hope it is a useful reference for your study or work. Thank you for your support.