SoFunction
Updated on 2025-03-10

PHP: using strlen and mb_strlen to calculate the length of mixed Chinese and English strings

Comparing strlen and mb_strlen
When a string contains only English (ASCII) characters, the two functions return the same value. Here we mainly compare their results on a string that mixes Chinese and English. (The tests use UTF-8 encoding.)
The code is as follows:

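<?php
// Save this file as UTF-8. The test string is four Chinese characters plus "a" and "1".
$str = '中文a字1符';
echo strlen($str);               // counts bytes
echo '<br />';
echo mb_strlen($str, 'UTF-8');   // counts characters
// Output:
// 14
// 6
?>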

Result analysis: strlen counts bytes, and each Chinese character in the test string 中文a字1符 (four Chinese characters plus the ASCII characters "a" and "1") occupies 3 bytes in UTF-8, so strlen returns 3*4+2=14.
mb_strlen, when told that the encoding is UTF-8, counts each Chinese character as length 1, so the length of the same string is 6.
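To make the 3*4+2 arithmetic concrete, the following sketch prints the byte length of each character in the test string. It assumes the file is saved as UTF-8 and that PHP 7.4 or later is available, since mb_str_split was only added in that version; the variable names are illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and PHP 7.4+ (for mb_str_split).
$str = '中文a字1符';
$total = 0;

// Split into single characters, then measure each one in bytes with strlen.
foreach (mb_str_split($str, 1, 'UTF-8') as $char) {
    $bytes = strlen($char);
    $total += $bytes;
    printf("%s: %d byte(s)\n", $char, $bytes);
}

// The byte total matches strlen($str); the character count matches mb_strlen.
printf("total: %d bytes, %d characters\n", $total, mb_strlen($str, 'UTF-8'));
```

Each Chinese character reports 3 bytes and each ASCII character reports 1, which adds up to the 14 bytes that strlen returns.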
Calculating the display width of mixed Chinese and English strings:
Using these two functions together, we can calculate the display width of a mixed Chinese and English string (a Chinese character occupies 2 positions and an English character occupies 1). If a mixed string has a Chinese characters and b English characters, strlen returns 3a+b and mb_strlen returns a+b, so the width is (3a+b + a+b)/2 = 2a+b:
The code is as follows:

<?php
$str = '中文a字1符';
// Width = (byte count + character count) / 2
echo (strlen($str) + mb_strlen($str, 'UTF-8')) / 2;
// Output:
// 10
?>

For example, strlen($str) for "中文a字1符" is 14 and mb_strlen($str, 'UTF-8') is 6, so the display width of "中文a字1符" is (14+6)/2 = 10.
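The width calculation can be wrapped in a small function. The sketch below uses a hypothetical helper name, display_width, which is not from the article, and it assumes UTF-8 input in which every non-ASCII character is a 3-byte Chinese character. For comparison it also calls mbstring's built-in mb_strwidth, which counts full-width East Asian characters as 2.

```php
<?php
// Hypothetical helper (not from the article): display width of a UTF-8 string,
// assuming every non-ASCII character is a 3-byte Chinese character.
function display_width(string $str): int
{
    // strlen = 3a + b and mb_strlen = a + b, so the sum is 4a + 2b = 2 * (2a + b).
    return intdiv(strlen($str) + mb_strlen($str, 'UTF-8'), 2);
}

$str = '中文a字1符';              // four Chinese characters plus "a" and "1"
echo display_width($str), "\n";         // 10
echo mb_strwidth($str, 'UTF-8'), "\n";  // 10
```

The two agree here, but the (strlen + mb_strlen) / 2 formula only holds under the 3-byte assumption; for text containing 2-byte or 4-byte UTF-8 characters, mb_strwidth is likely the safer choice.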
A related article from the web:
This is again a question about handling Chinese. PHP's built-in string length function strlen does not handle Chinese strings correctly, because it only returns the number of bytes the string occupies. For Chinese text encoded as GB2312, strlen returns 2 times the number of Chinese characters, while for Chinese text encoded as UTF-8 it returns 3 times the number (under UTF-8, a Chinese character occupies 3 bytes).
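The 2-byte versus 3-byte difference can be checked with a short sketch. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the two-character sample string and the variable names are illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and that mbstring is available.
$utf8 = '中文';                                           // two Chinese characters
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');   // same text, GB2312 bytes

echo strlen($utf8), "\n";             // 6: 3 bytes per character in UTF-8
echo strlen($gb), "\n";               // 4: 2 bytes per character in GB2312
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 2
echo mb_strlen($gb, 'GB2312'), "\n";  // 2
```

strlen changes with the encoding, while mb_strlen returns 2 in both cases as long as it is told the correct encoding.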

The mb_strlen function solves this problem. Its usage is similar to strlen, except that it takes an optional second parameter specifying the character encoding. For example, to get the length of a UTF-8 string $str, use mb_strlen($str,'UTF-8'). If the second parameter is omitted, PHP's internal encoding is used, which can be read with the mb_internal_encoding() function. Note that mb_strlen is not a PHP core function: it belongs to the mbstring extension. Before using it on Windows, make sure that php_mbstring.dll is loaded, that is, make sure the line "extension=php_mbstring.dll" exists in php.ini and is not commented out; otherwise calling it fails with an undefined function error.
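One way to guard against a missing extension is to check for the function before calling it. The sketch below uses a hypothetical helper name, char_length, which is not from the article. Its fallback is an assumption that only works for UTF-8 input: it counts characters with a Unicode-aware regular expression and ignores the encoding argument.

```php
<?php
// Hypothetical helper (not from the article): character count that still works
// when the mbstring extension is not loaded.
function char_length(string $str, string $encoding = 'UTF-8'): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($str, $encoding);
    }
    // Fallback, UTF-8 only: the /u modifier makes "." match one whole character,
    // and /s lets it match newlines too. $encoding is ignored on this path.
    return (int) preg_match_all('/./us', $str);
}

if (extension_loaded('mbstring')) {
    // The encoding mb_strlen uses when its second argument is omitted.
    echo 'Internal encoding: ', mb_internal_encoding(), "\n";
}

echo char_length('中文a字1符'), "\n"; // 6
```

Passing the encoding explicitly, as the article's examples do, avoids depending on whatever mb_internal_encoding() happens to return on a given server.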