SoFunction
Updated on 2025-04-08

Analysis of why attachments with Chinese file names cannot be downloaded

Details: After clicking the attachment, the correct path appears in the browser address bar (*/Test.doc; the same link also downloads fine through Thunder), but the expected open/download dialog does not appear and the browser reports "Web page cannot be displayed". One .doc file, however, does download. On comparison, the only difference is that the name of the file that works has 11 Chinese characters, while the others have 8 and 10 respectively; that is, an even number of characters fails. Quite an eye-opener.
I searched on Baidu and found the following article.
I often see people asking online: "My file name is in Chinese. I put it on the web server for others to download, but they are always told the file cannot be found, even though the file is clearly there?" This is an encoding problem, and the mention of UTF8, GBK, and BIG5 is enough to give anyone a headache.
People have proposed several workarounds, such as encoding the requested file name yourself, or unchecking the IE option that always sends URLs as UTF8 so that URLs are no longer sent in UTF8. However, because IE sends URLs in UTF8 by default, every visitor would have to change their IE settings.
The general cause is this: IE encodes the Chinese in the URL as UTF8, and after the web server receives the URL it has to decode it. Different web servers have different decoding rules, but they generally decode with the system's default character set. On a Chinese system that is usually GBK, so the decoded result is naturally wrong.
My environment is Chinese Windows 2000 with IIS5. Let's run an experiment.
1. Create a new file in the web root named "Ciqinqiang.txt" (a three-character Chinese name), whose content is Ciqinqiang. Visiting http://127.0.0.1/Ciqinqiang.txt in the browser works fine and displays the file content normally.
2. Create another file named "Qinqiang.txt" (勤强.txt, two Chinese characters), whose content is Qinqiang. Visiting http://127.0.0.1/Qinqiang.txt in the browser reports that the page cannot be found.
3. Create another file named "鍕ゅ己.txt", whose content is "garbled code". Visiting http://127.0.0.1/鍕ゅ己.txt in the browser works without problems and displays normally.
4. Now visit http://127.0.0.1/Qinqiang.txt again. This time it can be accessed, but the content shown is not the "Qinqiang" we hoped for. It is "garbled code", that is, the content of the "鍕ゅ己.txt" file.
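The four steps can be mimicked with a small simulation. This is only a sketch under stated assumptions, not how IIS is implemented: `web_root` and `resolve` are hypothetical names, "我" is a stand-in for the first character of the three-character name (which is not recoverable from the text), and the rule "try GBK first, fall back to UTF8 when GBK decoding fails" is inferred from the observed behavior.

```python
from typing import Optional

# Hypothetical web root: file name -> file content.
# "我" is a stand-in character, not the one used in the original experiment.
web_root = {
    "我勤强.txt": "three-character file",
    "勤强.txt": "Qinqiang",
}

def resolve(request_name: str) -> Optional[str]:
    """Look up a file the way the experiment suggests the server behaves."""
    raw = request_name.encode("utf-8")      # bytes the browser sends
    try:
        decoded = raw.decode("gbk")         # assumed server default charset
    except UnicodeDecodeError:
        decoded = raw.decode("utf-8")       # assumed fallback when GBK fails
    return web_root.get(decoded)

step1 = resolve("我勤强.txt")   # odd count: GBK decoding fails, file is found
step2 = resolve("勤强.txt")     # even count: misread as another name, not found

# Step 3: create a file under the misread name.
misread_name = "勤强.txt".encode("utf-8").decode("gbk")
web_root[misread_name] = "garbled code"

step4 = resolve("勤强.txt")     # now "found", but it is the wrong file
print(step1, step2, step4)
```

Under these assumptions the second request returns nothing, and the fourth returns the content of the file stored under the misread name, matching the experiment.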
At this point it should be clear: after "Qinqiang" (勤强) is encoded as UTF8, IIS decodes it as GBK and gets "鍕ゅ己". The encoding details are rather tedious, so I won't analyze them in depth here. It is enough to know that UTF8 encodes a Chinese character into 3 bytes, while Unicode and GBK both use two bytes. For example, the two characters of Qinqiang become %E5%8B%A4%E5%BC%BA in UTF8, 6 bytes in total: %E5%8B%A4 is Qin (勤) and %E5%BC%BA is Qiang (强).
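The byte counts above can be checked with a few lines of Python. This is a minimal sketch; it assumes "Unicode" in the text means the two-byte UTF-16 form, and the variable names are only illustrative.

```python
from urllib.parse import quote

name = "勤强"  # "Qinqiang"

utf8_bytes = name.encode("utf-8")        # 3 bytes per Chinese character
gbk_bytes = name.encode("gbk")           # 2 bytes per Chinese character
utf16_bytes = name.encode("utf-16-le")   # 2 bytes per character, no BOM

print(len(utf8_bytes), len(gbk_bytes), len(utf16_bytes))  # 6 4 4

# Percent-encoding of the UTF8 bytes, as sent in the URL.
percent_encoded = quote(name, encoding="utf-8")
print(percent_encoded)  # %E5%8B%A4%E5%BC%BA
```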
IIS decodes two bytes per Chinese character, which means %E5%8B is decoded as one GBK character, and %A4%E5 and %BC%BA as one character each. Looking these up in the GBK code table, E58B is 鍕, A4E5 is ゅ, and BCBA is 己.
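The pairing can be reproduced by cutting the six UTF8 bytes into two-byte groups and decoding each group as GBK. This sketch uses Python's built-in `gbk` codec as a stand-in for the GBK code table; it illustrates the byte arithmetic and makes no claim about how IIS itself decodes.

```python
raw = "勤强".encode("utf-8")  # E5 8B A4 E5 BC BA

# Regroup the 3+3 UTF8 bytes as 2+2+2, the way a GBK decoder reads them.
pairs = [raw[i:i + 2] for i in range(0, len(raw), 2)]
for pair in pairs:
    print(pair.hex().upper(), pair.decode("gbk"))

misread = raw.decode("gbk")   # the three-character name the server ends up with
print(misread)

# The misreading is reversible: re-encoding as GBK restores the UTF8 bytes.
restored = misread.encode("gbk").decode("utf-8")
print(restored)
```

Because the mapping is reversible, a file saved under the misread name is exactly what a UTF8 request for the two-character name resolves to.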
Therefore, on IIS, if you must use a Chinese file name, the number of Chinese characters should be odd, and then there should be no problem (this conclusion comes from my machine and may not hold elsewhere). For example, the file translated here as "the word.txt" is displayed normally. With an even number of characters there will be problems, as with "We are all children.txt". Other web servers, such as apache, may behave differently; I am not sure of the details.
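The odd/even rule can be probed with a short check. This is a sketch only: `read_as_gbk` is a hypothetical helper, the test names are built from the characters 勤 and 强 for illustration, and Python's `gbk` codec stands in for the server's decoder. With an odd count, the last high byte is left to pair with the ASCII "." of the extension, which GBK rejects. An even count is not guaranteed to decode either, since some byte pairs are unassigned in GBK, so the rule is a tendency rather than a guarantee.

```python
from typing import Optional

def read_as_gbk(name: str) -> Optional[str]:
    """Return the GBK misreading of a name's UTF8 bytes, or None if GBK rejects them."""
    try:
        return name.encode("utf-8").decode("gbk")
    except UnicodeDecodeError:
        return None

results = {}
for stem in ["勤", "勤强", "勤强勤", "勤强勤强"]:
    file_name = stem + ".txt"
    results[len(stem)] = read_as_gbk(file_name)
    status = "misread as GBK" if results[len(stem)] else "rejected by GBK"
    print(len(stem), "characters:", status)
```

In this sketch the one- and three-character names are rejected by the GBK decoder, while the two- and four-character names decode "successfully" into different names, which is the failing case described above.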