SoFunction
Updated on 2025-03-03

How to remove the BOM header from a text file

BOM

The byte order mark (BOM) is the name of the Unicode character at code point U+FEFF. When a string of UCS/Unicode characters is encoded as UTF-16 or UTF-32, this character indicates the byte order. It is also often used as a marker that a file is encoded in UTF-8, UTF-16, or UTF-32.

The byte order mark in different encodings:

Encoding                 Hexadecimal    Decimal
UTF-8                    EF BB BF       239 187 191
UTF-16 (big-endian)      FE FF          254 255
UTF-16 (little-endian)   FF FE          255 254
UTF-32 (big-endian)      00 00 FE FF    0 0 254 255
UTF-32 (little-endian)   FF FE 00 00    255 254 0 0
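
As a rough check of the table above, Node.js can encode U+FEFF itself and print the resulting bytes. This is only a sketch: the helper name `bomBytes` is made up for illustration, and it covers just the two encodings used here that `Buffer` supports directly (UTF-8 and UTF-16 little-endian).

```javascript
// Hypothetical helper: encode the single character U+FEFF and
// return its bytes as an uppercase hex string.
function bomBytes(encoding) {
  return [...Buffer.from('\ufeff', encoding)]
    .map((b) => b.toString(16).toUpperCase().padStart(2, '0'))
    .join(' ');
}

console.log(bomBytes('utf8'));    // EF BB BF
console.log(bomBytes('utf16le')); // FF FE
```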

Adding a BOM

UTF-8 does not require a BOM, but we can manually add one to a UTF-8 encoded file.

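const fs = require('fs');

// The leading '\ufeff' is written as the UTF-8 bytes EF BB BF. Replace './' with the target file path.
fs.writeFile('./', '\ufeffThis is an example with accents : é è à ', 'utf8', function (err) {
  if (err) throw err;
});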
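
To confirm that the BOM really ends up in the file, one option is to write to a scratch file and read the raw bytes back. The following is a sketch under assumptions: the file name `bom-example.txt` and its location in the OS temp directory are hypothetical, and the synchronous `fs` calls are used only to keep the example short.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file used only for this demonstration.
const file = path.join(os.tmpdir(), 'bom-example.txt');

// Write the text with a leading U+FEFF.
fs.writeFileSync(file, '\ufeffThis is an example with accents : é è à ', 'utf8');

// Read back without an encoding to get a Buffer of raw bytes.
const firstBytes = fs.readFileSync(file).subarray(0, 3);
console.log(firstBytes); // <Buffer ef bb bf>

fs.unlinkSync(file); // clean up the scratch file
```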

Removing a BOM

For UTF-8, a BOM is optional: UTF-8 is a byte-oriented encoding with no byte-order ambiguity, so no marker is needed. A UTF-8 file may therefore start with a BOM or not.

Because each encoding has its own BOM, we can inspect the first few bytes of a file to tell whether it contains a BOM and, if so, which Unicode encoding it uses.
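
A minimal sketch of that check, based on the signatures in the table earlier in this article. The names `BOMS` and `detectBOM` are hypothetical, and the result is a heuristic: a UTF-16 little-endian file whose first character after the BOM is U+0000 would be reported as UTF-32 little-endian.

```javascript
// Known BOM signatures. Order matters: the UTF-32 LE mark (FF FE 00 00)
// starts with the UTF-16 LE mark (FF FE), so longer signatures come first.
const BOMS = [
  { encoding: 'UTF-32BE', bytes: [0x00, 0x00, 0xFE, 0xFF] },
  { encoding: 'UTF-32LE', bytes: [0xFF, 0xFE, 0x00, 0x00] },
  { encoding: 'UTF-8', bytes: [0xEF, 0xBB, 0xBF] },
  { encoding: 'UTF-16BE', bytes: [0xFE, 0xFF] },
  { encoding: 'UTF-16LE', bytes: [0xFF, 0xFE] },
];

// Hypothetical helper: returns { encoding, length } for the first
// matching signature, or null when the buffer has no known BOM.
function detectBOM(buf) {
  for (const { encoding, bytes } of BOMS) {
    if (buf.length >= bytes.length && bytes.every((b, i) => buf[i] === b)) {
      return { encoding, length: bytes.length };
    }
  }
  return null;
}

console.log(detectBOM(Buffer.from([0xEF, 0xBB, 0xBF, 0x61])));
console.log(detectBOM(Buffer.from('abc')));
```

The returned `length` tells the caller how many bytes to skip before decoding the rest of the buffer.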

Although the BOM marks a file's encoding, it is not part of the file's content. If it is not removed when a text file is read, it causes problems in some scenarios. For example, when several JS files are merged into one, a BOM left in the middle of the merged file can cause a JavaScript syntax error in the browser. For this reason, the BOM should generally be removed when reading text files.
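
The merge scenario can be illustrated with two in-memory strings standing in for files. This is a sketch, not a build tool: the sample contents and the helper `dropBOM` are hypothetical, and it only shows where the stray U+FEFF ends up, not how a particular browser reacts to it.

```javascript
// Two in-memory "files", each starting with a BOM (U+FEFF).
const a = '\ufeffvar x = 1;\n';
const b = '\ufeffvar y = 2;\n';

// Hypothetical helper for this sketch: drop one leading U+FEFF.
const dropBOM = (s) => (s.charCodeAt(0) === 0xFEFF ? s.slice(1) : s);

const naive = a + b;                   // the BOM of b ends up mid-file
const clean = dropBOM(a) + dropBOM(b); // no U+FEFF left anywhere

console.log(naive.indexOf('\ufeff', 1)); // 12
console.log(clean.includes('\ufeff'));   // false
```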

// For string content
function stripBOM(content) {
  // Check whether the first character is a BOM
  if (content.charCodeAt(0) === 0xFEFF) {
    content = content.slice(1);
  }
  return content;
}

// For a Buffer (checks the UTF-8 BOM only)
function stripBOMBuffer(buf) {
  if (buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) {
    // Return a view that starts after the three BOM bytes
    buf = buf.subarray(3);
  }
  return buf;
}
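
As an alternative to slicing by hand, the WHATWG `TextDecoder` (available as a global in current Node.js versions and in browsers) drops a leading BOM while decoding unless `ignoreBOM` is set. The sketch below uses made-up in-memory bytes rather than a real file, and the variable names are illustrative only.

```javascript
// Bytes of a UTF-8 text that starts with a BOM: EF BB BF + "hi".
const bytes = Buffer.from([0xEF, 0xBB, 0xBF, 0x68, 0x69]);

// Default behaviour: the decoder consumes the BOM.
const stripped = new TextDecoder('utf-8').decode(bytes);

// With ignoreBOM: true, U+FEFF is kept as the first character.
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);

console.log(stripped.length); // 2
console.log(kept.length);     // 3

// Buffer#toString does not strip the BOM, so manual removal is still needed there.
console.log(bytes.toString('utf8').charCodeAt(0) === 0xFEFF); // true
```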
