Detailed explanation of the Node.js binary Buffer object

Foreword

Before ES6 introduced TypedArray, the JavaScript language had no mechanism for reading or manipulating streams of binary data. The Buffer class was introduced as part of the Node.js API so that binary streams could be handled in scenarios such as TCP streams and file system operations. Now that TypedArray is part of ES6, the Buffer class implements the Uint8Array API in a way that is better suited to Node.js use cases. This article introduces the Buffer object in detail.

Overview

Node applications run in different scenarios from browser code: they need to handle network protocols, operate databases, process images, receive uploaded files, and so on. Network streams and file operations involve large amounts of binary data. JavaScript's native strings fall far short of these needs, so the Buffer object came into being.

Buffer is a typical module that combines JavaScript and C++. The performance-critical parts are implemented in C++ and the rest in JavaScript. An instance of the Buffer class is similar to an array of integers, except that it has a fixed size and its memory is allocated outside the V8 heap. The size of a Buffer is determined when it is created and cannot be changed afterwards.

Because Buffer is so commonly used, Node loads it when the process starts and places it on the global object. You can therefore use Buffer directly, without calling require().

/*
{ [Function: Buffer]
 poolSize: 8192,
 from: [Function],
 alloc: [Function],
 allocUnsafe: [Function],
 allocUnsafeSlow: [Function],
 isBuffer: [Function: isBuffer],
 compare: [Function: compare],
 isEncoding: [Function],
 concat: [Function],
 byteLength: [Function: byteLength] }
 */
console.log(Buffer);

Creation

In versions before v6, Buffer instances were created through the Buffer constructor, which returns different Buffers depending on the arguments provided. Newer versions of Node.js provide a dedicated method for each case.

1. new Buffer(size). Pass a number as the first argument to Buffer() (for example new Buffer(10)) to allocate a new Buffer object of the specified size.

The memory allocated for such a Buffer instance is uninitialized (not zero-filled). This design makes allocation very fast, but the allocated memory may contain old, potentially sensitive data.

Such a Buffer instance must be initialized manually, either with buf.fill(0) or by writing to the entire Buffer. Although this behavior is intentional and improves performance, experience has shown that a clearer distinction is needed between creating a fast but uninitialized Buffer and creating a slower but safer one.

var buf = new Buffer(5);
console.log(buf);//<Buffer e0 f7 1d 01 00>
buf.fill(0);
console.log(buf);//<Buffer 00 00 00 00 00>

[Note] Once space has been allocated for a Buffer object, its length is fixed and cannot be changed. A write to an index outside that length is ignored

var buf = new Buffer(5);
console.log(buf);//<Buffer b8 36 70 01 02>
buf[0] = 1;
console.log(buf);//<Buffer 01 36 70 01 02>
buf[10] = 1;
console.log(buf);//<Buffer 01 36 70 01 02>

【Buffer.allocUnsafe(size)】

In newer versions, new Buffer(size) is replaced by Buffer.allocUnsafe(size), which allocates a new Buffer of size bytes that is not zero-filled. Use buf.fill(0) to initialize the Buffer instance to 0

var buf = Buffer.allocUnsafe(10);
console.log(buf);//<Buffer 75 63 74 42 79 4c 65 6e 67 74>
buf.fill(0);
console.log(buf);//<Buffer 00 00 00 00 00 00 00 00 00 00>

【Buffer.alloc(size[, fill[, encoding]])】

In newer versions, a safe Buffer object can be created with Buffer.alloc(size). The parameter size <Integer> is the desired length of the new Buffer; fill <String> | <Buffer> | <Integer> is the value used to pre-fill the new Buffer, default 0; encoding <String> is the character encoding of fill when fill is a string, default 'utf8'

Allocates a new Buffer of size bytes. If fill is undefined, the Buffer is zero-filled

var buf = Buffer.alloc(5);
console.log(buf);//<Buffer 00 00 00 00 00>
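The difference between the two allocators can be sketched as follows. This is a minimal illustration, assuming a Node.js version that provides Buffer.alloc() and Buffer.allocUnsafe(); the variable names are made up for the example.

```javascript
// Buffer.alloc() returns zero-filled memory, so it is safe to use right away.
const safe = Buffer.alloc(8);

// Buffer.allocUnsafe() skips zero-filling; the contents are arbitrary
// until every byte has been overwritten.
const fast = Buffer.allocUnsafe(8);
fast.fill(0);

// Buffer.alloc() can also pre-fill with a byte value or a repeated string.
const ones = Buffer.alloc(4, 1);
const pattern = Buffer.alloc(6, 'ab');

console.log(safe);               // every byte is 00
console.log(fast.equals(safe));  // true once fast has been filled
console.log(ones);               // every byte is 01
console.log(pattern.toString()); // ababab
```

A common rule of thumb is to reach for Buffer.alloc() by default and to use Buffer.allocUnsafe() only when the whole Buffer is overwritten immediately afterwards.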

2. new Buffer(array or buffer). Pass an array or a Buffer as the first argument, and the data of the passed object is copied into the new Buffer

var buf1 = new Buffer([1, 2, 3, 4, 5]);
console.log(buf1);//<Buffer 01 02 03 04 05>
var buf2 = new Buffer(buf1);
console.log(buf2);//<Buffer 01 02 03 04 05>

【Buffer.from(array or buffer)】

In newer versions, this form is replaced by Buffer.from(array or buffer)

var buf1 = Buffer.from([1, 2, 3, 4, 5]);
console.log(buf1);//<Buffer 01 02 03 04 05>
var buf2 = Buffer.from(buf1);
console.log(buf2);//<Buffer 01 02 03 04 05>

3. new Buffer(string[, encoding]). The first argument is a string and the second is its encoding, which defaults to 'utf8'

var buf1 = new Buffer('this is a tést');
console.log(buf1.toString());//this is a tést
console.log(buf1.toString('ascii'));//this is a tC)st
var buf2 = new Buffer('7468697320697320612074c3a97374', 'hex');
console.log(buf2.toString());//this is a tést

Currently supported character encodings include:

'ascii' - 7-bit ASCII data only. This encoding is very fast and strips the high bit if it is set.
'utf8' - Multibyte encoded Unicode characters. Many web pages and other document formats use UTF-8.
'utf16le' - 2 or 4 bytes, little-endian encoded Unicode characters. Surrogate pairs (U+10000 to U+10FFFF) are supported.
'ucs2' - Alias of 'utf16le'.
'base64' - Base64 encoding. When creating a Buffer from a string, this encoding also accepts the "URL and Filename Safe Alphabet".
'latin1' - A way of encoding the Buffer into a string with one byte per character.
'binary' - Alias of 'latin1'.
'hex' - Encodes each byte as two hexadecimal characters.

【Buffer.from(string[, encoding])】

In newer versions, this form is replaced by Buffer.from(string[, encoding])

var buf1 = Buffer.from('this is a tést');
console.log(buf1.toString());//this is a tést
console.log(buf1.toString('ascii'));//this is a tC)st
var buf2 = Buffer.from('7468697320697320612074c3a97374', 'hex');
console.log(buf2.toString());//this is a tést
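As a small sketch of how the encodings relate, the same bytes can be rendered as 'hex' or 'base64' text and decoded back again. The variable names below are illustrative only.

```javascript
const text = 'this is a tést';
const utf8 = Buffer.from(text, 'utf8');

// Render the same bytes in two text-friendly encodings.
const asHex = utf8.toString('hex');
const asBase64 = utf8.toString('base64');

// Decoding with the matching encoding restores the original bytes.
const fromHex = Buffer.from(asHex, 'hex');
const fromBase64 = Buffer.from(asBase64, 'base64');

console.log(asHex);                   // 7468697320697320612074c3a97374
console.log(fromHex.toString());      // this is a tést
console.log(fromBase64.equals(utf8)); // true
```

The encoding argument describes how the string maps to bytes; the bytes stored in the Buffer are the same whichever textual form is used to display them.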

4. new Buffer(arrayBuffer[, byteOffset[, length]]). arrayBuffer <ArrayBuffer> is an ArrayBuffer, or the .buffer property of a TypedArray; byteOffset <Integer> is the index of the first byte to expose, default 0; length <Integer> is the number of bytes to expose, default arrayBuffer.byteLength - byteOffset. The new Buffer shares memory with the ArrayBuffer instead of copying it, which is why the change to arr below is visible through buf

var arr = new Uint16Array(2);
arr[0] = 5000;
arr[1] = 4000;
var buf = new Buffer(arr.buffer);
console.log(buf);//<Buffer 88 13 a0 0f>
arr[1] = 6000;
console.log(buf);//<Buffer 88 13 70 17>

【Buffer.from(arrayBuffer[, byteOffset[, length]])】

In newer versions, this form is replaced by Buffer.from(arrayBuffer[, byteOffset[, length]])

var arr = new Uint16Array(2);
arr[0] = 5000;
arr[1] = 4000;
var buf = Buffer.from(arr.buffer);
console.log(buf);//<Buffer 88 13 a0 0f>
arr[1] = 6000;
console.log(buf);//<Buffer 88 13 70 17>
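The optional byteOffset and length arguments can be illustrated with a short sketch. It uses a Uint8Array so that the result does not depend on the byte order of the machine; the names bytes, view and copy are assumptions made for the example.

```javascript
const bytes = new Uint8Array([10, 20, 30, 40, 50, 60]);

// Expose 3 bytes of the underlying ArrayBuffer, starting at byte 2.
const view = Buffer.from(bytes.buffer, 2, 3);
console.log(view);      // <Buffer 1e 28 32>

// The memory is shared in both directions.
bytes[2] = 99;
view[1] = 7;
console.log(view[0]);   // 99
console.log(bytes[3]);  // 7

// Passing the typed array itself (not its .buffer) copies the contents instead.
const copy = Buffer.from(bytes);
bytes[0] = 1;
console.log(copy[0]);   // 10
```

Whether a view or a copy is wanted depends on the use case: a view avoids copying, while a copy protects the Buffer from later changes to the source.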

Array-like behavior

The Buffer object is similar to an array. Each element is displayed as a two-digit hexadecimal number, that is, a value from 0 to 255.

console.log(Buffer.from('test'));//<Buffer 74 65 73 74>

【length】

Strings occupy different numbers of elements depending on their encoding. Under UTF-8, a Chinese character occupies 3 elements, while a letter or a half-width punctuation mark occupies 1 element.

var buf = Buffer.from('match');
console.log(buf.length);//5
var buf = Buffer.from('火柴');
console.log(buf.length);//6

【Subscript】

Buffer is heavily influenced by the Array type. You can read the length property to get its length, and you can access elements by subscript.

var buf = Buffer.alloc(10);
console.log(buf.length); // => 10

The code above allocates a Buffer object 10 bytes long. Elements can be assigned by subscript

buf[0] = 100;
console.log(buf[0]); // => 100

Note that if the value assigned to an element is less than 0, 256 is added repeatedly until the result is an integer between 0 and 255. If the value is greater than 255, 256 is subtracted repeatedly until the result falls in the range 0~255. If the value has a fractional part, the fraction is discarded and only the integer part is kept.

buf[0] = -100;
console.log(buf[0]); // 156
buf[1] = 300;
console.log(buf[1]); // 44
buf[2] = 3.1415;
console.log(buf[2]); // 3
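The wrap-around rule can be written out as arithmetic. The helper toByte() below is hypothetical (it is not part of Node) and only covers finite numbers; it is a sketch that mirrors the rule described above and compares its result with what a Buffer actually stores.

```javascript
// Hypothetical helper: mimics how a finite number is coerced when it is
// stored in a Buffer element.
function toByte(value) {
  const truncated = Math.trunc(value);     // drop the fractional part
  return ((truncated % 256) + 256) % 256;  // wrap into the range 0..255
}

const buf = Buffer.alloc(4);
const samples = [-100, 300, 3.1415, 255];
samples.forEach(function (value, i) { buf[i] = value; });

samples.forEach(function (value, i) {
  // Prints the input, the stored byte, and whether the helper agrees.
  console.log(value, '->', buf[i], toByte(value) === buf[i]);
});
```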

【fromCharCode】

Typically, the content of a Buffer created from a string is the string's UTF-8 encoding.

var buf = Buffer.from('match');
console.log(buf); //<Buffer 6d 61 74 63 68>

To get the character that corresponds to a single-byte element, use the String.fromCharCode() method

console.log(String.fromCharCode(buf[0]));//'m'

Memory allocation

The memory of a Buffer object is not allocated in V8's heap; the allocation is performed at the C++ level of Node. Processing large amounts of byte data cannot rely on requesting a little memory from the operating system each time a little is needed, because that would cause a large number of memory-allocation system calls and put pressure on the operating system. For this reason, Node requests memory at the C++ level and hands it out at the JavaScript level.

To use the requested memory efficiently, Node adopts a slab allocation mechanism. Slab allocation is a dynamic memory management mechanism that first appeared in the SunOS operating system (Solaris) and is now widely used in *nix operating systems such as FreeBSD and Linux. Simply put, a slab is a fixed-size memory region that has already been requested. A slab is in one of three states: full (fully allocated), partial (partially allocated), or empty (nothing allocated)

When we need a Buffer object, we can allocate one of a specified size in the following ways:

new Buffer(size);//old
Buffer.allocUnsafe(size);//new

【poolSize】

The poolSize property is the size in bytes of the pre-allocated internal Buffer pool. By default, Node uses 8KB as the boundary that decides whether a Buffer is a large or a small object:

Buffer.poolSize = 8 * 1024;

This 8KB value is the size of each slab. At the JavaScript level, it is the unit in which memory is allocated

1. Allocating small Buffer objects

If the requested size is less than 8KB, Node allocates the Buffer as a small object. During allocation, a module-level variable named pool serves as the intermediate object, and the slab unit currently being allocated from is always referenced by it. The following code allocates a brand-new slab unit by pointing pool at a newly requested SlowBuffer object:

var pool;
function allocPool() {
  pool = new SlowBuffer(Buffer.poolSize);
  pool.used = 0;
}

A small Buffer object is constructed as follows:

new Buffer(1024);//old
Buffer.allocUnsafe(1024);//new

The constructor checks the pool object. If the pool has not been created yet, or its remaining space is too small, a new slab unit is created and assigned to it:

if (!pool || pool.length - pool.used < this.length) allocPool();

At the same time, the parent property of the current Buffer object is pointed at the slab, and the position (offset) at which this Buffer starts within the slab is recorded. The slab object itself also records how many bytes have been used. The code is as follows:

this.parent = pool;
this.offset = pool.used;
pool.used += this.length;
if (pool.used & 7) pool.used = (pool.used + 8) & ~7;

The slab is now in the partial state. When another Buffer object is created, the constructor checks whether the remaining space in the slab is sufficient. If it is, the remaining space is used and the allocation state of the slab is updated. The following code creates another Buffer object, which is allocated from the same slab:

new Buffer(3000);//old
Buffer.allocUnsafe(3000);//new

If the remaining space in the slab is not enough, a new slab is constructed and the space left in the original slab is wasted. For example, if you construct a 1-byte Buffer object first and an 8192-byte Buffer object second, the slab does not have enough space left for the second allocation, so a new slab is created and used. The 8KB of the first slab is then occupied exclusively by the 1-byte Buffer object. The following code uses two slab units:

new Buffer(1);//old
Buffer.allocUnsafe(1);//new
new Buffer(8192);//old
Buffer.allocUnsafe(8192);//new

Note that because one slab may be shared by several Buffer objects, its 8KB is reclaimed only when all of those small Buffer objects have gone out of scope and can be collected. Even a 1-byte Buffer object, if it is not released, may keep the whole 8KB from being freed.
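The bookkeeping described in this section can be modelled in a few lines. The SlabModel class below is purely hypothetical: it only tracks offsets and slab counts, does not allocate real memory, and is not Node's actual implementation. It is a sketch of the strategy as described above, including the rounding of the used offset to a multiple of 8.

```javascript
// Hypothetical model of pooled allocation; tracks offsets only.
const POOL_SIZE = 8 * 1024;

class SlabModel {
  constructor() {
    this.slabs = 0;         // number of slabs created so far
    this.used = POOL_SIZE;  // forces a slab to be created on first use
  }

  // Returns { slab, offset } for a small allocation of `length` bytes.
  allocate(length) {
    if (POOL_SIZE - this.used < length) {
      this.slabs += 1;      // not enough room left: start a new slab
      this.used = 0;
    }
    const placement = { slab: this.slabs, offset: this.used };
    this.used += length;
    // Round the next free offset up to a multiple of 8.
    if (this.used & 7) this.used = (this.used + 8) & ~7;
    return placement;
  }
}

const model = new SlabModel();
const first = model.allocate(1);     // slab 1, offset 0
const second = model.allocate(3000); // slab 1, offset 8
const third = model.allocate(6000);  // does not fit in slab 1: slab 2, offset 0
console.log(first, second, third);
```

In this model the space left in slab 1 after the second allocation stays unused once slab 2 is started, which is the waste the text describes.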

2. Allocating large Buffer objects

If a Buffer object larger than 8KB is needed, a SlowBuffer object is allocated directly as the slab unit, and this slab unit is occupied exclusively by the large Buffer object

// Big buffer, just alloc one
this.parent = new SlowBuffer(this.length);
this.offset = 0;

The SlowBuffer class used here is defined in C++. Although it can be accessed through the buffer module, operating on it directly is not recommended; use Buffer instead.

The Buffer objects discussed above all live at the JavaScript level and can be marked and reclaimed by V8's garbage collector. The SlowBuffer object referenced by their internal parent property, however, is defined by Node itself in C++. It is a Buffer object at the C++ level, and the memory it uses is not in the V8 heap.

In summary, the real memory is provided at the C++ level of Node and merely used at the JavaScript level. For small, frequent Buffer operations, the slab mechanism requests memory in advance and hands it out afterwards, so that JavaScript does not need a memory-allocation system call for every Buffer. For large Buffers, the memory provided by the C++ level is used directly, with no fine-grained allocation

Conversion

Buffer objects can be converted to strings. Currently supported string encoding types are as follows: ASCII, UTF-8, UTF-16LE/UCS-2, Base64, Binary, Hex

【write()】

A Buffer object can store the transcoded values of strings of different encoding types. Calling the write() method can achieve this purpose.

buf.write(string, [offset], [length], [encoding])

string <String> The string to write to buf

offset <Integer> The position at which to start writing. Default: 0

length <Integer> The number of bytes to write. Default: buf.length - offset

encoding <String> The character encoding of string. Default: 'utf8'; Return: <Integer> The number of bytes written

Writes string to buf at position offset using the character encoding given by encoding. The length parameter is the number of bytes to write. If buf does not have enough space to hold the entire string, only part of the string is written. Partially encoded characters are not written

var buf = Buffer.alloc(5);
console.log(buf); //<Buffer 00 00 00 00 00>
var len = buf.write('test',1,3);
console.log(buf);//<Buffer 00 74 65 73 00>
console.log(len);//3

Because content can be written to a Buffer object repeatedly, and each write can specify its own encoding, a single Buffer can hold content in several encodings. Each encoding uses a different number of bytes per character, so take care when converting the Buffer back to a string.
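A minimal sketch of mixing encodings in one Buffer follows. It relies on the return value of buf.write() to track the next offset; the region boundaries and variable names are assumptions made for the example.

```javascript
const buf = Buffer.alloc(12);

// Each write returns the number of bytes written, which gives the next offset.
const utf8Len = buf.write('té', 0, 'utf8');           // 3 bytes: 74 c3 a9
const utf16Len = buf.write('té', utf8Len, 'utf16le'); // 4 bytes: 2 per character
const hexStart = utf8Len + utf16Len;
const hexLen = buf.write('ff00', hexStart, 'hex');    // 2 bytes

// Decode each region with the encoding it was written in.
const partA = buf.toString('utf8', 0, utf8Len);
const partB = buf.toString('utf16le', utf8Len, hexStart);
const partC = buf.toString('hex', hexStart, hexStart + hexLen);

console.log(partA); // té
console.log(partB); // té
console.log(partC); // ff00
```

Decoding the whole Buffer with a single encoding would garble at least one of the regions, which is why the boundaries have to be tracked by the caller.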

【toString()】

Converting a Buffer to a string is just as simple. The toString() method of a Buffer object converts it into a string

buf.toString([encoding], [start], [end])

encoding - The encoding to use. Default: 'utf8'

start - The index at which to start reading. Default: 0

end - The index at which to stop reading (not included). Default: the end of the buffer

Return - The buffer data decoded into a string using the specified encoding.

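var buf = Buffer.alloc(26);
for (var i = 0 ; i < 26 ; i++) {
 buf[i] = i + 97;
}
console.log(buf.toString('ascii'));//abcdefghijklmnopqrstuvwxyz
console.log(buf.toString('ascii',0,5));//abcde
console.log(buf.toString('utf8',0,5));//abcde
console.log(buf.toString(undefined,0,5));//abcde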
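The start and end arguments are byte offsets, not character offsets, which matters for multibyte encodings. The following sketch shows what happens when a range ends in the middle of a two-byte UTF-8 character; the variable names are illustrative.

```javascript
const buf = Buffer.from('tést'); // bytes: 74 c3 a9 73 74

// Ending in the middle of the two-byte 'é' leaves an incomplete sequence,
// which is decoded as the replacement character U+FFFD.
const cut = buf.toString('utf8', 0, 2);

// Ending on a character boundary decodes cleanly.
const whole = buf.toString('utf8', 0, 3);

console.log(cut.length);              // 2
console.log(cut.charCodeAt(1) === 0xfffd); // true
console.log(whole);                   // té
```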

【toJSON()】

Converts a Node Buffer to a JSON object

buf.toJSON()

Returns the JSON representation of buf

var buf = Buffer.from('test');
var json = buf.toJSON();
console.log(json);//{ type: 'Buffer', data: [ 116, 101, 115, 116 ] }
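Since JSON.stringify() calls toJSON() implicitly, a Buffer can be sent through JSON and rebuilt on the other side. The sketch below assumes the { type: 'Buffer', data: [...] } shape shown above; the payload property name is made up for the example.

```javascript
const original = Buffer.from('test');

// JSON.stringify() calls buf.toJSON() implicitly.
const serialized = JSON.stringify({ payload: original });

// Reviver: turn the { type: 'Buffer', data: [...] } shape back into a Buffer.
const restored = JSON.parse(serialized, function (key, value) {
  return value && value.type === 'Buffer' && Array.isArray(value.data)
    ? Buffer.from(value.data)
    : value;
});

console.log(serialized); // {"payload":{"type":"Buffer","data":[116,101,115,116]}}
console.log(restored.payload.equals(original)); // true
```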

【isEncoding()】

Node's Buffer object supports only a limited set of encodings, and only those encodings can be used to convert between strings and Buffers. For this reason, Buffer provides the isEncoding() function to check whether an encoding is supported.

Buffer.isEncoding(encoding)

Pass the encoding name as the argument. The return value is true if conversion is supported and false otherwise. Unfortunately, the GBK, GB2312 and BIG-5 encodings commonly used in China are not among the supported ones.

console.log(Buffer.isEncoding('utf8'));//true
console.log(Buffer.isEncoding('gbk'));//false
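One way to use this check is as a guard before decoding. The decode() helper below is hypothetical and only a sketch; a real application that needs GBK or similar encodings would have to rely on something outside the Buffer API.

```javascript
// Hypothetical helper: decode only when Node supports the encoding natively.
function decode(buf, encoding) {
  if (!Buffer.isEncoding(encoding)) {
    throw new TypeError('Unsupported encoding: ' + encoding);
  }
  return buf.toString(encoding);
}

const buf = Buffer.from('test');
console.log(decode(buf, 'utf8')); // test

let message = null;
try {
  decode(buf, 'gbk');
} catch (err) {
  message = err.message;
}
console.log(message); // Unsupported encoding: gbk
```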

Class Methods

【Buffer.byteLength(string[, encoding])】

The Buffer.byteLength() method returns the actual byte length of a string. This differs from String.prototype.length, which returns the number of characters in the string

string <String> | <Buffer> | <TypedArray> | <DataView> | <ArrayBuffer> The value whose length is calculated

encoding <String> If string is a string, this is its character encoding. Default: 'utf8'

Return: <Integer> The number of bytes contained in string

var str = '火柴';
var buf = Buffer.from(str);
console.log(str.length);//2
console.log(buf.length);//6
console.log(Buffer.byteLength(str));//6
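The result of Buffer.byteLength() depends on the encoding, and it can be used to size a Buffer exactly before writing. A short sketch, with illustrative variable names:

```javascript
const str = 'tést';

console.log(str.length);                         // 4 characters
console.log(Buffer.byteLength(str));             // 5 bytes in UTF-8
console.log(Buffer.byteLength(str, 'utf16le'));  // 8 bytes in UTF-16LE

// Allocate exactly as many bytes as the string needs, then write it.
const exact = Buffer.alloc(Buffer.byteLength(str));
const written = exact.write(str);
console.log(written === exact.length);           // true
```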

【Buffer.compare(buf1, buf2)】

This method compares buf1 with buf2 and is typically used to sort arrays of Buffer instances. It is equivalent to calling buf1.compare(buf2)

buf1 <Buffer>

buf2 <Buffer>

Returns: <Integer>

var buf1 = Buffer.from('1234');
var buf2 = Buffer.from('0123');
var arr = [buf1, buf2];
var result = Buffer.compare(buf1,buf2);
console.log(result);//1
console.log(arr.sort(Buffer.compare));//[ <Buffer 30 31 32 33>, <Buffer 31 32 33 34> ]

【Buffer.concat(list[, totalLength])】

This method returns a new Buffer that joins all the Buffer instances in list

list <Array> The array of Buffer instances to join

totalLength <Integer> The total length of the Buffer instances in list once joined

Return: <Buffer>

If list has no elements, or totalLength is 0, a new Buffer of length 0 is returned. If totalLength is not provided, it is calculated from the Buffer instances in list. Calculating totalLength requires an extra loop, so providing the length explicitly is faster

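var buf1 = Buffer.alloc(10);
var buf2 = Buffer.alloc(14);
var buf3 = Buffer.alloc(18);
var totalLength = buf1.length + buf2.length + buf3.length;
console.log(totalLength);//42
var bufA = Buffer.concat([buf1, buf2, buf3], totalLength);
console.log(bufA);//<Buffer 00 00 00 00 ...>
console.log(bufA.length);//42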
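A typical use of Buffer.concat() is joining chunks that arrive separately, for example from a stream. The sketch below uses hard-coded chunks in place of a real stream and also shows what happens when totalLength does not match the combined length.

```javascript
// Stand-ins for chunks that would normally arrive from a stream.
const chunks = [Buffer.from('Hello, '), Buffer.from('Buf'), Buffer.from('fer')];

// Summing the lengths up front avoids the extra loop inside concat().
const total = chunks.reduce(function (sum, chunk) { return sum + chunk.length; }, 0);
const whole = Buffer.concat(chunks, total);

// A smaller totalLength truncates; a larger one pads the end with zeros.
const truncated = Buffer.concat(chunks, 5);
const padded = Buffer.concat(chunks, total + 2);

console.log(whole.toString());     // Hello, Buffer
console.log(truncated.toString()); // Hello
console.log(padded.length);        // 15
```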

【Buffer.isBuffer(obj)】

Return true if obj is a Buffer, otherwise return false

var buf = Buffer.alloc(5);
var str = 'test';
console.log(Buffer.isBuffer(buf));//true
console.log(Buffer.isBuffer(str));//false

Instance methods

【buf.slice([start[, end]])】

This method returns a new Buffer that references the same memory as the original, offset and cropped by the start and end indexes

start <Integer> The position at which the new Buffer starts. Default: 0

end <Integer> The position at which the new Buffer ends (not included). Default: buf.length

Return: <Buffer>

var buffer1 = Buffer.from('test');
console.log(buffer1);//<Buffer 74 65 73 74>
var buffer2 = buffer1.slice(1,3);
console.log(buffer2);//<Buffer 65 73>
console.log(buffer2.toString());//'es'

[Note] Modifying this newly created Buffer slice will also modify the original Buffer memory at the same time, because the memory allocated by these two objects overlaps

var buffer1 = Buffer.from('test');
console.log(buffer1);//<Buffer 74 65 73 74>
var buffer2 = buffer1.slice(1,3);
console.log(buffer2);//<Buffer 65 73>
buffer2[0] = 0;
console.log(buffer1);//<Buffer 74 00 73 74>
console.log(buffer2);//<Buffer 00 73>
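When an independent copy of a range is needed instead of a shared view, one option is to pass the slice to Buffer.from(), which copies the bytes. A minimal sketch with illustrative names:

```javascript
const source = Buffer.from('test');

const shared = source.slice(1, 3);              // view onto source's memory
const copied = Buffer.from(source.slice(1, 3)); // independent copy of the same range

// Change the original after both have been created.
source[1] = 0x45; // 'E'

console.log(shared.toString()); // Es  (follows the original)
console.log(copied.toString()); // es  (unaffected)
```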

【buf.copy(target[, targetStart[, sourceStart[, sourceEnd]]])】

This method copies data from a region of buf to a region of target, even if the target memory region overlaps with buf

target <Buffer> | <Uint8Array> The Buffer or Uint8Array to copy into

targetStart <Integer> The offset within target at which to start writing. Default: 0

sourceStart <Integer> The offset within buf at which to start copying. Ignored when targetStart is undefined. Default: 0

sourceEnd <Integer> The offset within buf at which to stop copying (not included). Ignored when sourceStart is undefined. Default: buf.length

Return: <Integer> Number of bytes copied

var buffer1 = Buffer.from('test');
var buffer2 = Buffer.alloc(5);
var len = buffer1.copy(buffer2,1,3);
console.log(buffer1);//<Buffer 74 65 73 74>
console.log(buffer2);//<Buffer 00 74 00 00 00>
console.log(len);//1
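Two further cases are worth sketching: copying within the same Buffer where the regions overlap, and copying into a target that is too small. The variable names are illustrative.

```javascript
// Overlapping copy inside one buffer: move bytes 0..3 ('abcd') to offset 2.
const buf = Buffer.from('abcdefgh');
const moved = buf.copy(buf, 2, 0, 4);
console.log(buf.toString()); // ababcdgh
console.log(moved);          // 4

// A target that is too small receives only as many bytes as fit.
const small = Buffer.alloc(2);
const fitted = Buffer.from('test').copy(small);
console.log(small.toString()); // te
console.log(fitted);           // 2
```

The return value is the number of bytes actually copied, so it can be used to detect a truncated copy.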

【buf.compare(target[, targetStart[, targetEnd[, sourceStart[, sourceEnd]]]])】

This method compares buf with target and returns a number indicating whether buf comes before, after, or at the same position as target in sort order. The comparison is based on the actual sequence of bytes in each Buffer

target <Buffer> The Buffer to compare with

targetStart <Integer> The offset within target at which to start the comparison. Default: 0

targetEnd <Integer> The offset within target at which to end the comparison (not included). Ignored when targetStart is undefined. Default: target.length

sourceStart <Integer> The offset within buf at which to start the comparison. Ignored when targetStart is undefined. Default: 0

sourceEnd <Integer> The offset within buf at which to end the comparison (not included). Ignored when targetStart is undefined. Default: buf.length

Return: <Integer>

If target is the same as buf, return 0

If target should come before buf when sorted, return 1

If target should come after buf when sorted, return -1

var buf1 = Buffer.from([1, 2, 3, 4, 5, 6, 7, 8, 9]);
var buf2 = Buffer.from([5, 6, 7, 8, 9, 1, 2, 3, 4]);

// Output: 0 (1234 in buf2 vs. 1234 in buf1)
console.log(buf1.compare(buf2, 5, 9, 0, 4));

// Output: -1 (567891 in buf2 vs. 56789 in buf1)
console.log(buf1.compare(buf2, 0, 6, 4));

// Output: 1 (1 in buf2 vs. 6789 in buf1)
console.log(buf1.compare(buf2, 5, 6, 5));

【buf.equals(otherBuffer)】

Return true if buf has exactly the same bytes as otherBuffer, otherwise return false

otherBuffer <Buffer> Buffer to compare

Return: <Boolean>

var buf1 = Buffer.from('ABC');
var buf2 = Buffer.from('ABC');
var buf3 = Buffer.from('abc');
console.log(buf1.equals(buf2));//true
console.log(buf1.equals(buf3));//false

【buf.fill(value[, offset[, end]][, encoding])】

value <String> | <Buffer> | <Integer> The value with which to fill buf

offset <Integer> The position in buf at which to start filling. Default: 0

end <Integer> The position in buf at which to stop filling (not included). Default: buf.length

encoding <String> If value is a string, this is its character encoding. Default: 'utf8'

Return: <Buffer> A reference to buf

If offset and end are not specified, the entire buf is filled. Because the method returns buf, a Buffer can be created and filled in a single line

var b = Buffer.allocUnsafe(10).fill('h');
console.log(b.toString());//hhhhhhhhhh
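The offset and end arguments, and filling with a multi-character string, can be sketched as follows. The values are arbitrary and chosen only for illustration.

```javascript
const buf = Buffer.alloc(6);

// Fill bytes 1 and 2 with 0xff; end is exclusive.
buf.fill(0xff, 1, 3);
console.log(buf); // <Buffer 00 ff ff 00 00 00>

// A multi-character string is repeated (and cut off) to fit the range.
buf.fill('ab', 3);
console.log(buf); // <Buffer 00 ff ff 61 62 61>
```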

【buf.indexOf(value[, byteOffset][, encoding])】

value <String> | <Buffer> | <Integer> The value to search for

byteOffset <Integer> The position in buf at which to start searching. Default: 0

encoding <String> If value is a string, this is its character encoding. Default: 'utf8'

Return: <Integer> The index of the first occurrence of value in buf, or -1 if buf does not contain value

If value is a string, it is interpreted according to the character encoding given by encoding. If value is a Buffer, it is used in its entirety; to compare only part of a Buffer, use buf.slice(). If value is a number, it is interpreted as an unsigned 8-bit integer between 0 and 255

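var buf = Buffer.from('this is a buffer');

// Output: 0
console.log(buf.indexOf('this'));

// Output: 2
console.log(buf.indexOf('is'));

// Output: 8
console.log(buf.indexOf(Buffer.from('a buffer')));

// Output: 8
// (97 is the decimal ASCII value of 'a')
console.log(buf.indexOf(97));

// Output: -1
console.log(buf.indexOf(Buffer.from('a buffer example')));

// Output: 8
console.log(buf.indexOf(Buffer.from('a buffer example').slice(0, 8)));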
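The byteOffset argument makes it possible to find every occurrence of a value by resuming the search after each match. The indexesOf() helper below is hypothetical and assumes a non-empty search value; it is a sketch, not part of the Buffer API.

```javascript
// Hypothetical helper: collect every index at which `value` occurs in `buf`.
// Assumes `value` is non-empty.
function indexesOf(buf, value) {
  const found = [];
  let index = buf.indexOf(value);
  while (index !== -1) {
    found.push(index);
    // Resume one byte after the previous match, so overlapping matches count.
    index = buf.indexOf(value, index + 1);
  }
  return found;
}

const buf = Buffer.from('this is a buffer');
console.log(indexesOf(buf, 'is'));  // [ 2, 5 ]
console.log(indexesOf(buf, 'f'));   // [ 12, 13 ]
console.log(indexesOf(buf, 'xyz')); // []
```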

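【buf.lastIndexOf(value[, byteOffset][, encoding])】

Similar to buf.indexOf(), except that buf is searched from back to front instead of from front to back

var buf = Buffer.from('this buffer is a buffer');

// Output: 0
console.log(buf.lastIndexOf('this'));

// Output: 17
console.log(buf.lastIndexOf('buffer'));

// Output: 17
console.log(buf.lastIndexOf(Buffer.from('buffer')));

// Output: 15
// (97 is the decimal ASCII value of 'a')
console.log(buf.lastIndexOf(97));

// Output: -1
console.log(buf.lastIndexOf(Buffer.from('yolo')));

// Output: 5
console.log(buf.lastIndexOf('buffer', 5));

// Output: -1
console.log(buf.lastIndexOf('buffer', 4));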

【buf.includes(value[, byteOffset][, encoding])】

This method is equivalent to buf.indexOf() !== -1

value <String> | <Buffer> | <Integer> Value to search

byteOffset <Integer> The location where the search started in buf. Default: 0

encoding <String> If value is a string, this is its character encoding. Default: 'utf8'

Return: <Boolean> Return true if buf finds value, otherwise return false

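var buf = Buffer.from('this is a buffer');

// Output: true
console.log(buf.includes('this'));

// Output: true
console.log(buf.includes('is'));

// Output: true
console.log(buf.includes(Buffer.from('a buffer')));

// Output: true
// (97 is the decimal ASCII value of 'a')
console.log(buf.includes(97));

// Output: false
console.log(buf.includes(Buffer.from('a buffer example')));

// Output: true
console.log(buf.includes(Buffer.from('a buffer example').slice(0, 8)));

// Output: false
console.log(buf.includes('this', 4));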

That is all for this article. I hope it is helpful for your study.