When we write network programs, we often perform the following steps:
1. Allocate a buffer
2. Read data from the data source into the buffer
3. Parse the data in the buffer
4. Repeat from step 2
On the surface this is a conventional, simple workflow, but in practice the following pain points come up:
Incomplete data read:
A single read may not return all the data you need, so a cursor must be maintained in the buffer to record where the next read should start. This cursor carries considerable complexity:
When reading into the buffer, the write position and the remaining space must be computed from the cursor, which complicates the read path.
Parsing reuses the same buffer, so the parser must also work out the cursor position and the remaining size, which complicates the parse path as well.
After parsing, the cursor must be advanced and the buffer's start position re-marked, adding yet more complexity.
Limited buffer capacity:
Because buffers are finite, the one you allocate may not be big enough, so a dynamically sized buffer has to be introduced, which greatly increases code complexity.
If you allocate a larger block each time, you pay allocation and release overhead, and you also have to move the existing data and update the cursor, which means even more intricate logic.
If you instead chain multiple memory segments into one logical buffer, reading and writing the data become comparatively complicated.
The memory has to be released after use, and if higher efficiency is required, a memory pool must be maintained.
No separation between reading and using:
The business only cares about using the data, but read and use are not separated, so the complexity of both operations leaks into the use path and seriously interferes with the business logic.
Today I will introduce a library from Microsoft that solves these pain points: System.IO.Pipelines (install the System.IO.Pipelines package from NuGet). Its core is the Pipe object, which exposes a Writer property and a Reader property.
var pipe = new Pipe();
var writer = pipe.Writer;
var reader = pipe.Reader;
Writer Object
The Writer object reads data from the data source and writes it into the pipeline; it corresponds to the "read" operation in the business.
var content = Encoding.UTF8.GetBytes("hello world");
var data = new Memory<byte>(content);
var result = await writer.WriteAsync(data);
Alternatively, you can ask the Pipe itself to hand out a Memory to write into:
var buffer = writer.GetMemory(512);
// fill the buffer with data, then report how many bytes were actually written
writer.Advance(bytesWritten);
var result = await writer.FlushAsync();
Reader object
The Reader object obtains data from the pipeline; it corresponds to the "use" operation in the business.
First get the buffer of the pipeline:
var result = await reader.ReadAsync();
var buffer = result.Buffer;
This Buffer is a ReadOnlySequence<byte>, a rather nice, efficient dynamic memory object. It is composed of multiple Memory<byte> segments, and it exposes members for inspecting those segments:
IsSingleSegment: whether the sequence consists of a single Memory<byte> segment
First: gets the first Memory<byte> segment
GetEnumerator: enumerates the Memory<byte> segments
It can also be viewed logically as one contiguous Memory<byte>, with corresponding members:
Length: the total length of the buffered data
Slice: slices the buffer
CopyTo: copies the content into a Span<byte>
ToArray: copies the content into a byte[]
It also has a cursor-like position type, SequencePosition, which is used by the position-related members (such as GetPosition); I won't go into it here.
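As a sketch of how these members fit together, here is a small, self-contained example (the data and class name are illustrative) that inspects a ReadOnlySequence<byte> both segment by segment and as one logical buffer:

```csharp
using System;
using System.Buffers;
using System.Text;

class SequenceDemo
{
    static void Main()
    {
        // A single-segment sequence built from an ordinary byte array.
        var sequence = new ReadOnlySequence<byte>(Encoding.UTF8.GetBytes("hello world"));

        Console.WriteLine(sequence.IsSingleSegment); // a plain array is one segment
        Console.WriteLine(sequence.Length);          // total logical length

        // Segment-by-segment view: each item is a ReadOnlyMemory<byte>.
        foreach (var segment in sequence)
        {
            Console.WriteLine(Encoding.UTF8.GetString(segment.Span));
        }

        // Logical view: Slice + ToArray treat it as one contiguous buffer.
        var hello = sequence.Slice(0, 5).ToArray();
        Console.WriteLine(Encoding.UTF8.GetString(hello));
    }
}
```

With a real Pipe the sequence may span several segments, which is exactly why both views exist.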
This buffer solves the "incomplete read" problem: if one read does not return enough data, you simply read again next time, with no need to grow buffers dynamically yourself. The efficient memory management brings good performance, and the clean interface lets us focus on the business.
After obtaining the buffer, use its data:
var data = buffer.ToArray();
After using it, tell the Pipe how much data was consumed; the next read will start after that position:
reader.AdvanceTo(buffer.GetPosition(4));
This is a very practical design. It solves the "whatever is read must be used" problem: unconsumed data is still there on the next read, and you can even implement a Peek operation that reads data without moving the cursor.
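A minimal sketch of such a Peek, using the AdvanceTo(consumed, examined) overload: passing the buffer's start as the consumed position leaves everything available for the next read. The data and class name here are illustrative:

```csharp
using System;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

class PeekDemo
{
    static async Task Main()
    {
        var pipe = new Pipe();
        await pipe.Writer.WriteAsync(Encoding.UTF8.GetBytes("abcd"));

        // First read: look at the data but consume nothing (a Peek).
        var peek = await pipe.Reader.ReadAsync();
        Console.WriteLine(peek.Buffer.Length);
        pipe.Reader.AdvanceTo(peek.Buffer.Start, peek.Buffer.End);

        // Write more so the next ReadAsync has fresh data to return.
        await pipe.Writer.WriteAsync(Encoding.UTF8.GetBytes("ef"));

        // Second read: the peeked bytes are still there, plus the new ones.
        var result = await pipe.Reader.ReadAsync();
        Console.WriteLine(result.Buffer.Length);
        pipe.Reader.AdvanceTo(result.Buffer.End);
    }
}
```

Marking everything as examined (the second argument) tells the pipe not to wake the reader again until new data arrives.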
Interaction
Besides the "read" and "use" operations, the two sides also need to interact, for example:
while reading, the data source becomes unavailable and writing has to stop;
while using, the business finishes and the data source needs to be aborted.
Both Reader and Writer have a Complete method to signal the end of their side:
writer.Complete();
reader.Complete();
When the Writer flushes and the Reader reads, each gets a result:
FlushResult flushResult = await writer.FlushAsync();
ReadResult readResult = await reader.ReadAsync();
Both results have an IsCompleted property: when FlushResult.IsCompleted is true the Reader has completed, and when ReadResult.IsCompleted is true the Writer has completed.
Cancel
While writing and reading, a CancellationToken can also be passed into FlushAsync/ReadAsync to cancel the corresponding operation; a pending operation can also be cancelled directly:
writer.CancelPendingFlush();
reader.CancelPendingRead();
If the cancellation succeeds, the corresponding result's IsCanceled property will be true (I have not verified this myself).
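Putting the pieces together, here is a minimal end-to-end sketch: one task plays the "read" side and writes chunks into the pipe, while the "use" side consumes them until the writer completes. The chunking and names are illustrative:

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

class PipeDemo
{
    static async Task Main()
    {
        var pipe = new Pipe();

        // "Read" side: pull data from the source and push it into the pipe.
        var writing = Task.Run(async () =>
        {
            foreach (var chunk in new[] { "hel", "lo ", "world" })
            {
                await pipe.Writer.WriteAsync(Encoding.UTF8.GetBytes(chunk));
            }
            pipe.Writer.Complete(); // no more data
        });

        // "Use" side: consume everything until the writer completes.
        var builder = new StringBuilder();
        while (true)
        {
            var result = await pipe.Reader.ReadAsync();
            builder.Append(Encoding.UTF8.GetString(result.Buffer.ToArray()));
            pipe.Reader.AdvanceTo(result.Buffer.End); // everything consumed
            if (result.IsCompleted) break;
        }
        pipe.Reader.Complete();

        await writing;
        Console.WriteLine(builder.ToString());
    }
}
```

Note that the consumer never cares how the chunks were split on the writer side; the ReadOnlySequence<byte> presents whatever has arrived so far.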
That's all for this introduction to this efficient C# IO library; I hope it helps with your learning.