This article takes a closer look at handling binary files in C#, and is shared here for your reference. The analysis follows.
It would be convenient if every file could simply be converted to XML, but that is not the reality. A large number of file formats are not XML, and many are not even ASCII. Binary files still travel across the network, sit on disk, and pass between applications, and for these jobs they are often more efficient than text files.
In C and C++, reading binary files is still easy. Apart from a few quirks around carriage returns and line-ending characters, every file read in C/C++ is a binary file; in fact, C/C++ only knows binary files and merely makes some of them look like text files. But as the languages we use become more abstract, they lose the ability to read raw files directly and easily, because they want to process input and output data in their own structured way.
Where the problem lies
In many areas of computing, C and C++ programs still store and read data directly according to their in-memory data structures. In C and C++, it is very simple to read and write files this way. In C, you just call the fwrite() function with a pointer to your data, the size of each element, the number of elements, and a file handle, and the data is written to the file directly in binary form.
If data was written to a file as described above, and you know its correct data structure, then reading the file back is just as easy. You just call the fread() function with a pointer to a buffer, the size of each element, how many elements to read, and a file handle. The fread() function does the rest, and suddenly the data is back in memory. There is no parsing and no object model; the file is read directly into memory.
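C# offers a rough equivalent of this fwrite()/fread() round trip through the BinaryWriter and BinaryReader classes. The following is a minimal sketch (the file path and values are just illustrative):

```csharp
using System;
using System.IO;

class RoundTrip
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // Write a few values in raw binary form, much as fwrite() would.
        using (var writer = new BinaryWriter(File.Open(path, FileMode.Create)))
        {
            writer.Write(12345);   // 4-byte int
            writer.Write(3.14);    // 8-byte double
        }

        // Read them back in the same order, much as fread() would.
        using (var reader = new BinaryReader(File.Open(path, FileMode.Open)))
        {
            int n = reader.ReadInt32();
            double d = reader.ReadDouble();
            Console.WriteLine(n);
        }

        File.Delete(path);
    }
}
```

As in C, the reader must know the exact order and size of the values that were written; nothing in the file itself describes its layout.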
In C and C++, the two biggest problems are data alignment and byte swapping. Data alignment means the compiler sometimes leaves unused padding bytes in the middle of a structure, because if the processor had to access misaligned data it would no longer be operating optimally: it would take more time (typically, a processor takes twice as long to access unaligned data as aligned data) and more instructions. The compiler therefore optimizes for execution speed by inserting padding and adjusting the layout. Byte swapping, on the other hand, refers to reordering the bytes of a value because different processors store them in different orders.
Data Alignment
Because processors can handle more information at once (within one clock cycle) when it is laid out predictably, they want the data they process to be arranged in a definite way. Most Intel processors want the starting address of a stored 32-bit integer to be divisible by 4. If an integer in memory is not stored at an address that is a multiple of 4, accessing it is inefficient (and on some processors it fails outright). The compiler knows this, so when it encounters data that could cause this problem, it has three options.
First, it can insert padding bytes into the structure so that the integer's start address is divisible by 4; this is the most common practice. Second, it can reorder the fields so that integers land on a 4-byte boundary; because this causes other interesting problems, this method is rarely used. Third, it can leave the integers off the 4-byte boundary but generate extra code that copies them to a suitable aligned location before use; this takes some extra time, but it is useful when the data must be kept compact.
Most of the above are compiler details, so you usually don't need to worry about them. If the program that writes the data and the program that reads it use the same compiler with the same settings, this is not a problem: the compiler lays out the same data the same way, and everything works. But as soon as cross-platform file exchange is involved, it becomes important to arrange all the data in an agreed way so the information survives the conversion. In addition, some compilers let programmers control or disable this padding.
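The padding described above is easy to observe from C#. This sketch (the struct names are made up for illustration) compares the marshaled size of the same two fields under the default layout and under Pack = 1:

```csharp
using System;
using System.Runtime.InteropServices;

// Default layout: the runtime may pad after 'Flag' so that
// 'Value' starts on a 4-byte boundary.
[StructLayout(LayoutKind.Sequential)]
struct Padded
{
    public byte Flag;
    public int Value;
}

// Pack = 1 removes the padding: fields are laid out back to back.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Packed
{
    public byte Flag;
    public int Value;
}

class AlignmentDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(Padded))); // typically 8
        Console.WriteLine(Marshal.SizeOf(typeof(Packed))); // 5
    }
}
```

Three of the eight bytes in the padded version are wasted space, inserted purely so the integer starts at an aligned address.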
Byte swapping: big-endian and little-endian
Big-endian and little-endian are two different ways of storing integers in a computer. Because an integer spans more than one byte, the question is whether the most significant byte should be read and written first. The least significant byte is the one that changes most frequently: if you keep adding one to an integer, the least significant byte changes with every increment, while the next byte changes only once for every 256 increments.
Different processors store integers in different ways. Intel processors generally store integers little-endian; in other words, the least significant byte is read and written first. Most other processors store integers big-endian. So when binary files are read and written across platforms, you may have to reorder the bytes to recover the correct values.
UNIX poses a special problem, because UNIX runs on many processors: Sun SPARC, HP processors, IBM PowerPC, Intel chips, and so on. Moving data from one processor to another can mean flipping the byte order of every multi-byte value so it matches the order the new processor expects.
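You can see the byte order of your own machine, and how a swap is done, with a short sketch (the example value 0x01020304 is arbitrary; the printed order shown in the comments assumes a little-endian machine such as x86/x64):

```csharp
using System;

class EndianDemo
{
    static void Main()
    {
        byte[] bytes = BitConverter.GetBytes(0x01020304);

        // On a little-endian machine the least significant byte comes
        // first: 04-03-02-01. On a big-endian machine it is the reverse.
        Console.WriteLine(BitConverter.IsLittleEndian);
        Console.WriteLine(BitConverter.ToString(bytes));

        // Converting between the two orders is just a byte reversal.
        Array.Reverse(bytes);
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```

The same reversal is all that is needed when a file written on one kind of processor is read on the other.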
Use C# to process binary files
Processing binary files in C# brings two new challenges. The first is that all .NET languages are strongly typed, so you have to convert from the raw byte stream in the file to the data types you want. The second is that some data types are much more complex than they appear to be and require some kind of transformation.
Breaking the type rules
Because .NET languages, including C#, are strongly typed, you cannot simply read some bytes from a file, stuff them into a data structure, and call it done. When you want to break the type-conversion rules, you first read the number of bytes you need into a byte array, and then copy them, start to finish, into the data structure.
Searching Usenet, you will find several sets of routines, organized as a class hierarchy, that let you convert any object into a series of bytes and back again. They can be found at the address given in Listing A.
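Listing A itself is not reproduced here, but the core idea can be sketched with the Marshal class: pin a byte array, then copy a struct's fields into it or out of it. The struct and field names below are illustrative assumptions:

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct Record
{
    public ushort Id;
    public uint Count;
}

class ByteConversion
{
    // Copy a struct's fields out into raw bytes.
    static byte[] ToBytes<T>(T value) where T : struct
    {
        byte[] buffer = new byte[Marshal.SizeOf(typeof(T))];
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            Marshal.StructureToPtr(value, handle.AddrOfPinnedObject(), false);
        }
        finally
        {
            handle.Free();
        }
        return buffer;
    }

    // Copy raw bytes (e.g. just read from a file) back into a struct.
    static T FromBytes<T>(byte[] buffer) where T : struct
    {
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            return (T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T));
        }
        finally
        {
            handle.Free();
        }
    }

    static void Main()
    {
        var rec = new Record { Id = 7, Count = 42 };
        Record copy = FromBytes<Record>(ToBytes(rec));
        Console.WriteLine($"{copy.Id} {copy.Count}");
    }
}
```

Pinning the array with GCHandle keeps the garbage collector from moving it while the unmanaged copy runs.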
Complex data types
In C++, you know what an object is, what an array is, and what is neither. In C#, things are not so simple: a string is an object, and so is an array. Because C# has no raw arrays in the C sense, and many of its objects have no fixed size, some complex data types do not map directly onto fixed-size binary data.
Fortunately, .NET provides a way around this. You can tell C# how to marshal your strings and other array types using the MarshalAs attribute. The following example shows its use with a string in C#; it must be placed immediately before the field it controls:
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 50)]
The SizeConst parameter is set by the length of the string you want to read from, or store in, the binary file; it fixes the maximum length of the string.
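Here is a minimal sketch of the attribute in context (the struct and its fields are made-up examples): a record whose string field always occupies exactly 50 bytes on disk, however short the actual text is.

```csharp
using System;
using System.Runtime.InteropServices;

// A fixed-size record: the string field marshals to exactly 50
// ANSI bytes, so the whole record always has the same size.
[StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
struct NameRecord
{
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 50)]
    public string Name;

    public int Age;
}

class MarshalDemo
{
    static void Main()
    {
        // 50 bytes for the string plus 4 for the int.
        Console.WriteLine(Marshal.SizeOf(typeof(NameRecord)));
    }
}
```

Because the marshaled size is constant, such records can be read from and written to a binary file in fixed-size chunks.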
Solving the earlier problems
Now that you have seen how .NET handles the problems it introduces, you can see how easily it also solves the binary file problems described earlier.
Packing
There is no need to fiddle with compiler settings to control how data is laid out. You can simply use the StructLayout attribute to have the data laid out or packed however you wish. This is very useful when different pieces of data need different packing. With the StructLayout attribute you decide, structure by structure, whether to pack the data tightly or simply leave it in a form that can be read back again. The StructLayout attribute is used as follows:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
Doing so makes the data ignore boundary alignment, packing it as tightly as possible. This attribute must match for any data you read from a binary file (that is, the attribute used when writing the file must be the same as the one used when reading it).
You may find that even after adding this attribute to your data, the problem is not completely solved. In some cases you may face dull, lengthy trial and error, because different computers and compilers behave differently at the binary level. Especially across platforms, binary data must be handled with great care. .NET is a good tool for working with foreign binary files, but it is not a perfect one.
Endian flipping
One of the classic problems of reading and writing binary files is that some computers store the least significant byte first (such as Intel) while others store the most significant byte first. In C and C++, you have to handle this yourself, flipping one field at a time. One of the advantages of the .NET framework is that code can access type metadata at runtime, so you can read that information and use it to fix the byte order of every field in the data automatically. The source code is in Listing B, where you can see how it is handled.
Once you know the object's type, you can enumerate its fields and check whether each one is a 16-bit or a 32-bit unsigned integer. In either case, you can reverse its byte order without destroying the data.
Note that the flipping code leaves string fields alone: byte order does not affect the string class, so those fields are untouched. You only need to handle unsigned integers, because negative numbers are not represented the same way on every system. A negative number can be represented in one's complement, but far more commonly it is represented in two's complement. This makes signed values a little trickier across platforms. Fortunately, negative numbers are rarely used in binary files.
A few more words on the same theme: floating-point numbers are likewise not always stored in a standard way. Although most systems store them in IEEE format, a small number of older systems use other formats.
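Listing B is not reproduced here, but the reflection technique it describes can be sketched as follows (the Header struct and field names are made up for illustration): walk the fields, swap only the 16-bit and 32-bit unsigned integers, and skip everything else, including strings.

```csharp
using System;
using System.Reflection;

struct Header
{
    public ushort Magic;
    public uint Length;
}

class EndianFlipper
{
    // Walk the public fields of a boxed struct and reverse the byte
    // order of every 16-bit and 32-bit unsigned integer. String and
    // other field types are left alone.
    static object FlipEndianness(object data)
    {
        foreach (FieldInfo field in data.GetType().GetFields())
        {
            if (field.FieldType == typeof(ushort))
            {
                ushort v = (ushort)field.GetValue(data);
                field.SetValue(data, (ushort)((v >> 8) | (v << 8)));
            }
            else if (field.FieldType == typeof(uint))
            {
                uint v = (uint)field.GetValue(data);
                byte[] b = BitConverter.GetBytes(v);
                Array.Reverse(b);
                field.SetValue(data, BitConverter.ToUInt32(b, 0));
            }
        }
        return data;
    }

    static void Main()
    {
        var h = new Header { Magic = 0x0102, Length = 0x01020304 };
        h = (Header)FlipEndianness(h);
        Console.WriteLine($"{h.Magic:X4} {h.Length:X8}");
    }
}
```

Because the struct is boxed before its fields are set, the same routine can flip any structure type without knowing its layout at compile time.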
Overcoming the difficulties
Although C# has a few rough edges here, you can still use it to read binary files. In fact, because of the way C# accesses object metadata, it is better suited to reading binary files than many languages: it can fix the byte order of an entire structure automatically.
I hope this article is helpful for your C# programming.