Using the .NET 2.0 Compression and Decompression Classes to Process Large Data

Summary

If your application has never needed compression, you have been lucky. For the developers who do use compression, the good news is that .NET 2.0 now provides two classes that handle compression and decompression. This article discusses when and how to use these tools.

Introduction

A new namespace in .NET Framework 2.0 is System.IO.Compression. This namespace provides two data compression classes: DeflateStream and GZipStream. Both classes support lossless compression and decompression, and both are designed to compress and decompress streaming data.

Compression is an effective way to reduce data size. For example, if you store a huge amount of data in a SQL database, you can save a lot of disk space by compressing it before saving it to a table. And because you are now saving smaller chunks of data to the database, the time spent on disk I/O is reduced as well. The disadvantage of compression is that it requires additional processing (and therefore additional processing time), so you need to account for that cost before deciding to apply compression in your program.

Compression is extremely useful when you need to transmit data over a network, especially over slow and expensive links such as GPRS connections. In such cases, compression can greatly reduce the data size and the overall communication cost. Web services are another area where compression can pay off, because XML data is highly compressible.

Once you decide that compression is worth its processing cost for your program, you need a solid understanding of the two new compression classes in .NET 2.0, which is what this article explains.

Create a sample application

In this article, I will build a sample application to demonstrate the use of compression. The application allows you to compress files, including plain text files. You can then reuse the code from this example in your own application.

First, use Visual Studio 2005 to create a new Windows application and populate the default form with the following controls (see Figure 1):

Figure 1. Populating the form: populate the default Form1 with all of the controls shown.

GroupBox control

RadioButton control

TextBox control

Button control

Label control

Switch to Form1's code-behind and import the following namespaces:

Imports System.IO
Imports System.IO.Compression

Before you start using the compression classes, it is important to understand how they work. For compression, the classes read data from a byte array, compress it, and store the result in a stream object. For decompression, they decompress the data stored in one stream object and store it in another stream object.

First, define the Compress() function, which has two parameters: algo and data. The first parameter specifies which algorithm (GZip or Deflate) to use; the second parameter is a byte array containing the data to be compressed. A memory stream object is used to store the compressed data. Once the compression is complete, you calculate the compression ratio by dividing the size of the compressed data by the size of the original (uncompressed) data.

The compressed data stored in the memory stream is then copied into another byte array and returned to the calling function. In addition, a Stopwatch object tracks how long the compression algorithm takes. The Compress() function is defined as follows:
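The ratio calculation on its own can be sketched as follows. This is a minimal illustration, not the article's code: the module name RatioSketch and the function name CompressionRatio are hypothetical, and the sample sizes in Main are made up for the example.

```vb
Imports System

Module RatioSketch
    '---Hypothetical helper: compressed size as a percentage of the original size,
    '   rounded to two decimal places---
    Function CompressionRatio(ByVal compressedSize As Long, ByVal originalSize As Long) As Double
        '---The / operator on two Long values yields a Double in Visual Basic---
        Return Math.Round((compressedSize / originalSize) * 100, 2)
    End Function

    Sub Main()
        '---Illustrative sizes only: 2,048 compressed bytes from 10,240 original bytes---
        Console.WriteLine(CompressionRatio(2048, 10240) & "%")
    End Sub
End Module
```

With this definition, a smaller percentage means better compression, because the compressed data is a smaller fraction of the original.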

Public Function Compress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---ms is used to store the compressed data---
        Dim ms As New MemoryStream()
        Dim zipStream As Stream = Nothing
        '---Start the stopwatch---
        sw.Start()
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Compress, True)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Compress, True)
        End If
        '---Compress the bytes stored in data---
        zipStream.Write(data, 0, data.Length)
        zipStream.Close()
        '---Stop the stopwatch---
        sw.Stop()
        '---Calculate the compression ratio---
        Dim ratio As Single = CSng(Math.Round((ms.Length / data.Length) * 100, 2))
        Dim msg As String = "Original size: " & data.Length & _
            ", Compressed size: " & ms.Length & _
            ", Compression ratio: " & ratio & "%" & _
            ", Time spent: " & sw.ElapsedMilliseconds & "ms"
        '---lblMessage is an assumed name for the Label control on Form1---
        lblMessage.Text = msg
        ms.Position = 0
        '---Used to store the compressed data (byte array)---
        Dim c_data(CInt(ms.Length) - 1) As Byte
        '---Read the content of the memory stream into the byte array---
        ms.Read(c_data, 0, c_data.Length)
        Return c_data
    Catch ex As Exception
        MsgBox(ex.ToString())
        Return Nothing
    End Try
End Function
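The third constructor argument (True) in Compress() is the leaveOpen flag. It matters because Compress() reads the memory stream's length and contents after the compression stream has been closed. The sketch below illustrates the difference under that assumption; the module name LeaveOpenSketch and the function name StreamUsableAfterClose are hypothetical and exist only for this illustration.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module LeaveOpenSketch
    '---Hypothetical helper: reports whether the underlying MemoryStream can still
    '   be queried after the GZipStream wrapped around it has been closed---
    Function StreamUsableAfterClose(ByVal leaveOpen As Boolean) As Boolean
        Dim ms As New MemoryStream()
        Dim gz As New GZipStream(ms, CompressionMode.Compress, leaveOpen)
        gz.WriteByte(42)
        '---Closing flushes the compressed data; it also closes ms unless leaveOpen is True---
        gz.Close()
        Try
            Return ms.Length > 0
        Catch ex As ObjectDisposedException
            '---ms was closed together with the GZipStream---
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine("leaveOpen = True:  " & StreamUsableAfterClose(True))
        Console.WriteLine("leaveOpen = False: " & StreamUsableAfterClose(False))
    End Sub
End Module
```

Closing the compression stream before reading the memory stream is also what guarantees that all compressed bytes have been written out, which is why Compress() calls Close() before it measures the compressed size.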

The Decompress() function decompresses the data compressed by the Compress() function. The first parameter specifies the algorithm to use. The byte array containing the compressed data is passed as the second parameter and is copied into a memory stream object. The compression classes then decompress the data stored in the memory stream and expose the decompressed data through another stream object. To get the decompressed data, you read it from that stream object. This is done by the RetrieveBytesFromStream() function (explained later).

The definition of the Decompress() function is as follows:

Public Function Decompress(ByVal algo As String, ByVal data() As Byte) As Byte()
    Try
        Dim sw As New Stopwatch
        '---Copy the (compressed) data into ms---
        Dim ms As New MemoryStream(data)
        Dim zipStream As Stream = Nothing
        '---Start the stopwatch---
        sw.Start()
        '---Decompress the data stored in ms---
        If algo = "Gzip" Then
            zipStream = New GZipStream(ms, CompressionMode.Decompress)
        ElseIf algo = "Deflate" Then
            zipStream = New DeflateStream(ms, CompressionMode.Decompress, True)
        End If
        '---Used to store the decompressed data---
        Dim dc_data() As Byte
        '---The decompressed data is read through zipStream;
        'extract it into a byte array---
        dc_data = RetrieveBytesFromStream(zipStream, data.Length)
        '---Stop the stopwatch---
        sw.Stop()
        '---lblMessage is an assumed name for the Label control on Form1---
        lblMessage.Text = "Decompression completed. Time spent: " & _
            sw.ElapsedMilliseconds & "ms" & _
            ", Original size: " & dc_data.Length
        Return dc_data
    Catch ex As Exception
        MsgBox(ex.ToString())
        Return Nothing
    End Try
End Function

The RetrieveBytesFromStream() function takes two parameters, a stream object and an integer, and returns a byte array containing the decompressed data. The integer parameter determines how many bytes are read from the stream object into the byte array at a time. This is necessary because, while the data is being decompressed, you do not know the size of the decompressed data in the stream object. The byte array therefore has to be expanded dynamically, block by block, to hold the decompressed data at run time. Blocks that are too large waste memory, and blocks that are too small waste time on repeated expansion of the byte array. It is therefore left to the calling routine to choose the block size to read.

The definition of the RetrieveBytesFromStream() function is as follows:

Public Function RetrieveBytesFromStream( _
    ByVal stream As Stream, ByVal bytesblock As Integer) As Byte()
    '---Retrieve bytes from a stream object---
    Dim data() As Byte = Nothing
    Dim totalCount As Integer = 0
    Try
        While True
            '---Grow the data byte array by one more block---
            ReDim Preserve data(totalCount + bytesblock)
            Dim bytesRead As Integer = stream.Read(data, totalCount, bytesblock)
            If bytesRead = 0 Then
                Exit While
            End If
            totalCount += bytesRead
        End While
        '---Trim the byte array to the number of bytes actually read---
        ReDim Preserve data(totalCount - 1)
        Return data
    Catch ex As Exception
        MsgBox(ex.ToString())
        Return Nothing
    End Try
End Function
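As a point of comparison, here is a sketch of a different technique from the one the article uses: instead of growing a byte array with ReDim Preserve, it reads into a fixed-size buffer and appends each chunk to a MemoryStream, which manages its own growth. The module name ReadAllSketch and the function name ReadAllBytesFromStream are hypothetical, and the 4,096-byte block size in Main is an arbitrary choice for the example.

```vb
Imports System
Imports System.IO

Module ReadAllSketch
    '---Hypothetical alternative to RetrieveBytesFromStream: a fixed-size read
    '   buffer plus a MemoryStream that grows on its own---
    Function ReadAllBytesFromStream(ByVal source As Stream, ByVal blockSize As Integer) As Byte()
        Dim buffer(blockSize - 1) As Byte
        Dim collected As New MemoryStream()
        Dim bytesRead As Integer = source.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            '---Append only the bytes that were actually read---
            collected.Write(buffer, 0, bytesRead)
            bytesRead = source.Read(buffer, 0, buffer.Length)
        End While
        Return collected.ToArray()
    End Function

    Sub Main()
        '---10,000 zero bytes as sample input---
        Dim payload(9999) As Byte
        Dim data() As Byte = ReadAllBytesFromStream(New MemoryStream(payload), 4096)
        Console.WriteLine("Bytes read: " & data.Length)
    End Sub
End Module
```

The trade-off is one extra copy when ToArray() is called, in exchange for not having to pick the block size as carefully.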

Note that in the Decompress() function, you call the RetrieveBytesFromStream() function, as shown below:

dc_data = RetrieveBytesFromStream(zipStream, data.Length)

The block size is the size of the compressed data (data.Length). In most cases, the decompressed data is several times larger than the compressed data (as the compression ratio shows), so the byte array will be expanded several times at run time. For example, if the compression ratio is 20% and the compressed data is 2MB, the decompressed data will be 10MB, and the byte array will be expanded about five times. Ideally, the byte array should not be expanded too often at run time, because that slows the application down considerably. Using the size of the compressed data as the block size is therefore a reasonable compromise.
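The arithmetic in this example can be sketched as follows. It is an idealized estimate that assumes every read fills a whole block; a real stream may return fewer bytes per read, so the actual number of expansions can be higher. The module name BlockGrowthSketch and the function names are hypothetical.

```vb
Imports System

Module BlockGrowthSketch
    '---Hypothetical helper: decompressed size implied by a compressed size and a
    '   compression ratio given as a percentage (compressed / original * 100)---
    Function EstimateDecompressedSize(ByVal compressedSize As Long, ByVal ratioPercent As Double) As Double
        Return compressedSize * 100 / ratioPercent
    End Function

    '---Hypothetical helper: number of blocks of compressedSize bytes needed to
    '   hold the decompressed data, assuming every read fills a whole block---
    Function EstimateExpansions(ByVal compressedSize As Long, ByVal ratioPercent As Double) As Long
        Dim decompressedSize As Double = EstimateDecompressedSize(compressedSize, ratioPercent)
        Return CLng(Math.Ceiling(decompressedSize / compressedSize))
    End Function

    Sub Main()
        '---2 MB of compressed data at a 20% compression ratio---
        Dim compressedSize As Long = 2 * 1024 * 1024
        Console.WriteLine("Decompressed bytes: " & EstimateDecompressedSize(compressedSize, 20))
        Console.WriteLine("Expansions: " & EstimateExpansions(compressedSize, 20))
    End Sub
End Module
```

For the figures used in the text (2MB at 20%), this works out to 10MB of decompressed data held in five blocks of 2MB each.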