Performance comparison of three ways to calculate MD5 in Go

Preface

This article compares three ways to calculate the MD5 of a file. The real difference between them is how the file is read, that is, disk I/O, so the same lessons apply to network I/O as well. Let's take a look.

ReadFile

Let's start with the first method, which is simple and crude:

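// imports: crypto/md5, fmt, io/ioutil
func md5sum1(file string) string {
	// ReadFile loads the whole file into memory before hashing.
	data, err := ioutil.ReadFile(file)
	if err != nil {
		return ""
	}

	return fmt.Sprintf("%x", md5.Sum(data))
}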

It is crude because ReadFile calls readAll internally, which makes it the method that allocates the most memory.

Benchmark:

// imports: testing
var test_path = "/path/to/file"

func BenchmarkMd5Sum1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum1(test_path)
	}
}

$ go test -run=none -bench="^BenchmarkMd5Sum1$" -benchtime=10s -benchmem

BenchmarkMd5Sum1-4 300 43704982 ns/op 19408224 B/op 14 allocs/op
PASS
ok tmp 17.446s

Note that the file is 19405028 bytes, which is very close to the 19408224 B/op above. That is because readAll allocates a buffer of roughly the file size, as the source code shows:

ReadFile source code

// ReadFile reads the file named by filename and returns the contents.
// A successful call returns err == nil, not err == EOF. Because ReadFile
// reads the whole file, it does not treat an EOF from Read as an error
// to be reported.
func ReadFile(filename string) ([]byte, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	// It's a good but not certain bet that FileInfo will tell us exactly how much to
	// read, so let's try it but be prepared for the answer to be wrong.
	var n int64

	if fi, err := f.Stat(); err == nil {
		// Don't preallocate a huge buffer, just in case.
		if size := fi.Size(); size < 1e9 {
			n = size
		}
	}
	// As initial capacity for readAll, use n + a little extra in case Size is zero,
	// and to avoid another allocation after Read has filled the buffer. The readAll
	// call will read into its allocated internal buffer cheaply. If the size was
	// wrong, we'll either waste some space off the end or reallocate as needed, but
	// in the overwhelmingly common case we'll get it just right.

	// The second argument is the capacity of the buffer to be created.
	return readAll(f, n+bytes.MinRead)
}

func readAll(r io.Reader, capacity int64) (b []byte, err error) {
	// The capacity of this buffer is the file size + bytes.MinRead.
	buf := bytes.NewBuffer(make([]byte, 0, capacity))
	// If the buffer overflows, we will get bytes.ErrTooLarge.
	// Return that as an error. Any other panic remains.
	defer func() {
		e := recover()
		if e == nil {
			return
		}
		if panicErr, ok := e.(error); ok && panicErr == bytes.ErrTooLarge {
			err = panicErr
		} else {
			panic(e)
		}
	}()
	_, err = buf.ReadFrom(r)
	return buf.Bytes(), err
}

Now the second method, which uses io.Copy:

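// imports: crypto/md5, fmt, io, os
func md5sum2(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()

	h := md5.New()

	// Stream the file into the hash instead of loading it all at once.
	_, err = io.Copy(h, f)
	if err != nil {
		return ""
	}

	return fmt.Sprintf("%x", h.Sum(nil))
}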

What distinguishes the second method is that it uses io.Copy. In the general case (the special case is covered below), io.Copy allocates a buffer of 32 * 1024 bytes, that is, 32 KB, on each call. Here is the benchmark:

func BenchmarkMd5Sum2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum2(test_path)
	}
}

$ go test -run=none -bench="^BenchmarkMd5Sum2$" -benchtime=10s -benchmem

BenchmarkMd5Sum2-4 500 37538305 ns/op 33093 B/op 8 allocs/op
PASS
ok tmp 22.657s

32 * 1024 = 32768, which is very close to the 33093 B/op above.

Now the third method.

This time we use not only io.Copy but also bufio.Reader. As the name suggests, bufio provides buffered I/O, which generally performs well. By default, bufio.NewReader creates a 4096-byte buffer.

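// imports: bufio, crypto/md5, fmt, io, os
func md5sum3(file string) string {
	f, err := os.Open(file)
	if err != nil {
		return ""
	}
	defer f.Close()
	// Wrap the file in a buffered reader (4096-byte buffer by default).
	r := bufio.NewReader(f)

	h := md5.New()

	_, err = io.Copy(h, r)
	if err != nil {
		return ""
	}

	return fmt.Sprintf("%x", h.Sum(nil))
}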

Here is the benchmark:

func BenchmarkMd5Sum3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5sum3(test_path)
	}
}

$ go test -run=none -bench="^BenchmarkMd5Sum3$" -benchtime=10s -benchmem
BenchmarkMd5Sum3-4 300 42589812 ns/op 4507 B/op 9 allocs/op
PASS
ok tmp 16.817s

The 4507 B/op above is very close to 4096. But why does io.Copy + bufio.Reader use less memory than io.Copy alone? As mentioned above, io.Copy normally allocates 32 * 1024 bytes per call. So what is the special case? The answer is in the source code.

Let's take a look at the relevant source code:

func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}

// copyBuffer is the actual implementation of Copy and CopyBuffer.
// if buf is nil, one is allocated.
func copyBuffer(dst Writer, src Reader, buf []byte) (written int64, err error) {
	// If the reader has a WriteTo method, use it to do the copy.
	// Avoids an allocation and a copy.

	// In md5sum3 the source is a *bufio.Reader, which implements WriteTo,
	// so the copy returns here and never allocates the 32 KB buffer below.
	if wt, ok := src.(WriterTo); ok {
		return wt.WriteTo(dst)
	}
	// Similarly, if the writer has a ReadFrom method, use it to do the copy.
	// The md5 hash used as dst does not implement ReadFrom, so this branch is not taken.
	if rt, ok := dst.(ReaderFrom); ok {
		return rt.ReadFrom(src)
	}

	if buf == nil {
		buf = make([]byte, 32*1024)
	}
	for {
		nr, er := src.Read(buf)
		if nr > 0 {
			nw, ew := dst.Write(buf[0:nr])
			if nw > 0 {
				written += int64(nw)
			}
			if ew != nil {
				err = ew
				break
			}
			if nr != nw {
				err = ErrShortWrite
				break
			}
		}
		if er == EOF {
			break
		}
		if er != nil {
			err = er
			break
		}
	}
	return written, err
}

Judging from the source above, bufio.Reader implements WriteTo, so io.Copy does not reach the default buffer-allocation path. It returns early and uses the buffer that bufio.Reader already created, which is why less memory is allocated.

Of course, the io.Copy approach can also be made to allocate less: use io.CopyBuffer and pass in a 4096-byte []byte as buf. The result should then differ little from the bufio.Reader version.

See if this is the case:

// md5sum2 re-implemented with io.CopyBuffer, buf := make([]byte, 4096)
BenchmarkMd5Sum2-4  500 38484425 ns/op 4409 B/op  8 allocs/op
BenchmarkMd5Sum3-4  500 38671090 ns/op 4505 B/op  9 allocs/op

Judging from the results, the allocated memory differs only slightly. The two implementations are different, so the numbers cannot match exactly.

So the next time you write a program that downloads large files, will you still reach for ioutil.ReadAll()?

Finally, the overall comparison of Benchmark's situation:

$ go test -run=none -bench="." -benchtime=10s -benchmem
testing: warning: no tests to run
BenchmarkMd5Sum1-4  300 42551920 ns/op 19408230 B/op  14 allocs/op
BenchmarkMd5Sum2-4  500 38445352 ns/op 33089 B/op  8 allocs/op
BenchmarkMd5Sum3-4  500 38809429 ns/op 4505 B/op  9 allocs/op
PASS
ok tmp 63.821s

Summary

The three MD5 calculation methods take almost the same time to run; the biggest difference is in memory allocation.

bufio allocated the least memory in these benchmarks, so it is the preferred choice for this kind of I/O.

Avoid ReadAll (and ReadFile, which calls it) for large inputs whenever possible.
