I recently had a requirement: compute the MD5 checksums of many files to determine whether any of them are duplicates. There are a lot of files, some of them fairly large, and the files to be processed are not all available yet, so I looked into efficiency first.
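I currently know of two ways to get an MD5 checksum in Go: read the whole file into memory and hash it in one call, or stream the file into a hash object.
The source code for both is given directly below (aaa is the first method, bbb the second).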
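package main

import (
	"crypto/md5"
	"flag"
	"fmt"
	"io"
	"io/ioutil"
	"os"
)

var which = flag.Bool("which", true, "")
var path = flag.String("path", "", "")
var cnt = flag.Int("cnt", 100, "")

// aaa reads the whole file into memory, then hashes it with md5.Sum.
func aaa() {
	f, err := os.Open(*path)
	if err != nil {
		fmt.Println("Open", err)
		return
	}
	defer f.Close()

	body, err := ioutil.ReadAll(f)
	if err != nil {
		fmt.Println("ReadAll", err)
		return
	}

	md5.Sum(body)
	//fmt.Printf("%x\n", md5.Sum(body))
}

// bbb streams the file into an MD5 hash with io.Copy.
func bbb() {
	f, err := os.Open(*path)
	if err != nil {
		fmt.Println("Open", err)
		return
	}
	defer f.Close()

	md5hash := md5.New()
	if _, err := io.Copy(md5hash, f); err != nil {
		fmt.Println("Copy", err)
		return
	}

	md5hash.Sum(nil)
	//fmt.Printf("%x\n", md5hash.Sum(nil))
}

func main() {
	flag.Parse()
	for i := 0; i < *cnt; i++ {
		if *which {
			aaa()
		} else {
			bbb()
		}
	}
}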
For reference, there is also a shell command that computes an MD5 checksum:
md5 -- calculate a message-digest fingerprint (checksum) for a file
md5 [-pqrtx] [-s string] [file ...]
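The test file is a log file from a company project (7285957 bytes). A second test file of exactly twice that size (14571914 bytes) was built from it with cp and cat: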
banjakukutekiiMac:shell panshiqu$ ls -an | grep by
-rw-r--r-- 1 501 20 7285957 11 17 16:14
banjakukutekiiMac:shell panshiqu$ cp
banjakukutekiiMac:shell panshiqu$ cat >>
banjakukutekiiMac:shell panshiqu$ ls -an | grep by
-rw-r--r-- 1 501 20 7285957 11 17 16:14
-rw-r--r-- 1 501 20 14571914 11 17 17:03
The timings are as follows:
banjakukutekiiMac:shell panshiqu$ time ./gomd5 -cnt=1 -which=true -path=""
real 0m0.027s
user 0m0.017s
sys 0m0.012s
banjakukutekiiMac:shell panshiqu$ time ./gomd5 -cnt=1 -which=true -path=""
real 0m0.048s
user 0m0.033s
sys 0m0.018s
banjakukutekiiMac:shell panshiqu$ time ./gomd5 -cnt=1 -which=false -path=""
real 0m0.018s
user 0m0.012s
sys 0m0.004s
banjakukutekiiMac:shell panshiqu$ time ./gomd5 -cnt=1 -which=false -path=""
real 0m0.031s
user 0m0.024s
sys 0m0.005s
banjakukutekiiMac:shell panshiqu$ time md5
MD5 () = 9d79e19a00cef1ae1bb6518ca4adf9de
real 0m0.023s
user 0m0.019s
sys 0m0.006s
banjakukutekiiMac:shell panshiqu$ time md5
MD5 () = 0a029a460a20e8dcb00d032d6fab74c6
real 0m0.042s
user 0m0.037s
sys 0m0.009s
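Summary:
Whichever method is used, the time grows with the file size. In the runs above, doubling the file size took a little under twice as long (for example 0.027s versus 0.048s).
The streaming method (bbb, run with -which=false) was the fastest in these runs, ahead of both the read-everything method and the md5 command, so that is the one I recommend.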
Supplement: the efficiency of MD5 calculation methods in Go
I studied the ways of computing an MD5 string in Go. The fastest approach I have found so far is to call md5.Sum(), which returns a 16-byte checksum, then map the high 4 bits and the low 4 bits of each byte to a hexadecimal character each, which yields 32 bytes, and finally convert those bytes into a string.
In the benchmark below, FastMD5 takes at least 46% less time per operation than the other variants (205 ns/op versus 379 ns/op for the next fastest).
const hextable = "0123456789abcdef"

// Author: pengpengzhou
func FastMD5(str string) string {
	src := md5.Sum([]byte(str))
	var dst = make([]byte, 32)
	j := 0
	for _, v := range src {
		dst[j] = hextable[v>>4]     // high 4 bits
		dst[j+1] = hextable[v&0x0f] // low 4 bits
		j += 2
	}
	return string(dst)
}
Go benchmark results:
goos: linux
goarch: amd64
pkg: example
BenchmarkFastMD5-4    5564898    205 ns/op
BenchmarkV1-4         3461698    379 ns/op
BenchmarkV2-4         2277235    516 ns/op
BenchmarkV3-4         2158122    527 ns/op
PASS
ok example 6.440s
The detailed code is as follows:
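package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
)

const hextable = "0123456789abcdef"

// FastMD5 hashes with md5.Sum and hex-encodes by hand, one nibble at a time.
func FastMD5(str string) string {
	src := md5.Sum([]byte(str))
	var dst = make([]byte, 32)
	j := 0
	for _, v := range src {
		dst[j] = hextable[v>>4]
		dst[j+1] = hextable[v&0x0f]
		j += 2
	}
	return string(dst)
}

// md5V1 uses a hash object and hex.EncodeToString.
func md5V1(str string) string {
	h := md5.New()
	h.Write([]byte(str))
	return hex.EncodeToString(h.Sum(nil))
}

// md5V2 uses md5.Sum and formats the result with fmt.Sprintf.
func md5V2(str string) string {
	data := []byte(str)
	has := md5.Sum(data)
	md5str := fmt.Sprintf("%x", has)
	return md5str
}

// md5V3 writes the string into a hash object and formats with fmt.Sprintf.
func md5V3(str string) string {
	w := md5.New()
	io.WriteString(w, str)
	md5str := fmt.Sprintf("%x", w.Sum(nil))
	return md5str
}

func main() {
	str := "Chinese"
	fmt.Println(FastMD5(str))
	fmt.Println(md5V1(str))
	fmt.Println(md5V2(str))
	fmt.Println(md5V3(str))
}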
package main

import (
	"testing"
)

var str = "Golang Chinese Tutorial"

func BenchmarkFastMD5(b *testing.B) {
	for i := 0; i < b.N; i++ {
		FastMD5(str)
	}
}

func BenchmarkV1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5V1(str)
	}
}

func BenchmarkV2(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5V2(str)
	}
}

func BenchmarkV3(b *testing.B) {
	for i := 0; i < b.N; i++ {
		md5V3(str)
	}
}
The above is my personal experience. I hope it serves as a useful reference. If there are mistakes or anything I have not fully considered, corrections are welcome.