Preface
The bufio package is part of the Go standard library. It implements buffered I/O, that is, a read and write buffer wrapped around a reader or writer. Several standard library packages that deal with I/O rely on it; for example, the net/http package uses bufio to read and write network data, and the archive/zip package uses it to read and write file data.
The SplitFunc type defined in Go's bufio package is relatively important and somewhat difficult to understand. This article introduces how SplitFunc works and, through simple examples, how to implement your own.
An example
The bufio package defines some commonly used tools, such as Scanner. Suppose you need to read what the user types on standard input; for example, we can write a repeater that reads each line the user enters and prints it back out:
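package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	// os.Stdin is the reader the scanner pulls data from.
	scanner := bufio.NewScanner(os.Stdin)
	// Split the input into lines.
	scanner.Split(bufio.ScanLines)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}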
This program is very simple. os.Stdin implements the io.Reader interface, so we create a scanner from that reader, set the split function to bufio.ScanLines, and then run a for loop: every time a line of data is read, we print its text. As the saying goes, the sparrow may be small, but it has all its organs. Simple as it is, this mini program leads us to the subject of today's article, bufio.SplitFunc. Its definition looks like this:
package bufio

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
The official Go documentation describes it like this:
SplitFunc is the signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, if any, plus an error, if any.
Scanning stops if the function returns an error, in which case some of the input may be discarded.
Otherwise, the Scanner advances the input. If the token is not nil, the Scanner returns it to the user. If the token is nil, the Scanner reads more data and continues scanning; if there is no more data--if atEOF was true--the Scanner returns. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, a SplitFunc can return (0, nil, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.
The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.
It's all in English! So many parameters! So many return values! Quite a headache! I don't know whether other readers feel the same way when they run into documentation like this. That is why I decided to write an article that explains how SplitFunc actually works, in plain terms and with concrete examples. I hope it will be helpful.
Okay, enough chatter; let's get to the point!
How Scanner and SplitFunc work
package bufio

type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
Scanner is buffered: it maintains a slice that holds data already read from the Reader but not yet consumed. Scanner calls the SplitFunc we set, passing it the buffer contents (data) and a flag that says whether the input is exhausted (atEOF). Based on these two parameters, SplitFunc is responsible for returning how many bytes to move forward before the next scan (advance), the token that was split off (token), and any error (err).
This is two-way communication. Scanner tells our SplitFunc what data is currently available and whether the end of the input has been reached; our SplitFunc returns the split result and tells Scanner how far to advance before the next scan. An example illustrates this:
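package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	input := "abcdefghijkl"
	scanner := bufio.NewScanner(strings.NewReader(input))
	// A split function that only reports what it was given and always asks for more data.
	split := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
		fmt.Printf("%t\t%d\t%s\n", atEOF, len(data), data)
		return 0, nil, nil
	}
	scanner.Split(split)
	// Start with a 2-byte buffer; the second argument is the maximum buffer size.
	buf := make([]byte, 2)
	scanner.Buffer(buf, bufio.MaxScanTokenSize)
	for scanner.Scan() {
		fmt.Printf("%s\n", scanner.Text())
	}
}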
Output
false 2 ab
false 4 abcd
false 8 abcdefgh
false 12 abcdefghijkl
true 12 abcdefghijkl
Here we set the initial size of the buffer to 2. Whenever the buffer is full, Scanner doubles its size, up to the maximum size passed as the second argument to Buffer. So after the first read of 2 bytes the buffer is already full, the reader has not yet reported EOF, and the split function is called, printing:
false 2 ab
The function then returns 0, nil, nil, which tells the Scanner that there is not enough data yet. Because advance is 0, the next call starts from the same position in the input, and more data has to be read from the reader first. Since the buffer is full, its capacity is doubled to 2 * 2 = 4. The reader still has not reached EOF, so the output is:
false 4 abcd
These steps repeat until the entire content has been read, at which point atEOF becomes true:
true 12 abcdefghijkl
Having followed this process, do you have a better feel for how SplitFunc works? If you look back at the official Go documentation now, it should make more sense. Below is the implementation of bufio.ScanLines; readers can study how the function works on their own.
ScanLines from the standard library
func ScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	// At EOF with no data left: scanning is finished.
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	// Find the position of '\n'.
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// Move the start of the next read forward by i + 1 bytes and return the line.
		return i + 1, dropCR(data[0:i]), nil
	}
	// The reader has been fully read but data is not empty,
	// so the remaining data is returned as the final line.
	if atEOF {
		return len(data), dropCR(data), nil
	}
	// No complete line can be split off yet: request more data from the Reader.
	return 0, nil, nil
}
Reference
In-depth introduction to in Golang
Summary
That is all for this article. I hope it is of some reference value for your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.