Overseer mainly completes three functions:
1. Lossless shutdown of the connection, 2. Smooth restart of the connection, 3. Automatic restart of file changes.
Let’s talk about it in turn:
1. Lossless closing of the connection
The official net package of golang does not support lossless closing of connections. When the main listening coroutine exits, it will not wait for the processing of each actual work coroutine to complete.
The following is the official golang code:
Go/src/net/http/
func (srv *Server) Serve(l ) error { if fn := testHookServerServe; fn != nil { fn(srv, l) // call hook with unwrapped listener } origListener := l l = &onceCloseListener{Listener: l} defer () if err := srv.setupHTTP2_Serve(); err != nil { return err } if !(&l, true) { return ErrServerClosed } defer (&l, false) baseCtx := () if != nil { baseCtx = (origListener) if baseCtx == nil { panic("BaseContext returned a nil context") } } var tempDelay // how long to sleep on accept failure ctx := (baseCtx, ServerContextKey, srv) for { rw, err := () if err != nil { if () { return ErrServerClosed } if ne, ok := err.(); ok && () { if tempDelay == 0 { tempDelay = 5 * } else { tempDelay *= 2 } if max := 1 * ; tempDelay > max { tempDelay = max } ("http: Accept error: %v; retrying in %v", err, tempDelay) (tempDelay) continue } return err } connCtx := ctx if cc := ; cc != nil { connCtx = cc(connCtx, rw) if connCtx == nil { panic("ConnContext returned nil") } } tempDelay = 0 c := (rw) (, StateNew, runHooks) // before Serve can return go (connCtx) } }
When the listening socket is closed and () exits the loop, the go (connCtx) coroutine is not waiting for the completion of processing.
The overseer process is to wrap golang's listening socket and connection socket, and provide support for the main coroutine to wait for the work coroutine to complete asynchronously.
The overseer code is as follows:
overseer-v1.1.6\
func (l *overseerListener) Accept() (, error) { conn, err := .(*).AcceptTCP() if err != nil { return nil, err } (true) // see (3 * ) // see uconn := overseerConn{ Conn: conn, wg: &, closed: make(chan bool), } go func() { //connection watcher select { case <-: () case <-: //closed manually } }() (1) return uconn, nil } //non-blocking trigger close func (l *overseerListener) release(timeout ) { //stop accepting connections - release fd = () //start timer, close by force if deadline not met waited := make(chan bool) go func() { () waited <- true }() go func() { select { case <-(timeout): close() case <-waited: //no need to force close } }() } //blocking wait for close func (l *overseerListener) Close() error { () return } func (o overseerConn) Close() error { err := () if err == nil { () <- true } return err }
In the (l *overseerListener) Accept function, every time a work connection is generated, (1), and in the (o overseerConn) Close function, every time a work connection is closed, () is executed.
In the asynchronous close mode (l *overseerListener) release function and in the synchronous close mode (l *overseerListener) Close function, () will be called to wait for the work coroutine to complete processing.
Listen to socket closing process:
1. The work process receives a restart signal, or the master process receives a restart signal and forwards it to the work process.
2. The signal processing of the work process includes a call to (l *overseerListener) release.
3. Close the listening socket in (l *overseerListener) release and asynchronously().
4. In the official package net/http/, an error returns () in the (srv *Server) Server, exit the listening loop, and then execute defer (), that is, (l *overseerListener) Close.
5. Execute() synchronously in (l *overseerListener) Close, waiting for the work connection processing to complete.
6. When the work connection processing is completed, the (o overseerConn) Close() will be called, and then () will be called.
7. After all work connection processing is completed, send SIGUSR1 signal to the master process.
8. After the master process receives the SIGUSR1 signal, it writes true to the pipeline.
9. In the (mp *master) fork in the master process, after receiving it, end this fork and enter the next fork.
2. Smooth restart of connection
The so-called smooth restart means that the restart will not cause the client to be disconnected and has no perception of the client. For example, the original queued connection will not be discarded, so the listening socket is passed between the old and new work processes through the master process, rather than the newly opened work process recreating the listening connection.
Listening sockets are created by the master process:
overseer-v1.1.6/proc_master.go
func (mp *master) retreiveFileDescriptors() error { = make([]*, len()) for i, addr := range { a, err := ("tcp", addr) if err != nil { return ("Invalid address %s (%s)", addr, err) } l, err := ("tcp", a) if err != nil { return err } f, err := () if err != nil { return ("Failed to retreive fd for: %s (%s)", addr, err) } if err := (); err != nil { return ("Failed to close listener for: %s (%s)", addr, err) } [i] = f } return nil }
Get the address from it, establish a listening connection, and finally save the file handle.
In this process, (l *TCPListener) Close is called, but it actually has no impact on the work process. The only impact is that the master process cannot read and write listening sockets.
Here we refer to the difference between network socket close and shutdown:
close --- Close the socket id of this process, but the connection is still open. Other processes using this socket id can also use this connection to read or write this socket id.
shutdown --- This destroys the socket connection. When reading, it may detect the EOF ending symbol. When writing, it may receive a SIGPIPE signal. This signal may not be received until the socket buffer is filled. Shutdown also has a closed parameter. 0 cannot be read, 1 cannot be written, and 2 cannot be read or written.
Pass to the child process, i.e. the work process:
overseer-v1.1.6/proc_master.go
func (mp *master) fork() error { ("starting %s", ) cmd := () //mark this new process as the "active" slave process. //this process is assumed to be holding the socket files. = cmd ++ //provide the slave process with some state e := () e = append(e, envBinID+"="+()) e = append(e, envBinPath+"="+) e = append(e, envSlaveID+"="+()) e = append(e, envIsSlave+"=1") e = append(e, envNumFDs+"="+(len())) = e //inherit master args/stdfiles = = = = //include socket files = if err := (); err != nil { return ("Failed to start slave process: %s", err) } //was scheduled to restart, notify success if { = () = false <- true } //convert wait into channel cmdwait := make(chan error) go func() { cmdwait <- () }() //wait.... select { case err := <-cmdwait: //program exited before releasing descriptors //proxy exit code out to master code := 0 if err != nil { code = 1 if exiterr, ok := err.(*); ok { if status, ok := ().(); ok { code = () } } } ("prog exited with %d", code) //if a restarts are disabled or if it was an //unexpected crash, proxy this exit straight //through to the main process if || ! { (code) } case <-: //if descriptors are released, the program //has yielded control of its sockets and //a parallel instance of the program can be //started safely. it should serve //to ensure downtime is kept at <1sec. The previous //() will still be consumed though the //result will be discarded. } return nil }
Pass the socket to the child process through the = statement, and this parameter is finally passed to the fork system call, and the passed fd will be inherited by the child process.
The child process, that is, the work process, handles inherited sockets:
overseer-v1.1.6/proc_slave.go
func (sp *slave) run() error { = (envSlaveID) ("run") = true = (envBinID) = () = = = make(chan bool, 1) = (envBinPath) if err := (); err != nil { return err } if err := (); err != nil { return err } () //run program with state ("start program") () return nil } func (sp *slave) initFileDescriptors() error { //inspect file descriptors numFDs, err := ((envNumFDs)) if err != nil { return ("invalid %s integer", envNumFDs) } = make([]*overseerListener, numFDs) = make([], numFDs) for i := 0; i < numFDs; i++ { f := (uintptr(3+i), "") l, err := (f) if err != nil { return ("failed to inherit file descriptor: %d", i) } u := newOverseerListener(l) [i] = u [i] = u } if len() > 0 { = [0] } return nil }
The child process just repacks the socket and does not create a new listening connection. It is wrapped into the type u := newOverseerListener(l). These listening sockets are finally passed to (), that is, the user's startup program:
overseer-v1.1.6/example/
// convert your 'main()' into a 'prog(state)' // 'prog()' is run in a child process func prog(state ) { ("app#%s (%s) listening...\n", BuildID, ) ("/", (func(w , r *) { d, _ := (().Get("d")) (d) (w, "app#%s (%s) %v says hello\n", BuildID, , ) })) (, nil) ("app#%s (%s) exiting...\n", BuildID, ) } // then create another 'main' which runs the upgrades // 'main()' is run in the initial process func main() { ({ Program: prog, Address: ":5001", Fetcher: &{Path: "my_app_next"}, Debug: true, //display log of overseer actions TerminateTimeout: 10 * , }) }
In the user program (, nil) call:
1. The accept method used is the packaged (l *overseerListener) Accept().
2. Defer () is also wrapped (l *overseerListener) Close().
3. The work connection created by (l *overseerListener) Accept() is also wrapped as an overseerConn connection. (o overseerConn) Close() will be called when closed.
3. Automatic restart of file changes
It can automatically monitor file changes, and automatically trigger the restart process when there are changes.
Check the configuration when the master process starts, and if set, enter fetchLoop:
overseer-v1.1.6/proc_master.go
// fetchLoop is run in a goroutine func (mp *master) fetchLoop() { min := (min) for { t0 := () () //duration fetch of fetch diff := ().Sub(t0) if diff < min { delay := min - diff //ensures at least MinFetchInterval delay. //should be throttled by the fetcher! (delay) } } }
The default is 1 second, which means that changes are checked every second. Type, can set smaller granularity.
The fetchers that have been supported include: fetcher_file.go, fetcher_github.go, fetcher_http.go, fetcher_s3.go.
Take fetcher_file.go as an example.
1. Judgment of document changes:
overseer-v1.1.6/proc_master.go
//tee off to sha1 hash := () reader = (reader, hash) //write to a temp file _, err = (tmpBin, reader) if err != nil { ("failed to write temp binary: %s", err) return } //compare hash newHash := (nil) if (, newHash) { ("hash match - skip") return }
Implemented through the sha1 algorithm, the new and old hash values are compared, and the file timestamp is not paid attention to.
2. Verification is an executable file and supports overseer:
overseer-v1.1.6/proc_master.go
tokenIn := token() cmd := (tmpBinPath) = append((), []string{envBinCheck + "=" + tokenIn}...) = returned := false go func() { (5 * ) if !returned { ("sanity check against fetched executable timed-out, check overseer is running") if != nil { () } } }() tokenOut, err := () returned = true if err != nil { ("failed to run temp binary: %s (%s) output \"%s\"", err, tmpBinPath, tokenOut) return } if tokenIn != string(tokenOut) { ("sanity check failed") return }
This is implemented through the code pre-embedded overseer:
overseer-v1.1.6/
//sanityCheck returns true if a check was performed func sanityCheck() bool { //sanity check if token := (envBinCheck); token != "" { (, token) return true } //legacy sanity check using old env var if token := (envBinCheckLegacy); token != "" { (, token) return true } return false }
This code will be called when main starts, passing a fixed environment variable, and then the command line output will be displayed as it is, which is a success.
3. Overwrite the old files and trigger a restart.
overseer-v1.1.6/proc_master.go
//overwrite! if err := overwrite(, tmpBinPath); err != nil { ("failed to overwrite binary: %s", err) return } ("upgraded binary (%x -> %x)", [:12], newHash[:12]) = newHash //binary successfully replaced if ! { () }
Enter the restart process from (mp *master) triggerRestart:
overseer-v1.1.6/proc_master.go
func (mp *master) triggerRestart() { if { ("already graceful restarting") return //skip } else if == nil || { ("no slave process") return //skip } ("graceful restart triggered") = true mp.awaitingUSR1 = true = () () //ask nicely to terminate select { case <-: //success ("restart success") case <-(): //times up mr. process, we did ask nicely! ("graceful timeout, forcing exit") () } }
Send a signal to the child process. After the child process receives the signal, it closes the listening socket and then sends a SIGUSR1 signal to the parent process:
overseer-v1.1.6/proc_slave.go
if len() > 0 { //perform graceful shutdown for _, l := range { () } //signal release of held sockets, allows master to start //a new process before this child has actually exited. //early restarts not supported with restarts disabled. if ! { (SIGUSR1) } //listeners should be waiting on connections to close... }
After the parent process receives the SIGUSR1 signal, it notifies the pipeline listening socket that it has been closed:
overseer-v1.1.6/proc_master.go
//**during a restart** a SIGUSR1 signals //to the master process that, the file //descriptors have been released if mp.awaitingUSR1 && s == SIGUSR1 { ("signaled, sockets ready") mp.awaitingUSR1 = false <- true } else
Finally, we return to the (mp *master) fork function. The fork function is waiting for notification or the child process exits. After receiving the pipeline notification, fork exits and enters the next round of fork loop.
overseer-v1.1.6/proc_master.go
func (mp *master) fork() error { //... ... //... ... //... ... //convert wait into channel cmdwait := make(chan error) go func() { cmdwait <- () }() //wait.... select { case err := <-cmdwait: //program exited before releasing descriptors //proxy exit code out to master code := 0 if err != nil { code = 1 if exiterr, ok := err.(*); ok { if status, ok := ().(); ok { code = () } } } ("prog exited with %d", code) //if a restarts are disabled or if it was an //unexpected crash, proxy this exit straight //through to the main process if || ! { (code) } case <-: //if descriptors are released, the program //has yielded control of its sockets and //a parallel instance of the program can be //started safely. it should serve //to ensure downtime is kept at <1sec. The previous //() will still be consumed though the //result will be discarded. } return nil }
The above is a detailed explanation of the implementation principle of the Go smooth restart library overseer. For more information about the Go smooth restart library overseer, please pay attention to my other related articles!