GoLang Coroutine Library libtask Learning Notes

What problems did coroutine solve

Let’s take the read operation during a network IO request as an example. The requested data will be copied to the system kernel space first, and then from the operating system kernel space to the application user space. There are two stages in the process of copying data from kernel space to user space:

Wait for data preparation
Copy data

Because of these two stages, there are various network IO models:

Synchronous programming: The application waits for IO results (such as waiting for a large file to be opened, or waiting for a response from a remote server), blocking the current thread.

Advantages: Simple logic.
Disadvantages: The efficiency is too low, and other IO-independent services also have to wait for the IO response.

Asynchronous multithreading/process: separates logic with frequent IO operations or simple IO operations into one/multiple threads, and data is shared between business threads and IO threads by communication/global variables.

Advantages: Make full use of CPU resources to prevent blocking resources.
Disadvantages: Thread switching is relatively expensive and the asynchronous logic code is complex.

Asynchronous message + callback function: Design a message loop processor to receive external messages (including system notifications and network messages, etc.), and call the registered callback function when receiving the message.

Advantages: Make full use of CPU resources to prevent blocking resources.
Disadvantages: The code logic is complex.

Coroutines use synchronous semantics to solve the asynchronous problem, that is, the business logic seems to be synchronous, but in fact it does not block the current thread (usually, it relies on event loop processing to distribute messages). Therefore, coroutines are actually application-level concurrency implemented in a single-threaded environment, which is to implement the switching + saving state originally controlled by the operating system in the application.

Since coroutines handle tasks at the application level, coroutines are more like a function, but with two more actions than ordinary functions:yield()andresume(), that is, give up and restore. When giving up, we need to save the coroutine context of the register, and when recovering, the context is re-pressed into the register and continue execution.

Introduction

Libtask is a simple coroutine library that shows the simplest way to implement coroutines. The operating system can only see one kernel thread and cannot sense the existence of the client coroutine.

Coroutines in Libtask are collaborative, that is, they do not use the time slice rotation algorithm. The scheduler executes coroutines in the ready queue based on the first-come-first-service policy. The scheduler will assign the CPU to the next coroutine only when each coroutine exits actively.

Abstraction of coroutines

In libtask, coroutines are abstracted into a Task structure, and fields in the structure are used to describe coroutines related information:

// A Task can be regarded as a task that needs to be executed asynchronously, an abstract description of coroutinestruct Task
{
	char	name[256];
	char	state[256];
    // Front and back pointers	Task	*next;
	Task	*prev;
	Task	*allnext;
	Task	*allprev;
    // Execution context	Context	context;
    // Sleep time	uvlong	alarmtime;
	uint	id;
    // Coroutine stack pointer	uchar	*stk;
    // Coroutine stack size	uint	stksize;
    // Whether the coroutine exits	int	exiting;
    // Subscript in index in alltask	int	alltaskslot;
    // Is it a system coroutine	int	system;
    // Is it in ready state	int	ready;
    // Functions that Task needs to be executed	void	(*startfn)(void*);
    // startfn parameters	void	*startarg;
    // Customize data	void	*udata;
};

Create a coroutine

int taskcreate(void (*fn)(void*), void *arg, uint stack)
{
	int id;
	Task *t;
    // Allocate space for task and stack	t = taskalloc(fn, arg, stack);
	// Number of coroutines +1	taskcount++;
	id = t-&gt;id;
	if(nalltask%64 == 0){
		alltask = realloc(alltask, (nalltask+64)*sizeof(alltask[0]));
		if(alltask == nil){
			fprint(2, "out of memory\n");
			abort();
		}
	}
    // Record location	t-&gt;alltaskslot = nalltask;
    // Save to alltask	alltask[nalltask++] = t;
    // The modification status is ready, can be scheduled and added to the ready queue	taskready(t);
	return id;
}

We can usetaskcreateFunctions to create coroutines,taskcreateIn the function, it will be called firsttaskallocThe function allocates memory for the Task and execution stack, and then initializes the context information of the coroutine, after which a coroutine is created successfully.

/*
  * The main logic of the taskalloc function is to apply the memory required for the Task structure and the memory of the execution stack.
  * Then initialize the fields.  After this, a coroutine is created successfully, and then taskready is executed
  * Add coroutines to the ready queue.
  * */
static Task*
taskalloc(void (*fn)(void*), void *arg, uint stack)
{
	Task *t;
	sigset_t zero;
	uint x, y;
	ulong z;
	/* allocate the task and stack together */
	// The size and stack size of the structure itself	// The size of the coroutine stack is 256*1024	t = malloc(sizeof *t+stack);
	if(t == nil){
		fprint(2, "taskalloc malloc: %r\n");
		abort();
	}
	memset(t, 0, sizeof *t);
	// Memory location of the stack	t-&gt;stk = (uchar*)(t+1);
	// Stack size	t-&gt;stksize = stack;
	// Coroutine id	t-&gt;id = ++taskidgen;
	// Coroutine working functions and parameters	t-&gt;startfn = fn;
	t-&gt;startarg = arg;
	/* do a reasonable initialization */
	memset(&amp;t-&gt;, 0, sizeof t-&gt;);
	sigemptyset(&amp;zero);
	// Initialize the uc_sigmask field to empty, that is, it does not block the signal	sigprocmask(SIG_BLOCK, &amp;zero, &amp;t-&gt;.uc_sigmask);
	/* must initialize with current context */
	// Initialize the uc field	if(getcontext(&amp;t-&gt;) &lt; 0){
		fprint(2, "getcontext: %r\n");
		abort();
	}
	/* call makecontext to do the real work. */
	/* leave a few words open on both ends */
	// Set the stack position and size of the coroutine when executing	t-&gt;.uc_stack.ss_sp = t-&gt;stk+8;
	t-&gt;.uc_stack.ss_size = t-&gt;stksize-64;
#if defined(__sun__) &amp;&amp; !defined(__MAKECONTEXT_V2_SOURCE)		/* sigh */
#warning "doing sun thing"
	/* can avoid this with __MAKECONTEXT_V2_SOURCE but only on SunOS 5.9 */
	t-&gt;.uc_stack.ss_sp = 
		(char*)t-&gt;.uc_stack.ss_sp
		+t-&gt;.uc_stack.ss_size;
#endif
	/*
	 * All this magic is because you have to pass makecontext a
	 * function that takes some number of word-sized variables,
	 * and on 64-bit machines pointers are bigger than words.
	 */
//print("make %p\n", t);
	z = (ulong)t;
	y = z;
	z &gt;&gt;= 16;	/* hide undefined 32-bit shift from 32-bit compilers */
	x = z&gt;&gt;16;
	// Save information to the uc field	makecontext(&amp;t-&gt;, (void(*)())taskstart, 2, y, x);
	return t;
}

After creating a coroutine,taskcreateThe function will be calledtaskready()The function changes the state of the coroutine to ready state and adds it to the ready queue.

/*
  * Modify the status of the coroutine and join the ready queue
  * */
void taskready(Task *t)
{
	t-&gt;ready = 1;
	addtask(&amp;taskrunqueue, t);
}

How to save context information

We can find that invokingtaskallocWhen the function initializes the coroutine, we will also initialize the coroutine's context. The process of the following code is to save the context information of the current CPU register to the context of the current Task, and at the same time save the stack position and size of the current Task into the context, and finally save the working function of the coroutine to the context information.

	// Set the context to zero	memset(&amp;t-&gt;, 0, sizeof t-&gt;);
	// Clear the signal set zero	sigemptyset(&amp;zero);
	// Initialize the uc_sigmask field to empty, that is, it does not block the signal	sigprocmask(SIG_BLOCK, &amp;zero, &amp;t-&gt;.uc_sigmask);
	// Save the current context information into the structure	if(getcontext(&amp;t-&gt;) &lt; 0){
		fprint(2, "getcontext: %r\n");
		abort();
	}
	// Set the stack position and size of the coroutine when executing	t-&gt;.uc_stack.ss_sp = t-&gt;stk+8;
	t-&gt;.uc_stack.ss_size = t-&gt;stksize-64;
	z = (ulong)t;
	y = z;
	z &gt;&gt;= 16;
	x = z&gt;&gt;16;
	// Set the coroutine's working function into context information	makecontext(&amp;t-&gt;, (void(*)())taskstart, 2, y, x);

ucontext family functions

In fact, the initialization and storage of coroutine context is achieved through the ucontext family function under Linux.

ucontext_t structure

We found that there is a Context field in the Task structure, which is actually correctucontext_tThe packaging of the structure,ucontext_tA structure is used to store the current context information, and its structure is as follows:

typedef struct ucontext
  {
    unsigned long int uc_flags;
    struct ucontext *uc_link;//Post-order context    __sigset_t uc_sigmask;// Signal mask word mask    stack_t uc_stack;// The stack used by the context    // Saved context register information    // For example, pc, sp, bp    // PC program counter: record the address of the next instruction    // sp stack pointer: a pointer to the top of the function call stack, so new data will be stored in the address of sp+1 on the stack    // bp base address pointer: point to the first address of the function call stack    mcontext_t uc_mcontext;
    long int uc_filler[5];
  } ucontext_t;
// where mcontext_t is defined as followstypedef struct
  {
    gregset_t __ctx(gregs);//Loaded register    fpregset_t __ctx(fpregs);//Register type} mcontext_t;
// where gregset_t is defined as followstypedef greg_t gregset_t[NGREG];//Includes all register information

getcontext() function

Function prototype:

int getcontext(ucontext_t* ucp)

getcontext()The underlying layer of the function is implemented through assembly, and its main function is to save the currently running register information into the parameter ucp.

setcontext() function

Function prototype:

int setcontext(const ucontext_t *ucp)

setcontext()The function of the function is toucontext_tThe context information in the structure variable ucp is restored to the CPU and executed.

makecontext() function

Function prototype:

void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...)

makecontext()The main function of the function is to set the working function of the coroutine into the context (ucontext_t), and save some information on the stack set by the user, and set the value of the top pointer into the context information.

argc is the number of parameters of the entry function, followed by...It is a specific entry function parameter, which must be a shaping value.

swapcontext() function

Function prototype:

int swapcontext(ucontext_t *oucp, ucontext_t *ucp)

This function can save the context information in the current CPU tooucpIn the structure variable, thenucpThe context information of the structure is restored to the CPU.

Here it can be understood that two functions were called, the first time it was calledgetcontext(oucp)Then callsetcontext(ucp)。

Co-process scheduling

In usetaskcreateWhen creating a coroutine, this function will be called internallytaskreadyFunctions modify the status of the newly created coroutine and add to the ready queuetaskrunqueuemiddle,taskrunqueueThe coroutine in it requires a scheduler to schedule execution.

A coroutine dispatch center function is implemented in the tasklib library. The dispatch center will constantly take out coroutines from the ready queue to execute, and its core logic is as follows:

Take out a coroutine t from the ready queue and move t out of the ready queue.
passcontextswitchThe function switches the context information of the coroutine t totaskschedcontextexecute in.
Switch coroutine t back to the schedule center. If t has exited, modify the data structure, then recycle its memory, and then continue to schedule other coroutine execution. The scheduling mechanism here is relatively simple, it is a non-preemptive collaborative scheduling. There is no concept of a time slice. The execution time of a coroutine is determined by yourself. The power to give up execution is also controlled by yourself. When the coroutine does not want to execute, you can call it.taskyield()The function gives out the CPU.

static void taskscheduler(void)
{
	int i;
	Task *t;
	taskdebug("scheduler enter");
	for(;;){
		// If there is no ready coroutine, exit		if(taskcount == 0)
			exit(taskexitval);
		// Take out a coroutine from the ready queue		t = ;
		if(t == nil){
			fprint(2, "no runnable tasks! %d tasks stalled\n", taskcount);
			exit(1);
		}
		// Remove this coroutine from the ready queue		deltask(&amp;taskrunqueue, t);
		 // Change coroutine status to non-ready		t-&gt;ready = 0;
		// Save the executing coroutine		taskrunning = t;
		// Number of switches +1		tasknswitch++;
		taskdebug("run %d (%s)", t-&gt;id, t-&gt;name);
		// Switch to t execution and save the context information in the current CPU to taskschedcontext        // Then restore the context information in t->context to CPU for execution		contextswitch(&amp;taskschedcontext, &amp;t-&gt;context);
		// End of execution		taskrunning = nil;
		// The coroutine t that was just executed exits		if(t-&gt;exiting){
			// If it is not a system coroutine, the number of coroutines will be reduced by one			if(!t-&gt;system)
				taskcount--;
			// Save the index of the current coroutine in alltask			i = t-&gt;alltaskslot;
			// Switch the last coroutine to the current coroutine location because the current coroutine is about to exit			alltask[i] = alltask[--nalltask];
			// Update the index of the replaced coroutine			alltask[i]-&gt;alltaskslot = i;
			// Free heap memory			free(t);
		}
	}
}

From the code executed by the above scheduler, we usecontextswitchThe function implements context switching between coroutines, and the internal call of this function is called.swapcontext(&from->uc, &to->uc)Function, this function saves the context information in the current CPU into the from structure variable, and then restores the context information of the to structure to the CPU to execute.

After execution, the context is switched back to the scheduler center again.

static void contextswitch(Context *from, Context *to)
{
	if(swapcontext(&from->uc, &to->uc) < 0){
		fprint(2, "swapcontext failed: %r\n");
		assert(0);
	}
}

In fact, we can also call ittaskyieldFunctions to control the coroutine to actively give up before it is executed. Then the currently executing task will be inserted into the tail of the ready queue, waiting for subsequent scheduling, and the scheduler will then re-take a task from the head of the ready queue for execution.

/*
  * Coroutines take the initiative to give up the CPU
  * 1. Re-join the actively given up coroutine to the ready queue
  * 2. Mark the status of the current coroutine as giving up
  * 3. Logic for executing coroutine switching
  * */
int taskyield(void)
{
	int n;
	// Number of coroutines	n = tasknswitch;
	// Put the currently actively given up coroutine into the waiting queue	taskready(taskrunning);
	// Mark the current coroutine status as "Give Out"	taskstate("yield");
	// Switch coroutine	taskswitch();
	// Equal to 0 means that there is only one coroutine at present. When scheduling, add 1 to taskswitch, so you need to reduce 1 here.	return tasknswitch - n - 1;
}

We can find that when switching the process, it is actually calledtaskswitch()Function, this function will be called internallycontextswitchFunctions to switch context.

/*
  * Switch coroutine
  * */
void taskswitch(void)
{
	needstack(0);
	// Save the context information in the current CPU into the taskrunning->context structure	// Then restore the scheduling center context to the CPU to execute	contextswitch(&amp;taskrunning-&gt;context, &amp;taskschedcontext);
}

Summarize

So the entire scheduling process is like this:

Each coroutine corresponds to a Task structure. Then the dispatching center will continue to schedule the execution of the coroutine in a first-in-first-out manner. Because there is no preemption mechanism, the dispatching center relies on the coroutine itself to drive it. The coroutine needs to actively give up the CPU and switch the context back to the dispatching center. Only then can the dispatching center perform the next round of scheduling.

Of course we can call ittaskyieldThe function actively gives up the CPU, and it will insert the currently executing task into the tail of the ready queue, waiting for subsequent scheduling, and then the scheduler will re-take a task from the head of the ready queue to execute.

This is the article about the GoLang coroutine library libtask learning notes. For more related GoLang libtask content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!