Preface
I believe everyone has heard of the concept of "coroutine".
However, some readers have only a vague grasp of the concept: they don't know how to implement a coroutine, how to use one, or where to use one. Some people even think that yield is a coroutine!
I have always believed that if you cannot explain a concept accurately, you do not really understand it.
If you have looked into implementing coroutines in PHP before, you have probably read Brother Bird's article: Using coroutines to implement multi-task scheduling in PHP | Snowy
Brother Bird's article is a translation of a piece by a foreign author. The translation is concise and clear, and it gives concrete examples.
My purpose in writing this article is to supplement Brother Bird's article in more detail. After all, some readers do not have a strong enough foundation and came away confused.
What is a coroutine
Let’s first figure out what coroutines are.
You may have heard of the two concepts of "process" and "thread".
A process is a running instance of a binary executable loaded into memory. Think of your .exe file as a class and the process as the instance created with new.
A process is the basic unit of resource allocation and scheduling in a computer system (let us not get into whether the scheduling unit is the thread or the process here). Each CPU core can run only one process at any given moment.
On a single core, the so-called parallelism only looks parallel: the CPU is actually switching between processes very quickly.
Switching processes has to go through the kernel. The CPU must save the state of the current process, and the switch also invalidates much of the CPU cache.
So process switches should be avoided whenever they are not necessary.
How, then, can we avoid paying for a process switch when we do not have to?
First, the conditions under which a process is switched out are: the process has finished, its CPU time slice has run out, a system interrupt must be handled, or the process is waiting for a resource it needs (it is blocked), and so on. There is nothing to be done about the first few cases, but if the process is merely blocked and waiting, isn't that time wasted?
In fact, while one part of the program is blocked, there are usually other parts that could run, so there is no need to wait idly!
So there are threads.
A thread can be understood simply as a "micro-process" dedicated to running one function (one logical flow).
So when writing a program, we can use threads to express functions that can run at the same time.
There are two kinds of threads. The first kind is managed and scheduled by the kernel.
As we said, anything that requires the kernel to take part in management and scheduling is expensive. What this kind of thread does solve is the blocking problem: when a running thread in a process blocks, another runnable thread of the same process can be scheduled, so no process switch is needed.
The second kind of thread is scheduled by the programmer's own code and is invisible to the kernel. This kind is called a "user-space thread".
Coroutines can be understood as user space threads.
Coroutines have several characteristics:
- Cooperative: because the scheduling strategy is written by the programmer, tasks switch by cooperation rather than preemption.
- Creation, switching, and destruction all happen in user mode.
- ⚠️ From a programming perspective, the idea of a coroutine is essentially a mechanism for voluntarily yielding control flow and resuming it later.
- Iterators are often used to implement coroutines.
Speaking of this, you should understand the basic concept of coroutines, right?
Implementing coroutines in PHP
Let's take it step by step, starting with the concepts!
Iterable object
PHP 5 provides a way to define objects so that their items can be traversed one by one, for example with a foreach statement.
If you want to implement an iterable object, you need to implement the Iterator interface:
<?php
class MyIterator implements Iterator
{
    private $var = array();

    public function __construct($array)
    {
        if (is_array($array)) {
            $this->var = $array;
        }
    }

    // Move the internal pointer back to the first element
    public function rewind()
    {
        echo "rewinding\n";
        reset($this->var);
    }

    // Return the element at the current position
    public function current()
    {
        $var = current($this->var);
        echo "current: $var\n";
        return $var;
    }

    // Return the key at the current position
    public function key()
    {
        $var = key($this->var);
        echo "key: $var\n";
        return $var;
    }

    // Advance to the next element
    public function next()
    {
        $var = next($this->var);
        echo "next: $var\n";
        return $var;
    }

    // Report whether the current position is valid.
    // key() returns null past the end, so a stored value of false is not mistaken for the end.
    public function valid()
    {
        $var = key($this->var) !== null;
        echo "valid: {$var}\n";
        return $var;
    }
}

$values = array(1, 2, 3);
$it = new MyIterator($values);

foreach ($it as $a => $b) {
    print "$a: $b\n";
}
Generator
In other words, to get an object that foreach can traverse, you had to implement a whole set of methods. The yield keyword exists to simplify this.
A generator provides an easier way to implement simple object iteration, with much less overhead and complexity than defining a class that implements the Iterator interface.
<?php
function xrange($start, $end, $step = 1)
{
    for ($i = $start; $i <= $end; $i += $step) {
        yield $i;
    }
}

foreach (xrange(1, 1000000) as $num) {
    echo $num, "\n";
}
Remember: a function that contains yield is a generator function. Calling it does not run its body the way an ordinary function call would; it only returns a Generator object.
So yield is just yield. The next time someone tells you that yield is a coroutine, I will definitely xxxx.
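The point above can be checked with a small sketch. The names steps and $trace are made up for this illustration; the sketch only shows that calling a generator function returns a Generator without running the body, and that the body then runs in pieces, pausing at each yield.

```php
<?php
// Hypothetical example: `steps` and `$trace` are illustrative names only.
function steps(array &$trace)
{
    $trace[] = 'generator: step 1';
    yield;                          // pause here and hand control back to the caller
    $trace[] = 'generator: step 2';
    yield;                          // pause again
    $trace[] = 'generator: step 3';
}

$trace = [];
$gen = steps($trace);               // returns a Generator; the body has not run yet
$isGenerator = $gen instanceof Generator;
$ranOnCall = count($trace) > 0;     // false: the call itself executed nothing

$gen->current();                    // runs the body up to the first yield
$trace[] = 'caller: other work';
$gen->next();                       // resumes the body up to the second yield
$trace[] = 'caller: more work';
$gen->next();                       // resumes the body until it ends

echo implode("\n", $trace), "\n";
```

The trace alternates between generator lines and caller lines, which is exactly the pause-and-resume behaviour a scheduler can build on.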
PHP coroutine
When I introduced coroutines earlier, I said that the programmer has to write the scheduling mechanism. Let's see how to write it.
0) Using a generator correctly
Since a generator cannot be run by calling it like a function, how do we run it?
The methods are as follows:
- iterate over it with foreach
- send($value)
- current / next...
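The three approaches in the list above can be sketched side by side. The generator counter is a made-up example, and the use of return and getReturn() assumes PHP 7 or later.

```php
<?php
// Hypothetical generator used only for this illustration.
function counter()
{
    $received = [];
    for ($i = 1; $i <= 3; ++$i) {
        // The value passed to send() becomes the result of the yield expression.
        $received[] = yield $i;
    }
    return $received;               // assumes PHP 7+: generators may return a value
}

// 1) foreach: simply consume the yielded values
$viaForeach = [];
foreach (counter() as $value) {
    $viaForeach[] = $value;
}

// 2) current() / next() / valid(): drive the generator by hand
$gen = counter();
$viaManual = [];
while ($gen->valid()) {
    $viaManual[] = $gen->current();
    $gen->next();
}

// 3) send(): pass a value in and get the next yielded value back
$gen = counter();
$first  = $gen->current();          // value of the first yield
$second = $gen->send('a');          // 'a' goes in, the second yielded value comes out
$third  = $gen->send('b');          // 'b' goes in, the third yielded value comes out
$gen->send('c');                    // 'c' goes in, the generator finishes
$received = $gen->getReturn();      // what the generator collected from send()
```

Only send() gives two-way communication, which is why the Task class below uses it to resume a coroutine.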
1) Task implementation
Task is an abstraction of a unit of work. As we just said, coroutines are user-space threads, and a thread can be understood as running a function.
So the Task constructor receives a generator (the object returned by calling a generator function), which we name coroutine.
/**
 * Task class
 */
class Task
{
    protected $taskId;
    protected $coroutine;
    protected $beforeFirstYield = true;
    protected $sendValue;

    /**
     * Task constructor.
     * @param $taskId
     * @param Generator $coroutine
     */
    public function __construct($taskId, Generator $coroutine)
    {
        $this->taskId = $taskId;
        $this->coroutine = $coroutine;
    }

    /**
     * Get the ID of this Task
     *
     * @return mixed
     */
    public function getTaskId()
    {
        return $this->taskId;
    }

    /**
     * Determine whether the Task has finished executing
     *
     * @return bool
     */
    public function isFinished()
    {
        return !$this->coroutine->valid();
    }

    /**
     * Set the value to pass to the coroutine on the next run.
     * For example, with $id = (yield $xxxx), this value is assigned to $id.
     *
     * @param $value
     */
    public function setSendValue($value)
    {
        $this->sendValue = $value;
    }

    /**
     * Run the task until its next yield
     *
     * @return mixed
     */
    public function run()
    {
        // A fresh generator runs to its first yield on first access, so the
        // first value must be read with current(); calling send() first would skip it.
        if ($this->beforeFirstYield) {
            $this->beforeFirstYield = false;
            return $this->coroutine->current();
        } else {
            // As mentioned above, send() resumes the generator with a value
            $retval = $this->coroutine->send($this->sendValue);
            $this->sendValue = null;
            return $retval;
        }
    }
}
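The reason run() treats the first call specially can be seen with a bare generator. This is a small sketch with a made-up generator named twoYields: calling send() on a generator that has not started yet first advances it to the first yield, so the first yielded value is never returned to the caller.

```php
<?php
// Hypothetical generator used only for this illustration.
function twoYields()
{
    $a = yield 'first';
    $b = yield 'second';
}

// Calling send() on a fresh generator: the body runs to the first yield,
// 'x' becomes the result of that yield, and send() returns the *next* yielded value.
$gen = twoYields();
$skipped = $gen->send('x');         // 'first' is never seen by the caller

// Reading current() first, as Task::run() does, keeps the first value.
$gen = twoYields();
$seenFirst  = $gen->current();
$seenSecond = $gen->send('x');
```

This is why the Task class tracks $beforeFirstYield instead of calling send() unconditionally.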
2) Scheduler implementation
Next comes the Scheduler, the core piece, which plays the role of dispatcher.
/**
 * Class Scheduler
 */
class Scheduler
{
    /**
     * @var SplQueue
     */
    protected $taskQueue;

    /**
     * @var int
     */
    protected $tid = 0;

    /**
     * Scheduler constructor.
     */
    public function __construct()
    {
        /* The principle is to maintain a queue of tasks.
         * As mentioned earlier, from a programming perspective the idea of a coroutine
         * is essentially voluntarily yielding control flow and resuming it later.
         */
        $this->taskQueue = new SplQueue();
    }

    /**
     * Add a task
     *
     * @param Generator $task
     * @return int
     */
    public function addTask(Generator $task)
    {
        $tid = $this->tid;
        $task = new Task($tid, $task);
        $this->taskQueue->enqueue($task);
        $this->tid++;
        return $tid;
    }

    /**
     * Put the task into the queue
     *
     * @param Task $task
     */
    public function schedule(Task $task)
    {
        $this->taskQueue->enqueue($task);
    }

    /**
     * Run the scheduler
     */
    public function run()
    {
        while (!$this->taskQueue->isEmpty()) {
            // Take the next task off the queue
            $task = $this->taskQueue->dequeue();
            $res = $task->run(); // Run the task until its next yield

            if (!$task->isFinished()) {
                // The task is not finished yet: put it back in the queue for another turn
                $this->schedule($task);
            }
        }
    }
}
In this way, we basically implement a coroutine scheduler.
You can use the following code to test:
<?php
function task1()
{
    for ($i = 1; $i <= 10; ++$i) {
        echo "This is task 1 iteration $i.\n";
        yield; // Voluntarily give up control
    }
}

function task2()
{
    for ($i = 1; $i <= 5; ++$i) {
        echo "This is task 2 iteration $i.\n";
        yield; // Voluntarily give up control
    }
}

$scheduler = new Scheduler; // Instantiate a scheduler
$scheduler->addTask(task1()); // Add the generators as tasks
$scheduler->addTask(task2());

$scheduler->run();
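To see the interleaving without the full Task and Scheduler classes, here is a condensed round-robin sketch. It is not the scheduler above, only the same idea reduced to a queue of generators; countTo, $log, and the [generator, started] pairs are assumptions made for this example, and the array destructuring assumes PHP 7.1 or later.

```php
<?php
// Hypothetical task: records each step in $log instead of echoing it.
function countTo(string $name, int $max, array &$log)
{
    for ($i = 1; $i <= $max; ++$i) {
        $log[] = "$name $i";
        yield;                      // voluntarily give up control
    }
}

$log = [];
$queue = new SplQueue();
// Each queue entry is [generator, hasStarted].
$queue->enqueue([countTo('A', 3, $log), false]);
$queue->enqueue([countTo('B', 2, $log), false]);

while (!$queue->isEmpty()) {
    [$gen, $started] = $queue->dequeue();

    if ($started) {
        $gen->next();               // resume from the last yield
    } else {
        $gen->current();            // first run: advance to the first yield
    }

    if ($gen->valid()) {
        $queue->enqueue([$gen, true]);  // not finished: wait for another turn
    }
}

echo implode("\n", $log), "\n";
```

The two tasks take turns one step at a time until the shorter one finishes, after which the longer one runs alone.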
The key question is where PHP coroutines are actually useful.
function task1()
{
    /* There is a remote task here that takes 10 seconds, for example a remote machine
       that crawls and analyzes URLs. We only need to collect the result from the
       remote machine at the end. */
    remote_task_commit();
    // Once the request has been sent we should not wait here. Voluntarily give up
    // control so that task2, which does not depend on this result, can run.
    yield;
    yield (remote_task_receive());
    ...
}

function task2()
{
    for ($i = 1; $i <= 5; ++$i) {
        echo "This is task 2 iteration $i.\n";
        yield; // Voluntarily give up control
    }
}
This improves the execution efficiency of the program, because the time spent waiting for the remote result is used to run other tasks.
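The idea can be simulated without any real network call. In the sketch below, FakeRemoteJob is a made-up stand-in for a remote job that becomes ready after being polled a few times; remoteTask, localTask, and the simple round-robin loop are also assumptions for illustration, not part of the scheduler above.

```php
<?php
// Hypothetical stand-in for a remote job: ready on the Nth poll.
class FakeRemoteJob
{
    private $pollsLeft;

    public function __construct(int $pollsNeeded)
    {
        $this->pollsLeft = $pollsNeeded;
    }

    public function isReady(): bool
    {
        return --$this->pollsLeft <= 0;
    }

    public function result(): string
    {
        return 'remote result';
    }
}

function remoteTask(FakeRemoteJob $job, array &$log)
{
    $log[] = 'remote: submitted';
    while (!$job->isReady()) {
        yield;                      // result not ready: let other tasks run
    }
    $log[] = 'remote: got ' . $job->result();
}

function localTask(array &$log)
{
    for ($i = 1; $i <= 3; ++$i) {
        $log[] = "local: iteration $i";
        yield;
    }
}

$log = [];
$tasks = [remoteTask(new FakeRemoteJob(3), $log), localTask($log)];
$started = [];

// Minimal round-robin loop: give each unfinished task one turn per round.
while ($tasks) {
    foreach ($tasks as $key => $task) {
        if (isset($started[$key])) {
            $task->next();
        } else {
            $started[$key] = true;
            $task->current();
        }
        if (!$task->valid()) {
            unset($tasks[$key]);
        }
    }
}

echo implode("\n", $log), "\n";
```

The local task makes progress while the remote job is still pending, instead of the whole program sitting idle until the result arrives.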
Brother Bird has already explained the implementation of "system call", and I will not explain it here.
3) Coroutine Stack
Brother Bird's article also has an example of a coroutine stack.
As mentioned above, a function that uses yield cannot be called like an ordinary function.
So what happens if you nest one coroutine function inside another?
<?php
function echoTimes($msg, $max)
{
    for ($i = 1; $i <= $max; ++$i) {
        echo "$msg iteration $i\n";
        yield;
    }
}

function task()
{
    echoTimes('foo', 10); // intended to print foo ten times
    echo "---\n";
    echoTimes('bar', 5); // intended to print bar five times
    yield; // force it to be a coroutine
}

$scheduler = new Scheduler;
$scheduler->addTask(task());
$scheduler->run();
The echoTimes calls here do not execute: each call only creates a generator that is never run. That is why a coroutine stack is needed.
That is fine, though; let's change the code we just wrote.
First change the Task constructor: when a Task runs, we need to detect which child coroutines it yields and save the suspended parent coroutines on a stack. (Readers who know C well will find this natural. If it is unclear, I suggest reading up on how a process's memory model handles function calls.)
/**
 * Task constructor.
 * @param $taskId
 * @param Generator $coroutine
 */
public function __construct($taskId, Generator $coroutine)
{
    $this->taskId = $taskId;
    // $this->coroutine = $coroutine;
    // With this change, Task->run actually drives the stackedCoroutine generator,
    // not the generator passed in as $coroutine
    $this->coroutine = stackedCoroutine($coroutine);
}
When Task->run() is called, stackedCoroutine uses a loop to do this analysis:
/**
 * @param Generator $gen
 */
function stackedCoroutine(Generator $gen)
{
    $stack = new SplStack;

    // Keep driving the generator that was passed in
    for (;;) {
        // $gen always refers to the coroutine (generator) that is currently running
        $value = $gen->current(); // The value produced at the current yield

        if ($value instanceof Generator) {
            // A yielded generator is a child coroutine:
            // save the currently running coroutine on the stack
            $stack->push($gen);
            $gen = $value; // Switch to the child coroutine; the next pass runs the child
            continue;
        }

        // A child coroutine wraps its result in CoroutineReturnValue
        // when it wants the parent coroutine to handle a value
        $isReturnValue = $value instanceof CoroutineReturnValue;

        if (!$gen->valid() || $isReturnValue) {
            if ($stack->isEmpty()) {
                return;
            }
            // $gen has finished, or the child coroutine is returning a value to its parent.
            // Pop the stack to get back the parent coroutine saved earlier
            $gen = $stack->pop();
            // Resume the parent coroutine with the child's output value
            $gen->send($isReturnValue ? $value->getValue() : NULL);
            continue;
        }

        // Pass the yielded value out to the scheduler, then resume the current
        // coroutine with whatever the scheduler sends back
        $gen->send(yield $gen->key() => $value);
    }
}
Then we add the end marker used by echoTimes:
class CoroutineReturnValue
{
    protected $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    // Return the child coroutine's output value, which is passed
    // to the parent coroutine as the argument of send()
    public function getValue()
    {
        return $this->value;
    }
}

function retval($value)
{
    return new CoroutineReturnValue($value);
}
Then modify echoTimes:
function echoTimes($msg, $max)
{
    for ($i = 1; $i <= $max; ++$i) {
        echo "$msg iteration $i\n";
        yield;
    }
    yield retval(""); // Add this as the end marker
}
Task becomes:
function task1() { yield echoTimes('bar', 5); }
This implements a coroutine stack; you can now apply the same idea elsewhere.
4) PHP7 yield from keyword
PHP 7 added yield from, so we no longer need to implement the coroutine stack ourselves. That is great.
Change the Task constructor back:
public function __construct($taskId, Generator $coroutine)
{
    $this->taskId = $taskId;
    $this->coroutine = $coroutine;
    // $this->coroutine = stackedCoroutine($coroutine); // No need to implement it yourself; change it back to the previous version
}
echoTimes function:
function echoTimes($msg, $max)
{
    for ($i = 1; $i <= $max; ++$i) {
        echo "$msg iteration $i\n";
        yield;
    }
}
task1 generator:
function task1() { yield from echoTimes('bar', 5); }
This way, it is easy to call the child coroutine.
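As a small sketch of what yield from also gives you in PHP 7 and later: the child generator can use an ordinary return, and that value becomes the result of the yield from expression, so a wrapper like retval() is not needed. The $log parameter and the return values here are additions made for this illustration.

```php
<?php
// Variant of echoTimes that records its output in $log (an assumption for
// this example) and returns how many iterations it ran.
function echoTimes(string $msg, int $max, array &$log)
{
    for ($i = 1; $i <= $max; ++$i) {
        $log[] = "$msg iteration $i";
        yield;
    }
    return $max;                    // becomes the value of `yield from`
}

function task(array &$log)
{
    $fooCount = yield from echoTimes('foo', 2, $log);
    $log[] = '---';
    $barCount = yield from echoTimes('bar', 1, $log);
    return $fooCount + $barCount;
}

$log = [];
$gen = task($log);
foreach ($gen as $unused) {
    // Each iteration corresponds to one yield, i.e. one scheduling step.
}
$total = $gen->getReturn();         // return value of task()

echo implode("\n", $log), "\n";
```

Every yield inside the child is passed straight through to whoever is driving the outer generator, so a scheduler sees the nested coroutines as a single task.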
Summary
By now you should understand how to implement coroutines in PHP.
Okay, the above is the entire content of this article. I hope that the content of this article has a certain reference value for everyone's study or work. If you have any questions, you can leave a message to communicate. Thank you for your support.