yield is similar to return, but differs in that it returns the generator.
generator
A generator is a function formed by one or more yield expressions, each of which is an iterator (but an iterator is not necessarily a generator).
If a function contains the yield keyword, the function becomes a generator.
The generator does not return all the results at once, but returns the corresponding result each time the yield keyword is encountered, and retains the current state of the function, waiting for the next call.
Since the generator is also an iterator, it should support the next method to get the next value.
basic operation
# Creating generators via `yield` def func(): for i in xrange(10); yield i
# Create generators from lists [i for i in xrange(10)] # Creating generators via `yield` def func(): for i in xrange(10); yield i # Create generators from lists [i for i in xrange(10)] Python # Calls are as follows >>> f = func() >>> f # The generator isn't running yet <generator object func at 0x7fe01a853820> >>> () # When i=0, the yield keyword is encountered and returned directly >>> () # Continue where you left off in the previous execution and move to the next level of the loop ... >>> () >>> () # When the last loop is executed, end the yield statement and generate a StopIteration exception. Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration >>> # Calls are as follows >>> f = func() >>> f # The generator isn't running yet <generator object func at 0x7fe01a853820> >>> () # When i=0, the yield keyword is encountered and returned directly >>> () # Continue where you left off in the previous execution and move to the next level of the loop ... >>> () >>> () # When the last loop is executed, end the yield statement and generate a StopIteration exception. Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration >>>
In addition to the next function, the generator also supports the send function. This function can pass parameters to the generator.
>>> def func(): ... n = 0 ... while 1: ... n = yield n # You can assign a value to n with the send function ... >>> f = func() >>> () # n is 0 by default >>> (1) #n assigned 1 >>> (2) >>> >>> def func(): ... n = 0 ... while 1: ... n = yield n # You can assign a value to n with the send function ... >>> f = func() >>> () # n is 0 by default >>> (1) #n assigned 1 >>> (2) >>>
appliance
The most classical example, generating infinite sequences.
The conventional solution is to generate a very large list that satisfies the requirements, which needs to be kept in memory, and obviously memory limits this.
def get_primes(start): for element in magical_infinite_range(start): if is_prime(element): return element def get_primes(start): for element in magical_infinite_range(start): if is_prime(element): return element
If you use a generator you don't need to return the whole list, just one piece of data at a time, avoiding the problem of memory constraints.
def get_primes(number): while True: if is_prime(number): yield number number += 1 def get_primes(number): while True: if is_prime(number): yield number number += 1
Generator Source Code Analysis
The source code for the generator is in Objects/.
callstack
Before explaining the generator, it is necessary to explain the principle of calling the Python virtual machine.
The Python VM has a call stack of stack frames, where the stack frame is PyFrameObject, located in Include/.
typedef struct _frame { PyObject_VAR_HEAD struct _frame *f_back; /* previous frame, or NULL */ PyCodeObject *f_code; /* code segment */ PyObject *f_builtins; /* builtin symbol table (PyDictObject) */ PyObject *f_globals; /* global symbol table (PyDictObject) */ PyObject *f_locals; /* local symbol table (any mapping) */ PyObject **f_valuestack; /* points after the last local */ /* Next free slot in f_valuestack. Frame creation sets to f_valuestack. Frame evaluation usually NULLs it, but a frame that yields sets it to the current stack top. */ PyObject **f_stacktop; PyObject *f_trace; /* Trace function */ /* If an exception is raised in this frame, the next three are used to * record the exception info (if any) originally in the thread state. See * comments before set_exc_info() -- it's not obvious. * Invariant: if _type is NULL, then so are _value and _traceback. * Desired invariant: all three are NULL, or all three are non-NULL. That * one isn't currently true, but "should be". */ PyObject *f_exc_type, *f_exc_value, *f_exc_traceback; PyThreadState *f_tstate; int f_lasti; /* Last instruction if called */ /* Call PyFrame_GetLineNumber() instead of reading this field directly. As of 2.3 f_lineno is only valid when tracing is active (. when f_trace is set). At other times we use PyCode_Addr2Line to calculate the line from the current bytecode index. */ int f_lineno; /* Current line number */ int f_iblock; /* index in f_blockstack */ PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */ PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */ } PyFrameObject; typedef struct _frame { PyObject_VAR_HEAD struct _frame *f_back; /* previous frame, or NULL */ PyCodeObject *f_code; /* code segment */ PyObject *f_builtins; /* builtin symbol table (PyDictObject) */ PyObject *f_globals; /* global symbol table (PyDictObject) */ PyObject *f_locals; /* local symbol table (any mapping) */ PyObject **f_valuestack; /* points after the last local */ /* Next free slot in f_valuestack. Frame creation sets to f_valuestack. Frame evaluation usually NULLs it, but a frame that yields sets it to the current stack top. */ PyObject **f_stacktop; PyObject *f_trace; /* Trace function */ /* If an exception is raised in this frame, the next three are used to * record the exception info (if any) originally in the thread state. See * comments before set_exc_info() -- it's not obvious. * Invariant: if _type is NULL, then so are _value and _traceback. * Desired invariant: all three are NULL, or all three are non-NULL. That * one isn't currently true, but "should be". */ PyObject *f_exc_type, *f_exc_value, *f_exc_traceback; PyThreadState *f_tstate; int f_lasti; /* Last instruction if called */ /* Call PyFrame_GetLineNumber() instead of reading this field directly. As of 2.3 f_lineno is only valid when tracing is active (. when f_trace is set). At other times we use PyCode_Addr2Line to calculate the line from the current bytecode index. */ int f_lineno; /* Current line number */ int f_iblock; /* index in f_blockstack */ PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */ PyObject *f_localsplus[1]; /* locals+stack, dynamically sized */ } PyFrameObject;
The stack frame holds the information and context of the given code, which contains information about the last executed instruction, global and local namespaces, exception status, etc. f_valueblock holds the data, and b_blockstack holds the exceptions and loop control methods.
To give an example of this.
def foo(): x = 1 def bar(y): z = y + 2 # def foo(): x = 1 def bar(y): z = y + 2 #
Then, the corresponding call stack is as follows: a py file, a class, a function is a block of code, corresponding to a Frame, which holds the context and bytecode instructions.
c --------------------------- a | bar Frame | -> block stack: [] l | (newest) | -> data stack: [1, 2] l --------------------------- | foo Frame | -> block stack: [] s | | -> data stack: [.bar at 0x10d389680>, 1] t --------------------------- a | main (module) Frame | -> block stack: [] c | (oldest) | -> data stack: [] k --------------------------- c --------------------------- a | bar Frame | -> block stack: [] l | (newest) | -> data stack: [1, 2] l --------------------------- | foo Frame | -> block stack: [] s | | -> data stack: [.bar at 0x10d389680>, 1] t --------------------------- a | main (module) Frame | -> block stack: [] c | (oldest) | -> data stack: [] k ---------------------------
Each stack frame has its own data stack and block stack, and the separate data and block stacks allow the interpreter to interrupt and restore stack frames (the generator formally takes advantage of this).
Python code is first compiled into bytecode, which is then executed by the Python virtual machine. In general, one Python statement corresponds to multiple bytecodes (since each bytecode corresponds to a C statement, not a machine instruction, code performance cannot be judged by the number of bytecodes).
The bytecode can be analyzed by calling the dis module.
from dis import dis dis(foo) 0 LOAD_CONST 1 (1) # Load constant 1 3 STORE_FAST 0 (x) # x is assigned to 1 6 LOAD_CONST 2 (<code>) # Load constant 2 9 MAKE_FUNCTION 0 # Create functions 12 STORE_FAST 1 (bar) 15 LOAD_FAST 1 (bar) 18 LOAD_FAST 0 (x) 21 CALL_FUNCTION 1 # Calling functions 24 RETURN_VALUE </code> from dis import dis dis(foo) 0 LOAD_CONST 1 (1) # Load constant 1 3 STORE_FAST 0 (x) # x is assigned to 1 6 LOAD_CONST 2 (<code>) # Load constant 2 9 MAKE_FUNCTION 0 # Create functions 12 STORE_FAST 1 (bar) 15 LOAD_FAST 1 (bar) 18 LOAD_FAST 0 (x) 21 CALL_FUNCTION 1 # Calling functions 24 RETURN_VALUE </code>
Among them.
The first line is the code line number;
The second line is the offset address;
The third line is a bytecode instruction;
The fourth line is the command parameter;
The fifth behavioral parameter is explained.
The first line is the code line number;
The second line is the offset address;
The third line is a bytecode instruction;
The fourth line is the command parameter;
The fifth behavioral parameter is explained.
Generator Source Code Analysis
By the above understanding of the call stack, it is easy to understand the specific implementation of the generator.
The source code for the generator is located in object/.
Generator creation
PyObject * PyGen_New(PyFrameObject *f) { PyGenObject *gen = PyObject_GC_New(PyGenObject, &PyGen_Type); # Create generator objects if (gen == NULL) { Py_DECREF(f); return NULL; } gen->gi_frame = f; # Assign code blocks Py_INCREF(f->f_code); # Reference count +1 gen->gi_code = (PyObject *)(f->f_code); gen->gi_running = 0; # 0 denotes execution, which is the initial state of the generator gen->gi_weakreflist = NULL; _PyObject_GC_TRACK(gen); # GC tracking return (PyObject *)gen; } PyObject * PyGen_New(PyFrameObject *f) { PyGenObject *gen = PyObject_GC_New(PyGenObject, &PyGen_Type); # Create generator objects if (gen == NULL) { Py_DECREF(f); return NULL; } gen->gi_frame = f; # Assign code blocks Py_INCREF(f->f_code); # Reference count +1 gen->gi_code = (PyObject *)(f->f_code); gen->gi_running = 0; # 0 denotes execution, which is the initial state of the generator gen->gi_weakreflist = NULL; _PyObject_GC_TRACK(gen); # GC tracking return (PyObject *)gen; }
send and next
The next and send functions, as follows
static PyObject * gen_iternext(PyGenObject *gen) { return gen_send_ex(gen, NULL, 0); } static PyObject * gen_send(PyGenObject *gen, PyObject *arg) { return gen_send_ex(gen, arg, 0); } static PyObject * gen_iternext(PyGenObject *gen) { return gen_send_ex(gen, NULL, 0); } static PyObject * gen_send(PyGenObject *gen, PyObject *arg) { return gen_send_ex(gen, arg, 0); }
As you can see from the code above, both send and next call the same function gen_send_ex, the difference being whether it comes with parameters or not.
static PyObject * gen_send_ex(PyGenObject *gen, PyObject *arg, int exc) { PyThreadState *tstate = PyThreadState_GET(); PyFrameObject *f = gen->gi_frame; PyObject *result; if (gen->gi_running) { # Determine if the generator has run PyErr_SetString(PyExc_ValueError, "generator already executing"); return NULL; } if (f==NULL || f->f_stacktop == NULL) { # Throw StopIteration exception if code block is empty or call stack is empty /* Only set exception if called from send() */ if (arg && !exc) PyErr_SetNone(PyExc_StopIteration); return NULL; } if (f->f_lasti == -1) { # f_lasti=1 for first execution if (arg && arg != Py_None) { # Parameters are not allowed for the first execution PyErr_SetString(PyExc_TypeError, "can't send non-None value to a " "just-started generator"); return NULL; } } else { /* Push arg onto the frame's value stack */ result = arg ? arg : Py_None; Py_INCREF(result); # Reference count of this parameter +1 *(f->f_stacktop++) = result; # Parameter stack } /* Generators always return to their most recent caller, not * necessarily their creator. */ f->f_tstate = tstate; Py_XINCREF(tstate->frame); assert(f->f_back == NULL); f->f_back = tstate->frame; gen->gi_running = 1; # Modify generator execution state result = PyEval_EvalFrameEx(f, exc); # Execute bytecode gen->gi_running = 0; # Revert to unimplemented status /* Don't keep the reference to f_back any longer than necessary. It * may keep a chain of frames alive or it could create a reference * cycle. */ assert(f->f_back == tstate->frame); Py_CLEAR(f->f_back); /* Clear the borrowed reference to the thread state */ f->f_tstate = NULL; /* If the generator just returned (as opposed to yielding), signal * that the generator is exhausted. */ if (result == Py_None && f->f_stacktop == NULL) { Py_DECREF(result); result = NULL; /* Set exception if not called by gen_iternext() */ if (arg) PyErr_SetNone(PyExc_StopIteration); } if (!result || f->f_stacktop == NULL) { /* generator can't be rerun, so release the frame */ Py_DECREF(f); gen->gi_frame = NULL; } return result; } static PyObject * gen_send_ex(PyGenObject *gen, PyObject *arg, int exc) { PyThreadState *tstate = PyThreadState_GET(); PyFrameObject *f = gen->gi_frame; PyObject *result; if (gen->gi_running) { # Determine if the generator has run PyErr_SetString(PyExc_ValueError, "generator already executing"); return NULL; } if (f==NULL || f->f_stacktop == NULL) { # Throw StopIteration exception if code block is empty or call stack is empty /* Only set exception if called from send() */ if (arg && !exc) PyErr_SetNone(PyExc_StopIteration); return NULL; } if (f->f_lasti == -1) { # f_lasti=1 for first execution if (arg && arg != Py_None) { # Parameters are not allowed for the first execution PyErr_SetString(PyExc_TypeError, "can't send non-None value to a " "just-started generator"); return NULL; } } else { /* Push arg onto the frame's value stack */ result = arg ? arg : Py_None; Py_INCREF(result); # Reference count of this parameter +1 *(f->f_stacktop++) = result; # Parameter stack } /* Generators always return to their most recent caller, not * necessarily their creator. */ f->f_tstate = tstate; Py_XINCREF(tstate->frame); assert(f->f_back == NULL); f->f_back = tstate->frame; gen->gi_running = 1; # Modify generator execution state result = PyEval_EvalFrameEx(f, exc); # Execute bytecode gen->gi_running = 0; # Revert to unimplemented status /* Don't keep the reference to f_back any longer than necessary. It * may keep a chain of frames alive or it could create a reference * cycle. */ assert(f->f_back == tstate->frame); Py_CLEAR(f->f_back); /* Clear the borrowed reference to the thread state */ f->f_tstate = NULL; /* If the generator just returned (as opposed to yielding), signal * that the generator is exhausted. */ if (result == Py_None && f->f_stacktop == NULL) { Py_DECREF(result); result = NULL; /* Set exception if not called by gen_iternext() */ if (arg) PyErr_SetNone(PyExc_StopIteration); } if (!result || f->f_stacktop == NULL) { /* generator can't be rerun, so release the frame */ Py_DECREF(f); gen->gi_frame = NULL; } return result; }
Byte code execution
The function of PyEval_EvalFrameEx is to execute the bytecode and return the result.
# The main processes are as follows. for (;;) { switch(opcode) { # opcode is the opcode, which corresponds to various operations case NOP: goto fast_next_opcode; ... ... case YIELD_VALUE: # If the opcode is yield retval = POP(); f->f_stacktop = stack_pointer; why = WHY_YIELD; goto fast_yield; # Use goto to jump out of the loop } } fast_yield: ... return vetval; # Return results # The main processes are as follows. for (;;) { switch(opcode) { # opcode is the opcode, which corresponds to various operations case NOP: goto fast_next_opcode; ... ... case YIELD_VALUE: # If the opcode is yield retval = POP(); f->f_stacktop = stack_pointer; why = WHY_YIELD; goto fast_yield; # Use goto to jump out of the loop } } fast_yield: ... return vetval; # Return results
As an example, f_back the last Frame, f_lasti the offset of the last instruction executed, the
import sys from dis import dis def func(): f = sys._getframe(0) print f.f_lasti print f.f_back yield 1 print f.f_lasti print f.f_back yield 2 a = func() dis(func) () () import sys from dis import dis def func(): f = sys._getframe(0) print f.f_lasti print f.f_back yield 1 print f.f_lasti print f.f_back yield 2 a = func() dis(func) () ()
The result is as follows, where the English in the third line is the opcode, which corresponds to the opcode above, and each switch is a choice between different opcodes.
Python 0 LOAD_GLOBAL 0 (sys) 3 LOAD_ATTR 1 (_getframe) 6 LOAD_CONST 1 (0) 9 CALL_FUNCTION 1 12 STORE_FAST 0 (f) 15 LOAD_FAST 0 (f) 18 LOAD_ATTR 2 (f_lasti) 21 PRINT_ITEM 22 PRINT_NEWLINE 23 LOAD_FAST 0 (f) 26 LOAD_ATTR 3 (f_back) 29 PRINT_ITEM 30 PRINT_NEWLINE 31 LOAD_CONST 2 (1) 34 YIELD_VALUE # At this point the opcode is YIELD_VALUE, directly jump to the above goto statement, at this point f_lasti for the current instruction, f_back for the current frame 35 POP_TOP 36 LOAD_FAST 0 (f) 39 LOAD_ATTR 2 (f_lasti) 42 PRINT_ITEM 43 PRINT_NEWLINE 44 LOAD_FAST 0 (f) 47 LOAD_ATTR 3 (f_back) 50 PRINT_ITEM 51 PRINT_NEWLINE 52 LOAD_CONST 3 (2) 55 YIELD_VALUE 56 POP_TOP 57 LOAD_CONST 0 (None) 60 RETURN_VALUE <frame object at 0x7fa75fcebc20> # is the same as the following frame and belongs to the same frame, that is, the frame is the same within the same function (namespace). <frame object at 0x7fa75fcebc20> 0 LOAD_GLOBAL 0 (sys) 3 LOAD_ATTR 1 (_getframe) 6 LOAD_CONST 1 (0) 9 CALL_FUNCTION 1 12 STORE_FAST 0 (f) 15 LOAD_FAST 0 (f) 18 LOAD_ATTR 2 (f_lasti) 21 PRINT_ITEM 22 PRINT_NEWLINE 23 LOAD_FAST 0 (f) 26 LOAD_ATTR 3 (f_back) 29 PRINT_ITEM 30 PRINT_NEWLINE 31 LOAD_CONST 2 (1) 34 YIELD_VALUE # At this point the opcode is YIELD_VALUE, directly jump to the above goto statement, at this point f_lasti for the current instruction, f_back for the current frame 35 POP_TOP 36 LOAD_FAST 0 (f) 39 LOAD_ATTR 2 (f_lasti) 42 PRINT_ITEM 43 PRINT_NEWLINE 44 LOAD_FAST 0 (f) 47 LOAD_ATTR 3 (f_back) 50 PRINT_ITEM 51 PRINT_NEWLINE 52 LOAD_CONST 3 (2) 55 YIELD_VALUE 56 POP_TOP 57 LOAD_CONST 0 (None) 60 RETURN_VALUE <frame object at 0x7fa75fcebc20> # is the same as the following frame and belongs to the same frame, that is, the frame is the same within the same function (namespace). <frame object at 0x7fa75fcebc20>
summarize
The above is a small introduction to the Python yield and implementation of the method code analysis, I hope to help you, if you have any questions please leave me a message, I will promptly reply to you. I would also like to thank you very much for your support of my website!