PHP's operating mechanism and underlying principles

To explain how PHP runs, we first introduce its modules. PHP has three: the kernel, the Zend engine, and the extension layer. The PHP kernel handles requests, file streams, error handling, and related operations. The Zend engine (ZE) compiles source files into an intermediate instruction set (opcodes) and runs them on a virtual machine. The extension layer is a set of functions, class libraries, and streams that PHP uses to perform specific operations; for example, we need the mysql extension to connect to a MySQL database. While ZE executes a program, it may need to call several extensions; ZE hands control to the extension and takes it back once the extension has finished its task.

Finally, ZE returns the program's result to the PHP kernel, which passes it to the SAPI layer, which in turn outputs it to the browser.

PHP is said to be simple, but mastering it is not easy. Besides knowing how to use it, we also need to know how it works underneath.

PHP is a dynamic language suited to web development. More specifically, it is a software framework that implements a large number of components in C. Viewed more narrowly, you can think of it as a powerful UI framework.

Why study PHP's underlying implementation? To use a dynamic language well, you must first understand it. Its memory management and framework model are worth learning from, and by developing extensions we can implement more powerful features and optimize the performance of our programs.

1. PHP design concept and features

Multi-process model: Because PHP uses a multi-process model, requests do not interfere with each other, so one request hanging or crashing does not bring down the whole service. PHP has also long supported a multi-threaded model.

Weakly typed language: Unlike C/C++, Java, C#, and similar languages, PHP is weakly typed. A variable's type is not fixed when it is declared; it is determined at run time, and implicit or explicit type conversions may occur. This flexibility is convenient and efficient in web development. It is described in detail later, in the section on PHP variables.

The engine (Zend) + component (ext) model reduces internal coupling.

The intermediate layer (sapi) isolates the web server from PHP.

The syntax is simple and flexible, with few rules. The downside is inconsistent coding styles, but even a poor programmer is unlikely to write a program that damages the system as a whole.

2. PHP's four-layer system

The core architecture of PHP is shown in the figure below:

As can be seen from the figure, PHP is a 4-layer system from bottom to top:

Zend engine: Zend is implemented entirely in C and is the kernel of PHP. It translates PHP code into executable opcodes (through a compilation pipeline of lexical analysis, syntax parsing, and so on) and implements the corresponding handlers. It also implements the basic data structures (such as the hash table and object-oriented support) and memory allocation and management, and provides API functions for external callers. It is the core of everything; all peripheral functionality is built around Zend.

Extensions: Built around the Zend engine, extensions provide basic services as components. The built-in functions we commonly use (such as the array functions), the standard library, and so on are all implemented as extensions. Users can also write their own extensions for functional expansion, performance optimization, and other purposes (the PHP intermediate layer and rich text parsing used by Tieba are typical applications of extensions).

Sapi: Sapi stands for Server Application Programming Interface. Sapi uses a series of hook functions to let PHP exchange data with the outside world. This is a very elegant and successful part of PHP's design. Through sapi, PHP is decoupled and isolated from the upper-level application: PHP no longer needs to worry about compatibility with different applications, and each application can implement its own processing based on its own characteristics.

Upper-level application: This is the PHP program we normally write. Different sapis give us different application modes, such as serving web applications through a web server or running scripts on the command line.

If PHP is a car, then the frame of the car is PHP itself, Zend is the engine, the components under ext are the wheels, and sapi is the road. The car can run on different types of roads, and executing a PHP program is the car running on the road. So we need a high-performance engine, the right wheels, and the right road.

3. Sapi

As mentioned earlier, Sapi uses a series of interfaces to enable external applications to exchange data with PHP and implement specific processing methods according to different application characteristics. Some of our common sapis are:

apache2handler: This is the handler used when apache is the web server and PHP runs in mod_php mode. It is also the most widely used one now.

cgi: This is the other way for a web server to interact directly with PHP, through the well-known fastcgi protocol. In recent years fastcgi+PHP has been adopted more and more widely, and it is the only mode supported by asynchronous web servers.

cli: The mode used when PHP is invoked from the command line.

4. PHP execution process and opcodes

Let’s first take a look at the process of execution of PHP code.

As can be seen from the figure, PHP implements a typical dynamic-language execution process: after receiving a piece of code, it goes through lexical analysis, syntax parsing, and other stages to translate the source program into instructions (opcodes), and the Zend virtual machine then executes these instructions in order. PHP itself is implemented in C, so what is ultimately called are C functions. In fact, we can regard PHP as a piece of software written in C.

The core of PHP execution is the translated instructions, that is, the opcodes.

An opcode is the most basic unit of PHP program execution. It consists of two operands (op1, op2), a return value, and a processing function. A PHP program is ultimately translated into a sequence of opcode processing functions that are executed in order.
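The opcode layout described above can be sketched in C. This is only an illustration under simplifying assumptions, not Zend's actual code: the names toy_op, toy_add, toy_mul and toy_run are hypothetical, and operands are plain indexes into an array of long slots rather than Zend's real operand structures.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

struct toy_op;
/* A handler reads its operands and writes its result into the slot array. */
typedef void (*toy_handler)(const struct toy_op *op, long *slots);

/* One instruction: two operands, a result, and a processing function. */
struct toy_op {
    int op1;              /* index of the first operand slot */
    int op2;              /* index of the second operand slot */
    int result;           /* index of the slot that receives the result */
    toy_handler handler;  /* function that performs the operation */
};

void toy_add(const struct toy_op *op, long *slots) {
    slots[op->result] = slots[op->op1] + slots[op->op2];
}

void toy_mul(const struct toy_op *op, long *slots) {
    slots[op->result] = slots[op->op1] * slots[op->op2];
}

/* Execute the instructions in order by calling each handler in turn. */
void toy_run(const struct toy_op *ops, size_t count, long *slots) {
    assert(ops != NULL && slots != NULL);
    for (size_t i = 0; i < count; i++) {
        ops[i].handler(&ops[i], slots);
    }
}
```

A program is then just an array of toy_op values passed to toy_run, which mirrors the idea that a script ends up as a sequence of handler calls.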

Several common processing functions:

ZEND_ASSIGN_SPEC_CV_CV_HANDLER: Variable assignment ($a = $b)

ZEND_DO_FCALL_BY_NAME_SPEC_HANDLER: Function call

ZEND_CONCAT_SPEC_CV_CV_HANDLER: String concatenation ($a . $b)

ZEND_ADD_SPEC_CV_CONST_HANDLER: Addition ($a + 2)

ZEND_IS_EQUAL_SPEC_CV_CONST: Equality comparison ($a == 1)

ZEND_IS_IDENTICAL_SPEC_CV_CONST: Identity (strict equality) comparison ($a === 1)

5. HashTable — Core Data Structure

HashTable is zend's core data structure and is used to implement almost all common functionality in PHP. The PHP array we know is its typical application. Inside zend, function symbol tables, global variables, and so on are also built on hash tables.

PHP's hash table has the following features:

Supports typical key->value lookup

Can be used as an array

Adding and deleting nodes is O(1)

Keys support mixed types: associative (string) keys and index (integer) keys can coexist in one array

Values support mixed types: array("string", 2332)

Supports linear traversal, such as foreach

Zend's hash table implements a typical hash structure and, by attaching a doubly linked list, also supports traversing the array forwards and backwards. The structure is as follows:

As can be seen, the hash table contains both a key->value hash structure and a doubly linked list, so it supports both fast lookup and linear traversal.

Hash structure: Zend's hash structure is a typical hash table model in which collisions are resolved through linked lists. Note that zend's hash table is a self-growing data structure: when the number of elements fills the table, it doubles its capacity and rehashes the elements into their new positions. The initial size is 8. In addition, zend speeds up key->value lookups by trading space for time. For example, each element stores a variable nKeyLength holding the length of its key, so that mismatches can be ruled out quickly.

Doubly linked list: Zend's hash table implements linear traversal of its elements through a linked list. In theory a singly linked list would be enough for traversal; the main reason for using a doubly linked list is fast deletion, which avoids traversing the list to find the predecessor. Zend's hash table is a composite structure: used as an array, it supports ordinary associative keys as well as sequential index numbers, and even allows mixing the two.
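The combination of a chained hash table and an insertion-order doubly linked list can be sketched as follows. This is a simplified illustration, not Zend's implementation: all toy_* names are hypothetical, the table has a fixed size of 8 and never grows, keys are NUL-terminated strings, values are plain ints, and the hash is a simple times-33 string hash chosen for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_TABLE_SIZE 8   /* power of two, so (h & mask) selects a slot */

typedef struct toy_bucket {
    unsigned long h;               /* cached hash of the key */
    char *key;                     /* owned copy of the key */
    int value;
    struct toy_bucket *chain_next; /* next bucket in the same slot (collision chain) */
    struct toy_bucket *list_prev;  /* previous element in insertion order */
    struct toy_bucket *list_next;  /* next element in insertion order */
} toy_bucket;

typedef struct {
    toy_bucket *slots[TOY_TABLE_SIZE];
    toy_bucket *head;              /* first inserted element */
    toy_bucket *tail;              /* last inserted element */
    size_t count;
} toy_table;

/* A simple times-33 string hash. */
unsigned long toy_hash(const char *key) {
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h;
}

/* Lookup: pick the slot, then walk its collision chain. */
toy_bucket *toy_find(const toy_table *t, const char *key) {
    unsigned long h = toy_hash(key);
    for (toy_bucket *p = t->slots[h & (TOY_TABLE_SIZE - 1)]; p; p = p->chain_next) {
        if (p->h == h && strcmp(p->key, key) == 0) return p;
    }
    return NULL;
}

/* Insert or update; new elements are appended to the insertion-order list. */
int toy_set(toy_table *t, const char *key, int value) {
    toy_bucket *p = toy_find(t, key);
    if (p) { p->value = value; return 0; }
    p = malloc(sizeof *p);
    if (!p) return -1;
    size_t len = strlen(key) + 1;
    p->key = malloc(len);
    if (!p->key) { free(p); return -1; }
    memcpy(p->key, key, len);
    p->h = toy_hash(key);
    p->value = value;
    size_t slot = p->h & (TOY_TABLE_SIZE - 1);
    p->chain_next = t->slots[slot];
    t->slots[slot] = p;
    p->list_prev = t->tail;
    p->list_next = NULL;
    if (t->tail) t->tail->list_next = p; else t->head = p;
    t->tail = p;
    t->count++;
    return 0;
}

/* Delete: unlinking from the order list needs no traversal thanks to list_prev. */
int toy_del(toy_table *t, const char *key) {
    unsigned long h = toy_hash(key);
    toy_bucket **pp = &t->slots[h & (TOY_TABLE_SIZE - 1)];
    while (*pp && !((*pp)->h == h && strcmp((*pp)->key, key) == 0)) {
        pp = &(*pp)->chain_next;
    }
    toy_bucket *p = *pp;
    if (!p) return -1;
    *pp = p->chain_next;
    if (p->list_prev) p->list_prev->list_next = p->list_next; else t->head = p->list_next;
    if (p->list_next) p->list_next->list_prev = p->list_prev; else t->tail = p->list_prev;
    free(p->key);
    free(p);
    t->count--;
    return 0;
}
```

Walking from head through list_next visits elements in insertion order regardless of which slot each key hashed to, which is the property foreach relies on.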

PHP associative array: An associative array is a typical hash table application. A lookup goes through the following steps (as the code shows, this is an ordinary hash lookup with some quick checks added to speed up the search):

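h = getKeyHashValue(key, nKeyLength);   /* hash of the key being looked up */
index = h & nTableMask;                 /* map the hash to a bucket slot */
Bucket *p = arBucket[index];
while (p) {
  /* cheap checks first (hash, key length), then the full key comparison */
  if ((p->h == h) && (p->nKeyLength == nKeyLength)
      && !memcmp(p->arKey, key, nKeyLength)) {
    RETURN p->data;
  }
  p = p->next;                          /* follow the collision chain */
}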

PHP index array: An index array is the ordinary array accessed by subscript, for example $arr[0]. Zend HashTable normalizes this internally: an index key is also given a hash value and an nKeyLength (0). The member variable nNextFreeElement records the next free index, one greater than the largest index assigned so far, and it is advanced automatically after each push. It is this normalization that lets PHP mix associative and non-associative keys. Because of how push works, the order of index keys in a PHP array is determined not by subscript size but by insertion order. For example, after $arr[2] = 3; $arr[1] = 2; foreach visits key 2 before key 1. A key of type double is treated by Zend HashTable as an index key.

6. PHP variables

PHP is a weakly typed language that does not strictly distinguish variable types. A variable needs no type when it is declared, and PHP may implicitly convert a variable's type while the program runs. As in strongly typed languages, explicit type conversion can also be performed. PHP variables can be divided into simple types (int, string, bool), collection types (array, resource, object), and constants (const). Underneath, all of these share the same structure: zval.

Zval is another very important data structure in zend. It is used to identify and implement PHP variables, and its data structure is as follows:

Zval mainly consists of three parts:

type: Specifies the type of the variable (integer, string, array, etc.)

refcount & is_ref: used to implement reference counting (described in more detail later)

value: the core part, storing the actual data of the variable

Zvalue holds the actual data of a variable. Because it has to store several types, zvalue is a union, and this is also how weak typing is implemented.

The corresponding relationship between PHP variable types and their actual storage is as follows:

IS_LONG -> lval
IS_DOUBLE -> dval
IS_ARRAY -> ht
IS_STRING -> str
IS_RESOURCE -> lval
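The type tag plus union arrangement can be sketched in C as below. This is a reduced illustration rather than the real zval definition: the toy_* names and TOY_IS_* tags are hypothetical, only three types are modeled, and the conversion helper deliberately does not parse strings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags, mirroring the IS_* constants in name only. */
typedef enum { TOY_IS_LONG, TOY_IS_DOUBLE, TOY_IS_STRING } toy_type;

/* The value is a union: only the member selected by the type tag is valid. */
typedef union {
    long lval;
    double dval;
    struct { const char *val; size_t len; } str;
} toy_value;

typedef struct {
    toy_value value;        /* actual data */
    unsigned int refcount;  /* number of variables sharing this value */
    unsigned char type;     /* which union member is active */
    unsigned char is_ref;   /* 1 if bound by reference assignment */
} toy_zval;

/* Implicit conversion to a number, driven by the runtime type tag. */
double toy_to_double(const toy_zval *z) {
    assert(z != NULL);
    switch (z->type) {
    case TOY_IS_LONG:   return (double)z->value.lval;
    case TOY_IS_DOUBLE: return z->value.dval;
    default:            return 0.0; /* simplification: strings are not parsed here */
    }
}
```

Because every variable has the same shape, changing a variable's type at run time only means updating the tag and writing a different union member.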

Reference counting is widely used in memory reclamation, string operations, and so on, and PHP variables are a typical application of it. Zval's reference counting is implemented through the member variables is_ref and refcount. Through reference counting, multiple variables can share the same data, which avoids the heavy cost of frequent copying.

On assignment, zend points the variable at the same zval and increments refcount. On unset, the corresponding refcount is decremented by 1, and the value is destroyed only when refcount drops to 0. For a reference assignment, zend sets is_ref to 1.

PHP variables share data through reference counting. What happens when one of the variables is changed? When a variable is about to be written, if zend finds that the zval it points to is shared by several variables, it makes a copy of the zval with a refcount of 1 and decrements the refcount of the original zval. This process is called "zval separation". Because zend copies only when a write occurs, the technique is also called copy-on-write.

For reference variables the requirement is the opposite of that for non-reference variables: variables bound by reference assignment must stay bundled, so modifying one of them changes all of them.
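The assignment, unset, and separation steps can be sketched as a small reference-counted value in C. This is a minimal model with hypothetical toy_* names: the value is a single long, and reference variables (is_ref) are not modeled at all.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    long value;
    unsigned int refcount;  /* how many variables point at this value */
} toy_val;

toy_val *toy_new(long value) {
    toy_val *v = malloc(sizeof *v);
    if (v) { v->value = value; v->refcount = 1; }
    return v;
}

/* Assignment ($b = $a): share the same value and bump the count. */
toy_val *toy_assign(toy_val *src) {
    assert(src != NULL);
    src->refcount++;
    return src;
}

/* Unset: drop one reference; destroy once nobody uses the value any more. */
void toy_unset(toy_val *v) {
    assert(v != NULL && v->refcount > 0);
    if (--v->refcount == 0) free(v);
}

/* Write through *var, separating first if the value is shared.
   Returns 0 on success, -1 if the copy could not be allocated. */
int toy_write(toy_val **var, long new_value) {
    assert(var != NULL && *var != NULL);
    if ((*var)->refcount > 1) {
        toy_val *copy = toy_new((*var)->value);
        if (!copy) return -1;
        (*var)->refcount--;   /* the original loses this variable */
        *var = copy;          /* the variable now owns a private copy */
    }
    (*var)->value = new_value;
    return 0;
}
```

Note that toy_write copies only when refcount is greater than 1, so a variable that is the sole owner of its value is modified in place.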

Integers and floating-point numbers are basic types in PHP and are simple variables. For both, the value is stored directly in zvalue, with types long and double respectively.

From the zvalue structure we can see that, unlike strongly typed languages such as C, PHP does not distinguish between int, unsigned int, long, long long, and so on. It has only one integer type, long. It follows that the range of a PHP integer is not fixed but depends on how many bits the compiler uses for long.

Floating-point numbers are handled similarly: PHP does not distinguish between float and double and has only the double type.

What happens in PHP if an integer goes out of range? It is automatically converted to the double type. Be careful here, because many pitfalls arise from this.
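The fallback from long to double on overflow can be sketched like this. It is an illustration of the idea only: toy_num and toy_add_long are hypothetical names, and the overflow test is a plain range check written for portability, not a copy of how zend detects overflow.

```c
#include <assert.h>
#include <limits.h>

typedef struct {
    int is_double;   /* 0: result is in lval, 1: result is in dval */
    long lval;
    double dval;
} toy_num;

/* Add two longs; fall back to double arithmetic if the sum would overflow. */
toy_num toy_add_long(long a, long b) {
    toy_num r = {0, 0, 0.0};
    assert(sizeof(long) >= 4);
    if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
        r.is_double = 1;
        r.dval = (double)a + (double)b;  /* precision may be lost here */
    } else {
        r.lval = a + b;
    }
    return r;
}
```

The comment about precision is the source of the pitfalls: once the result is a double, large values can no longer be represented exactly, so comparisons and further arithmetic may behave unexpectedly.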

Like integers, strings are also a basic type and simple variables in PHP. From the zvalue structure we can see that a PHP string consists of a pointer to the actual data plus a length, similar to string in C++. Because the length is stored in its own variable, a PHP string, unlike a C string, can hold binary data (including \0). It also means that getting the length of a string is an O(1) operation.

When a string is created, modified, or appended to, PHP reallocates memory to produce the new string. Finally, for safety, PHP still appends a \0 at the end when it generates a string.
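A length-prefixed, binary-safe string of the kind described above can be sketched in C as follows. The toy_str type and its functions are hypothetical and only illustrate the idea of storing the length separately while still appending a terminator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *val;    /* the bytes, followed by one extra '\0' for safety */
    size_t len;   /* length in bytes, not counting the extra '\0' */
} toy_str;

/* Copy len bytes (which may include '\0') and append a terminator. */
int toy_str_init(toy_str *s, const char *bytes, size_t len) {
    assert(s != NULL && (bytes != NULL || len == 0));
    s->val = malloc(len + 1);
    if (!s->val) return -1;
    if (len) memcpy(s->val, bytes, len);
    s->val[len] = '\0';
    s->len = len;
    return 0;
}

/* Append by growing the existing buffer with realloc. */
int toy_str_append(toy_str *s, const char *bytes, size_t len) {
    assert(s != NULL && (bytes != NULL || len == 0));
    char *p = realloc(s->val, s->len + len + 1);
    if (!p) return -1;
    if (len) memcpy(p + s->len, bytes, len);
    s->len += len;
    p[s->len] = '\0';
    s->val = p;
    return 0;
}
```

Reading s->len is constant time, whereas strlen on the same buffer would stop at the first embedded \0 and report a shorter length.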

Common string splicing methods and speed comparison:

Suppose there are the following 4 variables: $strA = '123'; $strB = '456'; $intA = 123; $intB = 456;

Now let’s compare and explain the following string splicing methods:

$res = $strA.$strB and $res = "$strA$strB"

In this case, zend allocates a new block of memory with malloc and processes the strings accordingly; the speed is average.

$strA = $strA.$strB

This is the fastest: zend calls realloc directly on the current $strA, avoiding repeated copying.

$res = $intA.$intB

This is slower because implicit type conversion is required. Take care to avoid it when writing programs.

$strA = sprintf("%s%s", $strA, $strB);

This is the slowest way, because sprintf is not a language construct in PHP: recognizing and processing the format takes considerable time, and the result is also allocated with malloc. However, the sprintf form is the most readable, so in practice choose flexibly according to the situation.

PHP arrays are naturally implemented through Zend HashTable.

How is foreach implemented? Iterating over an array with foreach walks the doubly linked list in the hash table. For index arrays, traversal with foreach is much more efficient than with for, because it skips the key->value lookup. count directly reads HashTable->nNumOfElements, an O(1) operation. For a string key like '123', zend converts it to its integer form, so $arr['123'] and $arr[123] are equivalent.

The resource type is the most complex kind of variable in PHP and is also a composite structure.

PHP's zval can represent a wide range of data types, but it is hard for it to fully describe custom data types. Since there is no efficient way to describe these composite structures, traditional operators cannot be used on them either. The solution is to refer to the underlying pointer through an essentially arbitrary identifier (label), which is called a resource.

In zval, a resource stores its identifier in lval, and that identifier leads to the address where the resource actually lives. A resource can be any composite structure; the familiar mysqli, fsock, memcached, and so on are all resources.
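The conversion of integer-like string keys can be approximated in C as shown here. This is a sketch of the rule as commonly described (only canonical decimal integers that fit in a long are converted), not zend's actual macro; toy_key_to_index is a hypothetical name, and edge cases may differ between PHP versions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Return 1 and store the integer in *out if key is a canonical decimal
   integer such as "123" or "-5"; return 0 if it must stay a string key. */
int toy_key_to_index(const char *key, long *out) {
    assert(key != NULL && out != NULL);
    const char *p = key;
    if (*p == '-') p++;
    if (*p < '0' || *p > '9') return 0;           /* must start with a digit */
    if (*p == '0' && p[1] != '\0') return 0;       /* no leading zeros: "0123" */
    if (*p == '0' && key[0] == '-') return 0;      /* "-0" stays a string */
    for (const char *q = p; *q; q++) {
        if (*q < '0' || *q > '9') return 0;        /* digits only: rejects "1.5" */
    }
    errno = 0;
    char *end;
    long v = strtol(key, &end, 10);
    if (errno == ERANGE || *end != '\0') return 0; /* does not fit in a long */
    *out = v;
    return 1;
}
```

Under this rule "123" and 123 land on the same index key, while "0123" and "1.5" remain distinct string keys.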

How to use resources:

Register: To use a custom data type as a resource, you first have to register it, and zend assigns it a globally unique label.

Get a resource variable: Zend maintains a hash_table mapping id -> actual data. The zval of a resource records only its id; on fetch, the actual value is looked up in the hash_table and returned.

Resource destruction: Resources come in many data types, and zend itself has no way to destroy them. The user therefore has to supply a destruction function when registering the resource type. When a resource is unset, zend calls that function to destroy it and also removes it from the global resource table.

A resource can live for a long time: not only after all variables referring to it have gone out of scope, but even after one request has ended and a new one has begun. Such resources are called persistent resources, because they persist for the entire life cycle of the SAPI unless explicitly destroyed. In many cases persistent resources improve performance to some extent; mysql_pconnect is a common example. Persistent resources allocate memory through pemalloc, so the memory is not released when the request ends.

Zend itself makes no distinction between the two kinds of resource.
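The register, fetch, and destroy steps can be sketched as a small id-to-pointer table in C. This is a simplified model with hypothetical toy_* names: it uses a fixed-size array instead of a hash table, attaches the destructor to each entry rather than to a registered resource type, and toy_count_dtor is just a demonstration destructor that counts how often it is called.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_RESOURCES 16

typedef void (*toy_dtor)(void *ptr);

typedef struct {
    void *ptr;      /* the actual data, opaque to the table */
    toy_dtor dtor;  /* user-supplied destruction function */
    int in_use;
} toy_entry;

typedef struct {
    toy_entry entries[TOY_MAX_RESOURCES];
} toy_resources;

void toy_resources_init(toy_resources *r) {
    assert(r != NULL);
    for (int i = 0; i < TOY_MAX_RESOURCES; i++) {
        r->entries[i].ptr = NULL;
        r->entries[i].dtor = NULL;
        r->entries[i].in_use = 0;
    }
}

/* Register: store the pointer and its destructor, return an id (-1 if full). */
int toy_register(toy_resources *r, void *ptr, toy_dtor dtor) {
    assert(r != NULL);
    for (int id = 0; id < TOY_MAX_RESOURCES; id++) {
        if (!r->entries[id].in_use) {
            r->entries[id].ptr = ptr;
            r->entries[id].dtor = dtor;
            r->entries[id].in_use = 1;
            return id;
        }
    }
    return -1;
}

/* Fetch: map the id stored in the variable back to the actual data. */
void *toy_fetch(const toy_resources *r, int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !r->entries[id].in_use) return NULL;
    return r->entries[id].ptr;
}

/* Destroy: call the registered destructor, then remove the entry. */
int toy_destroy(toy_resources *r, int id) {
    if (id < 0 || id >= TOY_MAX_RESOURCES || !r->entries[id].in_use) return -1;
    if (r->entries[id].dtor) r->entries[id].dtor(r->entries[id].ptr);
    r->entries[id].in_use = 0;
    r->entries[id].ptr = NULL;
    return 0;
}

/* Demonstration destructor: treats ptr as an int counter and increments it. */
void toy_count_dtor(void *ptr) {
    (*(int *)ptr)++;
}
```

The variable only ever holds the small integer id, so the table is free to store any kind of data behind it, and the destructor is the only code that needs to know what that data really is.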

How are local and global variables implemented in PHP? Within a request, PHP can see two symbol tables at any time (symbol_table and active_symbol_table). The former holds global variables; the latter is a pointer to the currently active symbol table. When the program enters a function, zend allocates a symbol table x for it and points active_symbol_table at x. This is how global and local variables are kept apart.

Getting a variable's value: PHP's symbol tables are implemented as hash_tables. Each variable has a unique identifier, and a lookup finds the variable in the table by that identifier and returns it.

Using global variables in functions: Inside a function, a global variable can be used by explicitly declaring it with global. This creates, in active_symbol_table, a reference to the variable of the same name in symbol_table. If symbol_table has no variable with that name, it is created first.