The best introductory assembly language tutorial: how the CPU executes code

Learning to program usually means learning high-level languages, that is, computer languages designed for humans.

However, computers do not understand high-level languages; a compiler must first translate them into binary code before they can run. Knowing a high-level language therefore does not mean understanding the steps the computer actually takes.

What computers can really understand are low-level languages, which are used to control the hardware directly. Assembly language is a low-level language that directly describes and controls what the CPU does. If you want to know what the CPU does and how code actually runs step by step, you need to learn assembly language.

Assembly language is not easy to learn, and even concise introductions are hard to find. Below is my attempt at an accessible assembly language tutorial that explains how the CPU executes code.

1. What is assembly language?

The CPU only performs computation; it has no intelligence of its own. You give it an instruction, it executes that instruction once, and then it stops and waits for the next one.

These instructions are binary and are called opcodes. For example, one encoding of the addition instruction is 00000011. The job of a compiler is to translate programs written in high-level languages into these machine instructions.

For humans, binary programs are unreadable: there is no way to tell what the machine is doing. Assembly language was created to make programs readable and to allow the occasional manual edit.

Assembly language is the textual form of binary instructions, and the two correspond one to one. For example, the addition instruction 00000011 is written as ADD in assembly language. Once it is translated back into binary, assembly code can be executed directly by the CPU, which is why it is the lowest-level programming language.
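To make the one-to-one correspondence concrete, here is a minimal sketch in C: a hand-written lookup table holding the real 32-bit x86 encodings of a few instructions used later in this article, plus two helpers (`assemble` and `disassemble`, hypothetical names) that translate in each direction. Note that 0x03 is one of several ADD opcodes; it is followed by a second byte that selects the registers. A real assembler parses operands and supports far more instruction forms than this table.

```c
#include <stddef.h>
#include <string.h>

/* A few real 32-bit x86 encodings, picked by hand for illustration.
   This is a lookup table, not a real assembler. */
struct encoding {
    const char *text;          /* assembly text (Intel syntax) */
    unsigned char bytes[2];    /* machine code */
    size_t len;                /* number of bytes actually used */
};

static const struct encoding TABLE[] = {
    { "add eax, ebx", { 0x03, 0xC3 }, 2 },  /* opcode 0x03, then a register-selecting byte */
    { "push ebx",     { 0x53 },       1 },
    { "pop ebx",      { 0x5B },       1 },
    { "push 3",       { 0x6A, 0x03 }, 2 },  /* push an 8-bit immediate */
    { "ret",          { 0xC3 },       1 },
};

#define TABLE_LEN (sizeof TABLE / sizeof TABLE[0])

/* Text -> bytes. Writes the encoding to `out` and returns its length,
   or returns 0 if the instruction is not in the table. */
size_t assemble(const char *text, unsigned char out[2]) {
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (strcmp(TABLE[i].text, text) == 0) {
            memcpy(out, TABLE[i].bytes, TABLE[i].len);
            return TABLE[i].len;
        }
    }
    return 0;
}

/* Bytes -> text. Returns NULL if the byte sequence is not in the table. */
const char *disassemble(const unsigned char *bytes, size_t len) {
    for (size_t i = 0; i < TABLE_LEN; i++) {
        if (TABLE[i].len == len && memcmp(TABLE[i].bytes, bytes, len) == 0)
            return TABLE[i].text;
    }
    return NULL;
}
```

For example, `assemble("add eax, ebx", buf)` stores the bytes 0x03 0xC3 in `buf` and returns 2.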

2. Origin

In the earliest days, programming meant writing binary instructions by hand and entering them into the computer through switches: to add, you pressed the addition switch. Later, paper-tape punches were invented, and binary instructions were fed into the computer automatically by punching holes in paper tape.

To make binary instructions more readable, engineers wrote them in octal. Converting binary to octal is easy, but octal is still hard to read. Naturally, instructions ended up being written as words, such as ADD. Memory addresses were also no longer written out directly but represented by labels.
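As a small illustration, the sketch below formats a byte in binary and in octal; the helper names `to_binary` and `to_octal` are hypothetical. Octal suits x86 encodings well: the register-selecting byte that follows opcode 0x03 in `add eax, ebx` is 0xC3, which is split into 2-, 3- and 3-bit fields, and in octal (303) each field becomes one digit: 3 (both operands are registers), 0 (EAX) and 3 (EBX).

```c
#include <stdio.h>

/* Writes the 8 bits of `byte`, most significant first, into `out`
   (8 characters plus the terminating NUL). */
void to_binary(unsigned char byte, char out[9]) {
    for (int i = 0; i < 8; i++)
        out[i] = (byte & (0x80 >> i)) ? '1' : '0';
    out[8] = '\0';
}

/* Writes `byte` as exactly three octal digits. Each digit covers 3 bits,
   so the top digit covers only the 2 highest bits. */
void to_octal(unsigned char byte, char out[4]) {
    snprintf(out, 4, "%03o", (unsigned)byte);
}
```

With these helpers, 0x03 becomes "00000011" in binary and "003" in octal, while 0xC3 becomes "11000011" and "303".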

This adds one more step: translating these textual instructions into binary. The step is called assembling, and the program that performs it is called an assembler. The text it processes is naturally called assembly code. Once standardized, it became known as assembly language, abbreviated asm.

Each type of CPU has its own machine instructions, so each has its own assembly language. This article covers the most common one, x86 assembly language, which is used by Intel's CPUs.

3. Registers

To learn assembly language, you first need to understand two concepts: registers and the memory model.

Let's start with registers. The CPU itself only computes; it does not store data. Data is normally kept in memory, and the CPU reads and writes memory when it needs that data. However, the CPU computes much faster than memory can be read or written. To avoid being slowed down, CPUs come with a level 1 cache and a level 2 cache. Roughly speaking, a CPU cache can be regarded as memory with faster reads and writes.

However, the CPU cache is still not fast enough. Moreover, where a piece of data sits in the cache is not fixed, so the CPU has to locate it on every read and write, which slows it down. Therefore, besides the cache, the CPU also has registers to hold the most frequently used data. In other words, the data that is read and written most often (such as loop variables) is kept in registers. The CPU reads and writes the registers first, and the registers then exchange data with memory.

Registers are identified by name rather than by address. Each register has its own name, and we tell the CPU which register to get data from. This is the fastest way to access data. Some people describe registers as the CPU's level 0 cache.

4. Types of registers

Early x86 CPUs had only 8 registers, each with a different purpose. Today there are more than 100 registers, most of which are general-purpose with no dedicated role, but the names of the early registers have been kept:

EAX
EBX
ECX
EDX
EDI
ESI
EBP
ESP

Of the 8 registers above, the first seven are general-purpose. The ESP register has a specific purpose: it holds the address of the top of the current Stack (see section 6 for details).

We often hear the terms 32-bit CPU and 64-bit CPU; they actually refer to the size of the registers. A 32-bit CPU has 4-byte registers, and a 64-bit CPU has 8-byte registers.
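A 32-bit register can be modeled in C with `uint32_t` from `<stdint.h>`: it is exactly 4 bytes wide, and unsigned arithmetic wraps around modulo 2^32, much as a 32-bit ADD drops the carry out of the top bit. The sketch below (function names hypothetical) contrasts it with a 64-bit register, whose names on x86-64 are RAX, RBX and so on, where the same sum no longer overflows.

```c
#include <stdint.h>

/* A 32-bit register modeled as uint32_t: exactly 4 bytes, and sums wrap
   around modulo 2^32, like a 32-bit ADD that discards the carry. */
uint32_t add32(uint32_t eax, uint32_t ebx) {
    return eax + ebx;   /* unsigned overflow wraps; it is well defined in C */
}

/* The same operation on an 8-byte register, as on a 64-bit CPU. */
uint64_t add64(uint64_t rax, uint64_t rbx) {
    return rax + rbx;
}
```

Adding 1 to 0xFFFFFFFF gives 0 with `add32` but 0x100000000 with `add64`.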

5. Memory model: Heap

Registers can hold only a small amount of data. Most of the time, the CPU has to move data between the registers and memory. So besides registers, you also need to understand how memory stores data.

When a program runs, the operating system allocates a block of memory to store the program and the data it produces while running. This block has a start address and an end address, for example from 0x1000 to 0x8000; the start address is the lower one and the end address is the higher one.

While the program runs, whenever it requests memory dynamically (for example, by creating a new object or calling malloc), the system hands out part of this pre-allocated memory. The rule is to hand it out starting from the start address (in practice there is a block of static data at the start address, which we ignore here). For example, if the program requests 10 bytes, it receives the memory from 0x1000 up to (but not including) 0x100A; if it then requests another 22 bytes, it receives the memory from 0x100A up to 0x1020.

This memory area, handed out in response to the program's explicit requests, is called the Heap. It starts at the start address and grows from lower to higher addresses. An important property of the Heap is that its memory is not released automatically: it must be freed manually, or reclaimed by a garbage collector.
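The allocation rule above can be sketched in C as a "bump" allocator. This is a toy model under the article's assumptions (a region from 0x1000 to 0x8000, no static data, no freeing); `heap_alloc` and `heap_top` are hypothetical names, and real `malloc` implementations also handle alignment, bookkeeping headers and reuse of freed blocks.

```c
#include <stdint.h>

#define HEAP_START 0x1000u   /* start (lowest) address of the region */
#define HEAP_END   0x8000u   /* end (highest) address of the region  */

static uint32_t heap_top = HEAP_START;   /* next free address */

/* Returns the start address of a new block of `size` bytes,
   or 0 if the region is exhausted. (In the article's model the Stack
   also grows down into this region; that is ignored here.) */
uint32_t heap_alloc(uint32_t size) {
    if (size > HEAP_END - heap_top)
        return 0;
    uint32_t start = heap_top;
    heap_top += size;        /* the Heap grows toward higher addresses */
    return start;
}
```

Requesting 10 bytes returns 0x1000 and moves `heap_top` to 0x100A; requesting 22 more returns 0x100A and moves it to 0x1020, matching the example in the text.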

6. Memory model: Stack

All other memory in use, apart from the Heap, is called the Stack. Simply put, the Stack is the memory area temporarily occupied by running functions.

Please see the example below.

int main() {
   int a = 2;
   int b = 3;
}

In the code above, when the system starts executing main, it creates a frame for it in memory, and all of main's local variables (such as a and b) are stored in this frame. After main finishes, the frame is reclaimed and its local variables are released, so they no longer take up space.

What happens if another function is called internally?

int add_a_and_b(int a, int b) {
   return a + b;
}

int main() {
   int a = 2;
   int b = 3;
   return add_a_and_b(a, b);
}

In the code above, main calls the function add_a_and_b. When this line executes, the system also creates a new frame for add_a_and_b to hold its local variables. So at that moment two frames exist at the same time: one for main and one for add_a_and_b. In general, there are as many frames as there are levels in the call stack.

When add_a_and_b finishes, its frame is reclaimed, and the system returns to the point in main where execution was interrupted and continues from there. This mechanism makes nested function calls work level by level, with each level using its own local variables.
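The sketch below shows, in C, that each level of a call gets its own copy of a local variable. The recursive function `sum_to` (a hypothetical example) records the address of its parameter `n` at every depth. Because all of these calls are active at the same time, each `n` lives in a different frame and therefore at a different address. The C standard does not say in which direction those addresses move, so the example only relies on their being distinct.

```c
#include <stdint.h>

#define MAX_DEPTH 16

/* frame_addr[d] records where the call at recursion depth d keeps its `n`. */
static uintptr_t frame_addr[MAX_DEPTH];

/* Computes n + (n-1) + ... + 1 recursively. Every call gets its own frame,
   and therefore its own copy of the parameter `n`. */
int sum_to(int n, int depth) {
    if (depth < MAX_DEPTH)
        frame_addr[depth] = (uintptr_t)&n;   /* address of this call's own n */
    if (n <= 0)
        return 0;
    int rest = sum_to(n - 1, depth + 1);     /* the deeper call uses a different n */
    return n + rest;                         /* this call's n is still intact */
}
```

After `sum_to(4, 0)` returns 10, `frame_addr[0]` through `frame_addr[4]` hold five different addresses, one per frame.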

All frames are stored in the Stack. Because frames are piled on top of each other, layer by layer, this area is called a stack. Creating a new frame is called pushing it onto the stack (push); reclaiming a frame is called popping it off the stack (pop). The defining property of a stack is that the frame pushed last is popped first (because the innermost function call finishes first); such a data structure is called "last in, first out". Each time a function finishes, its frame is released automatically, and when all functions have finished, the whole Stack is released.

The Stack starts at the end address of the memory area and is allocated from higher to lower addresses. For example, if the end address of the memory area is 0x8000 and the first frame needs 16 bytes, the next allocation starts at 0x7FF0; if the second frame needs 64 bytes, the address moves down to 0x7FB0.
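The downward growth and last-in, first-out release can be modeled with a single counter that starts at the end address. This toy simulation uses the numbers from the text; `push_frame` and `pop_frame` are hypothetical names, and the model does not check whether the Stack runs into the Heap.

```c
#include <stdint.h>

#define STACK_END  0x8000u   /* end (highest) address of the region */
#define MAX_FRAMES 16

static uint32_t stack_top = STACK_END;    /* lowest address currently in use */
static uint32_t frame_size[MAX_FRAMES];   /* sizes of live frames, innermost last */
static int frame_count = 0;

/* Allocates a frame of `size` bytes below the current top and returns its
   start address, or 0 if there are too many frames. */
uint32_t push_frame(uint32_t size) {
    if (frame_count == MAX_FRAMES)
        return 0;
    frame_size[frame_count++] = size;
    stack_top -= size;                    /* the Stack grows toward lower addresses */
    return stack_top;
}

/* Releases the most recently pushed frame (last in, first out)
   and returns the new top. */
uint32_t pop_frame(void) {
    if (frame_count > 0)
        stack_top += frame_size[--frame_count];
    return stack_top;
}
```

Pushing a 16-byte frame returns 0x7FF0, pushing a 64-byte frame returns 0x7FB0, and popping twice brings the top back to 0x7FF0 and then 0x8000.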

7. CPU instructions

7.1 An example

With registers and the memory model understood, we can look at what assembly language actually looks like. Here is a simple program.

int add_a_and_b(int a, int b) {
   return a + b;
}

int main() {
   return add_a_and_b(2, 3);
}

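gcc can convert this program into assembly language. Assuming the source is saved in a file named example.c (the file name is only an example), run:

$ gcc -S example.c

This command produces a text file, example.s, containing the program in assembly language: dozens of lines of instructions. In other words, a single simple operation in a high-level language may correspond to several, or even dozens of, CPU instructions, which the CPU executes one after another to complete that operation.

Simplified, the result looks roughly like the listing below. It is hand-simplified 32-bit code that puts the destination operand first (Intel operand order); gcc's default output uses AT&T syntax instead, and options such as -m32 -masm=intel bring its output closer to this form.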

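_add_a_and_b:
   push   ebx              ; save the caller's EBX
   mov    eax, [esp+8]     ; load the first argument (2)
   mov    ebx, [esp+12]    ; load the second argument (3)
   add    eax, ebx         ; eax = eax + ebx
   pop    ebx              ; restore the caller's EBX
   ret                     ; return to main

_main:
   push   3                ; push the second argument
   push   2                ; push the first argument
   call   _add_a_and_b     ; push the return address and jump
   add    esp, 8           ; discard the two arguments
   ret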

You can see that the two functions of the original program, add_a_and_b and main, correspond to two labels, _add_a_and_b and _main. Under each label are the CPU instructions that the function was translated into.

Each line is one operation performed by the CPU and consists of two parts. Take one line as an example:

push   ebx

In this line, push is the CPU instruction and ebx is its operand. A CPU instruction can have zero or more operands.

Next, I will walk through this assembly program line by line. I recommend keeping the listing open in another window so you don't have to keep scrolling back up while reading.

7.2 push instruction

By convention, the program starts executing at the _main label (strictly speaking, the C runtime's startup code runs first and then calls main). A frame is created for main on the Stack, and the ESP register holds the address of the top of the Stack. Any data written into main's frame later is written at the address held in ESP.

Then, start executing the first line of code.

push   3

The push instruction puts its operand onto the Stack; here it writes 3 into main's frame.

Simple as it looks, push actually performs a preliminary step. It first takes the address in the ESP register, subtracts 4 from it, and writes the new address back to ESP. It subtracts because the Stack grows from higher to lower addresses, and it subtracts 4 because in 32-bit code push stores a 4-byte value (3 is an int, which also occupies 4 bytes). With the new address in hand, 3 is written into the four bytes starting at that address.

push   2

The second line works the same way. push writes 2 into main's frame, right next to the 3 written earlier. ESP is decreased by another 4 bytes (8 in total).
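The two steps of push (decrease ESP by 4, then store the value at the new ESP) can be simulated in C. The sketch assumes a hypothetical 64-byte stack region at addresses 0x1000 to 0x1040; `push`, `load32`, `mem` and `esp` are illustrative names, not a real API.

```c
#include <stdint.h>
#include <string.h>

#define MEM_BASE 0x1000u   /* lowest address of the simulated stack region */
#define MEM_SIZE 64u       /* region covers [MEM_BASE, MEM_BASE + MEM_SIZE) */

static unsigned char mem[MEM_SIZE];
static uint32_t esp = MEM_BASE + MEM_SIZE;   /* starts at the end of the region */

/* push: first move ESP down by 4, then store the 32-bit value at [ESP]. */
void push(uint32_t value) {
    esp -= 4;
    memcpy(&mem[esp - MEM_BASE], &value, 4);   /* stored in host byte order */
}

/* Reads the 32-bit value stored at `addr`. */
uint32_t load32(uint32_t addr) {
    uint32_t value;
    memcpy(&value, &mem[addr - MEM_BASE], 4);
    return value;
}
```

After `push(3)` and `push(2)`, `esp` is 8 lower than at the start, `load32(esp)` returns 2, and `load32(esp + 4)` returns 3.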

7.3 call instruction

The call instruction on the third line is used to call a function.

call   _add_a_and_b

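The line above calls the add_a_and_b function. call first pushes the return address, that is, the address of the instruction right after it (add esp, 8 in _main), onto the Stack, subtracting another 4 bytes from ESP (12 in total). It then jumps to the _add_a_and_b label, and the Stack space that add_a_and_b uses from here on forms its new frame.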

Execution now continues with the code at _add_a_and_b.

push   ebx

This line writes the value in the EBX register into add_a_and_b's frame. The function is about to use EBX, so it first saves the value already there and restores it when it is done.

Again, push subtracts 4 bytes from the address in ESP (16 in total: 4 bytes each for the two arguments, the return address, and EBX).

7.4 mov instruction

The mov instruction copies a value into its first operand; here it is used to load values into registers.

mov    eax, [esp+8]

This line first adds 8 to the address in ESP to get a new address, then reads the data on the Stack at that address. From the previous steps, ESP points to the saved EBX, ESP+4 to the return address pushed by call, and ESP+8 to the 2 pushed by main. So 2 is read and written into the EAX register.

The next line of code does the same thing.

mov    ebx, [esp+12]

The line above adds 12 to the address in ESP and reads the Stack data at that address. This time it gets 3, which is written into the EBX register.
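One way to double-check the offsets used by the two mov instructions is to describe the Stack, read upward from ESP, as a C struct and let `offsetof` compute them. The struct is only a description of the layout (its name is hypothetical), and it assumes the compiler inserts no padding between `uint32_t` members, which holds on mainstream compilers.

```c
#include <stddef.h>
#include <stdint.h>

/* The Stack as seen from inside _add_a_and_b right after `push ebx`,
   read from ESP toward higher addresses. Each push stored 4 bytes. */
struct add_frame_view {
    uint32_t saved_ebx;    /* [esp+0]:  pushed last, by `push ebx`      */
    uint32_t return_addr;  /* [esp+4]:  pushed by `call _add_a_and_b`   */
    uint32_t a;            /* [esp+8]:  pushed by `push 2`              */
    uint32_t b;            /* [esp+12]: pushed first, by `push 3`       */
};

/* Compile-time checks (C11): the member offsets match the mov operands. */
_Static_assert(offsetof(struct add_frame_view, a) == 8,  "mov eax, [esp+8]");
_Static_assert(offsetof(struct add_frame_view, b) == 12, "mov ebx, [esp+12]");
```

If the return address were left out of the struct, `a` would sit at offset 4 and the first `_Static_assert` would fail, which is exactly why the return address pushed by call matters here.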

7.5 add instruction

The add instruction adds its two operands and writes the result into the first operand.

add    eax, ebx

The line above adds the value in the EAX register (2) to the value in the EBX register (3), giving 5, and writes the result into the first operand, EAX. By convention, a function leaves its return value in EAX, so this 5 is what add_a_and_b returns.
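To see a real ADD instruction run, the sketch below uses GCC/Clang extended inline assembly on x86 and x86-64, with a plain C fallback elsewhere. Note that inline assembly is written in AT&T syntax by default, where the operand order is reversed: `addl %1, %0` means "add operand 1 to operand 0". The function name `asm_add` is hypothetical.

```c
/* Adds two ints with an actual x86 ADD instruction when compiled by
   GCC or Clang for x86/x86-64; otherwise falls back to plain C. */
int asm_add(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r"(a)    /* operand 0: read and written, kept in a register */
            : "r"(b)     /* operand 1: read only, kept in a register */
            : "cc");     /* ADD updates the flags register */
    return a;
#else
    return a + b;
#endif
}
```

`asm_add(2, 3)` returns 5, the same result the listing leaves in EAX.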

7.6 pop instruction

The pop instruction takes the most recently written value off the Stack (the value at the lowest address, which ESP points to) and writes it to the location given by its operand.

pop    ebx

The line above takes the most recently written value off the Stack (the original value of EBX) and writes it back into the EBX register. Since the addition is done, the function no longer needs EBX and can restore its original value.

Note that pop also adds 4 to the address in ESP, releasing 4 bytes.

7.7 ret instruction

The ret instruction ends the current function and hands control back to the calling function. It pops the return address that call pushed (adding 4 to ESP) and jumps to it, so the current function's frame is gone.

ret

As you can see, this instruction has no operand.

Once add_a_and_b returns, execution resumes in main at the point where it was interrupted.

add    esp, 8

The line above adds 8 to the address in ESP and writes the result back into ESP. Since ESP marks the top of the Stack, this releases the 8 bytes taken by the two arguments, 2 and 3. Together with the 4 bytes released by pop and the 4 bytes released by ret, the Stack is now back where it was before main pushed anything.

ret

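Finally, main finishes, and its ret instruction returns to the C runtime's startup code, which ends the program; the 5 left in EAX becomes main's return value and thus the program's exit status.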
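To tie the steps together, here is a sketch in C that replays the whole listing on a toy machine: registers are struct fields, and the Stack is a byte array. The stack location, the initial EBX value and the return address 0x2000 are made-up values, and `struct cpu`, `run_program` and the helpers are hypothetical names. The point is to check that ESP ends where it started, EBX is restored, and EAX holds 5.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BASE 0x1000u   /* hypothetical lowest address of the Stack */
#define STACK_SIZE 256u

/* A toy 32-bit machine with only the registers the listing uses. */
struct cpu {
    uint32_t eax, ebx, esp, eip;
    unsigned char stack[STACK_SIZE];   /* memory for [STACK_BASE, STACK_BASE + STACK_SIZE) */
};

static uint32_t load(const struct cpu *c, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &c->stack[addr - STACK_BASE], 4);
    return v;
}

static void push(struct cpu *c, uint32_t v) {
    c->esp -= 4;                                    /* move the top down first */
    memcpy(&c->stack[c->esp - STACK_BASE], &v, 4);  /* then store at [esp] */
}

static uint32_t pop(struct cpu *c) {
    uint32_t v = load(c, c->esp);                   /* read [esp] */
    c->esp += 4;                                    /* then release 4 bytes */
    return v;
}

/* Replays _main and _add_a_and_b in execution order; returns EAX. */
uint32_t run_program(struct cpu *c) {
    const uint32_t after_call = 0x2000u;  /* hypothetical address of `add esp, 8` */

    push(c, 3);                     /* _main:  push 3                          */
    push(c, 2);                     /*         push 2                          */
    push(c, after_call);            /*         call: push return address, jump */
    push(c, c->ebx);                /* _add_a_and_b: push ebx                  */
    c->eax = load(c, c->esp + 8);   /*         mov eax, [esp+8]   -> 2         */
    c->ebx = load(c, c->esp + 12);  /*         mov ebx, [esp+12]  -> 3         */
    c->eax += c->ebx;               /*         add eax, ebx       -> 5         */
    c->ebx = pop(c);                /*         pop ebx: restore the saved EBX  */
    c->eip = pop(c);                /*         ret: pop return address, jump   */
    c->esp += 8;                    /* _main:  add esp, 8 (drop both args)     */
    return c->eax;                  /*         ret: the result stays in EAX    */
}
```

Starting with ESP at 0x1100 (the end of the simulated region) and EBX at 0xABCD, `run_program` returns 5 and leaves ESP at 0x1100 and EBX at 0xABCD, with EIP set to the return address 0x2000.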
