The multiplication instruction is a basic arithmetic operation implemented in the CPU to calculate the product of two numbers. In assembly language, multiplication is usually implemented by two instructions: mul (unsigned multiplication) and imul (signed multiplication). Because multiplication instructions consume more clock cycles than additions or shifts, compilers usually try to convert multiplication operations into more efficient addition and shift operations when optimizing code.
For smaller numbers, the compiler may convert the multiplication directly into additions. For example, a * b becomes a + a + ... + a (b terms). This form can be optimized further through techniques such as loop unrolling and code vectorization.
For larger numbers, the compiler may use shift operations instead of multiplication. For example, a * b becomes (a << n) + (a << m), where n and m are the positions of the bits that are set in b (that is, b = 2^n + 2^m). This speeds up the calculation because shift instructions are cheap.
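The shift-and-add decomposition described above can be sketched in C. This is only an illustration of the arithmetic, not the code a compiler actually emits; the function name mul_shift_add is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not from the article): multiply a by b using only
   shifts and additions. Every set bit k of b contributes (a << k). */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    for (unsigned k = 0; b != 0; ++k, b >>= 1) {
        if (b & 1u) {
            product += a << k;  /* wraps modulo 2^32, like a 32-bit register */
        }
    }
    return product;
}
```

For b = 10 (binary 1010) the loop adds (a << 1) and (a << 3), which is the (a << n) + (a << m) form with n = 1 and m = 3.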
The compiler falls back to the mul/imul instructions only when none of the above methods applies. These two instructions handle unsigned and signed multiplication respectively. Although they are much cheaper than division instructions such as div, they still take more clock cycles than additions and shifts, so when writing efficient code you should avoid multiplication where possible and use the techniques mentioned above.
Use the IMUL instruction to complete multiplication
When execution efficiency is not a concern, the compiler usually emits the imul instruction directly to perform a multiplication. Compared with mul, imul is the more flexible instruction: besides the one-operand form, it has two- and three-operand forms that write the low half of the product to any general-purpose register. The one-operand form is the signed counterpart of mul: it sign-extends the product into the high half and sets CF and OF when the result does not fit in the low half. Because the low half of a product is the same for signed and unsigned operands, imul is also suitable for unsigned integers when only the low bits of the result are needed.
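The claim that the low half of the product does not depend on signedness can be checked with a small C sketch. The two function names are hypothetical; they model what mul and imul would leave in the low 32 bits and are not part of the article's code.

```c
#include <stdint.h>

/* Low 32 bits of an unsigned 32x32 multiply (the part mul leaves in EAX). */
uint32_t low_half_unsigned(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b);
}

/* Low 32 bits of a signed 32x32 multiply (the part imul leaves in EAX).
   The product is formed in 64 bits, so it cannot overflow. */
uint32_t low_half_signed(int32_t a, int32_t b)
{
    return (uint32_t)((int64_t)a * b);
}
```

Feeding the same bit patterns to both functions gives the same low half; only the high half (not modeled here) differs.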
The one-operand forms of mul and imul follow these register conventions:
- If the multiplicand and the multiplier are 8 bits wide, AL holds the multiplicand and the result is stored in AX.
- If the multiplicand and the multiplier are 16 bits wide, AX holds the multiplicand and the result is stored in DX:AX.
- If the multiplicand and the multiplier are 32 bits wide, EAX holds the multiplicand and the result is stored in EDX:EAX.
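As a rough model of these register conventions, the following C sketch shows how an 8-bit product fills AX and how a 32-bit product is split across EDX:EAX. The struct and function names are assumptions introduced for illustration only.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the EDX:EAX register pair. */
struct edx_eax {
    uint32_t edx;  /* high 32 bits of the product */
    uint32_t eax;  /* low 32 bits of the product  */
};

/* Models the 32-bit one-operand mul: EAX * operand -> EDX:EAX. */
struct edx_eax mul32(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    struct edx_eax result;
    result.edx = (uint32_t)(product >> 32);
    result.eax = (uint32_t)product;
    return result;
}

/* Models the 8-bit one-operand mul: AL * operand -> AX. */
uint16_t mul8(uint8_t al, uint8_t operand)
{
    return (uint16_t)(al * operand);  /* at most 255 * 255, fits in 16 bits */
}
```

The result register is always twice as wide as the operands, so the full product never overflows.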
Using the multiplication instruction is simple: load one factor and multiply it by the remaining factors in turn. The following is a simple assembly implementation that multiplies three numbers.
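```asm
; The listing assumes the usual MASM32 prologue (.386, .model flat,stdcall,
; and the kernel32/msvcrt includes), which the article omits.
.data
  x DWORD ?
  y DWORD ?
  z DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    mov dword ptr ds:[x],10
    mov dword ptr ds:[y],24
    mov dword ptr ds:[z],18

    ; calculate x * y * z
    mov eax,dword ptr ds:[x]
    imul eax,dword ptr ds:[y]     ; eax = x * y
    imul eax,dword ptr ds:[z]     ; eax = x * y * z

    invoke crt_printf,addr szFmt,eax
    invoke ExitProcess,0          ; added so main does not fall through
  main ENDP
END main
```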
Use the LEA instruction to replace multiplication
In actual programming, we can use the LEA instruction in place of a multiplication to improve the execution efficiency of the code. Note, however, that when LEA is used for multiplication the multiplier must be a power of 2, and the scale factor in the address expression can only be 2, 4, or 8. The assembly below uses LEA to calculate eax * 8 + 2.
- Assume eax = 5 and calculate eax * 8 + 2. The process is as follows:
- 1. Execute lea ebx,dword ptr ds:[eax * 8 + 2]. This is equivalent to ebx = (eax * 8) + 2, so the result (42) is obtained directly.
This first case is relatively simple. A single lea instruction completes the calculation, as long as the multiplier is 2, 4, or 8.
```asm
.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    ; Optimize multiplication with the lea instruction
    mov dword ptr ds:[x],5

    mov eax,dword ptr ds:[x]                ; eax = x
    xor ebx,ebx                             ; ebx = 0
    lea ebx,dword ptr ds:[eax * 8 + 2]      ; ebx = eax * 8 + 2

    invoke crt_printf,addr szFmt,ebx
    invoke ExitProcess,0
  main ENDP
END main
```
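To make the scale restriction concrete, here is a small C model of the LEA address arithmetic (base + index * scale + displacement). The function lea32 and its return convention are assumptions made for this sketch. It accepts a scale of 1 as well, which is encodable but amounts to a plain addition.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit lea: *out = base + index * scale + disp.
   Returns 1 on success, or 0 (leaving *out untouched) when the scale is
   not one of the encodable values 1, 2, 4, 8. */
int lea32(uint32_t base, uint32_t index, uint32_t scale, uint32_t disp,
          uint32_t *out)
{
    if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
        return 0;
    }
    *out = base + index * scale + disp;  /* wraps modulo 2^32 */
    return 1;
}
```

With eax = 5, calling lea32(0, 5, 8, 2, &ebx) models lea ebx,[eax * 8 + 2] and stores 42, while a scale of 7 is rejected.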
Use LEA instruction to split the calculation
If the multiplier falls outside the 2/4/8 range, the multiplication needs to be split. Each part of the split must still use a scale of 2, 4, or 8, and the parts are calculated one after another.
- Assume eax = 3 and calculate 15 * eax. The process is as follows:
- 1. Execute lea edx,[eax * 4 + eax]. This is equivalent to edx = (4 * eax) + eax = 5eax, so edx now holds five times eax.
- 2. Execute lea edx,[edx * 2 + edx]. This is equivalent to edx = (5 * eax) * 2 + (5 * eax).
- 3. The first term is (5eax * 2) = 10eax and the second is 5eax, so the sum is 10eax + 5eax = 15eax.
- 4. With eax = 3 this gives eax * 15 = 3 * 15 = 45, the final result.
This calculation process seems complicated, but once converted into assembly, only two instructions are needed to implement the fast multiplication.
```asm
.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    ; Optimize multiplication with the lea instruction
    mov dword ptr ds:[x],3

    ; When lea is used for multiplication, the scale must be 2/4/8
    mov eax,dword ptr ds:[x]                ; eax = 3
    lea edx,dword ptr ds:[eax * 4 + eax]    ; edx = 4eax + eax = 5eax
    lea edx,dword ptr ds:[edx * 2 + edx]    ; edx = (5eax * 2) + 5eax = 15eax

    invoke crt_printf,addr szFmt,edx        ; edx = eax * 15 = 45
    invoke ExitProcess,0
  main ENDP
END main
```
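The two-step split can be mirrored line by line in C. This is a sketch of the arithmetic only; the function name times15 is invented for the example, and each statement corresponds to one lea in the listing.

```c
#include <stdint.h>

/* Hypothetical helper: multiply by 15 using the same two steps as the
   lea sequence, with 32-bit wraparound like the edx register. */
uint32_t times15(uint32_t eax)
{
    uint32_t edx = eax * 4 + eax;  /* lea edx,[eax * 4 + eax] -> 5 * eax  */
    edx = edx * 2 + edx;           /* lea edx,[edx * 2 + edx] -> 15 * eax */
    return edx;
}
```

The split works because 15 = 5 * 3 = (4 + 1) * (2 + 1), and both factors fit the base + index * scale pattern.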
Use the LEA instruction with subtraction
If the multiplier is not a power of 2, a correction value has to be subtracted. For example, when we calculate eax * 7, 7 is not a power of two, so a single lea scale cannot express it. Instead we calculate eax * 8 and subtract one eax from the result, which gives the same value.
- Assume eax = 3 and calculate eax * 7 + 10. The process is as follows:
- 1. Execute lea edx,dword ptr ds:[eax * 8]. This is equivalent to edx = (8 * eax).
- 2. Execute sub edx,eax. This is equivalent to edx = (8 * eax) - eax.
- 3. Execute add edx,10. This is equivalent to edx = ((8 * eax) - eax) + 10.
- 4. After these steps, edx holds the final result of eax * 7 + 10, which is 31.
This calculation process seems complicated, but it is not difficult to construct at the assembly level. The following listing evaluates two such expressions, eax * 7 + 10 and eax * 3 - 7.
```asm
.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    ; Optimize multiplication with the lea instruction
    mov dword ptr ds:[x],3

    ; If the multiplier is not a power of 2, a correction must be subtracted
    ; calculate edx = eax * 7 + 10
    mov eax,dword ptr ds:[x]                ; eax = 3
    lea edx,dword ptr ds:[eax * 8]          ; edx = eax * 8
    sub edx,eax                             ; edx = edx - eax
    add edx,10                              ; edx = edx + 10
    invoke crt_printf,addr szFmt,edx        ; edx = eax * 7 + 10

    ; calculate edx = eax * 3 - 7
    mov eax,dword ptr ds:[x]                ; eax = 3
    lea edx,dword ptr ds:[eax * 2]          ; edx = eax * 2
    add edx,eax                             ; edx = edx + eax
    sub edx,7                               ; edx = edx - 7
    invoke crt_printf,addr szFmt,edx        ; edx = eax * 3 - 7

    invoke ExitProcess,0
  main ENDP
END main
```
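For comparison, the same two expressions can be written in C with one statement per instruction. The function names are hypothetical and the code only models the register arithmetic of the listing, including wraparound when the result goes below zero.

```c
#include <stdint.h>

/* Hypothetical helper: eax * 7 + 10 computed as (eax * 8 - eax) + 10. */
uint32_t times7_plus10(uint32_t eax)
{
    uint32_t edx = eax * 8;  /* lea edx,[eax * 8] */
    edx -= eax;              /* sub edx,eax       */
    edx += 10;               /* add edx,10        */
    return edx;
}

/* Hypothetical helper: eax * 3 - 7 computed as (eax * 2 + eax) - 7. */
uint32_t times3_minus7(uint32_t eax)
{
    uint32_t edx = eax * 2;  /* lea edx,[eax * 2] */
    edx += eax;              /* add edx,eax       */
    edx -= 7;                /* sub edx,7         */
    return edx;
}
```

With eax = 3 the two functions return 31 and 2, matching the two values the listing prints.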
Calculate unsigned multiplication using SHL
A logical left shift also implements fast multiplication by a power of 2. By convention shl is written when the operand is unsigned, and it can only handle expressions whose multiplier is a power of 2. (On x86, shl and sal are two mnemonics for the same instruction, so the shifted bit pattern is the same for signed and unsigned operands.)
When calculating, we need to refer to the power table. Here I list several commonly used power values:
Power table: 1=>2 2=>4 3=>8 4=>16 5=>32 6=>64 7=>128
Power table: 8=>256 9=>512 10=>1024 11=>2048 12=>4096 13=>8192 14=>16384
Assume eax = 3 and calculate eax * 8 + 10. The process is as follows:
1. Execute shl eax,3. This is equivalent to eax = eax * 2^3, that is, eax = eax * 8.
2. Execute add eax,10. This is equivalent to eax = (eax * 8) + 10.
3. The final result is 3 * 8 + 10 = 34.
With the logical left shift we can implement fast unsigned multiplication. The following code shows shifts by 1, 2, and 3 bits.
```asm
.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    mov dword ptr ds:[x],3

    ; calculate eax = eax * 2 ^ 1, equivalent to eax * 2
    mov eax,dword ptr ds:[x]
    shl eax,1
    invoke crt_printf,addr szFmt,eax

    ; calculate eax = eax * 2 ^ 2, equivalent to eax * 4
    mov eax,dword ptr ds:[x]
    shl eax,2
    invoke crt_printf,addr szFmt,eax

    ; calculate eax = eax * 2 ^ 3 + 10, equivalent to eax * 8 + 10
    mov eax,dword ptr ds:[x]
    shl eax,3
    add eax,10
    invoke crt_printf,addr szFmt,eax

    invoke ExitProcess,0
  main ENDP
END main
```
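Instead of looking the shift count up in the power table, it can be derived from the multiplier. The C sketch below is one way to do that; the function name shift_for is an assumption for this example and does not come from the article.

```c
#include <stdint.h>

/* Hypothetical helper: return n such that multiplier == 2^n,
   or -1 when the multiplier is not a power of 2 and a single
   shift cannot replace the multiplication. */
int shift_for(uint32_t multiplier)
{
    /* A power of 2 has exactly one bit set, so m & (m - 1) is 0. */
    if (multiplier == 0 || (multiplier & (multiplier - 1)) != 0) {
        return -1;
    }
    int n = 0;
    while ((multiplier >>= 1) != 0) {
        ++n;
    }
    return n;
}
```

For example, shift_for(8) yields 3, so x * 8 can be written as x << 3, while shift_for(10) yields -1 because 10 would need the shift-and-add or lea techniques instead.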
Calculate signed multiplication using SAL
An arithmetic left shift also implements fast multiplication by a power of 2. By convention sal is written when the operand is signed, and like shl it can only handle expressions whose multiplier is a power of 2. On x86 the two mnemonics encode the same instruction, so the choice documents intent rather than changing the result.
When calculating, we need to refer to the power table. Here I list several commonly used power values:
Power table: 1=>2 2=>4 3=>8 4=>16 5=>32 6=>64 7=>128
Power table: 8=>256 9=>512 10=>1024 11=>2048 12=>4096 13=>8192 14=>16384
Assume eax = -5 and ebx = 3, and calculate (eax * 8) + (ebx * 4). The process is as follows:
1. Execute sal eax,3. This is equivalent to eax = eax * 2^3, that is, eax = eax * 8. The result, -40, is treated as a signed number.
2. Execute shl ebx,2. This is equivalent to ebx = ebx * 2^2, that is, ebx = ebx * 4. The result, 12, is treated as an unsigned number.
3. Finally, add eax,ebx adds the two values, giving (eax * 8) + (ebx * 4) = -40 + 12 = -28.
The following listing uses the arithmetic left shift for fast multiplication by powers of 2, and combines sal and shl in one expression.
```asm
.data
  x DWORD ?
  y DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0

.code
  main PROC
    mov dword ptr ds:[x],-5
    mov dword ptr ds:[y],3

    ; calculate eax = eax * 2 ^ 1, equivalent to eax * 2
    mov eax,dword ptr ds:[x]
    sal eax,1
    invoke crt_printf,addr szFmt,eax

    ; calculate eax = eax * 2 ^ 2, equivalent to eax * 4
    mov eax,dword ptr ds:[x]
    sal eax,2
    invoke crt_printf,addr szFmt,eax

    ; calculate (eax * 2 ^ 3) + (ebx * 2 ^ 2), equivalent to (eax * 8) + (ebx * 4)
    mov eax,dword ptr ds:[x]
    mov ebx,dword ptr ds:[y]
    sal eax,3                     ; eax * 8 (signed multiplication)
    shl ebx,2                     ; ebx * 4 (unsigned multiplication)
    add eax,ebx                   ; eax + ebx
    invoke crt_printf,addr szFmt,eax

    invoke ExitProcess,0
  main ENDP
END main
```
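The signed case needs some care when modeled in C, because shifting a negative signed value left is undefined behavior before C23. The sketch below therefore shifts the bit pattern as an unsigned value, which is what the CPU does anyway. The function name sal32 is hypothetical, and the conversion back to int32_t assumes the usual two's-complement behavior of mainstream compilers.

```c
#include <stdint.h>

/* Hypothetical model of sal on a 32-bit register. count must be 0..31.
   The shift is done on the unsigned bit pattern to avoid undefined
   behavior; converting back to int32_t assumes two's complement. */
int32_t sal32(int32_t value, unsigned count)
{
    return (int32_t)((uint32_t)value << count);
}
```

With the values from the example, sal32(-5, 3) + sal32(3, 2) evaluates to -40 + 12 = -28.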
These are the main points of multiplication optimization. The multiplication of two variables whose values are unknown at compile time cannot be optimized this way, but the other forms can. If the expression contains a constant, the compiler tries its various optimization strategies (additions, shifts, lea) and adjusts the operations that do not fit directly. Only when no strategy applies does it fall back to the original multiplication instruction.