Specific usage methods for assembling efficient multiplication operations

The multiplication instruction is a basic arithmetic operation implemented in the CPU to calculate the product of two numbers. In assembly language, multiplication instructions usually passmul (unsigned multiplication)andimul (signed multiplication)These two instructions are implemented. Since multiplication instructions consume more clock cycles when executing, compilers usually try to convert multiplication operations into more efficient addition and shift operations when optimizing code.

For smaller numbers, the compiler may choose to convert the multiplication operation directly into an addition operation. For example,a * bConvert toa + a + ... + a(b-order addition). This method can be optimized through technologies such as loop expansion and code vectorization.
For larger numbers, the compiler may use displacement and shift operations instead of multiplication. For example,a * bConvert toa << n + a << mThe form ofnandmis the number of digits that meet the criteria. This method can accelerate computing through the efficiency of displacement instructions.

The compiler will only use it when none of the above methods can be optimizedmul/imulInstructions to perform multiplication operations. These two instructions can multiply unsigned and signed numbers. Even though these two instructions use more clock cycles, the computational efficiency of the multiplication instructions is relative to other instructionsDIVIt is still low, so when writing efficient code, you should avoid multiplication operations as much as possible and optimize with the techniques mentioned above.

Use the IMUL instruction to complete multiplication

To calculate multiplication, the compiler will usually use it directly without considering the execution efficiency.imulInstructions complete calculations. The imul instruction can perform multiplication operations faster than other multiplication instructions (such as the mul instruction), but the lower performance is mainly because the imul instruction is usually used for multiplication operations of signed numbers, and needs to deal with the expansion and overflow of sign bits when executed, which translates into additional instructions and clock cycle consumption. If unsigned integers or low or high-bit results are required for registers, using the imul instruction can provide some advantages.

When calculating multiplication, you should follow:

If the multiplier and the multiplier are8ThenALDo multiplier, the result isAXmiddle
If the multiplier and the multiplier are16Will putAXDo multiplier, the result isEAXmiddle
If the multiplier and the multiplier are32 bitsWill putEAXDo multiplier, the result isEDX:EAXmiddle

The multiplication instruction calculation is very simple, you only need to accumulate the multiplier. As shown below, it is a simple assembly implementation of multiplication of three numbers;

.data
    x DWORD ?
    y DWORD ?
    z DWORD ?
    szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
    main PROC
      mov dword ptr ds:[x],10
      mov dword ptr ds:[y],24
      mov dword ptr ds:[z],18
      
      ; calculate x * y * z
      mov eax,dword ptr ds:[x]
      imul eax,dword ptr ds:[y]
      imul eax,dword ptr ds:[z]
      invoke crt_printf,addr szFmt,eax
    main ENDP
END main

Use the LEA instruction to replace multiplication

In actual programming, we can use LEA instructions to replace multiplication operations, thereby improving the execution efficiency of the code. However, readers need to note that when calculating multiplication using LEA, it is necessary to ensure that the multiplier is2and the range of multiplier must be2/4/8This instruction can only be used in these three intervals, and we use assembly to implement calculationeax*8+2The assembly instructions are as follows.

Assumptioneax=5calculateeax * 8 + 2The result is the splitting process as follows:
1. Calculationlea ebx,dword ptr ds:[eax * 8 + 2]This is equivalent to calculationebx = (eax * 8) +2The results can be obtained directly.

The first case is relatively simple. You can use a lea instruction to complete the calculation process, as long as you ensure that the multiplier is the power of 2.

.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
  main PROC
    ; For multiplicationleaInstruction optimization
    mov dword ptr ds:[x],5
    
    mov eax,dword ptr ds:[x]               ; eax = x
    xor ebx,ebx                            ; ebx = 0
    lea ebx,dword ptr ds:[eax * 8 + 2]     ; ebx = eax * 8 + 2
    invoke crt_printf,addr szFmt,ebx
    
    invoke ExitProcess,0
  main ENDP
END main

Use LEA instruction to split the calculation

If the multiplication we calculate exceeds2/4/8For power range, multiplication needs to be split. When splitting, the power principle of 2 should also be followed, and the calculation will be performed separately after splitting.

Assumptioneax=3calculate15 * eaxThe result is the splitting process as follows:
1. Calculationlea edx,[eax * 4 + eax]This is equivalent to calculationedx = (4 * eax) + eax = 5eaxEach of themedxIt's equivalent to 5eax
2. Calculationlea edx,[edx * 2 + edx]This is equivalent to calculationedx = (5 * eax) * 2 + (5 * eax)
3. Calculation(5eax * 2) = 10eaxThen calculate(5 * eax) = 5eaxFinally, it is concluded10eax + 5eax
4. Through this process, we can draweax * 15 = 45Final calculation3*15=45Get the final result.

This calculation process seems complicated, but if you convert it into assembly instructions, then only two are needed to implement fast multiplication.

.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
  main PROC
    ; For multiplicationleaInstruction optimization
    mov dword ptr ds:[x],3
    
    ; If usingleaCalculate multiplication,Then the multiplier must be2/4/8
    mov eax,dword ptr ds:[x]               ; eax = 3
    lea edx,dword ptr ds:[eax * 4 + eax]   ; edx = 4eax + eax Got it 5eax,That is to say, everyedxThat means5indivualeax
    lea edx,dword ptr ds:[edx * 2 + edx]   ; edx = (5eax * 2) + 5eax 最终Got it 15eax
    invoke crt_printf,addr szFmt,edx       ; edx = eax * 15 计算后Got it 45
    
    invoke ExitProcess,0
  main ENDP
END main

Calculate using the LEA instruction decrement

If the multiplication is not the power of 2 when calculating the multiplication, in this case, a specific value needs to be subtracted, for example, when we calculateeax * 7When, because of the power of 7 not two, we cannot passleaInstructions do calculations, but we can calculateeax * 8The calculated result is subtracted by oneeaxYou can also get the correct value.

Assumptioneax=3calculateeax * 7 + 10The result is the splitting process as follows:
1. Calculationlea edx,dword ptr ds:[eax * 8]This is equivalent to calculationedx = (8 * eax)
2. Calculationsub edx,eaxThis is equivalent to calculationedx = (8 * eax) - eax
3. Calculationadd edx,10This is equivalent to calculationedx = ( (8 * eax) - eax ) + 10
4. After the above calculation, we can calculateeax * 7 + 10The final result

This calculation process seems complicated, but it is actually not difficult to construct at the assembly level. The following implements the calculation of two expression evaluation processes.

.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
  main PROC
    ; For multiplicationleaInstruction optimization
    mov dword ptr ds:[x],3
    
    ; If the multiplication is not2Power of,Then it needs to be reduced at this time

    ; calculate edx = eax * 7 + 10
    mov eax,dword ptr ds:[x]               ; eax = 3 =&gt; calculate eax * 7 + 10
    lea edx,dword ptr ds:[eax * 8]         ; edx = eax * 8
    sub edx,eax                            ; edx = edx - eax
    add edx,10                             ; edx = edx + 10
    invoke crt_printf,addr szFmt,edx       ; edx = eax * 7 + 10
    
    ; calculate edx = eax * 3 - 7
    mov eax,dword ptr ds:[x]               ; eax = 3 =&gt; calculate eax * 3 - 7
    lea edx,dword ptr ds:[eax * 2]         ; edx = eax * 2
    add edx,eax                            ; edx = edx + eax
    sub edx,7                              ; edx = edx - 7
    invoke crt_printf,addr szFmt,edx       ; edx = eax * 3 - 7
    
    invoke ExitProcess,0
  main ENDP
END main

Calculate unsigned multiplication using SHL

By using logical left shift, high-speed multiplication operation with power of 2 can also be implemented, but logical left shift can only be used to calculate unsigned multiplication, and can only calculate equations whose multiplier is to the power of 2.

When calculating, we need to refer to the power table. Here I list several commonly used power values:
Power table: 1=>2 2=>4 3=>8 4=>16 5=>32 6=>64 7=>128
Power table: 8=>256 9=>512 10=>1024 11=>2048 12=>4096 13=>8192 14=>16384
Assumptioneax=3calculateeax * 8 + 10The result is the splitting process as follows:
1. Calculationshl eax,3This is equivalent to calculationeax = eax * 2 ^(to the power) 3Its formula is equivalent to calculationeax = eax * 8
2. Calculationadd eax,10This is equivalent to calculationeax = (eax * 8) + 10
3. The final calculation result is3*8+10Get 34

By using logical shift left, we can implement fast unsigned multiplication operations, and the following code is the most efficient one.

.data
  x DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
  main PROC
    mov dword ptr ds:[x],3
    
    ; calculate eax = eax * 2 ^ 1 相当于calculate eax * 2
    mov eax,dword ptr ds:[x]
    shl eax,1
    invoke crt_printf,addr szFmt,eax
    
    ; calculate eax = eax * 2 ^ 2 相当于calculate eax * 4
    mov eax,dword ptr ds:[x]
    shl eax,2
    invoke crt_printf,addr szFmt,eax
    
    ; calculate eax = eax * 2 ^ 3 相当于calculate eax * 8
    mov eax,dword ptr ds:[x]
    shl eax,3
    add eax,10
    invoke crt_printf,addr szFmt,eax

    invoke ExitProcess,0
  main ENDP
END main

Calculate signed multiplication using SAL

By using the arithmetic left shift, high-speed multiplication operation with the power of 2 can also be implemented. Unlike the logical left shift, the arithmetic left shift can only calculate signed multiplication, and can only calculate the equation where the multiplier is to the power of 2.

When calculating, we need to refer to the power table. Here I list several commonly used power values:
Power table: 1=>2 2=>4 3=>8 4=>16 5=>32 6=>64 7=>128
Power table: 8=>256 9=>512 10=>1024 11=>2048 12=>4096 13=>8192 14=>16384
Assumptioneax=-5,ebx=3calculate(eax * 8) + (ebx * 4)The result is the splitting process as follows:
1. Calculationsal eax,3This is equivalent to calculationeax = (eax * 2 ^ 3 )Its formula is equivalent to calculationeax = eax * 8The result is a signed number
2. Calculationshl ebx,2This is equivalent to calculationebx = (ebx * 2 ^2)Its formula is equivalent to calculationebx = ebx * 4The result is an unsigned number
3. Finally, the signed and unsigned numbers will be passed.add eax,ebxAdd it up and you can get(eax * 8) + (ebx * 4)The final result-28

The following is to realize high-speed multiplication operation of power of 2 by shifting the arithmetic left, and we can add arithmetic operations and logical operations to improve the operation efficiency in this way.

.data
  x DWORD ?
  y DWORD ?
  szFmt BYTE 'Calculation result: %d',0dh,0ah,0
.code
  main PROC
    mov dword ptr ds:[x],-5
    mov dword ptr ds:[y],3
    
    ; calculate eax = eax * 2 ^ 1 相当于calculate eax * 2
    mov eax,dword ptr ds:[x]
    sal eax,1
    invoke crt_printf,addr szFmt,eax
    
    ; calculate eax = eax * 2 ^ 2 相当于calculate eax * 4
    mov eax,dword ptr ds:[x]
    sal eax,2
    invoke crt_printf,addr szFmt,eax

    ; calculate eax = (eax * 2 ^ 3 ) + (ebx * 2 ^2) 相当于calculate (eax * 8) + (ebx * 4)
    mov eax,dword ptr ds:[x]
    mov ebx,dword ptr ds:[y]
    sal eax,3                  ; eax * 8 (Signed multiplication)
    shl ebx,2                  ; ebx * 4 (Unsigned multiplication)
    add eax,ebx                ; eax + ebx
    invoke crt_printf,addr szFmt,eax

    invoke ExitProcess,0
  main ENDP
END main

These are basically the knowledge points of multiplication optimization. Except for the multiplication of two unknown variables that cannot be optimized, other forms of multiplication operations can be optimized. If there is a constant value in the expression, the compiler will match various optimization strategies and finally adjust operations that do not conform to the optimization strategy. If it is really impossible to optimize, the original multiplication instructions will be calculated.

This is the end of this article about the specific usage methods of assembling efficient multiplication operations. For more related assembly, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!