
How to benchmark Python code

Benchmarking is a form of performance testing used to measure and evaluate software performance metrics. At some point during development we run a benchmark to establish a known level of performance, called a "baseline". When the system's hardware or software environment changes, we benchmark again to determine the impact of those changes on performance. This is the most common use of benchmarking.

Donald Knuth's 1974 paper Structured Programming with go to Statements says:

There is no doubt that the single-minded pursuit of efficiency leads to all sorts of abuses. Programmers waste a great deal of time speeding up non-critical parts of their programs, and these attempts at efficiency can in turn have a strongly negative impact, especially on debugging and maintenance. We should not obsess over small optimizations; in roughly 97% of cases, premature optimization is the root of all evil.
Of course, we should not pass up the chance to optimize that critical 3%. A good programmer will not be lulled into complacency by this small percentage, but will look carefully to identify which code is critical, and will optimize only once that critical code has been identified. Programmers' intuition about which parts are the real performance bottlenecks is often wrong, so they should generally prove it with measurement tools.

This passage is often read as "don't worry about performance", but its real point is that in a small number of cases (the critical 3%) you should observe, identify, and then optimize the critical code.

Benchmarking tools

Python provides a great many tools for benchmarking.

To make the demo example slightly more interesting, let's generate a random list and sort the numbers in the list.

import random


def random_list(start, end, length):
    """
    Generating a randomized list
    :param start: random start number
    :param end: random end number
    :param length: length of the list
    """
    data_list = []
    for i in range(length):
        data_list.append(random.randint(start, end))
    return data_list


def bubble_sort(arr):
    """
    Bubble Sort: Sorting a List
    :param arr: the list to sort
    """
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


if __name__ == '__main__':
    get_data_list = random_list(1, 99, 10)
    ret = bubble_sort(get_data_list)
    print(ret)

The results of the run are as follows:

❯ python .\demo.py
[8, 16, 22, 31, 42, 58, 66, 71, 73, 91]

timeit

timeit is a module that ships with Python and is very handy for benchmarking.

if __name__ == '__main__':
    import timeit
    get_data_list = random_list(1, 99, 10)
    setup = "from __main__ import bubble_sort"
    t = timeit.timeit(
        stmt="bubble_sort({})".format(get_data_list),
        setup=setup
        )
    print(t)

Run results:

❯ python .\demo.py
5.4201355

This example benchmarks the bubble_sort() function. The timeit.timeit() parameters are:

  • stmt: the function call or statement to be timed, passed as a string.
  • setup: the setup statement used to prepare the timing environment, here "from __main__ import bubble_sort".
  • number: the number of times stmt is executed, defaulting to 1000000. That is why running bubble_sort() here takes more than 5 seconds; a smaller value can be passed explicitly, as sketched below.
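
If the default one million executions take too long, number can be set explicitly, and timeit.repeat() collects several measurements in one go. A minimal sketch, with arbitrary number and repeat values that are not part of the original example:

if __name__ == '__main__':
    import timeit
    get_data_list = random_list(1, 99, 10)
    setup = "from __main__ import bubble_sort"
    # Time 10000 executions per measurement, repeat the measurement 5 times,
    # and report the best (lowest) total time.
    times = timeit.repeat(
        stmt="bubble_sort({})".format(get_data_list),
        setup=setup,
        number=10000,
        repeat=5,
    )
    print(min(times))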

pyperf

https://github.com/psf/pyperf

The usage of pyperf is similar to timeit, but it provides much richer results. (Note: I only learned about benchmarking after discovering this library.)

if __name__ == '__main__':
    get_data_list = random_list(1, 99, 10)

    import pyperf
    setup = "from __main__ import bubble_sort"
    runner = pyperf.Runner()
    runner.timeit(name="bubble sort",
                  stmt="bubble_sort({})".format(get_data_list),
                  setup=setup)

Run results:

❯ python .\demo.py -o
.....................
bubble sort: Mean +- std dev: 5.63 us +- 0.31 us

The results are written to the file specified with -o and can then be analyzed with the pyperf stats command.

❯ python -m pyperf stats 
Total duration: 15.9 sec
Start date: 2021-04-02 00:17:18
End date: 2021-04-02 00:17:36
Raw value minimum: 162 ms
Raw value maximum: 210 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2^15
Total number of values: 60

Minimum:         4.94 us
Median +- MAD:   5.63 us +- 0.12 us
Mean +- std dev: 5.63 us +- 0.31 us
Maximum:         6.41 us

  0th percentile: 4.94 us (-12% of the mean) -- minimum
  5th percentile: 5.10 us (-9% of the mean)
 25th percentile: 5.52 us (-2% of the mean) -- Q1
 50th percentile: 5.63 us (+0% of the mean) -- median
 75th percentile: 5.81 us (+3% of the mean) -- Q3
 95th percentile: 5.95 us (+6% of the mean)
100th percentile: 6.41 us (+14% of the mean) -- maximum

Number of outlier (out of 5.07 us..6.25 us): 6
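
Besides the pyperf stats command, the saved results can also be inspected from Python. A minimal sketch, assuming the Runner output was saved to a file named bench.json (the actual file name is not shown above):

import pyperf

# Load the results file written via the Runner's -o option.
# "bench.json" is an assumed name used for illustration.
suite = pyperf.BenchmarkSuite.load("bench.json")
for bench in suite.get_benchmarks():
    # mean() and stdev() return seconds; convert to microseconds.
    print(bench.get_name(),
          "mean: %.2f us" % (bench.mean() * 1e6),
          "stdev: %.2f us" % (bench.stdev() * 1e6))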

pytest-benchmark

/ionelmc/pytest-benchmark

pytest-benchmark is a plugin for the pytest unit-testing framework. Write the benchmark as a test case in a separate file, test_demo.py:

from demo import bubble_sort


def test_bubble_sort(benchmark):
    test_list = [5, 2, 4, 1, 3]
    result = benchmark(bubble_sort, test_list)
    assert result == [1, 2, 3, 4, 5]

A few things to note:

  • Import the bubble_sort() function from demo.py.
  • benchmark is a fixture provided by the plugin, so it needs no import; the prerequisite is that pytest and pytest-benchmark are installed.
  • For ease of assertion, the list to be sorted is fixed rather than random.

Run the test cases:

❯ pytest -q .\test_demo.py
.                                                                       [100%]

------------------------------------------------ benchmark: 1 tests -----------------------------------------------
Name (time in us)        Min       Max    Mean  StdDev  Median     IQR   Outliers  OPS (Kops/s)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------
test_bubble_sort      1.6000  483.2000  1.7647  2.6667  1.7000  0.0000  174;36496      566.6715  181819           1
-------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
1 passed in 1.98s
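
One caveat about the test above: bubble_sort() sorts the list in place, so after the first round the benchmark is effectively timing an already-sorted input. If each round should start from fresh, unsorted data, pytest-benchmark's pedantic mode accepts a setup callable. A minimal sketch of that variant (the rounds value is arbitrary, and this extra test is not part of the run shown above):

import random

from demo import bubble_sort


def test_bubble_sort_fresh_input(benchmark):
    def fresh_args():
        # Build a new unsorted list for every round so bubble_sort()
        # never runs against already-sorted data.
        return ([random.randint(1, 99) for _ in range(10)],), {}

    result = benchmark.pedantic(bubble_sort, setup=fresh_args, rounds=100)
    assert result == sorted(result)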

Adding the --benchmark-histogram option on top of that generates a chart:

❯ pytest -q .\test_demo.py --benchmark-histogram
.                                                                                                                [100%]

------------------------------------------------ benchmark: 1 tests -----------------------------------------------
Name (time in us)        Min      Max    Mean  StdDev  Median     IQR    Outliers  OPS (Kops/s)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------
test_bubble_sort      1.6000  53.9000  1.7333  0.3685  1.7000  0.0000  1640;37296      576.9264  178572           1
-------------------------------------------------------------------------------------------------------------------


Generated histogram: D:\github\test-circle\article\code\benchmark_20210401_165958.svg


There are many more tools for benchmarking that will not be covered here.

Once a benchmark shows that the program has slowed down, the next step is code performance profiling, which I'll cover in the next post.

That covers the details of how to benchmark Python code; for more about Python benchmarking, please follow my other related articles!