Benchmarking is a type of performance testing that is used to evaluate and measure the performance metrics of software. We can establish a known level of performance, called a "baseline", through benchmarking at some point in software development. When changes are made to the system's hardware and software environment, another benchmark is performed to determine the impact of those changes on performance. This is the most common use of benchmarking.
Donald Knuth's 1974 paper Structured Programming with go to Statements notes:
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning; he will be wise to look carefully at the critical code, but only after that code has been identified. Programmers easily go wrong when judging by intuition which parts are the critical performance bottlenecks, so they should generally prove it with the help of measurement tools.
Although this passage is often read as "don't worry about performance", what it actually says is that in a small number of cases (the critical 3%) the hot code should be observed, identified, and then optimized.
Benchmarking tools
Python offers quite a few tools for benchmarking.
To make the demo example slightly more interesting, let's generate a random list and sort the numbers in the list.
import random


def random_list(start, end, length):
    """
    Generate a randomized list
    :param start: random start number
    :param end: random end number
    :param length: length of the list
    """
    data_list = []
    for i in range(length):
        # append a random integer between start and end
        data_list.append(random.randint(start, end))
    return data_list


def bubble_sort(arr):
    """
    Bubble sort: sort a list in place
    :param arr: list to sort
    """
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


if __name__ == '__main__':
    get_data_list = random_list(1, 99, 10)
    ret = bubble_sort(get_data_list)
    print(ret)
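Before benchmarking it, an optional sanity check (not part of the original demo) is to compare bubble_sort() against Python's built-in sorted(). Since bubble_sort() mutates its argument, the sketch below passes it a copy:

data = random_list(1, 99, 10)
# sort a copy so the original list is preserved for the comparison
assert bubble_sort(list(data)) == sorted(data)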
The results of the run are as follows:
❯ python .\demo.py
[8, 16, 22, 31, 42, 58, 66, 71, 73, 91]
timeit
timeit is a module in Python's standard library and is very handy for benchmarking.
if __name__ == '__main__':
    import timeit

    get_data_list = random_list(1, 99, 10)
    setup = "from __main__ import bubble_sort"
    # timeit.timeit() runs the statement `number` times and returns the total time
    t = timeit.timeit(
        stmt="bubble_sort({})".format(get_data_list),
        setup=setup
    )
    print(t)
Run results:
❯ python .\demo.py
5.4201355
This is an example of benchmarking the bubble_sort() function. timeit.timeit() parameter description:
- stmt: the function or statement to be tested, in string form.
- setup: the setup statement executed once before timing; here it is "from __main__ import bubble_sort", which makes bubble_sort available to the timed statement.
- number: the number of times the statement is executed; it defaults to 1,000,000, which is why the bubble_sort() run above takes more than 5 seconds in total (see the sketch below for tuning it).
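A minimal sketch of tuning the run count, assuming it lives in the same script's __main__ block as the earlier example so random_list() and bubble_sort() are defined (the number and repeat values here are illustrative, not from the original):

if __name__ == '__main__':
    import timeit

    get_data_list = random_list(1, 99, 10)
    setup = "from __main__ import bubble_sort"
    stmt = "bubble_sort({})".format(get_data_list)

    # run the statement 100,000 times instead of the default 1,000,000
    total = timeit.timeit(stmt=stmt, setup=setup, number=100_000)
    print(total / 100_000)  # rough average seconds per call

    # timeit.repeat() runs several rounds; the minimum is usually the steadiest figure
    print(min(timeit.repeat(stmt=stmt, setup=setup, repeat=5, number=100_000)) / 100_000)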
pyperf
https://github.com/psf/pyperf
pyperf is used much like timeit, but it produces much richer results. (Note: discovering this library is what got me into benchmarking in the first place.)
if __name__ == '__main__':
    import pyperf

    get_data_list = random_list(1, 99, 10)
    setup = "from __main__ import bubble_sort"
    runner = pyperf.Runner()
    runner.timeit(
        name="bubble sort",
        stmt="bubble_sort({})".format(get_data_list),
        setup=setup
    )
Run results:
❯ python .\demo.py -o
.....................
bubble sort: Mean +- std dev: 5.63 us +- 0.31 us
The results are written to a file, which can then be analyzed with the pyperf stats command.
❯ python -m pyperf stats
Total duration: 15.9 sec
Start date: 2021-04-02 00:17:18
End date: 2021-04-02 00:17:36
Raw value minimum: 162 ms
Raw value maximum: 210 ms

Number of calibration run: 1
Number of run with values: 20
Total number of run: 21

Number of warmup per run: 1
Number of value per run: 3
Loop iterations per value: 2^15
Total number of values: 60

Minimum: 4.94 us
Median +- MAD: 5.63 us +- 0.12 us
Mean +- std dev: 5.63 us +- 0.31 us
Maximum: 6.41 us

0th percentile: 4.94 us (-12% of the mean) -- minimum
5th percentile: 5.10 us (-9% of the mean)
25th percentile: 5.52 us (-2% of the mean) -- Q1
50th percentile: 5.63 us (+0% of the mean) -- median
75th percentile: 5.81 us (+3% of the mean) -- Q3
95th percentile: 5.95 us (+6% of the mean)
100th percentile: 6.41 us (+14% of the mean) -- maximum

Number of outlier (out of 5.07 us..6.25 us): 6
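Because the results live in a JSON file, pyperf can also compare two runs with its compare_to command, which fits the "baseline" workflow described at the start. A hedged sketch, where the output file names baseline.json and optimized.json are illustrative and not from the original:

❯ python .\demo.py -o baseline.json
❯ python .\demo.py -o optimized.json      # run again after changing the implementation
❯ python -m pyperf compare_to baseline.json optimized.json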
pytest-benchmark
https://github.com/ionelmc/pytest-benchmark
pytest-benchmark is a plugin for the pytest unit testing framework. Write the benchmark as a separate unit test case:
from demo import bubble_sort


def test_bubble_sort(benchmark):
    test_list = [5, 2, 4, 1, 3]
    result = benchmark(bubble_sort, test_list)
    assert result == [1, 2, 3, 4, 5]
Points to note:
- Import the bubble_sort() function.
- benchmark is a fixture injected by the plugin, so it does not need to be imported; the prerequisite is that both pytest and pytest-benchmark are installed.
- For ease of assertion, we use a fixed list rather than a random one.
Run the test cases:
❯ pytest -q .\test_demo.py
.                                                                       [100%]

------------------------------------------------ benchmark: 1 tests -----------------------------------------------
Name (time in us)        Min       Max    Mean  StdDev  Median     IQR   Outliers  OPS (Kops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------
test_bubble_sort      1.6000  483.2000  1.7647  2.6667  1.7000  0.0000  174;36496      566.6715  181819           1
---------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
1 passed in 1.98s
On top of that, adding the --benchmark-histogram parameter produces a chart:
❯ pytest -q .\test_demo.py --benchmark-histogram
.                                                                       [100%]

------------------------------------------------ benchmark: 1 tests -----------------------------------------------
Name (time in us)        Min      Max    Mean  StdDev  Median     IQR    Outliers  OPS (Kops/s)  Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------
test_bubble_sort      1.6000  53.9000  1.7333  0.3685  1.7000  0.0000  1640;37296      576.9264  178572           1
---------------------------------------------------------------------------------------------------------------------

Generated histogram: D:\github\test-circle\article\code\benchmark_20210401_165958.svg
The generated histogram (benchmark_20210401_165958.svg) shows the distribution of the timings.
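pytest-benchmark also exposes benchmark.pedantic() for explicit control over the measurement. A minimal sketch, where the test name and the rounds/iterations counts are arbitrary choices rather than anything from the original:

from demo import bubble_sort


def test_bubble_sort_pedantic(benchmark):
    test_list = [5, 2, 4, 1, 3]
    # 500 rounds of 10 iterations each, instead of letting the plugin
    # calibrate the counts automatically
    result = benchmark.pedantic(bubble_sort, args=(list(test_list),),
                                rounds=500, iterations=10)
    assert result == [1, 2, 3, 4, 5]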
There are many more tools for benchmarking that will not be covered here.
Once benchmarking shows that the program has slowed down, the next step is code profiling, which I'll cover in my next post.
That covers the details of how to benchmark Python code; for more on Python benchmarking, please follow my other related articles!