Method for accurately recording function run time in PyTorch

0. Introduction

As PyTorch's official documentation on CUDA semantics explains, GPU operations are executed asynchronously: a call that uses the GPU enqueues the operation on the device and returns without waiting for it to finish. In general, this asynchrony is invisible to the caller, because

(1) each device executes operations in the order in which they were queued, and
(2) PyTorch automatically performs the necessary synchronization when copying data between the CPU and GPU or between GPUs, so it does not have to be written explicitly in the code.

A consequence of asynchronous execution is that time measurements taken without synchronization are inaccurate: the CPU-side clock can stop before the GPU has finished the work.
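To see the effect, the sketch below times a matrix multiplication with time.perf_counter(), once without and once with torch.cuda.synchronize(). It assumes PyTorch is installed; time_matmul and the matrix size are illustrative choices, not part of the original article. On a CPU-only machine both numbers measure the same thing, while on a GPU the unsynchronized figure typically reflects little more than the kernel launch.

```python
import time

import torch


def time_matmul(sync: bool, size: int = 1024) -> float:
    """Time one matrix multiply with time.perf_counter(), optionally synchronizing."""
    # Falls back to the CPU without a GPU; CPU ops run synchronously, so the
    # two variants only differ when CUDA is available.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    _ = a @ b  # warm-up: keeps one-off library initialization out of the timing
    if sync and device == "cuda":
        torch.cuda.synchronize()  # make sure earlier kernels have finished
    start = time.perf_counter()
    _ = a @ b  # on CUDA this enqueues the kernel and returns immediately
    if sync and device == "cuda":
        torch.cuda.synchronize()  # block until the matmul has actually run
    return time.perf_counter() - start


naive = time_matmul(sync=False)
synced = time_matmul(sync=True)
print(f"without sync: {naive * 1e3:.3f} ms, with sync: {synced * 1e3:.3f} ms")
```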

1. Solution

Following the documentation mentioned in the introduction, the solution PyTorch gives is to record the time with torch.cuda.Event. The code is as follows:

import torch
start_event = torch.cuda.Event(enable_timing=True)
end_event = torch.cuda.Event(enable_timing=True)
start_event.record()

# Run your code snippet here

end_event.record()
torch.cuda.synchronize()  # Wait for the events to be recorded!
elapsed_time_ms = start_event.elapsed_time(end_event)  # elapsed time (ms)

Place the code to be measured between start_event.record() and end_event.record(); elapsed_time() returns the elapsed time in milliseconds.
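A decorator is one way to reuse this record/synchronize/elapsed_time sequence. The following is a minimal sketch, not an established API: cuda_timed and square_sum are hypothetical names, the decorated function returns a (result, elapsed_ms) pair, and the perf_counter fallback for machines without CUDA is an assumption added so the example runs anywhere.

```python
import functools
import time

import torch


def cuda_timed(fn):
    """Hypothetical decorator: each call returns (result, elapsed_ms)."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        if not torch.cuda.is_available():
            # Assumed fallback without a GPU: plain wall-clock timing.
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            return result, (time.perf_counter() - start) * 1000.0
        start_event = torch.cuda.Event(enable_timing=True)
        end_event = torch.cuda.Event(enable_timing=True)
        start_event.record()
        result = fn(*args, **kwargs)
        end_event.record()
        torch.cuda.synchronize()  # wait for both events to be recorded
        return result, start_event.elapsed_time(end_event)  # milliseconds

    return wrapper


@cuda_timed
def square_sum(n: int) -> float:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.arange(n, dtype=torch.float32, device=device)
    return (x * x).sum().item()


value, ms = square_sum(10)
print(f"result={value}, took {ms:.3f} ms")
```

Returning a tuple changes the decorated function's signature for callers; storing the time on an attribute or logging it are alternatives if that is undesirable.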

Readers who want something reusable can also wrap it in a decorator or a with statement:

First, write a custom context manager class:

import torch

class CudaTimer:
    def __init__(self):
        self.start_event = torch.cuda.Event(enable_timing=True)
        self.end_event = torch.cuda.Event(enable_timing=True)

    def __enter__(self):
        self.start_event.record()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.end_event.record()
        torch.cuda.synchronize()  # wait for the events to be recorded
        self.elapsed_time = self.start_event.elapsed_time(self.end_event) / 1000  # ms -> s

Then use it in a with statement and read the result afterwards:

with CudaTimer() as timer:
    ...  # run your code here
dt = timer.elapsed_time  # s

This keeps the timing code concise when the timer is used from multiple files. Note: read timer.elapsed_time after the with statement, not inside it. The attribute is only set in __exit__, so it does not exist until the with block has finished.
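The reason for this rule can be checked without a GPU. The sketch below uses a hypothetical DemoTimer that mirrors CudaTimer's structure but times with time.perf_counter(); it shows that elapsed_time does not exist inside the with block and does exist afterwards.

```python
import time


class DemoTimer:
    """Hypothetical CPU-only stand-in with the same structure as CudaTimer."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # elapsed_time only comes into existence here, when the block exits
        self.elapsed_time = time.perf_counter() - self.start


with DemoTimer() as timer:
    inside = hasattr(timer, "elapsed_time")  # __exit__ has not run yet

after = hasattr(timer, "elapsed_time")  # __exit__ has run
print(inside, after)  # False True
```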

2. Supplement

For functions that mix CPU and GPU operations, torch.cuda.Event may report a time shorter than the actual time. In that case, time.perf_counter() combined with torch.cuda.synchronize() can be used instead; the corresponding with class is written as follows:

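import time
import torch

class Timer:
    def __enter__(self):
        torch.cuda.synchronize()  # finish pending GPU work before starting the clock
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        torch.cuda.synchronize()  # wait for the GPU work inside the block to finish
        self.elapsed_time = time.perf_counter() - self.start_time  # s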

Then simply replace with CudaTimer() as timer with with Timer() as timer.
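A single measurement can be noisy. One common refinement, not part of the method described above, is to discard a few warm-up runs and report the median of several synchronized perf_counter() measurements. In this sketch, measure and mixed_step are hypothetical names, and the warm-up and repeat counts are arbitrary defaults; it also runs on a CPU-only machine, where the synchronize calls are skipped.

```python
import statistics
import time

import torch


def measure(fn, *args, warmup: int = 3, repeats: int = 10) -> float:
    """Hypothetical helper: median wall-clock seconds of fn(*args) over several runs."""
    use_cuda = torch.cuda.is_available()
    for _ in range(warmup):  # untimed runs absorb one-off start-up costs
        fn(*args)
    samples = []
    for _ in range(repeats):
        if use_cuda:
            torch.cuda.synchronize()  # start from an idle GPU
        start = time.perf_counter()
        fn(*args)
        if use_cuda:
            torch.cuda.synchronize()  # include all GPU work queued by fn
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def mixed_step(n: int) -> torch.Tensor:
    data = [float(i) for i in range(n)]  # CPU-side Python work
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.tensor(data, device=device)  # host-to-device copy when on CUDA
    return torch.sin(x).sum()  # device-side work


print(f"median: {measure(mixed_step, 100_000) * 1e3:.3f} ms")
```

The median is used rather than the mean so that an occasional slow run (for example, one interrupted by other processes) does not dominate the result.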
