0. Introduction
According to the description of CUDA in PyTorch's official documentation, GPU operations are executed asynchronously. In general, the effect of asynchronous computation is invisible to the caller because:
- Each device executes operations in the order they are queued.
- PyTorch automatically performs the necessary synchronization when copying data between CPU and GPU or between two GPUs, so it does not need to be written explicitly in the code.
One consequence of asynchronous computation is that time measurements taken without synchronization are inaccurate.
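The effect can be illustrated without a GPU. The following is only a CPU-side analogy, not real CUDA code: a single worker thread stands in for the device queue, `submit` stands in for an asynchronous kernel launch, and `future.result()` stands in for a synchronization call. The name `slow_kernel` and the 0.2 s duration are assumptions made for the illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_kernel():
    """Stand-in for a GPU kernel: takes about 0.2 s to finish."""
    time.sleep(0.2)
    return "done"


# A single worker runs tasks in submission order, like one device queue.
with ThreadPoolExecutor(max_workers=1) as device:
    start = time.perf_counter()
    future = device.submit(slow_kernel)      # returns immediately, like a kernel launch
    unsynced = time.perf_counter() - start   # clock stopped before the work is done

    future.result()                          # blocks until the work finishes
    synced = time.perf_counter() - start     # clock stopped after the work is done

print(f"without waiting: {unsynced:.4f} s")
print(f"after waiting:   {synced:.4f} s")
```

The first reading only covers the time needed to queue the task, which is the same mistake as timing GPU code without synchronizing first.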
1. Solution
According to the documentation mentioned in the introduction, the solution given by PyTorch is to record the time with CUDA events (torch.cuda.Event). The code is as follows:

    import torch

    start_event = torch.cuda.Event(enable_timing=True)
    end_event = torch.cuda.Event(enable_timing=True)

    start_event.record()
    # Run your code snippet here
    end_event.record()
    torch.cuda.synchronize()  # Wait for the events to be recorded!
    elapsed_time_ms = start_event.elapsed_time(end_event)  # elapsed time (ms)

Place your code between start_event.record() and end_event.record(). The elapsed time is measured in milliseconds.
Readers who want to go further can also wrap this in a decorator or a with statement.
First, write a custom context manager class for the with statement:

    import torch

    class CudaTimer:
        def __init__(self):
            self.start_event = torch.cuda.Event(enable_timing=True)
            self.end_event = torch.cuda.Event(enable_timing=True)

        def __enter__(self):
            self.start_event.record()
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.end_event.record()
            torch.cuda.synchronize()  # wait until both events have been recorded
            self.elapsed_time = self.start_event.elapsed_time(self.end_event) / 1000  # ms -> s

Then use it in a with statement as follows:

    with CudaTimer() as timer:
        # run your code here
        pass
    dt = timer.elapsed_time  # s

This keeps the timing code concise when it is used across multiple files. Special reminder: do not read timer.elapsed_time inside the with statement. The attribute is only set when the with statement ends, so it cannot be read before then.
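The reminder above can be checked without a GPU. The sketch below uses a hypothetical stand-in class, `WallTimer`, which has the same `__enter__`/`__exit__` structure as `CudaTimer` but reads `time.perf_counter()` instead of CUDA events. It is an assumption made for illustration and not part of PyTorch.

```python
import time


class WallTimer:
    """Hypothetical CPU-only stand-in with the same shape as CudaTimer."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # elapsed_time is created here, so it only exists after the block ends
        self.elapsed_time = time.perf_counter() - self.start


with WallTimer() as timer:
    time.sleep(0.05)
    # __exit__ has not run yet, so the attribute is missing at this point
    available_inside = hasattr(timer, "elapsed_time")

# __exit__ has run, so the attribute can now be read
available_after = hasattr(timer, "elapsed_time")
dt = timer.elapsed_time  # s

print(available_inside, available_after)
```

Reading `timer.elapsed_time` inside the block would raise an AttributeError on first use, or return a stale value if the same timer object were reused.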
2. Supplement
For functions that mix CPU and GPU operations, timing with torch.cuda.Event may make the measured time shorter than the actual time. In this case, time.time() can be used instead. The corresponding context manager is written as follows:
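
    import time
    import torch

    class Timer:
        def __enter__(self):
            self.start_time = time.time()
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            torch.cuda.synchronize()  # wait for queued GPU work before reading the clock
            self.elapsed_time = time.time() - self.start_time  # s
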
Then simply replace "with CudaTimer() as timer" above with "with Timer() as timer".
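The decorator form mentioned in section 1 can be built on top of either timer class. The following is a minimal sketch under stated assumptions: `timed`, `last_elapsed` and `WallTimer` are hypothetical names introduced here, and `WallTimer` is a CPU-only stand-in so that the example runs without a GPU. On a CUDA machine, `CudaTimer` or `Timer` from this article could be passed to `timed` instead.

```python
import functools
import time


class WallTimer:
    """Hypothetical CPU-only stand-in; swap in CudaTimer or Timer on a GPU machine."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed_time = time.perf_counter() - self.start


def timed(timer_cls):
    """Build a decorator that times each call with the given context-manager class."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with timer_cls() as timer:
                result = func(*args, **kwargs)
            # Read elapsed_time only after the with block has ended.
            wrapper.last_elapsed = timer.elapsed_time
            return result
        wrapper.last_elapsed = None  # no call has been timed yet
        return wrapper
    return decorator


@timed(WallTimer)
def work(n):
    time.sleep(0.05)
    return n * 2


value = work(21)
print(value, work.last_elapsed)
```

Storing the result on the wrapper keeps the decorated function's return value unchanged; returning a `(result, seconds)` tuple would be an equally reasonable design.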