Introduction
This article introduces a GPU diagnostic tool that collects and analyzes GPU-related information on the system, including hardware specifications, driver status, memory usage, and USB controller information. It is especially useful for configuration checks and problem diagnosis in deep learning development environments.
Features
1. System environment detection
- Python runtime version
- PyTorch version information
- CUDA and cuDNN version check
2. Environment variable check
- CUDA_HOME
- CUDA_PATH
- CUDA_VISIBLE_DEVICES
3. Hardware information
- Device count and model
- Compute capability
- GPU memory capacity
- Number of multiprocessors
- Maximum number of threads
4. GPU memory usage
- Allocated memory
- Reserved memory
- Available memory
5. USB and Thunderbolt interface support
- NVIDIA USB controller detection
- USB Type-C support check
- Thunderbolt support check
Implementation details
1. Environmental information collection
The tool uses Python's standard library together with PyTorch to collect basic environment information: system environment variables and PyTorch's built-in functions expose the CUDA-related configuration.
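A minimal sketch of this collection step (assuming PyTorch is installed; `torch.version.cuda` and `torch.backends.cudnn.version()` both return `None` on CPU-only builds):

```python
import os
import sys

import torch

# Basic runtime versions
print(f"Python:  {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA (PyTorch build): {torch.version.cuda}")   # None on CPU-only builds
print(f"cuDNN: {torch.backends.cudnn.version()}")      # None if cuDNN is unavailable

# CUDA-related environment variables, with a fallback for unset ones
for var in ("CUDA_HOME", "CUDA_PATH", "CUDA_VISIBLE_DEVICES"):
    print(f"{var}: {os.environ.get(var, 'Not set')}")
```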
2. GPU information acquisition
Use PyTorch's CUDA API to get detailed GPU information, including:
- torch.cuda.is_available() checks GPU availability
- torch.cuda.device_count() gets the number of GPUs
- torch.cuda.get_device_properties() gets GPU properties
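These three calls can be combined into a short sketch that degrades gracefully on machines without a CUDA-capable GPU:

```python
import torch

if torch.cuda.is_available():
    n = torch.cuda.device_count()
    print(f"Detected {n} GPU(s)")
    for i in range(n):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f"  compute capability: {props.major}.{props.minor}")
        print(f"  total memory: {props.total_memory / 1024**2:.1f} MB")
        print(f"  multiprocessors: {props.multi_processor_count}")
else:
    print("No CUDA-capable GPU detected")
```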
3. GPU memory monitoring
Monitor GPU memory usage in real time through PyTorch's memory management API:
- torch.cuda.memory_allocated()
- torch.cuda.memory_reserved()
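A sketch of the difference between the two counters: `memory_allocated` reports memory held by live tensors, while `memory_reserved` reports what PyTorch's caching allocator has claimed from the driver (always at least as large). The example only measures when a GPU is present:

```python
import torch

if torch.cuda.is_available():
    # Allocate a tensor so there is something to measure (~4 MB of float32)
    x = torch.empty(1024, 1024, device="cuda")
    allocated = torch.cuda.memory_allocated(0) / 1024**2  # memory held by live tensors
    reserved = torch.cuda.memory_reserved(0) / 1024**2    # memory cached by the allocator
    print(f"Allocated: {allocated:.1f} MB, Reserved: {reserved:.1f} MB")
else:
    print("CUDA not available; nothing to measure")
```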
4. Hardware interface detection
Use the Windows Management Instrumentation Command-line (WMIC) tool to detect the system's USB controllers and Thunderbolt support. This step is Windows-only.
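The WMIC query can be wrapped in a small helper; this sketch returns an empty list on non-Windows systems and on newer Windows builds where WMIC has been removed (the `list_usb_controllers` name is our own, not from the original script):

```python
import platform
import subprocess


def list_usb_controllers():
    """Query Win32_USBController via WMIC; returns [] when WMIC is unavailable."""
    if platform.system() != "Windows":
        return []
    try:
        out = subprocess.check_output(
            ["wmic", "path", "Win32_USBController", "get", "name,manufacturer"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return []  # WMIC is deprecated and may be absent on recent Windows builds
    # The first line is the column header; keep the non-empty data rows
    return [line.strip() for line in out.splitlines()[1:] if line.strip()]


for controller in list_usb_controllers():
    print(controller)
```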
How to use
Make sure Python and PyTorch are installed on the system.
Run the script to get the complete diagnostic report:
python gpu_info.py
Complete code
import os
import subprocess
import sys
from datetime import datetime

import torch


def get_gpu_info():
    print("=" * 50)
    print("GPU Diagnostic Report")
    print("=" * 50)
    print(f"Diagnosis time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")

    # System information
    print("System Information:")
    print(f"Python Version: {sys.version}")
    print(f"PyTorch Version: {torch.__version__}")
    print(f"CUDA Version (PyTorch): {torch.version.cuda}")
    print(f"cuDNN Version: {torch.backends.cudnn.version()}\n")

    # CUDA environment variables
    print("CUDA environment variables:")
    cuda_vars = {
        'CUDA_HOME': os.environ.get('CUDA_HOME', 'Not set'),
        'CUDA_PATH': os.environ.get('CUDA_PATH', 'Not set'),
        'CUDA_VISIBLE_DEVICES': os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set'),
    }
    for var, value in cuda_vars.items():
        print(f"{var}: {value}")
    print()

    # nvidia-smi output
    print("NVIDIA-SMI Information:")
    try:
        encodings = ['gbk', 'utf-8', 'iso-8859-1']
        nvidia_smi = None
        for encoding in encodings:
            try:
                nvidia_smi = subprocess.check_output(["nvidia-smi"]).decode(encoding)
                break
            except UnicodeDecodeError:
                continue
        if nvidia_smi:
            print(nvidia_smi)
        else:
            print("Unable to decode nvidia-smi output")
    except Exception as e:
        print(f"Failed to run nvidia-smi: {e}\n")

    # PyTorch GPU information
    print("\nPyTorch GPU details:")
    if torch.cuda.is_available():
        print(f"Detected {torch.cuda.device_count()} GPU device(s)")
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"\nGPU {i}: {props.name}")
            print(f"├─ Compute capability: {props.major}.{props.minor}")
            print(f"├─ Total memory: {props.total_memory / (1024**2):.1f} MB")
            print(f"├─ Multiprocessors: {props.multi_processor_count}")
            print(f"├─ Max threads per multiprocessor: {props.max_threads_per_multi_processor}")

            # GPU memory usage
            try:
                memory_allocated = torch.cuda.memory_allocated(i) / (1024**2)
                memory_reserved = torch.cuda.memory_reserved(i) / (1024**2)
                memory_free = (props.total_memory / (1024**2)) - memory_allocated
                print(f"├─ Allocated memory: {memory_allocated:.1f} MB")
                print(f"├─ Reserved memory: {memory_reserved:.1f} MB")
                print(f"└─ Available memory: {memory_free:.1f} MB")
            except Exception as e:
                print(f"└─ Unable to read memory usage: {e}")
    else:
        print("No available GPU device detected")
        print("\nPossible causes:")
        print("1. CUDA driver is not installed correctly")
        print("2. PyTorch was built without CUDA support")
        print("3. GPU is occupied by other processes")
        print("4. The system does not correctly recognize the GPU")


def get_usb_controller_info():
    print("\nUSB controller information:")
    try:
        result = subprocess.check_output(
            ["wmic", "path", "Win32_USBController", "get", "name,manufacturer"],
            encoding='gbk'
        )
        controllers = result.strip().split('\n')[1:]  # Skip header
        nvidia_controllers = []
        for controller in controllers:
            if controller.strip():
                if "NVIDIA" in controller:
                    nvidia_controllers.append(controller.strip())
        if nvidia_controllers:
            print("\nNVIDIA USB Controllers:")
            for controller in nvidia_controllers:
                print(f"- {controller}")
                if "Type-C" in controller:
                    print("  * Supports USB Type-C")
                # Check for Thunderbolt support
                try:
                    tb_check = subprocess.check_output(
                        ["wmic", "path", "Win32_PnPEntity", "where",
                         "caption like '%Thunderbolt%'", "get", "caption"],
                        encoding='gbk'
                    )
                    if len(tb_check.strip().split('\n')) > 1:  # Has content beyond the header
                        print("  * Supports Thunderbolt")
                except Exception:
                    pass
        else:
            print("NVIDIA USB controller not found")
    except Exception as e:
        print(f"Failed to get USB controller information: {e}")


if __name__ == "__main__":
    get_gpu_info()
    get_usb_controller_info()
Things to note
Make sure the NVIDIA driver is installed correctly
A CUDA-enabled build of PyTorch must be installed
On Windows, administrator privileges may be required to obtain certain hardware information
Troubleshooting
If the tool reports that no GPU is detected, check:
Is the NVIDIA driver installed correctly?
Does the CUDA toolkit version match the PyTorch build?
Are the environment variables configured correctly?
Is the GPU being used exclusively by another process?
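The first two items on this checklist can be sanity-checked in a few lines (a sketch; `torch.version.cuda` is `None` when PyTorch was built without CUDA support):

```python
import shutil

import torch

print("PyTorch built with CUDA:", torch.version.cuda)     # None -> CPU-only build
print("CUDA runtime usable:", torch.cuda.is_available())  # False -> driver/toolkit issue
print("nvcc on PATH:", shutil.which("nvcc") or "not found")
```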
The above covers using Python for comprehensive GPU environment detection and analysis. For more on Python GPU environment detection, please see my other related articles!