
Using Python for comprehensive GPU environment detection and analysis

Introduction

This article introduces a powerful GPU diagnostic tool that comprehensively collects and analyzes GPU-related information in the system, including hardware specifications, driver status, memory usage, and USB controller information. This tool is especially suitable for configuration checking and problem diagnosis in deep learning development environments.

Functional Features

1. System environment detection

  • Python runtime version
  • PyTorch version
  • CUDA and cuDNN version check

2. Environment variable check

  • CUDA_HOME
  • CUDA_PATH
  • CUDA_VISIBLE_DEVICES

3. Hardware information

  • Device count and model
  • Compute capability
  • GPU memory capacity
  • Number of multiprocessors
  • Maximum threads per multiprocessor

4. GPU memory usage

  • Allocated GPU memory
  • Reserved GPU memory
  • Available GPU memory

5. USB and Thunderbolt interface support

  • NVIDIA USB controller detection
  • USB Type-C support check
  • Thunderbolt support check

Implementation details

1. Environment information collection

The tool uses Python's standard library and PyTorch to collect basic environment information. By reading system environment variables and calling PyTorch's built-in functions, it obtains CUDA-related configuration details.
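
A condensed sketch of this step, assuming PyTorch is importable (note that torch.version.cuda and torch.backends.cudnn.version() return None on CPU-only builds):

import os
import sys
import torch

# CUDA-related environment variables, reported as "Not set" when absent
for var in ("CUDA_HOME", "CUDA_PATH", "CUDA_VISIBLE_DEVICES"):
    print(f"{var}: {os.environ.get(var, 'Not set')}")

# Version information exposed by the interpreter and by PyTorch
print(f"Python: {sys.version.split()[0]}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA (PyTorch build): {torch.version.cuda}")
print(f"cuDNN: {torch.backends.cudnn.version()}")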

2. GPU information acquisition

Use PyTorch's CUDA API to get detailed GPU information (a condensed sketch follows this list), including:

  • torch.cuda.is_available() checks GPU availability
  • torch.cuda.device_count() gets the number of GPUs
  • torch.cuda.get_device_properties() gets GPU properties
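
A minimal sketch of these calls, assuming a CUDA-enabled PyTorch build; the property names come from the object returned by torch.cuda.get_device_properties():

import torch

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}")
        print(f"  compute capability: {props.major}.{props.minor}")
        print(f"  total memory: {props.total_memory / 1024**2:.1f} MB")
        print(f"  multiprocessors: {props.multi_processor_count}")
else:
    print("No CUDA-capable GPU visible to PyTorch")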

3. GPU memory monitoring

Monitor GPU memory usage in real time through PyTorch's memory management API (see the sketch after this list):

  • torch.cuda.memory_allocated()
  • torch.cuda.memory_reserved()
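
A small sketch of the two calls, assuming at least one CUDA device is visible (both return bytes, converted here to MB):

import torch

if torch.cuda.is_available():
    device = 0  # index of the GPU to inspect
    allocated = torch.cuda.memory_allocated(device) / 1024**2  # memory occupied by live tensors
    reserved = torch.cuda.memory_reserved(device) / 1024**2    # memory held by the caching allocator
    print(f"allocated: {allocated:.1f} MB, reserved: {reserved:.1f} MB")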

4. Hardware interface detection

Use the Windows Management Instrumentation Command-line (WMIC) tool to detect the system's USB controllers and Thunderbolt support.
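
A rough sketch of the WMIC query, assuming a Windows system where the wmic command is still available (it is deprecated on recent Windows releases) and a GBK console code page; adjust the encoding for other locales:

import subprocess

try:
    # List USB controllers; the first line of the output is a header row
    output = subprocess.check_output(
        ["wmic", "path", "Win32_USBController", "get", "name,manufacturer"],
        encoding="gbk",
    )
    for line in output.strip().split("\n")[1:]:
        if line.strip():
            print(line.strip())
except (FileNotFoundError, subprocess.CalledProcessError) as e:
    print(f"WMIC query failed: {e}")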

How to use

Make sure Python and PyTorch are installed on the system

Run the script to get the complete diagnostic report:

python gpu_info.py

Complete code

import sys
import os
import subprocess
import torch
from datetime import datetime

def get_gpu_info():
    print("=" * 50)
    print("GPU Diagnostic Report")
    print("=" * 50)
    print(f"Diagnosis time: {().strftime('%Y-%m-%d %H:%M:%S')}\n")

    # System information
    print("System Information:")
    print(f"Python Version: {sys.version}")
    print(f"PyTorch Version: {torch.__version__}")
    print(f"CUDA Version (PyTorch): {torch.version.cuda}")
    print(f"cuDNN Version: {torch.backends.cudnn.version()}\n")

    # CUDA environment variables
    print("CUDA environment variables:")
    cuda_vars = {
        'CUDA_HOME': os.environ.get('CUDA_HOME', 'Not set'),
        'CUDA_PATH': os.environ.get('CUDA_PATH', 'Not set'),
        'CUDA_VISIBLE_DEVICES': os.environ.get('CUDA_VISIBLE_DEVICES', 'Not set')
    }
    for var, value in cuda_vars.items():
        print(f"{var}: {value}")
    print()

    # nvidia-smi output
    print("NVIDIA-SMI Information:")
    try:
        encodings = ['gbk', 'utf-8', 'iso-8859-1']
        nvidia_smi = None
        for encoding in encodings:
            try:
                nvidia_smi = subprocess.check_output(["nvidia-smi"]).decode(encoding)
                break
            except UnicodeDecodeError:
                continue
        
        if nvidia_smi:
            print(nvidia_smi)
        else:
            print("Unable to decode nvidia-smi output")
    except Exception as e:
        print(f"implement nvidia-smi fail: {e}\n")

    # PyTorch GPU information
    print("\nPyTorch GPU details:")
    if torch.cuda.is_available():
        print(f"Detected {torch.cuda.device_count()} GPU device(s)")

        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            print(f"\nGPU {i}: {props.name}")
            print(f"├─ Compute capability: {props.major}.{props.minor}")
            print(f"├─ Total GPU memory: {props.total_memory / (1024**2):.1f} MB")
            print(f"├─ Number of multiprocessors: {props.multi_processor_count}")
            print(f"├─ Max threads per multiprocessor: {props.max_threads_per_multi_processor}")
            
            # GPU memory usage
            try:
                memory_allocated = torch.cuda.memory_allocated(i) / (1024**2)
                memory_reserved = torch.cuda.memory_reserved(i) / (1024**2)
                memory_free = (props.total_memory / (1024**2)) - memory_allocated
                
                print(f"├─ Allocated video memory: {memory_allocated:.1f} MB")
                print(f"├─ Video memory reserved: {memory_reserved:.1f} MB")
                print(f"└─ Available video memory: {memory_free:.1f} MB")
            except Exception as e:
                print(f"└─ Unable to obtain video memory usage: {e}")
    else:
        print("No available GPU device detected")
        print("\nPossible Causes:")
        print("1. CUDA driver is not installed correctly")
        print("2. PyTorch not compiled CUDA support")
        print("3. GPU is occupied by other processes")
        print("4. The system does not correctly identify the GPU")

def get_usb_controller_info():
    print("\nUSB controller information:")
    try:
        result = subprocess.check_output(
            ["wmic", "path", "Win32_USBController", "get", "name,manufacturer"],
            encoding='gbk')
        controllers = result.strip().split('\n')[1:]  # Skip header row
        nvidia_controllers = []
        
        for controller in controllers:
            if controller.strip():
                if "NVIDIA" in controller:
                    nvidia_controllers.append(controller.strip())
                    
        if nvidia_controllers:
            print("\nVIDIA USB Controller:")
            for controller in nvidia_controllers:
                print(f"- {controller}")
                if "Type-C" in controller:
                    print(" * Supports USB Type-C")
                    # Check for Thunderbolt support
                    try:
                        tb_check = subprocess.check_output(
                            ["wmic", "path", "Win32_PnPEntity", "where", 
                             "caption like '%Thunderbolt%'", "get", "caption"], 
                            encoding='gbk'
                        )
                        if len(tb_check.strip().split('\n')) > 1:  # Has content beyond header
                            print(" * Supports lightning interface")
                    except Exception:
                        pass
        else:
            print("NVIDIA USB Controller Not Found")
    except Exception as e:
        print(f"Get USB 控制器信息fail: {e}")

if __name__ == "__main__":
    get_gpu_info()
    get_usb_controller_info()

Things to note

Make sure the NVIDIA driver is installed correctly

A CUDA-enabled build of PyTorch must be installed

On Windows, administrator privileges may be required to obtain certain hardware information

Troubleshooting

If the tool reports that the GPU is not detected, check:

Is the NVIDIA driver installed correctly?

Does the CUDA toolkit version match the PyTorch build?

Are the environment variables configured correctly?

Is the GPU exclusively held by another process?

This concludes the detailed walkthrough of using Python for comprehensive GPU environment detection and analysis. For more on Python GPU environment detection, please see my other related articles!