SoFunction
Updated on 2025-04-14

Implementing Consul/Nacos load balancing in Java based on GPU model and free video memory

This article shows how a Java service can publish its GPU model and free video memory as Consul or Nacos metadata, and how a client can use that metadata for load balancing.

Step 1: The server obtains GPU metadata

1. Add dependencies

Add Apache Commons Exec (to run external commands) and Gson (to serialize the metadata as JSON):

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-exec</artifactId>
    <version>1.3</version>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.9</version>
</dependency>

2. Implement GPU information collection

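import org.apache.commons.exec.CommandLine;
import org.apache.commons.exec.DefaultExecutor;
import org.apache.commons.exec.PumpStreamHandler;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class GpuInfoUtil {
    // Runs nvidia-smi and returns one GpuMeta per GPU found on this host.
    public static List<GpuMeta> getGpuMetadata() throws IOException {
        CommandLine cmd = CommandLine.parse("nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits");
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        PumpStreamHandler streamHandler = new PumpStreamHandler(outputStream);
        DefaultExecutor executor = new DefaultExecutor();
        executor.setStreamHandler(streamHandler);
        executor.execute(cmd); // throws ExecuteException on a non-zero exit code
        String output = outputStream.toString();
        return parseOutput(output);
    }
    // Each output line has the form "name, total, free", with memory in MB.
    private static List<GpuMeta> parseOutput(String output) {
        List<GpuMeta> gpus = new ArrayList<>();
        for (String line : output.split("\\r?\\n")) {
            String[] parts = line.split(",");
            if (parts.length >= 3) {
                String name = parts[0].trim();
                long total = Long.parseLong(parts[1].trim()) * 1024 * 1024; // MB -> bytes
                long free = Long.parseLong(parts[2].trim()) * 1024 * 1024;
                gpus.add(new GpuMeta(name, total, free));
            }
        }
        return gpus;
    }
    public static class GpuMeta {
        private String name;
        private long totalMem;
        private long freeMem;
        public GpuMeta(String name, long totalMem, long freeMem) {
            this.name = name;
            this.totalMem = totalMem;
            this.freeMem = freeMem;
        }
        public String getName() { return name; }
        public long getTotalMem() { return totalMem; }
        public long getFreeMem() { return freeMem; }
        // Setters are omitted
    }
}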
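To make the parsing step easier to follow, the sketch below applies the same line-splitting logic to a hard-coded sample instead of calling nvidia-smi. The class name `GpuCsvParseDemo` and the sample values are illustrative assumptions, and the extra handling of non-numeric fields is a defensive addition rather than part of the collection code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: parses nvidia-smi style CSV text without invoking the tool.
class GpuCsvParseDemo {

    static class Gpu {
        final String name;
        final long totalBytes;
        final long freeBytes;

        Gpu(String name, long totalBytes, long freeBytes) {
            this.name = name;
            this.totalBytes = totalBytes;
            this.freeBytes = freeBytes;
        }
    }

    // Expects lines of the form "name, total, free" with memory given in MB.
    static List<Gpu> parse(String output) {
        List<Gpu> gpus = new ArrayList<>();
        for (String line : output.split("\\r?\\n")) {
            String[] parts = line.split(",");
            if (parts.length < 3) {
                continue; // skip blank or malformed lines
            }
            try {
                long total = Long.parseLong(parts[1].trim()) * 1024 * 1024; // MB -> bytes
                long free = Long.parseLong(parts[2].trim()) * 1024 * 1024;
                gpus.add(new Gpu(parts[0].trim(), total, free));
            } catch (NumberFormatException e) {
                // A field was not a number; skip the line rather than fail the whole scan.
            }
        }
        return gpus;
    }
}
```

Calling `GpuCsvParseDemo.parse("Tesla T4, 15360, 8192")` yields a single entry whose free memory is 8192 MB expressed in bytes.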

Step 2: Register the service to Consul/Nacos

1. Consul registration implementation

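// Client classes from the consul-api library (com.ecwid.consul)
import com.ecwid.consul.v1.ConsulClient;
import com.ecwid.consul.v1.agent.model.NewService;
import com.google.gson.Gson;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class ConsulRegistrar {
    public void register(String serviceName, String ip, int port) throws Exception {
        ConsulClient consul = new ConsulClient("localhost", 8500);
        List<GpuInfoUtil.GpuMeta> gpus = GpuInfoUtil.getGpuMetadata();
        NewService service = new NewService();
        service.setId(serviceName + "-" + ip + ":" + port);
        service.setName(serviceName);
        service.setAddress(ip);
        service.setPort(port);
        // Serialize GPU metadata
        Gson gson = new Gson();
        // Set the map explicitly instead of relying on getMeta() being initialized
        Map<String, String> meta = new HashMap<>();
        meta.put("gpus", gson.toJson(gpus));
        service.setMeta(meta);
        consul.agentServiceRegister(service);
    }
}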

2. Nacos registration implementation

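import com.alibaba.nacos.api.NamingFactory;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;
import com.google.gson.Gson;
import java.util.List;
public class NacosRegistrar {
    public void register(String serviceName, String ip, int port) throws Exception {
        NamingService naming = NamingFactory.createNamingService("localhost:8848");
        List<GpuInfoUtil.GpuMeta> gpus = GpuInfoUtil.getGpuMetadata();
        Instance instance = new Instance();
        instance.setIp(ip);
        instance.setPort(port);
        instance.setServiceName(serviceName);
        // Store the GPU list as a JSON string under the "gpus" metadata key
        instance.getMetadata().put("gpus", new Gson().toJson(gpus));
        naming.registerInstance(serviceName, instance);
    }
}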
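Registries generally limit how much metadata an instance can carry. For Consul, the documented limits at the time of writing are understood to be values of at most 512 characters and keys made of letters, digits, `-`, and `_`; treat these numbers as an assumption and check them for your version. A JSON list covering many GPUs can exceed such a limit, so one option is to publish a short summary instead. The sketch below is a hypothetical alternative encoding: the class `GpuMetaSummary` and the keys `gpu_count`, `gpu_best_model`, and `gpu_max_free_mib` are invented for illustration and are not used elsewhere in this article.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flattens GPU data into a few short metadata entries.
class GpuMetaSummary {

    // Assumed per-value limit; verify against your registry's documentation.
    static final int MAX_VALUE_LENGTH = 512;

    static class Gpu {
        final String name;
        final long freeBytes;

        Gpu(String name, long freeBytes) {
            this.name = name;
            this.freeBytes = freeBytes;
        }
    }

    // Publishes the GPU count plus the model and free MiB of the GPU with the most free memory.
    static Map<String, String> summarize(List<Gpu> gpus) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("gpu_count", String.valueOf(gpus.size()));
        Gpu best = null;
        for (Gpu g : gpus) {
            if (best == null || g.freeBytes > best.freeBytes) {
                best = g;
            }
        }
        if (best != null) {
            meta.put("gpu_best_model", best.name);
            meta.put("gpu_max_free_mib", String.valueOf(best.freeBytes / (1024 * 1024)));
        }
        return meta;
    }

    // Returns true when every value stays within the assumed length limit.
    static boolean fits(Map<String, String> meta) {
        for (String value : meta.values()) {
            if (value.length() > MAX_VALUE_LENGTH) {
                return false;
            }
        }
        return true;
    }
}
```

The trade-off is that the client sees only the best GPU on each node rather than the full list, which is enough for a "has at least N bytes free" check but not for per-GPU scheduling.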

Step 3: Dynamically update metadata

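import com.ecwid.consul.v1.ConsulClient;
import com.ecwid.consul.v1.agent.model.NewService;
import com.google.gson.Gson;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
public class MetadataUpdater {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final ConsulClient consulClient;
    private final String serviceId;
    private final String serviceName;
    private final String ip;
    private final int port;
    public MetadataUpdater(ConsulClient consulClient, String serviceId, String serviceName, String ip, int port) {
        this.consulClient = consulClient;
        this.serviceId = serviceId;
        this.serviceName = serviceName;
        this.ip = ip;
        this.port = port;
    }
    public void startUpdating() {
        // Refresh the GPU metadata every 10 seconds, starting immediately
        scheduler.scheduleAtFixedRate(() -> {
            try {
                List<GpuInfoUtil.GpuMeta> gpus = GpuInfoUtil.getGpuMetadata();
                String gpuJson = new Gson().toJson(gpus);
                // Re-register to update metadata. Registration replaces the whole
                // definition, so the name, address and port are sent again.
                NewService service = new NewService();
                service.setId(serviceId);
                service.setName(serviceName);
                service.setAddress(ip);
                service.setPort(port);
                service.setMeta(Collections.singletonMap("gpus", gpuJson));
                consulClient.agentServiceRegister(service);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.SECONDS);
    }
}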
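Re-registering on every tick writes to the registry even when free memory has barely moved. One possible refinement, which is not part of the steps above, is to publish only when the value has changed by at least some threshold. The sketch below shows that decision in isolation; the class name `MetadataChangeDetector` and the choice of threshold are hypothetical.

```java
// Hypothetical helper: decides whether a new free-memory reading is worth publishing.
class MetadataChangeDetector {

    private final long thresholdBytes;
    private long lastPublishedFree = -1; // -1 means nothing has been published yet

    MetadataChangeDetector(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    // Returns true if the caller should re-register, and remembers the value as published.
    synchronized boolean shouldPublish(long currentFreeBytes) {
        boolean first = lastPublishedFree < 0;
        boolean changed = Math.abs(currentFreeBytes - lastPublishedFree) >= thresholdBytes;
        if (first || changed) {
            lastPublishedFree = currentFreeBytes;
            return true;
        }
        return false;
    }
}
```

Inside the scheduled task, the re-registration call would be wrapped in `if (detector.shouldPublish(freeBytes)) { ... }`. The comparison is made against the last published value rather than the last reading, so a slow drift still triggers an update once it adds up to the threshold.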

Step 4: Client load balancing (Spring Cloud example)

1. Custom load balancer

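import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import reactor.core.publisher.Flux;
import java.util.List;
import java.util.stream.Collectors;
public class GpuAwareServiceSupplier implements ServiceInstanceListSupplier {
    private static final long MIN_FREE_BYTES = 2L * 1024 * 1024 * 1024; // 2GB
    private final ServiceInstanceListSupplier delegate;
    private final Gson gson = new Gson();
    public GpuAwareServiceSupplier(ServiceInstanceListSupplier delegate) {
        this.delegate = delegate;
    }
    @Override
    public String getServiceId() {
        return delegate.getServiceId();
    }
    // Keeps only instances that report at least one GPU with more than 2GB free.
    @Override
    public Flux<List<ServiceInstance>> get() {
        return delegate.get().map(instances ->
            instances.stream()
                .filter(instance -> {
                    String gpuJson = instance.getMetadata().get("gpus");
                    if (gpuJson == null) {
                        return false; // instance published no GPU metadata
                    }
                    List<GpuInfoUtil.GpuMeta> gpus = gson.fromJson(gpuJson, new TypeToken<List<GpuInfoUtil.GpuMeta>>(){}.getType());
                    return gpus.stream().anyMatch(g -> g.getFreeMem() > MIN_FREE_BYTES);
                })
                .collect(Collectors.toList())
        );
    }
}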
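The supplier above filters on free memory only. To also take the GPU model into account, and to prefer the node with the most headroom, the selection rule could look like the following sketch. It is written against a plain `Candidate` type rather than Spring's `ServiceInstance` so that it stands alone; `GpuInstanceSelector`, `Candidate`, and the sample model names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical selector: filters by GPU model and minimum free memory,
// then prefers the candidate with the most free memory.
class GpuInstanceSelector {

    static class Candidate {
        final String address;
        final String gpuModel;
        final long freeBytes;

        Candidate(String address, String gpuModel, long freeBytes) {
            this.address = address;
            this.gpuModel = gpuModel;
            this.freeBytes = freeBytes;
        }
    }

    // A null requiredModel means any model is acceptable.
    static Optional<Candidate> select(List<Candidate> candidates, String requiredModel, long minFreeBytes) {
        Candidate best = null;
        for (Candidate c : candidates) {
            boolean modelOk = requiredModel == null || requiredModel.equalsIgnoreCase(c.gpuModel);
            boolean memoryOk = c.freeBytes >= minFreeBytes;
            if (modelOk && memoryOk && (best == null || c.freeBytes > best.freeBytes)) {
                best = c;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

Always picking the single best node can concentrate traffic on it until its metadata is refreshed, so in practice this rule might be combined with round-robin or random choice among the nodes that pass the filter.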

2. Configure load balancing policies

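import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
@Configuration
public class LoadBalancerConfig {
    @Bean
    public ServiceInstanceListSupplier discoveryClientSupplier(
        ConfigurableApplicationContext context) {
        // withDiscoveryClient() and withBlockingDiscoveryClient() both set the base
        // supplier, so only one of them is kept here.
        ServiceInstanceListSupplier delegate = ServiceInstanceListSupplier.builder()
                .withDiscoveryClient()
                .withCaching()
                .withHealthChecks()
                .build(context);
        // Wrap the standard chain so the GPU filter is applied to its output
        return new GpuAwareServiceSupplier(delegate);
    }
}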

Final verification

Check the registry metadata

curl http://localhost:8500/v1/catalog/service/my-service | jq .

The output should contain something like:

{
  "ServiceMeta": {
    "gpus": "[{\"name\":\"Tesla T4\",\"totalMem\":17179869184,\"freeMem\":8589934592}]"
  }
}

Client call verification
The client routes requests only to nodes that report enough free video memory. If the selection is logged, the output might look like this:

INFO Selected instance 192.168.1.101:8080 with 8GB free GPU memory

Through the above steps, service registration and load balancing based on GPU metadata can be realized in Java.
