2025-02-10
The release of HAMi v2.5.0 brings some exciting updates, especially with the introduction of dynamic MIG (Multi-Instance GPU) support. This new feature allows users to dynamically partition GPUs at runtime, which eliminates the need for pre-configured MIG instances and enables more flexible resource management.
Here are the main highlights from the update:

- Support for the cuMallocAsync API.
- Scheduler configuration consolidated into a ConfigMap for easier management, following the kube-scheduler configuration conventions.
- A critical fix to libvgpu.so file handling. In previous versions, restarting the hami-device-plugin could disrupt ongoing GPU tasks, because the plugin copied the libvgpu.so file on every restart. The plugin now computes the MD5 hash of the file and only copies it when the contents differ, preventing task disruptions.
The dynamic MIG feature in this release is a game-changer. For NVIDIA GPUs that support MIG (such as the A100, H100, and A30), users can now partition GPU resources dynamically at runtime, without pre-configuring MIG instances by hand.
To enable dynamic MIG, update the hami-device-plugin configuration to switch the target nodes to MIG mode. Then deploy a pod with the desired GPU requirements, and the system will automatically allocate MIG instances from the available pool. For example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-test
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: 2
          nvidia.com/gpumem: 8000
```
You can also add the nvidia.com/vgpu-mode: "mig" annotation if you want to force the pod to be scheduled on a MIG-mode node.
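As a sketch, the annotation sits under the pod's metadata (the pod name here is illustrative):

```yaml
metadata:
  name: gpu-pod-mig
  annotations:
    nvidia.com/vgpu-mode: "mig"  # schedule only onto MIG-mode nodes
```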
The introduction of dynamic MIG in HAMi v2.5.0 significantly enhances resource flexibility, enabling GPU partitioning at runtime. It’s an exciting step toward more efficient GPU resource utilization, particularly for users running mixed workloads in Kubernetes environments. This release also includes improvements in stability and usability, making it easier to manage and monitor GPU resources.
To learn more about RiseUnion's GPU virtualization and computing power management solutions, reach out at contact@riseunion.io.