HAMi v2.8.0: Evolution of Standardization and Ecosystem

2026-02-04


HAMi v2.8.0 is here! Since the release of v2.7, the project has made significant strides in architectural integrity, scheduling reliability, and ecosystem alignment. v2.8 delivers systemic enhancements in Kubernetes native standardization, heterogeneous device support, and production-grade observability, making HAMi even more suitable for long-running, stability-critical AI production clusters.


Highlights

Overview of key features in v2.8:

  1. Standardization: Added support for Kubernetes DRA (Dynamic Resource Allocation) through a standalone implementation, HAMi-DRA, driving HAMi's evolution from custom scheduling logic to Kubernetes-native standard interfaces.
  2. Heterogeneous Ecosystem Expansion: Updated and enhanced support for domestic chips like Iluvatar, MetaX, and Huawei Ascend. Fixed vLLM compatibility issues and refined Kueue integration.
  3. High Availability & Reliability: Introduced Leader Election for Scheduler HA; added CDI mode support for standardized device management; aligned with NVIDIA k8s-device-plugin v0.18.0.
  4. Ecosystem Maturity: HAMi has evolved from a single repo into a complete ecosystem containing HAMi-DRA, mock-device-plugin, ascend-device-plugin, HAMi-WebUI, and more.

Core Features: Standardization & High Availability

1. DRA (Dynamic Resource Allocation) - Embracing Kubernetes Native Standards

DRA is the next-generation device resource declaration and allocation mechanism being advanced by the Kubernetes community, aiming to provide a more standardized, composable, and scalable resource management model for devices like GPUs and AI accelerators.

Why DRA Matters

Traditional Kubernetes device management has limitations:

  1. Inflexible Resource Declaration: Resources are hardcoded via limits[nvidia.com/gpu], unable to express complex needs like separate memory and compute requirements.
  2. Fragmented Logic: Each device plugin implements its own scheduling logic, making unified management difficult.
  3. Complex Composition: Cannot express requirements like "multiple GPUs with specific topology."

DRA introduces new APIs like ResourceClaim and DeviceClass to standardize declaration, allocation, and management, offering greater flexibility and scalability.
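As an illustrative sketch of these new API objects (the gpu.example.com driver and class names below are placeholders, not HAMi-specific; real classes are published by the installed DRA driver):

```yaml
# Hypothetical DeviceClass published by a DRA driver
# (names are placeholders; actual classes come from the installed driver).
apiVersion: resource.k8s.io/v1beta1
kind: DeviceClass
metadata:
  name: gpu.example.com
spec:
  selectors:
    - cel:
        expression: device.driver == "gpu.example.com"
---
# A ResourceClaim requesting one device of that class.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.example.com
```

A Pod then references the claim through spec.resourceClaims and each container's resources.claims, instead of a hardcoded limits entry.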

HAMi-DRA Core Features

HAMi-DRA is the HAMi community's standalone DRA implementation. It uses a Mutating Webhook architecture to automatically convert traditional GPU resource requests into DRA ResourceClaims.

  1. Automatic Resource Conversion: Automatically converts nvidia.com/gpu, nvidia.com/gpumem, nvidia.com/gpucores requests to DRA ResourceClaims.
  2. Device Selection: Supports selecting specific devices via Pod Annotations (by UUID, device type, etc.).
  3. Metrics Monitoring: Optional Monitor component exposing GPU usage metrics via Prometheus.
  4. CDI Support: Integrated with Container Device Interface for standardized device injection.

DRA Usage Example

When a Pod is submitted with traditional resource limits, the HAMi-DRA Webhook automatically converts it to use a DRA ResourceClaim:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 2
          nvidia.com/gpumem: 4096
          nvidia.com/gpucores: 80
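Specific devices can also be pinned via Pod annotations, as described in feature 2 above. A hedged sketch using HAMi's documented nvidia.com/use-gputype and nvidia.com/use-gpuuuid annotation keys (verify the exact keys against the HAMi-DRA documentation for your version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-gpu-pod
  annotations:
    # Restrict scheduling to a given device model ...
    nvidia.com/use-gputype: "A100"
    # ... or pin to one physical card by UUID (placeholder value):
    # nvidia.com/use-gpuuuid: "GPU-xxxxxxxx"
spec:
  containers:
    - name: gpu-container
      image: nvidia/cuda:11.8.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```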

2. Leader Election - Scheduler High Availability

For large-scale clusters or HA deployments, HAMi v2.8.0 introduces Leader Election for multiple Scheduler instances. Using the Kubernetes Lease mechanism, it ensures that only one Scheduler instance is active and making scheduling decisions at any given time.

Key Benefits:

  1. Prevent Scheduling Conflicts: Multiple concurrent Schedulers can cause resource conflicts; Leader Election ensures serialized decision-making.
  2. Automatic Failover: Standby instances automatically take over if the Leader fails, improving system availability.
  3. Zero-Downtime Upgrades: During rolling updates, new Pods automatically become Leaders without manual intervention.
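Under the hood this uses a standard coordination.k8s.io Lease, whose holder identity records the active instance. A sketch of what the lease object might look like (the lease name and namespace are assumptions; check your deployment):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: hami-scheduler          # assumed lease name
  namespace: kube-system        # assumed namespace
spec:
  holderIdentity: hami-scheduler-0   # pod currently acting as leader
  leaseDurationSeconds: 15           # a standby takes over if renewal stops
```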

3. CDI (Container Device Interface) Mode Support

HAMi v2.8.0 adds support for NVIDIA CDI mode. CDI is a container device interface specification maintained under CNCF TAG Runtime, providing a standardized way to inject devices into containers. Users can enable it via global.deviceListStrategy: cdi-annotations.
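For example, in a Helm values file (the path is taken directly from the setting named above):

```yaml
# values.yaml
global:
  deviceListStrategy: cdi-annotations   # switch device injection to CDI mode
```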

4. Mock Device Plugin - Development & Testing Tool

HAMi v2.8.0 introduces the Mock Device Plugin, which lowers the barrier to entry by letting developers and CI/test environments simulate devices.

Capabilities:

  • Virtual Device Registration: Registers virtual devices (e.g., gpu-memory, gpu-cores) to nodes.
  • Multi-Vendor Support: Supports simulation for NVIDIA GPU, Hygon DCU, Ascend, and more.
  • Dev/Test Efficiency: Verify functionality and debug without requiring physical GPU hardware.

5. Observability Enhancements

HAMi v2.8.0 systematically enhances observability, adding build info metrics and deprecating obsolete ones.

  • New Metric: hami_build_info containing version, build time, and Git commit.
  • Optimization: Percentage-based metrics are deprecated in favor of vGPUMemoryAllocated and vGPUCoreAllocated.

Heterogeneous Ecosystem & Integration

1. Domestic Chip Support Updates

Iluvatar (天数智芯)

HAMi v2.8 enhances Iluvatar GPU support:

  • Multi-Card Scheduling: Fixed potential issues with vXPU features on P800 blades.
  • Scheduling Failure Events: Enhanced event output for easier troubleshooting.
  • Device Info: Added podInfos to DeviceUsage for better scheduling decisions.

[Thanks] @qiangwei1983 @Kyrie336 for contributions to Iluvatar support!

MetaX (沐曦)

Continued enhancements for MetaX GPUs:

  • sGPU Compute/Memory Sharing: Supports virtual GPU sharing to improve utilization.
  • QoS Modes: Supports BestEffort, FixedShare, and BurstShare.
  • WebUI Support: Visualized heterogeneous metrics.

[Thanks] @Kyrie336 for contributions to MetaX support!

Huawei Ascend (昇腾)

The ascend-device-plugin project now supports vNPU (virtual NPU) features, compatible with both HAMi and Volcano schedulers.

  • vNPU Virtualization: Supports virtual partitioning of Ascend 910 series chips.
  • Memory Isolation: Precise control over memory usage for each vNPU.

[Thanks] @DSFans2014 @archlitchi for contributions to Ascend support!

2. Upstream/Downstream Integration

Kueue Integration

Kueue is a batch job queue management project by Kubernetes SIG Scheduling. The HAMi community contributed enhancements to Kueue to natively support HAMi's device resource management model. Kueue's ResourceTransformation can now automatically convert HAMi vGPU requests (e.g., converting nvidia.com/gpu + nvidia.com/gpucores to nvidia.com/total-gpucores) for unified management.
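A hedged sketch of such a transformation in the Kueue Configuration (the strategy and multiplier values are illustrative; see the Kueue documentation for the exact schema):

```yaml
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
resources:
  transformations:
    - input: nvidia.com/gpucores
      strategy: Replace                  # replace the input resource in quota accounting
      outputs:
        nvidia.com/total-gpucores: "1"   # 1x multiplier: one gpucore -> one total-gpucore
```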

vLLM Compatibility

HAMi v2.8 fixes several vLLM compatibility issues:

  • Fixed crashes in multi-card scenarios.
  • Fixed initialization failures when manually specifying CUDA_VISIBLE_DEVICES.

[Thanks] @archlitchi for contributions to vLLM compatibility!


Optimizations & Fixes

Critical Fixes & Stability

v2.8 addresses issues from real-world production environments:

  • GPU/MIG Allocation: Fixed incorrect MIG instance allocation by the scheduler.
  • Concurrent Map Access: Fixed fatal errors during concurrent map iteration and writes.
  • Quota Calculation: Corrected ResourceQuota calculations.
  • Device Plugin Cleanup: Fixed residual state on nodes after plugin uninstallation.
  • Heterogeneous Boundary Cases: Fixed Pending issues with Kunlunxin vXPU multi-card allocation and MetaX P800 edge cases.

[Thanks] @litaixun @luohua13 @FouoF @Shouren for contributions to stability fixes!

Engineering Improvements

  • Node Registration: Refactored logic for better stability and maintainability.
  • Golang Upgrade: Upgraded to v1.25.5 for language features and security fixes.
  • Certificate Hot-Reload: Supports watching and hot-loading certificate changes without restarts.
  • Leaner Repo: Removed legacy binaries, significantly reducing repo size.

Once again, we would like to thank everyone who actively contributes to the community. It is because of you that HAMi continues to break through and grow.


Reference: https://mp.weixin.qq.com/s/hvpMl4bRpMENZAbdWR2peg.

To learn more about RiseUnion's vGPU resource pooling, virtualization, and AI compute management solutions, please contact us at contact@riseunion.io.

WeChat QR Code