
HAMi: Open Source GPU Virtualization for AI Computing

RiseUnion (睿思智联)
1/1/2025

Introduction

HAMi (Heterogeneous AI Computing Virtualization Middleware, formerly k8s-vGPU-Scheduler) is a Kubernetes middleware for heterogeneous device management. It manages diverse AI accelerators (GPUs, NPUs, MLUs, DCUs, etc.), enables device sharing across Pods, and makes optimized scheduling decisions based on device topology and policies.

HAMi aims to bridge the gaps between different heterogeneous devices and provide a unified interface for users, with no changes required to their applications. As of December 2024, HAMi has been widely adopted across the Internet, public/private cloud, finance, securities, energy, telecommunications, education, and manufacturing sectors. Over 40 companies and institutions are not only end users but also active contributors.


HAMi is a Cloud Native Computing Foundation (CNCF) sandbox and landscape project, as well as a CNAI Landscape project.


Device Virtualization

HAMi provides device virtualization for multiple heterogeneous devices, supporting device sharing and resource isolation. See Supported Devices for the full list.

Device Sharing Capabilities

  • Partial device allocation by specifying core usage (percentage)
  • Partial device allocation by specifying device memory (MB)
  • Hard limits on streaming multiprocessors
  • Zero changes to existing programs required
  • Dynamic MIG slicing support (see the sketch after this list)

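As a sketch of what dynamic MIG allocation can look like (the nvidia.com/vgpu-mode annotation follows the HAMi dynamic MIG documentation; pod name, image, and sizes are illustrative assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
  annotations:
    nvidia.com/vgpu-mode: "mig" # ask HAMi to back this vGPU with a MIG slice
spec:
  containers:
    - name: cuda-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "1" # one vGPU
          nvidia.com/gpumem: "8000" # HAMi picks a MIG template with at least 8000M memory

With this annotation, HAMi backs the vGPU with a matching MIG instance on a MIG-capable GPU instead of applying its software-level (HAMi-core) isolation.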

Device Resource Isolation

HAMi supports hard isolation of device resources. Example with NVIDIA GPU:

      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 vGPU
          nvidia.com/gpumem: 3000 # Each vGPU contains 3000M device memory

Only the requested 3000M of device memory will be visible inside the container.
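One way to verify the limit is to run nvidia-smi inside such a container (a quick check; the pod name gpu-pod is assumed):

kubectl exec -it gpu-pod -- nvidia-smi
# The memory column reports roughly 3000MiB total instead of the card's full capacity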

Note:

  1. After installing HAMi, the nvidia.com/gpu value registered on the node defaults to the number of vGPUs
  2. When requesting resources in a pod, nvidia.com/gpu indicates the number of physical GPUs needed

Supported Devices

Per-device support for sharing, memory isolation, and compute isolation is documented for each device; the devices currently covered are:

Device               Notes
NVIDIA GPU           vGPU & dynamic MIG
Huawei Ascend NPU    910B/910C/310P series
Cambricon MLU        MLU series
Hygon DCU            DCU series
Iluvatar CoreX GPU   CoreX series
Moore Threads GPU    MTT series
MetaX GPU            GPU management
Enflame GCU          Percentage-based allocation
KunlunXin XPU        vXPU fine-grained slicing

Architecture

(Figure: HAMi architecture)

HAMi consists of several components: a unified mutating webhook, a unified scheduler extender, device plugins, and in-container virtualization technologies for each heterogeneous AI device.
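To make the flow concrete, here is a rough illustration of the admission step (assumed behavior inferred from the components above, not an excerpt from HAMi source): the webhook redirects pods that request HAMi-managed resources to the HAMi scheduler, whose extender picks a device; the device plugin and in-container layer then enforce the limits at runtime.

# Pod fragment as submitted (schedulerName unset or pointing at the default scheduler):
spec:
  containers:
    - name: app
      resources:
        limits:
          nvidia.com/gpu: "1"

# After the HAMi mutating webhook (assumed mutation):
spec:
  schedulerName: hami-scheduler # the scheduler extender now decides device placement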

Quick Start

Choose Your Cluster Scheduler

HAMi works with either the default kube-scheduler or volcano-scheduler; the steps below cover the kube-scheduler setup.

Prerequisites

  • NVIDIA drivers >= 440
  • nvidia-docker version > 2.0
  • Default runtime configured as nvidia for containerd/docker/cri-o
  • Kubernetes version >= 1.18
  • glibc >= 2.17 & glibc < 2.30
  • Kernel version >= 3.10
  • helm > 3.0
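Most of these prerequisites can be checked on a GPU node with standard commands (an illustrative spot check, not an official validation script):

nvidia-smi --query-gpu=driver_version --format=csv,noheader  # driver >= 440
uname -r                                                     # kernel >= 3.10
ldd --version | head -n 1                                    # glibc >= 2.17 and < 2.30
kubectl version                                              # Kubernetes >= 1.18
helm version --short                                         # helm > 3.0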

Installation

  1. Label your GPU nodes for HAMi scheduling:

     kubectl label nodes {nodeid} gpu=on

  2. Add the Helm repo:

     helm repo add hami-charts https://project-hami.github.io/HAMi/

  3. Deploy (customize your installation via configs):

     helm install hami hami-charts/hami -n kube-system

  4. Verify the installation:

     kubectl get pods -n kube-system

If both the vgpu-device-plugin and vgpu-scheduler pods are Running, the installation succeeded.
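You can also confirm that the node now advertises vGPUs instead of physical GPUs (an illustrative check; the count equals physical GPUs times the chart's devicePlugin.deviceSplitCount, assumed here at its default of 10):

kubectl describe node {nodeid} | grep nvidia.com/gpu
# Capacity/Allocatable report the vGPU count, e.g. 20 for 2 physical GPUs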

WebUI

HAMi-WebUI is available starting from HAMi v2.4; see the deployment guide.


Example Task Submission

NVIDIA vGPUs can be requested by containers using the resource type nvidia.com/gpu:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "2" # requesting 2 vGPUs
          nvidia.com/gpumem: "3000" # Each vGPU contains 3000M device memory (optional)
          nvidia.com/gpucores: "30" # Each vGPU uses 30% of actual GPU compute (optional)

Note:

  • Do not set the privileged field: a privileged container sees every GPU on the node and bypasses isolation, so such tasks will not be scheduled.
  • Do not set the nodeName field; use nodeSelector if you need to constrain placement.
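Saving the manifest above as gpu-pod.yaml (file name assumed), submitting and checking the placement looks like:

kubectl apply -f gpu-pod.yaml
kubectl get pod gpu-pod -o wide # the NODE column shows where the HAMi scheduler placed it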

More examples

Monitoring

Monitoring is automatically enabled after installation. Access cluster metrics at:

http://{scheduler_ip}:{monitorPort}/metrics

The default monitorPort is 31993; set a different value with --set devicePlugin.service.httpPort during installation. A Grafana dashboard example is also available.
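For a quick look at the exported metrics (reusing the placeholder address from above):

curl -s http://{scheduler_ip}:31993/metrics | head -n 20
# Prometheus-format metrics covering per-device and per-task usage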

Talks & References

Conference                      Topic
KubeCon & AI_dev (China 2024)   Is Your GPU Really Working Efficiently in the Data Center?
KubeCon & AI_dev (China 2024)   Unlocking Heterogeneous AI Infrastructure K8s Cluster
KubeDay (Japan 2024)            Leveraging the Power of HAMi
KubeCon (EU 2024)               Cloud Native Batch Computing with Volcano

Community

HAMi is committed to fostering an open and welcoming environment.


HAMi Enterprise Edition (Rise VAST)

Rise VAST is the Enterprise Edition developed by RiseUnion in collaboration with 4Paradigm, built upon the HAMi open-source foundation. It introduces mission-critical enterprise features, including compute and memory oversubscription, dynamic resource preemption, granular resource specification, topology-aware scheduling, and robust isolation. By providing unified orchestration, shared allocation, and rapid scheduling of compute clusters, Rise VAST unlocks the full potential of heterogeneous infrastructure. Read more about the RiseUnion and 4Paradigm strategic partnership.
