Introduction
HAMi (Heterogeneous AI Computing Virtualization Middleware, formerly k8s-vGPU-Scheduler) is a Kubernetes middleware for heterogeneous device management. It manages diverse AI accelerators (GPUs, NPUs, MLUs, DCUs, etc.), enables device sharing across Pods, and makes optimized scheduling decisions based on device topology and policies.
HAMi aims to bridge the gap between different heterogeneous devices by providing a unified interface that requires no changes to user applications. As of December 2024, HAMi has been widely adopted across the Internet, public/private cloud, finance, securities, energy, telecommunications, education, and manufacturing sectors. Over 40 companies and institutions are not only end users but also active contributors.

HAMi is a Cloud Native Computing Foundation (CNCF) sandbox and landscape project, as well as a CNAI Landscape project.
Releases
- HAMi v2.8.0 — DRA alignment, Scheduler HA, CDI mode
- HAMi v2.7.0 — KunlunXin vXPU, Enflame GCU, AWS Neuron support
- HAMi v2.6.0 — Enflame GCU share, MetaX sGPU, topology-aware scoring
- HAMi v2.5.0 — Dynamic MIG support
- HAMi v2.4.0 — Ascend 910B/310P, WebUI visualization
- GitHub Releases
Device Virtualization
HAMi provides device virtualization for multiple heterogeneous devices, supporting device sharing and resource isolation. See Supported Devices for the full list.
Device Sharing Capabilities
- Partial device allocation by specifying core usage (percentage); see the sketch after this list
- Partial device allocation by specifying device memory (MB)
- Hard limits on streaming multiprocessors
- Zero changes to existing programs required
- Dynamic MIG slicing support, see example
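To make the first two capabilities concrete, here is a minimal sketch of a container that pins both a compute share and a memory budget, using the `nvidia.com/gpucores` and `nvidia.com/gpumem` resource names that also appear in the task example further below (the pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod # hypothetical name
spec:
  containers:
    - name: app
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "1"       # 1 vGPU
          nvidia.com/gpucores: "50" # hard-capped at 50% of the GPU's compute
          nvidia.com/gpumem: "4096" # 4096 MiB of device memory
```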

Device Resource Isolation
HAMi supports hard isolation of device resources. Example with NVIDIA GPU:
```yaml
resources:
  limits:
    nvidia.com/gpu: 1 # requesting 1 vGPU
    nvidia.com/gpumem: 3000 # each vGPU gets 3000 MiB of device memory
```
Only 3 GB of device memory will be visible inside the container.
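One way to verify the limit, assuming the pod is named `gpu-pod` and the container image ships `nvidia-smi`:

```bash
# Inside the container, nvidia-smi should report roughly 3000 MiB of total memory
kubectl exec gpu-pod -- nvidia-smi
```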

Note:
- After installing HAMi, the `nvidia.com/gpu` value registered on the node defaults to the number of vGPUs (configurable; see the sketch below).
- When requesting resources in a pod, `nvidia.com/gpu` indicates the number of physical GPUs needed.
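The per-node vGPU count can be tuned at install or upgrade time; a sketch assuming the `devicePlugin.deviceSplitCount` Helm value described in the HAMi configuration docs (check configs for the authoritative name):

```bash
# Hypothetical example: register 10 vGPUs per physical GPU
helm upgrade --install hami hami-charts/hami -n kube-system \
  --set devicePlugin.deviceSplitCount=10
```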
Supported Devices
| Device | Sharing | Memory Isolation | Compute Isolation | Notes |
|---|---|---|---|---|
| NVIDIA GPU | ✅ | ✅ | ✅ | vGPU & dynamic MIG |
| Huawei Ascend NPU | ✅ | ✅ | ✅ | 910B/910C/310P series |
| Cambricon MLU | ✅ | ✅ | ✅ | MLU series |
| Hygon DCU | ✅ | ✅ | ✅ | DCU series |
| Iluvatar CoreX GPU | ✅ | ✅ | ✅ | CoreX series |
| Moore Threads GPU | ✅ | ✅ | ✅ | MTT series |
| MetaX GPU | ✅ | ✅ | ✅ | sGPU management |
| Enflame GCU | ✅ | ✅ | — | Percentage-based allocation |
| KunlunXin XPU | ✅ | ✅ | ✅ | vXPU fine-grained slicing |
Architecture

HAMi consists of several components: a unified mutating webhook, a unified scheduler extender, device plugins, and in-container virtualization technologies for each heterogeneous AI device.
Quick Start
Choose Your Cluster Scheduler
Prerequisites
- NVIDIA drivers >= 440
- nvidia-docker version > 2.0
- Default runtime configured as nvidia for containerd/docker/cri-o
- Kubernetes version >= 1.18
- glibc >= 2.17 & glibc < 2.30
- Kernel version >= 3.10
- helm > 3.0
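These requirements can be spot-checked on each GPU node before installing; a minimal sketch using standard Linux and NVIDIA tooling (nothing HAMi-specific):

```bash
nvidia-smi --query-gpu=driver_version --format=csv,noheader  # expect >= 440
ldd --version | head -n1                                     # expect glibc 2.17 - 2.29
uname -r                                                     # expect kernel >= 3.10
kubectl version                                              # expect Kubernetes >= 1.18
helm version --short                                         # expect Helm > 3.0
```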
Installation
- Label your GPU nodes for HAMi scheduling:

  ```bash
  kubectl label nodes {nodeid} gpu=on
  ```
- Add the Helm repo:

  ```bash
  helm repo add hami-charts https://project-hami.github.io/HAMi/
  ```
- Deploy:

  ```bash
  helm install hami hami-charts/hami -n kube-system
  ```
Customize your installation via configs (see the example after these steps).
- Verify installation:

  ```bash
  kubectl get pods -n kube-system
  ```

If both vgpu-device-plugin and vgpu-scheduler pods are Running, the installation is successful.
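For example, to pin the embedded kube-scheduler image to a specific cluster version, a sketch using the `scheduler.kubeScheduler.imageTag` value from the Helm chart (adjust the tag to match your cluster):

```bash
helm install hami hami-charts/hami -n kube-system \
  --set scheduler.kubeScheduler.imageTag=v1.16.8
```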
WebUI
HAMi-WebUI is available from v2.4. Deployment guide.

Example Task Submission
NVIDIA vGPUs can be requested by containers using the resource type `nvidia.com/gpu`:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "2" # requesting 2 vGPUs
          nvidia.com/gpumem: "3000" # each vGPU gets 3000 MiB of device memory (optional)
          nvidia.com/gpucores: "30" # each vGPU uses 30% of the physical GPU's compute (optional)
```
Note:
- If a pod sets the `privileged` field, it will not be scheduled, since a privileged container sees all GPUs and would interfere with other tasks.
- Do not set the `nodeName` field; use `nodeSelector` instead, as sketched below.
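A minimal sketch of node pinning with `nodeSelector` instead of `nodeName`, reusing the `gpu=on` label applied during installation (the pod name is hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-gpu-pod # hypothetical name
spec:
  nodeSelector:
    gpu: "on" # matches the label from the installation step
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "1"
```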
Monitoring
Monitoring is automatically enabled after installation. Access cluster metrics at:
http://{scheduler_ip}:{monitorPort}/metrics
The default monitorPort is 31993; set a different value with `--set devicePlugin.service.httpPort` during installation. Grafana dashboard example.
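For a quick check from any node that can reach the scheduler service:

```bash
# Scrape the scheduler metrics endpoint (default monitorPort 31993)
curl http://{scheduler_ip}:31993/metrics
```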
Talks & References
| Conference | Topic |
|---|---|
| KubeCon & AI_dev (China 2024) | Is Your GPU Really Working Efficiently in the Data Center? |
| KubeCon & AI_dev (China 2024) | Unlocking Heterogeneous AI Infrastructure K8s Cluster |
| KubeDay (Japan 2024) | Leveraging the Power of HAMi |
| KubeCon (EU 2024) | Cloud Native Batch Computing with Volcano |
Community
HAMi is committed to fostering an open and welcoming environment.
- Community meeting: Every Friday at 16:00 (UTC+8). Meeting notes
- Slack: #hami (CNCF Slack)
- Discord: HAMi Server
- Mailing list: hami-project
- Maintainers · Contributing Guide · Roadmap
Star History
HAMi Enterprise Edition (Rise VAST)
Rise VAST is the Enterprise Edition developed by RiseUnion in collaboration with 4Paradigm, built upon the HAMi open-source foundation. It introduces mission-critical enterprise features, including compute and memory oversubscription, dynamic resource preemption, granular resource specification, topology-aware scheduling, and robust isolation. By providing unified orchestration, shared allocation, and rapid scheduling of compute clusters, Rise VAST unlocks the full potential of heterogeneous infrastructure. Read more about the RiseUnion and 4Paradigm strategic partnership.