2025-07-18
HAMi (Heterogeneous AI Computing Virtualization Middleware), formerly known as 'K8s-vGPU-Scheduler', is a premier heterogeneous device management middleware for Kubernetes. It enables unified management of diverse AI accelerators (GPUs, NPUs, etc.), allows seamless resource sharing among pods, and optimizes scheduling based on hardware topology and flexible policies. For a detailed technical analysis of HAMi's architecture and implementation, see our Code Analysis Series.
HAMi aims to bridge the gap between different heterogeneous devices and provide a unified interface for users to manage them without changing their applications. As of June 2024, HAMi has been widely adopted around the world across industries such as Internet, Cloud Computing, Finance, and Manufacturing. More than 40 companies and institutions are not only end users but also active contributors. Learn more about our enterprise success stories.

HAMi is a Cloud Native Computing Foundation (CNCF) Sandbox and Landscape project, as well as a CNAI Landscape project.
HAMi provides device virtualization for several heterogeneous devices including GPU, supporting device sharing and device resource isolation. For detailed implementation analysis, check our Device Plugin Analysis and Webhook Implementation.

HAMi supports hard isolation of device resources. Here is a simple demonstration using an NVIDIA GPU as an example. After submitting a task defined as follows:
resources:
  limits:
    nvidia.com/gpu: 1 # requesting 1 vGPU
    nvidia.com/gpumem: 3000 # each vGPU is limited to 3000M of device memory
only 3000M of device memory will be visible inside the container.
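To illustrate the resource accounting behind such requests, here is a minimal sketch in plain Python (not actual HAMi code) of how per-GPU memory slicing can be reasoned about; the function name and the 24 GiB card capacity are assumptions for the example:

```python
# Hypothetical sketch of HAMi-style device-memory slicing (not HAMi source code).
# Each vGPU request carries a gpumem value in MiB; requests fit on a physical
# GPU only while their sum stays within the card's total memory.

def fits_on_gpu(total_mem_mib: int, requests_mib: list[int]) -> bool:
    """Return True if all vGPU memory requests fit on one physical GPU."""
    return sum(requests_mib) <= total_mem_mib

# Example: a 24576 MiB (24 GiB) card can host eight 3000 MiB vGPUs,
# but not nine, since 9 * 3000 = 27000 MiB exceeds the card's memory.
print(fits_on_gpu(24576, [3000] * 8))  # True
print(fits_on_gpu(24576, [3000] * 9))  # False
```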


HAMi consists of several components, including a unified mutatingwebhook, a unified scheduler, and device plugins along with in-container control components for various heterogeneous computing devices. The overall architectural features are shown in the diagram above.
Label your GPU nodes so HAMi can schedule to them, then add the Helm repository and install HAMi:
kubectl label nodes {nodeid} gpu=on
helm repo add hami-charts https://project-hami.github.io/HAMi/
helm install hami hami-charts/hami -n kube-system
You can customize your installation by adjusting the configs. Verify the installation with:
kubectl get pods -n kube-system
If both the vgpu-device-plugin and vgpu-scheduler pods are in the Running state, your installation succeeded.
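As a quick programmatic check, the sketch below parses `kubectl get pods`-style output and confirms the pods are Running; the pod names and sample text are illustrative, not captured from a real cluster:

```python
# Parse `kubectl get pods -n kube-system` style output and verify that the
# HAMi pods are Running. The sample text below is illustrative only.
SAMPLE = """\
NAME                          READY   STATUS    RESTARTS   AGE
vgpu-device-plugin-abcde      2/2     Running   0          5m
vgpu-scheduler-xyz12          2/2     Running   0          5m
"""

def pod_statuses(output: str) -> dict[str, str]:
    """Map pod name -> STATUS column from kubectl table output."""
    statuses = {}
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) >= 3:
            statuses[parts[0]] = parts[2]
    return statuses

statuses = pod_statuses(SAMPLE)
print(all(s == "Running" for s in statuses.values()))  # True
```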
HAMi-WebUI is available starting with HAMi v2.4.0; see the project documentation for deployment instructions.

NVIDIA vGPUs can now be requested by containers using the resource type nvidia.com/gpu:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: ubuntu-container
      image: ubuntu:18.04
      command: ["bash", "-c", "sleep 86400"]
      resources:
        limits:
          nvidia.com/gpu: "2" # requesting 2 vGPUs
          nvidia.com/gpumem: "3000" # each vGPU is limited to 3000M device memory (optional, integer)
          nvidia.com/gpucores: "30" # each vGPU uses 30% of the GPU's computing power (optional, integer)
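The same pod can also be constructed programmatically. The sketch below builds an equivalent manifest as a plain Python dict, reusing the resource names from the example above; the helper name is ours, and submitting it would require a Kubernetes client, which is omitted here:

```python
# Build a pod manifest equivalent to the YAML above as a plain dict.
# Resource names (nvidia.com/gpu, gpumem, gpucores) come from the HAMi example.

def hami_gpu_pod(name: str, vgpus: int, mem_mib: int, cores_pct: int) -> dict:
    """Return a Pod manifest requesting HAMi vGPU resources."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "ubuntu-container",
                "image": "ubuntu:18.04",
                "command": ["bash", "-c", "sleep 86400"],
                "resources": {"limits": {
                    "nvidia.com/gpu": str(vgpus),           # number of vGPUs
                    "nvidia.com/gpumem": str(mem_mib),      # MiB per vGPU
                    "nvidia.com/gpucores": str(cores_pct),  # % of GPU compute
                }},
            }],
        },
    }

pod = hami_gpu_pod("gpu-pod", vgpus=2, mem_mib=3000, cores_pct=30)
print(pod["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"])  # "2"
```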
If your task cannot run on any node (e.g., if the task's nvidia.com/gpu request exceeds the actual GPU count of every GPU node in the cluster), the task will remain in the Pending state.
You can now execute the nvidia-smi command in the container to compare the difference between vGPU and actual GPU memory size.
Note:
- If you use the privileged field, this task will not be scheduled, since a privileged container can see all GPUs and would affect other tasks.
- Do not set the nodeName field; use nodeSelector for similar requirements.
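If you need to pin the workload to specific nodes, a nodeSelector on a label you have already applied works without bypassing the scheduler; for example, reusing the gpu=on label from the installation step:

```yaml
# Pin the pod to labeled GPU nodes without setting nodeName.
spec:
  nodeSelector:
    gpu: "on"  # matches nodes labeled via `kubectl label nodes {nodeid} gpu=on`
```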
For more examples, see the examples in the HAMi repository.
Monitoring is automatically enabled after installation. Obtain an overview of cluster information by visiting:
http://{scheduler ip}:{monitorPort}/metrics
The default monitorPort is 31993; other values can be set using --set devicePlugin.service.httpPort during installation.
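The /metrics endpoint serves standard Prometheus text format, so any Prometheus-compatible scraper can consume it. As a sketch, the small parser below extracts gauge values from sample text; the metric name shown is a hypothetical placeholder, not necessarily an actual HAMi metric name, so inspect your own /metrics output for the real names:

```python
import re

# Minimal parser for Prometheus text-format gauges, applied to a sample
# payload. The metric name below is a hypothetical placeholder.
SAMPLE_METRICS = """\
# HELP vgpu_device_memory_limit Hypothetical per-vGPU memory limit in bytes
# TYPE vgpu_device_memory_limit gauge
vgpu_device_memory_limit{podname="gpu-pod"} 3145728000
"""

def gauge_values(text: str, metric: str) -> dict[str, float]:
    """Map each sample's label block of `metric` to its numeric value."""
    values = {}
    pattern = re.compile(rf'^{re.escape(metric)}(\{{[^}}]*\}})?\s+(\S+)$')
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            values[m.group(1) or ""] = float(m.group(2))
    return values

vals = gauge_values(SAMPLE_METRICS, "vgpu_device_memory_limit")
print(vals)  # {'{podname="gpu-pod"}': 3145728000.0}
```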
A Grafana dashboard example is available for visualizing these metrics.
Note: a node's vGPU status is collected only after the node has run a vGPU task.
RiseUnion, one of the core contributors to the HAMi open-source community, continues to drive the community's development and collaboration.
If you want to become a contributor to HAMi, please refer to: Contributor Guide.
For more details, please refer to: HAMi github.
Rise VAST is the Enterprise Edition developed by RiseUnion in collaboration with 4Paradigm, built upon the HAMi open-source foundation. It introduces mission-critical enterprise features, including compute and memory oversubscription, dynamic resource preemption, granular resource specification, topology-aware scheduling, and robust isolation. By providing unified orchestration, shared allocation, and rapid scheduling of compute clusters, Rise VAST unlocks the full potential of heterogeneous infrastructure, accelerating the delivery of intelligent enterprise applications. Read more about the RiseUnion and 4Paradigm strategic partnership.