
Rise VAST: AI Compute Management Platform

Unified Management · Full Observability: manage heterogeneous compute assets with complete clarity

Platform Overview

Rise VAST is the resource management foundation of the entire RiseUnion product family, built on the CNCF Sandbox project HAMi. It provides unified heterogeneous compute management and observability for Rise CAMP (scheduling), Rise ModelX (model serving), and Rise Edge (edge AI). Whether the hardware is NVIDIA, Ascend, or Hygon, and whether it sits in the datacenter or at the edge, VAST makes every compute asset as transparent as a utility.

Supported Chips

NVIDIA
Ascend
Hygon
Cambricon
Iluvatar
MetaX
Moore Threads
KunlunXin
Enflame
PPU
More vendors coming soon...

Unified Collection: Heterogeneous Compute Management

Building on the HAMi open-source foundation, Rise VAST breaks through multi-vendor compute fragmentation with a unified scheduling framework, reducing vendor integration complexity and enabling multi-cluster management. It delivers six unifications: resource management, compute scheduling, task control, service deployment, monitoring & O&M, and metering & operations.
RiseUnion and 4Paradigm have formed a strategic partnership to deliver enterprise-class AI compute pooling: Rise VAST (HAMi Enterprise Edition) combines HAMi scheduling with 4Paradigm's AI platform expertise.

-> Learn About the HAMi Enterprise Edition Partnership

Multi-cluster Multi-tenant

Unified management across geographies and architectures (x86/c86/ARM) with flexible shared and dedicated pool combinations

Unified Multi-vendor Framework

NVIDIA, Ascend, Hygon, Cambricon, Iluvatar, KunlunXin, Enflame, MetaX, Moore Threads and 10+ vendors under one framework

Cloud-native Zero-intrusion

GPU virtualization and isolation via the Kubernetes Device Plugin mechanism. No application code changes needed: compute limiting and VRAM isolation stay transparent to workloads (see the sketch after these feature cards)

Compute Pooling

Shared and dedicated compute pools with tenant and project-based quota allocation. On-demand resource request, use, and release

Edge Node vGPU

vGPU management for edge nodes, compatible with both Rise CAMP and Rise Edge cloud-native deployments, with a standalone deployment option

Multi-protocol Access

Web UI, REST API, and MCP protocol access, abstracting away underlying heterogeneous complexity
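To make the zero-intrusion model concrete, here is a minimal sketch of a Pod requesting a slice of a shared GPU through the Device Plugin interface. The extended resource names (`nvidia.com/gpu`, `nvidia.com/gpumem` in MB, `nvidia.com/gpucores` as a percentage) follow HAMi's open-source conventions; the exact names in a given VAST deployment may differ.

```python
# Minimal sketch: request a slice of a shared GPU via extended resources.
# Resource names follow HAMi conventions; actual names may differ per deployment.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="vgpu-demo", namespace="default"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="inference",
                image="nvcr.io/nvidia/pytorch:24.05-py3",
                command=["python", "-c", "import torch; print(torch.cuda.is_available())"],
                resources=client.V1ResourceRequirements(
                    limits={
                        "nvidia.com/gpu": "1",         # one vGPU instance
                        "nvidia.com/gpumem": "4096",   # 4 GiB of VRAM (MB granularity)
                        "nvidia.com/gpucores": "30",   # 30% of SM compute
                    }
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

The application inside the container needs no modification; the virtualization layer enforces the VRAM and compute limits transparently.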

Multi-cluster Multi-tenant Unified Management

Unified management of heterogeneous GPU clusters across regions and architectures, with LAN-based inter-cluster coordination and six unifications covering the full pipeline

Multi-DC

Unified management across datacenters in Beijing, Inner Mongolia, and beyond, linked by 100G interconnects

Multi-arch

x86 / c86 / ARM mixed deployment with edge node vGPU management

Shared + Dedicated

Flexible shared and dedicated pool combinations with tenant/project quota allocation

Unified Resources · Unified Scheduling · Unified Tasks · Unified Deployment · Unified Monitoring · Unified Metering

Full Observability & Intelligent O&M

Full-stack Observability

Microscope-level visibility from the GPU device layer to the K8s task layer, from physical resources to tenant quotas. Collects per-GPU SM compute time-slice ratios, VRAM time-sharing utilization, and inter-board communication traffic. Automatically builds a three-tier Pod → GPU → compute-unit mapping for second-level utilization pinpointing.

Domestic Chip Dynamic Partitioning

Breaks through the fixed-spec partitioning limits of vendor device plugins with intelligent, on-demand dynamic allocation, no restart required. Configuration goes from complex manual operations to one-click deployment, and utilization rises from 30-50% to 80-90%.

Domestic Chip VRAM Isolation & Alignment

For AI Core and VRAM partitioning combinations on the Ascend 910B and 910C series, provides strict VRAM boundary checks to prevent out-of-bounds access, auto-alignment to valid specs, inter-container resource isolation, and real-time VRAM monitoring.

Platform Health Dashboard

Utilization watermark, fragmentation rate, and faulty GPU distribution views by vendor, architecture, and GPU resource pool. One dashboard for complete compute health awareness.

Auto Fault Isolation

XID fault-code alerting. When a GPU faults or resource usage hits a threshold, VAST triggers a P0 alert and auto-isolates the faulty card, preventing fault propagation to business workloads.
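For illustration, the sketch below shows the isolation idea in miniature: on a critical XID event, cordon the affected node so the scheduler stops placing new workloads there. The event feed, the set of "critical" XID codes, and the label key are assumptions for the example; VAST performs this isolation automatically.

```python
# Illustrative sketch of auto-isolation: on a critical XID fault, mark the
# GPU's node unschedulable so new workloads avoid it. The "critical" XID set
# and the label key are assumptions for this example.
from kubernetes import client, config

CRITICAL_XIDS = {48, 63, 64, 74, 79, 94, 95}  # e.g. ECC / fallen-off-bus faults

def isolate_node(node_name: str, xid: int) -> None:
    config.load_kube_config()
    v1 = client.CoreV1Api()
    # Cordon the node and record the reason as a label for post-hoc review.
    v1.patch_node(node_name, {
        "spec": {"unschedulable": True},
        "metadata": {"labels": {"vast.example.io/isolated-xid": str(xid)}},
    })

def on_gpu_event(node_name: str, xid: int) -> None:
    if xid in CRITICAL_XIDS:
        isolate_node(node_name, xid)  # then raise a P0 alert downstream
```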

Enterprise Alert Platform

6-step closed loop: metric collection → rule engine → alert generation → noise suppression → tiered notification → incident review. Multi-channel delivery (email, SMS, DingTalk, WeCom) with on-call rotation integration.
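A minimal sketch of the tiered-notification step, routing an alert to a severity-appropriate channel over a webhook; the URLs and payload shape here are placeholders, not VAST's actual alerting API.

```python
# Sketch of the "tiered notification" step: route an alert to a channel
# based on severity. Webhook URLs and payload shape are placeholders.
import json
import urllib.request

CHANNELS = {
    "P0": "https://oapi.dingtalk.com/robot/send?access_token=REDACTED",
    "P1": "https://example.internal/alerts/email",
}

def notify(severity: str, summary: str) -> None:
    url = CHANNELS.get(severity, CHANNELS["P1"])
    payload = {"msgtype": "text", "text": {"content": f"[{severity}] {summary}"}}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5)

notify("P0", "GPU node gpu-node-07: XID 79, card auto-isolated")
```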

Full-stack Observability: Node → GPU → Task → Model

Full-pipeline visibility from the GPU device layer to the K8s task layer, providing microscope-level observability for heterogeneous GPU clusters

Model

E2E Latency · TTFT · Token Throughput · Req/Resp · Resource Status

Task

Eviction · Compute/VRAM Monitor · Logs · Status · Interconnect Info

GPU

Enable/Disable · Fault Recovery · XID Alerts · Resource Monitor · Temp/Power

Node

Enable/Disable · Resources · Status · Driver Version · OS

Metrics

NVIDIA · Ascend · Hygon · KunlunXin · Iluvatar · More

Chip Compatibility Certifications

Rise VAST has completed adaptation certification with the domestic AI chips listed above, ensuring stable management and unified scheduling

Frequently Asked Questions

01 What's the relationship between Rise VAST and the open-source HAMi project? Can I just use the OSS version?
Rise VAST is the enterprise edition built on top of HAMi, the CNCF Sandbox project that RiseUnion co-maintains. The OSS version provides basic vGPU partitioning and shared scheduling — fine for evaluation and small experiments. VAST Enterprise adds: multi-GPU virtualization with compute oversubscription, production-grade HA, enterprise observability and alerting, deep adaptation for 10+ domestic accelerators (Ascend, Cambricon, DCU and more), multi-tenant quota management, deep integration with Rise CAMP scheduling, and vendor SLAs with upgrade paths. Production deployments should run VAST Enterprise.
02 How fine-grained is vGPU partitioning? What's the precision and performance overhead?
Rise VAST supports MB-level memory and 1%-level compute partitioning, with up to 16 vGPU instances per physical GPU. The underlying implementation uses hybrid user-space + kernel-space virtualization with CUDA API interception for strict memory and compute isolation. Performance overhead stays under 5%. Combined with compute oversubscription, overall GPU utilization typically jumps from the industry average of 30% to 70%+.
03 Which GPU vendors are supported? How deep is the domestic chip support?
10+ certified accelerators: NVIDIA (full lineup), Huawei Ascend 910B / Atlas 800 A3, Cambricon MLU, Hygon DCU, KunlunXin, MetaX, Moore Threads, Iluvatar, Enflame, and others. On domestic chips, VAST goes beyond basic scheduling to deliver dynamic partitioning and sharing, capabilities most stock K8s Device Plugins can't match. This is the core differentiator.
04 I'm already using NVIDIA's K8s Device Plugin. Why do I need VAST?
NVIDIA's Device Plugin only supports whole-GPU allocation: one Pod owns an entire GPU with no sharing. For inference services, Notebooks, and small training jobs that can't fully saturate a card, this wastes massive amounts of capacity (industry-average utilization sits around 30%). VAST enables multiple Pods to share a single physical GPU with strict memory isolation and compute quotas, pushing utilization to 70%+. VAST is fully compatible with the K8s Device Plugin interface, making it a drop-in replacement.
05 Can I deploy VAST on an existing Kubernetes cluster without rebuilding it?
No cluster rebuild required. VAST deploys as native Kubernetes components via Helm chart, injecting a DaemonSet on GPU nodes that registers a custom Device Plugin. Zero impact on existing workloads — current CPU jobs and whole-GPU jobs keep running, while new vGPU requests automatically route through the VAST virtualization layer. Supports Kubernetes 1.20+ and has been rolled out via in-place upgrades in multiple production clusters.
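As a post-install sanity check, a few lines against the Kubernetes API can confirm the device-plugin DaemonSet is fully rolled out; the namespace and DaemonSet name below are assumptions, not VAST's actual defaults.

```python
# Post-install sanity check: confirm the device-plugin DaemonSet is fully
# rolled out on GPU nodes. Namespace and DaemonSet name are assumptions.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

ds = apps.read_namespaced_daemon_set("vast-device-plugin", "vast-system")
ready = ds.status.number_ready or 0
desired = ds.status.desired_number_scheduled or 0
print(f"device plugin ready on {ready}/{desired} GPU nodes")
assert ready == desired, "some GPU nodes have not registered the plugin yet"
```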
06 In a multi-tenant environment, how does VAST guarantee fair GPU allocation and isolation?
VAST provides three layers of isolation: physical (node/GPU affinity), virtualization (hard memory and compute isolation between vGPUs), and quota (per-tenant / per-project / per-user resource quotas). Combined with Rise CAMP scheduling, you get priority preemption, fair queuing, and quota overcommit reclamation — preventing any single tenant from monopolizing resources. All scheduling decisions are logged for post-hoc attribution.
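The quota layer maps naturally onto standard Kubernetes ResourceQuota objects scoped to a tenant namespace. The sketch below caps a hypothetical tenant's total vGPU count and VRAM, again assuming HAMi-style resource names.

```python
# Sketch of the quota layer: cap a tenant namespace's total vGPU count and
# VRAM with a standard ResourceQuota over extended resources.
# Resource names are HAMi-style assumptions.
from kubernetes import client, config

config.load_kube_config()

quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="tenant-a-gpu-quota"),
    spec=client.V1ResourceQuotaSpec(
        hard={
            "requests.nvidia.com/gpu": "8",         # at most 8 vGPU instances
            "requests.nvidia.com/gpumem": "65536",  # at most 64 GiB of VRAM total
        }
    ),
)
client.CoreV1Api().create_namespaced_resource_quota("tenant-a", quota)
```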
07 Does it support fully domestic-stack deployment? Compatible with Kylin / UnionTech OS?
Full domestic-stack support: the OS layer is compatible with Kylin V10, UnionTech UOS, openEuler, and Anolis OS; the CPU layer supports Kunpeng, Hygon, and Phytium; the GPU layer covers all the domestic accelerators above; the Kubernetes layer is compatible with mainstream domestic distributions. Already running in production in MLPS Level 3 environments for multiple finance, government, and defense customers.
08 How does VAST integrate with our existing monitoring, alerting, and CMDB systems?
VAST natively exposes Prometheus metrics (GPU utilization, memory, temperature, per-Pod vGPU usage, etc.) for direct integration with Grafana, Alibaba Cloud ARMS, or in-house observability platforms. Alerting supports webhooks, email, WeCom, and DingTalk, with integration to PagerDuty, OneAlert, and similar. CMDB integration is via RESTful APIs — existing customers have integrated with ServiceNow and custom CMDB systems.
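As a sketch of the Prometheus integration, GPU metrics can be pulled with a plain HTTP query against the standard `/api/v1/query` endpoint; the Prometheus address and metric name below are placeholders, since real metric names depend on the deployed exporter.

```python
# Sketch of pulling a GPU-utilization metric from Prometheus via its
# standard HTTP query API. The Prometheus address and metric name are
# placeholders; real names depend on the exporter in use.
import json
import urllib.parse
import urllib.request

PROM = "http://prometheus.vast-system:9090"
query = "avg by (node) (vast_gpu_utilization_percent)"  # placeholder metric

url = f"{PROM}/api/v1/query?" + urllib.parse.urlencode({"query": query})
with urllib.request.urlopen(url, timeout=5) as resp:
    result = json.load(resp)["data"]["result"]

for series in result:
    print(series["metric"].get("node"), series["value"][1], "%")
```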