2025-02-02
With the rapid advancement of AI and deep learning, the demand for GPU computing resources has surged dramatically. However, traditional GPU usage models struggle to meet modern enterprise requirements for performance, flexibility, and cost efficiency, especially given the diverse GPU types, hardware architectures, and multi-cloud environments. Organizations face two major challenges: maximizing GPU resource utilization and ensuring computational isolation and security between different workloads.
In response to these challenges, HAMi emerged as an open-source GPU virtualization solution, offering an efficient, flexible, and easily deployable approach to GPU resource management. Over the years, HAMi has been adopted by leading enterprises across the financial services, energy, and telecommunications sectors. Building upon the HAMi open-source foundation, Rise VAST introduced an enterprise edition that enhances GPU computing management and scheduling, with features such as computing power and memory over-provisioning, task priority scheduling, resource preemption, and heterogeneous GPU compatibility.
Through continuous development, HAMi has accumulated extensive industry experience, supporting virtualization for various GPU types (including NVIDIA, Ascend, Cambricon, DCU, etc.) while providing robust GPU resource scheduling and management capabilities. Its open-source nature and high customizability have made HAMi a leading solution in GPU virtualization, particularly for enterprises seeking to optimize GPU resource pooling and cross-platform scheduling.
Taking NVIDIA GPUs as an example, GPU virtualization can be implemented at three layers of the stack: user space, kernel space, and the hardware itself.
User space virtualization leverages standard interfaces (like CUDA and OpenGL) to intercept and forward API calls, parsing and redirecting requests to corresponding functions in vendor-provided user-space libraries. This approach enables remote GPU access through network-based remote procedure calls.
Advantages:
- No kernel or driver modifications: the shim is deployed by preloading or replacing user-space libraries, leaving the vendor stack untouched.
- Independence from kernel versions, which simplifies rollout across heterogeneous OS environments.
- Fine-grained control: individual API calls can be metered, enabling per-container limits on GPU memory and compute.
- Natural support for remote forwarding, since intercepted calls can be serialized over the network.
Disadvantages:
- The interception layer must track evolving vendor APIs (e.g., each new CUDA release).
- Every intercepted call pays a small forwarding overhead.
- Enforcement can in principle be bypassed by applications that bind to the driver interface directly.
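The intercept-and-forward pattern described above can be sketched in a few lines. The snippet below is a toy model, not HAMi's actual implementation: `FakeVendorLib` and its `mem_alloc` method are invented stand-ins for a vendor user-space entry point (such as a CUDA allocation call), and the shim enforces a per-process memory quota before forwarding the call, the same way a real user-space interception layer meters GPU memory.

```python
# Toy model of user-space API interception (all names hypothetical).
# A quota-enforcing wrapper sits in front of a "vendor" allocator,
# mirroring how a preloaded shim intercepts calls like cuMemAlloc.

class FakeVendorLib:
    """Stand-in for a vendor user-space library (e.g. libcuda)."""
    def __init__(self):
        self.allocated = []

    def mem_alloc(self, nbytes: int) -> int:
        handle = len(self.allocated)
        self.allocated.append(nbytes)
        return handle


class QuotaShim:
    """Intercepts allocation calls, enforces a memory limit, and
    forwards permitted requests to the real library."""
    def __init__(self, real_lib, limit_bytes: int):
        self._real = real_lib
        self._limit = limit_bytes
        self._used = 0

    def mem_alloc(self, nbytes: int) -> int:
        if self._used + nbytes > self._limit:
            raise MemoryError("GPU memory quota exceeded")
        self._used += nbytes
        return self._real.mem_alloc(nbytes)  # forward to the vendor library


lib = QuotaShim(FakeVendorLib(), limit_bytes=4 * 1024**3)  # 4 GiB quota
h = lib.mem_alloc(1 * 1024**3)  # within quota: call is forwarded
```

Because the application only ever sees the shim's interface, the same mechanism works whether the forwarded call lands in a local vendor library or is serialized over the network to a remote one.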
Kernel space virtualization implements GPU resource management by intercepting kernel-level interfaces (ioctl, mmap, read, write, etc.). This technical approach operates within the operating system's kernel space, making security and stability considerations more complex.
Advantages:
- Transparent to applications: because mediation happens below the user-space libraries, all APIs and library versions are covered without tracking them individually.
- Harder to bypass, since every GPU access ultimately passes through the kernel interface.
Disadvantages:
- Requires a kernel module tightly coupled to specific kernel and GPU driver versions; OS or driver upgrades may force re-adaptation.
- A defect in kernel space can crash or destabilize the entire node rather than a single workload.
- Deployment is difficult in security-hardened environments where loading third-party kernel modules is restricted or heavily audited.
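The ioctl-style mediation point can be pictured as a single dispatch table that every request passes through, regardless of which user-space library issued it. The sketch below is a toy illustration (the request codes and handlers are invented, not real driver interfaces): it shows both the strength of this layer, namely complete coverage of all callers, and the risk, since a fault here affects every workload on the node.

```python
# Toy model of kernel-level mediation (request codes are invented).
# All user-space libraries funnel into one dispatch point, so coverage
# is complete -- but so is the blast radius of any bug in this layer.

REQ_ALLOC, REQ_LAUNCH = 0x10, 0x20  # hypothetical ioctl request codes


class FakeGpuDevice:
    def __init__(self, max_concurrent: int):
        self.max_concurrent = max_concurrent
        self.running = 0

    def ioctl(self, request: int, arg):
        """Single mediation point: every GPU operation from any
        library passes through here, like ioctl() on a device node."""
        if request == REQ_ALLOC:
            return {"handle": id(arg)}
        if request == REQ_LAUNCH:
            if self.running >= self.max_concurrent:
                raise RuntimeError("EBUSY: concurrency limit reached")
            self.running += 1
            return {"status": "launched"}
        raise ValueError("ENOTTY: unknown request")


dev = FakeGpuDevice(max_concurrent=2)
dev.ioctl(REQ_LAUNCH, None)  # first launch is admitted
```

The trade-off is visible even in the toy: policy enforced here cannot be sidestepped by any library, but the enforcement code now lives where a single bug takes down every caller at once.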
In comparing user space and kernel space virtualization, user space virtualization demonstrates significant advantages in flexibility, security, and low intrusiveness.
For enterprises with complex IT infrastructures spanning multiple operating system versions, sophisticated network security policies, and geographically distributed data centers, kernel space GPU virtualization solutions are practically infeasible because:
- Kernel modules must be built, tested, and maintained for every kernel version in the fleet, an unmanageable compatibility matrix.
- Strict security policies often prohibit loading third-party kernel modules, and each module enlarges the audit and attack surface.
- Rolling out and upgrading kernel-level components across distributed data centers multiplies operational risk, since a faulty module can take down an entire node.
Therefore, user space solutions become the only viable choice. Rise VAST, based on HAMi, provides enterprise-grade GPU resource management capabilities, supporting various heterogeneous GPUs while ensuring system security, reducing operational costs, and improving GPU resource utilization.
Recently, remote GPU solutions have gained attention, allowing CPU servers to access GPU resources on remote servers, seemingly addressing resource fragmentation. However, in modern AI applications (especially mixed training and inference of large and small models), remote GPU access is practically unusable for several reasons:
- Latency: every GPU call traverses the network, adding tens to hundreds of microseconds on top of sub-microsecond local PCIe access, which is fatal for latency-sensitive inference.
- Bandwidth: even a 100 Gbps network (~12.5 GB/s) falls well short of local PCIe 4.0 x16 (~32 GB/s) or NVLink, so the frequent large tensor transfers of training and large-model inference become the bottleneck.
- Scheduling complexity: placement must account for network topology and congestion, and network failures cascade into GPU job failures.
Therefore, while remote GPU access may seem attractive in certain scenarios, it faces significant performance bottlenecks and resource scheduling challenges in practice, particularly in modern AI applications. Enterprises prefer local GPU resource pooling solutions like Rise VAST to improve GPU compute resource utilization while ensuring efficient and stable operation.
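The bandwidth gap behind these bottlenecks can be made concrete with rough arithmetic. The figures below are nominal link rates (PCIe 4.0 x16 at roughly 32 GB/s, 100 Gbps Ethernet at 12.5 GB/s before protocol overhead), and the 14 GB payload assumes a 7B-parameter model in FP16; real-world throughput is lower on both links, but the ratio is what matters.

```python
# Rough transfer-time comparison for moving model weights to a GPU.
# Nominal link rates; real throughput is lower due to protocol overhead.

GB = 1e9
pcie4_x16 = 32 * GB    # local PCIe 4.0 x16, ~32 GB/s nominal
net_100g = 12.5 * GB   # 100 Gbps Ethernet = 12.5 GB/s raw

model_bytes = 14 * GB  # e.g. a 7B-parameter model in FP16 (~14 GB)

t_local = model_bytes / pcie4_x16   # ~0.44 s over local PCIe
t_remote = model_bytes / net_100g   # ~1.12 s over the network

print(f"local:  {t_local:.2f} s")
print(f"remote: {t_remote:.2f} s  ({t_remote / t_local:.1f}x slower)")
```

This comparison counts bandwidth only; for inference, the per-call network round-trip latency is often the larger penalty, since each of the many small kernel launches and memory copies pays it again.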
In summary, user space virtualization demonstrates clear advantages in flexibility, security, and cross-platform support. Rise VAST, built on HAMi technology, leverages these advantages to provide efficient, reliable, cross-platform GPU resource management and intelligent scheduling, helping enterprise clients optimize GPU utilization across diverse hardware environments. Kernel space virtualization, by contrast, is difficult to operate in complex production environments because of its intrusiveness and deployment constraints. And while remote GPU access might seem ideal for solving GPU resource distribution challenges, it is essentially impractical in modern AI applications where large and small models coexist.
RiseUnion's Rise VAST AI Computing Power Management Platform (HAMi Enterprise Edition) enables automated resource management and workload scheduling for distributed training infrastructure. Through the platform, users can automatically run the required number of deep learning experiments across multi-GPU environments.
Advantages of using the Rise VAST AI Platform:
- GPU pooling and virtualization across heterogeneous hardware (NVIDIA, Ascend, Cambricon, DCU, etc.)
- Computing power and memory over-provisioning to raise overall utilization
- Task priority scheduling and resource preemption for mixed training and inference workloads
- Cross-platform scheduling without kernel-level intrusion into the host
RiseUnion's platform simplifies AI infrastructure processes, helping enterprises improve productivity and model quality.
To learn more about RiseUnion's GPU pooling, virtualization and computing power management solutions, please contact us: contact@riseunion.io