2025-01-26
In the AI cloud-native era, demand for computational resources has surged with the widespread adoption of large models, making the efficient management and utilization of diverse compute resources a pressing issue. Model fine-tuning, inference, and AI application development align closely with cloud-native characteristics, prompting more enterprises to deploy computational tasks on Kubernetes (K8s) platforms. For instance, OpenAI noted in its official blog that ChatGPT's model training runs on cloud-native technologies, with a K8s cluster scaled to 7,500 nodes, providing scalable infrastructure for models like GPT-3 and DALL·E while also supporting rapid, iterative research on smaller models.
However, the diversity of computing devices, together with significant differences in capability among domestic accelerators, makes for complex and varied computational environments. Efficiently managing and utilizing these heterogeneous computational resources on K8s is therefore a significant challenge. Current AI application deployment scenarios fall mainly into three categories:
Scenarios involving small AI models, where GPU utilization is low:
K8s added experimental support for scheduling NVIDIA GPUs in v1.6 and extended support to AMD GPUs in v1.9; since the device plugin framework arrived in v1.8, vendors have shipped their own device plugins to make their GPUs schedulable on K8s. The official K8s GPU scheduling documentation, for example, lists plugins from AMD, Intel, and NVIDIA. Despite the number of scheduling solutions available, differences among vendors mean each solution is maintained separately, and the officially supported device plugins often lack GPU resource isolation and sharing, resulting in inefficient GPU allocation and waste, as the whole-GPU request sketched below illustrates.
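With a stock device plugin, a pod can only request GPUs in whole units through an extended resource such as nvidia.com/gpu, which is precisely why sharing a card between small workloads is not possible out of the box. A minimal sketch (the pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-job            # illustrative name
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.4.0-base-ubuntu22.04   # illustrative image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # whole GPUs only; fractional values are rejected
```

Even if the workload uses a few percent of the card's compute and memory, the entire GPU is pinned to this pod until it exits.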
To address these issues, third-party vendors have developed various GPU resource scheduling solutions. In the public cloud, vendors have launched their own vGPU scheduling offerings, such as Alibaba Cloud's cGPU and Tencent Cloud's qGPU. However, these solutions are typically locked to the vendor's platform and are not open source, imposing numerous restrictions on users' applications, particularly for state-owned enterprises and for the finance, energy, and education sectors, which have strong requirements for private, on-premises deployment.
To meet the urgent needs for resource sharing, resource isolation, and freedom from vendor lock-in, the heterogeneous AI computing virtualization middleware HAMi emerged. HAMi covers most scenario requirements, adapts to a variety of computing devices, and provides strong support for localization (domestic hardware) scenarios. HAMi has been included in the CNCF cloud-native landscape.
HAMi is a cloud-native heterogeneous AI computing virtualization middleware for K8s. It is compatible with the resource names used by NVIDIA's device plugin and with the native K8s scheduler, and it supports a wide range of computing devices. By integrating different vendors' container runtimes and device plugins and managing them at a higher level, HAMi smooths out scheduling differences across devices and provides unified scheduling. In addition, HAMi's self-developed HAMi-core enables fine-grained GPU partitioning of both compute and memory, as the sketch below shows.
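As a sketch of that fine-grained partitioning, the pod below requests a slice of a single card using the extended resource names documented by the HAMi project (nvidia.com/gpumem for device memory in MB, nvidia.com/gpucores for a percentage of compute); exact names and units should be checked against the HAMi version in use:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hami-vgpu-job        # illustrative name
spec:
  containers:
    - name: cuda-container
      image: nvidia/cuda:12.4.0-base-ubuntu22.04   # illustrative image tag
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpu: 1        # number of vGPUs requested
          nvidia.com/gpumem: 3000  # ~3 GB device memory cap, enforced by HAMi-core
          nvidia.com/gpucores: 30  # ~30% of the card's compute
```

Because the memory and compute caps are enforced inside the container by HAMi-core, several such pods can safely share one physical GPU, which is what recovers utilization in the small-model scenario above.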
Recently, HAMi has made significant functional progress, and RiseUnion and 4Paradigm have officially signed a strategic partnership agreement, jointly launching the enterprise-level AI computing pooling platform Rise VAST (Virtualized AI Computing Scalability Technology, the HAMi Enterprise Edition), further strengthening the management of heterogeneous computational resources.
By unifying compute cluster management, resource sharing, on-demand allocation, and rapid scheduling, Rise VAST fully unleashes the potential of heterogeneous compute resources, accelerating the modernization and intelligent transformation of AI infrastructure.
The current computational environment is dominated by NVIDIA GPUs, but devices from other vendors are gaining traction. Although the major vendors provide Kubernetes scheduling support, their solutions often lack fine-grained scheduling capabilities, leading to suboptimal resource utilization. HAMi integrates the vendors' open-source solutions and layers finer-grained resource sharing and isolation on top, supporting unified management and scheduling of diverse computational resources.
To learn more about RiseUnion's GPU virtualization and computing power management solutions, contact us at contact@riseunion.io.