Rise CAMP: AI Compute Scheduling Platform
Fine-grained Slicing · Precise Scheduling: 4-way intelligent scheduling, making every byte of VRAM count
Platform Overview
GPU cluster utilization boost
Intelligent scheduling strategies
Domestic chip vendors supported
GPUs under management
Fine-grained Slicing
vGPU Fine-grained Slicing
Fine-grained compute and VRAM partitioning lets multiple tasks share a single physical GPU. Compute and VRAM overcommit allow 200+ small models to be loaded on demand, boosting GPU utilization from 30% to 70%+.
Domestic Chip Dynamic Partitioning
Breaks through vendor partitioning limits on Ascend (fixed 1/2 and 1/4 card) and KunlunXin (fixed 24/48/96 GB). Intelligent dynamic allocation without restarts, replacing complex manual configuration with one-click deployment.
VRAM Isolation & Alignment
Strict VRAM boundary checks prevent the out-of-bounds access that degrades performance or crashes neighboring tasks. Automatic alignment to valid partition specs, inter-container isolation, and real-time VRAM monitoring.
Multi-Model Co-location
Agent-era multi-model deployment: precisely partition a 7B router, 14B summarizer, and 8B embedding model onto one 80G GPU (20G + 30G + 30G) with hard isolation.
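The partition above can be expressed as ordinary Kubernetes resource requests. The sketch below shows one of the three pods; the resource names (`nvidia.com/gpu`, `nvidia.com/gpumem`) follow HAMi-style vGPU conventions and are illustrative assumptions, not CAMP's actual API, and the image path is a placeholder.

```yaml
# Hypothetical vGPU pod spec for the 7B router (20 GiB of an 80 GiB GPU).
# Resource names are HAMi-style assumptions; the summarizer and embedding
# pods would request 30 GiB each the same way.
apiVersion: v1
kind: Pod
metadata:
  name: router-7b
spec:
  containers:
  - name: router
    image: registry.example.com/llm/router-7b:latest  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1          # one vGPU slice
        nvidia.com/gpumem: 20480   # 20 GiB VRAM hard limit (MiB)
```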
Speculative Decoding Foundation
Deploy a 7B draft model and a 72B target model on vGPU slices of the same physical node, leveraging shared memory for ultra-fast data exchange without dedicating full GPUs to either model.
K8s Standardization
Exposes GPU resources as standard K8s countable resources, just like CPU and memory, enabling Volcano and other advanced schedulers to perform complex bin-packing in which every byte of VRAM is precisely utilized.
Precise Scheduling
Four scheduling strategies optimize compute allocation across priority, topology, load, and resource dimensions — flexibly combined per scenario for maximum utilization
Priority-Aware Scheduling
Distinguishes online inference (high priority) from offline training (low priority) via tidal co-location. Guarantees online services during peak, auto-fills with batch training off-peak — one cluster does the work of three.
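The tidal policy above can be sketched in a few lines: online inference claims GPUs first, and offline training back-fills whatever is left. This is a toy illustration of the idea, not CAMP's scheduler; the function name and job representation are invented for the example.

```python
def tidal_admit(total_gpus, online_demand, offline_queue):
    """Toy tidal co-location: online inference (high priority) gets GPUs
    first; offline training (low priority) back-fills the remainder."""
    online = min(online_demand, total_gpus)  # online is never starved
    spare = total_gpus - online
    admitted = offline_queue[:spare]         # back-fill batch jobs
    return online, admitted

# Peak hours: online demand fills the cluster, offline work waits.
print(tidal_admit(8, 8, ["train-a", "train-b"]))  # (8, [])
# Off-peak: spare GPUs are back-filled with batch training.
print(tidal_admit(8, 3, ["train-a", "train-b"]))  # (3, ['train-a', 'train-b'])
```

A real implementation would also preempt running offline jobs when online demand rises, which the checkpointing feature below makes safe.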
Topology-Aware Scheduling
Deep understanding of NVLink / HCCS / XPU Link interconnects. A Floyd-Warshall shortest-path algorithm computes precise communication costs, scheduling multi-GPU training onto physically adjacent cards so that thousand-GPU clusters deliver thousand-GPU performance.
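The communication-cost computation is classic Floyd-Warshall over the link graph. The sketch below uses invented link costs (NVLink cheap, PCIe hops expensive) purely to illustrate the idea.

```python
def floyd_costs(cost):
    """All-pairs cheapest communication cost between GPUs (Floyd-Warshall)."""
    n = len(cost)
    d = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

INF = float("inf")
# Illustrative costs: GPUs 0 and 1 share a fast link (cost 1); GPU 2 is
# reachable only via GPU 0 over a slower path (cost 10).
links = [
    [0,   1,   10],
    [1,   0,   INF],
    [10,  INF, 0],
]
d = floyd_costs(links)
print(d[1][2])  # 11: GPU 1 -> GPU 0 -> GPU 2 (1 + 10)
```

Given the all-pairs matrix, the scheduler can place a multi-GPU job on the set of cards with the lowest total pairwise cost.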
Load-Aware Scheduling
Combines Binpack (dense packing) and Spread (even distribution) strategies across two dimensions, node level and GPU level: Spread for inference load balancing and high availability, Binpack for maximum utilization in multi-GPU training, eliminating VRAM fragmentation.
Resource-Aware Scheduling
Distinguishes allocation ratio from actual usage ratio, breaking the "allocation = utilization" illusion. Multi-dimensional resource awareness and VRAM over-subscription schedules by real usage — boosting dev environment utilization 3-5x.
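The allocation-versus-usage distinction can be made concrete: a node can be 95% allocated yet nearly idle, because dev notebooks reserve VRAM they rarely touch. A toy sketch of usage-based placement, with invented node data and an assumed 90% safety cap:

```python
def pick_by_real_usage(nodes, cap=0.9):
    """Choose the node with the lowest *actual* GPU utilization,
    ignoring the often-misleading allocation ratio."""
    eligible = [n for n in nodes if n["used"] < cap]  # assumed safety cap
    return min(eligible, key=lambda n: n["used"])["name"] if eligible else None

nodes = [
    {"name": "node-a", "allocated": 0.95, "used": 0.20},  # reserved, mostly idle
    {"name": "node-b", "allocated": 0.40, "used": 0.85},  # lightly allocated, busy
]
# Allocation-based scheduling would reject node-a; usage-based prefers it.
print(pick_by_real_usage(nodes))  # node-a
```

Combined with VRAM over-subscription, this lets far more dev workloads land on the same hardware than allocation figures alone would permit.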
Developer Productivity
Ready-to-use Dev Environment
Pre-installed PyTorch/TensorFlow/Paddle environments in Jupyter and VS Code, with SSH access and native TensorBoard integration. Instant environment startup, no more tedious configuration.
Distributed Training
One-click multi-node multi-GPU distributed training with PyTorch, TensorFlow, MPI, and DeepSpeed. Built-in TensorBoard for visual training progress tracking.
Multi-tenant Resource Isolation
Four-tier RBAC (platform admin, tenant owner, project admin, project member) with team and project-based resource quotas. Flexible shared and dedicated pool combinations.
Checkpointing & Auto-recovery
Automatic checkpoint saving for training jobs. Faulty nodes auto-isolated and workloads rescheduled, minimizing training time lost to hardware failures.
Image Registry & Storage
Built-in image registry with base images, custom images, and external registry support. Public and custom storage configuration with data persistence guarantees.
Multi-cluster Management
Unified management across geographies and architectures (x86/ARM) with multiple K8s clusters. LAN-based inter-cluster coordination with edge node vGPU support.
Use Cases
Heterogeneous GPU Resource Pool
Unified management of NVIDIA H20, Ascend 910B, KunlunXin P800 multi-architecture clusters. A state-owned bank built a heterogeneous pool with CAMP, managing 600+ servers with 50%+ utilization improvement.
Inference & Agent Co-location
Mix online inference and offline training on the same cluster via vGPU slicing and priority scheduling. A telecom provider runs 500+ model services across 100+ servers at 70%+ GPU utilization.
Multi-tenant AI Dev Platform
Unified development environments and compute resources for multiple R&D teams. A financial institution deploys risk, marketing, and customer service AI applications supporting hundreds of stable model services.
Cross-region Multi-cluster Scheduling
Unified management across multiple data centers (e.g., Beijing, Inner Mongolia) with 100G interconnect. A manufacturing enterprise achieved 60%+ utilization improvement through unified local and remote GPU management.