Rise ModelX: Unified Training & Inference AI Platform
AI as a Service: unified training & inference, boosting GPU cluster utilization from 30% to 70%
Product Overview
Core Features
Unified Training & Inference
Training and inference on a single platform, from data processing to inference deployment. Multi-node distributed training with checkpointing, one-click publishing of trained models as inference services, and no cross-platform migration required.
Model Hub & One-click Deploy
Built-in model marketplace with one-click import from ModelScope, HuggingFace, or local paths. Bind deploy templates and launch inference services with vLLM, vLLM Ascend, SGLang, MindIE engines.
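A HuggingFace import ultimately pulls model weights into platform storage; a minimal sketch of that step using the real huggingface_hub client (the repo ID and destination path are illustrative placeholders, not ModelX defaults):

```python
from huggingface_hub import snapshot_download

# Download a model repo into local/platform storage. The repo ID and
# destination path are illustrative placeholders, not ModelX defaults.
snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",
    local_dir="/models/qwen2.5-7b-instruct",
)
```

Once the weights land in a registered path, binding a deploy template points one of the listed engines (e.g. vLLM) at that directory.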
Elastic Scaling & Resource Reclamation
CronScale (scheduled) and HPA (metric-driven) dual scaling. Three-tier GPU reclamation (platform/tenant/project). Idle VRAM is auto-reclaimed back to the pool.
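For the metric-driven half, the standard Kubernetes HPA rule sizes replica count from the ratio of observed to target metric; a minimal sketch of that calculation (CronScale's scheduled half is configuration rather than code):

```python
import math

def desired_replicas(current: int, observed: float, target: float) -> int:
    """Standard Kubernetes HPA rule: desired = ceil(current * observed / target).
    Units just need to match, e.g. GPU utilization in percent."""
    return math.ceil(current * observed / target)

# 4 replicas observing 90% GPU utilization against a 60% target -> 6 replicas.
print(desired_replicas(4, 90, 60))  # 6
```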
Playground & OpenWebUI
Built-in text chat (streaming SSE) and image generation Playground. OpenWebUI integration for Web Chat interface. Side-by-side multi-model comparison for rapid evaluation.
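Because deployed services speak the OpenAI-compatible API, the Playground's streaming chat can be reproduced with the standard openai client; the base URL, key, and model name below are placeholders:

```python
from openai import OpenAI

# Endpoint, API key, and model name are placeholders for a deployed service.
client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

stream = client.chat.completions.create(
    model="qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Explain SSE in one sentence."}],
    stream=True,  # tokens arrive incrementally as server-sent events (SSE)
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```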
Full Model Development Pipeline
Multi-version Dataset Management
Multi-modal datasets (text, image, audio) with version control, online editing (JSON/JSONL/CSV), file preview. Cleaning and augmentation outputs auto-create new versions for reproducibility.
Data Cleaning (Data-Juicer)
Integrated Data-Juicer engine for text cleaning, filtering, deduplication, privacy protection, and format standardization. Three-step workflow: select dataset → configure rules → submit. Results auto-create new dataset version.
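The plain-Python sketch below illustrates the kinds of rules such a cleaning job chains together (exact dedup, length filtering, PII masking); it mimics the concepts only and is not the Data-Juicer API:

```python
import re

def clean(records: list[str]) -> list[str]:
    """Illustrative cleaning pass: exact dedup, length filter, PII masking.
    Conceptual only; not the Data-Juicer operator API."""
    seen, out = set(), []
    for text in records:
        text = text.strip()
        if len(text) < 10 or text in seen:  # length filter + exact dedup
            continue
        seen.add(text)
        # Crude email masking as a stand-in for privacy-protection operators.
        out.append(re.sub(r"\S+@\S+", "[EMAIL]", text))
    return out
```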
Data Augmentation (LLM-powered)
Batch: configure an LLM endpoint to augment datasets at scale, auto-generating new versions. Interactive: real-time preview with 1-20x augmentation per sample and a custom System Prompt.
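In batch mode the augmenter amounts to one LLM call per variant, steered by the System Prompt; a hedged sketch against an OpenAI-compatible endpoint (URL, key, and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

def augment(sample: str, n: int = 3,
            system: str = "Paraphrase the input; preserve its meaning.") -> list[str]:
    """Generate n augmented variants of one sample (n maps to the 1-20x setting)."""
    variants = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="qwen2.5-7b-instruct",  # placeholder model name
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": sample}],
            temperature=0.9,  # higher temperature for diversity across variants
        )
        variants.append(resp.choices[0].message.content)
    return variants
```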
Fine-tuning (Multi-framework, Multi-stage)
LlamaFactory, Unsloth, Axolotl frameworks. Full alignment coverage: SFT, DPO, KTO, RM, PPO, GRPO. LoRA / QLoRA / Full fine-tuning methods with visual hyperparameter configuration.
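What the visual form ultimately produces is a training configuration; the dict below shows representative SFT + LoRA settings (key names follow common LlamaFactory-style configs but are illustrative, not authoritative):

```python
# Representative SFT + LoRA hyperparameters. Key names follow common
# LlamaFactory-style configs but are illustrative, not authoritative.
train_config = {
    "stage": "sft",                       # or: dpo, kto, rm, ppo, grpo
    "model_name_or_path": "Qwen/Qwen2.5-7B-Instruct",
    "finetuning_type": "lora",            # lora / qlora / full
    "lora_rank": 8,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "dataset": "my_sft_dataset_v2",       # a platform dataset version
}
```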
Evaluation (Auto + LLM-as-Judge)
Auto mode: LlamaFactory built-in metrics (MMLU, C-Eval, GSM8K). LLM-as-Judge: strong model scores outputs on accuracy, fluency, safety. Auto-trigger after fine-tuning completion.
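LLM-as-Judge reduces to prompting a strong model to grade each output against a rubric; a minimal sketch (endpoint, judge model name, and rubric wording are assumptions):

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

RUBRIC = ("Score the answer from 1-5 on accuracy, fluency, and safety. "
          'Reply as JSON: {"accuracy": n, "fluency": n, "safety": n}.')

def judge(question: str, answer: str) -> dict:
    """Grade one candidate answer with a strong judge model (name is a placeholder)."""
    resp = client.chat.completions.create(
        model="judge-model",
        messages=[{"role": "system", "content": RUBRIC},
                  {"role": "user", "content": f"Q: {question}\nA: {answer}"}],
        response_format={"type": "json_object"},  # force parseable JSON scores
    )
    return json.loads(resp.choices[0].message.content)
```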
Model Comparison & Export
Launch temporary inference after fine-tuning for A/B comparison. One-click LoRA Merge export with auto-archiving, multi-version comparison, and best model tagging.
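The LoRA Merge step itself is standard PEFT: load the adapter onto the base model and fold the deltas into the weights. A minimal sketch with the real transformers/peft APIs (paths are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Paths are placeholders. merge_and_unload() folds the LoRA deltas into
# the base weights, yielding a standalone model ready for deployment.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base, "/checkpoints/my-lora-adapter")
merged = model.merge_and_unload()

merged.save_pretrained("/models/my-model-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct").save_pretrained(
    "/models/my-model-merged")
```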
Product Advantages
AI FinOps Cost Governance
GPU time + Token dual billing. Top-N analysis (by system/model/workspace), timeline aggregation, API-KEY analytics, cost anomaly alerts.
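Dual billing means every service accrues two meters, GPU time and tokens; a toy sketch of how one charge combines them (all rates are invented for illustration):

```python
def charge(gpu_hours: float, prompt_tokens: int, completion_tokens: int) -> float:
    """Toy dual-metering bill: GPU time plus token usage.
    All rates are invented for illustration only."""
    GPU_HOUR_RATE = 2.50              # currency units per GPU-hour
    PROMPT_RATE = 0.50 / 1_000_000    # per prompt token
    COMPLETION_RATE = 1.50 / 1_000_000
    return (gpu_hours * GPU_HOUR_RATE
            + prompt_tokens * PROMPT_RATE
            + completion_tokens * COMPLETION_RATE)

# 12 GPU-hours + 4M prompt + 1M completion tokens -> 30 + 2 + 1.5 = 33.5
print(charge(gpu_hours=12, prompt_tokens=4_000_000, completion_tokens=1_000_000))
```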
Canary Release & Versioning
Rolling updates, canary releases, one-click rollback. Multi-version parallel serving with gradual traffic shifting for zero-risk upgrades.
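Gradual traffic shifting is, at its core, weighted routing across parallel versions; a minimal sketch (version names and weights are illustrative):

```python
import random

def pick_version(weights: dict[str, float]) -> str:
    """Weighted random routing across parallel model versions."""
    versions = list(weights)
    return random.choices(versions, weights=[weights[v] for v in versions])[0]

# Start the canary at 5% of traffic, then shift weight toward chat-v2 as
# its metrics stay healthy; rollback is just restoring the old weights.
routing = {"chat-v1": 0.95, "chat-v2": 0.05}
print(pick_version(routing))
```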
Seamless Heterogeneous Adaptation
Masks NVIDIA, Ascend, and Hygon chip differences. vLLM Ascend and MindIE native support. Auto-adapts inference engines to heterogeneous hardware.
Multi-tenant Isolation
Workspace / Project two-tier isolation. Models, images, datasets scoped by tenant visibility. Resource quotas allocated per team and project.
Dev Environment
Pre-installed Jupyter, VSCode, CloudShell, SSH environments. Container snapshot save/restore. Native TensorBoard integration.
Out-of-the-Box
Built-in DeepSeek, Qwen, Kimi model images. Playground + OpenWebUI for instant experience. Publish model services in minutes.
Inference Engines & Model Monitoring
Integrated mainstream inference engines with full-spectrum monitoring from model performance to resource consumption, with version management and one-click rollback
Inference Engine Integration
One-click inference deployment with auto-adaptation for NVIDIA / Ascend heterogeneous hardware. Canary releases and multi-version parallel serving.
Model Performance Metrics
Resource Consumption
Real-time tracking of compute utilization, VRAM usage, CPU/memory consumption, and network I/O per model service, down to individual inference instance granularity.
Model Version Management
Complete model version lifecycle: release tracking, upgrade history, resource config comparison, and one-click rollback to any previous version for stable production serving.
AI Gateway
Unified API Standard
OpenAI-compatible API. WasmPlugin multi-engine adaptation, McpBridge DNS auto-registration. HTTP/HTTPS dual-port serving.
Smart Routing & Failover
Virtual ModelName (switch channels transparently), multi-channel load balancing with primary/backup strategies, custom routing by context length/headers/content. Auto-failover with unchanged API parameters.
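Custom routing rules are predicates over the request; the sketch below resolves one virtual model name to a concrete channel by header and context length (channel names and the 32k cutoff are assumptions):

```python
def route(model: str, prompt_tokens: int, headers: dict[str, str]) -> str:
    """Resolve a virtual model name to a backend channel.
    Channel names and the 32k cutoff are illustrative assumptions."""
    if model == "prod-chat":
        if headers.get("x-priority") == "batch":
            return "offpeak-channel"
        if prompt_tokens > 32_000:
            return "longctx-channel"  # long-context-capable backend
        return "primary-channel"      # with a backup channel on failover
    return "primary-channel"

print(route("prod-chat", prompt_tokens=50_000, headers={}))  # longctx-channel
```

Clients keep calling "prod-chat" with unchanged API parameters; only the name-to-channel mapping changes.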
Security & Key Management
API Key generation and management, multi-key multi-model permissions, admin global control. Token auth, IP lists, PII interception, sensitive word filtering, time-scheduled access policies.
Protocol Translation & Compatibility
OpenAI compatible + voice/image/video non-standard APIs. Function Call, JSON Mode support. Non-standard interface passthrough with field mapping. Native MCP protocol conversion.
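JSON Mode, for example, passes through the gateway as a standard OpenAI-style request parameter; a minimal client-side example (endpoint and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",              # placeholder model name
    messages=[{"role": "user",
               "content": "Return the capital of France as a JSON object."}],
    response_format={"type": "json_object"},  # standard JSON Mode parameter
)
print(resp.choices[0].message.content)        # e.g. {"capital": "Paris"}
```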
Batch Processing & Caching
Batch API with task management, time-scheduled GPU scheduling (off-peak/overnight). Prefill/Prefix cache configuration for throughput optimization.
Full-chain Observability
Per-request full-chain logging (input/output/tokens/latency per node), streaming content and merged view. API link health monitoring with auto-alerting. ai-statistics plugin for usage collection.
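A per-request record in such a pipeline carries at least the hop, token counts, and latency; a sketch of the shape (field names are assumptions, not the gateway's actual schema):

```python
from dataclasses import dataclass

@dataclass
class RequestLog:
    """Illustrative per-request, per-node log record. Field names are
    assumptions, not the gateway's actual schema."""
    request_id: str
    node: str                 # which hop in the chain emitted this record
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float
    streamed: bool            # streaming responses also keep a merged view
```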
AI Gateway Architecture
OpenAI Compatible · MCP Native · Multi-model Routing · HTTP/HTTPS/WebSocket · Stream/Non-stream
Architecture layers: Access Layer → Plugin Extensions → Model Layer
Use Cases
Unified Train & Serve
Single platform from training to serving. Evaluation auto-triggers after fine-tuning; LoRA Merge export and deployment follow directly after confirmation.
AI Service Foundation for Agent Era
AI Gateway MCP protocol conversion and smart routing connect enterprise ERP and CRM systems. Virtual ModelName enables transparent channel switching.
High-Concurrency Inference
Gateway + CronScale + HPA elastic scaling for consumer-facing, high-traffic services. Multi-channel load balancing with auto-failover for zero downtime.
Vertical Industry Model Customization
SFT instruction tuning or DPO/PPO alignment on industry data. LlamaFactory + Unsloth + Axolotl frameworks. Data-Juicer cleaning + LLM augmentation. LLM-as-Judge evaluation.
Frequently Asked Questions
01 How is ModelX AI Gateway different from LiteLLM, Kong AI Gateway, or Higress?
02 How do ModelX, Rise VAST, and Rise CAMP relate?
03 Which inference engines does ModelX support? What about domestic chip compatibility?
04 Which model protocols does the AI Gateway support beyond OpenAI?
05 Is token metering based on real tokenizers or gateway estimates? Can finance trust it?
06 How do I control which teams can reach public model APIs from a private network?
07 How quickly can a fine-tuned model go to production? Does it support canary release?