
Rise ModelX: Unified Training & Inference AI Platform

AI as a Service: unified training & inference, boosting GPU cluster utilization from 30% to 70%

Product Overview

Rise ModelX is for AI application teams who don't want to worry about GPU scheduling: they just want to train a model, deploy it, and pay by usage. Built on Rise VAST heterogeneous compute management and vGPU slicing, with Rise CAMP intelligent scheduling, it provides full-lifecycle MaaS spanning data processing, fine-tuning, and inference serving. A Higress-based AI Gateway (routing, rate limiting, MCP) and dual-dimension metering (resources + tokens) make AI compute operable like a cloud service.
Platform architecture at a glance:

Data Engineering: multi-version datasets · Data-Juicer cleaning · LLM-powered augmentation
Training & Fine-tuning: LlamaFactory / Unsloth · SFT · DPO · PPO · GRPO · LoRA / QLoRA / Full · compare & export
Evaluation & Model Hub: auto + LLM-as-Judge · ModelScope / HF import · INT8 / INT4 quantization · version & experiment tracking
AI Gateway: Higress smart routing · virtual ModelName · protocol / MCP translation · batch & cache
Inference: vLLM / SGLang / MindIE · CronScale + HPA · Playground / WebUI · GPU reclamation
FinOps: GPU-time billing · token metering · Top-N analytics · cost alerts
Foundation: Rise CAMP scheduling · Rise VAST vGPU engine · heterogeneous chip adaptation

Core Features

Unified Training & Inference

Training and inference on a single platform, from data processing to inference deployment. Multi-node distributed training with checkpointing, and one-click publishing of trained models as inference services, with no cross-platform migration.

Model Hub & One-click Deploy

Built-in model marketplace with one-click import from ModelScope, HuggingFace, or local paths. Bind deployment templates and launch inference services on the vLLM, vLLM Ascend, SGLang, or MindIE engines.

Elastic Scaling & Resource Reclamation

Dual scaling via CronScale (scheduled) and HPA (metric-driven). Three-tier GPU reclamation (platform / tenant / project); idle VRAM is automatically reclaimed back to the pool.

Playground & OpenWebUI

Built-in text chat (streaming SSE) and image generation Playground. OpenWebUI integration for Web Chat interface. Side-by-side multi-model comparison for rapid evaluation.
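The Playground speaks the same OpenAI-compatible streaming API that the gateway exposes, so any SSE-capable client works outside the UI too. A minimal sketch using the official openai Python SDK; the base URL, API key, and model name are placeholder assumptions, not fixed platform values:

```python
# Minimal streaming chat sketch against the ModelX AI Gateway.
# base_url, api_key, and model are placeholders; substitute the values
# issued by your ModelX deployment.
from openai import OpenAI

client = OpenAI(
    base_url="https://modelx.example.com/v1",  # hypothetical gateway endpoint
    api_key="sk-your-modelx-key",              # hypothetical API key
)

stream = client.chat.completions.create(
    model="qwen2.5-7b-instruct",               # any model published on the platform
    messages=[{"role": "user", "content": "Explain vGPU slicing in one paragraph."}],
    stream=True,                               # SSE streaming, as in the Playground
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```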

Full Model Development Pipeline

Multi-version Dataset Management

Multi-modal datasets (text, image, audio) with version control, online editing (JSON/JSONL/CSV), file preview. Cleaning and augmentation outputs auto-create new versions for reproducibility.

Data Cleaning (Data-Juicer)

Integrated Data-Juicer engine for text cleaning, filtering, deduplication, privacy protection, and format standardization. Three-step workflow: select dataset → configure rules → submit. Results automatically create a new dataset version; a standalone sketch of an equivalent recipe follows.
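For orientation, here is a hedged sketch of what an equivalent recipe might look like if run with Data-Juicer standalone. The operator names come from Data-Juicer's public op catalog, but the paths and parameter values are hypothetical; verify names and options against your installed Data-Juicer version:

```python
# Illustrative Data-Juicer recipe, written as YAML and run via the
# dj-process CLI. Operator names follow Data-Juicer's public op catalog;
# paths and parameter values are hypothetical examples.
import subprocess
import textwrap

recipe = textwrap.dedent("""\
    project_name: modelx-clean-demo
    dataset_path: ./raw_corpus.jsonl       # input dataset (hypothetical path)
    export_path: ./cleaned_corpus.jsonl    # cleaned output, becomes a new version
    process:
      - whitespace_normalization_mapper:   # standardize whitespace
      - clean_email_mapper:                # strip email addresses (privacy)
      - language_id_score_filter:          # drop low-confidence language samples
          lang: zh
          min_score: 0.8
      - document_simhash_deduplicator:     # near-duplicate removal
          tokenization: space
""")

with open("clean_recipe.yaml", "w") as f:
    f.write(recipe)

subprocess.run(["dj-process", "--config", "clean_recipe.yaml"], check=True)
```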

Data Augmentation (LLM-powered)

Batch mode: configure an LLM endpoint to augment datasets at scale, automatically generating new versions. Interactive mode: real-time preview with 1-20x augmentation per sample and a custom System Prompt. A sketch of the batch idea follows.
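A minimal sketch of batch augmentation against an OpenAI-compatible LLM endpoint, assuming a hypothetical 3x factor; the base URL, key, model name, and prompt are placeholders:

```python
# Sketch: LLM-powered data augmentation via an OpenAI-compatible endpoint.
# base_url, api_key, model, and the system prompt are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

SYSTEM_PROMPT = ("Rewrite the user's training sample as a paraphrase that "
                 "preserves its meaning and label.")

def augment(sample: str, factor: int = 3) -> list[str]:
    """Generate `factor` paraphrases of one sample (the UI allows 1-20x)."""
    variants = []
    for _ in range(factor):
        resp = client.chat.completions.create(
            model="qwen2.5-72b-instruct",  # hypothetical augmentation model
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": sample},
            ],
            temperature=1.0,               # encourage diversity across variants
        )
        variants.append(resp.choices[0].message.content)
    return variants

with open("train.jsonl") as src, open("augmented.jsonl", "w") as dst:
    for line in src:
        sample = json.loads(line)["text"]
        for variant in augment(sample):
            dst.write(json.dumps({"text": variant}, ensure_ascii=False) + "\n")
```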

Fine-tuning (Multi-framework, Multi-stage)

LlamaFactory, Unsloth, Axolotl frameworks. Full alignment coverage: SFT, DPO, KTO, RM, PPO, GRPO. LoRA / QLoRA / Full fine-tuning methods with visual hyperparameter configuration.
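For orientation, a hedged sketch of a LoRA SFT job expressed in LlamaFactory's YAML config format. The keys follow LlamaFactory's documented schema, but the model, dataset, and hyperparameter values are illustrative placeholders; ModelX's visual configurator produces the equivalent without hand-written YAML:

```python
# Sketch of a LoRA SFT job in LlamaFactory's YAML config format.
# Keys follow LlamaFactory's documented schema; model, dataset, and
# hyperparameter values are illustrative placeholders.
import subprocess
import textwrap

config = textwrap.dedent("""\
    model_name_or_path: Qwen/Qwen2.5-7B-Instruct
    stage: sft                      # also: dpo, kto, rm, ppo
    do_train: true
    finetuning_type: lora           # or full; QLoRA adds quantization_bit: 4
    lora_target: all
    dataset: my_domain_sft          # dataset registered in dataset_info.json
    template: qwen
    cutoff_len: 2048
    output_dir: ./saves/qwen2.5-7b-lora-sft
    per_device_train_batch_size: 2
    gradient_accumulation_steps: 8
    learning_rate: 1.0e-4
    num_train_epochs: 3.0
    bf16: true
""")

with open("sft_lora.yaml", "w") as f:
    f.write(config)

subprocess.run(["llamafactory-cli", "train", "sft_lora.yaml"], check=True)
```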

Evaluation (Auto + LLM-as-Judge)

Auto mode: LlamaFactory built-in benchmarks (MMLU, C-Eval, GSM8K). LLM-as-Judge mode: a stronger model scores outputs on accuracy, fluency, and safety, and can be triggered automatically when fine-tuning completes. A sketch of the judging pattern follows.
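The LLM-as-Judge pattern reduces to prompting a strong model for structured scores. A minimal sketch using JSON Mode through the OpenAI-compatible API; the judge model name and rubric wording are assumptions:

```python
# Sketch: LLM-as-Judge scoring on accuracy, fluency, and safety.
# The judge model name is a placeholder; JSON Mode keeps output parseable.
import json
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

RUBRIC = (
    "Score the candidate answer from 1-10 on accuracy, fluency, and safety. "
    'Reply as JSON: {"accuracy": n, "fluency": n, "safety": n, "rationale": "..."}'
)

def judge(question: str, answer: str) -> dict:
    resp = client.chat.completions.create(
        model="deepseek-v3",  # hypothetical strong judge model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user",
             "content": f"Question:\n{question}\n\nCandidate answer:\n{answer}"},
        ],
        response_format={"type": "json_object"},  # JSON Mode
    )
    return json.loads(resp.choices[0].message.content)

print(judge("What is 2 + 2?", "4"))
```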

Model Comparison & Export

Launch temporary inference after fine-tuning for A/B comparison. One-click LoRA Merge export with auto-archiving, multi-version comparison, and best model tagging.

Product Advantages

AI FinOps Cost Governance

GPU-time + token dual billing. Top-N analysis (by system / model / workspace), timeline aggregation, API-key analytics, and cost anomaly alerts.

Canary Release & Versioning

Rolling updates, canary releases, one-click rollback. Multi-version parallel serving with gradual traffic shifting for zero-risk upgrades.

Seamless Heterogeneous Adaptation

Masks NVIDIA, Ascend, Hygon chip differences. vLLM Ascend and MindIE native support. Auto-adapt inference engines to heterogeneous hardware.

Multi-tenant Isolation

Workspace / Project two-tier isolation. Models, images, datasets scoped by tenant visibility. Resource quotas allocated per team and project.

Dev Environment

Pre-installed Jupyter, VSCode, CloudShell, SSH environments. Container snapshot save/restore. Native TensorBoard integration.

Out-of-the-Box

Built-in DeepSeek, Qwen, Kimi model images. Playground + OpenWebUI for instant experience. Publish model services in minutes.

Inference Engines & Model Monitoring

Integrated mainstream inference engines with full-spectrum monitoring, from model performance to resource consumption, plus version management and one-click rollback.

Inference Engine Integration

vLLM · SGLang · MindIE

One-click inference deployment with auto-adaptation for NVIDIA / Ascend heterogeneous hardware. Canary releases and multi-version parallel serving.

Model Performance Metrics

E2E Latency (ms)
Time to First Token, TTFT (ms)
Input / Output Token Throughput (tokens/s)
Total Request / Response Tokens (count)

Resource Consumption

Real-time tracking of compute utilization, VRAM usage, CPU/memory consumption, and network I/O per model service, down to individual inference instance granularity.

Model Version Management

Complete model version lifecycle: release tracking, upgrade history, resource config comparison, and one-click rollback to any previous version for stable production serving.

AI Gateway

Unified API Standard

OpenAI-compatible API. WasmPlugin multi-engine adaptation, McpBridge DNS auto-registration. HTTP/HTTPS dual-port serving.
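Because the gateway is OpenAI-compatible, existing clients migrate by swapping the base URL. A minimal sketch; the URL and key are placeholders, and enterprise-large is the virtual ModelName style described below:

```python
# Drop-in migration sketch: point any OpenAI SDK client at the gateway.
# "enterprise-large" is a virtual ModelName, so the backend channel can be
# swapped (local vLLM, cloud vendor, ...) without touching application code.
# URL and key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://modelx.example.com/v1",
    api_key="sk-your-modelx-key",
)

resp = client.chat.completions.create(
    model="enterprise-large",  # virtual ModelName resolved by smart routing
    messages=[{"role": "user", "content": "Summarize this quarter's risk report."}],
)
print(resp.choices[0].message.content)
```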

Smart Routing & Failover

Virtual ModelName (switch channels transparently), multi-channel load balancing with primary/backup strategies, custom routing by context length/headers/content. Auto-failover with unchanged API parameters.

Security & Key Management

API Key generation and management, multi-key multi-model permissions, admin global control. Token auth, IP lists, PII interception, sensitive word filtering, time-scheduled access policies.

Protocol Translation & Compatibility

OpenAI compatible + voice/image/video non-standard APIs. Function Call, JSON Mode support. Non-standard interface passthrough with field mapping. Native MCP protocol conversion.
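Function Call and JSON Mode pass through in standard OpenAI form. A sketch of a tool call through the gateway; the tool schema and model name are illustrative placeholders:

```python
# Sketch: Function Call through the gateway in standard OpenAI form.
# The tool schema and model name are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

tools = [{
    "type": "function",
    "function": {
        "name": "get_customer",  # hypothetical CRM lookup tool
        "description": "Look up a customer record by name",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

resp = client.chat.completions.create(
    model="enterprise-large",
    messages=[{"role": "user", "content": "Pull up the record for Acme Corp."}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```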

Batch Processing & Caching

Batch API with task management and time-windowed GPU scheduling (off-peak / overnight). Prefill / Prefix cache configuration for throughput optimization. A sketch of a batch submission follows.
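Assuming the Batch API mirrors the OpenAI batch surface (an assumption; confirm against the ModelX API reference), a submission would look roughly like this:

```python
# Sketch of batch submission, ASSUMING an OpenAI-style /v1/files + /v1/batches
# surface; confirm the exact endpoints against the ModelX API reference.
from openai import OpenAI

client = OpenAI(base_url="https://modelx.example.com/v1", api_key="sk-...")

# batch_input.jsonl holds one request per line in the OpenAI batch format, e.g.
# {"custom_id": "r1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "enterprise-large", "messages": [...]}}
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"),
                                 purpose="batch")

job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # pairs naturally with off-peak GPU windows
)
print(job.id, job.status)
```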

Full-chain Observability

Per-request full-chain logging (input/output/tokens/latency per node), streaming content and merged view. API link health monitoring with auto-alerting. ai-statistics plugin for usage collection.

AI Gateway Architecture

OpenAI Compatible · MCP Native · Multi-model Routing · HTTP/HTTPS/WebSocket · Stream/Non-stream

Access Layer: AI Agents (Copilot · Agent) · RAG Systems (retrieval-augmented) · Applications (API / SDK / MCP) · Batch Jobs (off-peak scheduling)

Smart Routing: custom routing by context length / headers / content · multi-channel load balancing, failover, and degradation · auto-switch with zero business impact

Protocol Translation: OpenAI compatible · voice / image / video non-standard APIs · Function Call / JSON Mode · passthrough & field mapping

Observability: full-chain request logging · streaming content & merged view · API link monitoring · Prefill / Prefix cache optimization

Gateway Capabilities: Virtual ModelName (switch channels seamlessly) · Multi-channel (local + cloud combined) · Key Management (multi-key · auth · quota) · Access Policy (schedule · rate limit) · HTTP/HTTPS (dual-port serving) · WebSocket (real-time bidirectional) · Batch (task management · off-peak) · Cache (Prefill · Prefix)

Plugin Extensions

API Key Auth (auth · global control) · Rate Limiting (circuit breaker) · Auto Failover (switch · alert) · Security (PII · content filtering) · Custom Extensions

Model Layer

DeepSeek (V3 / R1) · Qwen (Tongyi · Bailian) · Kimi (Moonshot · long context) · MiniMax (voice · multimodal) · Cloud Vendors (Volcano · SiliconFlow · Zhipu) · Custom (on-premise models)

Use Cases

Unified Train & Serve

A single platform from training to serving: evaluation auto-triggers after fine-tuning, and confirmed models export via LoRA Merge and deploy directly.

AI Service Foundation for Agent Era

AI Gateway MCP protocol conversion and smart routing connect enterprise ERP and CRM systems, with virtual ModelName enabling transparent channel switching.

High-Concurrency Inference

Gateway + CronScale + HPA elastic scaling for consumer-facing, high-traffic workloads. Multi-channel load balancing with auto-failover for zero downtime.

Vertical Industry Model Customization

SFT instruction tuning or DPO/PPO alignment on industry data. LlamaFactory + Unsloth + Axolotl frameworks. Data-Juicer cleaning + LLM augmentation. LLM-as-Judge evaluation.

Frequently Asked Questions

01 How is ModelX AI Gateway different from LiteLLM, Kong AI Gateway, or Higress?
ModelX AI Gateway is built on Higress, but the positioning is fundamentally different. LiteLLM and Kong AI Gateway are standalone gateways that solve protocol adaptation and routing only. ModelX AI Gateway integrates deeply with Rise VAST heterogeneous compute pooling and Rise CAMP intelligent scheduling, governing both token traffic and GPU compute from a single control plane. It delivers dual-axis FinOps (GPU-hours + tokens), per-team cost attribution, and budget circuit breakers — a complete "compute-as-a-service + tokens-as-governance" platform, not just an API gateway.
02 How do ModelX, Rise VAST, and Rise CAMP relate?
The three-layer architecture — VAST (resource management) + CAMP (scheduling) + ModelX (model serving) — stacks bottom-up: VAST handles compute pooling and virtualization, CAMP handles workload scheduling, ModelX handles model serving and token governance.
03 Which inference engines does ModelX support? What about domestic chip compatibility?
Native integration with vLLM, vLLM Ascend, SGLang, and MindIE, with WasmPlugin-based extensibility for additional engines. Chip-level compatibility certified across 10+ domestic accelerators including Huawei Ascend 910B, Cambricon, Hygon DCU, Kunlunxin, Metax, Moore Threads, and Iluvatar. Supports mixed NVIDIA + domestic scheduling, with applications consuming a unified OpenAI-compatible API regardless of underlying hardware.
04 Which model protocols does the AI Gateway support beyond OpenAI?
OpenAI-compatible Chat / Completions / Embeddings APIs are first-class, with native Function Call, JSON Mode, and streaming SSE. The gateway also provides passthrough and field mapping for non-standard voice, image, and video APIs, and ships native MCP (Model Context Protocol) translation — letting Agent frameworks invoke internal ERP, CRM, and other enterprise systems without custom glue code.
05 Is token metering based on real tokenizers or gateway estimates? Can finance trust it?
ModelX uses the actual model tokenizer for every request, accounting for input, output, and cache-hit portions separately. Numbers reconcile cleanly against vendor invoices. The gateway provides dual GPU-hour + token billing: self-hosted models meter by GPU time, public APIs meter by tokens, with both unified in a single FinOps dashboard. Multi-dimensional attribution by team, project, and API key feeds directly into enterprise finance systems.
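For intuition, tokenizer-exact counting amounts to running the served model's own tokenizer over input and output. A sketch with Hugging Face transformers, where the model name is a placeholder:

```python
# Sketch: tokenizer-exact token counting, the principle behind ModelX metering.
# The model name is a placeholder; use the tokenizer of the model actually served.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

def count_tokens(text: str) -> int:
    return len(tok.encode(text))

prompt = "Explain LoRA in two sentences."
completion = "LoRA adds low-rank adapter matrices to frozen weights..."
print("input:", count_tokens(prompt), "output:", count_tokens(completion))
```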
06 How do I control which teams can reach public model APIs from a private network?
The AI Gateway includes built-in ACL: every API key has an allowlist of model channels. Sensitive workloads (core trading, risk) can be restricted to private models only, while data platform and batch jobs can be granted controlled egress. All cross-zone requests are logged through the gateway audit trail, satisfying financial-grade network isolation and compliance requirements.
07 How quickly can a fine-tuned model go to production? Does it support canary release?
ModelX is a unified training and inference platform — fine-tuning jobs export via one-click LoRA merge and publish directly as inference services without cross-platform migration. Inference services support rolling updates, canary releases, and one-click rollback, with multiple versions running in parallel and traffic shifted by percentage. Combined with the gateway's virtual model name feature, applications call something like enterprise-large while backend model swaps remain completely transparent.
08 Does it support fully air-gapped deployment? Any phone-home licensing?
ModelX fully supports air-gapped private deployment, with all components running inside the customer perimeter and no external license validation, meeting strict compliance requirements for finance, government, and defense. Available as bare-metal, VM, or Kubernetes deployments, compatible with Kylin, UnionTech, and other domestic operating systems, and validated against MLPS Level 3 in multiple financial production environments.