Skip to main content

Rise Router: Unified Token Entry for Enterprise LLMs

One Endpoint · Policy-based Routing · Full-chain Governance — Local inference + external channels unified, Token billing & policies in one view

Product Overview

Rise Router is the unified entry for all enterprise LLM Token traffic: applications call a single OpenAI-compatible endpoint, and Router internally routes requests by policy to Rise ModelX (local inference running on Rise VAST + Rise CAMP-managed domestic GPU clusters) or to external channels (domestic LLMs, cloud MaaS, aggregators, foreign models, BYOC). Channel aggregation, key custody, unified internal + external Token billing, budget control, and compliance audit all happen on one governance plane.

Core Capabilities

Unified OpenAI-compatible Entry

Single OpenAI-compatible endpoint for applications, abstracting protocol differences between local inference and external channels; simultaneous HTTP/HTTPS dual-port serving.

Internal/External Smart Routing

Auto-routes by multi-dimensional signals across domestic LLMs, cloud MaaS, aggregators, and self-hosted open-source OpenAI-compatible gateways such as new-api, one-api, and sub2api; integrates with enterprise data-classification and LLM-guardrail components; virtual ModelName makes path switches transparent, with auto-failover plus multi-key / multi-region load balancing.

Centralized API Key & Policy Custody

Upstream vendor keys and ModelX internal credentials unified under custody, with tenant / project / user multi-dimensional authorization and access policies including key rotation, IP allowlists, and rate limits.

Dual-rail Token Metering (GPU + API)

Local GPU-hours + external Token billing on a single rail, with multi-dimensional attribution by tenant / project / business-line / API key on a unified FinOps dashboard.

Budget Control & Runaway Prevention

Tenant / business-line budget hard-limits trigger auto-downgrade or block; cost anomaly alerts, off-peak scheduling, and rate limits provide layered protection.

Egress Compliance Audit

Full-chain logs cover internal and external traffic — including PII interception, sensitive-word filtering, and multimodal egress audit; works with enterprise data-classification and LLM-guardrail policies to meet finance and state-enterprise compliance requirements.

Rise Router Architecture

Unified LLM Entry · Internal + External Routing · Channel Aggregation · Key Custody · Dual-track Metering · Compliance Audit

Access Layer

Via AI Gateway

AI Agents
Copilot · Autonomous Agent
RAG Systems
Retrieval Augmented
Applications
API / SDK / MCP
Batch Jobs
Off-peak Scheduling
Unified Entry
OpenAI-compatible endpoint · Multi-engine/multi-channel protocol adaptation · One entry for local ModelX + external managed channels · Auto endpoint registration via McpBridge
¥
Smart Routing
Multi-dimensional auto-routing, integrating with enterprise data-classification and LLM-guardrail components; virtual ModelName for transparent switching, with auto-failover
Governance
Dual-track GPU + Token metering · Multi-dimensional attribution · Budget hard limits · Compliance audit · PII + sensitive-word filtering · Centralized key + credential custody
Virtual ModelName
Seamless switch
Internal+External
Local first, external on demand
Key + Cred Custody
Upstream + internal
Price Routing
Cost-aware
Multimodal
Text·Img·Audio·Video
Attribution
Project/Business
Budget
Hard-limit
Rate Limit
Access Policy

Plugin Extensions

Health Check
Timeout control
Auto Failover
Degrade · Switch
Rate Limiting
Circuit · QPS/RPM
Security
PII · Filter
Custom Plugin
Extensions

Routing Targets

Routing Targets

▸ Internal

Rise ModelX
vLLM / SGLang / MindIE

▸ External

Domestic LLMs
XC-LLM / Zhipu / iFlytek / Kimi
Cloud MaaS
Bailian / Qianfan / TI / MA
Aggregators
SiliconFlow / Volcano
Foreign Models
OpenAI / Anthropic / Google
Custom
Customer-specified upstream

Product Value

One Endpoint Converges All LLM Integration

Application code no longer maintains two API sets, two key sets, and two metering systems for local inference vs external channels. Integration complexity drops from N paths to 1, shipping speed improves, and the governance surface shrinks.

Single-source-of-truth Internal + External Token Billing

Rise ModelX GPU-hour billing and external API Token billing render on the same FinOps dashboard with multi-dimensional attribution by tenant/project/business-line. Finance no longer distinguishes internal from external; budget hard-limits and cost anomaly alerts apply to all LLM traffic.

Elastic Split by Scenario

Steady-state and low-latency-critical workloads run on local Rise ModelX inference; peak or externally-capable workloads route through Router to external managed channels; sensitivity classification and tiering policies can be orchestrated jointly with enterprise data-classification / LLM-guardrail components, and when local capacity saturates Router automatically absorbs overflow, avoiding repeated hardware expansion.

Channels Router Can Route To

Channel Type Examples Best For
Local Inference (Rise ModelX) vLLM / SGLang / MindIE Steady traffic, low-latency guarantees; workloads where data must be inferred on enterprise domestic GPUs
Domestic LLMs DeepSeek / Qwen / XC-LLM / Zhipu / iFlytek / Kimi / MiniMax Compliance-first default; broad coverage of leading domestic vendors
Cloud MaaS Bailian / Qianfan / TI / MA Managed model services from domestic cloud vendors with clear commercial / compliance path
Aggregators SiliconFlow / Volcano Ark Multi-vendor aggregation entry for fast trials and switches
Self-hosted Open-source Gateways new-api / one-api / sub2api Smooth onboarding for enterprises with existing open-source aggregators, reusing keys and channel configs
Foreign Models OpenAI / Anthropic / Google For workloads that genuinely require foreign capability
Enterprise BYOC Customer-specified upstream Vault the customer's existing third-party API keys into Router for unified governance

Use Cases

Unified Routing by Sensitivity & Scenario

Router integrates with enterprise data-classification and LLM-guardrail components, routing sensitive requests to local Rise ModelX inference while other traffic egresses under cost / capability control; local saturation auto-overflows, failures auto-degrade; full egress audit trails meet regulatory requirements in finance, securities, and other high-compliance scenarios.

Unified Internal + External Token Billing

Local GPU-hours + external Token billing on the same FinOps dashboard, with multi-dimensional attribution by tenant / project / business-line and monthly statements auto-generated — finance no longer distinguishes internal from external.

Unified LLM Access for Agents

Agents / Copilots / RAG call local and external models through Router with full MCP / Function Call / JSON Mode / streaming SSE compatibility; rate limits + budget hard-limits prevent runaway; full multimodal coverage.

Multi-channel Canary & Model Selection

Attach multiple upstream channels to the same workload, shift by traffic percentage, and reconcile quality metrics against Token cost on one dashboard; supports dual-writes during domestic migration transitions, and lets business teams pick models based on real production data.

Deployment Workflow

1

Register Inference Backends

Local Rise ModelX auto-discovers; external channels upload API keys, set rate limits and health checks.

2

Configure Routing Policies

Define virtual ModelName and routing rules (sensitivity / cost / capability); canary and failover supported.

3

Bind Tenants & Budgets

Set Token budget caps, compliance tags, and access policies per tenant/project.

4

Unified API Calls & Billing

Applications call one endpoint; Router auto-routes, meters, audits, and emits dual-rail statements.

Frequently Asked Questions

01 How does Rise Router relate to Rise ModelX? Why aren't they merged into a single product?
Rise Router is the unified entry and governance plane for all enterprise LLM Token traffic; Rise ModelX is one of Router's inference backends (internal compute · training-inference integrated). Applications integrate only against Router's OpenAI-compatible endpoint; internally, Router dispatches requests to local ModelX or to external managed channels by policy. The split exists because ModelX manages compute (GPU pooling, scheduling, training, inference engines) and serves AI application teams and compute operations; Router manages traffic (unified entry, routing policies, key custody, dual-rail Token billing, compliance audit) and serves business integrators, finance, and compliance. Different concerns, different evolution cadences — shipped as two parallel standalone products but delivered together in customer deployments.
02 How do I guarantee sensitive data never leaves for external channels?
Through tenant / API-key / business-tag access control. Each tenant or API key can be bound to an allowlisted channel set: sensitive workloads such as core trading and risk are restricted to local Rise ModelX (domestic GPUs), while non-sensitive workloads get controlled egress. Every outbound request is forced through the Router audit log, combined with PII detection, sensitive-word filtering, schedules, and rate limits. Router also supports content-aware routing (customizable per customer) as an extra defense layer for sensitive data. Router's role is to govern traffic that must go out; sensitive workloads should stay on the local compute stack.
03 How is Router different from open-source proxy gateways like LiteLLM, OneAPI, or Portkey?
LiteLLM / OneAPI / Portkey are protocol proxy layers solving multi-upstream adaptation. Rise Router layers enterprise-grade Token governance on top: per tenant / project / business-line Token billing and budget hard-limits, multi-channel cost comparison and smart routing, centralized API key custody with rotation and revocation, full egress audit, PII / sensitive-word interception. The real differentiator isn't the gateway core (open-source implementations are similar) — it's that Rise Router governs internal enterprise GPU compute (via Rise ModelX) and external channels together, on a single governance plane with GPU-hour + Token dual-rail billing, scenario-based internal/external split, and unified audit. Xinference / vLLM / SGLang are inference engines — local backends managed by ModelX, not peer products to Router.
04 Which channels can Router route to?
Six channel types: local inference (vLLM / SGLang / MindIE instances managed by Rise ModelX), domestic LLMs (XC-LLM / Zhipu / iFlytek / Kimi), cloud MaaS (Bailian / Qianfan / TI / MA), aggregators (SiliconFlow / Volcano Ark), foreign models (OpenAI / Anthropic / Google — on-demand for workloads that genuinely need them), and enterprise BYOC (customer-specified upstream). Broad channel coverage, all accessed through the same OpenAI-compatible interface — switching paths requires no application code changes.
05 What's the security model for API key custody?
Upstream vendor API keys and ModelX internal credentials are vaulted on the Router governance plane — application code no longer holds any LLM credential, eliminating key sprawl and leak risk. Supports key rotation (periodic upstream credential refresh transparent to applications), key revocation (individual key instant invalidation), per-tenant/project authorization (teams see only their authorized channels and budgets), and access policies (schedules / IP allowlists / rate limits). Full egress audit logs meet finance and state-enterprise compliance requirements.
06 How accurate is Token metering? Can it plug into finance systems?
Router uses the actual upstream tokenizer to precisely meter prompt / completion tokens per request — not gateway estimates. Multi-dimensional attribution (tenant / project / business-line / API key), time-series aggregation, top-N usage, automatic monthly statements. The same FinOps dashboard renders local GPU-hours + external Tokens in a combined view (a core capability of Router as the unified entry), feeding enterprise finance systems directly by department.
07 What happens when a budget is exceeded? Will it break production?
Budget behavior is configurable: per-tenant or business-line hard-limits with an action choice on threshold — auto-downgrade (reroute to cheaper channels), alert without blocking (warn but let traffic through), or hard-block (reject new requests). Beyond budgets: cost anomaly alerts (usage spikes), schedules (off-peak / night dispatch), and rate limits — layered protection against a single tenant or runaway Agent draining the budget.
08 Does it support Agent / Function Call / multimodal scenarios?
Full OpenAI-compatible Chat / Completions / Embeddings / Function Call / JSON Mode / streaming SSE support — Agent frameworks can swap upstream models with zero code changes. Multimodal models are covered too: text, image, audio, and video models all flow through Router's unified entry, with Agents directly invoking image understanding, speech recognition, video generation, and more — all subject to the same Token metering, budget control, and compliance audit. Virtual ModelName lets Router auto-route across channels by policy (cost / latency / availability); combined with rate limits and budget hard-limits, autonomous Agents can't run away in external channels — Agent Token consumption rates far exceed human-triggered traffic, making budget control essential for external channel governance.