Rise Router: Unified Token Entry for Enterprise LLMs
One Endpoint · Policy-based Routing · Full-chain Governance — Local inference + external channels unified, Token billing & policies in one view
Product Overview
Core Capabilities
Unified OpenAI-compatible Entry
Single OpenAI-compatible endpoint for applications, abstracting protocol differences between local inference and external channels; simultaneous HTTP/HTTPS dual-port serving.
Internal/External Smart Routing
Auto-routes by multi-dimensional signals across domestic LLMs, cloud MaaS, aggregators, and self-hosted open-source OpenAI-compatible gateways such as new-api, one-api, and sub2api; integrates with enterprise data-classification and LLM-guardrail components; virtual ModelName makes path switches transparent, with auto-failover plus multi-key / multi-region load balancing.
Centralized API Key & Policy Custody
Upstream vendor keys and ModelX internal credentials unified under custody, with tenant / project / user multi-dimensional authorization and access policies including key rotation, IP allowlists, and rate limits.
Dual-rail Token Metering (GPU + API)
Local GPU-hours + external Token billing on a single rail, with multi-dimensional attribution by tenant / project / business-line / API key on a unified FinOps dashboard.
Budget Control & Runaway Prevention
Tenant / business-line budget hard-limits trigger auto-downgrade or block; cost anomaly alerts, off-peak scheduling, and rate limits provide layered protection.
Egress Compliance Audit
Full-chain logs cover internal and external traffic — including PII interception, sensitive-word filtering, and multimodal egress audit; works with enterprise data-classification and LLM-guardrail policies to meet finance and state-enterprise compliance requirements.
Rise Router Architecture
Unified LLM Entry · Internal + External Routing · Channel Aggregation · Key Custody · Dual-track Metering · Compliance Audit
Access Layer
Via AI Gateway
Plugin Extensions
·
·
·
·
Routing Targets
Routing Targets
▸ Internal
▸ External
Product Value
One Endpoint Converges All LLM Integration
Application code no longer maintains two API sets, two key sets, and two metering systems for local inference vs external channels. Integration complexity drops from N paths to 1, shipping speed improves, and the governance surface shrinks.
Single-source-of-truth Internal + External Token Billing
Rise ModelX GPU-hour billing and external API Token billing render on the same FinOps dashboard with multi-dimensional attribution by tenant/project/business-line. Finance no longer distinguishes internal from external; budget hard-limits and cost anomaly alerts apply to all LLM traffic.
Elastic Split by Scenario
Steady-state and low-latency-critical workloads run on local Rise ModelX inference; peak or externally-capable workloads route through Router to external managed channels; sensitivity classification and tiering policies can be orchestrated jointly with enterprise data-classification / LLM-guardrail components, and when local capacity saturates Router automatically absorbs overflow, avoiding repeated hardware expansion.
Channels Router Can Route To
| Channel Type | Examples | Best For |
|---|---|---|
| Local Inference (Rise ModelX) | vLLM / SGLang / MindIE | Steady traffic, low-latency guarantees; workloads where data must be inferred on enterprise domestic GPUs |
| Domestic LLMs | DeepSeek / Qwen / XC-LLM / Zhipu / iFlytek / Kimi / MiniMax | Compliance-first default; broad coverage of leading domestic vendors |
| Cloud MaaS | Bailian / Qianfan / TI / MA | Managed model services from domestic cloud vendors with clear commercial / compliance path |
| Aggregators | SiliconFlow / Volcano Ark | Multi-vendor aggregation entry for fast trials and switches |
| Self-hosted Open-source Gateways | new-api / one-api / sub2api | Smooth onboarding for enterprises with existing open-source aggregators, reusing keys and channel configs |
| Foreign Models | OpenAI / Anthropic / Google | For workloads that genuinely require foreign capability |
| Enterprise BYOC | Customer-specified upstream | Vault the customer's existing third-party API keys into Router for unified governance |
Use Cases
Unified Routing by Sensitivity & Scenario
Router integrates with enterprise data-classification and LLM-guardrail components, routing sensitive requests to local Rise ModelX inference while other traffic egresses under cost / capability control; local saturation auto-overflows, failures auto-degrade; full egress audit trails meet regulatory requirements in finance, securities, and other high-compliance scenarios.
Unified Internal + External Token Billing
Local GPU-hours + external Token billing on the same FinOps dashboard, with multi-dimensional attribution by tenant / project / business-line and monthly statements auto-generated — finance no longer distinguishes internal from external.
Unified LLM Access for Agents
Agents / Copilots / RAG call local and external models through Router with full MCP / Function Call / JSON Mode / streaming SSE compatibility; rate limits + budget hard-limits prevent runaway; full multimodal coverage.
Multi-channel Canary & Model Selection
Attach multiple upstream channels to the same workload, shift by traffic percentage, and reconcile quality metrics against Token cost on one dashboard; supports dual-writes during domestic migration transitions, and lets business teams pick models based on real production data.
Deployment Workflow
Register Inference Backends
Local Rise ModelX auto-discovers; external channels upload API keys, set rate limits and health checks.
Configure Routing Policies
Define virtual ModelName and routing rules (sensitivity / cost / capability); canary and failover supported.
Bind Tenants & Budgets
Set Token budget caps, compliance tags, and access policies per tenant/project.
Unified API Calls & Billing
Applications call one endpoint; Router auto-routes, meters, audits, and emits dual-rail statements.