Skip to main content

AI Gateway

Unified API, Smart Routing, Protocol Bridging — a single entry point for enterprise model serving

Product Overview

Enterprise-grade AI Gateway built on Higress, providing a unified OpenAI-compatible API entry point for all inference services. Multi-engine adaptation via WasmPlugin, automatic endpoint registration through McpBridge DNS, smart routing with virtual ModelName, and multi-channel failover — delivering secure, observable, and metered model service operations.

Core Capabilities

OpenAI-Compatible Unified API

WasmPlugin adapts multiple engines — vLLM, SGLang, MindIE — into a single OpenAI-compatible interface. McpBridge DNS auto-registers inference endpoints. HTTP/HTTPS dual-port serving. Callers never need to know the underlying engine.

Smart Routing & Failover

Virtual ModelName makes channel switching completely transparent to consumers. Multi-channel load balancing with proportional and primary-backup strategies. Custom routing rules by context length, request headers, or request content. Auto-failover to backup channels with unchanged API parameters.

MCP Protocol Bridge

McpBridge natively supports MCP (Model Context Protocol) conversion with DNS auto-registration of inference endpoints. Connects enterprise ERP, CRM, and business systems, providing standardized tool-call interfaces for the Agent era.

Token-Level Metering & Billing

The ai-statistics plugin meters prompt/completion tokens at fine granularity across tenant, project, and API Key dimensions. Top-N usage analysis (by system/model/workspace), timeline aggregation, and cost anomaly alerts for precise AI FinOps governance.

Full-Chain Observability

Per-request full-chain logging — input/output per node, token counts, latency — with streaming content replay and merged views. Real-time API health monitoring with auto-alerting. Prometheus metric collection and Grafana dashboard integration.

Security & Access Control

API Key generation and management with multi-key multi-model permission binding and admin global control. Token auth, IP allowlists/blocklists, PII interception, sensitive word filtering, and time-scheduled rate limiting for secure multi-tenant isolation.

Routing Strategies

Strategy Trigger Use Case
Virtual ModelName Request specifies a virtual model name Transparent channel switching, A/B testing
Context-Length Routing Request token count exceeds threshold Auto-dispatch long texts to large-context models
Header-Based Routing Custom header field match Multi-tenant traffic isolation, canary releases
Content-Based Routing Request body keyword/field match Route by business scenario to specialized models
Primary-Backup Primary channel failure or timeout High-availability disaster recovery, auto-failover
Proportional Weighted percentage distribution Multi-channel load balancing, gradual traffic shifting

Onboarding Flow

1

Register Model Endpoints

McpBridge DNS auto-discovers inference services, or manually register external model API endpoints

2

Configure Routing Rules

Set up virtual ModelName, load ratios, primary-backup strategies, and custom routing conditions

3

Serve Unified API

Consumers call the unified OpenAI-compatible API; the gateway auto-routes to the target engine

4

Monitor & Bill

Full-chain logging and token metering collected in real time; usage dashboards and cost alerts generated automatically

Back to Rise ModelX