AI Gateway
Unified API, Smart Routing, Protocol Bridging — a single entry point for enterprise model serving
Product Overview
Core Capabilities
OpenAI-Compatible Unified API
WasmPlugin adapts multiple engines — vLLM, SGLang, MindIE — into a single OpenAI-compatible interface. McpBridge DNS auto-registers inference endpoints. HTTP/HTTPS dual-port serving. Callers never need to know the underlying engine.
Smart Routing & Failover
Virtual ModelName makes channel switching completely transparent to consumers. Multi-channel load balancing with proportional and primary-backup strategies. Custom routing rules by context length, request headers, or request content. Auto-failover to backup channels with unchanged API parameters.
MCP Protocol Bridge
McpBridge natively supports MCP (Model Context Protocol) conversion with DNS auto-registration of inference endpoints. Connects enterprise ERP, CRM, and business systems, providing standardized tool-call interfaces for the Agent era.
Token-Level Metering & Billing
The ai-statistics plugin meters prompt/completion tokens at fine granularity across tenant, project, and API Key dimensions. Top-N usage analysis (by system/model/workspace), timeline aggregation, and cost anomaly alerts for precise AI FinOps governance.
Full-Chain Observability
Per-request full-chain logging — input/output per node, token counts, latency — with streaming content replay and merged views. Real-time API health monitoring with auto-alerting. Prometheus metric collection and Grafana dashboard integration.
Security & Access Control
API Key generation and management with multi-key multi-model permission binding and admin global control. Token auth, IP allowlists/blocklists, PII interception, sensitive word filtering, and time-scheduled rate limiting for secure multi-tenant isolation.
Routing Strategies
| Strategy | Trigger | Use Case |
|---|---|---|
| Virtual ModelName | Request specifies a virtual model name | Transparent channel switching, A/B testing |
| Context-Length Routing | Request token count exceeds threshold | Auto-dispatch long texts to large-context models |
| Header-Based Routing | Custom header field match | Multi-tenant traffic isolation, canary releases |
| Content-Based Routing | Request body keyword/field match | Route by business scenario to specialized models |
| Primary-Backup | Primary channel failure or timeout | High-availability disaster recovery, auto-failover |
| Proportional | Weighted percentage distribution | Multi-channel load balancing, gradual traffic shifting |
Onboarding Flow
Register Model Endpoints
McpBridge DNS auto-discovers inference services, or manually register external model API endpoints
Configure Routing Rules
Set up virtual ModelName, load ratios, primary-backup strategies, and custom routing conditions
Serve Unified API
Consumers call the unified OpenAI-compatible API; the gateway auto-routes to the target engine
Monitor & Bill
Full-chain logging and token metering collected in real time; usage dashboards and cost alerts generated automatically