Channel routing & failover
How ToRouter picks a channel for your request and transparently retries when an upstream fails.
When you call a model, ToRouter must pick one of the channels that can serve it. The selection rules are fixed (the only randomness is the final weighted pick), and the process is invisible to your client: you always see exactly one response.
How a channel is picked
For every request the gateway runs a short pipeline:
Eligibility. Collect every channel you are allowed to use that supports the requested model.
Priority filter. Among the eligible channels, keep only those with the lowest priority value (priority is a preference rank, so lower wins).
LRU tiebreaker. Among equal-priority candidates, prefer the least-recently-used ones.
Weighted random pick. If several candidates remain, pick one at random using the weights configured for those channels.
A channel with priority=10, weight=3 always beats priority=20, weight=10. Within a single priority bucket, higher weight means more traffic.
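The pipeline above can be sketched in a few lines of Python. This is only an illustration, not ToRouter's actual implementation: the `Channel` fields, the `pick_channel` function, and the `allowed` set are hypothetical names, and the LRU step is assumed to keep only the candidates tied for the oldest `last_used` timestamp.

```python
import random
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    models: frozenset       # models this channel can serve
    priority: int           # preference rank: lower wins
    weight: int             # traffic share within one priority bucket
    last_used: float = 0.0  # time of last use; 0.0 means never used


def pick_channel(channels, model, allowed, rng=random):
    """Return one channel for `model`, or None if nothing is eligible."""
    # 1. Eligibility: channels the caller may use that support the model.
    eligible = [c for c in channels if c.name in allowed and model in c.models]
    if not eligible:
        return None
    # 2. Priority filter: keep only the lowest priority value.
    best = min(c.priority for c in eligible)
    bucket = [c for c in eligible if c.priority == best]
    # 3. LRU tiebreaker (assumption: keep those tied for the oldest use).
    oldest = min(c.last_used for c in bucket)
    lru = [c for c in bucket if c.last_used == oldest]
    # 4. Weighted random pick among whatever remains.
    return rng.choices(lru, weights=[c.weight for c in lru], k=1)[0]


channels = [
    Channel("a", frozenset({"m"}), priority=10, weight=3),
    Channel("b", frozenset({"m"}), priority=20, weight=10),
]
picked = pick_channel(channels, "m", allowed={"a", "b"})
print(picked.name)  # a
```

Channel `b` never reaches the weighted step here, because the priority filter removes it first. Its larger weight would only matter against other channels with the same priority value.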
Failover
Upstreams fail. ToRouter handles it server-side so your client sees one clean response.
If the picked channel returns a retryable error (5xx, 429, or another 4xx code that indicates "try elsewhere", e.g. quota exhausted or key revoked), the gateway:
- Discards the failed response before any bytes are written to your connection.
- Picks the next candidate by the same priority → LRU → weight rules.
- Retries the request transparently.
- Returns the first successful response.
Terminal errors (400, 404, content policy violations) are returned to you immediately — retrying won't help.
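As a rough sketch of the retry loop described above, the following Python shows the split between retryable and terminal responses. The `Response` type, the `send` callback, and the `quota_exhausted` / `key_revoked` error codes are hypothetical stand-ins. Returning the last failure when every candidate fails is also an assumption, since the text does not say what happens in that case.

```python
from dataclasses import dataclass

# Hypothetical error codes standing in for the "try elsewhere" 4xx cases.
RETRYABLE_CODES = {"quota_exhausted", "key_revoked"}


@dataclass
class Response:
    status: int
    body: str = ""
    error_code: str = ""


def is_retryable(resp):
    """5xx and 429 are retried, plus 4xx responses with a retryable code."""
    if resp.status >= 500 or resp.status == 429:
        return True
    return 400 <= resp.status < 500 and resp.error_code in RETRYABLE_CODES


def call_with_failover(candidates, send):
    """Try candidates in order; stop at the first non-retryable response."""
    last = None
    for channel in candidates:
        resp = send(channel)
        if not is_retryable(resp):
            return channel, resp  # success, or a terminal error such as 400
        last = (channel, resp)    # discard and move to the next candidate
    return last  # assumption: all candidates failed, surface the last error


replies = {"a": Response(503), "b": Response(429), "c": Response(200, "ok")}
channel, resp = call_with_failover(["a", "b", "c"], replies.__getitem__)
print(channel, resp.status)  # c 200
```

Here `candidates` is assumed to be already ordered by the priority, LRU, and weight rules. A real gateway would presumably re-run the selection after each failure instead of walking a fixed list.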
Streaming requests can only fail over before the first SSE byte is sent. Once a stream has started, an upstream disconnect surfaces as an error mid-stream — the gateway cannot rewind a partial response.
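The first-byte boundary can be illustrated with Python generators. This is a sketch under stated assumptions, not the gateway's code: `UpstreamError`, `stream_with_failover`, and `open_stream` are hypothetical names, and each chunk stands in for one SSE event.

```python
class UpstreamError(Exception):
    """Hypothetical error raised when an upstream fails or disconnects."""


_EMPTY = object()  # marks a stream that ended without producing a chunk


def stream_with_failover(candidates, open_stream):
    """Yield chunks from the first channel that produces a first chunk."""
    for channel in candidates:
        try:
            chunks = iter(open_stream(channel))
            first = next(chunks, _EMPTY)
        except UpstreamError:
            continue  # nothing has been sent yet, so failing over is safe
        if first is not _EMPTY:
            yield first
            # From here on, an UpstreamError propagates to the caller:
            # the partial response cannot be taken back.
            yield from chunks
        return
    raise UpstreamError("no channel produced a first chunk")


def open_stream(channel):
    if channel == "a":
        raise UpstreamError("connect failed")
    yield "Hel"
    yield "lo"


print(list(stream_with_failover(["a", "b"], open_stream)))  # ['Hel', 'lo']
```

The failure on channel `a` happens before any chunk is yielded, so the consumer only ever sees the chunks from `b`. A failure raised after `"Hel"` would reach the consumer as an exception partway through iteration.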
See what got picked
Every gateway response is logged with the channel that served it. View this on the Usage page: each request row shows the model, channel, latency, and outcome.