Troubleshooting
Rate-limited — what to do
A 429 Too Many Requests from ToRouter can come from three different places. The error body tells you which one you hit; this page shows how to tell them apart and fix each one.
The three sources
Client-side: retry with backoff
For per-key rate limits, the cheapest fix is to back off and retry. The OpenAI and Anthropic SDKs do this automatically with max_retries; for raw HTTP, implement exponential backoff yourself:
import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(api_key="sk-***", base_url="https://portal.torouter.ai/v1")

def call_with_retry(**kwargs):
    for attempt in range(6):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == 5:
                raise  # out of attempts: surface the last 429 to the caller
            # Exponential backoff with jitter: roughly 1s, 2s, 4s, 8s, 16s.
            time.sleep((2 ** attempt) + random.random())

async function callWithRetry(fn) {
  for (let attempt = 0; attempt < 6; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only 429s are retried; everything else propagates immediately.
      if (err.status !== 429) throw err;
      await new Promise(r => setTimeout(r, (2 ** attempt) * 1000 + Math.random() * 1000));
    }
  }
  throw new Error('rate-limited after 6 attempts');
}

Do not retry on API_KEY_QUOTA_EXHAUSTED or USAGE_LIMIT_EXCEEDED: the error is sticky until you raise the cap or the window rolls over. Treat these as terminal in your retry loop.
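One way to treat those codes as terminal is to check the error body before backing off. The sketch below is illustrative only: `HttpError`, `is_retryable`, and `retry_unless_terminal` are hypothetical names, and it assumes the code string appears verbatim somewhere in the 429 response body, so adjust the check to the body shape you actually receive.

```python
import random
import time

# Codes described above as sticky; retrying them only wastes requests.
TERMINAL_CODES = ("API_KEY_QUOTA_EXHAUSTED", "USAGE_LIMIT_EXCEEDED")


class HttpError(Exception):
    """Hypothetical error type carrying the HTTP status and raw body text."""

    def __init__(self, status, body_text):
        super().__init__(f"HTTP {status}: {body_text}")
        self.status = status
        self.body_text = body_text


def is_retryable(status, body_text):
    """Return True only for a 429 whose body names no terminal code.

    Assumption: the code appears verbatim somewhere in the body text.
    """
    if status != 429:
        return False
    return not any(code in body_text for code in TERMINAL_CODES)


def retry_unless_terminal(fn, max_attempts=6, sleep=time.sleep):
    """Call fn(); back off and retry only on retryable 429s."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except HttpError as err:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(err.status, err.body_text):
                raise  # terminal code, non-429, or out of attempts
            # Exponential backoff with jitter, as in the loop above.
            sleep((2 ** attempt) + random.random())
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.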
Permanent fixes
- Raise the key's RPM/RPH/RPD in /keys. This is the fastest fix if you own the account.
- Use multiple keys for high-fanout workloads and load-balance client-side.
- Top up if you're hitting INSUFFICIENT_BALANCE (402). That is not a 429, but it is related.
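Client-side load balancing across several keys can be as simple as round-robin selection. The following is a minimal sketch, not a ToRouter feature: `KeyRotator` is a hypothetical helper and the keys are placeholders.

```python
import itertools
import threading


class KeyRotator:
    """Hand out API keys in round-robin order; safe to share across threads."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("at least one key is required")
        self._cycle = itertools.cycle(list(keys))
        self._lock = threading.Lock()

    def next_key(self):
        # The lock keeps concurrent callers from advancing the cycle at once.
        with self._lock:
            return next(self._cycle)


# Placeholder keys for illustration only.
rotator = KeyRotator(["sk-key-a", "sk-key-b", "sk-key-c"])
picked = [rotator.next_key() for _ in range(5)]
print(picked)
# → ['sk-key-a', 'sk-key-b', 'sk-key-c', 'sk-key-a', 'sk-key-b']
```

In practice you would likely build one client per key up front and pick a client per request, so each key's rate limit is consumed evenly.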