Production best practices
Eight habits that keep a ToRouter integration boring in production.
A short checklist for running ToRouter in production. None of these is exotic — they're the things people wish they'd done before the first incident.
1. One key per environment
Create separate keys for dev, staging, and prod. A leaked dev key with a tight quota cap is a nuisance; a leaked prod key with no cap is a bill.
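One way to keep the keys apart in code is to read each one from its own environment variable and fail loudly when it is missing. This is a minimal sketch: the variable names (`TOROUTER_API_KEY_DEV` and so on) and the `api_key_for` helper are assumptions for illustration, not names ToRouter defines.

```python
import os

# Hypothetical environment names; adjust to match your deployment.
ENVIRONMENTS = ("dev", "staging", "prod")


def api_key_for(env: str) -> str:
    """Return the API key for one environment, read from its own variable."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    # Hypothetical naming scheme: TOROUTER_API_KEY_DEV, _STAGING, _PROD.
    var = f"TOROUTER_API_KEY_{env.upper()}"
    key = os.environ.get(var)
    if not key:
        # Failing here avoids silently falling back to another environment's key.
        raise RuntimeError(f"{var} is not set")
    return key
```

Refusing to fall back to a default key means a misconfigured staging box can never end up spending against prod.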
2. Set IP allowlists on prod keys
In /keys, add the CIDR of your production egress (Fly.io regions, VPC NAT, k8s egress gateway, etc.). A leaked key outside the allowlist is unusable.
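To sanity-check a CIDR before you paste it into /keys, you can test your egress addresses against it locally. The sketch below uses Python's standard `ipaddress` module. The addresses are documentation-range placeholders and `is_allowed` is a hypothetical helper. It illustrates how CIDR matching works in general, not how the gateway implements it.

```python
import ipaddress

# Placeholder allowlist; the real one is whatever you configure in /keys.
ALLOWLIST = ["203.0.113.0/24", "198.51.100.17/32"]


def is_allowed(source_ip: str, allowlist: list[str]) -> bool:
    """Return True if source_ip falls inside any CIDR block in allowlist."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```

A `/32` entry matches exactly one address, which suits a single NAT gateway. A `/24` covers 256 addresses, which suits an egress pool.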
3. Cap spend per key
Give every key a spending ceiling in /keys (USD or CNY). If a runaway loop starts billing $10,000 an hour, the gateway returns 429 once the cap is reached instead of silently charging through.
Rough starting points: prod around 90% of your monthly budget, staging on the order of tens of dollars, dev around ten dollars — tune in /keys to match your team.
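As a worked example of those starting points, the arithmetic can be written down as a small helper. The staging and dev figures below are assumptions chosen to fit "tens of dollars" and "around ten dollars", and `starting_caps` is a hypothetical name. The caps themselves are still set in /keys.

```python
def starting_caps(monthly_budget_usd: float) -> dict[str, float]:
    """Suggest per-environment spend caps in USD from a monthly budget."""
    return {
        "prod": round(monthly_budget_usd * 0.90, 2),  # about 90% of budget
        "staging": 50.0,  # assumed value for "tens of dollars"
        "dev": 10.0,      # assumed value for "around ten dollars"
    }
```

For a $2,000 monthly budget this suggests a prod cap of $1,800, leaving headroom for staging and dev.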
4. Pin model versions
# Good
model="claude-opus-4-7"
model="gpt-5.3-codex"
# Bad — silently changes behavior over time
model="claude"
model="gpt-4"
A pinned version lets you A/B against the next version on your schedule, not the provider's.
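One way to run that A/B on your own schedule is to route a fixed share of users to the candidate version, bucketing by a hash so each user always sees the same model. This is a sketch under assumptions: `pick_model` is a hypothetical helper, and the candidate string is a placeholder for whichever pinned version you are evaluating.

```python
import hashlib

PINNED_MODEL = "claude-opus-4-7"
CANDIDATE_MODEL = "your-next-pinned-version"  # placeholder, not a real model id


def pick_model(user_id: str, pinned: str, candidate: str,
               candidate_share: float) -> str:
    """Deterministically assign a user to the pinned or candidate model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return candidate if bucket < candidate_share else pinned
```

Calling `pick_model(user_id, PINNED_MODEL, CANDIDATE_MODEL, 0.05)` sends roughly 5% of users to the candidate. Setting the share back to `0.0` is an instant rollback.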
5. Retry 429 and transient 5xx with exponential backoff
The OpenAI and Anthropic SDKs handle this when you set max_retries. For raw HTTP, see Rate-limited for a minimal implementation. Never retry other 4xx errors (400/401/403/404): the same response will come back.
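For raw HTTP, the retry rule above can be sketched as follows. This is not the implementation from the Rate-limited page. The set of 5xx codes treated as transient, the delay values, and the `call_with_backoff` helper are all assumptions. `send` stands for any function of yours that performs one request and returns `(status, body)`.

```python
import random
import time

# Assumption: which statuses count as retryable. Tune to what you observe.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_retryable(status: int) -> bool:
    """429 and transient 5xx are retryable; other 4xx are not."""
    return status in RETRYABLE_STATUS


def call_with_backoff(send, max_retries=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-retryable status or retries run out.

    Waits a random time between 0 and min(max_delay, base_delay * 2**attempt)
    before each retry (exponential backoff with full jitter).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if not is_retryable(status) or attempt == max_retries:
            return status, body
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
```

The jitter spreads retries from many clients over time, so they do not all hit the gateway again at the same instant. The `sleep` parameter exists only so that tests can run without waiting.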
6. Always log x-request-id
resp = client.chat.completions.with_raw_response.create(...)
print(resp.headers["x-request-id"])
When you need to file a support ticket, this is the one piece of data that lets us trace the request through gateway, scheduler, and upstream in seconds.
7. Pick a fallback model
If your primary is gpt-5 and it's globally down, you want code that tries claude-opus-4-7 (or another model in the same group) before paging an engineer:
def chat(messages, primary="gpt-5", fallback="claude-opus-4-7"):
    try:
        return client.chat.completions.create(model=primary, messages=messages)
    except Exception:
        # Primary failed: try the fallback once before surfacing the error.
        return client.chat.completions.create(model=fallback, messages=messages)
8. Watch the dashboard and spend
- Open /dashboard once a week and look for unusual spend spikes per model or per key.
- Before traffic spikes, top up so you don't hit surprise 402s.
- When spend looks wrong, drill into Usage details row by row.
Rotate keys quarterly even when there's no incident. A regular rotation makes leak detection (audit log mismatches, "wait, why is this key still in use?") trivial.
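A quarterly rotation is easier to keep if something reminds you. The sketch below flags keys older than 90 days, assuming you track creation dates yourself. The `keys_due_for_rotation` helper and the input dictionary are hypothetical, and no ToRouter API is called.

```python
from datetime import date, timedelta

ROTATION_PERIOD = timedelta(days=90)  # roughly one quarter


def keys_due_for_rotation(created_on: dict[str, date], today: date) -> list[str]:
    """Return the names of keys created at least ROTATION_PERIOD ago, sorted."""
    return sorted(
        name for name, created in created_on.items()
        if today - created >= ROTATION_PERIOD
    )
```

Running this from a scheduled job and posting the result to your team channel turns rotation into a routine task instead of an incident response.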