ToRouter Docs

Production best practices

Eight habits that keep a ToRouter integration boring in production.

A short checklist for running ToRouter in production. None of these is exotic — they're the things people wish they'd done before the first incident.

1. One key per environment

Create separate keys for dev, staging, and prod. A leaked dev key with a tight quota cap is a nuisance; a leaked prod key with no cap is a bill.
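One way to keep the keys apart in code is to read each one from its own environment variable and fail fast when it is missing, rather than quietly falling back to another environment's key. This is a minimal sketch; the variable names (TOROUTER_KEY_DEV and so on) and the helper `api_key_for` are assumptions for illustration, not names ToRouter defines.

```python
import os

# Hypothetical variable names; use whatever your secret manager provides.
KEY_VARS = {
    "dev": "TOROUTER_KEY_DEV",
    "staging": "TOROUTER_KEY_STAGING",
    "prod": "TOROUTER_KEY_PROD",
}

def api_key_for(env, environ=os.environ):
    """Return the key for `env`, never another environment's key."""
    if env not in KEY_VARS:
        raise ValueError(f"unknown environment: {env!r}")
    key = environ.get(KEY_VARS[env])
    if not key:
        raise RuntimeError(f"{KEY_VARS[env]} is not set")
    return key
```

Raising on a missing key means a misconfigured staging deploy cannot end up spending against the prod key.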

2. Set IP allowlists on prod keys

In /keys, add the CIDR of your production egress (Fly.io regions, VPC NAT, k8s egress gateway, etc.). A leaked key outside the allowlist is unusable.
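Before saving an allowlist, it can help to check locally that your known egress addresses fall inside the CIDR blocks you are about to enter. The sketch below uses only Python's standard `ipaddress` module; the addresses are documentation-only ranges, and how the gateway itself evaluates the allowlist is not shown here.

```python
import ipaddress

def ip_allowed(ip, cidrs):
    """True if `ip` falls inside any of the CIDR blocks in `cidrs`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

# Example ranges reserved for documentation (RFC 5737), not real egress IPs.
allowlist = ["203.0.113.0/28", "198.51.100.7/32"]

print(ip_allowed("203.0.113.5", allowlist))   # inside the /28
print(ip_allowed("203.0.113.77", allowlist))  # outside every block
```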

3. Cap spend per key

Give every key a spending ceiling in /keys (USD or CNY). If a runaway loop starts burning money at a rate of $10,000 an hour, the gateway starts returning 429 once the key reaches its ceiling instead of silently charging through.

Rough starting points: prod around 90% of your monthly budget, staging on the order of tens of dollars, dev around ten dollars — tune in /keys to match your team.
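The rule of thumb above works out as simple arithmetic. In this sketch the helper `starting_caps` is hypothetical, and the staging figure of 50 is an arbitrary pick within "tens of dollars".

```python
def starting_caps(monthly_budget_usd):
    """Rough per-key starting caps in USD, following the rule of thumb above."""
    return {
        "prod": round(monthly_budget_usd * 0.90, 2),  # about 90% of the budget
        "staging": 50.0,  # "tens of dollars"; 50 is an arbitrary pick
        "dev": 10.0,      # about ten dollars
    }

caps = starting_caps(2000)
print(caps)
```

Treat the result as a first guess to enter in /keys, then adjust after a few weeks of real usage.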

4. Pin model versions

# Good
model="claude-opus-4-7"
model="gpt-5.3-codex"

# Bad — silently changes behavior over time
model="claude"
model="gpt-4"

A pinned version lets you A/B against the next version on your schedule, not the provider's.
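One way to run that A/B is to hash a stable identifier into a bucket, so that a fixed share of users sees the candidate version and each user always gets the same one. This is a sketch under assumptions: `pick_model` is a hypothetical helper, and the candidate name is whatever pinned version you are evaluating.

```python
import hashlib

def pick_model(user_id, current, candidate, candidate_share=0.05):
    """Route a stable fraction of users to `candidate`; the rest stay on `current`."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return candidate if bucket < candidate_share else current
```

For example, `pick_model("user-42", current="claude-opus-4-7", candidate=next_version)` returns the same answer for that user on every call, so their experience does not flip between versions mid-session.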

5. Retry 429 and transient 5xx with exponential backoff

The OpenAI and Anthropic SDKs handle this when you set max_retries. For raw HTTP, see Rate-limited for a minimal implementation. Never retry other 4xx responses (400/401/403/404): the same response will come back.
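The retry rule can be sketched independently of any HTTP library. The code below is an illustration, not the implementation from the Rate-limited page. It assumes a caller-supplied `send()` that returns a `(status, body)` pair, treats every 5xx as transient, and ignores any Retry-After header for brevity.

```python
import random
import time

def should_retry(status):
    """Retry 429 and 5xx; every other 4xx will fail the same way again."""
    return status == 429 or 500 <= status <= 599

def call_with_backoff(send, max_retries=4, base=0.5, cap=30.0, sleep=time.sleep):
    """Call `send()` until it returns a non-retryable status or retries run out."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if not should_retry(status) or attempt == max_retries:
            return status, body
        # Exponential backoff with full jitter: wait a random time of up to
        # base * 2^attempt seconds, capped at `cap`.
        sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The jitter keeps many clients that were rate-limited at the same moment from retrying in lockstep.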

6. Always log x-request-id

resp = client.chat.completions.with_raw_response.create(...)
print(resp.headers["x-request-id"])
completion = resp.parse()  # the usual completion object

When you need to file a support ticket, this is the one piece of data that lets us trace the request through gateway, scheduler, and upstream in seconds.
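In a real service the ID is more useful in structured logs than on stdout. This is a minimal sketch using the standard `logging` module; `log_request_id` and the logger name are hypothetical, and the function assumes a header mapping like the `resp.headers` shown above.

```python
import logging

logger = logging.getLogger("torouter")

def log_request_id(headers, **context):
    """Log the gateway's x-request-id with caller-supplied context and return it."""
    request_id = headers.get("x-request-id")
    logger.info("torouter request_id=%s context=%s", request_id, context)
    return request_id
```

Calling `log_request_id(resp.headers, model="claude-opus-4-7")` after each request puts the ID next to your own context, so you can find it later from a user report or an alert.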

7. Pick a fallback model

If your primary is gpt-5 and it's globally down, you want code that tries claude-opus-4-7 (or another model in the same group) before paging an engineer:

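import openai

# Only fall back when the primary looks unavailable; a bad request or a bad
# key would fail on the fallback too, and other exceptions are real bugs.
UNAVAILABLE = (openai.APIConnectionError, openai.InternalServerError)

def chat(messages, primary="gpt-5", fallback="claude-opus-4-7"):
    try:
        return client.chat.completions.create(model=primary, messages=messages)
    except UNAVAILABLE:
        return client.chat.completions.create(model=fallback, messages=messages)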

8. Watch the dashboard and spend

  • Open /dashboard once a week — look for unusual spend spikes per model or per key.
  • Before traffic spikes, top up so you don't hit surprise 402s.
  • When spend looks wrong, drill into Usage details row by row.

Rotate keys quarterly even when there's no incident. A regular rotation makes leak detection (audit log mismatches, "wait, why is this key still in use?") trivial.

Next steps

  • Per-key limits: Configure the levers this page recommends.
  • Usage dashboard: Spend, quota, and trends.
  • Top up: Avoid 402 surprises.
  • Upstream provider errors & failover: How ToRouter retries across channels when an upstream provider has issues, and what you'll see when it can't help.
  • Full error code reference: Gateway error types with HTTP status, meaning, and what to do next.
