Rate-limited — what to do | ToRouter

A 429 Too Many Requests from ToRouter can come from three different places. The error body tells you which one, so you can tell them apart and apply the right fix.

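The three sources

Per-key rate limit (requests per minute / hour / day)
Per-key spending quota exhausted
Subscription window limit (daily / weekly / monthly)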

Client-side: retry with backoff

For per-key rate limits, the cheapest fix is to back off and retry. The OpenAI and Anthropic SDKs do this automatically (tune it with max_retries); for raw HTTP, or when you want explicit control over the schedule, implement exponential backoff yourself:

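python

import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(api_key="sk-***", base_url="https://portal.torouter.ai/v1")

def call_with_retry(max_attempts=6, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # Out of attempts: re-raise the last 429 from inside the handler.
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff (1 s, 2 s, 4 s, ...) plus up to 1 s of jitter.
            time.sleep((2 ** attempt) + random.random())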

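node

async function callWithRetry(fn, maxAttempts = 6) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Only 429s are retried; every other error propagates immediately.
      if (err.status !== 429) throw err;
      // Out of attempts: give up without sleeping again.
      if (attempt === maxAttempts - 1) break;
      // Exponential backoff (1 s, 2 s, 4 s, ...) plus up to 1 s of jitter.
      await new Promise(r => setTimeout(r, (2 ** attempt) * 1000 + Math.random() * 1000));
    }
  }
  throw new Error(`rate-limited after ${maxAttempts} attempts`);
}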

Do not retry on API_KEY_QUOTA_EXHAUSTED or USAGE_LIMIT_EXCEEDED — the error is sticky until you raise the cap or the window rolls over. Treat these as terminal in your retry loop.
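The retry loops above back off on every 429, so they need one extra check to honor this rule. The following is a minimal sketch of that check, not ToRouter's documented schema: it assumes the decoded error body carries the code at body["error"]["code"] or body["code"], and the helper names extract_code, should_retry, and backoff_seconds are hypothetical. Verify the field location against a real error response before relying on it.

```python
import random

# Codes this page says must not be retried.
TERMINAL_CODES = {"API_KEY_QUOTA_EXHAUSTED", "USAGE_LIMIT_EXCEEDED"}


def extract_code(body):
    """Return the error code from a decoded error body, or None.

    Assumption: the code lives at body["error"]["code"] or body["code"].
    """
    if not isinstance(body, dict):
        return None
    inner = body.get("error")
    if isinstance(inner, dict) and "code" in inner:
        return inner["code"]
    return body.get("code")


def should_retry(status, body):
    """True only for a 429 whose code is not one of the terminal ones."""
    if status != 429:
        return False
    return extract_code(body) not in TERMINAL_CODES


def backoff_seconds(attempt):
    """Exponential backoff with up to 1 s of jitter, matching the loops above."""
    return (2 ** attempt) + random.random()
```

In a retry loop, call should_retry with the response status and decoded body before sleeping, and re-raise immediately when it returns False. An unrecognized or missing code is treated as retryable here, which is a design choice for the sketch rather than documented behavior.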

Permanent fixes

Raise the key's RPM/RPH/RPD in /keys — fastest if you own the account.
Use multiple keys for high-fanout workloads and load-balance client-side.
Top up if you're hitting INSUFFICIENT_BALANCE (402) — that's not 429 but related.
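For the multiple-keys option, client-side load balancing can be as simple as round-robin rotation. The sketch below shows one way to do it under stated assumptions: KeyRotator is a hypothetical helper, the key strings are placeholders, and round-robin is only one possible strategy (it does not account for keys with different limits).

```python
import itertools
import threading


class KeyRotator:
    """Hand out API keys in round-robin order; safe to share across threads."""

    def __init__(self, keys):
        keys = list(keys)
        if not keys:
            raise ValueError("at least one key is required")
        self._cycle = itertools.cycle(keys)
        self._lock = threading.Lock()

    def next_key(self):
        # The lock keeps concurrent callers from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)


rotator = KeyRotator(["sk-key-a", "sk-key-b", "sk-key-c"])  # placeholder keys
picked = [rotator.next_key() for _ in range(5)]
```

One way to use this is to build one client per key up front and pick the client that matches next_key() for each request, so each key's per-minute budget is consumed evenly.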

Next steps

Per-key limits

Usage details

Key blocked or revoked

