Modalities

Text, multimodal, image, video, audio, embedding and rerank — what each modality filter means.

ToRouter tags every catalog model with a modality. The filter on the /models page narrows the list to one kind of capability at a time.

The eight modalities

Modality vs endpoint

Modality describes what a model does. The endpoint you call describes which protocol you use:

Text + Multimodal → POST /v1/chat/completions, POST /v1/responses, POST /v1/messages, POST /v1beta/models/<id>:generateContent
Image → POST /v1/images/generations, POST /v1/images/edits
Embedding → POST /v1/embeddings
Audio → POST /v1/audio/transcriptions, POST /v1/audio/speech
Rerank → vendor-specific, usually POST /v1/rerank

The filter is a UI convenience — actual capability is defined by the underlying upstream model. Always confirm on the model detail page that the endpoint you need is supported.
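The mapping above can be kept as a small lookup table in client code. This is a minimal sketch, not part of any ToRouter SDK: the names ENDPOINTS_BY_MODALITY and endpoints_for are hypothetical, the paths are copied from the list above, and the rerank path is only the usual one. Video has no entry because this page lists no endpoint for it.

```python
# Hypothetical helper: modality -> endpoint paths, copied from the list above.
# Text and multimodal models share the same chat-style endpoints.
CHAT_ENDPOINTS = [
    "/v1/chat/completions",
    "/v1/responses",
    "/v1/messages",
    "/v1beta/models/<id>:generateContent",
]

ENDPOINTS_BY_MODALITY = {
    "text": CHAT_ENDPOINTS,
    "multimodal": CHAT_ENDPOINTS,
    "image": ["/v1/images/generations", "/v1/images/edits"],
    "embedding": ["/v1/embeddings"],
    "audio": ["/v1/audio/transcriptions", "/v1/audio/speech"],
    # Rerank is vendor-specific; this is only the usual path.
    "rerank": ["/v1/rerank"],
}


def endpoints_for(modality: str) -> list[str]:
    """Return the endpoint paths listed for a modality (case-insensitive)."""
    key = modality.strip().lower()
    if key not in ENDPOINTS_BY_MODALITY:
        raise KeyError(f"no endpoint mapping listed for modality {modality!r}")
    # Return a copy so callers cannot mutate the shared table.
    return list(ENDPOINTS_BY_MODALITY[key])
```

A table like this only records the default routing. The model detail page remains the source of truth for whether a given model supports a given endpoint.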

Multimodal in practice

Multimodal models accept image content blocks alongside text in standard OpenAI / Anthropic / Gemini formats:

multimodal.py

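import os

from openai import OpenAI

# Assumption: env var names are illustrative; see Endpoints for the base URL.
client = OpenAI(base_url=os.environ["TOROUTER_BASE_URL"], api_key=os.environ["TOROUTER_API_KEY"])

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{
        "role": "user",
        "content": [  # text and image blocks travel in the same user message
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)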
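The image block has a different shape in each format. The sketch below builds the same local image as an OpenAI-style block (a data URL inside image_url) and as an Anthropic-style block (a base64 source), following the public OpenAI Chat Completions and Anthropic Messages formats. The helper names are hypothetical, and it is an assumption that ToRouter forwards these blocks unchanged on /v1/chat/completions and /v1/messages respectively.

```python
import base64


def openai_image_block(data: bytes, media_type: str = "image/jpeg") -> dict:
    """Build an OpenAI-style image block with the bytes inlined as a data URL."""
    b64 = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{media_type};base64,{b64}"},
    }


def anthropic_image_block(data: bytes, media_type: str = "image/jpeg") -> dict:
    """Build an Anthropic-style image block with a base64 source."""
    b64 = base64.b64encode(data).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": b64},
    }
```

Either block goes into the content list of a user message next to a text block, for example [{"type": "text", "text": "What's in this image?"}, openai_image_block(image_bytes)].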

Next steps

Browse the catalog

Filter by the modality you need.

Endpoints

Per-protocol base URLs for OpenAI / Anthropic / Gemini.

SDK examples

Working code for text and multimodal calls.
