Model Routing

Model routing lets you configure tenant-level rules for how models are selected. You can create aliases, define fallback chains, and use load-balancing strategies — all without changing your application code.

How model resolution works

When a request is executed, PromptShuttle resolves the model in this order:

  1. Request override — If overrideModel is set, use it (but check if it's an alias first)

  2. Routing rules — Check tenant routing rules for the template's model name

  3. Template config — Use the template's llm field and fallbacks

  4. Defaults — Use provider defaults
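
The resolution order above can be sketched as a small function. This is an illustration only, not PromptShuttle's implementation: the function name, its parameters, and the source labels it returns are hypothetical, and it ignores strategies by returning the rule's full model list.

```python
from typing import Dict, List, Optional, Tuple


def resolve_candidates(
    override_model: Optional[str],
    template_llm: Optional[str],
    template_fallbacks: List[str],
    rules: List[Dict],
    provider_default: str,
) -> Tuple[str, List[str]]:
    """Return (source, candidate models) following the four-step order."""
    by_alias = {rule["alias"]: rule for rule in rules}

    # 1. Request override: an alias is expanded, anything else is used as-is.
    if override_model:
        if override_model in by_alias:
            return "override", list(by_alias[override_model]["models"])
        return "override", [override_model]

    # 2. Routing rules: the template's model name may itself be an alias.
    if template_llm in by_alias:
        return "routing-rule", list(by_alias[template_llm]["models"])

    # 3. Template config: the llm field followed by its fallbacks.
    if template_llm:
        return "template", [template_llm, *template_fallbacks]

    # 4. Defaults (a single provider default is assumed here).
    return "default", [provider_default]
```

Under these assumptions, a request with overrideModel set to "fast" resolves through the alias table even when the template names a different model.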

Routing rules

A routing rule maps an alias to one or more actual models with a strategy for selection.

Example rules

[
  {
    "alias": "fast",
    "models": ["groq/llama-3.3-70b-versatile", "openai/gpt-4o-mini"],
    "strategy": "Sequential",
    "description": "Fast, cheap model with OpenAI fallback"
  },
  {
    "alias": "smart",
    "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
    "strategy": "Sequential",
    "description": "High-quality model with fallback"
  },
  {
    "alias": "balanced",
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "google/gemini-2.5-flash"],
    "strategy": "RoundRobin",
    "description": "Distribute load across providers"
  }
]

Using aliases

Once defined, use an alias anywhere you'd use a model name:

  • In a template's llm field: set it to "fast" instead of "groq/llama-3.3-70b-versatile"

  • In a request's overrideModel: "overrideModel": "smart"

  • In the OpenAI endpoint's model field: "model": "balanced"

Strategies

| Strategy | Behavior |
| --- | --- |
| Sequential | Uses the first model. If it fails, tries the next. Classic fallback chain. |
| Random | Randomly selects one model per request. |
| WeightedRandom | Randomly selects with weights (e.g. 70% model A, 30% model B). |
| RoundRobin | Rotates through models evenly across requests. |
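
To make the four behaviors concrete, here is a minimal sketch of a selector that returns the models to try for one request. The class and method names are hypothetical, and the in-memory counter used for RoundRobin is an assumption; the source does not say how PromptShuttle tracks rotation.

```python
import itertools
import random


class RuleSelector:
    """Chooses the models to try for one request, per the rule's strategy."""

    def __init__(self, rule, rng=None):
        self.rule = rule
        self.rng = rng or random.Random()
        self._turn = itertools.count()  # request counter for RoundRobin

    def candidates(self):
        models = list(self.rule["models"])
        strategy = self.rule["strategy"]
        if strategy == "Sequential":
            # Whole list in order: the first is primary, the rest are fallbacks.
            return models
        if strategy == "Random":
            return [self.rng.choice(models)]
        if strategy == "WeightedRandom":
            # random.choices accepts unnormalized, non-negative weights.
            return self.rng.choices(models, weights=self.rule["weights"], k=1)
        if strategy == "RoundRobin":
            return [models[next(self._turn) % len(models)]]
        raise ValueError(f"unknown strategy: {strategy}")
```

Only Sequential returns more than one model in this sketch, which matches the table's description of it as the fallback chain.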

Weighted random

For WeightedRandom, provide a weights array with one weight per entry in the models array.

Weights are normalized automatically, so they don't need to sum to 1.
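
As an illustration of what normalization means here, the sketch below turns raw weights into traffic shares. The alias "experiment" and the helper name are hypothetical, and rejecting an all-zero weights array is an assumption of this sketch, since the source only states that weights must be non-negative.

```python
def normalize_weights(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        # Assumption: at least one weight must be positive to pick a model.
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


# A rule written as a Python dict; it has the same shape as the JSON rules.
rule = {
    "alias": "experiment",  # hypothetical alias
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
    "strategy": "WeightedRandom",
    "weights": [70, 30],
}

# Map each model to its share of traffic after normalization.
shares = dict(zip(rule["models"], normalize_weights(rule["weights"])))
```

With these weights, the first model receives roughly 70% of requests and the second roughly 30%; weights of 7 and 3 would produce the same split.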

Environment-scoped rules

Rules can be restricted to specific environments. Rules without environments apply to all environments.
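
The scoping behavior can be sketched as a filter over the rule list. The "environments" field name is an assumption inferred from the wording above, and the aliases and environment names are illustrative only.

```python
def rules_for_environment(rules, environment):
    """Keep rules that are unscoped or that list the given environment."""
    return [
        rule
        for rule in rules
        # A missing or empty "environments" field means "all environments".
        if not rule.get("environments") or environment in rule["environments"]
    ]


rules = [
    {
        "alias": "staging-fast",  # hypothetical alias
        "models": ["openai/gpt-4o-mini"],
        "strategy": "Sequential",
        "environments": ["staging"],  # assumed field name
    },
    {
        "alias": "smart",
        "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
        "strategy": "Sequential",
    },
]

staging_aliases = [r["alias"] for r in rules_for_environment(rules, "staging")]
production_aliases = [r["alias"] for r in rules_for_environment(rules, "production")]
```

In this sketch the unscoped "smart" rule is visible in both environments, while "staging-fast" only applies in staging.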

Managing routing rules

Get current rules

Update rules

The entire rule set is replaced on update.
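
Because an update replaces the whole rule set, changing a single rule means sending back the complete list. A minimal sketch of that read-modify-write step follows; the helper name is hypothetical, and the calls that fetch and save the rules are deliberately left out because their endpoints are not shown here.

```python
def upsert_rule(current_rules, new_rule):
    """Return the full rule list with new_rule added or swapped in by alias."""
    updated = []
    replaced = False
    for rule in current_rules:
        if rule["alias"] == new_rule["alias"]:
            updated.append(new_rule)  # replace in place, keeping the order
            replaced = True
        else:
            updated.append(rule)
    if not replaced:
        updated.append(new_rule)
    return updated
```

The returned list is what would be submitted as the new rule set. Sending only the changed rule would, per the statement above, drop all the others.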

Validation

Rules are validated before saving:

  • Aliases must be non-empty

  • Each rule must have at least one model

  • All model names must be recognized (use GET /api/v1/models/descriptors to check)

  • WeightedRandom rules must have a weights array matching the models count

  • Weights must be non-negative

  • No duplicate aliases
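
The checks listed above can be mirrored client-side before submitting a rule set. This is a sketch rather than the server's validator: the function name and error messages are hypothetical, and known_models stands in for the set of names obtained from GET /api/v1/models/descriptors, whose response shape is not assumed here.

```python
def validate_rules(rules, known_models):
    """Return a list of human-readable problems; empty means the set is valid."""
    errors = []
    seen_aliases = set()
    for index, rule in enumerate(rules):
        alias = (rule.get("alias") or "").strip()
        if not alias:
            errors.append(f"rule {index}: alias must be non-empty")
        elif alias in seen_aliases:
            errors.append(f"rule {index}: duplicate alias '{alias}'")
        seen_aliases.add(alias)

        models = rule.get("models") or []
        if not models:
            errors.append(f"rule {index}: at least one model is required")
        for model in models:
            if model not in known_models:
                errors.append(f"rule {index}: unrecognized model '{model}'")

        weights = rule.get("weights")
        if rule.get("strategy") == "WeightedRandom":
            if weights is None or len(weights) != len(models):
                errors.append(f"rule {index}: weights must match the models count")
        for weight in weights or []:
            if weight < 0:
                errors.append(f"rule {index}: weights must be non-negative")
    return errors
```

Collecting every problem instead of stopping at the first is a design choice of this sketch; it makes it easier to fix a rule set in one pass.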

Fallback behavior

With the Sequential strategy, the first model in the list is the primary. If it fails (rate limit, timeout, or outage), PromptShuttle automatically tries the next model in the list.

The response indicates whether a fallback was used via the wasFallback and fallbackReason fields in streaming events.
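
The fallback loop can be sketched as follows. The wasFallback and fallbackReason names come from the text above, but the function, the callable it takes, the result dictionary, and the use of the error message as the reason are all assumptions made for illustration.

```python
def call_with_fallback(models, call):
    """Try each model in order and report whether a fallback was needed."""
    last_error = None
    for index, model in enumerate(models):
        try:
            result = call(model)  # hypothetical provider call
        except Exception as exc:  # e.g. rate limit, timeout, outage
            last_error = exc
            continue
        return {
            "model": model,
            "result": result,
            "wasFallback": index > 0,
            # Assumed format: the message of the error that caused the switch.
            "fallbackReason": str(last_error) if index > 0 else None,
        }
    raise RuntimeError("all models in the chain failed") from last_error
```

When the primary succeeds, wasFallback is False and no reason is recorded; when every model fails, the sketch raises and chains the last error.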

Use cases

| Scenario | Configuration |
| --- | --- |
| Cost optimization | Alias "default" → cheap model first, expensive fallback |
| Provider redundancy | Sequential across 2-3 providers for high availability |
| A/B testing models | WeightedRandom to split traffic between models |
| Load distribution | RoundRobin across equivalent models from different providers |
| Environment isolation | Different model in staging vs production |
| Easy model upgrades | Change the alias target; all flows using it switch immediately |
