# Model Routing

Model routing lets you configure tenant-level rules for how models are selected. You can create aliases, define fallback chains, and use load-balancing strategies — all without changing your application code.

## How model resolution works

When a request is executed, PromptShuttle resolves the model in this order:

1. **Request override** — If `overrideModel` is set, it takes precedence; if its value is an alias, the alias is resolved through routing rules first
2. **Routing rules** — Check tenant routing rules for the template's model name
3. **Template config** — Use the template's `llm` field and `fallbacks`
4. **Defaults** — Use provider defaults
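The lookup order above can be sketched as follows. Function and field names, and the default model value, are illustrative here — this is not PromptShuttle's actual API, just the documented precedence expressed as code:

```python
# Sketch of the documented resolution order. "rules" maps an alias (or a
# template model name) to its list of candidate models.

def resolve_model(override, rules, template_llm, default="openai/gpt-4o-mini"):
    """Return the candidate models in priority order."""
    # 1. Request override wins, but may itself be an alias.
    if override is not None:
        return rules.get(override, [override])
    # 2. Tenant routing rules keyed by the template's model name.
    if template_llm in rules:
        return rules[template_llm]
    # 3. Template config.
    if template_llm is not None:
        return [template_llm]
    # 4. Provider defaults (value here is a placeholder).
    return [default]
```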

## Routing rules

A routing rule maps an **alias** to one or more **actual models** with a **strategy** for selection.

### Example rules

```json
[
  {
    "alias": "fast",
    "models": ["groq/llama-3.3-70b-versatile", "openai/gpt-4o-mini"],
    "strategy": "Sequential",
    "description": "Fast, cheap model with OpenAI fallback"
  },
  {
    "alias": "smart",
    "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
    "strategy": "Sequential",
    "description": "High-quality model with fallback"
  },
  {
    "alias": "balanced",
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "google/gemini-2.5-flash"],
    "strategy": "RoundRobin",
    "description": "Distribute load across providers"
  }
]
```

### Using aliases

Once defined, use an alias anywhere you'd use a model name:

* In a template's `llm` field: set it to `"fast"` instead of `"groq/llama-3.3-70b-versatile"`
* In a request's `overrideModel`: `"overrideModel": "smart"`
* In the OpenAI endpoint's `model` field: `"model": "balanced"`
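For example, an execution request might carry the alias in `overrideModel`. Only `overrideModel` comes from the docs above; the `variables` field in this sketch is illustrative:

```json
{
  "overrideModel": "smart",
  "variables": { "topic": "quarterly report" }
}
```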

### Strategies

| Strategy           | Behavior                                                                   |
| ------------------ | -------------------------------------------------------------------------- |
| **Sequential**     | Uses the first model. If it fails, tries the next. Classic fallback chain. |
| **Random**         | Randomly selects one model per request.                                    |
| **WeightedRandom** | Randomly selects with weights (e.g. 70% model A, 30% model B).             |
| **RoundRobin**     | Rotates through models evenly across requests.                             |
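The four strategies boil down to simple selection functions. The sketch below is illustrative, not PromptShuttle's internals: `sequential` returns the full chain (primary first, tried in order), while the others pick one model per request:

```python
import random
from itertools import count

def sequential(models):
    # Classic fallback chain: try each in order until one succeeds.
    return list(models)

def random_choice(models):
    # Uniform pick per request.
    return random.choice(models)

def weighted_random(models, weights):
    # Weighted pick per request; weights need not sum to 1.
    return random.choices(models, weights=weights, k=1)[0]

def make_round_robin(models):
    # Rotate through models evenly across successive requests.
    counter = count()
    def pick():
        return models[next(counter) % len(models)]
    return pick
```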

### Weighted random

For `WeightedRandom`, provide a `weights` array matching the `models` array:

```json
{
  "alias": "cost_optimized",
  "models": ["groq/llama-3.3-70b-versatile", "openai/gpt-4o-mini", "openai/gpt-4o"],
  "strategy": "WeightedRandom",
  "weights": [0.5, 0.3, 0.2],
  "description": "50% Groq, 30% GPT-4o-mini, 20% GPT-4o"
}
```

Weights are normalized automatically — they don't need to sum to 1.
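In other words, `[5, 3, 2]` behaves identically to `[0.5, 0.3, 0.2]`. A minimal sketch of that normalization:

```python
def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    return [w / total for w in weights]
```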

### Environment-scoped rules

Rules can be restricted to specific environments:

```json
{
  "alias": "default",
  "models": ["openai/gpt-4o"],
  "strategy": "Sequential",
  "environments": ["production"],
  "description": "Use GPT-4o in production only"
}
```

Rules without `environments` apply to all environments.
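The scoping check is simple: a rule matches when it has no `environments` key, or when the current environment appears in its list. A sketch:

```python
def rule_applies(rule, environment):
    """True if the rule is active in the given environment."""
    envs = rule.get("environments")
    return envs is None or environment in envs
```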

## Managing routing rules

### Get current rules

```bash
GET /api/v1/model-routing
```

### Update rules

```bash
PUT /api/v1/model-routing
```

```json
[
  {
    "alias": "fast",
    "models": ["groq/llama-3.3-70b-versatile"],
    "strategy": "Sequential"
  },
  {
    "alias": "default",
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
    "strategy": "Sequential"
  }
]
```

The entire rule set is replaced on update.

### Validation

Rules are validated before saving:

* Aliases must be non-empty
* Each rule must have at least one model
* All model names must be recognized (use `GET /api/v1/models/descriptors` to check)
* `WeightedRandom` rules must have a `weights` array matching the `models` count
* Weights must be non-negative
* No duplicate aliases
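The checks above (except model-name recognition, which requires the descriptors endpoint) can be expressed as a short validation pass. Error messages here are illustrative:

```python
def validate_rules(rules):
    """Raise ValueError on the first rule that violates the documented checks."""
    seen = set()
    for rule in rules:
        alias = rule.get("alias", "")
        if not alias:
            raise ValueError("alias must be non-empty")
        if alias in seen:
            raise ValueError(f"duplicate alias: {alias}")
        seen.add(alias)
        models = rule.get("models", [])
        if not models:
            raise ValueError(f"{alias}: at least one model required")
        if rule.get("strategy") == "WeightedRandom":
            weights = rule.get("weights")
            if weights is None or len(weights) != len(models):
                raise ValueError(f"{alias}: weights must match models count")
            if any(w < 0 for w in weights):
                raise ValueError(f"{alias}: weights must be non-negative")
```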

## Fallback behavior

When using the `Sequential` strategy, the first model is the primary. If it fails (rate limit, timeout, outage), PromptShuttle automatically tries the next model in the list.

The response indicates whether a fallback was used via the `wasFallback` and `fallbackReason` fields in streaming events.
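Conceptually, the fallback loop looks like the sketch below. PromptShuttle handles this server-side; only the `wasFallback` and `fallbackReason` field names come from the docs, and the rest is illustrative:

```python
def run_with_fallback(models, call):
    """Try each model in order; report whether a fallback was used."""
    last_error = None
    for i, model in enumerate(models):
        try:
            result = call(model)
            return {
                "result": result,
                "model": model,
                "wasFallback": i > 0,
                "fallbackReason": str(last_error) if i > 0 else None,
            }
        except Exception as exc:  # rate limit, timeout, outage, ...
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")
```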

## Use cases

| Scenario                  | Configuration                                                   |
| ------------------------- | --------------------------------------------------------------- |
| **Cost optimization**     | Alias `"default"` → cheap model first, expensive fallback       |
| **Provider redundancy**   | Sequential across 2-3 providers for high availability           |
| **A/B testing models**    | WeightedRandom to split traffic between models                  |
| **Load distribution**     | RoundRobin across equivalent models from different providers    |
| **Environment isolation** | Different model in staging vs production                        |
| **Easy model upgrades**   | Change the alias target — all flows using it switch immediately |
