# Model Routing

Model routing lets you configure tenant-level rules for how models are selected. You can create aliases, define fallback chains, and use load-balancing strategies — all without changing your application code.

## How model resolution works

When a request is executed, PromptShuttle resolves the model in this order:

1. **Request override** — If `overrideModel` is set, it takes precedence; if its value is an alias, the alias is resolved through routing rules first
2. **Routing rules** — Check tenant routing rules for the template's model name
3. **Template config** — Use the template's `llm` field and `fallbacks`
4. **Defaults** — Use provider defaults
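The lookup order above can be sketched as follows. Function and field names, and the default model value, are illustrative here — this is not PromptShuttle's actual API, just the documented precedence expressed as code:

```python
# Sketch of the documented resolution order. "rules" maps an alias (or a
# template model name) to its list of candidate models.

def resolve_model(override, rules, template_llm, default="openai/gpt-4o-mini"):
    """Return the candidate models in priority order."""
    # 1. Request override wins, but may itself be an alias.
    if override is not None:
        return rules.get(override, [override])
    # 2. Tenant routing rules keyed by the template's model name.
    if template_llm in rules:
        return rules[template_llm]
    # 3. Template config.
    if template_llm is not None:
        return [template_llm]
    # 4. Provider defaults (value here is a placeholder).
    return [default]
```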

## Routing rules

A routing rule maps an **alias** to one or more **actual models** with a **strategy** for selection.

### Example rules

```json
[
  {
    "alias": "fast",
    "models": ["groq/llama-3.3-70b-versatile", "openai/gpt-4o-mini"],
    "strategy": "Sequential",
    "description": "Fast, cheap model with OpenAI fallback"
  },
  {
    "alias": "smart",
    "models": ["anthropic/claude-sonnet-4-20250514", "openai/gpt-4o"],
    "strategy": "Sequential",
    "description": "High-quality model with fallback"
  },
  {
    "alias": "balanced",
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514", "google/gemini-2.5-flash"],
    "strategy": "RoundRobin",
    "description": "Distribute load across providers"
  }
]
```

### Using aliases

Once defined, use an alias anywhere you'd use a model name:

* In a template's `llm` field: set it to `"fast"` instead of `"groq/llama-3.3-70b-versatile"`
* In a request's `overrideModel`: `"overrideModel": "smart"`
* In the OpenAI endpoint's `model` field: `"model": "balanced"`
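For example, an execution request might carry the alias in `overrideModel`. Only `overrideModel` comes from the docs above; the `variables` field in this sketch is illustrative:

```json
{
  "overrideModel": "smart",
  "variables": { "topic": "quarterly report" }
}
```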

### Strategies

| Strategy           | Behavior                                                                   |
| ------------------ | -------------------------------------------------------------------------- |
| **Sequential**     | Uses the first model. If it fails, tries the next. Classic fallback chain. |
| **Random**         | Randomly selects one model per request.                                    |
| **WeightedRandom** | Randomly selects with weights (e.g. 70% model A, 30% model B).             |
| **RoundRobin**     | Rotates through models evenly across requests.                             |
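The four strategies boil down to simple selection functions. The sketch below is illustrative, not PromptShuttle's internals: `sequential` returns the full chain (primary first, tried in order), while the others pick one model per request:

```python
import random
from itertools import count

def sequential(models):
    # Classic fallback chain: try each in order until one succeeds.
    return list(models)

def random_choice(models):
    # Uniform pick per request.
    return random.choice(models)

def weighted_random(models, weights):
    # Weighted pick per request; weights need not sum to 1.
    return random.choices(models, weights=weights, k=1)[0]

def make_round_robin(models):
    # Rotate through models evenly across successive requests.
    counter = count()
    def pick():
        return models[next(counter) % len(models)]
    return pick
```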

### Weighted random

For `WeightedRandom`, provide a `weights` array matching the `models` array:

```json
{
  "alias": "cost_optimized",
  "models": ["groq/llama-3.3-70b-versatile", "openai/gpt-4o-mini", "openai/gpt-4o"],
  "strategy": "WeightedRandom",
  "weights": [0.5, 0.3, 0.2],
  "description": "50% Groq, 30% GPT-4o-mini, 20% GPT-4o"
}
```

Weights are normalized automatically — they don't need to sum to 1.
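In other words, `[5, 3, 2]` behaves identically to `[0.5, 0.3, 0.2]`. A minimal sketch of that normalization:

```python
def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must contain at least one positive value")
    return [w / total for w in weights]
```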

### Environment-scoped rules

Rules can be restricted to specific environments:

```json
{
  "alias": "default",
  "models": ["openai/gpt-4o"],
  "strategy": "Sequential",
  "environments": ["production"],
  "description": "Use GPT-4o in production only"
}
```

Rules without `environments` apply to all environments.
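The scoping check is simple: a rule matches when it has no `environments` key, or when the current environment appears in its list. A sketch:

```python
def rule_applies(rule, environment):
    """True if the rule is active in the given environment."""
    envs = rule.get("environments")
    return envs is None or environment in envs
```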

## Managing routing rules

### Get current rules

```bash
GET /api/v1/model-routing
```

### Update rules

```bash
PUT /api/v1/model-routing
```

```json
[
  {
    "alias": "fast",
    "models": ["groq/llama-3.3-70b-versatile"],
    "strategy": "Sequential"
  },
  {
    "alias": "default",
    "models": ["openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"],
    "strategy": "Sequential"
  }
]
```

The entire rule set is replaced on update.

### Validation

Rules are validated before saving:

* Aliases must be non-empty
* Each rule must have at least one model
* All model names must be recognized (use `GET /api/v1/models/descriptors` to check)
* `WeightedRandom` rules must have a `weights` array matching the `models` count
* Weights must be non-negative
* No duplicate aliases
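The checks above (except model-name recognition, which requires the descriptors endpoint) can be expressed as a short validation pass. Error messages here are illustrative:

```python
def validate_rules(rules):
    """Raise ValueError on the first rule that violates the documented checks."""
    seen = set()
    for rule in rules:
        alias = rule.get("alias", "")
        if not alias:
            raise ValueError("alias must be non-empty")
        if alias in seen:
            raise ValueError(f"duplicate alias: {alias}")
        seen.add(alias)
        models = rule.get("models", [])
        if not models:
            raise ValueError(f"{alias}: at least one model required")
        if rule.get("strategy") == "WeightedRandom":
            weights = rule.get("weights")
            if weights is None or len(weights) != len(models):
                raise ValueError(f"{alias}: weights must match models count")
            if any(w < 0 for w in weights):
                raise ValueError(f"{alias}: weights must be non-negative")
```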

## Fallback behavior

When using the `Sequential` strategy, the first model is the primary. If it fails (rate limit, timeout, outage), PromptShuttle automatically tries the next model in the list.

The response indicates whether a fallback was used via the `wasFallback` and `fallbackReason` fields in streaming events.
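Conceptually, the fallback loop looks like the sketch below. PromptShuttle handles this server-side; only the `wasFallback` and `fallbackReason` field names come from the docs, and the rest is illustrative:

```python
def run_with_fallback(models, call):
    """Try each model in order; report whether a fallback was used."""
    last_error = None
    for i, model in enumerate(models):
        try:
            result = call(model)
            return {
                "result": result,
                "model": model,
                "wasFallback": i > 0,
                "fallbackReason": str(last_error) if i > 0 else None,
            }
        except Exception as exc:  # rate limit, timeout, outage, ...
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")
```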

## Use cases

| Scenario                  | Configuration                                                   |
| ------------------------- | --------------------------------------------------------------- |
| **Cost optimization**     | Alias `"default"` → cheap model first, expensive fallback       |
| **Provider redundancy**   | Sequential across 2-3 providers for high availability           |
| **A/B testing models**    | WeightedRandom to split traffic between models                  |
| **Load distribution**     | RoundRobin across equivalent models from different providers    |
| **Environment isolation** | Different model in staging vs production                        |
| **Easy model upgrades**   | Change the alias target — all flows using it switch immediately |
