Billing & Credits
PromptShuttle tracks all LLM costs using a credit system and provides controls to prevent runaway spending.
Credit system
All costs are tracked in credits:
1,000,000 credits = $1.00 USD

Every LLM call has a cost based on token usage and the model's pricing:
cost = (input_tokens * input_rate) + (output_tokens * output_rate)

Some models have additional pricing components:
Reasoning tokens — Models like o1 charge for thinking tokens
Cached tokens — Anthropic offers discounted rates for cache-hit tokens
Cache creation tokens — Anthropic charges a premium for writing to cache
Tool costs — Some provider-native tools (e.g. web search) have per-use charges
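Putting the formula and the extra components together, per-call cost accounting could look like the following sketch. The conversion rate (1,000,000 credits per dollar) comes from the docs; the function signature, the assumption that reasoning tokens bill at the output rate, and the treatment of cached tokens as a discounted slice of input are illustrative, not PromptShuttle's actual schema:

```python
CREDITS_PER_USD = 1_000_000  # 1,000,000 credits = $1.00 USD (from the docs)

def call_cost_credits(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,             # $ per million input tokens
    output_rate: float,            # $ per million output tokens
    reasoning_tokens: int = 0,     # assumed billed at the output rate (o1-style models)
    cached_tokens: int = 0,        # cache-hit tokens, assumed carved out of input
    cached_rate: float = 0.0,      # discounted $ per million cache-hit tokens
    tool_cost_usd: float = 0.0,    # flat per-use charges for provider-native tools
) -> int:
    """Return the cost of one LLM call, in credits."""
    usd = (
        (input_tokens - cached_tokens) * input_rate / 1_000_000
        + cached_tokens * cached_rate / 1_000_000
        + (output_tokens + reasoning_tokens) * output_rate / 1_000_000
        + tool_cost_usd
    )
    return round(usd * CREDITS_PER_USD)

# 1,000 input tokens at $3/M plus 500 output tokens at $15/M:
# $0.003 + $0.0075 = $0.0105 → 10,500 credits
print(call_cost_credits(1_000, 500, 3.0, 15.0))  # → 10500
```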
Purchasing credits
Credits are purchased through Stripe:
Navigate to Billing in the PromptShuttle UI
Select a credit package
Complete checkout via Stripe
Credits are added to your tenant balance immediately
Manage your subscription and payment methods via the Stripe Customer Portal, accessible from the billing page.
Cost tracking
Every request tracks cost at multiple levels:
Per-inference costs
Each individual LLM call records:
Input/output/reasoning tokens
Cached token counts
Cost in USD and credits
Model and provider used
Tool invocation costs
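The fields listed above could be modeled as a record like this one. The field names are taken from the bullet list, but the exact shape of PromptShuttle's stored record is an assumption:

```python
from dataclasses import dataclass

@dataclass
class InferenceCost:
    """One LLM call's cost record, mirroring the fields listed above (illustrative)."""
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    cached_tokens: int
    cost_usd: float
    cost_credits: int
    tool_cost_credits: int

rec = InferenceCost(
    model="claude-sonnet", provider="anthropic",
    input_tokens=1_000, output_tokens=500, reasoning_tokens=0,
    cached_tokens=200, cost_usd=0.0105, cost_credits=10_500,
    tool_cost_credits=0,
)
print(rec.cost_credits)  # → 10500
```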
Per-request costs (agentic)
When an agent makes multiple LLM calls (tool-calling loops), the request aggregates:
Total credits used across all iterations
Direct LLM costs vs. child agent costs
Total duration
Total tool calls
Per-tree costs (multi-agent)
For multi-agent workflows, the root request tracks:
Cumulative credits across the entire agent tree
Number of agents spawned
Maximum depth reached
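The per-request and per-tree aggregates above amount to a fold over the agent tree: each node contributes its direct LLM cost, and the root accumulates totals from its children. The tree structure here is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentNode:
    """One agent in a multi-agent request tree (illustrative structure)."""
    llm_credits: int                       # direct LLM cost of this agent's own calls
    children: list["AgentNode"] = field(default_factory=list)

def tree_totals(root: AgentNode) -> dict:
    """Aggregate credits, agents spawned, and max depth over the whole tree."""
    credits = root.llm_credits
    agents = 1
    depth = 1
    for child in root.children:
        sub = tree_totals(child)
        credits += sub["credits"]   # child agent costs roll up to the root
        agents += sub["agents"]
        depth = max(depth, 1 + sub["maxDepth"])
    return {"credits": credits, "agents": agents, "maxDepth": depth}

root = AgentNode(1_000, [AgentNode(400), AgentNode(300, [AgentNode(100)])])
print(tree_totals(root))  # → {'credits': 1800, 'agents': 4, 'maxDepth': 3}
```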
Cost controls
Per-request limit
Set a maximum cost per request to prevent expensive runaway agent loops:
Tenant-level (applies to all requests):
Configure maxRequestCostCredits in your tenant settings.
Per-request override:
When the limit is exceeded, the request stops and returns an error with code COST_LIMIT_EXCEEDED.
Agent depth limit
Limit how deep agent nesting can go:
System default: 10 levels
Tenant override: Set maxAgentDepth in tenant settings
Per-tool override: Set maxAgentDepth on individual agent tools
Per-request override: Pass maxAgentDepth in the request body
When exceeded, returns DEPTH_LIMIT_EXCEEDED.
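The depth settings above form a precedence chain. The system default of 10 and the setting name maxAgentDepth come from the docs; the resolution order (request beats tool beats tenant) is an assumption:

```python
SYSTEM_DEFAULT_DEPTH = 10  # system default from the docs

def resolve_max_depth(tenant=None, tool=None, request=None) -> int:
    """Most specific maxAgentDepth wins: request > tool > tenant > default (assumed order)."""
    for value in (request, tool, tenant):
        if value is not None:
            return value
    return SYSTEM_DEFAULT_DEPTH

def check_depth(current_depth: int, **overrides) -> None:
    """Raise once an agent tries to nest deeper than the resolved limit."""
    if current_depth > resolve_max_depth(**overrides):
        raise RuntimeError("DEPTH_LIMIT_EXCEEDED")

print(resolve_max_depth())                  # → 10
print(resolve_max_depth(tenant=5, tool=3))  # → 3
```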
Cost alert webhooks
Configure a webhook to be notified when cumulative costs exceed a threshold:
Set costAlertThresholdCredits on your tenant
Set alertWebhookUrl to receive POST notifications
PromptShuttle sends a POST to your webhook URL with cost details when the threshold is crossed during a request.
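The sending side could look like this stdlib-only sketch. The setting names come from the docs, but the payload fields are hypothetical; consult the webhook reference for the exact shape PromptShuttle posts:

```python
import json
import urllib.request

def maybe_send_cost_alert(cumulative_credits: int, threshold: int, webhook_url: str) -> bool:
    """POST a JSON alert once cumulative cost crosses costAlertThresholdCredits.

    Payload fields below are illustrative, not PromptShuttle's documented schema.
    """
    if cumulative_credits < threshold:
        return False  # threshold not yet crossed; nothing to send
    payload = json.dumps({
        "type": "costAlert",                     # hypothetical field
        "thresholdCredits": threshold,
        "cumulativeCredits": cumulative_credits,
    }).encode()
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status == 200
```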
Credit balance
Check your current balance in the API response — every flow run and inference response includes:
creditsUsed — how many credits this request consumed
creditsLeft — remaining tenant balance
The balance is also visible on the dashboard.
Cost in responses
Flow execution response
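A flow execution response might look like the following. Only the creditsUsed and creditsLeft fields are documented above; the rest of the body is illustrative:

```python
import json

# Illustrative response body; only creditsUsed and creditsLeft are documented fields.
body = """{
  "output": "Hello!",
  "creditsUsed": 10500,
  "creditsLeft": 4989500
}"""

resp = json.loads(body)
remaining_usd = resp["creditsLeft"] / 1_000_000  # 1,000,000 credits = $1.00
print(resp["creditsUsed"], f"${remaining_usd:.2f} left")  # → 10500 $4.99 left
```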
Streaming usage updates
When streaming with include_usage: true, periodic usageUpdate events show real-time cost:
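Consuming those events could be sketched like this. The event name usageUpdate and the include_usage: true option come from the docs; the event shapes and the assumption that creditsUsed is a cumulative running total are illustrative:

```python
def handle_stream(events) -> int:
    """Fold periodic usageUpdate events from a stream into a running cost total."""
    total = 0
    for event in events:
        if event.get("type") == "usageUpdate":
            total = event.get("creditsUsed", total)  # cumulative total (assumption)
        elif event.get("type") == "delta":
            pass  # ordinary content chunks; shape is illustrative
    return total

events = [
    {"type": "delta", "text": "Hel"},
    {"type": "usageUpdate", "creditsUsed": 4200},
    {"type": "delta", "text": "lo"},
    {"type": "usageUpdate", "creditsUsed": 9100},
]
print(handle_stream(events))  # → 9100
```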
Pricing
Model pricing varies by provider and model. View current pricing via:
Each model includes pricingGraduations — volume-based tiers:
Prices are in $/million tokens.
For Anthropic models, pricing also includes:
cachedInput — Rate for cache-hit tokens
cacheCreationInput — Rate for cache-write tokens
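Selecting a rate from volume-based tiers could work like this. The name pricingGraduations and the $/million-token unit come from the docs; the tier structure below is an assumption, so check the pricing data for the actual shape:

```python
def rate_for_volume(graduations, monthly_tokens: int) -> float:
    """Pick the $/million-token rate for a usage volume from ordered tiers.

    Tiers are assumed sorted ascending by their starting volume; the highest
    tier whose threshold the volume reaches applies (illustrative semantics).
    """
    rate = graduations[0]["ratePerMillion"]
    for tier in graduations:
        if monthly_tokens >= tier["fromTokens"]:
            rate = tier["ratePerMillion"]
    return rate

tiers = [
    {"fromTokens": 0,           "ratePerMillion": 3.00},
    {"fromTokens": 100_000_000, "ratePerMillion": 2.50},
    {"fromTokens": 500_000_000, "ratePerMillion": 2.00},
]
print(rate_for_volume(tiers, 250_000_000))  # → 2.5
```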