Billing & Credits

PromptShuttle tracks all LLM costs using a credit system and provides controls to prevent runaway spending.

Credit system

All costs are tracked in credits:

1,000,000 credits = $1.00 USD

Every LLM call has a cost based on token usage and the model's pricing:

cost = (input_tokens * input_rate) + (output_tokens * output_rate)

Some models have additional pricing components:

  • Reasoning tokens — Models like o1 charge for thinking tokens

  • Cached tokens — Anthropic offers discounted rates for cache-hit tokens

  • Cache creation tokens — Anthropic charges a premium for writing to cache

  • Tool costs — Some provider-native tools (e.g. web search) have per-use charges
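As a sketch of how these components might combine (billing reasoning tokens at the output rate and the cache-discount rule are assumptions here, and all rates are illustrative, not actual model pricing):

```python
# Sketch of the cost formula, extended with the optional components above.
# Rates are in credits per token; since 1,000,000 credits = $1.00, a rate
# of $3/million tokens is exactly 3 credits per token.

def call_cost_credits(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,
    output_rate: float,
    reasoning_tokens: int = 0,       # o1-style "thinking" tokens
    cached_tokens: int = 0,          # cache-hit portion of the input
    cached_rate: float = 0.0,        # discounted rate for cache hits
    tool_cost_credits: float = 0.0,  # per-use charges for native tools
) -> float:
    # Cache-hit tokens are billed at the discounted rate instead of input_rate.
    cost = (input_tokens - cached_tokens) * input_rate + cached_tokens * cached_rate
    # Reasoning tokens are billed at the output rate (assumption).
    cost += (output_tokens + reasoning_tokens) * output_rate
    return cost + tool_cost_credits

# 1,000 input and 500 output tokens at $3/$15 per million:
# 1,000 * 3 + 500 * 15 = 10,500 credits, i.e. about $0.0105.
```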

Purchasing credits

Credits are purchased through Stripe:

  1. Navigate to Billing in the PromptShuttle UI

  2. Select a credit package

  3. Complete checkout via Stripe

  4. Credits are added to your tenant balance immediately

Manage your subscription and payment methods via the Stripe Customer Portal, accessible from the billing page.

Cost tracking

Every request tracks cost at multiple levels:

Per-inference costs

Each individual LLM call records:

  • Input/output/reasoning tokens

  • Cached token counts

  • Cost in USD and credits

  • Model and provider used

  • Tool invocation costs
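For illustration, a per-inference record covering these fields might look like the following (the schema and values are assumptions, not PromptShuttle's actual format):

```python
# A hypothetical per-inference cost record; field names are illustrative.
inference_record = {
    "model": "gpt-4o",
    "provider": "openai",
    "inputTokens": 1200,
    "outputTokens": 350,
    "reasoningTokens": 0,
    "cachedTokens": 800,
    "toolCostCredits": 0,
    "costCredits": 6690,
    "costUsd": 6690 / 1_000_000,  # 1,000,000 credits = $1.00
}
```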

Per-request costs (agentic)

When an agent makes multiple LLM calls (tool-calling loops), the request aggregates:

  • Total credits used across all iterations

  • Direct LLM costs vs. child agent costs

  • Total duration

  • Total tool calls

Per-tree costs (multi-agent)

For multi-agent workflows, the root request tracks:

  • Cumulative credits across the entire agent tree

  • Number of agents spawned

  • Maximum depth reached
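The per-request and per-tree rollups above can be sketched as a recursive aggregation over the agent tree (the node structure here is illustrative, not PromptShuttle's internal representation):

```python
# Sketch: rolling up costs across an agent tree. Each node is one
# request; "children" are the agents it spawned.

def tree_totals(node: dict, depth: int = 0) -> dict:
    totals = {
        "credits": node["directCredits"],  # this agent's own LLM calls
        "agents": 1,
        "maxDepth": depth,
    }
    for child in node.get("children", []):
        sub = tree_totals(child, depth + 1)
        totals["credits"] += sub["credits"]
        totals["agents"] += sub["agents"]
        totals["maxDepth"] = max(totals["maxDepth"], sub["maxDepth"])
    return totals

root = {
    "directCredits": 5000,
    "children": [
        {"directCredits": 2000, "children": [{"directCredits": 1000}]},
        {"directCredits": 1500},
    ],
}
# tree_totals(root) -> 9,500 credits, 4 agents, maximum depth 2.
```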

Cost controls

Per-request limit

Set a maximum cost per request to prevent expensive runaway agent loops:

Tenant-level (applies to all requests):

Configure maxRequestCostCredits in your tenant settings.

Per-request override: pass a cost limit in the request body.

When the limit is exceeded, the request stops and returns an error with code COST_LIMIT_EXCEEDED.
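A minimal sketch of the two levels (only maxRequestCostCredits and COST_LIMIT_EXCEEDED come from the text above; the request-body field and error shape are assumptions):

```python
# Tenant-level: applies to every request for the tenant.
tenant_settings = {"maxRequestCostCredits": 50_000}  # 50,000 credits = $0.05

# Per-request: a tighter limit for one call (field name is hypothetical).
request_body = {
    "input": "Summarize this document",
    "maxRequestCostCredits": 10_000,  # 10,000 credits = $0.01
}

# Error returned when the running cost crosses the limit:
error = {"code": "COST_LIMIT_EXCEEDED"}
```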

Agent depth limit

Limit how deep agent nesting can go:

  • System default: 10 levels

  • Tenant override: Set maxAgentDepth in tenant settings

  • Per-tool override: Set maxAgentDepth on individual agent tools

  • Per-request override: Pass a depth limit in the request body

When the depth limit is exceeded, the request returns DEPTH_LIMIT_EXCEEDED.
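One way the four levels might resolve is most-specific-wins; this precedence order is an assumption, not documented behavior:

```python
# Sketch: resolve the effective depth limit from the four levels above.
def effective_depth_limit(request=None, tool=None, tenant=None, system_default=10):
    # Take the first value set, from most specific to least specific.
    for value in (request, tool, tenant):
        if value is not None:
            return value
    return system_default
```

With no overrides set, this falls back to the system default of 10 levels.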

Cost alert webhooks

Configure a webhook to be notified when cumulative costs exceed a threshold:

  • Set costAlertThresholdCredits on your tenant

  • Set alertWebhookUrl to receive POST notifications

PromptShuttle sends a POST to your webhook URL with cost details when the threshold is crossed during a request.
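A hypothetical example of that POST body (costAlertThresholdCredits and alertWebhookUrl are the documented settings; these payload field names are assumptions):

```python
webhook_payload = {
    "event": "costAlert",
    "tenantId": "tnt_abc123",
    "requestId": "req_def456",
    "thresholdCredits": 100_000,   # the configured costAlertThresholdCredits
    "cumulativeCredits": 104_250,  # cost at the moment the threshold was crossed
}
```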

Credit balance

Check your current balance in the API response — every flow run and inference response includes:

  • creditsUsed — how many credits this request consumed

  • creditsLeft — remaining tenant balance

The balance is also visible on the dashboard.

Cost in responses

Flow execution response
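A hypothetical flow execution response, trimmed to the billing fields (creditsUsed and creditsLeft are documented above; the rest is illustrative):

```python
flow_response = {
    "output": "...",
    "creditsUsed": 10_500,     # credits consumed by this request
    "creditsLeft": 4_989_500,  # remaining tenant balance (~$4.99)
}
```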

Streaming usage updates

When streaming with include_usage: true, periodic usageUpdate events report real-time cost.
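A sketch of what one such event might carry (the usageUpdate event name and include_usage flag come from the text above; the field names are assumptions):

```python
usage_update = {
    "type": "usageUpdate",
    "inputTokens": 1200,
    "outputTokens": 180,  # output tokens streamed so far
    "creditsUsed": 6300,  # running credit total for this request
}
```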

Pricing

Model pricing varies by provider and model; current per-model pricing is available via the API.

Each model includes pricingGraduations — volume-based pricing tiers. Prices are in $/million tokens.

For Anthropic models, pricing also includes:

  • cachedInput — Rate for cache-hit tokens

  • cacheCreationInput — Rate for cache-write tokens
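Putting these together, a single model's pricing entry might look like the following (the values and the exact tier-selection rule are illustrative assumptions; only the field names pricingGraduations, cachedInput, and cacheCreationInput come from the text above):

```python
# Illustrative pricing entry for one model. All rates are $/million tokens.
model_pricing = {
    "model": "claude-sonnet-4",
    "provider": "anthropic",
    "pricingGraduations": [
        {"fromTokens": 0,           "input": 3.0, "output": 15.0},
        {"fromTokens": 100_000_000, "input": 2.5, "output": 12.0},
    ],
    "cachedInput": 0.3,          # cache-hit rate
    "cacheCreationInput": 3.75,  # cache-write rate
}

def rates_for_volume(pricing: dict, volume_tokens: int) -> dict:
    """Pick the graduation tier whose volume threshold has been passed."""
    tiers = [t for t in pricing["pricingGraduations"] if t["fromTokens"] <= volume_tokens]
    return max(tiers, key=lambda t: t["fromTokens"])
```

At 200M tokens of volume, this selects the discounted 2.5/12.0 tier; below 100M it selects the base tier.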
