Billing & Credits
PromptShuttle tracks all LLM costs using a credit system and provides controls to prevent runaway spending.
Credit system
All costs are tracked in credits:
1,000,000 credits = $1.00 USD

Every LLM call has a cost based on token usage and the model's pricing:
cost = (input_tokens * input_rate) + (output_tokens * output_rate)

Some models have additional pricing components:
Reasoning tokens — Models like o1 charge for thinking tokens
Cached tokens — Anthropic offers discounted rates for cache-hit tokens
Cache creation tokens — Anthropic charges a premium for writing to cache
Tool costs — Some provider-native tools (e.g. web search) have per-use charges
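Putting the formula and the extra components together, per-call cost accounting could look like the following sketch. The conversion rate (1,000,000 credits per dollar) comes from the docs; the function signature, the assumption that reasoning tokens bill at the output rate, and the treatment of cached tokens as a discounted slice of input are illustrative, not PromptShuttle's actual schema:

```python
CREDITS_PER_USD = 1_000_000  # 1,000,000 credits = $1.00 USD (from the docs)

def call_cost_credits(
    input_tokens: int,
    output_tokens: int,
    input_rate: float,             # $ per million input tokens
    output_rate: float,            # $ per million output tokens
    reasoning_tokens: int = 0,     # assumed billed at the output rate (o1-style models)
    cached_tokens: int = 0,        # cache-hit tokens, assumed carved out of input
    cached_rate: float = 0.0,      # discounted $ per million cache-hit tokens
    tool_cost_usd: float = 0.0,    # flat per-use charges for provider-native tools
) -> int:
    """Return the cost of one LLM call, in credits."""
    usd = (
        (input_tokens - cached_tokens) * input_rate / 1_000_000
        + cached_tokens * cached_rate / 1_000_000
        + (output_tokens + reasoning_tokens) * output_rate / 1_000_000
        + tool_cost_usd
    )
    return round(usd * CREDITS_PER_USD)

# 1,000 input tokens at $3/M plus 500 output tokens at $15/M:
# $0.003 + $0.0075 = $0.0105 → 10,500 credits
print(call_cost_credits(1_000, 500, 3.0, 15.0))  # → 10500
```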
Purchasing credits
Credits are purchased through Stripe:
Navigate to Billing in the PromptShuttle UI
Select a credit package
Complete checkout via Stripe
Credits are added to your tenant balance immediately
Manage your subscription and payment methods via the Stripe Customer Portal, accessible from the billing page.
Cost tracking
Every request tracks cost at multiple levels:
Per-inference costs
Each individual LLM call records:
Input/output/reasoning tokens
Cached token counts
Cost in USD and credits
Model and provider used
Tool invocation costs
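The fields listed above could be modeled as a record like this one. The field names are taken from the bullet list, but the exact shape of PromptShuttle's stored record is an assumption:

```python
from dataclasses import dataclass

@dataclass
class InferenceCost:
    """One LLM call's cost record, mirroring the fields listed above (illustrative)."""
    model: str
    provider: str
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int
    cached_tokens: int
    cost_usd: float
    cost_credits: int
    tool_cost_credits: int

rec = InferenceCost(
    model="claude-sonnet", provider="anthropic",
    input_tokens=1_000, output_tokens=500, reasoning_tokens=0,
    cached_tokens=200, cost_usd=0.0105, cost_credits=10_500,
    tool_cost_credits=0,
)
print(rec.cost_credits)  # → 10500
```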
Per-request costs (agentic)
When an agent makes multiple LLM calls (tool-calling loops), the request aggregates:
Total credits used across all iterations
Direct LLM costs vs. child agent costs
Total duration
Total tool calls
Per-tree costs (multi-agent)
For multi-agent workflows, the root request tracks:
Cumulative credits across the entire agent tree
Number of agents spawned
Maximum depth reached
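The per-request and per-tree aggregates above amount to a fold over the agent tree: each node contributes its direct LLM cost, and the root accumulates totals from its children. The tree structure here is illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentNode:
    """One agent in a multi-agent request tree (illustrative structure)."""
    llm_credits: int                       # direct LLM cost of this agent's own calls
    children: list["AgentNode"] = field(default_factory=list)

def tree_totals(root: AgentNode) -> dict:
    """Aggregate credits, agents spawned, and max depth over the whole tree."""
    credits = root.llm_credits
    agents = 1
    depth = 1
    for child in root.children:
        sub = tree_totals(child)
        credits += sub["credits"]   # child agent costs roll up to the root
        agents += sub["agents"]
        depth = max(depth, 1 + sub["maxDepth"])
    return {"credits": credits, "agents": agents, "maxDepth": depth}

root = AgentNode(1_000, [AgentNode(400), AgentNode(300, [AgentNode(100)])])
print(tree_totals(root))  # → {'credits': 1800, 'agents': 4, 'maxDepth': 3}
```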
Cost controls
Per-request limit
Set a maximum cost per request to prevent expensive runaway agent loops:
Tenant-level (applies to all requests):
Configure maxRequestCostCredits in your tenant settings.
Per-request override:
When the limit is exceeded, the request stops and returns an error with code COST_LIMIT_EXCEEDED.
Agent depth limit
Limit how deep agent nesting can go:
System default: 10 levels
Tenant override: Set maxAgentDepth in tenant settings
Per-tool override: Set maxAgentDepth on individual agent tools
Per-request override: Pass maxAgentDepth in the request body
When exceeded, returns DEPTH_LIMIT_EXCEEDED.
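The depth settings above form a precedence chain. The system default of 10 and the setting name maxAgentDepth come from the docs; the resolution order (request beats tool beats tenant) is an assumption:

```python
SYSTEM_DEFAULT_DEPTH = 10  # system default from the docs

def resolve_max_depth(tenant=None, tool=None, request=None) -> int:
    """Most specific maxAgentDepth wins: request > tool > tenant > default (assumed order)."""
    for value in (request, tool, tenant):
        if value is not None:
            return value
    return SYSTEM_DEFAULT_DEPTH

def check_depth(current_depth: int, **overrides) -> None:
    """Raise once an agent tries to nest deeper than the resolved limit."""
    if current_depth > resolve_max_depth(**overrides):
        raise RuntimeError("DEPTH_LIMIT_EXCEEDED")

print(resolve_max_depth())                  # → 10
print(resolve_max_depth(tenant=5, tool=3))  # → 3
```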
Cost alert webhooks
Configure a webhook to be notified when cumulative costs exceed a threshold:
Set costAlertThresholdCredits on your tenant
Set alertWebhookUrl to receive POST notifications
PromptShuttle sends a POST to your webhook URL with cost details when the threshold is crossed during a request.
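The sending side could look like this stdlib-only sketch. The setting names come from the docs, but the payload fields are hypothetical; consult the webhook reference for the exact shape PromptShuttle posts:

```python
import json
import urllib.request

def maybe_send_cost_alert(cumulative_credits: int, threshold: int, webhook_url: str) -> bool:
    """POST a JSON alert once cumulative cost crosses costAlertThresholdCredits.

    Payload fields below are illustrative, not PromptShuttle's documented schema.
    """
    if cumulative_credits < threshold:
        return False  # threshold not yet crossed; nothing to send
    payload = json.dumps({
        "type": "costAlert",                     # hypothetical field
        "thresholdCredits": threshold,
        "cumulativeCredits": cumulative_credits,
    }).encode()
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status == 200
```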
Credit balance
Check your current balance in the API response — every flow run and inference response includes:
creditsUsed — how many credits this request consumed
creditsLeft — remaining tenant balance
The balance is also visible on the dashboard.
Cost in responses
Flow execution response
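A flow execution response might look like the following. Only the creditsUsed and creditsLeft fields are documented above; the rest of the body is illustrative:

```python
import json

# Illustrative response body; only creditsUsed and creditsLeft are documented fields.
body = """{
  "output": "Hello!",
  "creditsUsed": 10500,
  "creditsLeft": 4989500
}"""

resp = json.loads(body)
remaining_usd = resp["creditsLeft"] / 1_000_000  # 1,000,000 credits = $1.00
print(resp["creditsUsed"], f"${remaining_usd:.2f} left")  # → 10500 $4.99 left
```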
Streaming usage updates
When streaming with include_usage: true, periodic usageUpdate events show real-time cost:
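Consuming those events could be sketched like this. The event name usageUpdate and the include_usage: true option come from the docs; the event shapes and the assumption that creditsUsed is a cumulative running total are illustrative:

```python
def handle_stream(events) -> int:
    """Fold periodic usageUpdate events from a stream into a running cost total."""
    total = 0
    for event in events:
        if event.get("type") == "usageUpdate":
            total = event.get("creditsUsed", total)  # cumulative total (assumption)
        elif event.get("type") == "delta":
            pass  # ordinary content chunks; shape is illustrative
    return total

events = [
    {"type": "delta", "text": "Hel"},
    {"type": "usageUpdate", "creditsUsed": 4200},
    {"type": "delta", "text": "lo"},
    {"type": "usageUpdate", "creditsUsed": 9100},
]
print(handle_stream(events))  # → 9100
```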
Pricing
Model pricing varies by provider and model. View current pricing via:
Each model includes pricingGraduations — volume-based tiers:
Prices are in $/million tokens.
For Anthropic models, pricing also includes:
cachedInput — Rate for cache-hit tokens
cacheCreationInput — Rate for cache-write tokens
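Selecting a rate from volume-based tiers could work like this. The name pricingGraduations and the $/million-token unit come from the docs; the tier structure below is an assumption, so check the pricing data for the actual shape:

```python
def rate_for_volume(graduations, monthly_tokens: int) -> float:
    """Pick the $/million-token rate for a usage volume from ordered tiers.

    Tiers are assumed sorted ascending by their starting volume; the highest
    tier whose threshold the volume reaches applies (illustrative semantics).
    """
    rate = graduations[0]["ratePerMillion"]
    for tier in graduations:
        if monthly_tokens >= tier["fromTokens"]:
            rate = tier["ratePerMillion"]
    return rate

tiers = [
    {"fromTokens": 0,           "ratePerMillion": 3.00},
    {"fromTokens": 100_000_000, "ratePerMillion": 2.50},
    {"fromTokens": 500_000_000, "ratePerMillion": 2.00},
]
print(rate_for_volume(tiers, 250_000_000))  # → 2.5
```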