# Streaming (SSE)

PromptShuttle supports Server-Sent Events for real-time visibility into multi-agent execution. When you enable streaming on the OpenAI-compatible endpoint, you get a structured event stream covering the full lifecycle of your request — from agent starts through tool calls to final completion.

## Enabling streaming

Set `stream: true` on the OpenAI-compatible endpoint:

```bash
curl -N -X POST https://app.promptshuttle.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": [{"type": "text", "text": "Research quantum computing"}]}],
    "stream": true,
    "stream_options": {
      "include_usage": true,
      "usage_interval_ms": 3000
    }
  }'
```

## Response format

The response uses standard SSE format with `Content-Type: text/event-stream`:

```
id: 67a1b2c3d4e5f6a7b8c9d0e1
data: {"id":"...","timestamp":"...","requestId":"...","type":"requestStarted","data":{...}}

id: 67a1b2c3d4e5f6a7b8c9d0e2
data: {"id":"...","timestamp":"...","requestId":"...","type":"agentInferenceStarted","data":{...}}

...

data: [DONE]
```

Each event's `data:` line carries a JSON object with these fields:

| Field       | Type     | Description                                                                        |
| ----------- | -------- | ---------------------------------------------------------------------------------- |
| `id`        | string   | Unique event ID                                                                    |
| `timestamp` | datetime | UTC ISO 8601 with milliseconds                                                     |
| `requestId` | string   | Root request ID                                                                    |
| `type`      | string   | Event type (see below)                                                             |
| `agentPath` | array    | Breadcrumb of agent roles from root to current (e.g. `["main", "research_agent"]`) |
| `depth`     | integer  | Nesting depth in the agent tree                                                    |
| `data`      | object   | Event-specific payload                                                             |

The stream ends with `data: [DONE]`.

The response includes the header `X-Request-Id` with the root request ID.
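The wire format above can be decoded with a small accumulator: collect `id:` and `data:` fields until a blank line terminates the event, and stop at the `[DONE]` sentinel. A minimal sketch (transport is left to your HTTP client; only the framing shown above is assumed):

```python
import json

def parse_sse(lines):
    """Yield (event_id, event_dict) tuples from an iterable of SSE lines.

    Stops when the terminal `data: [DONE]` sentinel is seen.
    """
    event_id, data_parts = None, []
    for line in lines:
        if line.startswith("id:"):
            event_id = line[3:].strip()
        elif line.startswith("data:"):
            data_parts.append(line[5:].strip())
        elif line == "":  # a blank line ends one event
            if data_parts:
                data = "\n".join(data_parts)
                if data == "[DONE]":
                    return
                yield event_id, json.loads(data)
            event_id, data_parts = None, []
```

With `httpx`, you would feed it `response.iter_lines()` and iterate the resulting `(id, event)` pairs.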

## Event types

### Lifecycle events

#### `requestStarted`

Emitted when the request begins processing.

```json
{
  "flowName": "my_flow",
  "contextId": "flow_object_id",
  "model": "gpt-4o",
  "hasTools": true,
  "toolCount": 3
}
```

#### `requestCompleted`

Emitted when the entire request (including all agents) finishes.

```json
{
  "durationMs": 5420,
  "totalCreditsUsed": 8200,
  "totalCostUsd": 0.0082,
  "totalTokensIn": 1250,
  "totalTokensOut": 890,
  "totalInferenceCount": 4,
  "totalToolCalls": 2,
  "totalAgentSpawns": 1,
  "maxDepthReached": 1,
  "result": { "textResponse": "..." }
}
```
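In the example payloads throughout this document, credits and USD track a fixed ratio of 1 credit = $0.000001 (8200 credits ↔ $0.0082; 650 ↔ $0.00065). Treat that as an observation from the examples rather than a guaranteed contract; a conversion helper would look like:

```python
def credits_to_usd(credits, usd_per_credit=1e-6):
    """Convert credits to USD.

    The default rate matches the ratio seen in the example payloads;
    confirm it against your account's actual pricing before relying on it.
    """
    return credits * usd_per_credit
```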

#### `requestFailed`

Emitted if the request fails fatally.

### Agent events

#### `agentStarted`

An agent (sub-template) begins execution.

```json
{
  "agentRole": "research_agent",
  "templateId": "template_object_id",
  "templateName": "research",
  "parentRequestId": "parent_id",
  "childRequestId": "child_id",
  "parameters": { "topic": "quantum computing" }
}
```

#### `agentInferenceStarted`

An LLM call begins within an agent.

```json
{
  "inferenceRequestId": "inference_id",
  "model": "gpt-4o",
  "provider": "openai",
  "messageCount": 5,
  "hasTools": true,
  "toolCount": 2
}
```

#### `agentInferenceCompleted`

An LLM call finishes.

```json
{
  "inferenceRequestId": "inference_id",
  "model": "gpt-4o",
  "provider": "openai",
  "durationMs": 1200,
  "usage": {
    "tokensIn": 450,
    "tokensOut": 200,
    "reasoningTokens": 0,
    "costCredits": 650,
    "costUsd": 0.00065
  },
  "finishReason": "stop",
  "toolCallCount": 0,
  "wasCached": false,
  "wasFallback": false
}
```
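Individual `usage` payloads can be rolled up client-side to cross-check the totals later reported in `requestCompleted`. A sketch, using the field names from the payload above:

```python
from collections import Counter

def sum_usage(inference_events):
    """Sum the `usage` payloads of agentInferenceCompleted events."""
    totals = Counter()
    for ev in inference_events:
        totals.update(ev["data"]["usage"])  # Counter adds values per key
    return dict(totals)
```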

#### `agentCompleted`

An agent finishes all its work.

```json
{
  "childRequestId": "child_id",
  "agentRole": "research_agent",
  "durationMs": 3200,
  "totalCreditsUsed": 4700,
  "directCreditsUsed": 4700,
  "inferenceCount": 2,
  "toolCallCount": 1,
  "childAgentCount": 0,
  "status": "completed",
  "resultPreview": "Based on my research..."
}
```

#### `agentFailed`

An agent encounters an error.
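The `parentRequestId`/`childRequestId` pairs in `agentStarted` events let you reconstruct the agent tree as the stream arrives. A sketch, assuming one `agentStarted` event per spawn (the `"main"` root role is an illustrative placeholder):

```python
def build_agent_tree(agent_started_events, root_id):
    """Map each request ID to the IDs of agents it spawned, plus roles."""
    children = {root_id: []}
    roles = {root_id: "main"}
    for ev in agent_started_events:
        d = ev["data"]
        children.setdefault(d["parentRequestId"], []).append(d["childRequestId"])
        children.setdefault(d["childRequestId"], [])
        roles[d["childRequestId"]] = d["agentRole"]
    return children, roles
```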

### Tool events

#### `toolStarted`

A tool is about to be invoked.

```json
{
  "toolName": "search_api",
  "toolType": "External",
  "callId": "call_abc123",
  "arguments": { "query": "quantum computing breakthroughs 2025" },
  "targetUrl": "https://api.example.com/search"
}
```

#### `toolCompleted`

A tool invocation finishes.

```json
{
  "toolName": "search_api",
  "toolType": "External",
  "callId": "call_abc123",
  "durationMs": 450,
  "status": "success",
  "resultPreview": "Found 15 results for...",
  "resultSize": 4200
}
```

For agent-type tools, the payload also includes:

```json
{
  "childRequestId": "spawned_request_id",
  "childCreditsUsed": 2300
}
```

#### `toolFailed`

A tool invocation errors.
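`callId` is the join key between a `toolStarted` event and its matching `toolCompleted` or `toolFailed`, which makes it easy to track in-flight tools. A sketch, assuming `toolFailed` also carries the `callId` as `toolCompleted` does:

```python
def track_tools(events):
    """Pair toolStarted with toolCompleted/toolFailed by callId.

    Returns (in_flight, finished): events still awaiting a result,
    and (start_event, end_event) pairs for completed calls.
    """
    in_flight, finished = {}, {}
    for ev in events:
        call_id = ev["data"]["callId"]
        if ev["type"] == "toolStarted":
            in_flight[call_id] = ev
        else:  # toolCompleted or toolFailed
            finished[call_id] = (in_flight.pop(call_id, None), ev)
    return in_flight, finished
```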

### Usage events

#### `usageUpdate`

Periodic cost and progress updates (interval controlled by `usage_interval_ms`).

```json
{
  "cumulativeCreditsUsed": 3400,
  "cumulativeCostUsd": 0.0034,
  "cumulativeTokensIn": 800,
  "cumulativeTokensOut": 350,
  "activeAgents": 1,
  "completedAgents": 2,
  "elapsedMs": 4500
}
```
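`usageUpdate` events make client-side budget enforcement straightforward: watch `cumulativeCreditsUsed` and close the connection once it crosses your ceiling. A sketch of that check (the ceiling value is up to you):

```python
def over_budget(event, max_credits):
    """True when a usageUpdate event shows spend at or past the ceiling."""
    return (
        event["type"] == "usageUpdate"
        and event["data"]["cumulativeCreditsUsed"] >= max_credits
    )
```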

### System events

#### `heartbeat`

Keep-alive sent at `heartbeat_interval_ms` intervals.

```json
{
  "elapsedMs": 30000,
  "eventCount": 12
}
```
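Because heartbeats arrive even when nothing else is happening, their absence is a reliable staleness signal: if no event of any kind arrives within a couple of heartbeat intervals, treat the connection as dead and reconnect. A sketch of that check (the 2.5-interval tolerance is an arbitrary example):

```python
def is_stale(last_event_at, now, heartbeat_interval_ms, tolerance=2.5):
    """True if no event has arrived for `tolerance` heartbeat intervals.

    Timestamps are in seconds (e.g. time.monotonic() readings).
    """
    return (now - last_event_at) * 1000 > heartbeat_interval_ms * tolerance
```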

#### `error`

A recoverable or non-recoverable error occurred.

```json
{
  "code": "COST_LIMIT_EXCEEDED",
  "message": "Request exceeded the maximum cost of 50000 credits",
  "agentRole": "research_agent",
  "recoverable": false
}
```
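The `recoverable` flag tells the client whether the stream is still worth consuming after an `error` event. A sketch of that decision:

```python
def should_continue(event):
    """Keep reading after a recoverable error; stop otherwise."""
    if event["type"] != "error":
        return True
    return bool(event["data"].get("recoverable"))
```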

## Filtering events

Use `event_types` in stream options to receive only the events you need:

```json
{
  "stream": true,
  "stream_options": {
    "event_types": ["requestCompleted", "error", "usageUpdate"]
  }
}
```

## Client example

```python
import json
import httpx

with httpx.stream(
    "POST",
    "https://app.promptshuttle.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}],
        "stream": True,
    },
) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            payload = line[6:]
            if payload == "[DONE]":
                break
            event = json.loads(payload)
            print(f"[{event['type']}] {json.dumps(event['data'], indent=2)}")
```
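Once events are decoded, routing them by `type` keeps handlers small. A sketch that extends the loop above with a handler registry (the `show_spend` handler is an illustrative example):

```python
HANDLERS = {}

def on(event_type):
    """Decorator registering a handler for one event type."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on("usageUpdate")
def show_spend(data):
    return f"{data['cumulativeCreditsUsed']} credits so far"

def dispatch(event):
    """Route an event to its handler; ignore unregistered types."""
    handler = HANDLERS.get(event["type"])
    return handler(event["data"]) if handler else None
```

Inside the streaming loop, replace the `print` call with `dispatch(event)` to fan events out to the registered handlers.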
