# Streaming (SSE)

PromptShuttle supports Server-Sent Events (SSE) for real-time visibility into multi-agent execution. When you enable streaming on the OpenAI-compatible endpoint, you receive a structured event stream covering the full lifecycle of your request, from agent starts through tool calls to final completion.

## Enabling streaming

Set `stream: true` in the request body sent to the OpenAI-compatible endpoint:

```bash
curl -N -X POST https://app.promptshuttle.com/api/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": [{"type": "text", "text": "Research quantum computing"}]}],
    "stream": true,
    "stream_options": {
      "include_usage": true,
      "usage_interval_ms": 3000
    }
  }'
```

## Response format

The response uses standard SSE format with `Content-Type: text/event-stream`:

```
id: 67a1b2c3d4e5f6a7b8c9d0e1
data: {"id":"...","timestamp":"...","requestId":"...","type":"requestStarted","data":{...}}

id: 67a1b2c3d4e5f6a7b8c9d0e2
data: {"id":"...","timestamp":"...","requestId":"...","type":"agentInferenceStarted","data":{...}}

...

data: [DONE]
```

The `data:` line of each event carries a JSON object with these fields:

| Field       | Type     | Description                                                                        |
| ----------- | -------- | ---------------------------------------------------------------------------------- |
| `id`        | string   | Unique event ID                                                                    |
| `timestamp` | datetime | UTC ISO 8601 with milliseconds                                                     |
| `requestId` | string   | Root request ID                                                                    |
| `type`      | string   | Event type (see below)                                                             |
| `agentPath` | array    | Breadcrumb of agent roles from root to current (e.g. `["main", "research_agent"]`) |
| `depth`     | integer  | Nesting depth in the agent tree                                                    |
| `data`      | object   | Event-specific payload                                                             |

The stream ends with `data: [DONE]`.

The response includes the header `X-Request-Id` with the root request ID.
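
The framing described above can be decoded with a small line-based reader. The following is a minimal sketch, assuming each event arrives as an optional `id:` line followed by a single `data:` line and a blank separator, as in the sample; `iter_events` is a hypothetical helper name, and multi-line `data:` fields (which the SSE format permits) are not handled.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded event objects from SSE lines until `data: [DONE]`."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("data:"):
            # Skip `id:` lines, blank separators, and anything else.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # End-of-stream sentinel; ignore anything after it.
        yield json.loads(payload)


sample = [
    "id: 67a1b2c3d4e5f6a7b8c9d0e1",
    'data: {"type": "requestStarted", "data": {"model": "gpt-4o"}}',
    "",
    "data: [DONE]",
    'data: {"type": "ignored"}',
]
events = list(iter_events(sample))
print([e["type"] for e in events])  # prints ['requestStarted']
```

Because the function accepts any iterable of strings, it can be fed from a list in tests or from a streaming HTTP response in production.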

## Event types

### Lifecycle events

#### `requestStarted`

Emitted when the request begins processing.

```json
{
  "flowName": "my_flow",
  "contextId": "flow_object_id",
  "model": "gpt-4o",
  "hasTools": true,
  "toolCount": 3
}
```

#### `requestCompleted`

Emitted when the entire request (including all agents) finishes.

```json
{
  "durationMs": 5420,
  "totalCreditsUsed": 8200,
  "totalCostUsd": 0.0082,
  "totalTokensIn": 1250,
  "totalTokensOut": 890,
  "totalInferenceCount": 4,
  "totalToolCalls": 2,
  "totalAgentSpawns": 1,
  "maxDepthReached": 1,
  "result": { "textResponse": "..." }
}
```

#### `requestFailed`

Emitted if the request fails fatally.

### Agent events

#### `agentStarted`

An agent (sub-template) begins execution.

```json
{
  "agentRole": "research_agent",
  "templateId": "template_object_id",
  "templateName": "research",
  "parentRequestId": "parent_id",
  "childRequestId": "child_id",
  "parameters": { "topic": "quantum computing" }
}
```
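
The `parentRequestId` and `childRequestId` fields make it possible to reconstruct which request spawned which agent. The sketch below assumes events have already been decoded into dictionaries with `type` and `data` keys; `build_spawn_tree` and the `summary_agent` role are hypothetical names used only for illustration.

```python
from collections import defaultdict


def build_spawn_tree(events: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Map each parent request ID to the (child ID, role) pairs it spawned."""
    tree = defaultdict(list)
    for event in events:
        if event.get("type") != "agentStarted":
            continue  # Only agentStarted carries the parent/child link.
        data = event["data"]
        tree[data["parentRequestId"]].append(
            (data["childRequestId"], data["agentRole"])
        )
    return dict(tree)


events = [
    {"type": "agentStarted", "data": {"agentRole": "research_agent",
                                      "parentRequestId": "root",
                                      "childRequestId": "c1"}},
    {"type": "toolStarted", "data": {"toolName": "search_api"}},
    {"type": "agentStarted", "data": {"agentRole": "summary_agent",
                                      "parentRequestId": "c1",
                                      "childRequestId": "c2"}},
]
tree = build_spawn_tree(events)
print(tree)
```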

#### `agentInferenceStarted`

An LLM call begins within an agent.

```json
{
  "inferenceRequestId": "inference_id",
  "model": "gpt-4o",
  "provider": "openai",
  "messageCount": 5,
  "hasTools": true,
  "toolCount": 2
}
```

#### `agentInferenceCompleted`

An LLM call finishes.

```json
{
  "inferenceRequestId": "inference_id",
  "model": "gpt-4o",
  "provider": "openai",
  "durationMs": 1200,
  "usage": {
    "tokensIn": 450,
    "tokensOut": 200,
    "reasoningTokens": 0,
    "costCredits": 650,
    "costUsd": 0.00065
  },
  "finishReason": "stop",
  "toolCallCount": 0,
  "wasCached": false,
  "wasFallback": false
}
```
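
A client that wants its own running totals can sum the `usage` object across `agentInferenceCompleted` events. This is a sketch under the assumption that every such event carries the `usage` fields shown above; how cached or fallback calls report usage is not specified here, so missing fields are treated as zero. `total_usage` is a hypothetical helper.

```python
def total_usage(events: list[dict]) -> dict[str, int]:
    """Sum token and credit usage over all agentInferenceCompleted events."""
    totals = {"tokensIn": 0, "tokensOut": 0, "costCredits": 0}
    for event in events:
        if event.get("type") != "agentInferenceCompleted":
            continue
        usage = event["data"].get("usage", {})
        for key in totals:
            # Assumption: an absent field counts as zero.
            totals[key] += usage.get(key, 0)
    return totals


events = [
    {"type": "agentInferenceCompleted",
     "data": {"usage": {"tokensIn": 450, "tokensOut": 200, "costCredits": 650}}},
    {"type": "agentInferenceStarted", "data": {"model": "gpt-4o"}},
    {"type": "agentInferenceCompleted",
     "data": {"usage": {"tokensIn": 300, "tokensOut": 100, "costCredits": 400}}},
]
totals = total_usage(events)
print(totals)
```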

#### `agentCompleted`

An agent finishes all its work.

```json
{
  "childRequestId": "child_id",
  "agentRole": "research_agent",
  "durationMs": 3200,
  "totalCreditsUsed": 4700,
  "directCreditsUsed": 4700,
  "inferenceCount": 2,
  "toolCallCount": 1,
  "childAgentCount": 0,
  "status": "completed",
  "resultPreview": "Based on my research..."
}
```

#### `agentFailed`

An agent encounters an error.

### Tool events

#### `toolStarted`

A tool is about to be invoked.

```json
{
  "toolName": "search_api",
  "toolType": "External",
  "callId": "call_abc123",
  "arguments": { "query": "quantum computing breakthroughs 2025" },
  "targetUrl": "https://api.example.com/search"
}
```

#### `toolCompleted`

A tool invocation finishes.

```json
{
  "toolName": "search_api",
  "toolType": "External",
  "callId": "call_abc123",
  "durationMs": 450,
  "status": "success",
  "resultPreview": "Found 15 results for...",
  "resultSize": 4200
}
```

For agent-type tools, also includes:

```json
{
  "childRequestId": "spawned_request_id",
  "childCreditsUsed": 2300
}
```
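
The shared `callId` in the `toolStarted` and `toolCompleted` samples suggests a way to track which tool calls are still running. The sketch below assumes `callId` is identical across the start and end events of one invocation, and that `toolFailed` also carries a `callId`; the latter is not confirmed by this page. `ToolTracker` is a hypothetical class.

```python
class ToolTracker:
    """Track in-flight tool calls keyed by callId."""

    def __init__(self) -> None:
        self.in_flight: dict[str, str] = {}

    def handle(self, event: dict) -> None:
        data = event.get("data", {})
        call_id = data.get("callId")
        if event.get("type") == "toolStarted":
            self.in_flight[call_id] = data["toolName"]
        elif event.get("type") in ("toolCompleted", "toolFailed"):
            # Assumption: toolFailed includes the same callId as toolStarted.
            self.in_flight.pop(call_id, None)


tracker = ToolTracker()
tracker.handle({"type": "toolStarted",
                "data": {"toolName": "search_api", "callId": "call_abc123"}})
running = dict(tracker.in_flight)
tracker.handle({"type": "toolCompleted",
                "data": {"toolName": "search_api", "callId": "call_abc123",
                         "status": "success"}})
print(running, tracker.in_flight)
```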

#### `toolFailed`

A tool invocation errors.

### Usage events

#### `usageUpdate`

Periodic cost and progress updates. The interval is set by `usage_interval_ms` in `stream_options`.

```json
{
  "cumulativeCreditsUsed": 3400,
  "cumulativeCostUsd": 0.0034,
  "cumulativeTokensIn": 800,
  "cumulativeTokensOut": 350,
  "activeAgents": 1,
  "completedAgents": 2,
  "elapsedMs": 4500
}
```
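
One use for these updates is a client-side spending guard. The following sketch checks `cumulativeCreditsUsed` against a caller-chosen limit; `over_budget` and `max_credits` are hypothetical names, and whether closing the connection also cancels the request on the server is not covered by this page.

```python
def over_budget(event: dict, max_credits: int) -> bool:
    """Return True when a usageUpdate reports spending above max_credits."""
    if event.get("type") != "usageUpdate":
        return False  # Other event types carry no cumulative totals.
    return event["data"]["cumulativeCreditsUsed"] > max_credits


update = {
    "type": "usageUpdate",
    "data": {"cumulativeCreditsUsed": 3400, "elapsedMs": 4500},
}
print(over_budget(update, max_credits=3000))  # prints True
print(over_budget(update, max_credits=5000))  # prints False
```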

### System events

#### `heartbeat`

Keep-alive sent at `heartbeat_interval_ms` intervals.

```json
{
  "elapsedMs": 30000,
  "eventCount": 12
}
```

#### `error`

An error occurred. The `recoverable` field distinguishes recoverable from non-recoverable errors.

```json
{
  "code": "COST_LIMIT_EXCEEDED",
  "message": "Request exceeded the maximum cost of 50000 credits",
  "agentRole": "research_agent",
  "recoverable": false
}
```

## Filtering events

Use `event_types` in stream options to receive only the events you need:

```json
{
  "stream": true,
  "stream_options": {
    "event_types": ["requestCompleted", "error", "usageUpdate"]
  }
}
```
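
When building requests programmatically, the stream options can be assembled in one place. The sketch below uses only option names that appear on this page, but it assumes `event_types` may be combined with `include_usage` and `usage_interval_ms` in the same `stream_options` object, which is not stated explicitly. `build_stream_request` is a hypothetical helper.

```python
def build_stream_request(model, prompt, event_types=None, usage_interval_ms=None):
    """Build a streaming request body with optional stream_options."""
    stream_options = {}
    if event_types:
        stream_options["event_types"] = list(event_types)
    if usage_interval_ms is not None:
        # Assumption: usage settings can sit alongside event_types.
        stream_options["include_usage"] = True
        stream_options["usage_interval_ms"] = usage_interval_ms
    body = {
        "model": model,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
        "stream": True,
    }
    if stream_options:
        body["stream_options"] = stream_options
    return body


body = build_stream_request(
    "openai/gpt-4o",
    "Research quantum computing",
    event_types=["requestCompleted", "error", "usageUpdate"],
    usage_interval_ms=3000,
)
print(body["stream_options"])
```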

## Client example

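```python
import json

import httpx

# Disable httpx's default 5-second timeout: gaps between events can be longer
# while a model call or tool invocation is in progress.
with httpx.stream(
    "POST",
    "https://app.promptshuttle.com/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": [{"type": "text", "text": "Hello"}]}],
        "stream": True,
    },
    timeout=None,
) as response:
    # Fail fast on 4xx/5xx instead of parsing an error body as SSE.
    response.raise_for_status()
    for line in response.iter_lines():
        # Only `data:` lines carry events; skip `id:` lines and blank separators.
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        print(f"[{event['type']}] {json.dumps(event['data'], indent=2)}")
```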

