Endpoint
stream: true for SSE.
Quickstart
Request body
| Field | Type | Description |
|---|---|---|
model | string | mirobody-flash (default) or mirobody-expert. See Models. |
messages | array | OpenAI {role, content}. A system message is folded into the last user turn as a caller instruction (see System prompts). |
stream | bool | true → SSE stream of chat.completion.chunk frames. |
user | string | Tenant-isolation key → a Subject. Pass each end-user’s stable id. See multi-tenancy. |
retention | string | permanent (default) / session / 1d / none — applies to any health_context you attach. |
session_id | string | Required when retention=session; also threads multi-turn context. |
health_context | array | Inline health records for this request, handled per retention. See Inline data. |
mcp_servers | array | Bring-your-own remote MCP tools {name, url, access_token?}. See Bring Your Own Tools. |
include_tool_steps | bool | Return detailed tool_steps (default false). When false, tool_steps[].result is truncated to 2000 chars. |
Response
message.content carries one clean final answer. The gather phase (thinking + tool calls) is surfaced in separate channels, so the answer itself is never polluted by tool narration.
| Channel | Content |
|---|---|
message.content | Final answer only (the “reply” channel). |
message.reasoning_content | Gather-phase thinking text (DeepSeek/Qwen convention). |
message.tool_steps[] | Each tool call: {id, type, name, arguments, result, status}, merged by id. |
health_records / citations | Traceable record references the answer relied on. |
usage | Token accounting. |
reasoning_content and tool_steps are optional, additive channels — clients that only read message.content keep working.
Streaming
Withstream: true, frames arrive in order — tool/thinking deltas first, then a single uninterrupted answer stream:
tool_steps by id. Clients that only read delta.content keep working — the answer is one clean stream.
Inline health_context
Attach data for just this turn instead of writing it to the store first:retention="none" the data is purged when the request finishes — nothing is persisted.
System prompts
Asystem message sets the tone, persona, format, and language of the answer. Data gathering is platform-controlled — the agent always reads the Subject’s real data — so a system prompt shapes how the answer reads, never whether real data is used.
Built-in tools
You don’t call these — the agent does, against the Subject’s data:search_health_indicators,fetch_health_data— find and read the Subject’s recordsread_file,ls— read parsed files uploaded via/v1/fileseval— compute stats (mean / min / max / trends) over fetched data
mcp_servers.
Charts (vis-chart)
When the answer involves a trend / comparison / distribution, it may embed a fenced```vis-chart block of pure-data JSON ({type, title, axisXTitle, axisYTitle, data}). Rendering is the client’s job — the API only returns the data spec. Detect the fenced block and render it (line / area / bar / pie).