Chat Completions - Mirobody

Endpoint

POST /v1/chat/completions
Authorization: Bearer mb_live_*
Content-Type: application/json

OpenAI-compatible. The agent gathers the Subject’s real health data with built-in tools, then composes one clean answer with traceable citations. Set stream: true for SSE.

Quickstart

curl https://mcp.thetahealth.cn/v1/chat/completions \
  -H "Authorization: Bearer $MIROBODY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mirobody-flash",
        "messages": [{"role": "user", "content": "How is my fasting glucose trending?"}],
        "user": "alice"
      }'

Request body

Field	Type	Description
`model`	string	`mirobody-flash` (default) or `mirobody-expert`. See Models.
`messages`	array	OpenAI `{role, content}`. A `system` message is folded into the last user turn as a caller instruction (see System prompts).
`stream`	bool	`true` → SSE stream of `chat.completion.chunk` frames.
`user`	string	Tenant-isolation key → a Subject. Pass each end-user’s stable id. See multi-tenancy.
`retention`	string	`permanent` (default) / `session` / `1d` / `none` — applies to any `health_context` you attach.
`session_id`	string	Required when `retention=session`; also threads multi-turn context.
`health_context`	array	Inline health records for this request, handled per `retention`. See Inline data.
`mcp_servers`	array	Bring-your-own remote MCP tools `{name, url, access_token?}`. See Bring Your Own Tools.
`include_tool_steps`	bool	Return detailed `tool_steps` (default `false`). When `false`, `tool_steps[].result` is truncated to 2000 chars.
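
The field constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_chat_body` and `ALLOWED_RETENTION` are hypothetical names, and the checks only mirror what the table states (the allowed `retention` values, and `session_id` being required when `retention=session`). How the server itself reports violations is not covered here.

```python
from typing import Any, Optional

# Retention values listed in the request-body table.
ALLOWED_RETENTION = {"permanent", "session", "1d", "none"}


def build_chat_body(
    question: str,
    user: str,
    model: str = "mirobody-flash",
    retention: Optional[str] = None,
    session_id: Optional[str] = None,
    health_context: Optional[list] = None,
    stream: bool = False,
) -> dict:
    """Assemble a request body (hypothetical helper), checking documented constraints."""
    if retention is not None and retention not in ALLOWED_RETENTION:
        raise ValueError(f"unknown retention: {retention!r}")
    if retention == "session" and not session_id:
        raise ValueError("session_id is required when retention='session'")

    body: dict = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "user": user,
    }
    # Optional fields are only included when the caller sets them,
    # so the platform defaults apply otherwise.
    if stream:
        body["stream"] = True
    if retention is not None:
        body["retention"] = retention
    if session_id is not None:
        body["session_id"] = session_id
    if health_context is not None:
        body["health_context"] = health_context
    return body
```

The returned dict can be serialized as the JSON body of the POST shown in the Quickstart.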

Response

message.content carries one clean final answer. The gather phase (thinking + tool calls) is surfaced in separate channels, so the answer itself is never polluted by tool narration.

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "mirobody-flash",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "Your fasting glucose has trended down ~8% over the last 30 days ...",
      "reasoning_content": "Searching health indicators for fasting glucose ...",  // optional
      "tool_steps": [                                                              // optional, ordered
        { "id": "call_abc", "type": "tool_call", "name": "search_health_indicators",
          "arguments": { "keywords": ["fasting glucose"] },
          "result": "{...}", "status": "ok" }
      ]
    }
  }],
  "usage": { "prompt_tokens": 0, "completion_tokens": 0, "reasoning_tokens": 0, "total_tokens": 0 },
  "health_records": [ /* traceable records the answer used */ ],
  "citations": ["record:1", "record:2"]
}

Channel	Content
`message.content`	Final answer only (the “reply” channel).
`message.reasoning_content`	Gather-phase thinking text (DeepSeek/Qwen convention).
`message.tool_steps[]`	Each tool call: `{id, type, name, arguments, result, status}`, merged by `id`.
`health_records` / `citations`	Traceable record references the answer relied on.
`usage`	Token accounting.

reasoning_content and tool_steps are optional, additive channels — clients that only read message.content keep working.
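
As an illustration of reading the channels separately, here is a small sketch that works on the decoded JSON response as a plain dict. The helper name `split_channels` is hypothetical; the field names come from the example response above, and the optional channels are read defensively because they may be absent.

```python
def split_channels(response: dict) -> dict:
    """Separate the final answer from the optional gather-phase channels."""
    message = response["choices"][0]["message"]
    return {
        # Always present: the clean final answer.
        "answer": message["content"],
        # Optional channels: fall back to None or an empty list when absent.
        "reasoning": message.get("reasoning_content"),
        "tool_names": [step.get("name") for step in message.get("tool_steps") or []],
        "citations": response.get("citations") or [],
    }


example = {
    "choices": [{
        "index": 0,
        "finish_reason": "stop",
        "message": {
            "role": "assistant",
            "content": "Your fasting glucose has trended down.",
            "tool_steps": [
                {"id": "call_abc", "type": "tool_call",
                 "name": "search_health_indicators", "status": "ok"}
            ],
        },
    }],
    "citations": ["record:1", "record:2"],
}

channels = split_channels(example)
print(channels["answer"])
print(channels["tool_names"])
```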

Streaming

With stream: true, frames arrive in order — tool/thinking deltas first, then a single uninterrupted answer stream:

{"delta":{"role":"assistant"}}
{"delta":{"reasoning_content":"<fragment>"}}
{"delta":{"tool_steps":[{"id":"call_abc","name":"search_health_indicators","status":"running"}]}}
{"delta":{"tool_steps":[{"id":"call_abc","arguments":{...}}]}}
{"delta":{"tool_steps":[{"id":"call_abc","status":"ok","result":"..."}]}}
{"delta":{"content":"<answer fragment>"}}
{"delta":{},"finish_reason":"stop","usage":{...}}
data: [DONE]

Merge tool_steps by id. Clients that only read delta.content keep working — the answer is one clean stream.
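
The merge rule can be sketched as follows. This is an illustrative client-side accumulator, not an official implementation: `merge_deltas` is a hypothetical name, it takes the already-decoded `delta` objects (extracting them from each SSE frame is left to the caller), and it assumes each `tool_steps` fragment carries whole field values, so a later fragment for the same `id` overwrites earlier fields rather than appending to them.

```python
def merge_deltas(deltas) -> dict:
    """Fold a sequence of streamed delta dicts into one message-like dict."""
    answer_parts = []
    reasoning_parts = []
    steps = {}  # id -> merged step; dicts keep first-seen order

    for delta in deltas:
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        for fragment in delta.get("tool_steps") or []:
            # Merge by id: later fragments add or overwrite fields.
            steps.setdefault(fragment["id"], {}).update(fragment)
        if delta.get("content"):
            answer_parts.append(delta["content"])

    return {
        "content": "".join(answer_parts),
        "reasoning_content": "".join(reasoning_parts),
        "tool_steps": list(steps.values()),
    }


merged = merge_deltas([
    {"role": "assistant"},
    {"reasoning_content": "Searching "},
    {"reasoning_content": "indicators"},
    {"tool_steps": [{"id": "call_abc", "name": "search_health_indicators", "status": "running"}]},
    {"tool_steps": [{"id": "call_abc", "arguments": {"keywords": ["fasting glucose"]}}]},
    {"tool_steps": [{"id": "call_abc", "status": "ok", "result": "{}"}]},
    {"content": "Your fasting glucose "},
    {"content": "is trending down."},
    {},
])
print(merged["content"])
```

A client that ignores everything except `content` gets the same answer text, which matches the compatibility note above.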

Inline health_context

Attach data for just this turn instead of writing it to the store first:

resp = client.chat.completions.create(
    model="mirobody-flash",
    messages=[{"role": "user", "content": "Is this reading high?"}],
    user="alice",
    extra_body={
        "health_context": [{"indicator": "systolic_bp", "value": 148, "unit": "mmHg", "time": "2026-06-16T08:00:00Z"}],
        "retention": "none",   # used this turn, then deleted
    },
)

With retention="none" the data is purged when the request finishes — nothing is persisted.

System prompts

A system message sets the tone, persona, format, and language of the answer. Data gathering is platform-controlled — the agent always reads the Subject’s real data — so a system prompt shapes how the answer reads, never whether real data is used.
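
For example, a request that asks for a specific language and format might be assembled like this. It is only a sketch: `with_style` is a hypothetical helper and the prompt text is illustrative. The system message affects presentation only, as described above.

```python
import json


def with_style(system_prompt: str, question: str) -> list:
    """Build a messages array with a system instruction ahead of the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


body = {
    "model": "mirobody-flash",
    "user": "alice",
    "messages": with_style(
        "Answer in Spanish, in at most three bullet points.",
        "How is my fasting glucose trending?",
    ),
}
print(json.dumps(body, indent=2))
```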

Built-in tools

You don’t call these — the agent does, against the Subject’s data:

search_health_indicators, fetch_health_data — find and read the Subject’s records
read_file, ls — read parsed files uploaded via /v1/files
eval — compute stats (mean / min / max / trends) over fetched data

To add your own tools, attach remote MCP servers via mcp_servers.

Charts (vis-chart)

When the answer involves a trend / comparison / distribution, it may embed a fenced ```vis-chart block of pure-data JSON ({type, title, axisXTitle, axisYTitle, data}). Rendering is the client’s job — the API only returns the data spec. Detect the fenced block and render it (line / area / bar / pie).
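
Detecting the block on the client can be sketched with a regular expression. This is one possible approach under stated assumptions: `extract_charts` is a hypothetical helper, the sample `data` entries are invented for illustration (the shape of `data` is not specified here), and blocks that fail to parse as JSON are skipped so the caller can fall back to showing them as plain text.

```python
import json
import re

# The fence marker is built programmatically so this example can itself
# sit inside a fenced code block.
FENCE = "`" * 3
VIS_CHART_RE = re.compile(FENCE + r"vis-chart[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_charts(answer: str) -> list:
    """Return the parsed JSON spec of every vis-chart block in the answer text."""
    charts = []
    for match in VIS_CHART_RE.finditer(answer):
        try:
            charts.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than fail the whole answer
    return charts


# Illustrative answer text; the entries under "data" are made up.
spec = {
    "type": "line",
    "title": "Fasting glucose",
    "axisXTitle": "Date",
    "axisYTitle": "mg/dL",
    "data": [{"time": "2026-06-01", "value": 98}, {"time": "2026-06-15", "value": 92}],
}
answer = "Here is the trend:\n" + FENCE + "vis-chart\n" + json.dumps(spec) + "\n" + FENCE + "\nOverall it is improving."

charts = extract_charts(answer)
print(charts[0]["type"], len(charts[0]["data"]))
```

The parsed spec can then be handed to whichever charting library the client already uses, switching on `type` to pick a line, area, bar, or pie renderer.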
