Endpoint

POST /v1/chat/completions
Authorization: Bearer mb_live_*
Content-Type: application/json
OpenAI-compatible. The agent gathers the Subject’s real health data with built-in tools, then composes one clean answer with traceable citations. Set stream: true for SSE.

Quickstart

curl https://mcp.thetahealth.cn/v1/chat/completions \
  -H "Authorization: Bearer $MIROBODY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mirobody-flash",
        "messages": [{"role": "user", "content": "How is my fasting glucose trending?"}],
        "user": "alice"
      }'

Request body

| Field | Type | Description |
| --- | --- | --- |
| model | string | mirobody-flash (default) or mirobody-expert. See Models. |
| messages | array | OpenAI {role, content}. A system message is folded into the last user turn as a caller instruction (see System prompts). |
| stream | bool | true → SSE stream of chat.completion.chunk frames. |
| user | string | Tenant-isolation key → a Subject. Pass each end-user's stable id. See multi-tenancy. |
| retention | string | permanent (default) / session / 1d / none. Applies to any health_context you attach. |
| session_id | string | Required when retention=session; also threads multi-turn context. |
| health_context | array | Inline health records for this request, handled per retention. See Inline data. |
| mcp_servers | array | Bring-your-own remote MCP tools {name, url, access_token?}. See Bring Your Own Tools. |
| include_tool_steps | bool | Return detailed tool_steps (default false). When false, tool_steps[].result is truncated to 2000 chars. |
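
The retention and session_id rules above can be checked on the client before a request goes out. The following is a minimal sketch of that idea; build_body, VALID_RETENTION, and the error messages are hypothetical helpers rather than part of the API, and the sketch assumes the server applies the same rules.

```python
VALID_RETENTION = {"permanent", "session", "1d", "none"}


def build_body(question, user, *, model="mirobody-flash", retention=None,
               session_id=None, health_context=None, stream=False):
    """Assemble a request body, checking the documented retention rules.

    Hypothetical client-side helper; the field names come from the table above.
    """
    if retention is not None and retention not in VALID_RETENTION:
        raise ValueError(f"unknown retention: {retention!r}")
    if retention == "session" and not session_id:
        raise ValueError("session_id is required when retention is 'session'")

    body = {
        "model": model,
        "messages": [{"role": "user", "content": question}],
        "user": user,
    }
    # Optional fields are only sent when set, so server defaults apply otherwise.
    if stream:
        body["stream"] = True
    if retention is not None:
        body["retention"] = retention
    if session_id is not None:
        body["session_id"] = session_id
    if health_context:
        body["health_context"] = health_context
    return body
```

The returned dict can be serialized as the JSON body of the POST shown in the Quickstart.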

Response

message.content carries one clean final answer. The gather phase (thinking + tool calls) is surfaced in separate channels, so the answer itself is never polluted by tool narration.
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "mirobody-flash",
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "Your fasting glucose has trended down ~8% over the last 30 days ...",
      "reasoning_content": "Searching health indicators for fasting glucose ...",  // optional
      "tool_steps": [                                                              // optional, ordered
        { "id": "call_abc", "type": "tool_call", "name": "search_health_indicators",
          "arguments": { "keywords": ["fasting glucose"] },
          "result": "{...}", "status": "ok" }
      ]
    }
  }],
  "usage": { "prompt_tokens": 0, "completion_tokens": 0, "reasoning_tokens": 0, "total_tokens": 0 },
  "health_records": [ /* traceable records the answer used */ ],
  "citations": ["record:1", "record:2"]
}
| Channel | Content |
| --- | --- |
| message.content | Final answer only (the "reply" channel). |
| message.reasoning_content | Gather-phase thinking text (DeepSeek/Qwen convention). |
| message.tool_steps[] | Each tool call: {id, type, name, arguments, result, status}, merged by id. |
| health_records / citations | Traceable record references the answer relied on. |
| usage | Token accounting. |
reasoning_content and tool_steps are optional, additive channels — clients that only read message.content keep working.
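
Because the extra channels are optional, a client should read them defensively. The sketch below shows one way to do so, assuming the response has already been parsed into a Python dict; split_channels is a hypothetical helper name.

```python
def split_channels(response):
    """Separate the final answer from the optional gather-phase channels.

    `response` is the parsed JSON body of a non-streaming completion.
    """
    message = response["choices"][0]["message"]
    return {
        "answer": message["content"],
        # Optional channels: fall back to empty values when absent.
        "reasoning": message.get("reasoning_content"),
        "tool_steps": message.get("tool_steps", []),
        "citations": response.get("citations", []),
        "health_records": response.get("health_records", []),
    }
```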

Streaming

With stream: true, frames arrive in order — tool/thinking deltas first, then a single uninterrupted answer stream:
{"delta":{"role":"assistant"}}
{"delta":{"reasoning_content":"<fragment>"}}
{"delta":{"tool_steps":[{"id":"call_abc","name":"search_health_indicators","status":"running"}]}}
{"delta":{"tool_steps":[{"id":"call_abc","arguments":{...}}]}}
{"delta":{"tool_steps":[{"id":"call_abc","status":"ok","result":"..."}]}}
{"delta":{"content":"<answer fragment>"}}
{"delta":{},"finish_reason":"stop","usage":{...}}
data: [DONE]
Merge tool_steps by id. Clients that only read delta.content keep working — the answer is one clean stream.
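
The merge-by-id rule can be sketched as follows. This is an illustrative client-side reducer, not the platform's own code: iter_frames and merge_stream are hypothetical names, later fields for the same id are assumed to overwrite earlier ones (for example status going from running to ok), and frames are accepted either in the abbreviated {"delta": ...} form shown above or wrapped in an OpenAI-style choices[0].delta.

```python
import json


def iter_frames(lines):
    """Yield parsed JSON frames from raw stream lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if line.startswith("data:"):
            line = line[len("data:"):].strip()
        if not line:
            continue  # skip blank separator lines
        if line == "[DONE]":
            break
        yield json.loads(line)


def _delta(frame):
    """Return the delta dict from an abbreviated or choices-wrapped frame."""
    choices = frame.get("choices")
    if choices:
        return choices[0].get("delta", {})
    return frame.get("delta", {})


def merge_stream(frames):
    """Fold frames into (answer, reasoning, tool_steps)."""
    answer, reasoning = [], []
    steps = {}  # id -> merged step; dicts keep first-seen order
    for frame in frames:
        delta = _delta(frame)
        if delta.get("content"):
            answer.append(delta["content"])
        if delta.get("reasoning_content"):
            reasoning.append(delta["reasoning_content"])
        for partial in delta.get("tool_steps", []):
            # Assumption: later fields for the same id replace earlier ones.
            steps.setdefault(partial["id"], {}).update(partial)
    return "".join(answer), "".join(reasoning), list(steps.values())
```

A client that only wants the answer can ignore the second and third return values.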

Inline health_context

Attach data for just this turn instead of writing it to the store first:
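import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the Quickstart host.
client = OpenAI(base_url="https://mcp.thetahealth.cn/v1", api_key=os.environ["MIROBODY_API_KEY"])

resp = client.chat.completions.create(
    model="mirobody-flash",
    messages=[{"role": "user", "content": "Is this reading high?"}],
    user="alice",
    # Fields outside the OpenAI schema travel in extra_body.
    extra_body={
        "health_context": [{"indicator": "systolic_bp", "value": 148, "unit": "mmHg", "time": "2026-06-16T08:00:00Z"}],
        "retention": "none",   # used this turn, then deleted
    },
)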
With retention="none" the data is purged when the request finishes — nothing is persisted.

System prompts

A system message sets the tone, persona, format, and language of the answer. Data gathering is platform-controlled — the agent always reads the Subject’s real data — so a system prompt shapes how the answer reads, never whether real data is used.
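
As an illustration, a system message that only shapes presentation might be attached like this. The helper name styled_messages and the style text are made up for the example.

```python
def styled_messages(question, style):
    """Prefix a user question with a system message that shapes the answer's style."""
    return [
        {"role": "system", "content": style},
        {"role": "user", "content": question},
    ]


messages = styled_messages(
    "How is my fasting glucose trending?",
    "Answer in two short sentences, in plain language.",  # illustrative style text
)
```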

Built-in tools

You don’t call these — the agent does, against the Subject’s data:
  • search_health_indicators, fetch_health_data — find and read the Subject’s records
  • read_file, ls — read parsed files uploaded via /v1/files
  • eval — compute stats (mean / min / max / trends) over fetched data
To add your own tools, attach remote MCP servers via mcp_servers.
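
A minimal sketch of building the mcp_servers array, using the {name, url, access_token?} shape from the request table. The mcp_server helper, the server name, and the example.com URL are placeholders, not real endpoints.

```python
def mcp_server(name, url, access_token=None):
    """Build one mcp_servers entry; access_token is left out when not given."""
    entry = {"name": name, "url": url}
    if access_token is not None:
        entry["access_token"] = access_token
    return entry


# Hypothetical server; pass this dict as extra request fields (for example
# through extra_body in the OpenAI SDK).
extra_body = {
    "mcp_servers": [
        mcp_server("pharmacy", "https://example.com/mcp"),
    ]
}
```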

Charts (vis-chart)

When the answer involves a trend / comparison / distribution, it may embed a fenced ```vis-chart block of pure-data JSON ({type, title, axisXTitle, axisYTitle, data}). Rendering is the client’s job — the API only returns the data spec. Detect the fenced block and render it (line / area / bar / pie).
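
Detecting the block can be as simple as a regular expression over message.content. The sketch below is one possible approach, not a required one: extract_charts is a hypothetical helper, and it assumes each block holds a single JSON object and that the fence is exactly three backticks followed by vis-chart.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence
CHART_RE = re.compile(FENCE + r"vis-chart[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)


def extract_charts(content):
    """Return (text_without_charts, [chart_spec, ...]) for one answer string."""
    charts = []

    def _collect(match):
        try:
            charts.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            return match.group(0)  # leave malformed blocks in the text untouched
        return ""

    text = CHART_RE.sub(_collect, content)
    return text.strip(), charts
```

Each returned spec can then be handed to whatever charting library the client uses, keyed on its type field.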