Overview
This module provides comprehensive health data file processing capabilities, including file upload, health indicator extraction, and file deletion. The system uses a modular design, supports multiple file types, and provides real-time progress feedback.
Features
- ✅ Multiple Upload Methods: WebSocket real-time upload and REST API batch upload
- ✅ Real-time Progress Feedback: WebSocket connections provide real-time progress updates
- ✅ Smart File Recognition: Automatically identifies file types and selects appropriate handler
- ✅ Health Indicator Extraction: Uses LLM to automatically extract health indicator data
- ✅ Multi-format Support: PDF, images, audio, genetic data, and more
- ✅ PDF Parallel Processing: Multi-page PDFs processed in parallel for efficiency
- ✅ File Summary Generation: Automatically generates file content summaries
- ✅ Cascade Deletion: Automatically cleans up associated health data when files are deleted
Supported File Types
| File Type | MIME Type | Handler | Description |
|---|---|---|---|
| PDF | application/pdf | PDFHandler | Multi-page parallel processing with automatic health indicator extraction |
| Images | image/* | ImageHandler | Supports JPEG, PNG, GIF, WebP; recognizes health reports and extracts indicators |
| Audio | audio/* | AudioHandler | Speech-to-text conversion for extracting verbal health information |
| Genetic Data | Specific formats | GeneticHandler | Genetic test report parsing |
File Upload Methods
WebSocket Upload (Recommended)
WebSocket upload provides real-time progress feedback, which makes it well suited to large files and scenarios that need live status updates.
Endpoint
WS /ws/upload-health-report
Connection Flow
Message Types
1. Upload Start (upload_start)
Client sends:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be “upload_start” |
| messageId | string | ❌ | Unique message ID, auto-generated if not provided |
| sessionId | string | ✅ | Session ID |
| query | string | ❌ | User notes or query text |
| isFirstMessage | boolean | ❌ | Whether this is the first message of a new session |
| query_user_id | string | ❌ | Target user ID for proxy uploads |
| files | array | ✅ | Array of file metadata |
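As a minimal sketch, a client might assemble the upload_start message from the fields in the table above as follows. The shape of each entry in files is not specified on this page, so any keys placed inside those entries are the caller's assumption, and the function name is hypothetical.

```python
import json
import uuid


def build_upload_start(session_id, files, query=None, is_first_message=None,
                       query_user_id=None, message_id=None):
    """Return an upload_start message as a JSON string.

    Required fields follow the table above; optional fields are only
    included when the caller supplies them.
    """
    if not session_id:
        raise ValueError("sessionId is required")
    if files is None:
        raise ValueError("files is required")

    message = {
        "type": "upload_start",
        # messageId is optional; generating one client-side is a choice
        # made here so the caller can correlate later messages.
        "messageId": message_id or str(uuid.uuid4()),
        "sessionId": session_id,
        "files": files,
    }
    optional = {
        "query": query,
        "isFirstMessage": is_first_message,
        "query_user_id": query_user_id,
    }
    message.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(message)
```

Leaving optional fields out entirely, rather than sending null values, keeps the message aligned with the Required column.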
2. Upload Chunk (upload_chunk)
Client sends:
3. File Received (file_received)
4. Upload Progress (upload_progress)
5. Upload Completed (upload_completed)
6. Heartbeat (ping/pong)
Client sends:
Timeout Mechanism
| Type | Duration | Description |
|---|---|---|
| Idle Timeout | 5 minutes | Auto-disconnect when no activity |
| Upload Timeout | 30 minutes | Extended timeout during active uploads |
| Heartbeat | 30 seconds | Recommended ping interval |
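The timeout rules in the table can be expressed as a small helper. This is only a sketch of the rule as documented; how the server actually tracks activity is not described here, and the constant and function names are hypothetical.

```python
IDLE_TIMEOUT_S = 5 * 60        # idle timeout from the table
UPLOAD_TIMEOUT_S = 30 * 60     # extended timeout during active uploads
HEARTBEAT_INTERVAL_S = 30      # recommended ping interval


def is_timed_out(seconds_since_activity, upload_active):
    """Return True if the quiet period exceeds the applicable timeout."""
    limit = UPLOAD_TIMEOUT_S if upload_active else IDLE_TIMEOUT_S
    return seconds_since_activity > limit


def ping_due(seconds_since_last_ping):
    """Return True when the client should send its next ping."""
    return seconds_since_last_ping >= HEARTBEAT_INTERVAL_S
```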
REST API Upload
The REST API provides a simpler upload method for applications that don’t require real-time progress.
Endpoint
POST /files/upload
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| files | File[] | ✅ | List of files to upload |
| folder | string | ❌ | Custom folder prefix, defaults to ‘uploads’ |
Response Example
Response Codes
| Code | Description |
|---|---|
| 0 | All uploads successful |
| 1 | Partial or complete failure |
Processing Architecture
Overview
Processing Steps
1. File Upload Phase (0-30%)
- Receive file data
- Validate file type and size
- Generate unique file identifier
- Upload to object storage (S3/OSS/MinIO)
2. File Type Recognition (30-35%)
The system automatically identifies file types via FileHandlerFactory:
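The internals of FileHandlerFactory are not shown on this page. As an illustration only, a MIME-type lookup consistent with the Supported File Types table might look like the following. The empty class bodies, the get_handler name, and the wildcard matching are assumptions, and genetic data is left out because its formats are not listed.

```python
from fnmatch import fnmatchcase


class PDFHandler: ...
class ImageHandler: ...
class AudioHandler: ...


# Ordered (pattern, handler) pairs taken from the Supported File Types table.
_HANDLERS = [
    ("application/pdf", PDFHandler),
    ("image/*", ImageHandler),
    ("audio/*", AudioHandler),
]


def get_handler(mime_type):
    """Return a handler instance for mime_type, or raise if unsupported."""
    normalized = mime_type.lower()
    for pattern, handler_cls in _HANDLERS:
        if fnmatchcase(normalized, pattern):
            return handler_cls()
    raise ValueError(f"File type not supported: {mime_type}")
```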
3. Content Processing Phase (35-90%)
PDF File Processing:
4. Summary Generation Phase (90-95%)
- Generate file content summary using LLM
- Generate intelligent file name
5. Result Saving Phase (95-100%)
- Save processing results to database
- Sync health indicators to th_series_data
- Update user health profile
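The percentage ranges attached to each phase suggest how a phase-local progress value maps onto the overall progress figure. A minimal sketch, assuming linear interpolation within each range (the phase keys and function name are hypothetical):

```python
# Overall progress range (start %, end %) for each phase listed above.
PHASES = {
    "upload": (0, 30),
    "recognition": (30, 35),
    "processing": (35, 90),
    "summary": (90, 95),
    "saving": (95, 100),
}


def overall_progress(phase, fraction):
    """Map a phase and its completion fraction (0.0-1.0) to overall percent."""
    start, end = PHASES[phase]
    fraction = min(max(fraction, 0.0), 1.0)  # clamp out-of-range input
    return start + (end - start) * fraction
```

For example, a PDF with half of its pages processed would report overall_progress("processing", 0.5), which is 62.5 under this mapping.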
Health Indicator Extraction
Configuration
config.yaml
Supported Indicator Types
- Complete Blood Count (WBC, RBC, Hemoglobin, etc.)
- Biochemistry (Liver function, Kidney function, Lipids, etc.)
- Physical Examination (Blood pressure, Heart rate, Weight, etc.)
- Tumor Markers
- Thyroid Function
- Other Medical Test Indicators
Extraction Result Format
Data Storage
Extracted indicator data is stored in the th_series_data table:
| Field | Description |
|---|---|
| user_id | User ID |
| indicator_id | Indicator ID (linked to indicator dimension table) |
| value | Indicator value |
| unit | Unit of measurement |
| source_table | Source table name |
| source_table_id | Source record ID (message_id + file_key) |
| recorded_at | Test date/time |
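To make the column list concrete, the sketch below creates an in-memory SQLite table with the same column names and inserts one row. The column types, the example values, and the way message_id and file_key are joined into source_table_id are all assumptions; the real schema and database engine may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE th_series_data (
        user_id         TEXT,
        indicator_id    TEXT,
        value           TEXT,
        unit            TEXT,
        source_table    TEXT,
        source_table_id TEXT,
        recorded_at     TEXT
    )
    """
)


def make_source_id(message_id, file_key):
    """Combine message_id and file_key (the separator is an assumption)."""
    return f"{message_id}_{file_key}"


# Illustrative values only; none of them come from a real record.
conn.execute(
    "INSERT INTO th_series_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("user-1", "hemoglobin", "13.5", "g/dL", "th_messages",
     make_source_id("msg-1", "uploads/report.pdf"), "2024-01-15T09:00:00"),
)
rows = conn.execute(
    "SELECT value, unit FROM th_series_data WHERE user_id = ?", ("user-1",)
).fetchall()
```

Keeping the source identifiers on every row is what allows indicators to be traced back to, and later removed with, the file they came from.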
File Deletion
Endpoint
POST /api/v1/data/delete-files
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| message_id | string | ✅ | Message ID |
| file_keys | string[] | ❌ | List of file keys to delete; if empty, deletes all files |
Response Example
Cascade Deletion
When deleting files, the system automatically performs cascade deletion:
- Storage Deletion: Delete file from object storage (S3/OSS)
- Database Update: Update file list in th_messages table
- Health Data Cleanup: Delete associated health indicators from th_series_data
- Genetic Data Cleanup: If genetic file, delete data from th_genetic_data
- Message Marking: If all files are deleted, mark message as deleted
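The cascade can be pictured with plain in-memory structures standing in for object storage and the three tables. This is an illustrative sketch, not the service's code; the dictionary layout, the "source" key, and the function name are hypothetical.

```python
def delete_files(state, message_id, file_keys=None):
    """Apply the cascade steps to an in-memory `state` dict.

    Returns the list of file keys that were targeted.
    """
    message = state["th_messages"][message_id]
    # An empty or missing file_keys list means "delete every file".
    targets = list(file_keys) if file_keys else list(message["files"])

    for key in targets:
        state["storage"].pop(key, None)              # storage deletion
        if key in message["files"]:
            message["files"].remove(key)             # database update
        source = (message_id, key)
        # Health data cleanup: drop indicators extracted from this file.
        state["th_series_data"] = [
            row for row in state["th_series_data"] if row["source"] != source
        ]
        # Genetic data cleanup: only genetic files have rows here.
        state["th_genetic_data"] = [
            row for row in state["th_genetic_data"] if row["source"] != source
        ]

    if not message["files"]:
        message["deleted"] = True                    # message marking
    return targets
```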
API Reference
File Service Endpoints
| Method | Endpoint | Description |
|---|---|---|
| WS | /ws/upload-health-report | WebSocket file upload |
| POST | /files/upload | REST API file upload |
| POST | /api/v1/data/delete-files | Delete files |
| GET | /files/{file_path} | Get file content (proxy access) |
Authentication
All endpoints require a valid authentication token:
- REST API: Use the Authorization: Bearer <token> header
- WebSocket: Pass via URL parameter ?token=<token>
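A small sketch of both ways of passing the token. The helper names are hypothetical and the base URL is supplied by the caller; only the header format, the token query parameter, and the /ws/upload-health-report path come from this page.

```python
from urllib.parse import urlencode


def rest_auth_headers(token):
    """Headers for REST calls: Authorization: Bearer <token>."""
    return {"Authorization": f"Bearer {token}"}


def websocket_url(base_url, token):
    """WebSocket URL with the token passed as a query parameter."""
    return f"{base_url}/ws/upload-health-report?{urlencode({'token': token})}"
```

Using urlencode rather than plain string concatenation ensures that tokens containing reserved characters are escaped correctly.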
Data Models
FileUploadData
FileDeleteRequest
FileProcessingResult
Error Handling
WebSocket Error Messages
REST API Error Response
Common Errors
| Error Type | Description | Solution |
|---|---|---|
| Invalid token | Token is invalid or expired | Obtain a new token |
| File type not supported | Unsupported file type | Use a supported file format |
| File is empty | File has no content | Check file content |
| Upload session not found | Upload session doesn’t exist | Restart the upload |
| Permission denied | No permission for operation | Check user permissions |
Timeout Handling
WebSocket connections receive a notification on timeout:
Best Practices
Large File Uploads
- Use WebSocket upload with chunked transfer
- Recommended chunk size: 1MB
- Implement resumable upload mechanism
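Chunked transfer at the recommended 1MB size can be sketched as a generator over a binary stream. The exact upload_chunk message format is not shown on this page, so this example only covers splitting the data; the function name is hypothetical.

```python
import io

CHUNK_SIZE = 1024 * 1024  # 1MB, the recommended chunk size


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes) pairs read sequentially from a binary stream."""
    index = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield index, chunk
        index += 1


# Small demonstration with a 10-byte stream and 4-byte chunks.
sample = io.BytesIO(b"x" * 10)
sizes = [len(chunk) for _, chunk in iter_chunks(sample, chunk_size=4)]  # [4, 4, 2]
```

One possible basis for resuming is to remember the index of the last chunk the server acknowledged, then seek to index * chunk_size in the file and continue from there. Whether the server supports this is not stated here.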
Batch Uploads
- Limit to 10 files per upload
- Total file size should not exceed 100MB
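These limits can be checked on the client before an upload starts. A minimal sketch, assuming 100MB means 100 × 1024 × 1024 bytes (the function name is hypothetical, and the empty-file check mirrors the "File is empty" entry under Common Errors):

```python
MAX_FILES = 10
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # 100MB


def validate_batch(file_sizes):
    """Return a list of problems for a batch given each file's size in bytes."""
    problems = []
    if len(file_sizes) > MAX_FILES:
        problems.append(f"too many files: {len(file_sizes)} > {MAX_FILES}")
    if any(size == 0 for size in file_sizes):
        problems.append("batch contains an empty file")
    if sum(file_sizes) > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 100MB")
    return problems
```

Returning every problem at once, instead of raising on the first, lets a UI show the user everything that needs fixing in one pass.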
Progress Monitoring
- Listen for upload_progress messages during WebSocket uploads
- Handle file_progress to display individual file progress
Error Handling
- Implement retry mechanism (recommended: max 3 retries)
- Capture and display user-friendly error messages
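A retry wrapper with the recommended limit of three retries might look like this. The exponential backoff and the decision to retry on any exception are assumptions made for the sketch, not documented behavior.

```python
import time


def with_retries(operation, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure, retry up to max_retries more times."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # give up and surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s by default
            attempt += 1
```

In practice, errors such as "Invalid token" or "Permission denied" are unlikely to succeed on retry, so a real client would probably narrow the except clause to transient failures.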
Connection Keep-alive
- Send ping every 30 seconds for WebSocket connections
- Handle pong response to confirm connection status
Code Examples
Python REST API Upload
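The original example for this section is not available, so the following is a stand-in sketch using only the standard library. It builds a multipart/form-data request for POST /files/upload with the Bearer token header. Sending folder as a form field, the tuple layout for files, and both function names are assumptions.

```python
import json
import uuid
import urllib.request


def build_upload_request(base_url, token, files, folder=None):
    """Build a multipart/form-data POST for /files/upload.

    files is a list of (filename, content_bytes, mime_type) tuples.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if folder is not None:
        # Assumption: the optional folder prefix travels as a form field.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="folder"'
            f"\r\n\r\n{folder}\r\n".encode()
        )
    for filename, content, mime_type in files:
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="files"; '
            f'filename="{filename}"\r\nContent-Type: {mime_type}\r\n\r\n'
        )
        parts.append(header.encode() + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/files/upload",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def upload(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling upload(build_upload_request(...)) sends the files. Per the Response Codes table, a code of 0 means all uploads succeeded and 1 means partial or complete failure; the name of the response field carrying that code is not shown on this page.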
File Deletion
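The original example for this section is not available either. As a sketch, a deletion request for POST /api/v1/data/delete-files could be built as below, assuming the request body is JSON with the message_id and file_keys fields from the Request Body table (the function name is hypothetical).

```python
import json
import urllib.request


def build_delete_request(base_url, token, message_id, file_keys=None):
    """Build a JSON POST for /api/v1/data/delete-files."""
    body = {
        "message_id": message_id,
        # An empty list asks the service to delete all files of the message.
        "file_keys": list(file_keys or []),
    }
    return urllib.request.Request(
        f"{base_url}/api/v1/data/delete-files",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Because an empty file_keys list removes every file and triggers the cascade described under File Deletion, a caller may want to require explicit confirmation before sending a request without keys.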
Next Steps
- API Reference: Explore complete API documentation
- Configuration: Configure file processing settings
- Testing: Test file upload and processing
- Architecture: Understand system architecture