Overview
This module provides comprehensive health data file processing capabilities, including file upload, health indicator extraction, and file deletion. The system uses a modular design, supports multiple file types, and provides real-time progress feedback.
Features
- ✅ Multiple Upload Methods: WebSocket real-time upload and REST API batch upload
- ✅ Real-time Progress Feedback: WebSocket connections provide real-time progress updates
- ✅ Smart File Recognition: Automatically identifies file types and selects the appropriate handler
- ✅ Health Indicator Extraction: Uses LLM to automatically extract health indicator data
- ✅ Multi-format Support: PDF, images, audio, genetic data, and more
- ✅ PDF Parallel Processing: Multi-page PDFs processed in parallel for efficiency
- ✅ File Summary Generation: Automatically generates file content summaries
- ✅ Cascade Deletion: Automatically cleans up associated health data when files are deleted
Supported File Types
| File Type | MIME Type | Handler | Description |
|---|---|---|---|
| PDF | application/pdf | PDFHandler | Multi-page parallel processing with automatic health indicator extraction |
| Images | image/* | ImageHandler | Supports JPEG, PNG, GIF, WebP; recognizes health reports and extracts indicators |
| Audio | audio/* | AudioHandler | Speech-to-text conversion for extracting verbal health information |
| Genetic Data | Specific formats | GeneticHandler | Genetic test report parsing |
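The mapping in the table above can be read as a pattern lookup from MIME type to handler. The following is a minimal sketch of that idea, not the module's actual FileHandlerFactory: the function name select_handler and the use of handler names as plain strings are assumptions made for illustration, and genetic data is left out because its MIME types are not listed here.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Pattern-to-handler pairs copied from the "Supported File Types" table.
# Genetic data is omitted: the table only says "Specific formats".
HANDLER_PATTERNS = [
    ("application/pdf", "PDFHandler"),
    ("image/*", "ImageHandler"),
    ("audio/*", "AudioHandler"),
]


def select_handler(mime_type: str) -> Optional[str]:
    """Return the handler name for a MIME type, or None if nothing matches.

    Hypothetical helper: the real factory may use different rules.
    """
    normalized = mime_type.strip().lower()
    for pattern, handler in HANDLER_PATTERNS:
        if fnmatchcase(normalized, pattern):
            return handler
    return None
```

A caller would typically treat a None result as the "File type not supported" case.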
File Upload Methods
WebSocket Upload (Recommended)
WebSocket upload provides real-time progress feedback, ideal for large file uploads and scenarios requiring real-time status updates.
Endpoint
Connection Flow
Message Types
1. Upload Start (upload_start)
Client sends:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | ✅ | Must be “upload_start” |
| messageId | string | ❌ | Unique message ID, auto-generated if not provided |
| sessionId | string | ✅ | Session ID |
| query | string | ❌ | User notes or query text |
| isFirstMessage | boolean | ❌ | Whether this is the first message of a new session |
| query_user_id | string | ❌ | Target user ID for proxy uploads |
| files | array | ✅ | Array of file metadata |
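As an illustration of the field table above, the sketch below assembles an upload_start message and enforces the three required fields. The helper name build_upload_start is hypothetical, and the shape of each entry in files is not specified in this section, so the file metadata is passed through untouched.

```python
import json
from typing import Any, Dict, List, Optional


def build_upload_start(
    session_id: str,
    files: List[Dict[str, Any]],
    message_id: Optional[str] = None,
    query: Optional[str] = None,
    is_first_message: Optional[bool] = None,
    query_user_id: Optional[str] = None,
) -> str:
    """Serialize an upload_start message; optional fields are left out when unset."""
    if not session_id:
        raise ValueError("sessionId is required")
    if not files:
        raise ValueError("files must contain at least one entry")

    message: Dict[str, Any] = {
        "type": "upload_start",
        "sessionId": session_id,
        "files": files,
    }
    # messageId is optional: the table says it is auto-generated if not provided.
    optional = {
        "messageId": message_id,
        "query": query,
        "isFirstMessage": is_first_message,
        "query_user_id": query_user_id,
    }
    message.update({k: v for k, v in optional.items() if v is not None})
    return json.dumps(message)
```

The returned string would be sent as a text frame over the WebSocket connection.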
2. Upload Chunk (upload_chunk)
Client sends:
3. File Received (file_received)
4. Upload Progress (upload_progress)
5. Upload Completed (upload_completed)
6. Heartbeat (ping/pong)
Client sends:
Timeout Mechanism
| Type | Duration | Description |
|---|---|---|
| Idle Timeout | 5 minutes | Auto-disconnect when no activity |
| Upload Timeout | 30 minutes | Extended timeout during active uploads |
| Heartbeat | 30 seconds | Recommended ping interval |
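The timeout values above can be mirrored on the client so it can predict when the server is likely to disconnect. This is a sketch under assumptions: the function names are hypothetical, and how the server actually measures "activity" is not described in this section.

```python
IDLE_TIMEOUT_S = 5 * 60       # Idle Timeout: 5 minutes
UPLOAD_TIMEOUT_S = 30 * 60    # Upload Timeout: 30 minutes
HEARTBEAT_INTERVAL_S = 30     # Recommended ping interval: 30 seconds


def is_timed_out(seconds_since_activity: float, upload_active: bool) -> bool:
    """Apply the extended limit while an upload is active, else the idle limit."""
    limit = UPLOAD_TIMEOUT_S if upload_active else IDLE_TIMEOUT_S
    return seconds_since_activity >= limit


def ping_due(seconds_since_last_ping: float) -> bool:
    """Report whether the recommended heartbeat interval has elapsed."""
    return seconds_since_last_ping >= HEARTBEAT_INTERVAL_S
```

Because the recommended ping interval is far shorter than the idle timeout, a client that pings on schedule should never reach the idle limit.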
REST API Upload
REST API provides a simple file upload method, suitable for simple scenarios or applications that don’t require real-time progress.
Endpoint
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| files | File[] | ✅ | List of files to upload |
| folder | string | ❌ | Custom folder prefix, defaults to ‘uploads’ |
Response Example
Response Codes
| Code | Description |
|---|---|
| 0 | All uploads successful |
| 1 | Partial or complete failure |
Processing Architecture
Overview
Processing Steps
1. File Upload Phase (0-30%)
- Receive file data
- Validate file type and size
- Generate unique file identifier
- Upload to object storage (S3/OSS/MinIO)
2. File Type Recognition (30-35%)
The system automatically identifies file types via FileHandlerFactory:
3. Content Processing Phase (35-90%)
PDF File Processing:
4. Summary Generation Phase (90-95%)
- Generate file content summary using LLM
- Generate intelligent file name
5. Result Saving Phase (95-100%)
- Save processing results to database
- Sync health indicators to th_series_data
- Update user health profile
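The percentage ranges of the five phases above can be turned into a small lookup, for example to label a progress bar. The sketch below is illustrative only; the helper name is hypothetical, and assigning a shared boundary value (such as 30) to the later phase is an assumption, since the ranges as documented overlap at their endpoints.

```python
# (upper bound in percent, phase name), taken from the processing steps.
PHASES = [
    (30, "File Upload"),
    (35, "File Type Recognition"),
    (90, "Content Processing"),
    (95, "Summary Generation"),
    (100, "Result Saving"),
]


def phase_for_progress(percent: float) -> str:
    """Map an overall progress percentage (0-100) to its processing phase."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    for upper_bound, name in PHASES:
        if percent < upper_bound:
            return name
    # Exactly 100% still belongs to the final phase.
    return PHASES[-1][1]
```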
Health Indicator Extraction
Configuration
config.yaml
Supported Indicator Types
- Complete Blood Count (WBC, RBC, Hemoglobin, etc.)
- Biochemistry (Liver function, Kidney function, Lipids, etc.)
- Physical Examination (Blood pressure, Heart rate, Weight, etc.)
- Tumor Markers
- Thyroid Function
- Other Medical Test Indicators
Extraction Result Format
Data Storage
Extracted indicator data is stored in the th_series_data table:
| Field | Description |
|---|---|
| user_id | User ID |
| indicator_id | Indicator ID (linked to indicator dimension table) |
| value | Indicator value |
| unit | Unit of measurement |
| source_table | Source table name |
| source_table_id | Source record ID (message_id + file_key) |
| recorded_at | Test date/time |
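For readers who want to model a row of this table in application code, a rough sketch follows. The column names come from the table above; the Python types, the class name, and the underscore used to join message_id and file_key are all assumptions, since this section does not specify them.

```python
from dataclasses import dataclass
from datetime import datetime


def make_source_table_id(message_id: str, file_key: str, sep: str = "_") -> str:
    """Combine message_id and file_key; the separator is an assumption."""
    return f"{message_id}{sep}{file_key}"


@dataclass(frozen=True)
class SeriesDataRow:
    """One extracted indicator value; field types are assumed, not documented."""
    user_id: str
    indicator_id: str
    value: float
    unit: str
    source_table: str
    source_table_id: str
    recorded_at: datetime
```

Keeping the source record ID derivable from message_id and file_key is what lets a file deletion locate the indicator rows that came from that file.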
File Deletion
Endpoint
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| message_id | string | ✅ | Message ID |
| file_keys | string[] | ❌ | List of file keys to delete; if empty, deletes all files |
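A request body matching the parameter table above could be assembled as in the sketch below. The helper name is hypothetical; omitting file_keys when no keys are given relies on the documented rule that an empty list deletes all files of the message.

```python
import json
from typing import Iterable, Optional


def build_delete_request(message_id: str, file_keys: Optional[Iterable[str]] = None) -> str:
    """Serialize a delete request; without file_keys, all files are targeted."""
    if not message_id:
        raise ValueError("message_id is required")
    body = {"message_id": message_id}
    keys = list(file_keys) if file_keys is not None else []
    if keys:
        body["file_keys"] = keys
    return json.dumps(body)
```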
Response Example
Cascade Deletion
When deleting files, the system automatically performs cascade deletion:
- Storage Deletion: Delete file from object storage (S3/OSS)
- Database Update: Update file list in th_messages table
- Health Data Cleanup: Delete associated health indicators from th_series_data
- Genetic Data Cleanup: If genetic file, delete data from th_genetic_data
- Message Marking: If all files are deleted, mark message as deleted
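The cascade steps above can be illustrated with an in-memory stand-in for the storage and tables. Everything about the data layout here is an assumption for demonstration (plain dicts and lists instead of S3 and database tables, and an underscore joining message_id and file_key); it is not the service's implementation.

```python
from typing import List, Optional


def cascade_delete(store: dict, message_id: str, file_keys: Optional[List[str]] = None) -> List[str]:
    """Delete files and their dependent records from an in-memory store.

    store layout (hypothetical):
      "objects":  set of file keys held in object storage
      "messages": {message_id: {"files": [file keys], "deleted": bool}}
      "series":   list of rows with a "source_table_id" key
      "genetic":  list of rows with a "file_key" key
    """
    message = store["messages"][message_id]
    # An empty or missing key list means "delete every file of the message".
    targets = list(file_keys) if file_keys else list(message["files"])

    for key in targets:
        store["objects"].discard(key)                  # storage deletion
        if key in message["files"]:
            message["files"].remove(key)               # file list update
        source_id = f"{message_id}_{key}"              # assumed ID layout
        store["series"] = [r for r in store["series"] if r["source_table_id"] != source_id]
        store["genetic"] = [r for r in store["genetic"] if r["file_key"] != key]

    if not message["files"]:
        message["deleted"] = True                      # message marking
    return targets
```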
API Reference
File Service Endpoints
| Method | Endpoint | Description |
|---|---|---|
| WS | /ws/upload-health-report | WebSocket file upload |
| POST | /files/upload | REST API file upload |
| POST | /api/v1/data/delete-files | Delete files |
| GET | /files/{file_path} | Get file content (proxy access) |
Authentication
All endpoints require a valid authentication token:
- REST API: Use Authorization: Bearer <token> header
- WebSocket: Pass via URL parameter ?token=<token>
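The two ways of passing the token can be sketched with the standard library alone. The helper names are hypothetical and the host in the usage note is a placeholder; only the header format and the token query parameter come from this section.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rest_auth_headers(token: str) -> Dict[str, str]:
    """Build the Authorization header used by the REST endpoints."""
    return {"Authorization": f"Bearer {token}"}


def ws_url_with_token(base_url: str, token: str) -> str:
    """Append ?token=<token> to a WebSocket URL, keeping existing parameters."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", token))
    # urlencode percent-encodes the token so special characters survive.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, ws_url_with_token("wss://example.invalid/ws/upload-health-report", "abc") yields the same URL with ?token=abc appended.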
Data Models
FileUploadData
FileDeleteRequest
FileProcessingResult
Error Handling
WebSocket Error Messages
REST API Error Response
Common Errors
| Error Type | Description | Solution |
|---|---|---|
| Invalid token | Token is invalid or expired | Obtain a new token |
| File type not supported | Unsupported file type | Use a supported file format |
| File is empty | File has no content | Check file content |
| Upload session not found | Upload session doesn’t exist | Restart the upload |
| Permission denied | No permission for operation | Check user permissions |
Timeout Handling
WebSocket connections receive a notification on timeout:
Best Practices
Large File Uploads
- Use WebSocket upload with chunked transfer
- Recommended chunk size: 1MB
- Implement resumable upload mechanism
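Chunked transfer with the recommended 1MB chunk size might look like the sketch below. The generator name and the start parameter are hypothetical; 1MB is interpreted as 1024 * 1024 bytes, which is an assumption, and whether the server accepts an upload resumed from an offset is not described in this section.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 1024 * 1024  # recommended chunk size, read as 1 MiB


def iter_chunks(data: bytes, chunk_size: int = CHUNK_SIZE, start: int = 0) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs, optionally resuming from a byte offset."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if start < 0:
        raise ValueError("start must not be negative")
    for offset in range(start, len(data), chunk_size):
        yield offset, data[offset:offset + chunk_size]
```

Recording the offset of the last acknowledged chunk on the client is one simple basis for a resumable upload: after a reconnect, iteration restarts from that offset instead of from zero.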
Batch Uploads
- Limit to 10 files per upload
- Total file size should not exceed 100MB
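These batch limits are easy to check on the client before any data is sent. The following sketch assumes 100MB means 100 * 1024 * 1024 bytes and uses a hypothetical helper name; the empty-file check mirrors the "File is empty" entry in the Common Errors table.

```python
from typing import List

MAX_FILES_PER_UPLOAD = 10
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # 100MB, read as MiB (assumption)


def validate_batch(file_sizes: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the batch is acceptable."""
    problems = []
    if len(file_sizes) > MAX_FILES_PER_UPLOAD:
        problems.append(f"too many files: {len(file_sizes)} > {MAX_FILES_PER_UPLOAD}")
    if sum(file_sizes) > MAX_TOTAL_BYTES:
        problems.append("total size exceeds 100MB")
    if any(size == 0 for size in file_sizes):
        problems.append("at least one file is empty")
    return problems
```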
Progress Monitoring
- Listen for upload_progress messages during WebSocket uploads
- Handle file_progress to display individual file progress
Error Handling
- Implement retry mechanism (recommended: max 3 retries)
- Capture and display user-friendly error messages
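A retry wrapper with the recommended cap of three retries could be sketched as follows. The exponential backoff and the helper name are assumptions added for illustration; this section only recommends the retry limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying up to max_retries times after failures.

    The delay doubles after each failed attempt (assumed backoff policy).
    The last exception is re-raised once the retries are exhausted.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

With the default settings an operation is attempted at most four times: the initial call plus three retries. In practice the except clause would usually be narrowed to transient errors such as connection failures, so that permanent errors like an invalid token are reported immediately.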
Connection Keep-alive
- Send ping every 30 seconds for WebSocket connections
- Handle pong response to confirm connection status