The platform consists of three core components:
1. Logging Agent
The agent component is responsible for intercepting and recording LLM inference requests:
- Request Capture: Intercept API calls and record complete request parameters
- Response Recording: Capture model outputs, including incremental data from streaming responses
- Metadata Extraction: Automatically extract model name, token usage, response time, etc.
- Sampling Control: Support ratio-based sampling to trade data completeness against storage cost
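Ratio-based sampling can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the function name `should_sample` and the choice to hash a request ID (so that the same request always gets the same decision) are assumptions made for the example.

```python
import hashlib


def should_sample(request_id: str, ratio: float) -> bool:
    """Decide whether to record a request, keeping roughly `ratio` of them.

    Hypothetical helper: hashing the request ID makes the decision
    deterministic, so retries of the same request are treated consistently.
    """
    if ratio <= 0.0:
        return False
    if ratio >= 1.0:
        return True
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < ratio
```

A ratio of 0.1 keeps about one request in ten; a ratio of 1.0 records everything at the highest storage cost.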
The agent can be deployed as:
- Reverse Proxy: Sits between the client and the model service
- Sidecar: Deployed alongside the application container
- SDK Integration: Embedded directly into applications via the Python or Node.js SDK
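For the SDK-style deployment, the capture logic might look like the wrapper below. It is a sketch under stated assumptions: `logged_call`, the record field names, and the shape of the response dictionary (`output`, `usage`) are all hypothetical and do not describe the platform's real SDK.

```python
import time
from typing import Any, Callable, Dict, List


def logged_call(
    call_model: Callable[..., Dict[str, Any]],
    sink: List[Dict[str, Any]],
) -> Callable[..., Dict[str, Any]]:
    """Wrap a model-calling function so each call is recorded in `sink`.

    `call_model` is any function that takes keyword parameters and returns a
    dict; the "output" and "usage" keys are assumed for illustration.
    """

    def wrapper(**params: Any) -> Dict[str, Any]:
        start = time.perf_counter()
        response = call_model(**params)
        latency_ms = (time.perf_counter() - start) * 1000.0
        sink.append(
            {
                "request": dict(params),               # complete request parameters
                "response": response.get("output"),    # model output
                "model": params.get("model"),          # extracted metadata
                "usage": response.get("usage", {}),
                "latency_ms": latency_ms,
            }
        )
        return response

    return wrapper
```

The application calls the wrapped function exactly as it would call the original one, and the log record is produced as a side effect.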
2. Ingestion Service
The ingestion service is responsible for receiving, processing, and storing log data:
- Data Validation: Verify log format and filter invalid data
- Data Enrichment: Compute derived metrics such as token rate and estimated cost
- Data Conversion: Support multiple output formats (JSON, Parquet, etc.)
- Bulk Writing: Write records in batches to sustain high-throughput workloads
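The validation, derived-metric, and bulk-writing steps above can be sketched as three small functions. The record schema (`REQUIRED_FIELDS`), the price table, and the model name `example-model` are assumptions invented for this example; real field names and prices will differ.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumed record schema, for illustration only.
REQUIRED_FIELDS = ("model", "prompt_tokens", "completion_tokens", "latency_ms")

# Hypothetical prices per 1,000 tokens: (prompt, completion).
PRICE_PER_1K = {"example-model": (0.001, 0.002)}


def validate(record: Dict[str, Any]) -> bool:
    """Accept a record only if every required field is present and sane."""
    if any(field not in record for field in REQUIRED_FIELDS):
        return False
    numeric = ("prompt_tokens", "completion_tokens", "latency_ms")
    return all(
        isinstance(record[f], (int, float)) and record[f] >= 0 for f in numeric
    )


def enrich(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with token rate and estimated cost added."""
    out = dict(record)
    seconds = record["latency_ms"] / 1000.0
    out["tokens_per_second"] = (
        record["completion_tokens"] / seconds if seconds > 0 else None
    )
    prompt_price, completion_price = PRICE_PER_1K.get(record["model"], (0.0, 0.0))
    out["estimated_cost"] = (
        record["prompt_tokens"] / 1000.0 * prompt_price
        + record["completion_tokens"] / 1000.0 * completion_price
    )
    return out


def batches(records: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Group records into lists of at most `size` items for bulk writes."""
    batch: List[Any] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

A typical pipeline would filter with `validate`, map with `enrich`, and hand each list from `batches` to the storage backend in a single write.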
3. Storage and Query Layer
The platform supports multiple storage backends:
- Time-Series Databases: For example InfluxDB and TimescaleDB; suited to metric storage
- Object Storage: For example S3 and MinIO; suited to archiving raw logs
- Analytics Databases: For example ClickHouse; suited to complex queries and analysis
- Hybrid Mode: Hot data is kept in a time-series database and cold data in object storage
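In hybrid mode, something has to decide which tier a record belongs to. One simple option is an age cutoff, sketched below. The seven-day threshold and the names `HOT_RETENTION` and `storage_tier` are assumptions for illustration; the source does not say how the platform separates hot from cold data.

```python
from datetime import datetime, timedelta, timezone

# Assumed cutoff: records newer than this stay in the hot tier.
HOT_RETENTION = timedelta(days=7)


def storage_tier(
    timestamp: datetime,
    now: datetime,
    hot_retention: timedelta = HOT_RETENTION,
) -> str:
    """Return "hot" for recent records and "cold" for older ones."""
    return "hot" if now - timestamp <= hot_retention else "cold"


# Example: a one-day-old record is hot, a thirty-day-old record is cold.
_now = datetime(2024, 1, 31, tzinfo=timezone.utc)
recent = storage_tier(_now - timedelta(days=1), _now)
old = storage_tier(_now - timedelta(days=30), _now)
```

Passing `now` explicitly keeps the function easy to test and lets a background job reclassify records on a schedule.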