Architecture Overview
The inference gateway operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.
API Endpoints
POST /agents/inference
Provides text generation capabilities to agents with comprehensive validation and cost control.
Request Format
Response Format
Provides text embedding services for semantic analysis and vector operations.
Request Format
Response Format
Security Features
Request Isolation
Each agent request is validated against the database to ensure:
- Run Authorization: Only valid evaluation runs can make requests
- Status Validation: Requests only accepted from properly initialized sandboxes
- Resource Limits: Per-run cost caps prevent abuse
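
The three checks above can be sketched as a single validation function. This is a minimal illustration and not the gateway's actual code: the `EvaluationRun` fields, the `RunStatus` values, the HTTP status codes, the `MAX_COST_PER_RUN_USD` value, and the in-memory dictionary standing in for the database are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class RunStatus(Enum):
    # Hypothetical lifecycle states; the real schema may differ.
    PENDING = "pending"
    SANDBOX_READY = "sandbox_ready"
    FINISHED = "finished"


@dataclass
class EvaluationRun:
    run_id: str
    status: RunStatus
    cost_so_far_usd: float


class RequestRejected(Exception):
    """Raised when a request fails one of the isolation checks."""

    def __init__(self, status_code: int, reason: str) -> None:
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


# Assumed per-run cap; the source does not state the actual limit.
MAX_COST_PER_RUN_USD = 1.00


def check_request(run_id: str, runs: Dict[str, EvaluationRun]) -> EvaluationRun:
    """Apply the three checks in order and return the run if all pass."""
    # Run Authorization: the run must exist in the (stand-in) database.
    run = runs.get(run_id)
    if run is None:
        raise RequestRejected(404, "unknown evaluation run")

    # Status Validation: only an initialized sandbox may send requests.
    if run.status is not RunStatus.SANDBOX_READY:
        raise RequestRejected(409, "sandbox is not initialized")

    # Resource Limits: refuse once the run has reached its cost cap.
    if run.cost_so_far_usd >= MAX_COST_PER_RUN_USD:
        raise RequestRejected(429, "per-run cost cap reached")

    return run
```

In a FastAPI handler, a function like this would typically run before the request is forwarded to the provider, with `RequestRejected` translated into an HTTP error response.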

