Architecture Overview
The proxy operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.
Core Components
Request Validation System
The proxy validates every incoming request so that only legitimate agent requests are processed.
Database Validation
The proxy validates each request by checking that the run_id exists in the evaluation_runs table and that the evaluation is in the correct state for AI service access.
Status Requirements
- Sandbox Status: Only requests from sandbox_created evaluation runs are accepted
- Run ID Validation: Every request must include a valid run_id from an active evaluation
- Authentication: Implicit authentication through run_id validation
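The checks above can be sketched as a single lookup. This is a minimal illustration, not the proxy's actual code: it uses an in-memory SQLite table as a stand-in for the real database, and the `status` column name, the `finished` state, the `RunValidationError` class, and the helper names are all assumptions made for the example.

```python
import sqlite3

# Assumption: the run state lives in a column called "status".
REQUIRED_STATUS = "sandbox_created"


class RunValidationError(Exception):
    """Raised when a run_id is not allowed to use the proxy (hypothetical)."""


def validate_run(conn: sqlite3.Connection, run_id: str) -> None:
    """Raise unless run_id exists and is in the sandbox_created state."""
    row = conn.execute(
        "SELECT status FROM evaluation_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise RunValidationError("unknown run_id")
    if row[0] != REQUIRED_STATUS:
        raise RunValidationError("run is not in the sandbox_created state")


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the evaluation_runs table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evaluation_runs (run_id TEXT PRIMARY KEY, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO evaluation_runs VALUES (?, ?)",
        [("run-1", "sandbox_created"), ("run-2", "finished")],
    )
    return conn
```

With the demo table, `validate_run(conn, "run-1")` returns without error, while `run-2` (wrong state) and any unknown ID raise `RunValidationError`. Because the run_id is the only credential, the same lookup doubles as the authentication step.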
Cost Control System
The proxy enforces strict per-run cost limits to prevent resource abuse.
Per-Run Cost Tracking
The proxy tracks the cumulative cost of each evaluation run and rejects requests that would push the run past its configured maximum cost.
Cost Calculation
- Inference: Token-based pricing with model-specific rates
- Embeddings: Time-based pricing per request
- Aggregation: Real-time cost tracking per evaluation run
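The two pricing schemes and the per-run cap can be sketched as follows. The rates, the model name, the limit, and the function names are hypothetical placeholders; the real values come from the proxy's configuration. The sketch also assumes the budget check compares the running total against the cap before a request is forwarded, since the token count of a request is only known after the provider responds.

```python
# Hypothetical values for illustration only.
PRICE_PER_MILLION_TOKENS = {"example-model": 0.50}  # USD per 1M tokens
EMBEDDING_PRICE_PER_SECOND = 0.0001                 # USD per second of request time
MAX_COST_PER_RUN = 2.00                             # USD


def inference_cost(model: str, total_tokens: int) -> float:
    """Token-based pricing: tokens used times the model-specific rate."""
    return total_tokens * PRICE_PER_MILLION_TOKENS[model] / 1_000_000


def embedding_cost(elapsed_seconds: float) -> float:
    """Time-based pricing: request duration times a flat per-second rate."""
    return elapsed_seconds * EMBEDDING_PRICE_PER_SECOND


def has_budget(spent_so_far: float, max_cost: float = MAX_COST_PER_RUN) -> bool:
    """Return True while the run's cumulative cost is still below the cap."""
    return spent_so_far < max_cost
```

In this sketch a run that has already spent its full limit is refused before any further request is forwarded; an estimate of the next request's cost could be added to `spent_so_far` for a stricter check.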
Chutes AI Integration
The proxy forwards validated requests to Chutes AI services.
Inference Endpoint
For each inference request, the proxy creates a record, forwards the request to Chutes AI, calculates the cost from token usage and model pricing, and updates the database with the resulting usage data.
Embedding Endpoint
The proxy tracks request timing for embedding operations, forwards requests to Chutes embedding services, and calculates time-based costs for usage billing.
API Endpoints
POST /agents/inference
Provides text generation capabilities to agents with comprehensive validation and cost control.
Request Format
Response Format
POST /agents/embedding
Provides text embedding services for semantic analysis and vector operations.
Request Format
Response Format
GET /health
Simple health check endpoint for monitoring and system status verification.
Configuration
Environment Variables
Required Configuration
Optional Configuration
Model Pricing Configuration
The proxy supports flexible pricing for different AI models, with configurable rates per token for inference and time-based pricing for embeddings.
Security Features
Request Isolation
Each agent request is validated against the database to ensure:
- Run Authorization: Only valid evaluation runs can make requests
- Status Validation: Requests only accepted from properly initialized sandboxes
- Resource Limits: Per-run cost caps prevent abuse
Data Protection
- No Persistent Storage: Request and response data are not stored beyond what cost tracking requires
- Secure Transmission: HTTPS encryption for all external API calls
- Error Handling: Detailed errors are logged, but agents receive sanitized responses
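The split between detailed logs and sanitized responses could look like the sketch below. It is only an illustration of the pattern: the logger name, the response shape, the default status code, and the `error_id` correlation field are assumptions, not part of the documented API.

```python
import logging
import uuid

logger = logging.getLogger("proxy")  # hypothetical logger name


def sanitized_error(exc: Exception, status_code: int = 502) -> dict:
    """Log the full exception, but return a body with no internal details."""
    # The error_id lets operators match an agent-visible error to a log line.
    error_id = uuid.uuid4().hex
    logger.error("request failed [%s]: %r", error_id, exc)
    return {
        "status_code": status_code,
        "detail": "Upstream request failed",
        "error_id": error_id,
    }
```

The exception text, which may contain keys, URLs, or provider messages, goes only to the log; the agent sees a fixed message plus an opaque identifier.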
Development Mode
For local testing and development, the proxy can skip database validation when configured in development mode, allowing easier testing without full infrastructure setup.
Database Integration
Evaluation Run Tracking
The proxy records every inference and embedding request, tracking costs, usage metrics, and timing data for evaluation analytics.
Cost Aggregation
Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.
Error Handling
Common Error Scenarios
The proxy handles invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns an appropriate status code and a descriptive message that helps with debugging without exposing internal details.
Performance Considerations
Connection Management
- Async HTTP Client: Non-blocking requests to external services
- Connection Pooling: Efficient database connection reuse
- Request Timeout: Configurable timeouts prevent hanging requests
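A request timeout around a non-blocking call can be sketched with the standard library alone. Here `send` stands in for whatever async HTTP client call the proxy makes; the function name, the parameter names, and the error payload are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def forward_with_timeout(
    send: Callable[[Any], Awaitable[Any]],
    payload: Any,
    timeout_seconds: float,
) -> Any:
    """Await send(payload), giving up after timeout_seconds."""
    try:
        return await asyncio.wait_for(send(payload), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the pending call, so it does not hang in the background.
        return {"error": "upstream timeout"}
```

Because the wait is asynchronous, a slow upstream call ties up only its own coroutine; other requests keep being served until the timeout fires.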
Monitoring & Logging
The proxy logs request completion, cost limit warnings, and errors to support operations and debugging.
Scalability
- Stateless Design: Easy horizontal scaling without shared state
- Database Efficiency: Optimized queries for cost calculation
- Resource Limits: Built-in protection against resource exhaustion

