Secure inference and embedding gateway with cost control and request validation
The Ridges Proxy acts as a secure gateway between agent code running in sandboxed environments and external AI services. It provides controlled access to inference and embedding capabilities while enforcing strict cost limits and validation requirements.
The proxy operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.
The proxy validates each request by checking that the run_id exists in the evaluation_runs table and that the evaluation is in the correct state for AI service access.
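The check described above can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: a plain dict stands in for the evaluation_runs table, and the state name `sandbox_running` and the `validate_run` helper are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical: the state(s) in which a run may call AI services.
ALLOWED_STATES = {"sandbox_running"}


@dataclass
class ValidationResult:
    ok: bool
    reason: str = ""


def validate_run(run_id: str, evaluation_runs: dict[str, str]) -> ValidationResult:
    """Check that run_id exists and that its evaluation is in an allowed state.

    `evaluation_runs` maps run_id -> state and stands in for the real table.
    """
    state = evaluation_runs.get(run_id)
    if state is None:
        return ValidationResult(False, "unknown run_id")
    if state not in ALLOWED_STATES:
        return ValidationResult(False, f"run is in state {state!r}")
    return ValidationResult(True)


runs = {"run-1": "sandbox_running", "run-2": "finished"}
print(validate_run("run-1", runs).ok)  # True
print(validate_run("run-2", runs).ok)  # False
print(validate_run("run-9", runs).ok)  # False
```

Returning a reason alongside the boolean lets the caller pick a suitable error response without repeating the lookup.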
The proxy creates detailed records for each inference request, forwards them to Chutes AI, calculates costs based on token usage and model pricing, and updates the database with comprehensive usage tracking.
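As an illustration of token-based cost calculation, the sketch below assumes prices quoted per one million tokens, with separate input and output rates. The model name, the prices, and the `inference_cost` helper are all hypothetical and do not reflect real Chutes pricing.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens; not real provider pricing.
MODEL_PRICING = {
    "example-model": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}

TOKENS_PER_PRICING_UNIT = Decimal(1_000_000)


def inference_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Return the USD cost of one request from its token counts."""
    price = MODEL_PRICING[model]
    total = (
        Decimal(prompt_tokens) * price["input"]
        + Decimal(completion_tokens) * price["output"]
    )
    return total / TOKENS_PER_PRICING_UNIT


cost = inference_cost("example-model", prompt_tokens=2_000, completion_tokens=500)
print(cost)
```

`Decimal` is used instead of `float` so that many small per-request costs can be summed without accumulating rounding error.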
The proxy tracks request timing for embedding operations, forwards requests to Chutes embedding services, and calculates time-based costs for accurate usage billing.
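Time-based billing can be sketched as measuring the wall-clock duration of the upstream call and multiplying by a per-second rate. The rate, the helper names, and the fake request below are assumptions made for illustration only.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical rate in USD per second of embedding time.
EMBEDDING_USD_PER_SECOND = 0.0001


def timed_call(fn: Callable[[], T]) -> tuple[T, float]:
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def embedding_cost(elapsed_seconds: float) -> float:
    """Convert elapsed time into a cost at the assumed flat rate."""
    return elapsed_seconds * EMBEDDING_USD_PER_SECOND


def fake_embedding_request() -> list[float]:
    # Stand-in for the call to the upstream embedding service.
    time.sleep(0.01)
    return [0.0, 0.1, 0.2]


vector, elapsed = timed_call(fake_embedding_request)
print(f"elapsed={elapsed:.4f}s cost=${embedding_cost(elapsed):.8f}")
```

`time.perf_counter` is a monotonic clock, so the measured duration is not affected by system clock adjustments.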
For local testing and development, the proxy can skip database validation when configured in development mode, allowing easier testing without full infrastructure setup.
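One way such a development-mode bypass might look is shown below. The `ENV` variable name, its `dev` value, and both helper functions are hypothetical; the actual configuration mechanism is not specified here.

```python
import os
from collections.abc import Mapping
from typing import Callable


def is_development_mode(env: Mapping[str, str]) -> bool:
    # Hypothetical variable name and value; defaults to production behaviour.
    return env.get("ENV", "prod").lower() == "dev"


def should_allow(
    run_id: str,
    check_database: Callable[[str], bool],
    env: Mapping[str, str],
) -> bool:
    """Skip the database check in development mode, otherwise enforce it."""
    if is_development_mode(env):
        return True
    return check_database(run_id)


def deny_all(run_id: str) -> bool:
    # Stand-in validator that rejects every run.
    return False


print(should_allow("run-1", deny_all, {"ENV": "dev"}))   # True
print(should_allow("run-1", deny_all, {"ENV": "prod"}))  # False
print(should_allow("run-1", deny_all, {}))               # False
```

In a real service `os.environ` could be passed as `env`. Defaulting to production when the variable is unset means a missing setting cannot silently disable validation.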
The proxy maintains detailed records of all inference and embedding requests, tracking costs, usage metrics, and timing data for comprehensive evaluation analytics.
Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.
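The cumulative check can be sketched with an in-memory tracker. The limit value, the `CostTracker` class, and the assumption that a request's cost is known when it is charged are illustrative; the proxy itself keeps these totals in its database.

```python
from decimal import Decimal


class CostLimitExceeded(Exception):
    """Raised when a charge would push a run past its budget."""


class CostTracker:
    """Tracks cumulative spend per run_id (in memory, for illustration)."""

    def __init__(self, limit: Decimal) -> None:
        self.limit = limit
        self._spent: dict[str, Decimal] = {}

    def spent(self, run_id: str) -> Decimal:
        return self._spent.get(run_id, Decimal("0"))

    def charge(self, run_id: str, cost: Decimal) -> Decimal:
        """Add cost to the run's total, or raise if that would exceed the limit."""
        new_total = self.spent(run_id) + cost
        if new_total > self.limit:
            raise CostLimitExceeded(
                f"{run_id}: {new_total} would exceed limit {self.limit}"
            )
        self._spent[run_id] = new_total
        return new_total


tracker = CostTracker(limit=Decimal("1.00"))  # hypothetical per-run budget
tracker.charge("run-1", Decimal("0.60"))
try:
    tracker.charge("run-1", Decimal("0.50"))
except CostLimitExceeded as exc:
    print("rejected:", exc)
print(tracker.spent("run-1"))  # 0.60
```

A rejected charge leaves the stored total unchanged, so a single oversized request does not block later, smaller ones that still fit within the budget.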
The proxy handles various error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns appropriate status codes and descriptive messages to help with debugging while maintaining security.
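A rough sketch of mapping these error conditions to responses follows. The exception classes, the specific status codes chosen, and the `to_response` helper are assumptions; the proxy's actual codes and messages may differ.

```python
from http import HTTPStatus


class ProxyError(Exception):
    """Base class; subclasses carry a status and a message safe to show callers."""
    status = HTTPStatus.INTERNAL_SERVER_ERROR
    public_message = "internal error"


class InvalidRunId(ProxyError):
    status = HTTPStatus.NOT_FOUND
    public_message = "unknown run_id"


class InvalidSandboxState(ProxyError):
    status = HTTPStatus.CONFLICT
    public_message = "evaluation run is not in a state that allows AI access"


class CostLimitReached(ProxyError):
    status = HTTPStatus.TOO_MANY_REQUESTS
    public_message = "cost limit reached for this evaluation run"


class UpstreamFailure(ProxyError):
    status = HTTPStatus.BAD_GATEWAY
    public_message = "external AI service failed"


def to_response(exc: Exception) -> tuple[int, dict[str, str]]:
    """Turn an exception into (status_code, body) without leaking internals."""
    if isinstance(exc, ProxyError):
        return int(exc.status), {"error": exc.public_message}
    # Unexpected errors get a generic message; details belong in server logs.
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), {"error": "internal error"}


print(to_response(InvalidRunId()))
print(to_response(ValueError("db password is hunter2")))
```

Keeping the caller-facing message separate from the exception's own text is one way to stay descriptive for debugging while avoiding accidental disclosure of internal details.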
Stateless Design: Easy horizontal scaling without shared state
Database Efficiency: Optimized queries for cost calculation
Resource Limits: Built-in protection against resource exhaustion
The Ridges Proxy provides secure, controlled, and cost-effective access to AI capabilities while maintaining the isolation and security requirements of the sandboxed evaluation environment.