The Ridges Proxy acts as a secure gateway between agent code running in sandboxed environments and external AI services. It provides controlled access to inference and embedding capabilities while enforcing strict cost limits and validation requirements.

Architecture Overview

The proxy operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.

Core Components

Request Validation System

The proxy implements comprehensive validation to ensure only legitimate agent requests are processed:

Database Validation

The proxy validates each request by checking that the run_id exists in the evaluation_runs table and that the evaluation is in the correct state for AI service access.

Status Requirements

  • Sandbox Status: Only requests from evaluation runs in the sandbox_created state are accepted
  • Run ID Validation: Every request must include a run_id that belongs to an active evaluation
  • Authentication: There is no separate credential; a valid run_id serves as implicit authentication
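
The checks above can be sketched as a small pure function. This is only an illustration under stated assumptions: the `runs` mapping stands in for a lookup against the evaluation_runs table, and the names `validate_run` and `ValidationResult` are hypothetical rather than taken from the proxy's code.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Status named in the "Sandbox Status" requirement above.
ALLOWED_STATUS = "sandbox_created"


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    reason: Optional[str] = None


def validate_run(run_id: str, runs: Mapping[str, str]) -> ValidationResult:
    """Check that run_id exists and is in the accepted state.

    `runs` maps run_id -> status and is a stand-in (assumption) for a
    query against the evaluation_runs table.
    """
    status = runs.get(run_id)
    if status is None:
        return ValidationResult(False, "unknown run_id")
    if status != ALLOWED_STATUS:
        return ValidationResult(
            False, f"run is in state {status!r}, expected {ALLOWED_STATUS!r}"
        )
    return ValidationResult(True)
```

Returning a reason alongside the boolean lets the caller log the precise cause while still deciding separately what to expose to the agent.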

Cost Control System

The proxy enforces strict cost limits to prevent resource abuse:

Per-Run Cost Tracking

The proxy tracks cumulative costs for each evaluation run and rejects requests that would exceed the configured maximum cost per run limit.

Cost Calculation

  • Inference: Token-based pricing with model-specific rates
  • Embeddings: Time-based pricing, billed on the duration of each request (see EMBEDDING_PRICE_PER_SECOND)
  • Aggregation: Real-time cost tracking per evaluation run
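
A minimal sketch of token-based cost calculation and the per-run cap follows. The price table, its per-million-token unit, the model name, and the function names are all assumptions made for illustration; the document does not give actual rates. Because the cost of an inference call is only known once the response reports token usage, this sketch checks the budget against what has already been spent.

```python
from decimal import Decimal
from typing import Dict, Tuple

# Hypothetical rates in USD per million tokens: (prompt, completion).
MODEL_PRICES: Dict[str, Tuple[Decimal, Decimal]] = {
    "example/model": (Decimal("0.30"), Decimal("0.90")),
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def inference_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Price one inference call from its reported token usage."""
    prompt_rate, completion_rate = MODEL_PRICES[model]
    total = prompt_rate * prompt_tokens + completion_rate * completion_tokens
    return total / TOKENS_PER_UNIT


def is_over_budget(spent_so_far: Decimal, max_cost_per_run: Decimal) -> bool:
    """True when the run has already used up its budget."""
    return spent_so_far >= max_cost_per_run
```

Decimal is used instead of float so that many small per-request costs can be summed without rounding drift.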

Chutes AI Integration

The proxy forwards validated requests to Chutes AI services:

Inference Endpoint

The proxy creates detailed records for each inference request, forwards them to Chutes AI, calculates costs based on token usage and model pricing, and updates the database with comprehensive usage tracking.

Embedding Endpoint

The proxy tracks request timing for embedding operations, forwards requests to Chutes embedding services, and calculates time-based costs for accurate usage billing.
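
Time-based pricing can be illustrated with a small timing wrapper. This is a sketch under assumptions: `timed_call` is a hypothetical helper, the wrapped callable stands in for the forwarded embedding request, and the rate corresponds to the EMBEDDING_PRICE_PER_SECOND setting.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_call(fn: Callable[[], T], price_per_second: float) -> Tuple[T, float, float]:
    """Run fn, then return (result, elapsed_seconds, cost).

    fn is a placeholder (assumption) for the call that forwards an
    embedding request upstream.
    """
    start = time.perf_counter()  # monotonic clock, unaffected by wall-clock changes
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed * price_per_second
```

With a rate of 0.001 per second, a request that takes 2.5 seconds would be billed 0.0025.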

API Endpoints

POST /agents/inference

Provides text generation capabilities to agents with comprehensive validation and cost control.

Request Format

{
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "model": "deepseek-ai/DeepSeek-V3-0324", 
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "Analyze this code and suggest improvements..."
    }
  ]
}

Response Format

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the code analysis..."
      }
    }
  ],
  "usage": {
    "total_tokens": 1250,
    "prompt_tokens": 800,
    "completion_tokens": 450
  }
}
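
From the agent's side, a call to this endpoint might be prepared as sketched below using only the standard library. The helper names and the proxy base URL are assumptions; the payload fields and the response path mirror the formats shown above.

```python
import json
from typing import Any, Dict, List
from urllib.request import Request


def build_inference_request(
    base_url: str,
    run_id: str,
    model: str,
    messages: List[Dict[str, str]],
    temperature: float = 0.7,
) -> Request:
    """Build (but do not send) a POST request for /agents/inference."""
    payload = {
        "run_id": run_id,
        "model": model,
        "temperature": temperature,
        "messages": messages,
    }
    return Request(
        url=f"{base_url.rstrip('/')}/agents/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response_body: Dict[str, Any]) -> str:
    """Extract the assistant text from a response in the format shown above."""
    return response_body["choices"][0]["message"]["content"]
```

The request could then be sent with `urllib.request.urlopen(req, timeout=...)` and the JSON body passed to `first_reply`.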

POST /agents/embedding

Provides text embedding services for semantic analysis and vector operations.

Request Format

{
  "input": "Text to generate embeddings for",
  "run_id": "550e8400-e29b-41d4-a716-446655440000"
}

Response Format

{
  "embeddings": [
    [0.1234, -0.5678, 0.9012, ...]
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "total_tokens": 12
  }
}

GET /health

Simple health check endpoint for monitoring and system status verification.

Configuration

Environment Variables

Required Configuration

# Database Connection
AWS_MASTER_USERNAME=proxy_user
AWS_MASTER_PASSWORD=proxy_password
AWS_RDS_PLATFORM_ENDPOINT=db.ridges.internal
AWS_RDS_PLATFORM_DB_NAME=ridges_platform
PGPORT=5432

# Chutes AI Integration
CHUTES_API_KEY=cpk_your_chutes_api_key_here
CHUTES_INFERENCE_URL=https://api.chutes.ai/inference/chat/completions
CHUTES_EMBEDDING_URL=https://api.chutes.ai/inference/embeddings

# Cost Control
MAX_COST_PER_RUN=2.00
EMBEDDING_PRICE_PER_SECOND=0.001

Optional Configuration

# Server Settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
LOG_LEVEL=INFO

# Development Mode
ENV=dev  # Skips database validation for local testing
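
One way to load these settings at startup is sketched below. The `ProxyConfig` and `load_config` names are hypothetical, the defaults for the optional variables simply reuse the example values above, and assembling the database variables into a PostgreSQL connection URI is an assumption about how they are consumed.

```python
import os
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping
from urllib.parse import quote

REQUIRED_VARS = (
    "AWS_MASTER_USERNAME", "AWS_MASTER_PASSWORD", "AWS_RDS_PLATFORM_ENDPOINT",
    "AWS_RDS_PLATFORM_DB_NAME", "PGPORT", "CHUTES_API_KEY",
    "CHUTES_INFERENCE_URL", "CHUTES_EMBEDDING_URL",
    "MAX_COST_PER_RUN", "EMBEDDING_PRICE_PER_SECOND",
)


@dataclass(frozen=True)
class ProxyConfig:
    db_dsn: str
    chutes_api_key: str
    chutes_inference_url: str
    chutes_embedding_url: str
    max_cost_per_run: Decimal
    embedding_price_per_second: Decimal
    server_host: str
    server_port: int
    log_level: str
    dev_mode: bool


def load_config(env: Mapping[str, str] = os.environ) -> ProxyConfig:
    """Read settings from env, failing fast if a required one is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing required environment variables: " + ", ".join(missing))
    # Percent-encode credentials so special characters do not break the URI.
    dsn = "postgresql://{user}:{pw}@{host}:{port}/{db}".format(
        user=quote(env["AWS_MASTER_USERNAME"], safe=""),
        pw=quote(env["AWS_MASTER_PASSWORD"], safe=""),
        host=env["AWS_RDS_PLATFORM_ENDPOINT"],
        port=int(env["PGPORT"]),
        db=quote(env["AWS_RDS_PLATFORM_DB_NAME"], safe=""),
    )
    return ProxyConfig(
        db_dsn=dsn,
        chutes_api_key=env["CHUTES_API_KEY"],
        chutes_inference_url=env["CHUTES_INFERENCE_URL"],
        chutes_embedding_url=env["CHUTES_EMBEDDING_URL"],
        max_cost_per_run=Decimal(env["MAX_COST_PER_RUN"]),
        embedding_price_per_second=Decimal(env["EMBEDDING_PRICE_PER_SECOND"]),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),  # assumed default
        server_port=int(env.get("SERVER_PORT", "8000")),  # assumed default
        log_level=env.get("LOG_LEVEL", "INFO"),  # assumed default
        dev_mode=env.get("ENV") == "dev",
    )
```

Validating everything in one place means a misconfigured deployment fails at startup rather than on the first agent request.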

Model Pricing Configuration

The proxy supports flexible pricing for different AI models, with configurable rates per token for inference and time-based pricing for embeddings.

Security Features

Request Isolation

Each agent request is validated against the database to ensure:
  • Run Authorization: Only valid evaluation runs can make requests
  • Status Validation: Requests only accepted from properly initialized sandboxes
  • Resource Limits: Per-run cost caps prevent abuse

Data Protection

  • No Persistent Storage: Request and response data is not stored beyond what cost tracking requires
  • Secure Transmission: All external API calls are made over HTTPS
  • Error Handling: Detailed errors are logged, while agents receive sanitized responses

Development Mode

For local testing and development, the proxy can skip database validation when configured in development mode, allowing easier testing without full infrastructure setup.

Database Integration

Evaluation Run Tracking

The proxy maintains detailed records of all inference and embedding requests, tracking costs, usage metrics, and timing data for comprehensive evaluation analytics.

Cost Aggregation

Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.

Error Handling

Common Error Scenarios

The proxy handles various error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns appropriate status codes and descriptive messages to help with debugging while maintaining security.
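
The mapping from error condition to response could look like the sketch below. The exception names, the specific status codes, and the response shape are assumptions chosen for illustration; the document does not state which codes the proxy actually returns.

```python
import logging
from typing import Dict, Tuple

logger = logging.getLogger("proxy")


class ProxyError(Exception):
    """Base class; subclasses carry a status code and an agent-safe message."""
    status_code = 500
    public_message = "Internal proxy error"


class UnknownRunError(ProxyError):
    status_code = 404  # assumed code
    public_message = "Unknown run_id"


class InvalidRunStateError(ProxyError):
    status_code = 409  # assumed code
    public_message = "Evaluation run is not in a state that allows AI access"


class CostLimitExceeded(ProxyError):
    status_code = 429  # assumed code
    public_message = "Cost limit for this run has been reached"


class UpstreamError(ProxyError):
    status_code = 502  # assumed code
    public_message = "External AI service failed"


def to_response(exc: Exception) -> Tuple[int, Dict[str, str]]:
    """Log full details, but return only the sanitized message."""
    logger.error("request failed: %s: %s", type(exc).__name__, exc)
    if isinstance(exc, ProxyError):
        return exc.status_code, {"detail": exc.public_message}
    return 500, {"detail": ProxyError.public_message}
```

Keeping the public message on the class rather than on the instance makes it hard to leak internal details such as spend figures or database errors by accident.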

Performance Considerations

Connection Management

  • Async HTTP Client: Non-blocking requests to external services
  • Connection Pooling: Efficient database connection reuse
  • Request Timeout: Configurable timeouts prevent hanging requests
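
The timeout behaviour can be sketched with asyncio alone. This is an illustration, not the proxy's implementation: `with_timeout` is a hypothetical helper, and a real service would more likely rely on the timeout option of its HTTP client.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


async def with_timeout(awaitable: Awaitable[T], seconds: float) -> T:
    """Await a call, cancelling it if it runs longer than `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        # Re-raise as the builtin TimeoutError with a clearer message.
        raise TimeoutError(f"upstream call exceeded {seconds}s") from None
```

Because `asyncio.wait_for` cancels the pending call on timeout, a stalled upstream request does not keep holding resources while other requests are served.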

Monitoring & Logging

The proxy provides comprehensive logging for request completion, cost limit warnings, and error tracking to support operations and debugging.

Scalability

  • Stateless Design: Easy horizontal scaling without shared state
  • Database Efficiency: Optimized queries for cost calculation
  • Resource Limits: Built-in protection against resource exhaustion

The Proxy service ensures secure, controlled, and cost-effective access to AI capabilities while maintaining the isolation and security requirements of the sandboxed evaluation environment.