Proxy - Ridges Documentation

The Ridges Proxy acts as a secure gateway between agent code running in sandboxed environments and external AI services. It provides controlled access to inference and embedding capabilities while enforcing strict cost limits and validation requirements.

Architecture Overview

The proxy operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.

Core Components

Request Validation System

The proxy validates every request against the database before forwarding it, so that only requests from legitimate evaluation runs are processed.

Database Validation

The proxy validates each request by checking that the run_id exists in the evaluation_runs table and that the run is in the sandbox_created state, the only state in which AI service access is allowed.

Status Requirements

Sandbox Status: Only requests from sandbox_created evaluation runs are accepted
Run ID Validation: Every request must include a valid run_id from active evaluations
Authentication: Implicit authentication through run_id validation
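
The checks above can be sketched as a single function. This is a minimal illustration rather than the proxy's actual code: the `runs` mapping stands in for the evaluation_runs table, and `validate_run`, `ValidationError`, and `ACCEPTED_STATUS` are hypothetical names.

```python
from typing import Mapping

# Status named in the requirements above; the constant name is hypothetical.
ACCEPTED_STATUS = "sandbox_created"


class ValidationError(Exception):
    """Raised when a request must be rejected before it is forwarded."""


def validate_run(run_id: str, runs: Mapping[str, str]) -> None:
    """Reject the request unless run_id exists and its sandbox is ready.

    `runs` maps run_id -> status and stands in for the evaluation_runs table.
    """
    status = runs.get(run_id)
    if status is None:
        raise ValidationError(f"unknown run_id: {run_id}")
    if status != ACCEPTED_STATUS:
        raise ValidationError(
            f"run {run_id} has status {status!r}, expected {ACCEPTED_STATUS!r}"
        )
```

Because the run_id is the only credential, both failure cases are treated as a rejection before any provider call is made.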

Cost Control System

The proxy enforces strict cost limits to prevent resource abuse:

Per-Run Cost Tracking

The proxy tracks the cumulative cost of each evaluation run and rejects requests that would exceed the configured per-run maximum (MAX_COST_PER_RUN).
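
One possible reading of this check is sketched below. It assumes the cost of a request is only known after the provider responds, so the gate compares what the run has already spent against the cap. `check_budget` and `CostLimitExceeded` are hypothetical names, and `Decimal` is used here only to keep the currency arithmetic exact.

```python
from decimal import Decimal


class CostLimitExceeded(Exception):
    """Raised when a run has no budget left for further requests."""


def check_budget(spent_so_far: Decimal, max_cost_per_run: Decimal) -> Decimal:
    """Return the remaining budget, or raise once the cap has been reached."""
    remaining = max_cost_per_run - spent_so_far
    if remaining <= 0:
        raise CostLimitExceeded(
            f"spent {spent_so_far} of {max_cost_per_run}; request rejected"
        )
    return remaining
```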

Cost Calculation

Inference: Token-based pricing with model-specific rates
Embeddings: Time-based pricing per request
Aggregation: Real-time cost tracking per evaluation run
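
The two pricing rules can be written as small functions. This is a sketch under stated assumptions: the page does not give the unit for inference rates, so a price per million tokens is assumed here, and both function names are hypothetical.

```python
def inference_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Token-based cost; the per-million-token unit is an assumption."""
    return total_tokens / 1_000_000 * price_per_million_tokens


def embedding_cost(elapsed_seconds: float, price_per_second: float) -> float:
    """Time-based cost: request duration times the per-second rate."""
    return elapsed_seconds * price_per_second
```

For example, a 1250-token completion at an assumed rate of 2.00 per million tokens costs about 0.0025, and a half-second embedding request at 0.001 per second costs about 0.0005.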

Chutes AI Integration

The proxy forwards validated requests to Chutes AI services.

Inference Endpoint

For each inference request, the proxy creates a record, forwards the request to Chutes AI, calculates the cost from token usage and model pricing, and updates the database record with the resulting usage data.
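
The four steps can be sketched as follows. This is an illustration only: the `records` list stands in for the database, `forward` stands in for the call to Chutes AI, a flat per-token price is assumed, and `InferenceRecord` and `handle_inference` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class InferenceRecord:
    run_id: str
    model: str
    total_tokens: Optional[int] = None
    cost: Optional[float] = None


def handle_inference(
    request: dict,
    forward: Callable[[dict], dict],
    price_per_token: float,
    records: List[InferenceRecord],
) -> dict:
    # 1. Record the request before contacting the provider.
    record = InferenceRecord(run_id=request["run_id"], model=request["model"])
    records.append(record)
    # 2. Forward the request and wait for the provider's response.
    response = forward(request)
    # 3. Price the request from the reported token usage.
    record.total_tokens = response["usage"]["total_tokens"]
    record.cost = record.total_tokens * price_per_token
    # 4. The record is now updated; hand the response back to the agent.
    return response
```

Creating the record before forwarding means a failed provider call still leaves a trace of the attempt, with the usage fields left empty.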

Embedding Endpoint

The proxy tracks request timing for embedding operations, forwards requests to Chutes embedding services, and calculates time-based costs for accurate usage billing.
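
Time-based billing needs a duration measurement around the forwarded call. A minimal sketch, assuming the billed time is the wall-clock duration of the provider call; `timed_embedding_call` is a hypothetical name and `forward` stands in for the request to the Chutes embedding service.

```python
import time
from typing import Any, Callable, Tuple


def timed_embedding_call(
    forward: Callable[[dict], Any],
    request: dict,
    price_per_second: float,
) -> Tuple[Any, float, float]:
    """Return (response, elapsed_seconds, cost) for one embedding request."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    response = forward(request)
    elapsed = time.perf_counter() - start
    return response, elapsed, elapsed * price_per_second
```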

API Endpoints

POST `/agents/inference`

Provides text generation capabilities to agents with comprehensive validation and cost control.

Request Format

{
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "model": "deepseek-ai/DeepSeek-V3-0324",
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "Analyze this code and suggest improvements..."
    }
  ]
}

Response Format

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the code analysis..."
      }
    }
  ],
  "usage": {
    "total_tokens": 1250,
    "prompt_tokens": 800,
    "completion_tokens": 450
  }
}
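
From inside a sandbox, an agent might build this request with the standard library alone. The sketch below assumes the proxy is reachable at a base URL such as `http://localhost:8000` (the actual address an agent sees is not stated here); `build_inference_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Dict, List


def build_inference_request(
    proxy_url: str,
    run_id: str,
    model: str,
    messages: List[Dict[str, str]],
    temperature: float = 0.7,
) -> urllib.request.Request:
    """Build a POST request matching the documented request format."""
    payload = {
        "run_id": run_id,
        "model": model,
        "temperature": temperature,
        "messages": messages,
    }
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/agents/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the reply text would then be read from `choices[0]["message"]["content"]` in the decoded JSON body.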

POST `/agents/embedding`

Provides text embedding services for semantic analysis and vector operations.

Request Format

{
  "input": "Text to generate embeddings for",
  "run_id": "550e8400-e29b-41d4-a716-446655440000"
}

Response Format

{
  "embeddings": [
    [0.1234, -0.5678, 0.9012, ...]
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "total_tokens": 12
  }
}
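
A short sketch of how an agent might consume this response for semantic comparison. It assumes the response shape shown above, with one vector per input under `embeddings`; both helper names are hypothetical.

```python
import math
from typing import List, Sequence


def extract_embeddings(response: dict) -> List[List[float]]:
    """Return the list of vectors from a response in the documented shape."""
    return response["embeddings"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```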

GET `/health`

Simple health check endpoint for monitoring and system status verification.

Configuration

Environment Variables

Required Configuration

# Database Connection
AWS_MASTER_USERNAME=proxy_user
AWS_MASTER_PASSWORD=proxy_password
AWS_RDS_PLATFORM_ENDPOINT=db.ridges.internal
AWS_RDS_PLATFORM_DB_NAME=ridges_platform
PGPORT=5432

# Chutes AI Integration
CHUTES_API_KEY=cpk_your_chutes_api_key_here
CHUTES_INFERENCE_URL=https://api.chutes.ai/inference/chat/completions
CHUTES_EMBEDDING_URL=https://api.chutes.ai/inference/embeddings

# Cost Control
MAX_COST_PER_RUN=2.00
EMBEDDING_PRICE_PER_SECOND=0.001

Optional Configuration

# Server Settings
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
LOG_LEVEL=INFO

# Development Mode
ENV=dev  # Skips database validation for local testing
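
One way to read these variables at startup is sketched below. It is not the proxy's actual loader: `ProxySettings` and `load_settings` are hypothetical names, the database variables are omitted for brevity, and the fallback values for optional settings simply mirror the examples above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ProxySettings:
    chutes_api_key: str
    chutes_inference_url: str
    chutes_embedding_url: str
    max_cost_per_run: float
    embedding_price_per_second: float
    server_host: str
    server_port: int
    log_level: str
    dev_mode: bool


def load_settings(env: Mapping[str, str] = os.environ) -> ProxySettings:
    """Build settings from environment variables.

    Required variables raise KeyError when missing; the defaults used for
    optional variables are assumptions.
    """
    return ProxySettings(
        chutes_api_key=env["CHUTES_API_KEY"],
        chutes_inference_url=env["CHUTES_INFERENCE_URL"],
        chutes_embedding_url=env["CHUTES_EMBEDDING_URL"],
        max_cost_per_run=float(env["MAX_COST_PER_RUN"]),
        embedding_price_per_second=float(env["EMBEDDING_PRICE_PER_SECOND"]),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        dev_mode=env.get("ENV") == "dev",
    )
```

Failing fast on a missing required variable surfaces misconfiguration at startup rather than on the first agent request.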

Model Pricing Configuration

Inference is priced per token at a rate configured for each model; embeddings are priced by request duration at the rate set in EMBEDDING_PRICE_PER_SECOND.
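
A per-model rate table could look like the sketch below. The rate shown is a placeholder, not a real price, and the unit (per million tokens), `MODEL_PRICES`, and `price_for_model` are all assumptions made for illustration.

```python
from typing import Mapping

# Placeholder rate per million tokens; actual prices are not given in this page.
MODEL_PRICES = {
    "deepseek-ai/DeepSeek-V3-0324": 1.00,
}


def price_for_model(model: str, prices: Mapping[str, float] = MODEL_PRICES) -> float:
    """Look up a model's rate, refusing models with no configured price."""
    try:
        return prices[model]
    except KeyError:
        raise ValueError(f"no pricing configured for model {model!r}") from None
```

Rejecting unpriced models, rather than defaulting to zero, keeps a typo in a model name from bypassing cost tracking.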

Security Features

Request Isolation

Each agent request is validated against the database to ensure:

Run Authorization: Only valid evaluation runs can make requests
Status Validation: Requests only accepted from properly initialized sandboxes
Resource Limits: Per-run cost caps prevent abuse

Data Protection

No Persistent Storage: Request and response data is not stored beyond what cost tracking requires
Secure Transmission: All external API calls use HTTPS
Error Handling: Detailed errors are logged; agents receive sanitized responses

Development Mode

For local testing and development, setting ENV=dev makes the proxy skip database validation, so it can run without the full infrastructure.

Database Integration

Evaluation Run Tracking

The proxy maintains detailed records of all inference and embedding requests, tracking costs, usage metrics, and timing data for comprehensive evaluation analytics.

Cost Aggregation

Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.

Error Handling

Common Error Scenarios

The proxy handles various error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns appropriate status codes and descriptive messages to help with debugging while maintaining security.
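
The mapping from error type to response might be sketched as follows. The status codes and messages are assumptions (this page does not list them), and the exception classes and `to_agent_error` are hypothetical names. The full error is logged while the agent sees only a generic message.

```python
import logging
from typing import Tuple

logger = logging.getLogger("proxy")


class UnknownRunError(Exception): ...
class SandboxStateError(Exception): ...
class CostLimitError(Exception): ...
class UpstreamError(Exception): ...


# (exception type, assumed status code, message returned to the agent)
_ERROR_TABLE = [
    (UnknownRunError, 404, "Unknown run_id"),
    (SandboxStateError, 409, "Evaluation run is not in an accepted state"),
    (CostLimitError, 429, "Cost limit reached for this run"),
    (UpstreamError, 502, "AI provider request failed"),
]


def to_agent_error(exc: Exception) -> Tuple[int, str]:
    """Log full details, then return a sanitized (status, message) pair."""
    logger.error("request failed: %r", exc)
    for error_type, status, message in _ERROR_TABLE:
        if isinstance(exc, error_type):
            return status, message
    # Anything unexpected is reported without internal details.
    return 500, "Internal proxy error"
```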

Performance Considerations

Connection Management

Async HTTP Client: Non-blocking requests to external services
Connection Pooling: Efficient database connection reuse
Request Timeout: Configurable timeouts prevent hanging requests
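
A request timeout around a non-blocking call can be expressed with the standard library. This sketch is independent of whichever HTTP client the proxy uses; `call_with_timeout` and `UpstreamTimeout` are hypothetical names, and `make_call` stands in for the outbound request coroutine.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class UpstreamTimeout(Exception):
    """Raised when the provider does not answer within the allowed time."""


async def call_with_timeout(
    make_call: Callable[[], Awaitable[T]], timeout_seconds: float
) -> T:
    """Await make_call(), cancelling it if it exceeds timeout_seconds."""
    try:
        return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        raise UpstreamTimeout(f"no response within {timeout_seconds}s") from None
```

`asyncio.wait_for` cancels the pending call on timeout, so a stalled provider does not hold the request open indefinitely.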

Monitoring & Logging

The proxy provides comprehensive logging for request completion, cost limit warnings, and error tracking to support operations and debugging.

Scalability

Stateless Design: Easy horizontal scaling without shared state
Database Efficiency: Optimized queries for cost calculation
Resource Limits: Built-in protection against resource exhaustion

The Proxy service ensures secure, controlled, and cost-effective access to AI capabilities while maintaining the isolation and security requirements of the sandboxed evaluation environment.
