Inference Gateway

The Ridges Inference Gateway is a secure intermediary between agent code running in sandboxed environments and external AI services. It gives agents controlled access to inference and embedding capabilities while enforcing per-run cost limits and request validation.

Architecture Overview

The inference gateway operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.

API Endpoints

POST `/agents/inference`

Provides text generation capabilities to agents with comprehensive validation and cost control.

Request Format

{
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "model": "deepseek-ai/DeepSeek-V3-0324",
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "Analyze this code and suggest improvements..."
    }
  ]
}

Response Format

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the code analysis..."
      }
    }
  ],
  "usage": {
    "total_tokens": 1250,
    "prompt_tokens": 800,
    "completion_tokens": 450
  }
}
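As a rough illustration of the request and response shapes above, the following sketch builds a payload, posts it, and reads the reply. The helper names (`build_inference_request`, `post_inference`, `extract_reply`) and the `base_url` parameter are hypothetical and not part of the gateway itself; only the JSON field names come from the examples above.

```python
import json
import urllib.request
from typing import Tuple


def build_inference_request(run_id: str, model: str, prompt: str,
                            temperature: float = 0.7) -> dict:
    """Build a payload with the same fields as the request example."""
    return {
        "run_id": run_id,
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }


def post_inference(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to /agents/inference (base_url is an assumption)."""
    request = urllib.request.Request(
        f"{base_url}/agents/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_reply(response: dict) -> Tuple[str, int]:
    """Return the first assistant message and the total token count."""
    content = response["choices"][0]["message"]["content"]
    total_tokens = response["usage"]["total_tokens"]
    return content, total_tokens
```

Inside a sandbox, an agent would presumably call `post_inference` with whatever gateway address the environment provides, then pass the result to `extract_reply`.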

POST `/agents/embedding`

Provides text embedding services for semantic analysis and vector operations.

Request Format

{
  "input": "Text to generate embeddings for",
  "run_id": "550e8400-e29b-41d4-a716-446655440000"
}

Response Format

{
  "embeddings": [
    [0.1234, -0.5678, 0.9012, ...]
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "total_tokens": 12
  }
}
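One common use of the returned vectors is comparing two texts by cosine similarity. The sketch below assumes responses shaped like the example above, with a list of vectors under `"embeddings"`; the helper names are illustrative only.

```python
import math
from typing import List


def first_embedding(response: dict) -> List[float]:
    """Return the first vector from an embedding response."""
    return [float(x) for x in response["embeddings"][0]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

Two embedding responses can then be compared with `cosine_similarity(first_embedding(r1), first_embedding(r2))`.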

Security Features

Request Isolation

Each agent request is validated against the database to ensure:

Run Authorization: Only valid evaluation runs can make requests
Status Validation: Requests are accepted only from properly initialized sandboxes
Resource Limits: Per-run cost caps prevent abuse
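The three checks above could be sketched as a single validation function. This is a minimal illustration, not the gateway's actual code: the `EvaluationRun` record, the status value `"sandbox_ready"`, and the cap of 1.00 USD are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; the real status names and cost cap are not documented here.
ALLOWED_STATUSES = {"sandbox_ready"}
MAX_COST_PER_RUN_USD = 1.00


@dataclass
class EvaluationRun:
    run_id: str
    status: str
    cost_so_far_usd: float


def validate_request(run: Optional[EvaluationRun]) -> Optional[str]:
    """Return a rejection reason, or None if the request may proceed."""
    # Run authorization: the run_id must map to a known evaluation run.
    if run is None:
        return "unknown run_id"
    # Status validation: the sandbox must be in an accepted state.
    if run.status not in ALLOWED_STATUSES:
        return f"run is in state '{run.status}', which cannot make requests"
    # Resource limits: the run must not have used up its cost cap.
    if run.cost_so_far_usd >= MAX_COST_PER_RUN_USD:
        return "per-run cost cap reached"
    return None
```

Here `run` stands for the result of a database lookup by `run_id`, with `None` meaning no matching row was found.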

Development Mode

For local testing and development, the inference gateway can skip database validation when configured in development mode, allowing easier testing without full infrastructure setup.
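A development-mode switch of this kind is often driven by an environment variable. The sketch below assumes a hypothetical variable named `GATEWAY_DEV_MODE`; the actual configuration key used by the gateway is not stated in this page.

```python
import os
from typing import Callable, Mapping


def is_dev_mode(env: Mapping[str, str] = os.environ) -> bool:
    """True when the (hypothetical) GATEWAY_DEV_MODE flag is switched on."""
    return env.get("GATEWAY_DEV_MODE", "").strip().lower() in {"1", "true", "yes"}


def authorize(run_id: str,
              run_is_valid: Callable[[str], bool],
              env: Mapping[str, str] = os.environ) -> bool:
    """Skip the database check in development mode, otherwise enforce it."""
    if is_dev_mode(env):
        return True
    # run_is_valid stands in for the database-backed validation.
    return run_is_valid(run_id)
```

Passing the environment as a parameter keeps the check easy to test; a flag like this should stay off in any deployment that serves real evaluation runs.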

Database Integration

Evaluation Run Tracking

The inference gateway maintains detailed records of all inference and embedding requests, tracking costs, usage metrics, and timing data for comprehensive evaluation analytics.
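To make the idea of per-request records concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for the gateway's real database. The table name, column names, and helper functions are assumptions for illustration; the actual schema is not described in this page.

```python
import sqlite3

# Hypothetical schema: one row per inference or embedding request.
SCHEMA = """
CREATE TABLE IF NOT EXISTS gateway_requests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_id TEXT NOT NULL,
    kind TEXT NOT NULL,
    total_tokens INTEGER NOT NULL,
    cost_usd REAL NOT NULL,
    started_at REAL NOT NULL,
    duration_s REAL NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_request(conn: sqlite3.Connection, run_id: str, kind: str,
                   total_tokens: int, cost_usd: float,
                   started_at: float, duration_s: float) -> None:
    """Insert one request record; the with-block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO gateway_requests "
            "(run_id, kind, total_tokens, cost_usd, started_at, duration_s) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (run_id, kind, total_tokens, cost_usd, started_at, duration_s),
        )


def run_summary(conn: sqlite3.Connection, run_id: str) -> dict:
    """Aggregate request count, tokens, and cost for one run."""
    row = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(total_tokens), 0), "
        "COALESCE(SUM(cost_usd), 0.0) "
        "FROM gateway_requests WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    return {"requests": row[0], "total_tokens": row[1], "cost_usd": row[2]}
```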

Cost Aggregation

Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.
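The "reject requests that would exceed limits" rule can be illustrated with a small in-memory budget. This is only a sketch: the class and exception names are hypothetical, and how the gateway estimates a request's cost before sending it is not covered here.

```python
class CostLimitExceeded(Exception):
    """Raised when a request would push a run past its cost limit."""


class RunBudget:
    """Tracks cumulative spend for one evaluation run (illustrative only)."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def reserve(self, estimated_cost_usd: float) -> None:
        """Admit the request only if spent + estimate stays within the limit."""
        if estimated_cost_usd < 0:
            raise ValueError("estimated cost must be non-negative")
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            raise CostLimitExceeded(
                f"request would raise spend to "
                f"{self.spent_usd + estimated_cost_usd:.4f} USD, "
                f"above the limit of {self.limit_usd:.4f} USD"
            )
        self.spent_usd += estimated_cost_usd
```

Checking before the upstream call, rather than after, is what keeps a single expensive request from overshooting the cap.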

Error Handling

Common Error Scenarios

The inference gateway handles various error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns appropriate status codes and descriptive messages to help with debugging while maintaining security.
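One way to keep error responses consistent is a single table mapping each error condition to a status code. The mapping below is an assumption for illustration only; this page does not state which status codes the gateway actually returns, and the error keys are made-up names.

```python
from http import HTTPStatus
from typing import Tuple

# Hypothetical mapping of the four error conditions to HTTP status codes.
ERROR_STATUS = {
    "invalid_run_id": HTTPStatus.NOT_FOUND,
    "invalid_sandbox_state": HTTPStatus.CONFLICT,
    "cost_limit_exceeded": HTTPStatus.TOO_MANY_REQUESTS,
    "upstream_failure": HTTPStatus.BAD_GATEWAY,
}


def error_response(kind: str, detail: str) -> Tuple[int, dict]:
    """Return (status_code, body); unknown kinds fall back to 500."""
    status = ERROR_STATUS.get(kind, HTTPStatus.INTERNAL_SERVER_ERROR)
    # Keep the body descriptive but free of internal details such as
    # database rows or upstream credentials.
    return int(status), {"error": kind, "detail": detail}
```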
