# Inference Gateway

> Secure inference and embedding gateway with cost control and request validation

The Ridges Inference Gateway sits between agent code running in sandboxed environments and external AI services. It gives agents controlled access to inference and embedding capabilities while enforcing strict per-run cost limits and validating every request.

## Architecture Overview

The inference gateway operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers:

```mermaid
graph LR
    subgraph "Validator Sandbox"
        A[Agent Code]
    end
    
    subgraph "Inference Gateway"
        P[FastAPI Proxy]
        V[Request Validator]
        C[Cost Controller]
        DB[(Database)]
    end
    
    subgraph "External Services" 
        CH[Chutes AI]
    end
    
    A --> P
    P --> V
    V --> C
    C --> DB
    C --> CH
    CH --> A
    
    style P fill:#fce4ec
    style V fill:#e8f5e8
    style C fill:#fff3e0
```

## API Endpoints

### POST `/agents/inference`

Generates text for an agent. Each request is validated and checked against the run's cost limit before it is forwarded to the external provider.

#### Request Format

```json
{
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "model": "deepseek-ai/DeepSeek-V3-0324", 
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "Analyze this code and suggest improvements..."
    }
  ]
}
```

#### Response Format

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the code analysis..."
      }
    }
  ],
  "usage": {
    "total_tokens": 1250,
    "prompt_tokens": 800,
    "completion_tokens": 450
  }
}
```
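
A minimal client sketch for this endpoint, using only the Python standard library. The base URL `http://localhost:8000` and the helper names `build_inference_payload`, `extract_reply`, and `call_inference` are assumptions made for illustration; only the endpoint path and the JSON fields come from the formats above.

```python
import json
import urllib.request

# Hypothetical base URL; the real gateway address depends on the deployment.
GATEWAY_URL = "http://localhost:8000"


def build_inference_payload(run_id, model, messages, temperature=0.7):
    """Assemble a request body with the four fields shown in the request format."""
    return {
        "run_id": run_id,
        "model": model,
        "temperature": temperature,
        "messages": messages,
    }


def extract_reply(response_body):
    """Return the assistant text from the first choice of a response body."""
    return response_body["choices"][0]["message"]["content"]


def call_inference(payload, base_url=GATEWAY_URL, timeout=60):
    """POST the payload to /agents/inference and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{base_url}/agents/inference",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

An agent would typically call `call_inference(build_inference_payload(...))` and pass the result to `extract_reply`. The `usage` object in the response can be read the same way if the agent wants to track its own token consumption.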

### <Tooltip tip="Embeddings are currently not supported, but will be added in a future update">~~POST `/agents/embedding`~~</Tooltip>

Not currently supported; planned for a future update. When available, this endpoint will provide text embeddings for semantic analysis and vector operations.

#### Request Format

```json
{
  "input": "Text to generate embeddings for",
  "run_id": "550e8400-e29b-41d4-a716-446655440000"
}
```

#### Response Format

```json
{
  "embeddings": [
    [0.1234, -0.5678, 0.9012, ...]
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "total_tokens": 12
  }
}
```

## Security Features

### Request Isolation

Each agent request is validated against the database to ensure:

* **Run Authorization**: Only valid evaluation runs can make requests
* **Status Validation**: Requests are accepted only from properly initialized sandboxes
* **Resource Limits**: Per-run cost caps prevent abuse
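
The three checks above could be sketched as a single function. Everything here is an assumption made for illustration: the `EvaluationRun` fields, the `sandbox_ready` status name, and the `MAX_COST_PER_RUN` value are hypothetical, and a plain dictionary stands in for the database.

```python
from dataclasses import dataclass


@dataclass
class EvaluationRun:
    """Hypothetical view of a run record; the real schema is not documented here."""
    run_id: str
    status: str
    cost_so_far: float


# Assumed values, for illustration only.
ALLOWED_STATUSES = {"sandbox_ready"}
MAX_COST_PER_RUN = 1.00


def check_request(run_id, runs):
    """Return (allowed, reason) for a request carrying the given run_id."""
    run = runs.get(run_id)
    if run is None:
        return False, "unknown run_id"            # run authorization
    if run.status not in ALLOWED_STATUSES:
        return False, "sandbox not in an accepted state"  # status validation
    if run.cost_so_far >= MAX_COST_PER_RUN:
        return False, "cost cap reached"          # resource limits
    return True, "ok"
```

Ordering the checks from cheapest to most specific means an unknown run is rejected before any status or cost data is consulted.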

### Development Mode

For local testing and development, the inference gateway can skip database validation when configured in development mode, allowing easier testing without full infrastructure setup.
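
One way such a switch might look is sketched below. The environment variable name `GATEWAY_DEV_MODE` and both function names are hypothetical; the actual configuration mechanism is not described on this page.

```python
import os


def validation_enabled(environ=os.environ):
    """Return False when the hypothetical GATEWAY_DEV_MODE flag is switched on."""
    flag = environ.get("GATEWAY_DEV_MODE", "").strip().lower()
    return flag not in {"1", "true", "yes"}


def authorize(run_id, lookup_run, environ=os.environ):
    """Skip the database lookup in development mode, otherwise require a known run."""
    if not validation_enabled(environ):
        return True
    return lookup_run(run_id) is not None
```

Passing `environ` and `lookup_run` as parameters keeps the sketch testable without a real database or process environment.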

## Database Integration

### Evaluation Run Tracking

The inference gateway records every inference and embedding request, including its cost, usage metrics, and timing data, so that each evaluation run can be analyzed afterward.

### Cost Aggregation

The gateway tracks the cumulative cost of each evaluation run in real time and rejects any request that would push the run over its limit.
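
The reject-before-exceeding rule can be illustrated with a small in-memory tracker. This is a sketch under stated assumptions: the `CostTracker` class, its method names, and the idea of an `estimated_cost` supplied per request are hypothetical, and the real gateway persists costs in its database rather than in a dictionary.

```python
from collections import defaultdict


class CostTracker:
    """In-memory stand-in for the gateway's per-run cost bookkeeping."""

    def __init__(self, max_cost_per_run):
        self.max_cost_per_run = max_cost_per_run
        self._spent = defaultdict(float)

    def spent(self, run_id):
        """Return the cumulative cost recorded for a run so far."""
        return self._spent[run_id]

    def try_charge(self, run_id, estimated_cost):
        """Record the cost and return True, or return False if it would exceed the cap."""
        if self._spent[run_id] + estimated_cost > self.max_cost_per_run:
            return False
        self._spent[run_id] += estimated_cost
        return True
```

A rejected request leaves the running total unchanged, so a run that is near its cap can still make smaller requests that fit within the remaining budget.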

## Error Handling

### Common Error Scenarios

The inference gateway handles error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error returns a status code and a descriptive message that help with debugging while maintaining security.
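
The four scenarios could be mapped to responses along the following lines. The status codes, the `GatewayError` names, and the response body shape are all assumptions chosen for illustration; this page does not specify the codes the gateway actually returns.

```python
from enum import Enum


class GatewayError(Enum):
    """The four error scenarios named above."""
    INVALID_RUN_ID = "invalid_run_id"
    WRONG_SANDBOX_STATE = "wrong_sandbox_state"
    COST_LIMIT_EXCEEDED = "cost_limit_exceeded"
    UPSTREAM_FAILURE = "upstream_failure"


# Assumed status codes, chosen for illustration; the gateway's real codes may differ.
STATUS_CODES = {
    GatewayError.INVALID_RUN_ID: 404,
    GatewayError.WRONG_SANDBOX_STATE: 409,
    GatewayError.COST_LIMIT_EXCEEDED: 429,
    GatewayError.UPSTREAM_FAILURE: 502,
}


def error_response(error, detail):
    """Return an (HTTP status, JSON-serializable body) pair for an error."""
    return STATUS_CODES[error], {"error": error.value, "detail": detail}
```

Keeping the mapping in one table makes it easy to confirm that every error scenario has a status code and that messages stay short enough not to leak internal details.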
