The Ridges Inference Gateway sits between agent code running in sandboxed environments and external AI services. It gives agents controlled access to inference and embedding capabilities while enforcing per-run cost limits and request validation.

Architecture Overview

The inference gateway operates as a lightweight FastAPI service that validates requests, enforces resource limits, and forwards approved requests to external AI providers.

API Endpoints

POST /agents/inference

Provides text generation capabilities to agents with comprehensive validation and cost control.

Request Format

{
  "run_id": "550e8400-e29b-41d4-a716-446655440000",
  "model": "deepseek-ai/DeepSeek-V3-0324", 
  "temperature": 0.7,
  "messages": [
    {
      "role": "user",
      "content": "Analyze this code and suggest improvements..."
    }
  ]
}

Response Format

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Based on the code analysis..."
      }
    }
  ],
  "usage": {
    "total_tokens": 1250,
    "prompt_tokens": 800,
    "completion_tokens": 450
  }
}
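As a rough illustration of how agent code might work with these two formats, the sketch below builds a request body and reads the reply out of a response. The helper names `build_inference_request` and `extract_reply` are hypothetical and not part of the gateway. The sketch only assembles and parses dictionaries; sending the request over HTTP is left out because the gateway's host and port are not given here.

```python
import uuid
from typing import Tuple


def build_inference_request(
    run_id: str, model: str, prompt: str, temperature: float = 0.7
) -> dict:
    """Assemble a request body in the shape of the Request Format above."""
    # Fail early on a malformed run_id; uuid.UUID raises ValueError.
    uuid.UUID(run_id)
    return {
        "run_id": run_id,
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_reply(response: dict) -> Tuple[str, int]:
    """Return the first assistant message and the total token count."""
    content = response["choices"][0]["message"]["content"]
    return content, response["usage"]["total_tokens"]
```

Checking the `run_id` on the client side is only a convenience: it catches a malformed ID before the request leaves the sandbox, while the gateway remains responsible for deciding whether the run is actually authorized.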

The embedding endpoint provides text embedding services for semantic analysis and vector operations.

Request Format

{
  "input": "Text to generate embeddings for",
  "run_id": "550e8400-e29b-41d4-a716-446655440000"
}

Response Format

{
  "embeddings": [
    [0.1234, -0.5678, 0.9012, ...]
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "total_tokens": 12
  }
}
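One common vector operation on such a response is comparing two texts by cosine similarity. The following is a minimal sketch, assuming each response carries its vector as the first entry of `embeddings`, as in the format above. The function names are illustrative only.

```python
import math
from typing import List


def first_embedding(response: dict) -> List[float]:
    """Pull the first embedding vector out of a response body."""
    return response["embeddings"][0]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

An agent could call `cosine_similarity(first_embedding(r1), first_embedding(r2))` on two responses to rank candidate snippets by closeness to a query.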

Security Features

Request Isolation

Each agent request is validated against the database to ensure:
  • Run Authorization: Only valid evaluation runs can make requests
  • Status Validation: Requests are accepted only from properly initialized sandboxes
  • Resource Limits: Per-run cost caps prevent abuse
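The three checks above could be sketched as a single function that returns the reason for rejection, or `None` when the request may proceed. Everything concrete here is an assumption made for illustration: the in-memory `runs` dictionary stands in for the database, and the status name `sandbox_ready` and the 2.00 cap are invented values, not the gateway's real ones.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical values: the accepted status and the cap are not documented here.
ACCEPTED_STATUS = "sandbox_ready"
MAX_COST_USD = 2.00


@dataclass
class EvaluationRun:
    status: str
    cost_usd: float  # cumulative cost recorded for this run so far


def rejection_reason(runs: Dict[str, EvaluationRun], run_id: str) -> Optional[str]:
    """Apply the three checks in order; None means the request is allowed."""
    run = runs.get(run_id)
    if run is None:  # Run Authorization
        return "unknown run_id"
    if run.status != ACCEPTED_STATUS:  # Status Validation
        return "sandbox not initialized"
    if run.cost_usd >= MAX_COST_USD:  # Resource Limits
        return "cost limit reached"
    return None
```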

Development Mode

For local testing and development, the inference gateway can be configured in development mode, in which it skips database validation. This makes it possible to test agents without setting up the full infrastructure.
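A development-mode switch of this kind is often driven by an environment variable. The sketch below shows one way it could look; the variable name `GATEWAY_ENV` and both function names are assumptions for illustration, since the actual configuration key is not stated here.

```python
import os
from typing import Callable, Mapping


def is_dev_mode(env: Mapping[str, str] = os.environ) -> bool:
    """True when the (hypothetical) GATEWAY_ENV variable says 'development'."""
    return env.get("GATEWAY_ENV", "production").strip().lower() == "development"


def authorize(run_id: str, db_check: Callable[[str], bool], dev_mode: bool) -> bool:
    """Skip the database lookup entirely in development mode."""
    if dev_mode:
        return True
    return db_check(run_id)
```

Defaulting to `production` when the variable is unset keeps validation on unless someone deliberately turns it off.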

Database Integration

Evaluation Run Tracking

The inference gateway maintains detailed records of all inference and embedding requests, tracking costs, usage metrics, and timing data for comprehensive evaluation analytics.

Cost Aggregation

Real-time cost tracking prevents budget overruns by continuously monitoring cumulative costs for each evaluation run and rejecting requests that would exceed limits.
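The idea of rejecting a request that would push a run over its limit can be sketched with a small in-memory tracker. This is an illustration under stated assumptions, not the gateway's implementation: the class name and the limit are hypothetical, costs are kept in memory rather than in the database, and the cost of a request is treated as known up front, whereas a real gateway may only learn it from the provider's usage figures.

```python
from collections import defaultdict
from decimal import Decimal
from typing import DefaultDict


class RunCostTracker:
    """Tracks cumulative cost per run and refuses charges above the limit."""

    def __init__(self, limit: Decimal) -> None:
        self.limit = limit
        # Decimal() is zero, so unseen runs start with no recorded cost.
        self._spent: DefaultDict[str, Decimal] = defaultdict(Decimal)

    def spent(self, run_id: str) -> Decimal:
        return self._spent[run_id]

    def charge(self, run_id: str, cost: Decimal) -> bool:
        """Record the cost and return True, or return False if it would exceed the limit."""
        if self._spent[run_id] + cost > self.limit:
            return False
        self._spent[run_id] += cost
        return True
```

`Decimal` is used instead of `float` so that repeated small additions do not accumulate rounding error near the limit.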

Error Handling

Common Error Scenarios

The inference gateway handles error conditions including invalid run IDs, incorrect sandbox states, cost limit violations, and external service failures. Each error type returns an appropriate status code and a descriptive message, which helps with debugging without exposing sensitive details.
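To make the four scenarios concrete, the sketch below maps each one to an HTTP status code and a message. The specific codes (404, 409, 429, 502) and the response shape are assumptions chosen for illustration; the gateway's actual codes are not listed here and may differ.

```python
from enum import Enum
from typing import Dict, Tuple


class GatewayError(Enum):
    INVALID_RUN_ID = "invalid_run_id"
    WRONG_SANDBOX_STATE = "wrong_sandbox_state"
    COST_LIMIT_EXCEEDED = "cost_limit_exceeded"
    UPSTREAM_FAILURE = "upstream_failure"


# Hypothetical mapping: status codes and wording are illustrative only.
_RESPONSES: Dict[GatewayError, Tuple[int, str]] = {
    GatewayError.INVALID_RUN_ID: (404, "No evaluation run matches this run_id."),
    GatewayError.WRONG_SANDBOX_STATE: (409, "The sandbox for this run is not initialized."),
    GatewayError.COST_LIMIT_EXCEEDED: (429, "The cost limit for this run has been reached."),
    GatewayError.UPSTREAM_FAILURE: (502, "The external AI provider returned an error."),
}


def to_response(error: GatewayError) -> Tuple[int, dict]:
    """Return (status_code, body) for an error, without leaking internal details."""
    status, detail = _RESPONSES[error]
    return status, {"error": error.value, "detail": detail}
```

Keeping the messages fixed per error type, rather than echoing provider or database errors verbatim, is one way to stay descriptive while not revealing internals to sandboxed code.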