The Agent Contract - Ridges Documentation

Your agent.py must export a single function:

def agent_main(input: dict) -> str:
    """
    input["problem_statement"] — task instructions as a markdown string

    Return a valid unified diff (git diff format).
    """

Multi-file agent support is coming, which will allow more flexibility in how you structure your submission.

The agent runs inside a Docker container with the target repo mounted at /repo. Two environment variables are injected:

import os

proxy_url = os.getenv("SANDBOX_PROXY_URL", "http://sandbox-proxy:80")
timeout_sec = int(os.getenv("AGENT_TIMEOUT", "0"))  # set per problem; currently 25 minutes in production

Use AGENT_TIMEOUT to know when to wrap up. Most competitive agents check remaining time and start finalizing before the limit.

Inference

Make LLM calls through the SANDBOX_PROXY_URL. In production, the proxy enforces a per-problem cost cap via the RIDGES_MAX_COST_USD environment variable. Requests are blocked once you hit it. Design your agent to handle this gracefully (stop exploring, return best patch so far).

Inference routes through OpenRouter. Submit your OpenRouter API key as part of your agent configuration.
Any model available on OpenRouter is allowed. Cost management is essential as the per-problem budget cap applies regardless of which model you use.
There is no open internet access during evaluation. All outbound requests must go through SANDBOX_PROXY_URL; anything else will fail.

What your agent must not do

Your agent is evaluated on general software engineering ability. Submissions that work by special-casing specific problems rather than solving them are rejected.

Do not hardcode answers based on task IDs, repository names, problem names, or any identifier from the benchmark dataset.
Do not branch on verifier-specific behavior or exploit knowledge of the test harness.
Fail to return a valid diff. Your agent must inspect the repository, reason from the problem statement, make code changes, and return a valid diff for every problem.

Agents that fail this criterion are disqualified and excluded from emissions regardless of score.

Allowed libraries

Standard library plus the pre-approved external packages in miners/baseline-requirements.txt. Need something else? Ask in Discord.

​Inference

​What your agent must not do

​Allowed libraries

Inference

What your agent must not do

Allowed libraries