Our new incentive is an open source code model, where miners both compete and collaborate on a software engineering agent. As an overview: miners publish all of their code for anybody to see, copy, run, and test. Any time a new agent is submitted, validators pull its code, run it on 50 SWE-bench problems, and evaluate the output with the standard SWE-bench eval. If the agent has the highest overall score across those problems, the validator updates the on-chain weights to give all reward to the miner that submitted it.

The Flow

At the core of this system is letting miners write AI software agents, publish them, and see how they are doing in real time, so that they can upgrade them and have a better shot at being the top-performing agent. The way it works is this:
  1. Miners get to create one agent and publish it to the Ridges Platform. As long as a validator isn’t actively evaluating their previous agent version, they can update this agent.
  2. The platform acts as an agent registry and visibility center. When it receives a new agent update, it notifies all validators.
  3. When a validator receives agent code, it runs it in a specialized sandbox - agents are able to run whatever commands they want from this sandbox, but only get two endpoints to connect to the outside world: inference and embeddings (both hosted by @chutes_ai)
    • 50 of these sandboxes are created, each with the same agent trying to solve a different SWE-bench problem
    • There are compute, inference, and time limits per problem, and the sandboxes run in parallel, allowing the validator to complete a full evaluation quickly
  4. Once the sandboxes are done running, the validator uses SWE-bench to evaluate the patches the agent produced, arriving at a final score - the percentage of problems solved - which is then sent to the platform
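To make step 4 concrete, here is a rough sketch of what a validator’s evaluation loop amounts to. The callables passed in here are hypothetical stand-ins for the real sandbox and SWE-bench code, not the actual Ridges validator API:
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def evaluate_agent(
    problems: List[str],
    run_agent_in_sandbox: Callable[[str], str],        # hypothetical: problem -> patch (git diff string)
    evaluate_with_swebench: Callable[[str, str], bool],  # hypothetical: (problem, patch) -> solved?
) -> float:
    """Run one agent against a batch of SWE-bench problems in parallel and return the % solved."""
    with ThreadPoolExecutor(max_workers=len(problems)) as pool:
        patches = list(pool.map(run_agent_in_sandbox, problems))
    solved = sum(evaluate_with_swebench(p, patch) for p, patch in zip(problems, patches))
    return 100.0 * solved / len(problems)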

The Agents

For miners, this is an in-depth overview of how you must format your agent. If you have feedback, we’d love to hear it on Discord.

Agent structure

Agents are a single Python file that has to adhere to two key specifications:
  1. The file must contain an entry function called agent_main, with the following structure:
        from typing import Any, Dict

        def agent_main(input_dict: Dict[str, Any]) -> Dict[str, str]:
            """
            Entry point for your agent. This is the function the validator calls when running your code.

            Parameters
            ----------
            input_dict : dict
                Must contain at least a key ``problem_statement`` with the task
                description. An optional ``run_id`` can be present (passed through to
                the proxy for bookkeeping).

            Returns
            -------
            Your agent must return a dict with a key "patch" whose value is a valid git diff containing your final changes.
            """
            # Your logic for how the agent should generate the final solution and format it as a diff

            return {
                "patch": """
                diff --git a/file_a.py b/file_a.py
                """
            }

  2. You can only use built-in Python libraries plus a list of allowed external libraries. If you would like support for another library, message us on Discord and we will review it. You can see the supported external libraries here
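As an illustration of the "patch" return value above: rather than hand-assembling a diff string, an agent can edit files in the repository it is given (mounted at /repo, as described in the next section) and capture the result with git diff. This is only a sketch, and it assumes git is available inside the sandbox:
import subprocess

def build_patch(repo_path: str = "/repo") -> str:
    """Capture the agent's edits as a git diff, suitable for returning as the "patch" value."""
    result = subprocess.run(
        ["git", "diff"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout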

Agent access to tools and context

Your agent will be injected into a sandbox with the repo mounted under the /repo path. You can see a full agent example here. Further, the libraries you have access to are preinstalled and can be imported right away, with no install commands needed. The problem statement is passed directly into the agent_main function, and you also receive variables letting your agent know how long it has to solve the problem before the sandbox times out, plus an inference/embedding query URL, as environment variables:
import os

# DEFAULT_PROXY_URL and DEFAULT_TIMEOUT are fallback constants you define yourself in your agent file
proxy_url = os.getenv("AI_PROXY_URL", DEFAULT_PROXY_URL)
timeout = int(os.getenv("AGENT_TIMEOUT", str(DEFAULT_TIMEOUT)))
What your agent does inside the sandbox is up to you; however, all external requests (to APIs, DBs, etc.) will fail. This is what the proxy_url is for; you receive access to two external endpoints, hosted by Ridges:
  1. Inference endpoint, which proxies to Chutes. You can specify whatever model you’d like to use, and output is unstructured and up to your agent. Access this at f"{proxy_url}/agents/inference".
  2. Embedding endpoint, also proxying to Chutes. Again, the model is up to you, and the endpoint is at f"{proxy_url}/agents/embedding".
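For illustration, a minimal inference call through the proxy might look like the sketch below, using only the standard library. The JSON payload shape ("model" plus "messages") and the plain-text response handling are assumptions made for the example; check the full agent example for the exact format the proxy expects:
import json
import os
import urllib.request

proxy_url = os.getenv("AI_PROXY_URL", "http://localhost:8000")  # fallback URL is a placeholder

def call_inference(prompt: str, model: str) -> str:
    """POST a prompt to the inference proxy and return the raw response body."""
    payload = json.dumps({
        "model": model,  # assumed field name; pick whichever Chutes-hosted model you want
        "messages": [{"role": "user", "content": prompt}],  # assumed chat-style payload
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{proxy_url}/agents/inference",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")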

Limits and timeouts

Currently, the sandbox times out after two minutes, and inference and embeddings are each capped at a total cost of $2 (this cost is covered by Ridges on production and testnet, but for local testing you’ll need your own Chutes key). These limits will likely change as we roll out to mainnet and get better information on actual usage requirements.
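A practical consequence of the timeout is that agents should budget their own time against AGENT_TIMEOUT rather than risk being cut off mid-run. A minimal sketch (the 120-second fallback just mirrors the current two-minute limit):
import os
import time

# Deadline based on the sandbox-provided timeout; 120 seconds is only a local fallback.
DEADLINE = time.monotonic() + int(os.getenv("AGENT_TIMEOUT", "120"))

def time_remaining() -> float:
    """Seconds left before the sandbox times out. Useful for deciding whether to
    attempt another inference call or return the best patch found so far."""
    return DEADLINE - time.monotonic()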

Registering an agent

Miners register an agent by publishing it to the Ridges platform, along with a signature of the version number and file. We recommend using the Ridges CLI, which makes it easy to do this. From the root of the repository:
  1. Set up the repository and install dependencies
uv venv && source .venv/bin/activate
uv pip install -e .
  2. Make sure you have a wallet that is registered on the subnet. Here is an example of how to register on our testnet:
btcli register --wallet.name miner --wallet.hotkey default --netuid 372 --subtensor.chain_endpoint wss://test.finney.opentensor.ai:443
  3. Uploading your latest version is easy; just run the following command:
./ridges.py upload
By default, this uses the miner wallet to sign the transaction and pulls the agent from miner/agent.py.

Why Open Source

The biggest change with this new mechanism is that miners open source their code. Other changes, such as inference being covered, validators running the code, and the sandbox structure, are all downstream of this. Miners open sourcing their code comes with several major unlocks:
  1. Software engineering agents are quite difficult to build - instead of miners developing in isolation, they now get to build on top of previous miner advances, giving a much better chance at developing the core technology in a competitive, decentralized way.
  2. No serious company looking to hire AI software engineers would send proprietary code to 192 random miners. Open sourcing means that real products can be built once the agents are good enough, including white label and self hosted products.
  3. For open ended software tasks such as writing code, common evaluations like SWE-bench or polyglot are out of the question if the miner hosts the code, as it is trivial to just cheat and submit the “known solution”. Open sourcing miner code and forcing it to adhere to a sandbox structure solves this, as we can inspect the code ourselves.
  4. As the competitive mechanism pushes miners to build better agents, claims about agent performance are actually credible, since third parties can pull and run the agent code themselves and verify the results.

How Scoring Works

The new incentive is winner takes all. Whenever a new agent sets an all-time-high score, the miner that developed it gets 100% of the incentive until dethroned by a competitor. The biggest concern we’ve seen among miners is that there is no point in developing better agents if someone can just copy their code, make it slightly better, and take all of the incentive. Some copying is actually a good thing - in addition to coming up with new ways to improve their own agent, we want miners to integrate innovations from other miners, so that every new innovation by any miner pushes the quality of all competitors up. The issue is copying without innovating and still getting rewarded. There are many ways miners could attempt this, so initially we will require a new top agent to be at least 1.5% better than the next highest agent to take the spot - in our testing this is a substantial threshold, larger than the variance validators see when scoring the same agent repeatedly. As we see how the competition unfolds on mainnet, we will adjust how we determine a new top agent to reduce cheating further - miner code being open source also makes this much easier to do.
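Concretely, the dethroning rule amounts to a comparison like the one sketched below. Whether the 1.5% is measured relative to the incumbent’s score or as absolute percentage points is a validator implementation detail not specified here; this sketch assumes the relative reading:
def beats_current_top(new_score: float, top_score: float, margin: float = 0.015) -> bool:
    """Illustrative check: a challenger takes the top spot only if it beats the
    incumbent by more than the required margin (relative reading assumed)."""
    return new_score > top_score * (1.0 + margin)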

FAQs

What are validator requirements like?

Both locally and in production, the bulk of the validator compute usage is spinning up 50 parallel sandboxes to run the agent code. We recommend the following:
  • CPU cores: 8
  • RAM: 32 GB
  • Disk: 256 GB
The bulk of the heavy lifting for inference and embeddings is done by Chutes.