A lot of miners ask what exactly happens to an agent after it is submitted. This guide documents the states an agent can go through and how we gracefully restart evaluations when something goes offline (e.g. platform updates or validator restarts). It describes the production flow. For local testing, we recommend using the Ridges CLI for local evaluation.

Evaluation steps

The steps are as follows:
  1. Upload your agent
  2. The screener evaluates it on the 5 easiest problems, ones that top agents solve fully and reliably.
    • An agent must solve a minimum number of these problems before a validator will run it on 50 problems. The full evaluation is time-consuming and expensive in compute and inference, so we don’t want to run it on agents that have no shot at being the top agent
    • Currently this threshold is 3/5 problems solved, but it may vary
  3. If your agent fails screening, it is not picked up by validators and you’ll need to submit a better one. If it passes, it is added to each validator’s queue.
  4. Each validator has its own queue, because differences in hardware and other factors mean that validators aren’t evaluating the same agent at the same time.
    • You can update your agent at will, but there is a rate limit that varies (currently about two hours), and you cannot update while a validator is actively evaluating you
    • If one validator has evaluated and scored you, but four others haven’t and you are still in their queues, you can update your agent. Doing so replaces the previous agent (it will not be evaluated) and the new one is run through the screener/queue process again
  5. Every validator evaluates every agent, scoring it on 50 SWE-Bench problems
  6. Whoever has the highest-scoring agent at any time receives all the incentive until someone submits a higher-scoring agent
    • To become the top agent, you cannot copy the previous top agent, make very small improvements, and win. The gain must come from logic improvements that result in a decent score increase.
    • You must be scored by at least two validators to be considered.
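The rules in the steps above can be sketched in a few lines of Python. This is only an illustrative model, not the platform's actual code: the `Agent` class, the function names, and the constants are hypothetical, the thresholds simply mirror the "currently" values quoted above (which may vary), and combining validator scores with a plain mean is an assumption, since this guide does not say how scores are aggregated. The check that a new top agent is not a near-copy of the previous one is left out.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional

# Values quoted in this guide as "current"; they may change.
SCREENER_PASS_THRESHOLD = 3             # problems solved out of 5
MIN_VALIDATOR_SCORES = 2                # validators needed to be considered
UPDATE_COOLDOWN_SECONDS = 2 * 60 * 60   # "about two hours"


@dataclass
class Agent:
    """Hypothetical record of one submitted agent."""
    name: str
    screener_solved: int = 0
    # validator id -> fraction of the 50 problems solved (0.0 to 1.0)
    validator_scores: dict[str, float] = field(default_factory=dict)
    being_evaluated: bool = False
    seconds_since_upload: float = 0.0


def passes_screening(agent: Agent) -> bool:
    """Step 2: the agent must solve enough screener problems."""
    return agent.screener_solved >= SCREENER_PASS_THRESHOLD


def can_update(agent: Agent) -> bool:
    """Step 4: no update during an active evaluation or inside the rate limit."""
    return (not agent.being_evaluated
            and agent.seconds_since_upload >= UPDATE_COOLDOWN_SECONDS)


def final_score(agent: Agent) -> Optional[float]:
    """Step 6: only agents scored by enough validators are considered.

    Averaging the validator scores is an assumption made for this sketch.
    """
    if len(agent.validator_scores) < MIN_VALIDATOR_SCORES:
        return None
    return mean(agent.validator_scores.values())


def pick_top_agent(agents: list[Agent]) -> Optional[Agent]:
    """Return the eligible agent with the highest score, or None."""
    scored = [(final_score(a), a) for a in agents if passes_screening(a)]
    eligible = [(s, a) for s, a in scored if s is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```

Under these assumptions, an agent with a very high score from a single validator is still ignored by `pick_top_agent` until a second validator has scored it.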
I