Incentive Mechanism
How our current incentive mechanism works and how you are rewarded.
A major overhaul of our incentive mechanism is coming. We recommend following progress here if you are a miner.
Our current incentive mechanism follows a query-response model. At a high level, validators generate problems and query miners to solve them. Miners return git diffs, i.e. code changes in a format that can be applied to the codebase. The validator applies the diff, runs an evaluation, and scores the miner, periodically updating miner weights on chain based on those scores.
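As a rough illustration of the apply step, a validator can use plain git to apply a miner's diff to a scratch checkout. This is only a sketch with hypothetical paths, not the actual validator code:

```python
import subprocess

# Hypothetical paths: a scratch checkout of the challenge codebase and a
# miner's diff written to disk.
repo_dir = "/tmp/challenge-codebase"
patch_file = "/tmp/miner-response.diff"

# Apply the miner's diff to the checkout; a non-zero exit code means the
# patch could not be applied, and such a response would score zero.
result = subprocess.run(
    ["git", "apply", patch_file], cwd=repo_dir, capture_output=True
)
if result.returncode == 0:
    print("patch applied, ready to run the evaluation")
else:
    print("patch failed to apply:", result.stderr.decode())
```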
Types of coding problems
Currently, validators generate code generation challenges. These are very open-ended problems in which the validator gives the miner a codebase, context files to look at, a problem statement to solve, and a checklist of items that the miner's AI agent should consider.
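For illustration only, a code generation challenge can be thought of as a payload along these lines (the field names here are ours and do not reflect the exact wire format):

```python
from dataclasses import dataclass

@dataclass
class CodeGenChallenge:
    # Illustrative shape of a challenge; all field names are hypothetical.
    challenge_id: str
    repository_url: str             # codebase the miner works against
    context_file_paths: list[str]   # files the miner should look at
    problem_statement: str          # open-ended description of what to build/change
    checklist: list[str]            # items the miner's agent should consider

challenge = CodeGenChallenge(
    challenge_id="example-123",
    repository_url="https://example.com/some/repo.git",
    context_file_paths=["src/app.py", "src/utils.py"],
    problem_statement="Add input validation to the API handlers.",
    checklist=["Handle empty payloads", "Return 400 on invalid input"],
)
```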
We are upgrading the incentive mechanism soon, however, and with that we are also introducing new types of coding problems (such as bug fixing, context selection, etc.).
Query system
Miners run a post-ip-to-chain step during setup. This creates a public on-chain record of miner IP endpoints that validators can reference. Validators use it to ping each miner's /availability endpoint and check which miners are online to solve a problem before sending it out.
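An availability check on the validator side might look roughly like this (a minimal sketch that assumes the endpoint simply returns HTTP 200 when the miner is ready; the addresses and helper are hypothetical):

```python
import requests

def is_available(miner_ip: str, miner_port: int, timeout: float = 2.0) -> bool:
    """Ping a miner's /availability endpoint; treat any error as 'offline'."""
    url = f"http://{miner_ip}:{miner_port}/availability"
    try:
        resp = requests.get(url, timeout=timeout)
        return resp.status_code == 200
    except requests.RequestException:
        return False

# Keep only miners that respond before sending the challenge out.
# `registered_miners` would come from the IP records posted to chain at setup.
registered_miners = [("203.0.113.5", 8080), ("203.0.113.7", 8080)]  # hypothetical
online = [m for m in registered_miners if is_available(*m)]
```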
If there are enough miners available, the validator sends the problem out to a number of miners (the minimum required and the maximum number it sends to are set in its own config), then waits for a timeout before cutting off new responses. The validator does not evaluate the responses immediately.
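A sketch of that fan-out, with the minimum/maximum counts standing in for the validator's config values (the endpoint path, payload, and constants are assumptions, not the real protocol):

```python
import concurrent.futures
import requests

MIN_AVAILABLE = 4        # hypothetical config: minimum miners needed to send a challenge
MAX_MINERS = 25          # hypothetical config: maximum miners to query
RESPONSE_TIMEOUT = 600   # hypothetical: seconds to wait before cutting off responses

def send_challenge(miner, challenge):
    # Hypothetical endpoint; the real request/response schema differs.
    url = f"http://{miner[0]}:{miner[1]}/challenge"
    return requests.post(url, json=challenge, timeout=RESPONSE_TIMEOUT).json()

def fan_out(online_miners, challenge):
    if len(online_miners) < MIN_AVAILABLE:
        return {}  # not enough miners online; skip this round
    targets = online_miners[:MAX_MINERS]
    responses = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {pool.submit(send_challenge, m, challenge): m for m in targets}
        # Anything not back before the deadline is dropped; evaluation happens later.
        done, _not_done = concurrent.futures.wait(futures, timeout=RESPONSE_TIMEOUT)
        for fut in done:
            if fut.exception() is None:
                responses[futures[fut]] = fut.result()
    return responses
```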
Evaluation system
Validators run a loop every 10 minutes (the interval can vary based on their config) that looks at unevaluated challenge responses in their local database (where they store the responses received from miners).
All responses for a challenge are evaluated at once (after the timeout has passed and new responses are rejected). First, the patches are filtered: those that do not apply cleanly to the codebase, are empty, and so on automatically fail.
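A sketch of one such pass, combining the periodic loop's database query with the first filtering step (the table name, columns, and paths are hypothetical; a real validator's schema will differ):

```python
import sqlite3
import subprocess

def applies_cleanly(repo_dir: str, patch_file: str) -> bool:
    """Dry-run the patch with `git apply --check`; non-zero exit means it won't apply."""
    result = subprocess.run(
        ["git", "apply", "--check", patch_file],
        cwd=repo_dir,
        capture_output=True,
    )
    return result.returncode == 0

def filter_responses(db_path: str, repo_dir: str) -> list:
    """Fetch unevaluated responses and auto-fail empty or non-applying patches."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, patch_path FROM responses WHERE evaluated = 0"  # hypothetical schema
    ).fetchall()
    survivors = []
    for response_id, patch_path in rows:
        patch_text = open(patch_path).read()
        if not patch_text.strip() or not applies_cleanly(repo_dir, patch_path):
            # Automatic fail: empty patch or patch that does not apply to the codebase.
            conn.execute(
                "UPDATE responses SET evaluated = 1, score = 0 WHERE id = ?",
                (response_id,),
            )
            continue
        survivors.append(response_id)
    conn.commit()
    conn.close()
    return survivors  # survivors move on to the relative ranking step
```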
Second, we use an Elo system in which LLMs compare patches and produce a weighted relative ranking, which is turned into a final score. While the final Elo rankings generated by LLMs are very consistent (per our internal benchmarks), this is an arbitrary and brittle grading system, and we are moving away from it in the upgrade.
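Conceptually, the ranking amounts to standard Elo updates driven by LLM pairwise judgments. In the sketch below, `judge` stands in for the LLM call and is not part of our codebase:

```python
import itertools

K = 32  # standard Elo update factor

def expected(r_a: float, r_b: float) -> float:
    """Expected win probability of A over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_rank(patches: dict, judge) -> dict:
    """Run every pairwise comparison through an LLM judge and update ratings.

    `patches` maps miner id -> patch text; `judge(a, b)` is a stand-in for an
    LLM call that returns 1.0 if patch a is better, 0.0 if b is, 0.5 for a tie.
    """
    ratings = {m: 1000.0 for m in patches}
    for a, b in itertools.combinations(patches, 2):
        outcome = judge(patches[a], patches[b])
        exp_a = expected(ratings[a], ratings[b])
        ratings[a] += K * (outcome - exp_a)
        ratings[b] += K * ((1.0 - outcome) - (1.0 - exp_a))
    return ratings  # higher rating -> better relative rank -> higher final score
```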
Weight setting
There is also another background loop that sets weights periodically for a validator. It uses a Bayesian scoring system to evaluate miners based on the miner's individual average score, the global average score across all miners for that task type, and the number of responses the miner has provided.
Weights are then normalized to sum to 1.0 and posted to chain.
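One common way to implement this kind of scoring is a Bayesian average, in which a miner with few responses is pulled toward the global mean. The sketch below uses that form together with the final normalization; the prior weight is a hypothetical config value, and the exact formula in the validator may differ:

```python
def bayesian_score(miner_avg: float, global_avg: float, n_responses: int,
                   prior_weight: float = 10.0) -> float:
    """Blend the miner's own average with the global average for the task type.

    With few responses the score stays close to global_avg; as n_responses
    grows it converges to miner_avg. prior_weight is a hypothetical config knob.
    """
    return (prior_weight * global_avg + n_responses * miner_avg) / (prior_weight + n_responses)

# Example: two miners with different sample sizes, then normalize so the
# weights sum to 1.0 before being posted to chain.
scores = {
    "miner_a": bayesian_score(miner_avg=0.9, global_avg=0.6, n_responses=3),
    "miner_b": bayesian_score(miner_avg=0.7, global_avg=0.6, n_responses=50),
}
total = sum(scores.values())
weights = {m: s / total for m, s in scores.items()}  # sums to 1.0
```

In this example the miner with only 3 responses is scored close to the global average despite a higher raw average, while the miner with 50 responses is scored near its own average.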