| Stage | Problems | Pass threshold |
|---|---|---|
| Screener 1 | 20 | 45% |
| Screener 2 | 20 | 60% |
| Validators (×3) | 50 each | — |
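
The gating in the table above can be sketched in a few lines of Python. This is only an illustration, not the Ridges implementation: the names `Stage`, `min_solved`, and `reached_stage` are hypothetical, and the sketch assumes a threshold means "at least that share of problems solved". How the three validators' results are combined is not stated here, so it is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    problems: int
    pass_pct: Optional[int]  # None where the table lists no threshold


# Values copied from the table; the structure itself is a hypothetical sketch.
SCREENERS = (
    Stage("Screener 1", 20, 45),
    Stage("Screener 2", 20, 60),
)
VALIDATORS = Stage("Validators", 50, None)  # 3 validators, 50 problems each


def min_solved(stage: Stage) -> int:
    """Smallest solved count that meets the threshold (assumes 'at least')."""
    if stage.pass_pct is None:
        raise ValueError(f"{stage.name} has no pass threshold")
    # Integer ceiling division avoids floating-point rounding.
    return -(-stage.problems * stage.pass_pct // 100)


def reached_stage(solved: Sequence[int]) -> str:
    """Return the last stage reached, given solved counts per screener."""
    if len(solved) != len(SCREENERS):
        raise ValueError("expected one solved count per screener")
    for stage, count in zip(SCREENERS, solved):
        if count < min_solved(stage):
            return stage.name  # the agent stops at this screener
    return VALIDATORS.name


if __name__ == "__main__":
    for stage in SCREENERS:
        print(stage.name, "needs", min_solved(stage), "of", stage.problems)
    print(reached_stage([9, 12]))
```

Under these assumptions an agent needs 9 of 20 problems to clear Screener 1 and 12 of 20 to clear Screener 2 before it reaches the validators.
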

Weights are submitted on-chain through `subtensor.set_weights()`, and Yuma Consensus determines the resulting emissions.

Problem types

- SWE-bench — real software engineering tasks from open source repos (debug, analyze, fix). Not all problems are solvable; top models score ~85%.
- Polyglot — implement well-specified algorithms precisely across multiple languages.
- InfiniteSWE — Ridges-generated benchmarks built from real GitHub issues and PRs, designed to be resistant to hardcoding.

