Scoring rubric v1
This is how a single event is sized on the board. It is the same document the depth pass reads before it scores and the same document you read on /rubric — there is only one copy, and this is it.
The board asks one question of every event: how much did this actually move us toward AGI, and in which direction? Not how exciting the headline is — how much the *body* of the story supports a real, measurable shift. A cold read, bound to evidence, biased toward caution.
Magnitude: 0 to 5
Every event gets a whole number from 0 to 5. The number comes from what the article body demonstrates, never from how the headline is phrased.
| Score | Name | What it takes |
|---|---|---|
| 0 | No movement | A re-report of something already counted, pure opinion or commentary, a rumor, or a body that does not support its own headline. Also the score when the body could not be read. |
| 1 | Marginal | A minor increment: a small product tweak, a benchmark bump within noise, an intention or roadmap with nothing shipped yet. |
| 2 | Modest | A real but narrow result: a shipped feature, a mid-tier benchmark gain, a concrete policy proposal, a funding round of ordinary size. |
| 3 | Notable | A solid, measured advance a specialist would stop to note: a meaningful model or product release, a binding rule entering force, a significant compute or capital commitment. |
| 4 | Major | A clear step-change the field will cite: a frontier model that moves the state of the art broadly, a landmark law or enforcement action. |
| 5 | Landmark | Rare and hard to dispute. It redefines the frontier — the kind of event you would still remember by name a year later. |
The deflation rule: when in doubt, the lower one
When the evidence leaves you torn between two scores, take the lower one. The board would rather under-count a real event than let an inflated one set the pace. Concretely:
- If the body inflates the headline — "X solves Y" over a narrow benchmark with caveats — drop at least one step from what the headline implies.
- If the body contradicts the headline, score by the body, not the claim.
- A re-report of an event already on the board is a 0. The original already counts; counting it twice is the one error the board cannot see.
- If the body is unavailable, paywalled, or truncated, do not guess. Score 0 with the reason stated. An unread event is not a small event — it is an unknown one, and unknowns do not move the needle.
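The four rules above can be sketched as a small helper. This is a minimal illustration, not the board's actual code: the function names `deflated_score` and `ceiling_for_inflated_headline`, their parameters, and the reason strings are all hypothetical.

```python
def deflated_score(low: int, high: int, *, body_read: bool = True,
                   re_report: bool = False) -> tuple[int, str]:
    """Apply the deflation rule to two candidate scores; return (score, reason).

    Hypothetical helper: names and reason strings are illustrative only.
    """
    for score in (low, high):
        if not 0 <= score <= 5:
            raise ValueError(f"magnitude must be 0-5, got {score}")
    if not body_read:
        # An unread body is an unknown, not a small event: score 0, say why.
        return 0, "body unavailable, paywalled, or truncated"
    if re_report:
        # The original event already counts; never count it twice.
        return 0, "re-report of an event already on the board"
    return min(low, high), "torn between two scores: took the lower"


def ceiling_for_inflated_headline(headline_implied: int) -> int:
    """Highest score allowed when the body inflates the headline."""
    # "Drop at least one step" means the final score may be lower still.
    return max(0, headline_implied - 1)


print(deflated_score(2, 3))                   # (2, 'torn between two scores: took the lower')
print(deflated_score(4, 4, body_read=False))  # (0, 'body unavailable, paywalled, or truncated')
print(ceiling_for_inflated_headline(4))       # 3
```

Returning the reason next to the score keeps the "score 0 with the reason stated" requirement visible in the result itself.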
Source tiers
Where an event is reported bounds how far it can move the board on its own. A landmark claim needs a primary source; a blog post cannot certify one by itself.
| Tier | Sources | Magnitude cap |
|---|---|---|
| A — Primary | The paper or preprint (arXiv), the lab's or company's own release, peer-reviewed venues, official registers and regulators, primary government filings. | none (0–5) |
| B — Established press | Outlets with editorial standards and a correction record: Reuters, Bloomberg, the FT, the NYT, The Verge, Ars Technica, Wired, MIT Technology Review. | 4 |
| C — Secondary | Aggregators, personal blogs, social posts, and outlets of unknown or low signal. | 3 |
When an event's evidence would earn a higher score than its tier allows, the score is lowered to the cap and the event is flagged as capped. This is not a penalty on the outlet — it reflects that a secondary report, alone, cannot carry a landmark. The same event, once the primary source appears, can be re-scored on Tier A.
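As a rough sketch of the capping step, assuming tiers are keyed by their letter and that Tier A's "none" is represented as the top of the scale (5). `TIER_CAPS` and `apply_tier_cap` are hypothetical names, not part of the rubric.

```python
# Magnitude cap per source tier, taken from the table above.
# Tier A has no cap, which is modelled here as the scale maximum.
TIER_CAPS = {"A": 5, "B": 4, "C": 3}


def apply_tier_cap(evidence_score: int, tier: str) -> tuple[int, bool]:
    """Return (final_score, capped) for a score earned on the evidence alone."""
    if not 0 <= evidence_score <= 5:
        raise ValueError(f"magnitude must be 0-5, got {evidence_score}")
    cap = TIER_CAPS[tier]  # raises KeyError for an unknown tier
    capped = evidence_score > cap
    return min(evidence_score, cap), capped


print(apply_tier_cap(5, "C"))  # (3, True)
print(apply_tier_cap(4, "B"))  # (4, False)
print(apply_tier_cap(5, "A"))  # (5, False)
```

Re-scoring once a primary source appears is then just a second call with the tier set to `"A"`.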
The six axes
Each event touches exactly one axis. The depth pass proposes one; a human confirms it before it counts.
| Axis | Weight | Role |
|---|---|---|
| Autonomy | ×1.5 | Agency and self-improvement — agents that act, plan, and improve themselves. |
| Capability | ×1.2 | Raw ability — benchmark jumps, emergent skills, distance to human-level. Safety and alignment work lives here too. |
| Friction | ×1.2 | The brake — regulation, limits, failures, backlash. Slows the speedometer; never touches the odometer. |
| Power | ×1.0 | The fuel — training compute, chips, data centres, energy. |
| Diffusion | ×0.9 | How fast it spreads — adoption into products, infrastructure, decisions. |
| Vibes | ×0.5 | Public narrative and sentiment — viral moments, influential essays, waves of fear or optimism. |
Magnitude and axis are independent: a Vibes event and an Autonomy event can both score 4, but the axis weight decides how much each one finally bends the board.
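To make the independence concrete, here is a small sketch that assumes the weight is applied as a plain multiplier on magnitude, as the "×" notation in the table suggests. The rubric does not spell out the full board formula, so `weighted_contribution` is a hypothetical illustration only, and it does not model how Friction events act as a brake.

```python
# Axis weights copied from the table above.
AXIS_WEIGHTS = {
    "Autonomy": 1.5,
    "Capability": 1.2,
    "Friction": 1.2,
    "Power": 1.0,
    "Diffusion": 0.9,
    "Vibes": 0.5,
}


def weighted_contribution(magnitude: int, axis: str) -> float:
    """Magnitude times axis weight (assumed combination, not confirmed by the rubric)."""
    if not 0 <= magnitude <= 5:
        raise ValueError(f"magnitude must be 0-5, got {magnitude}")
    return magnitude * AXIS_WEIGHTS[axis]


# Two events with the same magnitude on different axes.
print(weighted_contribution(4, "Autonomy"))  # 6.0
print(weighted_contribution(4, "Vibes"))     # 2.0
```

Under this assumption, the same magnitude of 4 weighs three times as much on Autonomy as on Vibes.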
What the depth pass returns
For each event, the depth pass reads the full body and returns a cold, structured judgment:
- whether the body supports, inflates, or contradicts the headline;
- the event type (result, announcement, promise, or re-report);
- the single axis it proposes and its direction (progress or friction);
- the magnitude, with the exact sentence from the body that anchors it;
- the source tier and whether a cap was applied;
- a short, specific reasoning line.

It scores. It never decides what enters the board; a human sets the cutoff.
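One way to picture that judgment is as a typed record. The sketch below is an assumption about shape only: the class name `DepthJudgment` and every field name are hypothetical, and the example values are placeholders rather than a real scored event.

```python
from dataclasses import dataclass
from typing import Literal

Axis = Literal["Autonomy", "Capability", "Friction", "Power", "Diffusion", "Vibes"]


@dataclass(frozen=True)
class DepthJudgment:
    """Hypothetical record of what the depth pass returns for one event."""
    headline_verdict: Literal["supports", "inflates", "contradicts"]
    event_type: Literal["result", "announcement", "promise", "re-report"]
    axis: Axis                      # proposed only; a human confirms it
    direction: Literal["progress", "friction"]
    magnitude: int                  # whole number, 0 to 5
    anchor_sentence: str            # exact sentence from the body
    source_tier: Literal["A", "B", "C"]
    capped: bool                    # True if the tier cap lowered the score
    reasoning: str                  # short, specific reasoning line

    def __post_init__(self) -> None:
        # Literal types are not checked at runtime, so guard the range here.
        if not 0 <= self.magnitude <= 5:
            raise ValueError(f"magnitude must be 0-5, got {self.magnitude}")


# Placeholder values for illustration only.
example = DepthJudgment(
    headline_verdict="inflates",
    event_type="result",
    axis="Capability",
    direction="progress",
    magnitude=2,
    anchor_sentence="<exact sentence quoted from the article body>",
    source_tier="B",
    capped=False,
    reasoning="Narrow benchmark gain with caveats; headline overstates it.",
)
print(example.magnitude, example.axis)  # 2 Capability
```

The record holds a score and its evidence but no accept or reject field, which mirrors the split between the depth pass scoring and a human setting the cutoff.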
Rubric v1. Changes to this document are recorded in the [methodology changelog](/rubric/changelog).