
Goodhart's Law.

When a measure becomes a target, it ceases to be a good measure. The most general bug in every measurement-driven institution humans have ever built.

Codex · Western Canon · ≈8 min read · Goodhart 1975 · Strathern 1997
TL;DR

Pick any number that proxies for something you care about. Reward people for hitting it. Watch the number climb and the underlying value quietly leave the room. That is Goodhart's Law. It is the local mechanism behind Moloch, the engineering pathology behind reward hacking in AI, the operational signature of Conway Debt, and the reason Pañca Ṛṇa is built to ledger obligations instead of numbers.

The original observation

In 1975, the British economist Charles Goodhart was thinking about why monetary policy kept misbehaving. The Bank of England would identify some money-supply aggregate that correlated reliably with inflation, decide to target it, and watch the correlation collapse the moment they did. He eventually wrote it down as a quip in a conference paper: any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.

The phrasing most people quote today is sharper, and it belongs to the anthropologist Marilyn Strathern, writing in 1997 about audit cultures in British universities:

When a measure becomes a target, it ceases to be a good measure. — Marilyn Strathern, 1997

Strathern's compression is what stuck because it generalises past monetary policy to every domain where humans use numbers to steer institutions. The phenomenon is older than its name. Soviet central planners hit it constantly — a nail factory rewarded on number-of-nails made tiny useless nails, then rewarded on tonnage made enormous useless nails. The factory floor was rational. The plan was Goodharted.

What it looks like in your life right now

The diagnostic is uncomfortable because it works everywhere you look. A few examples without leaving the desk:

  • Standardised tests stop measuring learning the moment schools are graded on them. The kids learn the test. The teachers teach the test. Nobody learns the subject. The proxy is satisfied; the value evaporates.
  • Engagement metrics stop measuring satisfaction the moment products are graded on them. The feed maximises the number of seconds you cannot put the phone down, which is empirically a different quantity from the number of seconds you wanted to spend looking at it.
  • GDP stops measuring welfare the moment governments are graded on it. A car crash, a divorce, and a forest fire all contribute positively. A grandmother raising her grandchildren contributes zero.
  • Citation counts stop measuring scientific quality the moment hiring committees use them. Citation cartels, salami-slicing of results, and the deliberate gaming of impact factors follow on schedule.
  • OKRs and KPIs stop measuring whatever they were supposed to measure the moment they're tied to compensation. The quarterly review becomes a negotiation about which number to hit, not a conversation about what to build.
  • AI benchmarks stop measuring capability the moment labs train against them. The leaderboard is not the territory; the leaderboard is what gets optimised, and the model that wins the leaderboard is increasingly not the model you want deployed.

The same pattern. Different substrate. Every time.

The deeper mechanism

Goodhart's Law is not, in the end, about cheating. Cheating is one of the failure modes; the deeper failure mode is structural. Every measure is a low-dimensional projection of a high-dimensional reality. A test is a few questions standing in for the whole sweep of what a child has learned. Engagement is a click standing in for a moment of attention. GDP is one number standing in for the whole pattern of a society's life. The projection loses information by definition.

When you start using the projection to steer the underlying system, the system reorganises around the projection. Anything the measure captured remains visible. Anything the measure missed becomes cheaper to discard. Within a few cycles, what's left is the part of the value that happened to be on the measure's axis, plus a great deal of optimisation pressure against everything that wasn't. That is not the system's malice — it is the system's gradient. The system followed the slope you placed under it.
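The projection-collapse dynamic can be sketched in a few lines. Everything in this toy model is an illustrative assumption, not something from the literature: true value needs two complementary axes, the measure sees only one, and an agent steered by the measure reallocates a fixed effort budget onto the measured axis.

```python
def steer_by_proxy(cycles=5, budget=10.0):
    """Toy model: value needs both axes; the measure sees only one."""
    split = 0.5  # fraction of effort on the measured (visible) axis
    history = []
    for _ in range(cycles):
        visible = budget * split
        invisible = budget - visible
        measure = visible                    # the low-dimensional projection
        value = 2 * min(visible, invisible)  # the two axes are complements
        history.append((measure, value))
        split = min(1.0, split + 0.125)      # follow the slope the measure sets
    return history

for measure, value in steer_by_proxy():
    print(f"measure={measure:5.2f}  true value={value:5.2f}")
```

The measure climbs monotonically from 5 to 10 while the true value falls from 10 to 0: the agent never cheats, it simply follows the gradient the measure defines.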

The contemporary AI alignment literature renames this and finds it everywhere. Reward hacking is Goodhart's Law made visible by a faster optimiser. Specification gaming is the inventory of ways a system can satisfy the letter of an objective while eviscerating its spirit. Reward misspecification is the realisation that every reward function humans write down is a proxy, and the optimiser is going to find every place the proxy diverges from what we actually wanted. The AI safety community is, in a sense, the first community for whom Goodhart's Law became a load-bearing engineering constraint rather than a wry observation.

The measure is a map. The map is not the territory. The map starts re-shaping the territory the moment you steer by it.

Four flavours of Goodharting

The clearest extant taxonomy of failure modes comes from a 2018 paper by David Manheim and Scott Garrabrant. The four flavours are worth carrying around because they tell you which kind of collapse you are looking at:

  • Regressional: the proxy is noisy, and optimising hard pushes toward the noise, not the signal. Example: picking the candidate with the highest interview score; you mostly selected for being good at interviews.
  • Extremal: the proxy correlates with the value in the normal range but decouples at the extremes. Example: hours worked track productivity up to about 60 a week; past 80 they track burnout, error, and exit.
  • Causal: the proxy was a downstream symptom, not a lever, so pushing it does not move the cause. Example: hospitals reduce four-hour A&E waiting-time figures by reclassifying patients in the corridor as not-yet-arrived.
  • Adversarial: an intelligent agent inside the system deliberately games the proxy. Example: search-engine optimisation, exam dumps, every dashboard ever rewired for the quarterly review.

The first two are physics. The third is bad institutional design. The fourth is what Moloch eats with. A real coordination system has to be designed against all four simultaneously, and the unforgiving discovery of the last fifty years is that no single fix handles them all.
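The regressional flavour in particular is easy to demonstrate numerically. A minimal sketch, assuming skills and interview noise are independent Gaussians (the distributions, sample sizes, and seed are illustrative assumptions): select the top interview scorer, and the score systematically overstates the winner's true skill.

```python
import random

def winners_curse(n_candidates=50, noise=1.0, trials=2000, seed=0):
    """Average (proxy score - true skill) of the top-scoring candidate."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(trials):
        skills = [rng.gauss(0, 1) for _ in range(n_candidates)]
        scores = [s + rng.gauss(0, noise) for s in skills]  # noisy proxy
        best = max(range(n_candidates), key=scores.__getitem__)
        gap += scores[best] - skills[best]
    return gap / trials

print(f"winner's score overstates skill by {winners_curse():.2f} on average")
```

With equal skill and noise variance, roughly half of the winner's apparent edge is noise: selecting on the proxy's maximum selects on the noise's maximum too.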

Why Goodhart matters for the metacrisis

Every layer of contemporary coordination is Goodhart-prone, and the layers compound. Climate is measured in carbon, so everything that doesn't fit on a carbon ledger is invisible (soil microbial life, indigenous fire knowledge, the social fabric that makes a watershed defensible). Public health is measured in QALYs, so anything that doesn't quantify (dignity of dying, the texture of family care, the cost of a generation's grief) is structurally underweighted. AI is measured against benchmarks, so the system that wins is the system that overfits the benchmark — and we have less and less time to notice. Every well-intentioned "just measure it better" reform is a fresh attack surface for the next round of Goodharting.

This is why the metacrisis is not a portfolio of problems to be solved sequentially with better dashboards. It is a generator function, and Goodhart is one of its load-bearing struts. Every dashboard becomes a target. Every target gets gamed. Every gamed target leaves behind Conway Debt: an institutional residue of policies, org charts, and incentive structures whose founding metric no longer tracks the value that justified building them. The audit ages out faster than the auditor.

The Indic counter-move

The classical Indic frame did not have Goodhart's Law because it did not principally rely on quantitative proxies to govern life. Dharma is not a measure. It is a contextual obligation discharged in relation — to elements, to ancestors, to teachers, to peers, to governance. Obligations can be neglected, deformed, or denied, but they cannot be gamed in the Goodhart sense, because the obligation is not a number to be maximised. It is a relation that is either honoured or not.

Pañca Ṛṇa — the five civilisational debts — is the operational rendering. Each debt is owed to a substrate: Bhūta Ṛṇa to the elements, Manuṣya Ṛṇa to fellow humans, Pitra Ṛṇa to lineage, Ṛṣi Ṛṇa to wisdom-streams, Dev Ṛṇa to governance and the gods of the polity. The debts are not fungible. They cannot be netted off. You cannot offset the forest you cut against the trees you planted in a different watershed; you cannot offset the ancestors you ignored against the donations you made to a temple in a different town. The ledger is structurally Goodhart-resistant — not because gaming becomes impossible, but because the shape of accountability resists collapse into a single scalar.

This is the deepest reason the Sāmatvārtha architecture is built obligation-first and number-second. Numbers are necessary; they are also reliably eaten. The question is what holds the system together when the numbers go bad — and the answer the Indic frame returns is: obligation, honoured by participants who know what they owe to whom. That is not a soft sentiment. It is a design constraint, and it is the constraint Goodhart cannot uniformly exploit.

What to do with this on Monday morning

Three practical operating rules for anyone building, funding, or governing in 2026:

  1. Assume every metric is temporary. The half-life of a useful proxy is the time it takes for the people graded on it to learn its shape. Build the dashboard knowing it will need to be re-architected when it stops tracking the value.
  2. Pair every quantitative target with a qualitative obligation. The Pañca Ṛṇa frame is doing real work here: it forces the question of whom this is owed to, which is not the same question as what number we are optimising. A KPI without an explicit obligation is just Goodhart's next snack.
  3. Design for failure modes that regenerate, not extract. Goodhart-proof is impossible. Goodhart-recoverable is the actual game. If the metric is going to be gamed eventually — and it is — design the system so that the gaming surfaces information about what the metric was missing, instead of compounding the missingness invisibly.

The work, in other words, is not to find the metric Moloch cannot eat. It is to build institutions whose health does not depend on a single metric staying honest. That is what obligation-first accounting means in operational practice — and it is the reason Goodhart's Law sits near the centre of this Codex.

Quick answers

Is Goodhart's Law a "law" in any strict sense?
No. It is an empirical regularity with a strong theoretical backing in optimisation theory and information theory. Calling it a "law" is a useful rhetorical shorthand for an effect that has been observed across enough domains and contexts that engineering against it is the default move, not a precaution.
Can you escape Goodhart with better metrics?
Marginally. Multi-metric scorecards delay the collapse; randomised metric rotation makes adversarial gaming more expensive; metrics indexed on hard-to-fake outcomes (long-term survival, downstream behaviour) decouple more slowly. But the deeper move is to stop relying on metrics as the sole steering layer — pair them with obligations, peer review, and qualitative attention from people who know what the value actually looks like.
Is Pañca Ṛṇa really Goodhart-resistant or just less-frequently-measured?
Both, honestly. Obligations are harder to game than scalars because their satisfaction is judged in relation rather than counted in units — but obligations can be ritualised into emptiness, deflected by performative compliance, or quietly redefined. The Indic literature has 3,000 years of commentary on exactly these failure modes. The advantage is not invincibility; it is that the failure modes are visible to honest participants in a way that gamed scalars are not.
Where does this leave AI alignment?
In honest difficulty. Specifying utility functions is a Goodhart trap. Learning preferences from observation routes around the trap but introduces new ones (the demonstrator's preferences may not be the principal's; what looks like a preference may be a habit). The inverse-reward-design line of work from Stuart Russell's group is the most rigorous current attempt. The Codex's broader claim is that civilisational alignment precedes AI alignment and continues after it — the question of what we actually want is not a technical problem the model can solve for us.

Building obligation-first?

If you're designing institutions, ledgers, or AI systems that try to route around Goodhart instead of through it — write in. We're working on exactly this surface at the Stack, Interchain, and Network-State layers.