The correspondence audit is the most influential research design in the empirical study of discrimination. This guide explains how it works, what names actually signal, what the design can and cannot measure, and the decisions that separate a clean audit from a broken one. It draws throughout on a decade of work by Charles Crabtree and John Holbein.
The counts and rates are the published results of Bertrand and Mullainathan (2004), who sent about 4,870 fictional résumés (2,435 per condition) to Boston and Chicago employers. Résumés with white-sounding names drew a 9.65% callback rate; otherwise equivalent résumés with Black-sounding names drew 6.45%. That is about 1.5 times as many callbacks for the white-sounding names, or roughly 50% more. The two example résumés use names from the validated set in this guide; the figures are B&M's published condition totals.
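The arithmetic behind those figures is easy to check. The sketch below recomputes the callback ratio and the percentage-point gap from the rates quoted above, then runs a pooled two-proportion z-test as one rough gauge of how far the gap sits from chance. The function name `callback_gap` is hypothetical, and the test treats every application as independent, which ignores any clustering of applications within employers; it is an illustration, not a reproduction of the original analysis.

```python
from math import sqrt, erfc

# Condition totals as quoted above (Bertrand and Mullainathan 2004).
N_PER_CONDITION = 2435
RATE_WHITE = 0.0965
RATE_BLACK = 0.0645


def callback_gap(rate_a, rate_b, n_a, n_b):
    """Return the ratio, the difference, and a pooled two-proportion z-test.

    Hypothetical helper for illustration; assumes independent applications.
    """
    ratio = rate_a / rate_b
    diff = rate_a - rate_b
    # Pooled callback rate under the null hypothesis of no difference.
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se
    # Two-sided p-value from the standard normal distribution.
    p_two_sided = erfc(abs(z) / sqrt(2))
    return {"ratio": ratio, "diff": diff, "z": z, "p": p_two_sided}


result = callback_gap(RATE_WHITE, RATE_BLACK, N_PER_CONDITION, N_PER_CONDITION)
print(f"ratio = {result['ratio']:.2f}")  # ratio = 1.50
print(f"difference = {result['diff'] * 100:.2f} percentage points")  # 3.20
print(f"z = {result['z']:.2f}, two-sided p = {result['p']:.1e}")
```

The ratio and the difference answer different questions: the ratio (about 1.5) describes relative advantage, while the 3.2 percentage-point difference describes how many additional callbacks per hundred applications the white-sounding names received.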
A correspondence audit randomizes putative group membership across otherwise matched applications. Because assignment is random, any systematic difference in outcomes is causally attributable to the one signal you manipulated. Four principles do most of the work.
Randomize a single signal: the race-coded name, the disability disclosure, the religious affiliation. Random assignment means the treated and untreated applications are, in expectation, identical on everything else. That is where the causal claim comes from.
Hold everything else constant: qualifications, experience, formatting, timing. The design's credibility lives or dies on whether the applications really are equivalent apart from the manipulated signal.
Audits measure what gatekeepers do when they think no one is watching: employers screening applicants, landlords answering inquiries, officials replying to constituents. Not what people say on a survey.
The outcome is behavioral: a callback, a reply, an appointment offered. This sidesteps social desirability bias, which is why the design displaced attitude measures as the field's evidentiary standard.
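The first two principles can be sketched in a few lines. The example below assumes a simple between-subjects design in which each employer receives one application; many audits send matched pairs instead, which this sketch does not cover. All names here (`assign_signals`, `build_application`, the placeholder signals `name_A` and `name_B`, the employer IDs, and the template fields) are hypothetical and chosen only for illustration.

```python
import random

# Placeholder signal values; a real audit would use validated names.
SIGNALS = ["name_A", "name_B"]


def assign_signals(unit_ids, signals, seed=0):
    """Completely randomize signals across units with balanced group sizes.

    Units are shuffled with a seeded generator, then signals are dealt out
    in rotation, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    return {unit: signals[i % len(signals)] for i, unit in enumerate(shuffled)}


def build_application(template, signal):
    """Copy the matched template and change only the manipulated field."""
    application = dict(template)
    application["name"] = signal
    return application


# One matched template: everything except the name is held constant.
template = {"name": None, "degree": "BA", "years_experience": 4, "format": "one-page"}
employers = [f"employer_{i:03d}" for i in range(10)]

assignment = assign_signals(employers, SIGNALS, seed=42)
applications = {e: build_application(template, s) for e, s in assignment.items()}

for employer in employers:
    print(employer, applications[employer]["name"])
```

Fixing the seed makes the assignment reproducible, which matters for pre-registration and for auditing the audit. Building every application from one template is a cheap way to guarantee that the name is the only field that varies.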
It helps to be precise about the kind of discrimination an audit captures. In Block, Crabtree, Holbein, and Monson (2021, PNAS), John and I called it everyday discrimination, or paper cut discrimination. It is the small, low-stakes, often unnoticed unequal treatment that accumulates across thousands of routine interactions: the email that goes unanswered, the inquiry that gets a curter reply, the request that is quietly deprioritized.
This is a different construct from the overt, dramatic discrimination that makes the news: hate crimes, slurs, violence. Both matter. But they are not the same thing, and a method tuned to one will miss the other. The audit is built for the everyday kind. Its whole value is making visible the bias that hides inside ordinary, deniable behavior, the bias no survey will admit to and no headline will record.
Keeping the distinction straight disciplines what you claim. A callback gap is strong evidence of everyday discrimination at a decision point. It is not, on its own, a measure of animus, of violence, or of discrimination across a whole life course. The rest of this guide is about respecting that boundary.