The debates

What the field argues about

The method is powerful, but every part of it is contested. Anyone running an audit should be able to answer these questions about their own design.

What does the name actually signal?
The central construct-validity question. Gaddis (2017) showed that many names in classic audits signal class as strongly as race, so estimated race effects may bundle several treatments together. The validated names data in this guide is a direct response: measure perceptions first, then choose names whose signals match the inference you want. John and I pushed this further in the reply to Mitterer (Block, Crabtree, Holbein & Monson 2022), arguing that the fix is rarely to partial the perceptions back out.
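The "measure perceptions first, then choose names" step can be sketched in a few lines. This is a minimal illustration, not the procedure used to build the validated names data: the field names, the 1-to-5 class scale, the thresholds, and the placeholder rows are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamePerception:
    """One row of hypothetical pretest data for a candidate name."""
    name: str
    share_perceived_black: float  # share of raters reading the name as Black
    share_perceived_white: float  # share of raters reading the name as White
    mean_class_rating: float      # assumed 1-5 scale of perceived social class


def select_names(records, race, min_race_share=0.9, class_band=(2.5, 3.5)):
    """Keep names that signal `race` strongly and fall inside a shared class band.

    Holding the class band fixed across racial groups is one way to keep a
    class signal from riding along with the race signal.
    """
    low, high = class_band
    chosen = []
    for r in records:
        share = r.share_perceived_black if race == "black" else r.share_perceived_white
        if share >= min_race_share and low <= r.mean_class_rating <= high:
            chosen.append(r.name)
    return chosen


# Placeholder values for illustration only; not drawn from any real dataset.
ratings = [
    NamePerception("Name A", 0.95, 0.03, 3.0),
    NamePerception("Name B", 0.96, 0.02, 1.8),  # strong race signal, low class rating
    NamePerception("Name C", 0.04, 0.93, 3.1),
    NamePerception("Name D", 0.05, 0.92, 4.4),  # strong race signal, high class rating
]

black_names = select_names(ratings, "black")
white_names = select_names(ratings, "white")
print(black_names, white_names)
```

Names B and D are dropped even though their race signal is strong, because their class ratings fall outside the shared band. Whether to match on class, or to vary it deliberately, depends on the inference the study is after.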
Is a callback the outcome we care about?
Heckman (1998) argued that callback gaps at the screening stage say little about discrimination in wages or careers, and that audits sample a nonrandom slice of the market. The response is that the callback is a real gate with real consequences, and that screening-stage discrimination compounds downstream. Naming the outcome as everyday discrimination, rather than discrimination in general, keeps the claim honest.
Is discrimination declining?
Quillian and colleagues (2017) meta-analyzed US hiring field experiments since 1989 and found no decline in callback discrimination against Black applicants over 25 years. Our own meta-analysis with Gaddis and Larsen found the gaps concentrated in hiring and housing. The accumulation of comparable audits is itself one of the method's biggest payoffs: it turns a one-off finding into a measurable trend. A newer meta-analysis with Gaddis and colleagues turns the same tools on intersectional discrimination, pooling how race and gender signals combine (Gaddis, Crabtree, and coauthors, forthcoming, Sociological Science).
Is it ethical to deceive the people we study?
Audits consume the time of employers, landlords, and officials who never consented. Crabtree and Dhima (2022) propose weighing the aggregate burden against the social value of detecting discrimination that is otherwise invisible. The framework does not dissolve the tension. It forces designs to minimize burden and justify their scale, both of which the fielding page takes up.
Does it travel beyond US hiring?
The design isn't only about American employers. In Hou, Liu, and Crabtree (2020, Journal of Comparative Economics) we sent more than 4,000 résumés to Chinese firms and found Muslim applicants about 50% less likely to hear back, with state-owned firms no better than private ones. In Hughes, Gell-Redman, Crabtree, and coauthors (2020, Journal of Experimental Political Science) the targets were local election officials, not employers. The logic carries to any gatekeeper who reads a name and decides whether to reply.
See the data

Three decades of hiring audits, plotted

[Figure: White-to-Black callback ratio, pooled across studies (higher = more discrimination).]
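For readers who want to see how a pooled White-to-Black callback ratio can be computed, here is a minimal sketch. It uses fixed-effect inverse-variance pooling of log callback ratios, which is a simplification chosen for brevity; published meta-analyses typically fit random-effects models. The study counts below are hypothetical placeholders, not figures from any audit.

```python
import math


def log_ratio_and_variance(white_callbacks, white_apps, black_callbacks, black_apps):
    """Log of the White-to-Black callback ratio and its approximate variance."""
    rate_white = white_callbacks / white_apps
    rate_black = black_callbacks / black_apps
    log_ratio = math.log(rate_white / rate_black)
    # Standard large-sample variance of a log risk ratio.
    variance = (1 / white_callbacks - 1 / white_apps) + (1 / black_callbacks - 1 / black_apps)
    return log_ratio, variance


def pooled_ratio(studies):
    """Fixed-effect pooled ratio with a 95% confidence interval.

    Each study is a tuple:
    (white_callbacks, white_apps, black_callbacks, black_apps).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for study in studies:
        log_ratio, variance = log_ratio_and_variance(*study)
        weight = 1 / variance  # more precise studies count for more
        weighted_sum += weight * log_ratio
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    std_error = math.sqrt(1 / total_weight)
    ci = (math.exp(pooled_log - 1.96 * std_error), math.exp(pooled_log + 1.96 * std_error))
    return math.exp(pooled_log), ci


# Hypothetical counts for three studies, for illustration only.
example_studies = [
    (120, 1000, 80, 1000),
    (90, 600, 70, 600),
    (300, 2500, 210, 2500),
]

ratio, (ci_low, ci_high) = pooled_ratio(example_studies)
print(f"pooled ratio = {ratio:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A ratio of 1 means equal callback rates; values above 1 mean White applicants hear back more often. Pooling on the log scale keeps the ratio symmetric, so a doubling and a halving carry equal weight before exponentiating back.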

Start here

A short path into the literature

Read these in roughly this order, from the canonical results to the methodological debates to the practical tooling. Work with John Holbein is marked.

Overview: Gaddis (2018), "An Introduction to Audit Studies in the Social Sciences," in Audit Studies: Behind the Scenes with Theory, Method, and Nuance. The best single entry point.
Canonical: Bertrand & Mullainathan (2004), "Are Emily and Greg More Employable Than Lakisha and Jamal?" American Economic Review. The study that defined the modern correspondence audit.
Canonical: Pager (2003), "The Mark of a Criminal Record," American Journal of Sociology. The in-person audit at its most influential.
Design: Butler & Crabtree (2020), "Audit Experiments," in Advances in Experimental Political Science. How the design travels beyond labor markets, into politics and public services.
Scale · Holbein: Block, Crabtree, Holbein & Monson (2021), "Are Americans Less Likely to Reply to Emails from Black People Relative to White People?" PNAS. The 250,000-person audit of the public, and the everyday-discrimination framing.
Concepts · Holbein: Block, Crabtree, Holbein & Monson (2022), "Reply to Mitterer," PNAS. On bundled treatments and what name-based designs identify.
Heterogeneity · Holbein: Gaddis, Crabtree, Holbein & Pfaff (2024), "Racial/Ethnic Discrimination and Heterogeneity Across Schools," working paper. Why the average effect is not the whole story.
Trends · Holbein: Gaddis, Larsen, Crabtree & Holbein (2021), "Discrimination Against Black and Hispanic Americans Is Highest in Hiring and Housing Contexts," meta-analysis.
Government · Holbein: Pfaff, Crabtree, Kern & Holbein, "Do Street-Level Bureaucrats Discriminate Based on Religion?" Public Administration Review.
Data · Holbein: Crabtree et al. (2023), "Validated Names for Experimental Studies on Race and Ethnicity," Nature Scientific Data. The dataset behind the explorer, on GitHub, Dataverse, and OSF.
Logistics: Crabtree (2018), "An Introduction to Conducting Email Audit Studies," in the Gaddis volume, Ch. 5.
Ethics: Crabtree & Dhima (2022), "Auditing Ethics: A Cost-Benefit Framework for Audit Studies," Political Studies Review 20(2).
Trends: Quillian et al. (2017), "Meta-Analysis of Field Experiments Shows No Change in Racial Discrimination in Hiring over Time," PNAS.
Names · Holbein: Crabtree, Gaddis, Holbein & Larsen (2022), "Racially Distinctive Names Signal Both Race/Ethnicity and Social Class," Sociological Science. The class signal that rides along with race.
Names: Crabtree & Chykina (2018), "Last Name Selection in Audit Studies," Sociological Science. Why a name's racial signal shifts with local demographics.
Travels: Hou, Liu & Crabtree (2020), "Anti-Muslim Bias in the Chinese Labor Market," Journal of Comparative Economics. The audit logic outside the US.
Officials: Hughes, Gell-Redman, Crabtree, et al. (2020), "Persistent Bias Among Local Election Officials," Journal of Experimental Political Science. Tracking whether the email was even opened.
Trends · Holbein: Gaddis, Crabtree, et al. (forthcoming), "Intersectional Discrimination: A Meta-Analysis," Sociological Science. Race and gender signals, pooled.