Market Research Industry Warns Against Using AI-Generated Synthetic Respondents
Key Takeaways
- Using AI-generated synthetic respondents violates fundamental statistical assumptions about data distribution that enable valid analysis and extrapolation
- Synthetic LLM responses, even when trained on real respondent data, do not represent genuine samples from human behavior distributions
- The cost-saving appeal of synthetic respondents masks serious methodological flaws that will lead to unreliable insights and poor business decisions
Summary
A critical analysis published under the title "Synthetic Responses: The Big Lie of AI" challenges the growing practice of using large language models to generate synthetic survey respondents in market research. The author argues that while LLMs tuned on real respondent data can appear to solve cost and sample-size pressures, this approach fundamentally violates core statistical principles that underpin reliable data analysis and decision-making.
The core issue centers on a foundational assumption in statistics: that collected data represents samples drawn from an underlying probability distribution. This assumption enables researchers to extrapolate insights, make predictions, and answer critical business questions. However, synthetic LLM-generated responses do not come from the same real-world distribution as human respondents, undermining the mathematical validity of subsequent statistical modeling.
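The distributional argument can be made concrete with a small calculation. The sketch below is illustrative only: the function name `pooled_estimate` and all of the numbers (a true purchase-intent rate of 30% among real people, a hypothetical LLM that answers "yes" 45% of the time, 200 real and 800 synthetic responses) are assumptions chosen for the example, not figures from the article. It computes the expected value and standard error of a survey proportion when real and synthetic responses are pooled.

```python
import math


def pooled_estimate(p_real, n_real, p_synth, n_synth):
    """Expected value and standard error of a proportion estimated from
    n_real responses with true rate p_real pooled with n_synth responses
    generated at rate p_synth (all responses assumed independent)."""
    n = n_real + n_synth
    expected = (n_real * p_real + n_synth * p_synth) / n
    # Variance of the pooled sample proportion: sum of Bernoulli variances / n^2
    variance = (n_real * p_real * (1 - p_real)
                + n_synth * p_synth * (1 - p_synth)) / n ** 2
    return expected, math.sqrt(variance)


# Hypothetical values, chosen only to illustrate the effect.
P_REAL = 0.30    # assumed true rate in the human population
P_SYNTH = 0.45   # assumed rate produced by the synthetic generator

real_only = pooled_estimate(P_REAL, 200, P_SYNTH, 0)
padded = pooled_estimate(P_REAL, 200, P_SYNTH, 800)

print(f"real only: estimate {real_only[0]:.3f}, standard error {real_only[1]:.3f}")
print(f"padded:    estimate {padded[0]:.3f}, standard error {padded[1]:.3f}")
```

Under these assumed numbers, padding the sample shrinks the standard error while moving the estimate away from the human rate, so the result looks more precise while being further from the truth. How large the gap is in practice depends on how far the synthetic distribution sits from the real one, which is exactly what cannot be checked without real respondents.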
The author contends that substituting synthetic respondents for real human data will inevitably produce worse insights, worse business decisions, and ultimately a degraded service to clients. While acknowledging that LLMs can add value to market research projects in other ways, the piece makes a strong case that using them to artificially inflate sample sizes represents a problematic shortcut with serious methodological consequences.
- Market researchers must distinguish legitimate AI applications in research from the use of LLMs as a shortcut around sample-size and budget constraints



