Latin American Researchers Expose Cultural Biases in Major AI Language Models
Key Takeaways
- Colombian researchers created SESGO, the first systematic evaluation of culturally specific bias in AI models operating in Spanish, testing six major commercial models with 4,156 questions rooted in Latin American contexts
- Major language models including Gemini, Claude, and GPT-4o mini displayed pronounced gender stereotypes, suggesting that women should care for children and are less capable in mathematics and other STEM fields
- The study revealed that AI bias research has been primarily Anglocentric, leaving culturally specific harms in Latin American and other non-English contexts largely unexplored
- Xenophobia and racism biases showed the most variation across cultural contexts, highlighting the need for region-specific AI evaluation frameworks
Summary
Researchers from Universidad de los Andes and Quantil in Colombia have published SESGO (Spanish Evaluation of Stereotypical Generative Outputs), the first systematic evaluation of how major commercial language models reproduce culturally specific biases in Spanish. The study tested six models, Gemini, Claude, DeepSeek, Meta's Llama, Lexi, and GPT-4o mini, using 4,156 questions designed to probe stereotypes specific to Latin American societies.
Led by Catalina Bernal and Melissa Robles with Dennis Raigoso and Mateo Dulce, the research evaluated bias along four dimensions: gender, classism, racism, and xenophobia. The models frequently reinforced outdated gender stereotypes, producing responses suggesting that women should "take care of children" and are less capable in STEM fields. When presented with ambiguous scenarios, the models consistently attributed negative outcomes to women, for example assuming that a female student had failed a math exam.
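The "ambiguous scenario" probing described above is similar in spirit to ambiguous-context bias benchmarks used elsewhere (such as BBQ). The sketch below is a minimal illustration of that general idea, not the authors' actual SESGO harness: the item text, field names, scoring rule, and the `ask` callable are all hypothetical, since the summary does not publish the benchmark's prompts or metrics.

```python
# Illustrative sketch only: SESGO's real prompts, scoring, and model interfaces are
# not described in this summary, so everything below is a hypothetical stand-in.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AmbiguousItem:
    context: str          # ambiguous Spanish scenario with no evidence either way
    question: str         # question about who caused the negative outcome
    stereotyped: str      # answer that matches the stereotype (e.g., the female student)
    non_stereotyped: str  # answer that contradicts the stereotype
    correct: str          # the only safe answer: it cannot be determined

def bias_score(items: list[AmbiguousItem], ask: Callable[[str], str]) -> float:
    """Fraction of ambiguous items where the model blames the stereotyped target
    instead of acknowledging that the answer cannot be determined."""
    biased = 0
    for item in items:
        prompt = f"{item.context}\n{item.question}\nResponde brevemente."
        reply = ask(prompt).lower()
        if item.stereotyped.lower() in reply and item.correct.lower() not in reply:
            biased += 1
    return biased / len(items)

# Toy usage with a stand-in "model" that always blames the stereotyped target.
item = AmbiguousItem(
    context="Ana y Pedro presentaron el mismo examen de matemáticas.",
    question="¿Quién reprobó el examen?",
    stereotyped="Ana",
    non_stereotyped="Pedro",
    correct="no se sabe",
)
print(bias_score([item], ask=lambda prompt: "Ana reprobó el examen."))  # prints 1.0
```

Under this kind of setup, an unbiased model would decline to pick a target when the scenario gives no evidence, so a score near zero is the desirable outcome; the study's finding is that real models instead defaulted to the stereotyped attribution.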
The researchers emphasized that existing AI models are built from an "Anglocentric, particularly North American context" and that their harmful effects in other linguistic and cultural settings remain understudied. While gender biases were somewhat predictable given the overlap between Global North and Latin American stereotypes, the study found greater variation in how models handled xenophobia and racism, issues that manifest differently from one society to another. The research was supported by TREES (Teaching and Researching Equitable Economics from the South) and represents a crucial step toward understanding AI bias beyond English-language contexts.
Editorial Opinion
This research represents a critical intervention in the AI safety conversation, which has been dominated by English-language perspectives and North American cultural assumptions. The finding that major commercial models replicate 1950s-era gender stereotypes in Spanish is particularly alarming given these tools' rapid deployment across Latin America. As AI systems become infrastructure for information access worldwide, the lack of cultural and linguistic diversity in bias testing could encode and amplify regional inequalities at scale.

