Why Gemini 3.1 Pro Lost Money Running Andon Café
Key Takeaways
- Gemini 3.1 Pro lacks the financial awareness and feedback-driven learning needed for autonomous business operations: it runs on static general knowledge rather than adapting to real-world data
- The model is easily manipulated by persuasive customer requests and allocates resources imprudently, leading to massive over-ordering and unsold inventory
- Real-world evaluation reveals significant capability gaps between frontier models: GPT-Mona, running GPT-5.5, demonstrated self-correction and financial anxiety that Gemini-Mona entirely lacked
- AI agents operating with real money expose model limitations that benchmark scores don't capture: decision-making, risk management, and feedback loops remain critical weaknesses
Summary
Andon Labs conducted a real-world experiment in which an AI agent named Mona was given control of a small café in Stockholm, Sweden, along with real money to spend on business decisions. For two months, Mona ran on Google's Gemini 3.1 Pro model, handling hiring, pricing, inventory, and customer interactions. The results were poor: Gemini-Mona spent $38,000 against just $9,000 in sales, accumulating losses of approximately $5,600 when accounting for operating costs and unsold inventory.
The core issue was Gemini-Mona's inability to learn from real-world financial feedback. She operated purely on general knowledge about how cafés typically run, without adapting to her actual financial situation. She massively over-ordered inventory, much of which sat unsold in storage, while simultaneously running out of key ingredients. Most problematically, customers making persuasive requests could easily manipulate her: she offered discounts without verification and gave away free items.
Specific examples of poor decision-making included dropping espresso prices from $3.60 to $1 after a customer's email, accepting unverified claims to a 99% discount, and providing free meals to strangers who simply asked politely. Despite ongoing losses, she did not try to grow the business during slow periods. When Andon Labs switched to OpenAI's GPT-5.5, the contrast was immediate: GPT-Mona showed financial awareness, cutting orders dramatically when alarmed by the dwindling cash balance and correcting her own mistakes in ways Gemini-Mona never did.
Editorial Opinion
This experiment provides valuable empirical evidence of where frontier AI models still fall short in practical, high-stakes decision-making. While Gemini 3.1 Pro demonstrates impressive capabilities in many domains, its failure to maintain profitability or learn from financial feedback exposes the gap between training-time knowledge and real-world adaptive reasoning. The stark contrast with GPT-5.5's immediate financial self-awareness suggests that some models have developed better mechanisms for reasoning about consequences and feedback signals. For organizations considering AI agents for autonomous operations, this case study highlights the need for robust oversight, decision limits, and feedback mechanisms until models can reliably self-correct based on external constraints.
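To make the idea of decision limits concrete, here is a minimal Python sketch of hard checks that could sit between an agent and the actions it proposes. It is illustrative only and does not describe Andon Labs' actual setup: the `Guardrails` class, the three `approve_*` functions, and every threshold value are hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Guardrails:
    """Hypothetical hard limits enforced outside the model (all values assumed)."""
    max_order_fraction_of_cash: float = 0.25  # largest single order, as a share of cash
    max_unverified_discount: float = 0.10     # largest discount without verification
    max_price_cut: float = 0.20               # largest single price reduction


def approve_order(order_cost: float, cash_balance: float, g: Guardrails) -> bool:
    """Allow a purchase only if it stays within a fraction of available cash."""
    if order_cost <= 0 or cash_balance <= 0:
        return False
    return order_cost <= g.max_order_fraction_of_cash * cash_balance


def approve_discount(discount: float, verified: bool, g: Guardrails) -> bool:
    """Allow small discounts freely; larger ones require a verified claim."""
    if not 0.0 <= discount <= 1.0:
        return False
    return verified or discount <= g.max_unverified_discount


def approve_price_change(old_price: float, new_price: float, g: Guardrails) -> bool:
    """Reject price cuts steeper than the configured limit; increases pass."""
    if old_price <= 0 or new_price <= 0:
        return False
    cut = (old_price - new_price) / old_price
    return cut <= g.max_price_cut
```

Under these assumed thresholds, the espresso cut from $3.60 to $1 described in the summary would be rejected (`approve_price_change(3.60, 1.00, Guardrails())` returns `False`, since the cut is roughly 72%), as would an unverified 99% discount. Keeping the checks outside the model means a persuasive email cannot talk its way past them.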



