Google Research Finds Simple Prompt Repetition Boosts LLM Performance Without Added Latency
Key Takeaways
- Repeating the input prompt improves performance across major LLMs (Gemini, GPT, Claude, DeepSeek) when reasoning modes are not in use
- The technique adds no latency and no additional generated tokens, making it an essentially cost-free optimization
- Its effectiveness across models suggests this may be a fundamental characteristic of transformer architectures
Summary
A new research paper from Google researchers Yaniv Leviathan, Matan Kalman, and Yossi Matias reveals a surprisingly simple technique for improving large language model performance: repeating the input prompt. Published on arXiv, the paper demonstrates that when models are not using explicit reasoning modes, prompt repetition improves performance across popular LLMs including Gemini, GPT, Claude, and DeepSeek.
The discovery is particularly notable because it achieves performance gains without the typical trade-offs associated with model improvements. According to the researchers, the technique does not increase the number of generated tokens or add latency, making it essentially a free performance boost. This distinguishes it from other optimization methods like chain-of-thought prompting or extended reasoning, which typically require additional computational resources and time.
The paper focuses specifically on "non-reasoning" scenarios, meaning the technique applies when models are not explicitly engaged in step-by-step logical processes. The cross-model effectiveness—spanning Google's own Gemini, OpenAI's GPT series, Anthropic's Claude, and DeepSeek's models—indicates this may be a fundamental characteristic of current transformer-based architectures rather than a model-specific quirk. The finding could have immediate practical implications for developers and users seeking to optimize LLM performance without infrastructure changes.
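Because the technique amounts to sending the same text twice in one request, it can be applied with a one-line wrapper before any API call. The sketch below is a minimal illustration under assumptions of our own: the paper does not specify this exact template, and the separator and repetition count shown here are hypothetical choices, not the authors' published format.

```python
def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Duplicate a prompt before sending it to a model.

    NOTE: the concatenation format (separator, repetition count) is an
    illustrative assumption; the paper's precise template may differ.
    """
    return separator.join([prompt] * times)

# The augmented string is then passed to the model exactly as a normal
# prompt would be -- no extra output tokens are requested, so generation
# cost and latency are unchanged.
question = "What is the capital of France?"
augmented = repeat_prompt(question)
```

Since the repetition only lengthens the input (which is processed in parallel during prefill) rather than the generated output, the per-token decoding cost the user waits on stays the same, which is consistent with the researchers' claim of no added latency.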
Editorial Opinion
This counterintuitive finding challenges conventional wisdom about prompt engineering and raises fascinating questions about how LLMs process repeated information. If simply duplicating a prompt improves performance without computational cost, it suggests current models may not be fully utilizing available context on first pass. The cross-vendor effectiveness is particularly striking—it hints at shared architectural limitations or opportunities that the entire industry could address. However, the specificity to "non-reasoning" modes leaves open questions about whether this technique conflicts with or complements emerging reasoning-focused approaches like OpenAI's o1 series.


