BotBeat

Google / Alphabet
RESEARCH · 2026-02-28

Google Research Finds Simple Prompt Repetition Boosts LLM Performance Without Added Latency

Key Takeaways

  • Repeating input prompts improves performance across major LLMs (Gemini, GPT, Claude, Deepseek) when not using reasoning modes
  • The technique adds no latency or additional generated tokens, making it a cost-free optimization
  • The cross-model effectiveness suggests this may be a fundamental characteristic of transformer architectures
Source: Hacker News (https://arxiv.org/abs/2512.14982)

Summary

A new research paper from Google researchers Yaniv Leviathan, Matan Kalman, and Yossi Matias reveals a surprisingly simple technique to improve large language model performance: repeating the input prompt. Published on arXiv, the paper demonstrates that when models are not using explicit reasoning modes, prompt repetition enhances performance across popular LLMs including Gemini, GPT, Claude, and Deepseek.

The discovery is particularly notable because it achieves performance gains without the typical trade-offs associated with model improvements. According to the researchers, the technique does not increase the number of generated tokens or add latency, making it essentially a free performance boost. This distinguishes it from other optimization methods like chain-of-thought prompting or extended reasoning, which typically require additional computational resources and time.

The paper specifically focuses on "non-reasoning" scenarios, suggesting that the technique applies when models are not explicitly engaged in step-by-step logical processes. The cross-model effectiveness—spanning Google's own Gemini, OpenAI's GPT series, Anthropic's Claude, and Deepseek's models—indicates this may be a fundamental characteristic of current transformer-based architectures rather than a model-specific quirk. The finding could have immediate practical implications for developers and users seeking to optimize LLM performance without infrastructure changes.
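The summary does not specify the paper's exact repetition format, so the following is only a minimal sketch of the idea: duplicating the user prompt before sending it to a model. The separator, repetition count, and the `repeat_prompt` helper name are illustrative assumptions, not details from the paper.

```python
def repeat_prompt(prompt: str, n: int = 2, separator: str = "\n\n") -> str:
    """Return the input prompt repeated n times.

    Assumption: the paper's exact formatting (separator, count) may differ;
    this simply concatenates n copies of the prompt.
    """
    return separator.join([prompt] * n)


# The repeated string would replace the original prompt in whatever
# request the application already makes, e.g. as the user message
# content in a chat-style API call.
question = "What is the capital of France?"
repeated = repeat_prompt(question)
print(repeated)
```

Because the repetition only lengthens the input and changes nothing about generation, it adds prompt tokens but no generated tokens, which is consistent with the researchers' claim of no added latency on the output side.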

Editorial Opinion

This counterintuitive finding challenges conventional wisdom about prompt engineering and raises fascinating questions about how LLMs process repeated information. If simply duplicating a prompt improves performance without computational cost, it suggests current models may not be fully utilizing available context on first pass. The cross-vendor effectiveness is particularly striking—it hints at shared architectural limitations or opportunities that the entire industry could address. However, the specificity to "non-reasoning" modes leaves open questions about whether this technique conflicts with or complements emerging reasoning-focused approaches like OpenAI's o1 series.

Large Language Models (LLMs) · Natural Language Processing (NLP) · Machine Learning · MLOps & Infrastructure · Science & Research


© 2026 BotBeat