BotBeat

RESEARCH · Anthropic · 2026-07-03

Research: How URLs in Prompts Can Influence LLM Outputs Toward Training Data

Key Takeaways

  • URLs in prompts influence LLM outputs only for content that was included in the model's training data
  • LLM providers lack transparency about training data sources, collection methods, and knowledge cutoff dates
  • Content loaded dynamically via JavaScript is largely excluded from LLM training data due to crawler limitations
Source: Hacker News, https://aifoc.us/influencing-model-output-with-urls/

Summary

Independent researcher Paul Kinlan ran a series of experiments to test whether URLs appearing in LLM prompts influence model outputs. Testing across multiple models, he found that URLs do steer LLM behavior, but only when the URL's content is present in the model's training data. The research also examined how LLM training data is collected and found significant transparency gaps around knowledge cutoff dates and data sources.

Kinlan's investigation also found differences in how LLM providers gather training data. Anthropic's ClaudeBot and OpenAI's GPTBot do not execute JavaScript, so dynamically loaded content is unlikely to be included in training data. OpenAI's search-specific crawler, OAI-SearchBot, does execute JavaScript. The research also highlighted how much public web content, particularly JavaScript-dependent content, never reaches model training pipelines.
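To make the JavaScript point concrete, here is a minimal Python sketch of how one might check whether a phrase is readable in a page's raw HTML, which is roughly what a crawler that does not execute JavaScript would see. This is an illustration only, not Kinlan's method or any provider's actual pipeline; the helper names (`StaticTextExtractor`, `visible_without_js`) and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects text found directly in the markup, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_without_js(raw_html: str, phrase: str) -> bool:
    """Return True if `phrase` appears in the page text without running any JavaScript."""
    parser = StaticTextExtractor()
    parser.feed(raw_html)
    parser.close()
    static_text = " ".join(parser.chunks)
    return phrase.lower() in static_text.lower()


# Hypothetical page whose content is present in the served HTML.
STATIC_PAGE = (
    "<html><body><h1>Release notes</h1>"
    "<p>Version 2 adds offline mode.</p></body></html>"
)

# Hypothetical page whose content only appears after a script runs in a browser.
DYNAMIC_PAGE = (
    '<html><body><div id="app"></div>'
    '<script>document.getElementById("app").textContent = '
    '"Version 2 adds offline mode.";</script></body></html>'
)

if __name__ == "__main__":
    print(visible_without_js(STATIC_PAGE, "offline mode"))   # True
    print(visible_without_js(DYNAMIC_PAGE, "offline mode"))  # False
```

The second page serves the same sentence, but only as a string inside a script, so a crawler that reads markup without executing it would most likely find an empty `div` and nothing else.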

  • Different LLM providers employ different crawling strategies, affecting which content gets included in their models

Editorial Opinion

This research exposes a critical transparency gap in LLM development. As organizations rely on LLMs to reference external URLs, understanding whether and how those URLs influence model behavior has become essential. Kinlan's findings suggest that LLM providers need to be far more forthcoming about their training data collection methods, knowledge cutoff dates, and crawler capabilities. Users need that transparency to prompt these systems effectively and to understand their limitations.

Large Language Models (LLMs) · Machine Learning · Market Trends · Privacy & Data
