BotBeat

OPEN SOURCE · Together AI · 2026-02-25

Together AI Releases CoderForge-Preview, Largest Open Dataset for Training Coding Agents

Key Takeaways

  • CoderForge-Preview is the largest open dataset for coding agents, containing 258,000 test-verified trajectories across 51,000 tasks and 1,655 repositories
  • Fine-tuning Qwen-3 32B with the dataset improved SWE-Bench Verified performance from 23.0% to 59.4% pass@1, ranking #1 among open-data models ≤32B parameters
  • The release addresses a critical gap in open-source AI development by providing access to long-context, test-verified trajectories previously unavailable to the community
Source: Hacker News (https://www.together.ai/blog/coderforge-preview)

Summary

Together AI has released CoderForge-Preview, described as the largest open dataset of coding agent trajectories available to researchers and developers. The dataset contains 258,000 test-verified trajectories—155,000 passing and 103,000 failing—spanning 51,000 tasks across 1,655 repositories. This release addresses a critical bottleneck in the AI research community: the lack of large-scale, high-quality open training data for developing coding agents.

To demonstrate the dataset's effectiveness, Together AI fine-tuned Qwen-3 32B using CoderForge-Preview, achieving a dramatic improvement on the SWE-Bench Verified benchmark from 23.0% to 59.4% pass@1. This performance places the model at #1 among open-data models with 32 billion parameters or fewer. The company also released results for a 4B model trained on the same dataset.
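The headline figures quoted above are internally consistent, which a few lines of arithmetic can confirm (all numbers are taken from the article; nothing here is from the dataset itself):

```python
# Sanity-check of the figures quoted in the article.
passing, failing = 155_000, 103_000
total = passing + failing
assert total == 258_000  # matches the stated trajectory count

# SWE-Bench Verified pass@1 before and after fine-tuning (percent).
before, after = 23.0, 59.4
relative_gain = (after - before) / before * 100

print(f"share of passing trajectories: {passing / total:.1%}")  # → 60.1%
print(f"relative improvement: {relative_gain:.0f}%")            # → 158%
```

The 158% relative gain is simply (59.4 − 23.0) / 23.0; note that roughly 40% of the trajectories are failing runs, which the dataset deliberately includes as negative training signal.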

The release comes as proprietary AI models continue to advance while open-weight alternatives struggle with limited access to the long-context, test-verified trajectories essential for effective agent training. By making CoderForge openly available, Together AI aims to accelerate progress across the entire open-source AI community and enable researchers worldwide to build upon and improve their work. The dataset represents a significant contribution to democratizing access to high-quality training data for coding agents.


Editorial Opinion

This release marks a significant milestone in democratizing AI coding agent development. While proprietary models have benefited from vast internal datasets, the open-source community has been starved of comparable training resources. Together AI's decision to openly release 258,000 test-verified trajectories—not just successful attempts but failures too—provides invaluable learning signal that could catalyze a new generation of open coding agents. The 158% relative improvement on SWE-Bench Verified demonstrates the dataset's quality and potential impact on narrowing the gap between open and proprietary models.

Large Language Models (LLMs) · AI Agents · Machine Learning · Startups & Funding · Open Source
