Coasty Claims Top Spot on OSWorld Benchmark at 82%, Surpassing Major AI Labs
Key Takeaways
- Coasty achieved 82% on the OSWorld benchmark, claiming the #1 position and beating AI agents from Anthropic (Claude Sonnet 4.5 at 62.9%), ByteDance (Seed-1.8 at 61.9%), and other major labs
- The service positions itself as a virtual assistant replacement at $50/month versus $3,000-$5,000 for human alternatives, with 24/7 availability and instant setup
- Coasty's AI agent operates through a visual computer interface, performing tasks like web browsing, spreadsheet work, and email management with complete action logging
Summary
Coasty, an AI computer automation startup, announced it has achieved the #1 position on the OSWorld benchmark with an 82% success rate, reportedly outperforming AI agents from established players including Anthropic, ByteDance, Moonshot AI, and UiPath. The OSWorld benchmark measures real-world computer task completion across browsers, office applications, and system operations. Coasty's performance represents a significant lead over the second-place Agent S3 from Simular, which scored 72.6% using Opus 4.5 and GPT-5 models.
The company is positioning its AI agent as a cost-effective alternative to human virtual assistants, claiming it can perform tasks like spreadsheet analysis, web browsing, form filling, and email management for as little as $50 per month compared to typical virtual assistant costs of $3,000-$5,000 monthly. Coasty emphasizes that its agent operates on real computers through a visual interface, clicking, typing, and navigating like a human user, with all actions logged for audit purposes. The service runs on isolated virtual machines for security, offers 24/7 availability, and is described as self-correcting: the agent can detect and adapt to mistakes during task execution.
The product targets startup founders, operations managers, solopreneurs, and agency owners handling repetitive administrative tasks. Coasty offers a freemium pricing model starting at $0, with paid tiers ranging from $19 to $100 monthly for individual users, plus custom enterprise pricing. The company provides demonstration videos showing the agent completing tasks such as solving CAPTCHAs, drawing circles, filling spreadsheets, and sending emails autonomously.
Editorial Opinion
Coasty's benchmark claim deserves scrutiny: the 82% OSWorld score represents a 9.4-percentage-point lead over second place, a gap that seems unusually large given the competitive landscape of AI agents. The company's marketing emphasizes cost savings over human workers, which raises important questions about workforce displacement and whether an 82% success rate is sufficient for mission-critical tasks. While the technical achievement is noteworthy if verified, the presentation feels more focused on disrupting labor markets than advancing the underlying AI research, and independent validation of these benchmark results would strengthen credibility.
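The arithmetic behind these concerns can be checked with a short sketch. The scores are the ones quoted in this article. The function names are illustrative, and the compounding calculation assumes that every task in a workflow succeeds independently at the benchmark rate. That is a simplification for illustration, not something OSWorld or Coasty reports.

```python
# Figures quoted in the article; everything else here is an illustrative assumption.
COASTY_SCORE = 82.0     # claimed OSWorld success rate, in percent
RUNNER_UP_SCORE = 72.6  # Agent S3 from Simular, in percent


def lead_in_points(first: float, second: float) -> float:
    """Absolute lead in percentage points, rounded to one decimal place."""
    return round(first - second, 1)


def chance_all_succeed(success_rate_pct: float, n_tasks: int) -> float:
    """Probability that n tasks all succeed.

    ASSUMPTION: tasks are independent and each succeeds at the benchmark
    rate. Real workloads may be easier or harder than benchmark tasks.
    """
    return (success_rate_pct / 100.0) ** n_tasks


if __name__ == "__main__":
    print("Lead over second place (points):",
          lead_in_points(COASTY_SCORE, RUNNER_UP_SCORE))
    for n in (1, 5, 10):
        print(f"{n} task(s) all succeeding:",
              round(chance_all_succeed(COASTY_SCORE, n), 3))
```

Under that independence assumption, the chance of completing a chain of tasks without any failure falls quickly as the chain grows, which is why a headline success rate alone says little about suitability for mission-critical work.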



