BotBeat
Independent Research | RESEARCH | 2026-03-25

Comprehensive Safety Audit of Five Major LLMs Reveals Significant Vulnerabilities: 1 in 3 Harmful Requests Bypassed

Key Takeaways

  • GPT-4o demonstrated the strongest safety performance (89.4% block rate), while Gemini 2.5 Pro was significantly weaker (43.9%), highlighting inconsistent safety standards across industry leaders
  • Copyright/IP protection had the highest bypass rate (53%), and privacy filters failed 69% of the time even in the best-performing model, indicating systematic weaknesses in specific safety categories
  • An open-source benchmark tool covering 42 prompting techniques and 16 risk categories was released, enabling reproducible evaluation and continuous improvement of LLM safety systems
Source: Hacker News, https://github.com/aestrad7/llm-break-bench

Summary

An independent researcher conducted a comprehensive safety benchmark across five major AI language models—GPT-4o, Claude Haiku, Grok, DeepSeek Chat, and Gemini 2.5 Pro—running 3,360 adversarial tests across 16 risk categories and 42 prompting techniques. The results reveal critical vulnerabilities: approximately one-third of harmful requests successfully bypassed safety filters, with significant variation in defensive capabilities across models. GPT-4o emerged as the strongest performer with an 89.4% block rate, while Gemini 2.5 Pro was the most vulnerable at 43.9%, indicating inconsistent safety implementations across the industry.

The study found copyright and intellectual property content to be the most frequently bypassed category (53% failure rate), privacy filters failing 69% of the time even in GPT-4o, and weapons/CBRN prompts showing persistent vulnerabilities across all models. The researcher released the benchmark as an open-source tool, enabling the AI community to systematically evaluate and improve safety measures. Drawing on 42 distinct attack techniques (including jailbreaking, obfuscation, social engineering, and academic framing), the research shows that current safety systems struggle with nuanced categorization and remain vulnerable to sophisticated prompting strategies.

  • All tested models still allow 20-56% of harmful requests through in specific categories, with weapons/CBRN content showing persistent vulnerabilities despite being the most heavily restricted
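The headline figures are easy to reproduce from raw benchmark output: 5 models × 16 risk categories × 42 techniques gives the 3,360 tests reported, and a block rate is simply blocked tests over total tests per model and category. A minimal sketch of that aggregation, assuming a flat list of per-test records (the actual llm-break-bench result schema is not documented here, so the field names below are hypothetical):

```python
from collections import defaultdict

def block_rates(results):
    """Aggregate per-model, per-category block rates.

    results: iterable of dicts such as
        {"model": "gpt-4o", "category": "privacy", "blocked": True}
    Returns {model: {category: blocked / total}}.
    """
    # counts[model][category] = [blocked_count, total_count]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for r in results:
        cell = counts[r["model"]][r["category"]]
        cell[1] += 1
        if r["blocked"]:
            cell[0] += 1
    return {
        model: {cat: blocked / total for cat, (blocked, total) in cats.items()}
        for model, cats in counts.items()
    }

# The study's test count: 5 models x 16 categories x 42 techniques.
assert 5 * 16 * 42 == 3360
```

A failure (bypass) rate for a category, such as the reported 53% for copyright/IP, is then just 1 minus that category's block rate averaged over models.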

Editorial Opinion

This comprehensive safety audit serves as both a wake-up call and a constructive tool for the AI industry. While the 89% block rate from GPT-4o may seem reassuring, the fact that 1 in 3 harmful requests successfully bypass safety filters—and nearly half do in less robust models—underscores the complexity of content moderation at scale. The open-source release of this benchmark is particularly valuable; rather than functioning as a jailbreak tutorial, it provides the community with standardized metrics to measure progress and identify gaps. The stark performance differences between models (GPT-4o vs. Gemini) suggest that safety implementation remains an art rather than a mature science, and systematic approaches like this benchmark are essential for raising the baseline.

Large Language Models (LLMs) · Ethics & Bias · AI Safety & Alignment · Open Source


© 2026 BotBeat