New Open-Source SAST Benchmark Suite Launches with Support for Go, Rust, Bash, PHP, and Ruby
Key Takeaways
- First public SAST benchmarks for Go, Rust, Bash, PHP, and Ruby, filling a major gap in AppSec testing infrastructure
- Introduces novel Chain Detection and Adversarial Evasion benchmarks that measure real-world attack detection beyond traditional taint flows
- Designed with rigorous methodology, including a 50/50 true-positive/true-negative balance and minimum statistical-significance thresholds, to prevent classifier gaming
Summary
A new Static Application Security Testing (SAST) benchmark suite has been released as open source, filling a critical gap in security testing infrastructure. The suite provides the first public ground truth for Go, Rust, Bash, PHP, and Ruby, languages that previously had no public SAST benchmarks. Combined with the existing OWASP benchmarks for Java and Python, the full suite comprises more than 7,700 test cases designed to measure real-world security detection capabilities.
Beyond traditional source-to-sink taint-flow testing, the suite introduces two novel benchmark areas. The Chain Detection benchmark tests whether tools can correlate multiple low-severity findings across different files into compound exploit paths, reflecting how modern attacks chain vulnerabilities together. The Adversarial Evasion benchmark tests detection of intentional concealment techniques used in real-world attacks, including invisible Unicode characters, Bidi overrides, and other visual deception methods inspired by campaigns like Glassworm and Trojan Source.
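To make the evasion category concrete, the sketch below shows the kind of check such a benchmark exercises: scanning source text for Bidi override controls and zero-width characters. This is a minimal illustration, not the suite's actual test harness, and the character sets listed are a representative subset chosen for this example.

```python
# Illustrative sketch (not the benchmark's code): flag Bidi controls and
# invisible Unicode characters of the kind Trojan Source-style attacks use.
# The character sets below are a representative subset, not an exhaustive list.
BIDI_CONTROLS = {
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}
INVISIBLES = {
    "\u200B",  # zero-width space
    "\u200C",  # zero-width non-joiner
    "\u200D",  # zero-width joiner
    "\uFEFF",  # zero-width no-break space (BOM)
}

def find_concealment(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, codepoint) for each suspicious character."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS or ch in INVISIBLES:
                findings.append((lineno, col, f"U+{ord(ch):04X}"))
    return findings
```

A tool that only tracks taint flows would pass clean over such characters; a benchmark case plants them in otherwise benign-looking code and checks whether the tool reports them at all.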
The benchmark was deliberately designed with rigorous methodology to prevent tool gaming: a 50/50 true-positive/true-negative balance, category-averaged scoring so that large categories cannot dominate the result, and a minimum of 25 test cases per category for statistical significance. Scoring uses Youden's J statistic to measure tool accuracy independent of any specific SAST engine's capabilities. The author has released the suite as open source specifically to invite community scrutiny and correction, acknowledging the inherent challenge of a solo developer creating both the exam and the grading key.
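The scoring scheme described above can be sketched in a few lines: Youden's J is sensitivity plus specificity minus one (J = TPR + TNR - 1), computed per category and then averaged without weighting, so a category with thousands of cases counts the same as one with 25. The function and field names here are assumptions for illustration, not the suite's actual API.

```python
# Hypothetical sketch of category-averaged Youden's J scoring.
def youden_j(tp: int, fn: int, tn: int, fp: int) -> float:
    """J = sensitivity + specificity - 1; ranges from -1 to 1."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return tpr + tnr - 1.0

def category_averaged_score(results: dict[str, dict[str, int]]) -> float:
    """results maps category name -> {'tp', 'fn', 'tn', 'fp'} counts.
    Unweighted mean across categories prevents large categories dominating."""
    js = [youden_j(r["tp"], r["fn"], r["tn"], r["fp"]) for r in results.values()]
    return sum(js) / len(js) if js else 0.0
```

Note why J resists gaming on a balanced suite: a tool that flags everything gets TPR = 1 but TNR = 0, scoring J = 0, exactly the same as a tool that flags nothing.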
Editorial Opinion
This benchmark represents a significant contribution to application security infrastructure, addressing a genuine gap in testing standards for languages that had no existing public benchmarks. The inclusion of adversarial evasion detection reflects a sophisticated understanding of how modern attacks have evolved beyond simple taint flows: testing for Unicode concealment and Bidi overrides shows the author understands contemporary threats. However, the author's transparent acknowledgment of being simultaneously student, exam designer, and grader reveals both intellectual honesty and a real limitation; community review and contribution will be essential to establish true ground truth at scale.