Model2Kernel: New System Detects 353 Memory Safety Bugs in CUDA Kernels Used for LLM Inference
Key Takeaways
- Model2Kernel discovered 353 previously unknown memory safety bugs in CUDA kernels used for LLM inference, showing how widespread such flaws are in widely deployed code
- The system combines model-aware dynamic analysis with CUDA-specialized symbolic execution to achieve high precision (only nine false positives), making it practical for production environments
- Memory bugs in LLM inference kernels can corrupt model weights, crash services, or enable adversarial attacks, making automated verification essential for safe deployment
Summary
Researchers have introduced Model2Kernel, a groundbreaking system designed to automatically verify memory safety in CUDA kernels used for large language model inference. The tool addresses a critical class of risks in GPU-accelerated inference systems, where memory-safety bugs in CUDA kernels can corrupt model weights, crash services, or enable adversarial attacks. These kernels, which implement core transformer operations, are particularly susceptible to bugs due to model-dependent tensor layouts, complex memory indexing, and massive thread-level parallelism.
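To make the bug class concrete, here is a minimal CPU-side sketch (hypothetical, not code from the paper) of the variable-length-input hazard the article describes: a CUDA-style launch rounds the thread count up to the block size, and if the kernel body omits the usual `tid < n` guard, the trailing threads of the last block index past the end of the tensor whenever the element count is not a multiple of the block size.

```cpp
// CPU simulation of a CUDA-style elementwise launch over a
// [seq_len x head_dim] tensor. Hypothetical example, not from the paper.
constexpr int kBlockSize = 128;

// Counts how many of the launched "threads" would access memory past the
// end of an n-element tensor. `guarded` toggles the usual `tid < n` check;
// without it, the last block's trailing threads go out of bounds whenever
// n is not a multiple of the block size.
int count_oob_accesses(int seq_len, int head_dim, bool guarded) {
    int n = seq_len * head_dim;
    // CUDA launches round the thread count up to a whole number of blocks.
    int launched = ((n + kBlockSize - 1) / kBlockSize) * kBlockSize;
    int oob = 0;
    for (int tid = 0; tid < launched; ++tid) {
        if (guarded && tid >= n) continue;  // the often-missing bounds check
        if (tid >= n) ++oob;                // would touch memory past the tensor
    }
    return oob;
}
```

For example, with seq_len = 17 and head_dim = 64, n = 1088 but 1152 threads are launched, so the unguarded version makes 64 out-of-bounds accesses while the guarded one makes none. On a real GPU such accesses can silently corrupt adjacent allocations such as model weights.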
Model2Kernel combines model-aware dynamic analysis with CUDA-specialized symbolic execution to detect memory bugs with high precision. The system first analyzes how models invoke kernels to classify arguments as either fixed by model architecture or user-controlled, then applies symbolic execution with new abstractions for dynamic tensor memory and thread identifiers. In a comprehensive evaluation across CUDA kernels from vLLM, Hugging Face, and recent LLM research, Model2Kernel discovered 353 previously unknown bugs while reporting just nine false positives.
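The argument-classification step can be illustrated with a toy check (a sketch in the spirit of the description above, not the paper's actual algorithm): arguments fixed by the model architecture, such as the head dimension, get concrete values, while user-controlled arguments, such as the sequence length, are treated symbolically as a range, and the analysis asks whether any value in that range triggers an out-of-bounds access.

```cpp
// Toy argument-classification + symbolic-range check (hypothetical sketch).
// For an unguarded elementwise kernel, an overflow occurs exactly when
// n = seq_len * head_dim is not a multiple of the block size, since the
// launch rounds the thread count up to whole blocks.
struct Range { long lo, hi; };  // symbolic user-controlled argument

// head_dim and block are model-fixed (concrete); seq_len is user-controlled
// (a range). Returns true if some feasible seq_len produces an overflow.
bool may_overflow(long head_dim, long block, Range seq_len) {
    for (long s = seq_len.lo; s <= seq_len.hi; ++s)
        if ((s * head_dim) % block != 0)
            return true;  // witness: this seq_len leaves a partial block
    return false;         // every feasible n fills whole blocks exactly
}
```

With head_dim = 64 and a block size of 128, any odd seq_len leaves a partial block, so `may_overflow(64, 128, {1, 16})` reports a potential bug while `may_overflow(64, 128, {2, 2})` does not. A real symbolic executor reasons over such constraints directly rather than enumerating values, which is what lets it handle variable-length inputs.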
This research addresses a significant gap in existing verification techniques, which either depend on unavailable hardware, incur prohibitive overhead, or cannot handle variable-length kernel inputs. The findings have direct implications for production LLM inference systems, which increasingly rely on hand-optimized CUDA kernels for performance-critical operations.
The tool also handles variable-length kernel inputs and scales to real-world LLM frameworks such as vLLM and Hugging Face.
Editorial Opinion
Model2Kernel represents a crucial advancement in AI system reliability, addressing a blind spot in current LLM deployment practices where memory safety in GPU kernels has received insufficient attention. With production inference systems increasingly relying on hand-optimized CUDA code for performance, this research highlights an urgent need for better verification tools. The discovery of 353 bugs in widely used frameworks suggests that current practices are insufficient, and automated safety verification should become standard practice before deploying LLM inference systems at scale.