CUDA-oxide: New Rust-to-CUDA Compiler Enables Pure Rust GPU Programming
Key Takeaways
- cuda-oxide enables writing GPU kernels in pure Rust without DSLs or foreign language bindings, bringing GPU programming closer to the Rust developer experience
- The compiler demonstrates a complete pipeline from Rust MIR through multiple IR stages to CUDA PTX generation in a single cargo build invocation
- Support for generic kernels and closure captures allows developers to write type-safe, composable GPU code with automatic parameter passing
Summary
cuda-oxide is an experimental open-source Rust-to-CUDA compiler that lets developers write GPU kernels in pure Rust and compile them directly to PTX. The project combines a custom rustc codegen backend with device-side abstractions for type-safe GPU programming, eliminating the need for DSLs or foreign language bindings. It supports single-source compilation, where host and device code live in the same Rust file, along with generic kernel functions, closure captures, and both synchronous and asynchronous launch APIs.
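To make the idea of a generic kernel with closure captures concrete, here is a minimal CPU-only sketch in plain Rust. It does not use cuda-oxide's API, which this summary does not show. Every name in it (`ThreadCtx`, `map_kernel`, `launch_emulated`, `scale_and_shift`) is hypothetical, and the "launch" is an ordinary nested loop standing in for a GPU grid.

```rust
/// Hypothetical stand-in for the per-thread coordinates a GPU runtime supplies.
#[derive(Clone, Copy)]
struct ThreadCtx {
    block_idx: usize,
    block_dim: usize,
    thread_idx: usize,
}

impl ThreadCtx {
    /// Global linear index: block index * block size + thread index.
    fn global_id(&self) -> usize {
        self.block_idx * self.block_dim + self.thread_idx
    }
}

/// Generic "kernel": each emulated thread transforms one element using `f`.
/// `f` may capture values from the caller's scope, like a closure capture.
fn map_kernel<T: Copy, F: Fn(T) -> T>(ctx: ThreadCtx, f: &F, input: &[T], output: &mut [T]) {
    let i = ctx.global_id();
    // Bounds guard: the grid can contain more threads than there are elements.
    if i < input.len() && i < output.len() {
        output[i] = f(input[i]);
    }
}

/// CPU emulation of a launch: runs every thread of every block sequentially.
/// This is NOT how cuda-oxide launches kernels; it only mimics the indexing.
fn launch_emulated<T: Copy, F: Fn(T) -> T>(
    grid_dim: usize,
    block_dim: usize,
    f: F,
    input: &[T],
    output: &mut [T],
) {
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let ctx = ThreadCtx { block_idx, block_dim, thread_idx };
            map_kernel(ctx, &f, input, output);
        }
    }
}

/// Computes y = a * x + b per element; `a` and `b` are captured by the closure.
fn scale_and_shift(input: &[f32], a: f32, b: f32) -> Vec<f32> {
    let mut output = vec![0.0; input.len()];
    let block_dim = 4;
    // Ceiling division so that the grid covers every element.
    let grid_dim = (input.len() + block_dim - 1) / block_dim;
    launch_emulated(grid_dim, block_dim, move |x: f32| a * x + b, input, &mut output);
    output
}
```

Calling `scale_and_shift(&[1.0, 2.0, 3.0], 2.0, 1.0)` runs one block of four emulated threads, the last of which is masked out by the bounds guard. In a real GPU compiler the captured `a` and `b` would have to be passed to the device as kernel parameters, which is presumably what the article means by automatic parameter passing.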
The implementation uses a native Rust compilation pipeline built on Pliron, an MLIR-like intermediate representation framework. Rust code is lowered through multiple IR stages (Rust MIR → Pliron IR → LLVM IR) before PTX is generated for GPU execution. The project is currently in alpha and under active development, and the authors invite community feedback and contributions. Key capabilities include device-side memory abstractions, thread indexing, shared memory management, atomic operations, and support for NVIDIA's Tensor Memory Accelerator (TMA) and warp/cluster operations.
A DeviceOperation can be executed synchronously with .sync() or asynchronously with .await, which allows flexible GPU computation patterns.


