BotBeat

NVIDIA
PRODUCT LAUNCH · 2026-03-04

Julia Programming Language Gets Tile-Based GPU Programming with cuTile.jl for NVIDIA Blackwell GPUs

Key Takeaways

  • cuTile.jl brings NVIDIA's tile-based programming model to Julia, eliminating explicit thread and memory-hierarchy management in GPU kernels
  • Matrix-multiplication kernels achieve 75% of cuBLAS performance with significantly simpler code than traditional CUDA kernels
  • The package is designed for high-performance kernel development, complementing rather than replacing existing Julia GPU solutions like CUDA.jl
Source: Hacker News — https://discourse.julialang.org/t/ann-cutile-jl-tile-based-gpu-programming-for-cuda-gpus/136011

Summary

The Julia programming community has released cuTile.jl, a new package that brings tile-based GPU programming to Julia users working with NVIDIA's Blackwell architecture GPUs. Announced by Tim Besard (maleadt) on the Julia forums, the package implements NVIDIA's Tile IR abstraction, which simplifies kernel development by eliminating the need for developers to explicitly manage threads or memory hierarchies. Instead, programmers work with tiles—blocks of data—accessed from global memory, making GPU code more intuitive and closer to high-level array operations.

The new abstraction demonstrates impressive performance. A full matrix-multiplication kernel implemented with cuTile.jl achieves 75% of cuBLAS performance while remaining significantly simpler than traditional CUDA kernel code. The package automatically leverages tensor cores when appropriate, converting Float32 operations to the TFloat32 (NVIDIA's TF32) format for hardware acceleration. The example code shows the simplification vividly: a vector-addition kernel shrinks from explicit thread indexing to a tile load, an elementwise add, and a tile store.
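The contrast can be sketched with a CPU analogue in NumPy. This is illustrative only — the tile width and function name here are assumptions, not cuTile.jl's actual API: where a traditional CUDA kernel computes one element per thread via explicit indexing, the tile model's kernel body loads whole tiles, combines them, and stores the result.

```python
# CPU analogue of a tile-based vector-add kernel.
# In a classic CUDA kernel each thread computes one element
# (i = blockIdx.x * blockDim.x + threadIdx.x); in the tile model
# the kernel body works on whole tiles instead.
# TILE is a hypothetical tile width, not a cuTile.jl parameter.
import numpy as np

TILE = 256

def vadd_tilewise(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Add two vectors one tile at a time, mimicking tile load/store."""
    assert a.shape == b.shape
    c = np.empty_like(a)
    for start in range(0, len(a), TILE):
        ta = a[start:start + TILE]       # "load" a tile from global memory
        tb = b[start:start + TILE]
        c[start:start + TILE] = ta + tb  # elementwise add, then "store"
    return c

x = np.arange(1000, dtype=np.float32)
y = np.ones(1000, dtype=np.float32)
z = vadd_tilewise(x, y)
print(np.allclose(z, x + y))  # → True
```

On a GPU, each tile would map to a block of the grid and the loop body would run in parallel; the point is that the kernel author reasons about tiles, not thread indices.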

Currently at version 0.1, cuTile.jl is under active development and includes its own Julia-to-Tile IR compiler, which means not all Julia language features are yet supported. The developers position cuTile.jl as complementary to existing solutions like CUDA.jl and KernelAbstractions.jl rather than a replacement—it's intended for implementing very high-performance kernels (matrix multiplication, FFT, etc.) where code complexity is low. The underlying MLIR dialect is open source, potentially allowing other GPU vendors like AMD to support the Tile IR abstraction in the future.

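The blocked decomposition behind such high-performance matrix-multiplication kernels can also be illustrated on the CPU. In this NumPy sketch the tile size `T` and function name are assumptions for illustration, not cuTile.jl code: each output tile accumulates products of one row of A-tiles against one column of B-tiles — the same blocking a GPU kernel uses to keep working sets in fast memory and feed tensor cores.

```python
# Blocked (tiled) matrix multiplication on the CPU, illustrating the
# decomposition a tile-based GPU matmul kernel performs.
# T is a hypothetical tile size; real kernels choose it to fit
# shared memory and tensor-core tile shapes.
import numpy as np

T = 32

def matmul_tiled(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    m, k = A.shape
    k2, n = B.shape
    assert k == k2
    C = np.zeros((m, n), dtype=A.dtype)
    for i in range(0, m, T):
        for j in range(0, n, T):
            # accumulator for one output tile C[i:i+T, j:j+T]
            acc = np.zeros((min(T, m - i), min(T, n - j)), dtype=A.dtype)
            for p in range(0, k, T):
                # "load" one tile of A and one of B, accumulate their product
                acc += A[i:i+T, p:p+T] @ B[p:p+T, j:j+T]
            C[i:i+T, j:j+T] = acc
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80)).astype(np.float32)
B = rng.standard_normal((80, 60)).astype(np.float32)
print(np.allclose(matmul_tiled(A, B), A @ B, atol=1e-3))  # → True
```

A tile-based kernel expresses exactly this structure, while the compiler and hardware handle thread mapping and the memory hierarchy — which is why the code can stay simple yet land within striking distance of cuBLAS.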

Editorial Opinion

The release of cuTile.jl is an important milestone in making GPU programming more accessible to scientific computing users, particularly in the Julia ecosystem, where performance and usability are both priorities. Achieving 75% of highly optimized cuBLAS performance with dramatically simpler code is impressive for an initial release, and suggests the tile-based abstraction hits a sweet spot between programmer productivity and hardware efficiency. However, the package currently targets only NVIDIA's latest Blackwell architecture and supports a subset of the Julia language, which may limit near-term adoption. Success will depend on how quickly the ecosystem matures and whether the approach proves compelling enough to justify vendor lock-in.

Machine Learning · MLOps & Infrastructure · AI Hardware · Product Launch · Open Source

© 2026 BotBeat