BotBeat

ZML
PRODUCT LAUNCH
2026-03-31

zml-smi: Universal GPU, TPU, and NPU Monitoring Tool Now Available

Key Takeaways

  • zml-smi is a unified monitoring tool supporting NVIDIA, AMD, Google TPU, and AWS Trainium devices, with plans for future expansion
  • The tool provides real-time performance metrics including GPU utilization, temperature, power draw, memory usage, and process-level insights across all platforms
  • zml-smi requires only device drivers and GLIBC, making it lightweight and easy to deploy without additional software dependencies
Source: Hacker News, https://zml.ai/posts/zml-smi/

Summary

ZML has released zml-smi, a universal diagnostic and monitoring tool that gives real-time insight into the performance and health of GPUs, TPUs, and NPUs across multiple hardware platforms. The tool transparently supports NVIDIA, AMD, Google TPU, and AWS Trainium devices, and ZML plans to add platforms as its own hardware compatibility grows. zml-smi combines the functionality of nvidia-smi and nvtop into a single cross-platform utility. Its only dependencies are the device drivers and GLIBC, which keeps it lightweight and easy to deploy.

The tool reports GPU utilization, temperature, power draw, memory usage, and detailed process-level metrics across all supported platforms. It also displays host-level system information such as CPU model, memory usage, and load averages, alongside device-specific metrics tailored to each hardware type. To work across these hardware ecosystems without external installations or patches, zml-smi relies on workarounds such as intercepting file system calls for AMD GPU driver compatibility.
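To make the host-level part concrete, here is a minimal Python sketch of how a monitoring tool on Linux could gather CPU model, memory usage, and load averages. It is an illustration only and is not taken from zml-smi. The function names (`parse_meminfo`, `parse_cpu_model`, `host_snapshot`) are hypothetical, and the sketch assumes the standard `/proc/meminfo` and `/proc/cpuinfo` files are available.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_cpu_model(text: str) -> Optional[str]:
    """Return the first 'model name' entry from /proc/cpuinfo-style text.

    Some architectures do not expose this key, in which case None is returned.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def host_snapshot() -> dict:
    """Collect a one-shot view of host metrics (Linux only)."""
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    total = meminfo.get("MemTotal", 0)
    available = meminfo.get("MemAvailable", 0)
    return {
        "cpu_model": parse_cpu_model(Path("/proc/cpuinfo").read_text()),
        "mem_total_kb": total,
        "mem_used_kb": total - available,
        "load_avg": os.getloadavg(),  # 1, 5, and 15 minute load averages
    }
```

Calling `host_snapshot()` on a Linux machine returns a dictionary that a display loop could refresh periodically. Device-level metrics would come from each vendor's driver interface and are outside the scope of this sketch.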

  • The tool uses sandboxing and API interception techniques to support the latest hardware models without requiring driver updates or system modifications
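The general idea behind intercepting file access can be sketched in Python: calls that ask for one path are transparently served from another. This is a conceptual analogy only, not zml-smi's mechanism, which the article does not describe in detail. A native tool would more plausibly hook at a lower level than Python's `open`. The helper `redirect_opens`, the file name `devices.ids`, and the path `/hypothetical/system/path` are all made up for illustration.

```python
import builtins
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def redirect_opens(mapping):
    """Temporarily redirect open() calls for the exact paths listed in mapping."""
    real_open = builtins.open

    def patched_open(file, *args, **kwargs):
        # Only rewrite str/PathLike paths; integer file descriptors pass through.
        if isinstance(file, (str, os.PathLike)):
            file = mapping.get(os.fspath(file), file)
        return real_open(file, *args, **kwargs)

    builtins.open = patched_open
    try:
        yield
    finally:
        # Always restore the original open(), even if the body raised.
        builtins.open = real_open


def demo() -> str:
    """Serve a file that 'lives' at a system path from a bundled copy instead."""
    with tempfile.TemporaryDirectory() as tmp:
        bundled = os.path.join(tmp, "devices.ids")  # hypothetical bundled data file
        with open(bundled, "w") as f:
            f.write("hypothetical device table\n")

        expected = "/hypothetical/system/path/devices.ids"  # path the caller asks for
        with redirect_opens({expected: bundled}):
            with open(expected) as f:
                return f.read()
```

Here `demo()` reads the bundled copy even though the code asks for the system path, which is the property that lets a tool ship its own data files without modifying the host.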

Editorial Opinion

zml-smi addresses a real pain point in AI infrastructure: monitoring is fragmented across hardware platforms. As AI workloads increasingly run on accelerators other than NVIDIA GPUs, a single tool that works across NVIDIA, AMD, Google, and AWS hardware is valuable for operations teams and researchers. The technical approach, particularly the sandboxing solution for AMD drivers, shows engineering that prioritizes ease of deployment and minimal system impact.

MLOps & Infrastructure, AI Hardware, Open Source

© 2026 BotBeat