ZML Releases Universal Diagnostic Tool for GPUs, TPUs, and NPUs Across All Major Platforms
Key Takeaways
- zml-smi provides unified monitoring across NVIDIA, AMD, Google TPU, and AWS Trainium devices with a single interface
- The tool reports comprehensive metrics, including GPU utilization, temperature, power draw, memory usage, and process-level resource consumption
- zml-smi uses creative sandboxing techniques to support the latest AMD GPU models without requiring system-level installations or library patches
Summary
ZML has launched zml-smi, a universal diagnostic and monitoring tool designed to provide real-time performance insights across multiple AI hardware platforms including NVIDIA GPUs, AMD GPUs, Google TPUs, and AWS Trainium devices. The tool combines functionality similar to nvidia-smi and nvtop, offering comprehensive hardware monitoring capabilities without requiring additional software beyond device drivers and GLIBC.
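A unified monitor of this kind is typically structured as a single vendor-agnostic metrics schema with one backend per hardware platform. The sketch below shows that shape in Python; all names (`DeviceMetrics`, `Backend`, `collect`) and the canned values are illustrative assumptions, not zml-smi's actual API, which is a compiled binary.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class DeviceMetrics:
    """One vendor-agnostic metrics record, matching the fields zml-smi displays."""
    name: str
    utilization_pct: float
    temperature_c: float
    power_draw_w: float
    memory_used_mb: int
    memory_total_mb: int

class Backend(Protocol):
    """Per-vendor backend. A real implementation would wrap NVML, AMD SMI,
    a TPU gRPC endpoint, or the Trainium APIs."""
    def probe(self) -> bool: ...
    def query(self) -> list[DeviceMetrics]: ...

class FakeNvmlBackend:
    """Stand-in for an NVML-style backend; returns canned data for this sketch."""
    def probe(self) -> bool:
        return True  # a real backend would check whether the driver library loads

    def query(self) -> list[DeviceMetrics]:
        return [DeviceMetrics("GPU 0", 42.0, 61.0, 180.5, 8192, 24576)]

def collect(backends) -> list[DeviceMetrics]:
    """Query every backend whose runtime is actually present on this host."""
    devices: list[DeviceMetrics] = []
    for backend in backends:
        if backend.probe():
            devices.extend(backend.query())
    return devices
```

The probe-then-query split is what lets one binary run on any machine: backends for absent hardware simply report unavailable and are skipped.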
zml-smi displays an extensive range of metrics including GPU utilization, temperature, power draw, memory usage, and process-level resource consumption. The tool uses platform-specific libraries—NVML for NVIDIA, AMD SMI for AMD, gRPC for Google TPU, and private APIs for AWS Trainium—to gather accurate performance data. A key innovation is its ability to recognize the latest AMD GPU models by dynamically merging GPU identification files from both Mesa and ROCm at build time, ensuring support for cutting-edge hardware like the Ryzen AI Max+ 395.
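The build-time merge of Mesa's and ROCm's GPU identification data can be pictured as a union of two lookup tables keyed by device ID. The sketch below uses a simplified `id<TAB>name` format with placeholder IDs; the real file formats and precedence rules in zml-smi may differ.

```python
def parse_ids(text: str) -> dict[str, str]:
    """Parse 'device_id<TAB>marketing name' lines into a lookup table,
    skipping blank lines and '#' comments."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dev_id, _, name = line.partition("\t")
        table[dev_id.lower()] = name.strip()
    return table

def merge_ids(mesa: dict[str, str], rocm: dict[str, str]) -> dict[str, str]:
    """Union of both tables; entries from the second table win on conflicts,
    so newer ROCm names override older Mesa ones."""
    merged = dict(mesa)
    merged.update(rocm)
    return merged

# Placeholder IDs for illustration only -- not real PCI device IDs.
mesa = parse_ids("0x1111\tExample GPU A\n0x2222\tExample GPU B")
rocm = parse_ids("# updated table\n0x2222\tExample GPU B (new name)\n0x3333\tExample GPU C")
merged = merge_ids(mesa, rocm)
```

Merging at build time means a device known to either source is recognized at runtime with no extra files installed on the host.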
The tool is available for download as a self-contained binary that works across different hardware configurations. zml-smi also provides host-level metrics such as CPU model, memory usage, and process details with full cross-platform compatibility, making it a significant step toward unified hardware monitoring in the increasingly diverse AI accelerator landscape.
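On Linux, host-level metrics like these are available through the `/proc` filesystem without any extra dependencies, which fits the tool's minimal-footprint design. The sketch below reads CPU model and memory usage that way; the function names are illustrative, and the parsing is simplified relative to whatever zml-smi actually does.

```python
def read_mem_info() -> dict[str, int]:
    """Parse /proc/meminfo into a dict of values in kB."""
    info: dict[str, int] = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts:
                info[key.strip()] = int(parts[0])  # first field is the kB value
    return info

def read_cpu_model() -> str:
    """Return the CPU model string from /proc/cpuinfo, if present."""
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.lower().startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"  # some architectures label the field differently

if __name__ == "__main__":
    mem = read_mem_info()
    total_kb = mem.get("MemTotal", 0)
    used_kb = total_kb - mem.get("MemAvailable", 0)
    print(f"CPU: {read_cpu_model()}")
    print(f"Memory: {used_kb / 1024:.0f} MiB / {total_kb / 1024:.0f} MiB")
```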
Editorial Opinion
The release of zml-smi addresses a growing pain point in the AI hardware ecosystem: the fragmentation of monitoring tools across different accelerator vendors. As organizations increasingly adopt diverse hardware accelerators, having a unified diagnostic tool that works across NVIDIA, AMD, Google, and AWS platforms significantly improves operational efficiency. The technical implementation, particularly the clever sandboxing approach for AMD GPU support, demonstrates thoughtful engineering that balances compatibility with maintainability.