ROCm

Open software stack for AI and HPC development on AMD GPUs, including programming models, tools, compilers, libraries, and runtimes.

Solution by AMD Silo AI

Overview

AMD ROCm™ Software is an open software stack for AI and high-performance computing (HPC) development on AMD GPUs. It comprises programming models, tools, compilers, libraries, and runtimes that cover GPU programming from low-level kernel development to end-user applications. ROCm is optimized for generative AI and HPC applications and is designed to make migrating existing code straightforward.

The software stack supports all AMD Instinct™ accelerator models, and certain features are compatible with select AMD Radeon™ graphics cards. It offers advanced parallelism and sharding strategies for scaling AI training on AMD Instinct™ MI350/MI300X Series GPUs, enabling near-linear scaling when training trillion-parameter models.

Key Features of ROCm 7

  • Full support for AMD Instinct™ MI350 Series GPUs
  • Distributed inference with open-source framework support
  • Enterprise-ready AI tools with orchestration and endpoint integration
  • Support for large-scale models with new data types FP6 and FP4
  • Enhanced code portability with HIP 7.0

ROCm has evolved significantly over the years, with each version introducing new capabilities and support for additional technologies. From ROCm 1.0, which demonstrated CUDA-to-HIP porting, to the latest ROCm 7.0, the platform has expanded its ecosystem to include support for PyTorch, Kubernetes, Slurm, and more.

ROCm is used by leading enterprises and research institutes. Collaborations with MosaicML, Hugging Face, and the PyTorch Foundation highlight its integration into the AI ecosystem. ROCm also supports demanding workloads across AI and HPC domains, including life sciences, computational fluid dynamics, and environmental science.

ROCm for AI Workloads

ROCm provides a suite of optimizations for AI workloads, supporting a wide range of models and frameworks like TensorFlow and PyTorch. It offers dedicated machine learning libraries such as MIOpen and MIVisionX, simplifying model development and enabling the creation of user-specific solutions.
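
As a minimal sketch of what framework support looks like in practice, the snippet below checks which GPU backend a local PyTorch installation was built for. It relies on the fact that ROCm builds of PyTorch expose AMD GPUs through the familiar torch.cuda interface and report a HIP version in torch.version.hip. The helper name describe_gpu_backend is hypothetical and chosen only for illustration; it is not part of ROCm or PyTorch.

```python
import importlib.util


def describe_gpu_backend():
    """Return a short description of the GPU backend PyTorch was built for.

    Hypothetical helper for illustration; not part of ROCm or PyTorch.
    """
    # Avoid an ImportError on machines where PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"

    import torch

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    hip_version = getattr(torch.version, "hip", None)
    cuda_version = getattr(torch.version, "cuda", None)
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    elif cuda_version:
        backend = f"CUDA {cuda_version}"
    else:
        backend = "CPU-only build"

    # On ROCm builds, AMD GPUs are reported through the torch.cuda namespace.
    if torch.cuda.is_available():
        return f"{backend}, device 0: {torch.cuda.get_device_name(0)}"
    return f"{backend}, no GPU visible"


if __name__ == "__main__":
    print(describe_gpu_backend())
```

Because the same torch.cuda calls work on both backends, existing PyTorch code that selects the "cuda" device typically runs unchanged on a ROCm build.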

Meta

Category
Scientific Data Infrastructure
Field(s)
Scientific IT & Integration
Target user(s)
IT / Systems Admin, Computational Scientist / Modeler
Tag(s)
Lab Automation & Robotics, AI