AMD MI100

Test Your Application on AMD’s MI100 GPU

AMD Instinct™ MI100 accelerators supercharge AI and HPC workloads with the purpose-built AMD CDNA architecture, which combines powerful compute, high-bandwidth memory, and I/O with expanded topology capabilities. The ROCm™ open ecosystem lets developers code once and run everywhere, paving an open path to exascale.


Accelerate Your Discoveries

  • Uniting HPC & AI to accelerate discovery
  • Leading-edge FP32 support for matrix math; your existing ML models just work
  • Scientific Discoveries with Accelerated Codes
  • All-new AMD CDNA architecture delivering a nearly 3.5x (FP32) matrix performance boost for HPC and a nearly 7x (FP16) performance boost for AI workloads vs AMD prior gen²
  • All-new CDNA architecture with Matrix Core Technology, delivering a nearly 7x (FP16) performance boost for AI workloads vs AMD prior gen²
  • FP16 Matrix Core for a nearly 7x boost running AI workloads vs AMD prior gen²
  • FP32 Matrix Core for a nearly 3.5x boost for HPC & AI workloads vs AMD prior gen²
  • Superior peak FP32 matrix performance for deep learning
  • Superior performance for full range of mixed precision operations
  • Work with large models and enhance memory bound operation performance
  • Support for newer ML operations like bfloat16
  • Single platform powering HPC and AI workloads
  • AMD Infinity Fabric™ technology provides up to a 2x performance boost over PCIe® Gen4 for data sharing across GPUs within GPU hives⁷
  • 2x more compute cores to accelerate HPC & AI workloads vs AMD prior gen⁹

Unleash Intelligence Everywhere

Powered by the all-new Matrix Core technology, the AMD Instinct™ MI100 accelerator delivers a nearly 7x uplift in FP16 performance for AI applications compared to prior-generation AMD accelerators.² MI100 also greatly expands mixed-precision capabilities and P2P GPU connectivity for AI and machine learning workloads.


GPU Specifications

  • GPU Architecture: CDNA
  • Lithography: TSMC 7nm FinFET
  • Stream Processors: 7,680
  • Compute Units: 120
  • Peak Engine Clock: 1502 MHz
  • Peak Half Precision (FP16) Performance: 184.6 TFLOPs
  • Peak Single Precision Matrix (FP32) Performance: 46.1 TFLOPs
  • Peak Single Precision (FP32) Performance: 23.1 TFLOPs
  • Peak Double Precision (FP64) Performance: 11.5 TFLOPs
  • Peak INT4 Performance: 184.6 TOPs
  • Peak INT8 Performance: 184.6 TOPs
  • Peak bfloat16 Performance: 92.3 TFLOPs
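
The peak figures in the list above appear to follow from the stream processor count and the peak engine clock. The sketch below reproduces them under two assumptions that this page does not state: each stream processor completes one fused multiply-add per clock (counted as 2 operations), and each format runs at a fixed multiple of the vector FP32 rate. The multipliers in RATE_VS_FP32 are hypothetical values inferred by working backward from the listed numbers, not taken from AMD documentation.

```python
# Values copied from the specification list above.
STREAM_PROCESSORS = 7680
COMPUTE_UNITS = 120
PEAK_CLOCK_MHZ = 1502

# Assumption: one fused multiply-add per stream processor per clock,
# counted as 2 operations.
OPS_PER_FMA = 2

# Assumption: per-format throughput relative to vector FP32, inferred
# from the listed peak figures rather than from AMD documentation.
RATE_VS_FP32 = {
    "FP64": 0.5,
    "FP32": 1,
    "FP32 matrix": 2,
    "bfloat16": 4,
    "FP16": 8,
    "INT8": 8,
    "INT4": 8,
}


def peak_tera_ops(rate_vs_fp32):
    """Return peak tera-operations per second for a given rate multiplier."""
    ops_per_second = (
        STREAM_PROCESSORS * OPS_PER_FMA * PEAK_CLOCK_MHZ * 1e6 * rate_vs_fp32
    )
    return ops_per_second / 1e12


if __name__ == "__main__":
    # Stream processors per compute unit implied by the two listed counts.
    print("Stream processors per CU:", STREAM_PROCESSORS // COMPUTE_UNITS)
    for name, rate in RATE_VS_FP32.items():
        print(f"{name}: {peak_tera_ops(rate):.1f}")
```

Under these assumptions the results round to the listed 23.1 (FP32), 11.5 (FP64), 46.1 (FP32 matrix), 92.3 (bfloat16), and 184.6 (FP16, INT8, INT4). These are theoretical peaks; sustained application throughput depends on the workload.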
