AMD MI100 and MI200
Test Your Application on AMD’s Instinct MI-Series GPUs
AMD Instinct™ MI100 and MI200 accelerators supercharge AI and HPC workloads with the purpose-built AMD CDNA architecture, delivering powerful compute, high-bandwidth memory, and I/O with expanded topology capabilities. The ROCm™ open software ecosystem gives developers the choice to code once and deploy everywhere, paving an open path to exascale.
AMD’s MI200-Series GPUs
With industry-first multi-chip GPU modules and 3rd Gen AMD Infinity architecture, AMD delivers performance, efficiency, and overall system throughput for HPC and AI using AMD EPYC™ CPUs and AMD Instinct™ MI200 series accelerators. Highlights include:
- AMD Instinct™ MI200 series accelerators, powered by the 2nd Gen AMD CDNA™ architecture, are built on an innovative multi-chip design to maximize throughput and power efficiency for the most demanding HPC and AI workloads.
- With AMD CDNA™ 2, the MI250 adds new Matrix Cores delivering up to 7.8x the peak theoretical FP64 performance of AMD's previous-generation GPUs, and offers the industry's best aggregate peak theoretical memory bandwidth at 3.2 terabytes per second.
- 3rd Gen AMD Infinity Fabric™ technology enables direct CPU to GPU connectivity extending cache coherency and allowing a quick and simple on-ramp for CPU codes to tap the power of accelerators.
- AMD Instinct™ MI250 accelerators, with advanced GPU peer-to-peer I/O connectivity through eight AMD Infinity Fabric™ links, deliver up to 800 GB/s of total aggregate theoretical bandwidth.
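The 800 GB/s aggregate figure can be sanity-checked with simple arithmetic. This sketch assumes each 3rd Gen Infinity Fabric link contributes roughly 100 GB/s of peak theoretical bandwidth; that per-link figure is an assumption, not stated on this page.

```python
# Back-of-the-envelope check of the MI250 aggregate peer-to-peer I/O figure.
# Assumption: each 3rd Gen Infinity Fabric link provides ~100 GB/s of peak
# theoretical bandwidth (not stated on this page).
LINKS = 8                # Infinity Fabric links per MI250
GB_S_PER_LINK = 100      # assumed per-link peak bandwidth

aggregate_gb_s = LINKS * GB_S_PER_LINK
print(f"Aggregate theoretical bandwidth: {aggregate_gb_s} GB/s")  # 800 GB/s
```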
AMD’s MI100-Series GPUs
- Uniting HPC & AI to accelerate discovery
- Leading-edge FP32 support for matrix math; your existing ML models just work
- Scientific Discoveries with Accelerated Codes
- All-new AMD CDNA architecture with Matrix Core Technology, delivering a nearly 3.5x (FP32) matrix performance boost for HPC and a nearly 7x (FP16) performance boost for AI workloads vs. AMD's prior generation
- Superior peak FP32 matrix performance for deep learning
- Superior performance for full range of mixed precision operations
- Work with large models and enhance memory bound operation performance
- Support for newer ML operations like bfloat16
- Single platform powering HPC and AI workloads
- AMD Infinity Fabric™ technology provides up to a 2x performance boost over PCIe® Gen4 for data sharing across GPUs within GPU hives
- 2x more compute cores to accelerate HPC & AI workloads over AMD's prior generation
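The "nearly 7x (FP16)" and "nearly 3.5x (FP32 matrix)" claims above can be reproduced with a quick calculation. The MI100 figures match this page's spec list; the prior-generation (MI50) peak figures used below are assumptions drawn from public spec sheets, not from this page.

```python
# Rough check of the generational uplift claims for the MI100.
mi100_fp16_matrix = 184.6  # TFLOPs, MI100 peak FP16 (matrix)
mi100_fp32_matrix = 46.1   # TFLOPs, MI100 peak FP32 matrix
mi50_fp16 = 26.5           # TFLOPs, assumed prior-gen (MI50) peak FP16
mi50_fp32 = 13.4           # TFLOPs, assumed prior-gen (MI50) peak FP32

print(f"FP16 uplift:        {mi100_fp16_matrix / mi50_fp16:.1f}x")  # ~7.0x
print(f"FP32 matrix uplift: {mi100_fp32_matrix / mi50_fp32:.1f}x")  # ~3.4x
```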
Unleash Intelligence Everywhere
Powered by the all-new Matrix Core technology, the AMD Instinct™ MI100 accelerator delivers nearly a 7x uplift in FP16 performance for AI applications compared to prior-generation AMD accelerators. The MI100 also greatly expands mixed-precision capabilities and peer-to-peer (P2P) GPU connectivity for AI and machine learning workloads.
- GPU Architecture: CDNA
- Lithography: TSMC 7nm FinFET
- Stream Processors: 7,680
- Compute Units: 120
- Peak Engine Clock: 1502 MHz
- Peak Half Precision (FP16) Performance: 184.6 TFLOPs
- Peak Single Precision Matrix (FP32) Performance: 46.1 TFLOPs
- Peak Single Precision (FP32) Performance: 23.1 TFLOPs
- Peak Double Precision (FP64) Performance: 11.5 TFLOPs
- Peak INT4 Performance: 184.6 TOPs
- Peak INT8 Performance: 184.6 TOPs
- Peak bfloat16: 92.3 TFLOPs
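The peak figures above are internally consistent and can be derived from the compute-unit count and clock. This sketch assumes 64 stream processors per CDNA compute unit, an FMA counted as 2 FLOPs, and matrix/precision multipliers (2x FP32 matrix, 4x bfloat16, 8x FP16) inferred from the quoted numbers rather than stated on this page.

```python
# Derive the MI100 peak throughput figures from CUs and clock.
# 120 CUs x 64 stream processors = 7,680 SPs, matching the spec above.
compute_units = 120
sp_per_cu = 64         # assumed SPs per CDNA compute unit
clock_ghz = 1.502      # peak engine clock

# Peak FP32 vector rate: SPs x 2 FLOPs per FMA x clock, in TFLOPs.
fp32_vector = compute_units * sp_per_cu * 2 * clock_ghz / 1000
print(f"FP32 vector: {fp32_vector:.1f} TFLOPs")      # ~23.1
print(f"FP64:        {fp32_vector / 2:.1f} TFLOPs")  # ~11.5
print(f"FP32 matrix: {fp32_vector * 2:.1f} TFLOPs")  # ~46.1
print(f"bfloat16:    {fp32_vector * 4:.1f} TFLOPs")  # ~92.3
print(f"FP16 matrix: {fp32_vector * 8:.1f} TFLOPs")  # ~184.6
```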