Up to 60 percent faster than its NVIDIA rival! Here is the AMD MI300X

AMD has officially introduced the long-awaited MI300X. Built on TSMC’s advanced process nodes, the AMD MI300X GPU is claimed to deliver up to 60 percent higher performance than NVIDIA’s H100. Here are the details…

What does the AMD MI300X offer?

AMD states that the MI300X is on par with the H100 in training performance and outperforms it in inference workloads. According to the company, the MI300X offers the following advantages over the H100:

  • 2.4x higher memory capacity
  • 1.6x higher memory bandwidth
  • 1.3x FP8 TFLOPS
  • 1.3x FP16 TFLOPS
  • Up to 20 percent faster on Llama 2 70B in a head-to-head comparison
  • Up to 20 percent faster on FlashAttention 2 in a head-to-head comparison
  • Up to 40 percent faster on Llama 2 70B in an 8v8 server comparison
  • Up to 60 percent faster on FlashAttention 2 in an 8v8 server comparison
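
AMD’s multipliers are relative figures, so they can be inverted to see what baseline they imply. The following is a minimal sketch, assuming the 192 GB capacity and 5.3 TB/s bandwidth quoted elsewhere in this article for the MI300X. The function name is illustrative, and because the multipliers are rounded, the results are only approximate.

```python
def implied_baseline(new_value: float, multiplier: float) -> float:
    """Return the baseline implied by 'new_value is `multiplier` times the baseline'."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return new_value / multiplier


# MI300X figures quoted in this article; the multipliers are AMD's rounded claims.
h100_capacity_gb = implied_baseline(192, 2.4)    # memory capacity in GB
h100_bandwidth_tbs = implied_baseline(5.3, 1.6)  # memory bandwidth in TB/s

print(f"Implied H100 memory capacity: {h100_capacity_gb:.0f} GB")
print(f"Implied H100 memory bandwidth: {h100_bandwidth_tbs:.2f} TB/s")
```

Under these assumptions the claims imply a rival with about 80 GB of memory and roughly 3.3 TB/s of memory bandwidth.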

The software behind the MI300X, ROCm 6.0, adds new features that support a variety of AI workloads and improves performance by supporting the latest data formats. The MI300X will intensify competition with NVIDIA’s Hopper and Intel’s Gaudi AI accelerators. Based on the CDNA 3 architecture, the MI300X has a total of 153 billion transistors. Its memory capacity is 192 GB of HBM3, 50 percent more than its predecessor, the MI250X.
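
The 50 percent figure follows directly from the two capacities. Here is a minimal sketch of that arithmetic, using the MI300X and MI250X figures from the specification table in this article. The dictionary layout and function names are illustrative.

```python
# Figures taken from this article's specification table.
SPECS = {
    "MI300X": {"vram_gb": 192, "bandwidth_tbs": 5.3, "tdp_w": 750},
    "MI250X": {"vram_gb": 128, "bandwidth_tbs": 3.2, "tdp_w": 560},
}


def percent_increase(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


def generational_uplift(new_gpu: str, old_gpu: str) -> dict[str, float]:
    """Percent increase for every spec the two GPUs share."""
    new, old = SPECS[new_gpu], SPECS[old_gpu]
    return {key: percent_increase(new[key], old[key]) for key in new if key in old}


uplift = generational_uplift("MI300X", "MI250X")
for spec, pct in uplift.items():
    print(f"{spec}: +{pct:.1f}%")
```

With the table’s figures this gives a 50 percent capacity increase, matching the claim above, alongside roughly 66 percent more memory bandwidth and about 34 percent higher TDP.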

The AMD MI300X has a TDP of 750W. While showcasing systems that support the chip, AMD also presented a configuration with eight MI300X GPU accelerators and two AMD EPYC 9004 CPUs.
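
To put the eight-GPU configuration in perspective, the per-GPU figures can be summed across the platform. This is a rough sketch, assuming simple linear scaling of the per-GPU numbers quoted in this article. It ignores the two EPYC CPUs, interconnect overhead and the rest of the system, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    vram_gb: int          # memory capacity per GPU
    bandwidth_tbs: float  # memory bandwidth per GPU
    tdp_w: int            # maximum TDP per GPU


def platform_totals(gpu: Accelerator, count: int) -> dict[str, float]:
    """Sum per-GPU figures across `count` identical accelerators."""
    return {
        "vram_gb": gpu.vram_gb * count,
        "bandwidth_tbs": gpu.bandwidth_tbs * count,
        "gpu_power_w": gpu.tdp_w * count,
    }


mi300x = Accelerator("MI300X", vram_gb=192, bandwidth_tbs=5.3, tdp_w=750)
totals = platform_totals(mi300x, count=8)
print(totals)
```

On these assumptions, eight MI300X accelerators add up to 1,536 GB of HBM3 and about 6 kW of GPU power alone.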

Finally, the companies that have announced support for AMD’s Instinct MI300 AI chips include big names such as Oracle, Dell, Meta and OpenAI. Because AMD aims to be not just an alternative but a leader in artificial intelligence, it is creating serious competition for rivals such as NVIDIA and Intel.

| GPU | AMD Instinct MI400 | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CPU Architecture | Zen 5 (Exascale APU) | None | Zen 4 (Exascale APU) | None | None | None | None | None | None | None | None | None |
| GPU Architecture | CDNA 4 | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Chiplets | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 per die | 2 (MCM), 1 per die | 2 (MCM), 1 per die | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6,656 | 7,680 | 4,096 | 3,840 | 4,096 | 4,096 | 2,304 |
| GPU Clock Speed | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| INT8 Compute | TBD | 2614 TOPS | 1961 TOPS | 383 TOPS | 362 TOPS | 181 TOPS | 92.3 TOPS | None | None | None | None | None |
| FP16 Compute | TBD | 1.3 PFLOPS | 980.6 TFLOPS | 383 TFLOPS | 362 TFLOPS | 181 TFLOPS | 185 TFLOPS | 29.5 TFLOPS | 26.5 TFLOPS | 24.6 TFLOPS | 8.2 TFLOPS | 5.7 TFLOPS |
| FP32 Compute | TBD | 163.4 TFLOPS | 122.6 TFLOPS | 95.7 TFLOPS | 90.5 TFLOPS | 45.3 TFLOPS | 23.1 TFLOPS | 14.7 TFLOPS | 13.3 TFLOPS | 12.3 TFLOPS | 8.2 TFLOPS | 5.7 TFLOPS |
| FP64 Compute | TBD | 81.7 TFLOPS | 61.3 TFLOPS | 47.9 TFLOPS | 45.3 TFLOPS | 22.6 TFLOPS | 11.5 TFLOPS | 7.4 TFLOPS | 6.6 TFLOPS | 768 GFLOPS | 512 GFLOPS | 384 GFLOPS |
| VRAM | TBD | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Infinity Cache | TBD | 256 MB | 256 MB | None | None | None | None | None | None | None | None | None |
| Memory Clock | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Cooling | TBD | Passive | Passive | Passive | Passive | Passive | Passive | Passive | Passive | Passive | Passive | Passive |
| TDP (Max.) | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |
