AMD has traditionally had very competitive FLOPS with their shaders. The issue is that their software stack, for lack of a better word, is shit.
Specific customers, like national labs or research institutions, can afford to pay a bunch of poor bastards to develop some of the compute kernels using the shitty tools. At the end of the day, most of their expenses are electricity and hardware, and salaries aren't the critical cost for some of these projects. I.e. grad students are cheap!
However, when it comes to industry, things are a bit different. First off, nobody is going to take a risk on a platform with little momentum behind it. They also need access to a talent pool that can develop the applications and get them up and running as soon as possible. Under those scenarios, salaries (i.e. the people developing the tools) tend to be almost as important a consideration as the HW. So you go with the vendor that gives you the biggest bang for your buck in terms of performance and time to market. And that is where CUDA wins hands down (the sketch after this comment shows how small the kernel-language difference actually is; the gap is everything around it).
At this point AMD is just too far behind, at least to get significant traction in industry.
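To put some code behind the software-stack point: below is a minimal sketch of a CUDA vector add, with comments noting the HIP equivalents. It's illustrative only (a generic example, not any particular production kernel), but it shows that the kernel language itself ports almost mechanically; the moat is the tooling, libraries, and talent pool around it.

```cuda
// Minimal CUDA vector add. AMD's HIP mirrors this API almost one-for-one
// (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, same __global__ and
// <<<...>>> kernel syntax), so porting the kernel itself is mostly a rename.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);                       // HIP: hipMalloc
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);    // HIP: hipMemcpy
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);     // same launch syntax in HIP
    cudaDeviceSynchronize();                              // HIP: hipDeviceSynchronize

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f (expect 3.0)\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);             // HIP: hipFree
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

The pain being described above lives around code like this, not in it: debuggers, profilers, libraries, and the pool of people who already know the tools.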
Apparently AMD significantly outperforms Nvidia in the specific calculations used in nuclear weapons simulation software.
Most people don't realize this is the original justification for the bans.
AMD is better at FP32 and FP64.
Around 2017-ish, Nvidia and AMD focused on different things with their data centre cards.
AMD went all in on compute, with FP32 and FP64.
Nvidia went all in on AI, with Tensor Cores and FP16 performance.
AMD got faster than Nvidia in some tasks, but Nvidia's bet on AI is the clear winner.
Not FP32: the MI300 is listed at 48 TFLOPS, the H100 at 60 TFLOPS.
https://www.topcpu.net/en/cpu/radeon-instinct-mi300
https://www.nvidia.com/en-us/data-center/h100/
AMD still has a clear lead over Nvidia at FP64, and Nvidia in turn has the lead at FP16.
Nobody knows the actual FLOPS of the MI300.
The MI250X had 95.7 TFLOPS of FP32 thanks to the matrix cores (see the back-of-the-envelope math below).
https://www.amd.com/en/products/server-accelerators/instinct-mi250x
That's more than the H100, even.
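The spread between the figures quoted in this thread mostly comes down to which line of the spec sheet you read: peak vector FP32 (one FMA per shader per clock) versus peak matrix FP32 from the matrix cores, which doubles it on the MI250X. A back-of-the-envelope sketch, using AMD's published MI250X figures (14,080 stream processors, ~1.7 GHz peak clock) as the assumptions:

```cuda
// Back-of-the-envelope peak FLOPS for the MI250X (host-only code).
// The inputs are AMD's published spec-sheet numbers; the outputs are
// theoretical peaks, not measured throughput.
#include <cstdio>

int main() {
    const double shaders   = 14080;   // 220 CUs x 64 stream processors
    const double clock_hz  = 1.7e9;   // ~1700 MHz peak engine clock
    const double fma_flops = 2.0;     // one fused multiply-add = 2 FLOPs

    // Plain vector FP32: every shader retires one FMA per clock.
    const double vector_fp32 = shaders * clock_hz * fma_flops;  // ~47.9 TFLOPS

    // Matrix FP32: the matrix cores double the per-clock FP32 rate,
    // which is where the 95.7 TFLOPS headline number comes from.
    const double matrix_fp32 = vector_fp32 * 2.0;               // ~95.7 TFLOPS

    printf("vector FP32: %.1f TFLOPS\n", vector_fp32 / 1e12);
    printf("matrix FP32: %.1f TFLOPS\n", matrix_fp32 / 1e12);
    return 0;
}
```

Same caveat for the H100: the ~60 TFLOPS figure quoted above is its vector FP32 number, while its headline numbers come from the Tensor Cores at other precisions, so the comparison is only meaningful like-for-like.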