• 0 Posts
  • 2 Comments
Joined 10 months ago
Cake day: November 15th, 2023

    • AMD primitive shader issues on VEGA

    • AMD had bugs with dynamic parallelism to the point that it disabled the feature later, despite it being required for the raytracing extension to work in the first place.

    • AMD’s variable rate shading was supposed to be on VEGA, didn’t happen, then was supposed to be on the 5000 series, which also didn’t happen (might be wrong about the 5000 series, but that’s what I remember; afaik AMD tried to pass off dynamic resolution as a type of VRS as a consolation, which it isn’t).

    • AMD only supporting hardware ROV in their Windows DX12 drivers, despite it being implemented by the Mesa team for Vulkan.


  • I’m an expert in this area; I won’t reveal more than that. My understanding, though I could be wrong, is that AMD’s biggest issue with raytracing is that they don’t do workload rescheduling, which Nvidia reports gave a 25% performance uplift on its own, and which Intel has had from the get-go.

    Basically, RT cores determine where a ray hit and which material shader to use, but they don’t actually execute material shaders; they just figure out which one to call for that bounce. The result then has to be fed back to normal compute cores, but a group of compute cores needs to share the same instruction to execute it in parallel; otherwise, each instruction to be executed in the “subgroup” has to be executed serially (in sequence). So what Nvidia and Intel do is reorder the work first, before handing it off to compute subgroups, to increase performance.
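
    To make the divergence point concrete, here is a rough CUDA sketch (my own toy example, assuming nothing about how any vendor actually implements this): a shading kernel that branches on a material ID, plus a host-side sort of hits by material standing in for the hardware/driver reordering described above. All struct, kernel, and shader names are invented for illustration.

```cuda
// Toy illustration of why shading rays is slow when a warp mixes materials,
// and how sorting/binning hits by material ID before dispatch keeps warps
// coherent. Not any vendor's actual implementation.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Hit {
    int   materialId;   // which "material shader" the RT unit said to run
    float t;            // hypothetical payload: hit distance
};

// Stand-ins for material shaders. In a real renderer these would be
// arbitrary, much heavier chunks of code.
__device__ float shadeMetal(float t)   { return 0.9f * t; }
__device__ float shadeGlass(float t)   { return 0.5f * t + 0.1f; }
__device__ float shadeDiffuse(float t) { return 0.2f * t + 0.3f; }

__global__ void shadeHits(const Hit* hits, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Hit h = hits[i];
    // If the 32 threads of a warp hold different materialIds, the branches
    // below run one after another (divergence). If the hits were sorted so
    // a warp only ever sees one material, only one branch runs per warp.
    switch (h.materialId) {
        case 0:  out[i] = shadeMetal(h.t);   break;
        case 1:  out[i] = shadeGlass(h.t);   break;
        default: out[i] = shadeDiffuse(h.t); break;
    }
}

int main() {
    const int n = 1 << 20;
    std::vector<Hit> hits(n);
    for (int i = 0; i < n; ++i)
        hits[i] = Hit{ rand() % 3, (float)i / n };   // mixed materials

    // The "reordering" step: group hits by material before handing them to
    // warps. Vendors do this in hardware/driver, not with a host-side
    // std::sort, but the effect on warp coherence is the same idea.
    std::sort(hits.begin(), hits.end(),
              [](const Hit& a, const Hit& b) { return a.materialId < b.materialId; });

    Hit*   dHits; float* dOut;
    cudaMalloc(&dHits, n * sizeof(Hit));
    cudaMalloc(&dOut,  n * sizeof(float));
    cudaMemcpy(dHits, hits.data(), n * sizeof(Hit), cudaMemcpyHostToDevice);

    shadeHits<<<(n + 255) / 256, 256>>>(dHits, dOut, n);
    cudaDeviceSynchronize();
    printf("shaded %d hits\n", n);

    cudaFree(dHits); cudaFree(dOut);
    return 0;
}
```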

    I’m not sure why AMD didn’t bother with this, but they have in recent history had hardware/driver bugs that caused them to scrap entire features on their GPUs.

    Now the upscaling and AI tech thing is a different issue. While AMD isn’t doing well power-efficiency-wise right now anyway, adding tensor cores, the primary driver of Nvidia’s ML capabilities, means sacrifices to power efficiency and die space. What I believe AMD wants to do is focus on generalized fp16 performance. This can actually be useful in non-ML workloads, like HDR and other generalized low-precision applications, or with sparse neural networks (where tensor cores aren’t as useful; IIRC tensor cores can’t be used at the same time as CUDA cores, whereas at least on Nvidia, fp16 and fp32 can execute at the same time within the same CUDA core/warp/subgroup).
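
    As a purely illustrative aside (not AMD’s or Nvidia’s actual approach), here is a minimal CUDA sketch of what generalized packed fp16 looks like on ordinary compute cores: each __hfma2 instruction performs two fp16 fused multiply-adds, which is useful for things like HDR tone adjustment without touching tensor cores. The buffer size, kernel names, and exposure-adjust use case are made up.

```cuda
// Toy illustration of "generalized fp16": packed half2 math on regular
// compute/CUDA cores, no tensor cores involved. Requires sm_53 or newer.
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// Fill the buffer with some fp16 test data (done on device so the host
// never needs to touch half-precision types directly).
__global__ void fill(__half2* buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = __floats2half2_rn(0.25f, 0.75f);
}

// out = in * scale + bias; each __hfma2 does two fp16 FMAs at once. This
// kind of double-rate low-precision math is handy for HDR tone mapping,
// image filters, etc., not just ML inference.
__global__ void scaleBias(const __half2* in, __half2* out,
                          float scale, float bias, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 s = __float2half2_rn(scale);   // broadcast scalar to both lanes
        __half2 b = __float2half2_rn(bias);
        out[i] = __hfma2(in[i], s, b);         // 2x fp16 fused multiply-add
    }
}

int main() {
    const int n = 1 << 20;                     // n half2 pairs = 2n fp16 values
    __half2 *dIn, *dOut;
    cudaMalloc(&dIn,  n * sizeof(__half2));
    cudaMalloc(&dOut, n * sizeof(__half2));

    int blocks = (n + 255) / 256;
    fill<<<blocks, 256>>>(dIn, n);
    scaleBias<<<blocks, 256>>>(dIn, dOut, 1.5f, 0.1f, n);  // crude exposure adjust
    cudaDeviceSynchronize();
    printf("processed %d fp16 pairs\n", n);

    cudaFree(dIn); cudaFree(dOut);
    return 0;
}
```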

    We can see power issues on the low end especially: Jetson Orins (Ampere) barely beat, or don’t beat, Jetson TX2s (8-year-old hardware, Pascal) at the same power draw, and more than doubled the “standard performance” power draw.

    In addition to power draw, and tensor cores being dead weight for non-ML work, fully dedicated ASICs are the future for AI, not ML acceleration duct-taped to GPUs, which can already accelerate it without specialized hardware. See Microsoft news, Google, Amazon, Apple, and even AMD looking to put ML acceleration on CPUs instead as a side thing (like integrated graphics).

    AMD probably doesn’t want to go down that route, since that kind of acceleration is inevitably going to stop being paired with GPUs in the future.

    Finally, DLSS 2.0-quality upscaling should now be possible at acceptable speeds using AMD’s fp16 capabilities. GPUs are so fast now that the fixed cost of DLSS-style upscaling is small enough to be carried out by compute cores. AMD’s solution thus far has been pretty lacking given their own capabilities. Getting the data for this is very capital-intensive, and it’s likely AMD still doesn’t want to spend the effort to make a better version of FSR n.0, despite it essentially being a software problem on the 7000 series.