I am currently trying to understand how GPUs work.
I saw that the Apple M2 Max has, for example, 30 GPU cores. That surprised me, because I had heard that GPUs have hundreds of cores. So I did a bit of research and found my answer:
"on Apple silicon each core is made up of 16 execution units, each of which has 8 distinct compute units (ALUs)"
That makes more sense now. 30 cores is actually 30 × 16 × 8 = 3840 ALUs.
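The arithmetic can be checked with a quick sketch (the per-core figures come from the quoted description above; Apple does not publish a detailed architecture manual, so treat them as approximate):

```python
# Per-core counts as quoted above for Apple silicon (approximate;
# Apple does not document the hierarchy in detail).
execution_units_per_core = 16
alus_per_execution_unit = 8
cores = 30  # Apple M2 Max

alus_per_core = execution_units_per_core * alus_per_execution_unit
total_alus = cores * alus_per_core

print(alus_per_core)  # 128 ALUs per core
print(total_alus)     # 3840 ALUs total
```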
But why partition them into "cores" like this, then? I heard that GPUs, unlike CPUs, can have all their units working on the SAME task.
So why split into groups of 16 and then 8, rather than one flat grid? I don't get it.
Does this have practical implications, or is it just marketing?
All GPUs are organized in multi-level hierarchies. Nvidia, for example, has GPCs that contain TPCs, which contain SMs (probably the closest thing to a "core"), which are in turn split into partitions containing the actual ALUs. There is also quite a lot of other hardware spread across these levels: caches, fixed-function units for blending and texture filtering, and so on.
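To make the multiplication concrete, here is a sketch using the full AD102 (Ada) configuration from the Nvidia whitepaper linked below; the per-level counts are from that document, and the headline "CUDA core" number is just their product:

```python
# Nvidia-style hierarchy for a full AD102 (Ada) die, per the
# Ada architecture whitepaper. Each level multiplies the total.
hierarchy = {
    "GPCs per GPU": 12,
    "TPCs per GPC": 6,
    "SMs per TPC": 2,              # 12 * 6 * 2 = 144 SMs
    "partitions per SM": 4,
    "FP32 ALUs per partition": 32,
}

total = 1
for level, count in hierarchy.items():
    total *= count

print(total)  # 18432 FP32 "CUDA cores" on a full AD102
```

The same kind of product underlies every vendor's headline core count; only the names and per-level counts differ.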
You can find more details in documents like these:
https://images.nvidia.com/aem-dam/Solutions/Data-Center/l4/nvidia-ada-gpu-architecture-whitepaper-v2.1.pdf
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna3-shader-instruction-set-architecture-feb-2023_0.pdf
https://cdrdv2-public.intel.com/758302/introduction-to-the-xe-hpg-architecture-white-paper.pdf