without scaling DRAM bandwidth anywhere near as much, only partially compensating for that with a much bigger L2.
For the 5090, on the other hand, we might also have a clock increase in play (another 1.15x?), plus a proportional 1:1 (unlike Ampere -> Ada) DRAM bandwidth increase by a factor of 1.5 thanks to GDDR7 (no bus width increase necessary; 1.5 ≈ 1.3 * 1.15). So that's a 1.5x perf increase 4090 -> 5090, which then gets multiplied by whatever u-architectural improvements might bring, like Qesa is saying.
Unlike Qesa, though, I'm personally not very optimistic about those u-architectural improvements being major. To get from the 1.5x that falls out of the node's speed increase (with the shrink itself subdued and downscaled by the node's cost increase) to the recently rumored 1.7x, you'd need roughly 13% (1.7 / 1.5 ≈ 1.13) additional perf and perf/W from the architecture, which sounds just about realistic. I'm betting it'll be even a little less, yielding more like a 1.6x proper average; that 1.7x might have been the result of measuring very few apps, or outright "up to 1.7x" with the "up to" getting lost during the leak (if there was even a leak).
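The back-of-the-envelope math above can be sketched out in a few lines. To be clear, the 1.3x node-speed factor, the 1.15x clock factor, and the 1.7x rumored total are all the speculative numbers from this thread, not confirmed figures:

```python
# Rough generational scaling estimate for 4090 -> 5090 (all inputs speculative).
node_speed_gain = 1.30   # assumed raw speed gain from the node
clock_gain = 1.15        # assumed extra clock increase on top

baseline_gain = node_speed_gain * clock_gain   # ~1.5x before any u-arch changes

rumored_total = 1.70
# The u-arch improvement the 1.7x rumor would imply on top of the ~1.5x baseline:
uarch_needed = rumored_total / baseline_gain   # ~1.13-1.14x, i.e. ~13%

print(f"baseline: {baseline_gain:.2f}x, u-arch needed for 1.7x: {uarch_needed:.2f}x")
```

If the u-arch factor comes in a bit under that 13%, the average lands closer to the 1.6x guessed above.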
1.6x is absolutely huge, and no wonder nobody’s increasing the bus width: it’s unnecessary for yielding a great product and even more expensive now than it was on 5nm (DRAM controllers almost don’t shrink and are big).
If I understand this article and what kopite7kimi said correctly, it sounds like a 33% cache increase, which he assumed implied a 33% memory controller increase. So 128 MB, which is where they originally derived the 512-bit figure from. That's not that huge a jump in cache compared to the current 96 MB, it seems to me.
GDDR7 is supposed to start at 32 Gbps, but there are also some claims of 36 Gbps. If you average the cache (33%) and memory speed (60%) increases, we're talking maybe 45% more effective bandwidth.
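As a quick sketch of that effective-bandwidth guess: the 33% cache figure and the 60% speed figure are the speculative numbers from above, and simply averaging the two is a crude heuristic, not a real cache-hit-rate model:

```python
# Crude effective-bandwidth estimate (speculative inputs from the thread).
cache_increase = 0.33       # rumored L2 growth: 96 MB -> 128 MB
mem_speed_increase = 0.60   # assumed GDDR6X ~20 Gbps -> GDDR7 32 Gbps

# Naive average of the two factors, as done above -- a rough heuristic only.
effective_bw_increase = (cache_increase + mem_speed_increase) / 2

print(f"~{effective_bw_increase * 100:.0f}% more effective bandwidth")
```

That lands in the mid-40s percent, matching the "maybe 45%" ballpark above.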
More memory bandwidth does not translate 1:1 into more performance. The GPU core is by far the most important factor. Even at 4K the current ~1 TB/s of memory bandwidth is sufficient, and overclocking the core is what gets you the most performance.
We’ve also seen that the 128-bit 4060 Ti 16GB, with its pitiful bandwidth, can utilize its full 16 GB of VRAM without any issues at 1440p.
So if you’re trying to estimate performance gains, the core is where you should look for now, especially if Blackwell keeps the enlarged L2 cache (Ampere’s cache was tiny by comparison, so Ada’s was a radical change, and big caches definitely worked well for AMD with RDNA2 too). Unless you’re doing 8K gaming, the extra memory bandwidth will have minimal impact.