Do today’s chips still have die space for old instruction sets?

nvmnghia@alien.top · 2 years ago

Do today’s chips still have die space for old instruction sets?

jaaval@alien.top · 2 years ago

Not really, in most cases. The decoder might need to spend a few more transistors to accommodate the instructions, but that should not be much. And the very oldest, never-used ones can be pushed into some very slow microcode ROM or something similar. On the execution side, SSE uses the same registers as the latest AVX does, and the low-level compute operations actually done by the execution units are the same. You need to understand that each instruction is translated into one or more micro-operations by the decoder; instructions are not direct execution control data.
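To make the decoder point concrete, here is a toy sketch in Python. The table, the instruction spellings, and the micro-op names (`fp_add_128`, `load`, `alu_add`, `store`) are all made up for illustration and do not correspond to any real CPU's internal micro-ops. It only shows two things: a legacy SSE instruction and its AVX-encoded counterpart can map to the same internal operation, and one instruction may expand into several micro-ops.

```python
# Toy decoder: maps instruction text to a list of micro-ops.
# Every name below is hypothetical; real micro-op formats are not public.
DECODE_TABLE = {
    "addps xmm0, xmm1": ["fp_add_128"],         # legacy SSE encoding
    "vaddps xmm0, xmm0, xmm1": ["fp_add_128"],  # AVX (VEX) encoding, same execution-unit op
    "add eax, ebx": ["alu_add"],                # register-register: one micro-op
    "add eax, [mem]": ["load", "alu_add"],      # memory source: load, then add
    "add [mem], eax": ["load", "alu_add", "store"],  # read-modify-write
}


def decode(instructions):
    """Translate a list of instructions into one flat list of micro-ops."""
    uops = []
    for instr in instructions:
        uops.extend(DECODE_TABLE[instr])
    return uops


program = ["addps xmm0, xmm1", "vaddps xmm0, xmm0, xmm1", "add [mem], eax"]
print(decode(program))
```

The sketch ignores real semantic differences between the two encodings, such as how the upper bits of the destination register are treated.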

However, there are some old, no-longer-used features in x86 CPUs that do complicate the design somewhat, and there are instructions connected to those features. But it’s really not the instructions themselves that use the die area. Intel’s X86S proposal would, for example, remove the middle privilege rings and call gates from the CPUs, as well as some no-longer-relevant memory access modes.

YumiYumiYumi@alien.top · 2 years ago

why can’t they implement SSE3 by other, more powerful instructions (like AVX)

In short, the instruction semantics are slightly different, so they don’t do exactly the same thing. But it’s likely that the execution unit hardware is re-used for those.

scfw0x0f@alien.top · 2 years ago

It’s not the die space that’s the issue; it’s the time to validate the correct operation of those instructions with a pipeline that’s designed for something very different.

Jannik2099@alien.top · 2 years ago

No, no CPU has separate FPUs for SSE & AVX - it’s compiled to the same set of uOps by microcode.

Recent x86 CPUs go as far as implementing x87 in the 128b FPU too.

wintrmt3@alien.top · 2 years ago

uOps by microcode

That’s not how it works. Only a few overly complex instructions are implemented in microcode, and they are slow; most instructions use a random-logic decoder.

GomaEspumaRegional@alien.top · 2 years ago

In x86 that’s not the case: only the critical-path x86 instructions are implemented directly in logic lookup tables in the decoder. Some of the less-used ones are in the uCode ROM on chip. And a bunch more are in PAL code in off-chip ROM. And a few of the rarest ones are in the exception manager libraries of the OS.

A big chunk of the x86 ISA is rarely used so this tiered implementation has been used at least since Nehalem if not before.

einmaldrin_alleshin@alien.top · 2 years ago

The x86 instructions go through a translation layer that turns them into CPU specific instructions (microcode). So the CPU doesn’t need any specific hardware to be compatible with these old instructions, it just needs to know how to get the same result with microcode.

wintrmt3@alien.top · 2 years ago

You are confusing microcode and micro-ops.

nvmnghia@alien.top · 2 years ago

What is microcode, then?

wintrmt3@alien.top · 2 years ago

It’s a way of creating a sequential control circuit based on a piece of memory holding the outputs and next state for each state.
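A minimal Python sketch of that definition: a small table acts as the memory, and each entry holds the control outputs for one state plus the next state to go to. The states, the signal names, and the `run_microprogram` helper are hypothetical and only illustrate the idea of a memory-driven sequencer; they are not taken from any real microcode format.

```python
# Hypothetical microcode ROM: address (state) -> (control outputs, next state).
# A next state of None marks the end of the micro-program.
MICROCODE_ROM = {
    0: ("read_operand", 1),
    1: ("alu_compute", 2),
    2: ("write_result", 3),
    3: ("update_flags", None),
}


def run_microprogram(rom, start=0):
    """Step through the ROM, collecting the control outputs of each state."""
    state = start
    emitted = []
    while state is not None:
        outputs, next_state = rom[state]  # one memory lookup per step
        emitted.append(outputs)
        state = next_state
    return emitted


print(run_microprogram(MICROCODE_ROM))
```

Because each step costs one lookup in the memory, a long sequence takes many steps, which fits the comments in this thread that microcoded instructions are slow.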

DdCno1@alien.top · 2 years ago

Are there performance losses or gains through this translation?

th3typh00n@alien.top · 2 years ago

This is incorrect. Very few x86 instructions use microcode, as the microcode engine is quite slow. It’s mainly used for things like cpuid and such.

GomaEspumaRegional@alien.top · 2 years ago

A lot of x86 ISA is in the micro and PAL codes. Only the most frequent and performance-limiting ones are on-core for modern x86.

x86 is a huge set, so “very few” is a relative term ;-)

FenderMoon@alien.top · 2 years ago

Microcode is used very heavily in modern CPUs. It has been since the 90s.

AutonomousOrganism@alien.top · 2 years ago

Modern x86 chips are so large that the space the decoder takes is relatively small.

It would be a different story if you wanted a tiny cheap low power chip. Then you might be better off with ARM or RISC-V.

symmetry81@alien.top · 2 years ago

Because x86 instructions are variable length and not self-synchronizing, you can see up to 15% of your core’s power budget go to decode if you aren’t running out of the small cache of decoded instructions, at least as of a few generations ago when I last heard. That isn’t huge, but it does mean that x86 architects have to put thought into how wide to make the decoder; they can’t just size it so that it’s never a bottleneck, as ARM designers can.
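A rough Python sketch of why variable-length, non-self-synchronizing encodings make wide decode harder. The encoding here is invented (the low two bits of an instruction's first byte, plus one, give its length) and is far simpler than real x86; the function names are hypothetical too. The point is only that instruction boundaries must be discovered one after another, and that starting from the wrong byte yields a different but equally plausible stream, whereas fixed-width boundaries are known up front.

```python
def boundaries_variable(code, start=0):
    """Find instruction start offsets in a toy variable-length encoding.

    Toy rule (an assumption, not x86): length = (first_byte & 0x3) + 1.
    Each boundary depends on the previous one, so the scan is sequential.
    """
    offsets = []
    pc = start
    while pc < len(code):
        offsets.append(pc)
        pc += (code[pc] & 0x3) + 1
    return offsets


def boundaries_fixed(code, width=4):
    """With fixed-width instructions every boundary is known immediately."""
    return list(range(0, len(code), width))


code = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC])
print(boundaries_variable(code))           # scan from the true start
print(boundaries_variable(code, start=1))  # scan from a wrong offset
print(boundaries_fixed(bytes(8)))
```

Scanning from byte 1 instead of byte 0 produces a different set of boundaries with nothing in the bytes themselves to flag the mistake, which is what "not self-synchronizing" means here.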

CHAOSHACKER@alien.top · 2 years ago

https://chipsandcheese.com/2021/07/13/arm-or-x86-isa-doesnt-matter/

GomaEspumaRegional@alien.top · 2 years ago

Mate, the 90s were a few decades back. ;-)

x86 decoding hasn’t been a limiter since then.
