Interesting trade-offs with networking. It's got to be more resource-intensive to do all the smart networking at the switch, in a centralized way; if it were done in the NICs, each rule set would be smaller, since it would only cover that NIC's endpoints. And Intel discontinued Tofino, so the next iteration will have to be different. 100G ports seem unimpressive. If the NICs aren't smart, how many PCIe virtual functions do they expose, and how is failover handled when one of the switches is down? It's also interesting to see a session/connection-based extension to P4; I wonder whether the industry will see a need for it.
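To make the connection/session idea concrete, here's a toy sketch in Python (everything here is made up for illustration; it's not any vendor's actual P4 extension or API) of the difference between a purely stateless match-action pipeline and one with a connection table: the first packet of a flow pays the cost of full policy evaluation, and everything after that hits a single exact-match entry.

    # Hypothetical sketch: connection tracking layered on stateless match-action rules.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class FiveTuple:
        src_ip: str
        dst_ip: str
        src_port: int
        dst_port: int
        proto: str

    class ConnTrackPipeline:
        def __init__(self, policy_rules):
            # policy_rules: list of (predicate, action) pairs -- the "big" rule set
            self.policy_rules = policy_rules
            self.conn_table = {}  # exact-match flow cache for established connections

        def process(self, pkt: FiveTuple) -> str:
            # Fast path: established connections skip policy evaluation entirely.
            if pkt in self.conn_table:
                return self.conn_table[pkt]
            # Slow path: evaluate the full policy once per new flow.
            for predicate, action in self.policy_rules:
                if predicate(pkt):
                    self.conn_table[pkt] = action
                    # Install the return-direction entry so replies are covered too.
                    reverse = FiveTuple(pkt.dst_ip, pkt.src_ip, pkt.dst_port,
                                        pkt.src_port, pkt.proto)
                    self.conn_table[reverse] = action
                    return action
            return "drop"

    # Example: allow outbound TCP to port 443, drop everything else.
    rules = [(lambda p: p.proto == "tcp" and p.dst_port == 443, "forward")]
    pipe = ConnTrackPipeline(rules)
    print(pipe.process(FiveTuple("10.0.0.5", "1.2.3.4", 51000, 443, "tcp")))  # forward

The point is that the expensive policy lookup happens once per connection rather than once per packet, which is presumably the motivation for pushing connection state into the data plane in the first place.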
For storage, I wonder what the I/O overhead on ZFS will be. People are doing crazy things in public clouds, like caching EBS volumes with local ephemeral NVMe drives to get the lowest latency while keeping full persistence, whereas this setup will always pay some penalty for ZFS's copy-on-write and checksumming.
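For what it's worth, that NVMe-in-front-of-EBS trick is conceptually just a write-through cache. A minimal sketch (pure Python, with dicts standing in for block devices; not any real caching tool) of why it gets local-flash read latency without giving up durability:

    # Hypothetical illustration of the "local NVMe in front of a network volume" pattern.
    class WriteThroughCache:
        def __init__(self, backing_store, cache_store):
            self.backing = backing_store   # e.g. a network block volume (slow, durable)
            self.cache = cache_store       # e.g. local ephemeral NVMe (fast, disposable)

        def read(self, block_id):
            data = self.cache.get(block_id)
            if data is not None:
                return data                # hit: local-NVMe latency
            data = self.backing[block_id]  # miss: pay the network round trip once
            self.cache[block_id] = data
            return data

        def write(self, block_id, data):
            self.backing[block_id] = data  # durability comes from the backing volume
            self.cache[block_id] = data    # keep the cache coherent for later reads

    # Toy usage with plain dicts standing in for block devices:
    backing, cache = {}, {}
    vol = WriteThroughCache(backing, cache)
    vol.write(0, b"hello")
    assert vol.read(0) == b"hello"

Losing the ephemeral NVMe only costs you the cache, not the data, which is why people are comfortable doing it in public clouds.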
Finally, it’s interesting that they’re going with some Solaris descendant (?) as the host/dom0 OS. Everyone else uses Linux KVM or Xen for this.
Sounds very opinionated, but it might turn out to be a very convenient turn-key solution for customers.
Just my two cents (not my area), but it sounds pretty good for a first-gen product; looking forward to reading reports from their customers.
But what are the accelerators doing? AMX surely isn't being used, since it contributes nothing compared to eight massive GPUs, and I doubt QAT has a use either, given that those GPUs are going to be fed by peer-to-peer DMA from the 400 Gbps NICs they each have.