- cross-posted to:
- amd
Can someone please eli5
L3 cache sits on the same chip as the CPU. It’s incredibly fast and usually pretty small (96MB in this example). This software turns that little bit of memory into a RAM disk. “Why not make the L3 cache bigger?” I hear you say. Because it’s expensive.
Also, the cache is usually used for code so if you use the cache for data your software will run slower.
There are a bunch of parts of a PC that all have the same job of storing data, but the speeds at which they can do that are wildly different.
- HDD: slooooooooooooooooooooooooooooooooooooooooow
- SSD: sloooow
- RAM: faaaaaaaaaaaast
- CPU cache: faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaast
“So why not get more of the fastest stuff?” Because a) it’s way more expensive than the slower options and b) for CPU cache specifically, there isn’t enough physical space on the CPU die to fit much more. (This is why AMD’s “3D V-Cache” was a breakthrough, being able to fit more cache on the die.)
This guy decided to take his CPU cache, and make it pretend to be an SSD. So the tools designed to check SSD speeds try to measure it and report insanely high numbers (because it’s faaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaast)
I mean, at that point the file system overhead would be the biggest issue.
Indeed it is. The raw rate is about 10x that, roughly 2TB/s.
SoftRAM to the rescue!
READ 182,923 MB/sec / WRITE 175,260.
saved you a click
The fastest PCIE 5.0 PCIE NVME SSD drives currently run around 12,500 / 11,800 MB/sec read/write, so call it an order of magnitude faster (10x) + 50%.
They used OSFmount to create the ramdrives.
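For anyone who wants to check the "10x + 50%" estimate, here is a quick back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the variable names are just mine.

```python
# Figures quoted in the thread, in MB/sec.
cache_read, cache_write = 182_923, 175_260   # L3-cache RAM disk result
nvme_read, nvme_write = 12_500, 11_800       # rough top-end PCIe 5.0 NVMe figures

read_ratio = cache_read / nvme_read
write_ratio = cache_write / nvme_write

print(f"read:  {read_ratio:.1f}x faster")
print(f"write: {write_ratio:.1f}x faster")
```

Both ratios come out a little under 15x, so "10x + 50%" is about right.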
Damn son
But how does it compare to a RAM disk using “regular” RAM?
Because RAM disks are nothing new. ImDisk is free and open-source software that lets you easily set up a RAM disk.
I use it when running AI upscaling + frame interpolation programs, as they generate tons of temp files which take up many GBs of space. A RAM disk is not only faster but it prevents battering your SSDs with tons of writes.
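If your tool lets you choose where temp files go, pointing it at the RAM disk is usually just a path setting. For a script of your own, a minimal sketch could look like this. The `R:\` mount point, the `pick_scratch_dir` helper and the file name are all made up for illustration, and it falls back to the normal temp folder when no RAM disk is mounted.

```python
import os
import tempfile

def pick_scratch_dir(ram_disk_path):
    """Return ram_disk_path if it is an existing directory, else the system temp dir."""
    if ram_disk_path and os.path.isdir(ram_disk_path):
        return ram_disk_path
    return tempfile.gettempdir()

# "R:\\" is a hypothetical mount point; use whatever your RAM disk tool assigns.
scratch_root = pick_scratch_dir("R:\\")

with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
    frame_path = os.path.join(workdir, "frame_0001.tmp")
    with open(frame_path, "wb") as f:
        f.write(b"\x00" * 1024)  # stand-in for a generated temp file
    print("temp files go to:", workdir)
# The directory and everything in it is deleted when the with-block exits.
```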
Holy crap. That’s insane.
You can get similar speeds with PrimoCache
https://old.reddit.com/r/homelab/comments/zxviqq/apparently_primocache_works_pretty_well/j22iytd/
I mean, for consumers PCIE 4.0 M.2 drives already load pretty much everything so fast, that getting these speeds times 10 won’t make a whole lot of difference to an average gamer, for example. But for professional use, this is Huge.
What are you guys on about? The thing with the perfect balance in between is just ram. Am I missing something here?
What kind of professional use though? The cache is only so big.
one will be recording and editing raw cinema quality footage at qualities like 12k (and higher if they even bother inventing that - there’s not really a noticeable improvement in quality past 8k unless you’re zooming in - and that’s assuming you even manage to find a screen that can put something that high quality up)
Computational fluid dynamics (CFD) simulations can only partially be done in parallel, but each simulation step requires predictor/error regularization, which is a serial aggregate step. This step is the bottleneck when you try to check if everything in the total simulation adds up correctly. The memory requirement isn’t huge, but it has to happen quickly and all in one place.
If that’s the same article I’ve read a few days back, it also says that’s not entirely true, because maximum speed of that cache is 2 TB/s (you wrote 0.182 TB/s). I think it’s limited by the size, similarly to when you can’t achieve max speed during running due to insufficient road length. Or maybe it’s limited by the sampling rate.
it’s in this article as well "… AMD’s 3D V-Cache can be even faster when used for its intended purpose. First-generation 3D V-Cache is good enough for a 2 TB/s peak throughput, AMD states, while data bandwidth is even higher (2.5 TB/s) on the second-generation variant of the technology…
even losing 90% of its peak throughput is still good for .182TB/s, and while the peak numbers came up using a 16/32MB dataset on a 96MB drive, the tech was still able to pull a READ of 111k MB/sec and WRITE of 50k using an 8GB dataset on the same 96MB partition. The author called the results ‘puzzling’. (answered twice because reddit automod removed first post for linking to twitter where the results were posted. if you want to see it, twit handle is GPUsAreMagic)
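To put the "losing 90%" figure in numbers, here is a rough Python sketch comparing the quoted benchmark results with AMD's stated 2 TB/s peak. It assumes decimal units (1 TB = 1,000,000 MB) and takes "111k" and "50k" as round figures; the labels are mine.

```python
PEAK_TB_S = 2.0          # AMD's stated first-gen 3D V-Cache peak, per the article
MB_PER_TB = 1_000_000    # assumption: decimal units

# Benchmark figures quoted in the thread, in MB/sec.
results_mb_s = {
    "read, 16/32MB dataset": 182_923,
    "write, 16/32MB dataset": 175_260,
    "read, 8GB dataset": 111_000,
    "write, 8GB dataset": 50_000,
}

# Fraction of the stated peak that each result reaches.
fractions = {
    name: mb_s / MB_PER_TB / PEAK_TB_S
    for name, mb_s in results_mb_s.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} of peak")
```

The best read works out to roughly 9% of the stated peak, which lines up with the "losing 90%" estimate.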
I’m curious, why is PCIE listed twice? Is it bc you’re saying it’s pcie and also it’s v5 of pcie? Eli15
Not OP and unsure of which specific SSD they are referring to but it could be that PCIE is both a data transfer standard and a physical connection on your computer. Saying “PCIE 5 PCIE SSD” would specify that it both uses PCIE 5 data transmission protocols as well as the physical PCIE slot.
Simply saying “PCIE 5 SSD” could leave some ambiguity as to whether the SSD is installed in a PCIE slot or an M.2 slot, with the latter being more common but also being less powerful (although still more than enough for the average user). Simply saying “PCIE SSD” is even less clear as it could be any PCIE specification in either a PCIE or M.2 slot. Not relevant for this specific question, but saying just “M.2 SSD” would be very unclear as well as you know the physical slot it will go in, but you now open the door for it to use the SATA transmission standard which will bottleneck modern SSDs.
I’ll be right back, I have to go to the ATM Machine.
You want all that extra read/write heat on the CPU die? Cool, but in its current implementation it seems like asking for trouble.
CPU cache is already used extremely often, far more than any SSD is. Its purpose is to store data that the CPU (and its multiple threads) is working on and may need for its next instructions. As you can imagine, with everything the CPU is constantly doing, there are a lot of instructions and a lot of cache that’s constantly being swapped out for new data.
If anything, using cpu cache as a RAM disk would probably reduce the amount of read/writes done on the cache, as you’re taking away space for actual caching
Because if they don’t do it, someone else will. The world demands progress and all problems are solvable.
It’s a cool feature to have access to as a software tester, to help replicate issues where loading a table with a lot of rows would take too long from disk.
This belongs in r/camping. Those are S’mores!
So I can uninstall RamDoubler now?
What would be more interesting is if one could boot completely without dram and just use cache as ram. For small latency critical projects this could be great.
In 2019 cloud providers were already discussing using CPU cache for DB in around 2025 based on AMD server CPU architecture plans. So I imagine this is 1-2 steps below cutting edge technology of what’s going on behind the scenes between top cloud providers and AMD. Still really cool though, but top of the line AMD server CPUs could be doing some insane result retrieval speeds with their large cache.
Imagine the Minecraft server you could run off that.
M.2 isn’t fast enough?
*NVMe isn’t fast enough?
This reminds me of the “I can’t see how we’ll ever need more than 1MB of RAM.” statement. I forget who said it but it stuck with me.
Bill Gates
He denies saying it, and there is no record of him saying it.
I don’t think he’d have said that, he had already seen massive improvements in technology.
Am I reading this correctly!?!?
EARLIER THIS WEEK, in a column on Bill Gates, fellatio and media, and how all three relate to a profile of Gates in last week’s Time magazine, this column daringly offered free software into the millennium to anyone who remembers one thing Bill Gates ever said.
Random reads no, sequential reads yes, m2 is plenty fast
4K reads, pukes
This is like 3 nanoseconds faster. Think of all the microseconds you will save over the lifetime of the device
Are we in the future?
Cool, but for 99.9% of people, completely useless.
You’ve got to do something with that data. And that turns out to be a pretty darn difficult problem. Even at regular NVMe speeds, developers have to pay very careful attention to performance, and often make the right design decisions (like choosing the right compression algorithms).
Because otherwise you might go with something like LZMA, which is an okay choice for a hard disk, but will absolutely become a huge bottleneck on an NVMe, never mind this.
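If you want to see the compression point for yourself, here is a small sketch using only Python's standard library. It times how fast LZMA and zlib can decompress one buffer. The sample data and the `decompress_rate_mb_s` helper are made up for illustration, and zlib is only standing in as a lighter codec, not as a recommendation.

```python
import lzma
import os
import time
import zlib

def decompress_rate_mb_s(compress, decompress, data):
    """Return decompressed MB per second for one codec on one buffer."""
    packed = compress(data)
    start = time.perf_counter()
    unpacked = decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # sanity check: lossless round trip
    return len(data) / 1_000_000 / max(elapsed, 1e-9)

# Made-up sample data: a random 4 KB block repeated to about 2 MB, so it compresses well.
sample = os.urandom(4096) * 512

rates = {
    "lzma": decompress_rate_mb_s(lzma.compress, lzma.decompress, sample),
    "zlib": decompress_rate_mb_s(zlib.compress, zlib.decompress, sample),
}
for name, rate in rates.items():
    print(f"{name}: ~{rate:.0f} MB/sec decompressed")
```

If a codec's number lands below what the drive can deliver, the decompressor is the bottleneck rather than the storage. Results depend heavily on the data and the machine, so treat this as a way to measure, not as a benchmark.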
Cool, but for 99.9% of people, completely useless.
Not enough 9s. Heck, I think it’s a true 100%, but it’s still cool.