Best cheap option to run smaller AI models on?
Like the GGUF'd Mistral 7B versions that are lighter on memory, for example. I need fast inference and I don't really feel like depending on OpenAI, or paying them a bunch of money. I've fucked up and spent like $200 on API charges before, so I'm definitely trying to avoid that.
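For reference, this is the kind of thing I want to run, a minimal sketch using llama-cpp-python with a quantised GGUF (the model filename here is just an example, swap in whatever quant you actually download):

```python
# Minimal local inference with llama-cpp-python (pip install llama-cpp-python).
# The model path is an example; point it at whatever GGUF quant you have.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct.Q4_K_M.gguf",  # example filename, not a real path
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit in VRAM
    n_ctx=2048,       # context window
)

out = llm("Q: What's a cheap GPU for local inference? A:", max_tokens=64)
print(out["choices"][0]["text"])
```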
I have a 980ti and it’s just too damn old. It works with some stuff but it’s super hit or miss with any of the newer libraries.
Maybe a used RTX 30 series? They have tensor cores, which help a lot in running AI stuff. I got a used 3070 for $240 a few weeks ago.
Consider something with a bit more VRAM, like a 2080 Ti or a 3080. The extra headroom (10-11GB instead of 8GB) lets you run higher-precision quantisation (e.g. 8-bit instead of 4-bit). You need a bit over 14GB to run a full 7B model at fp16 without quantisation, so you'll need quantisation either way.
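The arithmetic is easy to sanity-check if you want. Here's a rough weights-only estimate (I'm assuming KV cache, activations and framework overhead are what push the fp16 figure a bit past 14GB):

```python
# Rough weights-only VRAM estimate for a 7B-parameter model.
# Real usage is higher: KV cache, activations and overhead add a couple of GB.
PARAMS = 7e9

for precision, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{gb:.1f} GB")  # fp16 ~14.0, 8-bit ~7.0, 4-bit ~3.5
```

So a 4-bit quant fits comfortably even on an 8GB card, but 8-bit wants the 10-11GB cards.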