Consider something with a bit more VRAM, like a 2080 Ti (11GB) or a 3080 (10GB). The extra headroom lets you run a higher-precision quantisation (e.g. 8-bit instead of 4-bit). A full 7B model at 16-bit precision needs a bit over 14GB just for the weights, so you'll need quantisation either way.
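The back-of-the-envelope VRAM math is just parameter count times bytes per parameter. A rough sketch (weights only; KV cache, activations, and framework overhead add a couple more GB on top):

```python
# Rough VRAM needed for model weights alone, in decimal GB.
# Ignores KV cache, activations, and runtime overhead.
def weight_vram_gb(n_params_billion: float, bits_per_param: int) -> float:
    total_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9

for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{weight_vram_gb(7, bits):.1f} GB")
# 7B @ 16-bit: ~14.0 GB
# 7B @ 8-bit: ~7.0 GB
# 7B @ 4-bit: ~3.5 GB
```

So a 7B model fits comfortably on a 10-11GB card at 8-bit, and even leaves headroom for longer contexts at 4-bit.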