• YoloSwaggedBased@alien.topB
    10 months ago

    Consider something with a bit more VRAM, like a 2080 Ti or a 3080. The extra headroom of 10 GB or so will let you run at a higher quantisation precision (e.g. 8-bit rather than 4-bit). A full 7B model needs a bit over 14 GB without quantisation (7 billion weights at 2 bytes each in fp16), so you'll need quantisation either way.
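
A rough sketch of the arithmetic behind those numbers, counting weights only. Activations, the KV cache, and framework overhead come on top, which is where the "a bit over" comes from. The function name `weight_memory_gb` is made up for illustration, and GB here means 10^9 bytes.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


# Compare unquantised fp16 against 8-bit and 4-bit quantisation for a 7B model.
for bits in (16, 8, 4):
    print(f"7B at {bits}-bit: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

By this estimate, 8-bit weights for a 7B model come to about 7 GB, which should fit on a 10 GB card with some room for context, while 16-bit does not fit.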