Quantization, Explained: Why Big Models Run on Small Hardware
A 7B model is 14 GB in full precision. A 70B is 140 GB. Quantization is the trick that brings those numbers down to something your machine can hold — and the tradeoff is smaller than most people expect.