devRant - A fun community for developers to connect over code, tech & life as a programmer

Wisecrack

2y

Newer LLMs have shown that you can cut the bit width of the weights and still come out ahead on efficiency by increasing model size. Turns out the trade is actually worth it.

However, there's a caveat: go under 4-bit quantization and the model loses a *lot* of quality (high perplexity). Essentially, without new quantization techniques, they're out of runway. The only directions left from here are better LoRA implementations/architectures, better base models, and larger models.
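To get a feel for why things fall apart under 4 bits, here is a minimal sketch that quantizes some random weights at different bit widths and measures how far they drift from the originals. Everything in it is an assumption for illustration: the weights are just Gaussian noise, `quantize_dequantize` is a hypothetical helper doing naive min-max round-to-nearest, and the metric is weight reconstruction error, not perplexity. Real quantizers are smarter than this, so treat the numbers as a trend, not a benchmark.

```python
import random


def quantize_dequantize(weights, bits):
    """Naive min-max uniform quantization (hypothetical helper, illustration only).

    Maps each weight to the nearest of 2**bits evenly spaced levels between
    min(weights) and max(weights), then maps it back to a float.
    """
    lo, hi = min(weights), max(weights)
    steps = 2 ** bits - 1              # number of intervals between levels
    scale = (hi - lo) / steps          # width of one interval
    return [lo + round((w - lo) / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Assumption: stand-in "weights" drawn from a unit Gaussian, not a real model.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {
    bits: mse(weights, quantize_dequantize(weights, bits))
    for bits in (8, 4, 3, 2, 1)
}

for bits, err in errors.items():
    print(f"{bits}-bit: reconstruction MSE = {err:.6f}")
```

Each bit you drop halves the number of levels, so the error grows quickly as you go down, which is the "out of runway" problem in miniature.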

I do see one improvement though.
Take the same underlying model and reduce it to 3, 2, or even 1 bit. Assuming the distribution is bit-agnostic (even if the output isn't), the smaller network acts as an inverted supervisor.

In other words, the larger (higher-bit) model is likely to be *more precise and accurate* than a bitsize-handicapped one of equivalent parameter count. With sufficient sampling, that would allow the 4-bit quantized model to train against a lower-bit quantization of itself, on the theory that it's hard to generate a correct (low perplexity, low loss) answer or sample, but *easy* to generate one that's wrong.
And if you have a model of higher accuracy, and a version of it with much lower accuracy relative to the baseline, you should be able to effectively bootstrap the better model.
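One way to picture the "inverted supervisor" is as a contrast between the two models' predictions: tokens the low-bit model likes get pushed down, and whatever the better model prefers *relative to* the bad one gets pushed up. The sketch below is only my reading of that idea, similar in spirit to contrastive decoding, and it scores tokens rather than doing any training. The logits are made-up toy numbers, and `contrastive_scores` and `alpha` are hypothetical names, not anything from an actual library.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(strong_logits, weak_logits, alpha=0.5):
    """Score each token as log p_strong - alpha * log p_weak.

    Hypothetical helper: the weak (lower-bit) model acts as a negative
    signal, so tokens it favors are penalized.
    """
    strong = log_softmax(strong_logits)
    weak = log_softmax(weak_logits)
    return [s - alpha * w for s, w in zip(strong, weak)]


# Toy logits over a 4-token vocabulary (assumed values, not model output).
strong_logits = [2.0, 1.9, 0.1, -1.0]   # 4-bit model: barely prefers token 0
weak_logits = [0.5, 2.5, 0.3, 0.0]      # lower-bit model: confidently picks token 1

scores = contrastive_scores(strong_logits, weak_logits)
best = max(range(len(scores)), key=scores.__getitem__)

strong_margin = strong_logits[0] - strong_logits[1]
contrastive_margin = scores[0] - scores[1]
print(f"best token: {best}")
print(f"margin before: {strong_margin:.3f}, after: {contrastive_margin:.3f}")
```

In this toy case the better model is nearly tied between two tokens, and the low-bit model's confident wrong pick is what breaks the tie. Turning that signal into an actual training loss is the part the idea above leaves open.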

This is similar to the approach of AlphaGo playing against itself, or how certain drones autohover: they calculate the wrong flight path first (looking for high loss) because it's simpler, and then calculate relative to that "wrong" answer.

If crashing is flying with style, failing at crashing is *flying* with style.

random ml . chatgpt diffusion machine learning
