Quantization War Stories: INT8 and the Accuracy Cliff I Didn't Predict
Post-training quantization to INT8 worked beautifully on the benchmark — and lost 4 points on our production distribution. An anatomy of the failure.
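The core failure mode described above — calibration statistics from a benchmark set that don't match production activations — can be illustrated with a minimal sketch. This is a toy, not the author's actual pipeline: it uses symmetric per-tensor INT8 quantization with a max-abs calibration range, and the "benchmark" and "production" distributions are synthetic stand-ins (a well-behaved Gaussian vs. the same Gaussian plus a small heavy-tailed fraction).

```python
import numpy as np

def calibrate_scale(x: np.ndarray) -> float:
    """Symmetric per-tensor INT8 scale from the observed max-abs value."""
    return float(np.abs(x).max()) / 127.0

def quant_dequant(x: np.ndarray, scale: float) -> np.ndarray:
    """Round to INT8 grid, clip to [-127, 127], and dequantize back."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)

# "Benchmark" activations: narrow, well-behaved distribution.
calib = rng.normal(0.0, 1.0, 10_000)
scale = calibrate_scale(calib)

# "Production" activations: same body, plus ~1% heavy-tailed outliers
# that fall outside the calibrated range and get clipped.
prod = np.concatenate([rng.normal(0.0, 1.0, 9_900),
                       rng.normal(0.0, 8.0, 100)])

err_calib = np.abs(quant_dequant(calib, scale) - calib).mean()
err_prod = np.abs(quant_dequant(prod, scale) - prod).mean()
# err_prod is several times err_calib: the clipped outliers dominate
# the reconstruction error even though they are only 1% of samples.
```

The point of the sketch is that the quantization looks fine on the data it was calibrated against; the error only appears once inputs drift outside the calibrated range, which is exactly why a benchmark-only evaluation missed the 4-point drop.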
Series
Part of ML in Production (installment 2).
Tags
- quantization
- machine-learning
- llm
- inference
- production
Manish Bookreader
Electronics enthusiast, embedded systems expert, Linux/networking programmer, and software engineer passionate about AI, electronics, books, and cooking.