Quantization War Stories: INT8 and the Accuracy Cliff I Didn't Predict
Post-training quantization to INT8 worked beautifully on the benchmark — and lost 4 points on our production distribution. An anatomy of the failure.
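The core failure mode described above — calibration statistics from a benchmark set that don't match production activations — can be illustrated with a minimal sketch. This is a toy, not the author's actual pipeline: it uses symmetric per-tensor INT8 quantization with a max-abs calibration range, and the "benchmark" and "production" distributions are synthetic stand-ins (a well-behaved Gaussian vs. the same Gaussian plus a small heavy-tailed fraction).

```python
import numpy as np

def calibrate_scale(x: np.ndarray) -> float:
    """Symmetric per-tensor INT8 scale from the observed max-abs value."""
    return float(np.abs(x).max()) / 127.0

def quant_dequant(x: np.ndarray, scale: float) -> np.ndarray:
    """Round to INT8 grid, clip to [-127, 127], and dequantize back."""
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)

# "Benchmark" activations: narrow, well-behaved distribution.
calib = rng.normal(0.0, 1.0, 10_000)
scale = calibrate_scale(calib)

# "Production" activations: same body, plus ~1% heavy-tailed outliers
# that fall outside the calibrated range and get clipped.
prod = np.concatenate([rng.normal(0.0, 1.0, 9_900),
                       rng.normal(0.0, 8.0, 100)])

err_calib = np.abs(quant_dequant(calib, scale) - calib).mean()
err_prod = np.abs(quant_dequant(prod, scale) - prod).mean()
# err_prod is several times err_calib: the clipped outliers dominate
# the reconstruction error even though they are only 1% of samples.
```

The point of the sketch is that the quantization looks fine on the data it was calibrated against; the error only appears once inputs drift outside the calibrated range, which is exactly why a benchmark-only evaluation missed the 4-point drop.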
Series
Part of ML in Production (installment 2).
Tags
- quantization
- machine-learning
- llm
- inference
- production
Manish Bookreader
Electronics enthusiast, embedded systems expert, Linux/networking programmer, and software engineer passionate about AI, electronics, books, and cooking.