Hessian matrix–aware comprehensive post-training quantization for vision transformers
Weixing Zhang, Zhuang Tian, Nan Lin, Cong Yang, Yongxia Chen
Abstract

In recent years, vision transformers (ViTs) have made significant breakthroughs in computer vision and have demonstrated great potential in large-scale models. However, quantization methods designed for convolutional neural networks do not transfer well to ViT models, leading to a significant drop in accuracy when applied directly. We extend the Hessian matrix–based quantization parameter optimization method and apply it to the quantization of the LayerNorm module in ViT models. This approach reduces the impact of quantization on task accuracy for the LayerNorm module and enables more comprehensive quantization of ViT models. To achieve fast quantization of ViT models, we propose a quantization framework specifically designed for them: Hessian matrix–aware post-training quantization for vision transformers (HAPTQ). Experimental results on various models and datasets demonstrate that HAPTQ, after quantizing the LayerNorm module of various ViT models, achieves near-lossless quantization (an accuracy drop of less than 1%) on ImageNet classification tasks. Specifically, HAPTQ achieves 85.81% top-1 accuracy on the ViT-L model.
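To make the idea concrete, the sketch below illustrates one common way a Hessian-aware quantization criterion can be applied to a LayerNorm output: the Hessian of the task loss with respect to the layer output is approximated by the squared output gradients (a standard diagonal approximation), and the quantization scale is chosen to minimize the Hessian-weighted distortion. This is a minimal illustration under those assumptions, not the authors' released implementation; names such as `search_layernorm_scale` are hypothetical.

```python
# Minimal sketch of Hessian-guided scale search for quantizing a LayerNorm output.
# Assumes a diagonal Hessian approximation H ~ diag(grad^2); not the paper's code.

import torch


def uniform_quant(x: torch.Tensor, scale: float, n_bits: int = 8) -> torch.Tensor:
    """Symmetric uniform fake-quantization with the given scale."""
    qmax = 2 ** (n_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale


def search_layernorm_scale(output: torch.Tensor,
                           output_grad: torch.Tensor,
                           n_bits: int = 8,
                           n_candidates: int = 100) -> float:
    """Pick the scale minimizing the Hessian-weighted output distortion
    sum(H_diag * (Q(O) - O)^2), with H_diag approximated by grad^2."""
    h_diag = output_grad.pow(2)                      # diagonal Hessian proxy
    max_abs = output.abs().max().item()
    qmax = 2 ** (n_bits - 1) - 1
    best_scale, best_err = max_abs / qmax, float("inf")
    for k in range(1, n_candidates + 1):
        scale = (max_abs * k / n_candidates) / qmax  # candidate clipping ranges
        q_out = uniform_quant(output, scale, n_bits)
        err = (h_diag * (q_out - output).pow(2)).sum().item()
        if err < best_err:
            best_err, best_scale = err, scale
    return best_scale


if __name__ == "__main__":
    # Toy example: a LayerNorm output and its gradient from one calibration batch.
    torch.manual_seed(0)
    ln_out = torch.randn(4, 197, 768)       # example ViT-B token embedding shape
    ln_grad = torch.randn_like(ln_out) * 0.01
    print("chosen scale:", search_layernorm_scale(ln_out, ln_grad))
```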

© 2025 SPIE and IS&T
Weixing Zhang, Zhuang Tian, Nan Lin, Cong Yang, and Yongxia Chen "Hessian matrix–aware comprehensive post-training quantization for vision transformers," Journal of Electronic Imaging 34(1), 013009 (11 January 2025). https://doi.org/10.1117/1.JEI.34.1.013009
Received: 8 September 2024; Accepted: 23 December 2024; Published: 11 January 2025
KEYWORDS: Quantization, Visual process modeling, Performance modeling, Matrices, Data modeling, Transformers, Object detection