In recent years, vision transformers (ViTs) have achieved significant breakthroughs in computer vision and have shown great potential in large-scale models. However, quantization methods designed for convolutional neural networks perform poorly on ViTs, causing a significant drop in accuracy when applied to them. We extend the Hessian-matrix-based quantization parameter optimization method and apply it to the quantization of the LayerNorm module in ViT models. This approach reduces the impact of quantization on task accuracy for the LayerNorm module and enables more comprehensive quantization of ViT models. To achieve fast quantization of ViT models, we propose a quantization framework designed specifically for them: Hessian matrix-aware post-training quantization for vision transformers (HAPTQ). Experimental results on various models and datasets demonstrate that HAPTQ, after quantizing the LayerNorm modules of various ViT models, achieves near-lossless quantization (an accuracy drop of less than 1%) on ImageNet classification tasks. In particular, HAPTQ achieves 85.81% top-1 accuracy with the ViT-L model.
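For intuition, a Hessian-guided choice of quantization parameters can be sketched as follows. The idea is to rank candidate quantization scales for a module's output not by plain mean-squared error but by a Hessian-weighted distortion, so that dimensions to which the task loss is more sensitive count more. The sketch below is a minimal illustration, not the paper's implementation: the function names, the candidate scale grid, and the squared-gradient (Fisher-style) approximation of the Hessian diagonal are all assumptions made for the example.

```python
import torch

def fake_quantize(x: torch.Tensor, scale: float, num_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake-quantization of x with a given scale."""
    qmax = 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale

def hessian_aware_scale_search(outputs: torch.Tensor,
                               output_grads: torch.Tensor,
                               num_candidates: int = 100,
                               num_bits: int = 8) -> float:
    """Pick the scale that minimizes a Hessian-weighted output distortion.

    The Hessian diagonal of the task loss w.r.t. the module output is
    approximated by the squared gradients collected on calibration data
    (a common Fisher-information-style proxy); this is an illustrative
    assumption, not necessarily the approximation used by HAPTQ.
    """
    h_diag = output_grads.pow(2)                      # diagonal Hessian proxy
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = outputs.abs().max().item()
    best_scale, best_cost = max_abs / qmax, float("inf")
    for alpha in torch.linspace(0.5, 1.2, num_candidates):
        scale = alpha.item() * max_abs / qmax
        err = fake_quantize(outputs, scale, num_bits) - outputs
        cost = (h_diag * err.pow(2)).sum().item()     # e^T diag(H) e
        if cost < best_cost:
            best_scale, best_cost = scale, cost
    return best_scale

# Example usage on hypothetical LayerNorm outputs from a calibration batch,
# with gradients of the task loss w.r.t. those outputs obtained by backprop.
outputs = torch.randn(32, 197, 768)
output_grads = torch.randn_like(outputs)
scale = hessian_aware_scale_search(outputs, output_grads)
```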
Keywords: quantization, visual process modeling, performance modeling, matrices, data modeling, transformers, object detection