Paper | 23 May 2023
Focal ViT: image transformer catches up with CNN on small datasets
Bin Chen, Xin Feng
Proceedings Volume 12645, International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2023); 1264519 (2023) https://doi.org/10.1117/12.2681103
Event: International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2023), 2023, Hangzhou, China
Abstract
Recent advances in transformers have brought renewed interest to computer vision tasks. However, on small datasets, transformers are hard to train and underperform convolutional neural networks. We make vision transformers as data-efficient as convolutional neural networks by introducing focal attention. Inspired by local attention networks, we constrain the self-attention of ViT to a multi-scale localized receptive field. We provide empirical evidence that properly constraining the receptive field reduces the amount of training data that vision transformers require. Our best model reaches 83.16% accuracy when trained from scratch on CIFAR-100, a significant improvement in data efficiency over previous transformers. We also perform analysis on ImageNet to show that our method does not lose accuracy on large datasets.
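The paper's implementation is not reproduced on this page; the sketch below is only a minimal, hypothetical illustration of the idea the abstract describes, namely restricting ViT self-attention to local windows at several scales. The function names, the 1-D windowing over the token sequence, the window sizes (2, 4, 8), and the single-head, projection-free attention are all illustrative assumptions, not the authors' method.

import torch


def local_attention_mask(n_tokens: int, window: int) -> torch.Tensor:
    # Boolean (n_tokens, n_tokens) mask: entry (i, j) is True when token j
    # lies within `window` positions of token i along the 1-D token sequence.
    # (Assumption: the real model would likely window over the 2-D patch grid.)
    idx = torch.arange(n_tokens)
    return (idx[None, :] - idx[:, None]).abs() <= window


def masked_self_attention(x: torch.Tensor, window: int) -> torch.Tensor:
    # Single-head scaled dot-product attention with scores outside the local
    # window set to -inf, so softmax assigns them zero weight.
    # x: (batch, n_tokens, dim); uses x as query/key/value for brevity,
    # omitting the learned projections a real ViT block would apply.
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d ** 0.5        # (B, N, N)
    mask = local_attention_mask(x.size(1), window).to(x.device)
    scores = scores.masked_fill(~mask, float("-inf"))  # block distant tokens
    return torch.softmax(scores, dim=-1) @ x


def multi_scale_local_attention(x: torch.Tensor, windows=(2, 4, 8)) -> torch.Tensor:
    # Average the outputs at several window sizes to approximate a
    # multi-scale localized receptive field (window sizes are illustrative).
    return torch.stack([masked_self_attention(x, w) for w in windows]).mean(dim=0)


tokens = torch.randn(1, 64, 32)            # 64 patch tokens, 32-dim embeddings
out = multi_scale_local_attention(tokens)
print(out.shape)                           # torch.Size([1, 64, 32])

The sketch only shows how a masked softmax yields a localized receptive field and how combining several window sizes makes it multi-scale; the paper's actual focal attention may differ in how windows are shaped and aggregated.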
© (2023) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Bin Chen and Xin Feng "Focal ViT: image transformer catches up with CNN on small datasets", Proc. SPIE 12645, International Conference on Computer, Artificial Intelligence, and Control Engineering (CAICE 2023), 1264519 (23 May 2023); https://doi.org/10.1117/12.2681103
KEYWORDS: Transformers, Visual process modeling, Data modeling, Computer vision technology, Yield improvement