Multi-attention aggregation network for remote sensing scene classification
28 November 2023
Xin Wang, Yingying Li, Aiye Shi, Huiyu Zhou
Abstract

Remote sensing (RS) scene classification is a highly challenging task because of the distinctive characteristics of RS scenes, such as high intra-class variability, high inter-class similarity, and objects that appear at widely different scales. Attention, an important mechanism of the human visual system, can emphasize meaningful features in deep neural networks, which helps improve classification performance. Motivated by this, we present a multi-attention aggregation network (MAANet), which contains several specially designed attention models, for precise RS scene classification. First, a gated attention fluid coding structure is constructed to mine hierarchical gated attention features from RS images. Second, a progressive pyramid refinement architecture is designed to explore correlations among cross-layer attention features and learn enhanced multi-scale representations. Third, a two-stream attention aggregation structure, equipped with three different attention models, is developed to guide the generation of aggregated features. Finally, a scene label prediction module outputs the predicted scene label. We conduct extensive experiments on three well-known RS scene datasets, and the results show that MAANet outperforms a number of representative state-of-the-art approaches for RS scene classification.
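The abstract does not describe the internals of MAANet's attention models. As a generic illustration of how an attention module can "emphasize meaningful features", here is a minimal pure-Python sketch of squeeze-and-excitation style channel attention. This is a well-known technique swapped in for illustration only; it is not the authors' gated attention fluid coding structure or their two-stream aggregation design. All function names and the weights `w1`, `b1`, `w2`, `b2` are hypothetical, and the weights are hand-picked rather than learned.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one H x W map per channel


def global_avg_pool(feature: List[Matrix]) -> List[float]:
    """Squeeze: average each channel's H x W map into a single descriptor."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]


def dense(x: List[float], w: Matrix, b: List[float]) -> List[float]:
    """Fully connected layer y = W x + b, with W stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(
    feature: List[Matrix], w1: Matrix, b1: List[float], w2: Matrix, b2: List[float]
) -> Tuple[List[Matrix], List[float]]:
    """Excite: bottleneck MLP -> sigmoid gates in (0, 1), then rescale each channel."""
    s = global_avg_pool(feature)
    hidden = [max(0.0, h) for h in dense(s, w1, b1)]  # ReLU
    gates = [sigmoid(g) for g in dense(hidden, w2, b2)]
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(feature, gates)]
    return scaled, gates


# Usage: a 2-channel, 2 x 2 feature map with a 1-unit bottleneck (hypothetical weights).
feature = [[[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[2.0], [-2.0]], [0.0, 0.0]
scaled, gates = channel_attention(feature, w1, b1, w2, b2)
# gates ≈ [0.953, 0.047]: channel 0 is emphasized, channel 1 is suppressed
```

The sigmoid gates rescale whole channels rather than individual pixels, which is the simplest form of feature emphasis; spatial or multi-scale attention, as the abstract describes for MAANet, would compute gates over different axes or across layers.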

© 2023 Society of Photo-Optical Instrumentation Engineers (SPIE)
Xin Wang, Yingying Li, Aiye Shi, and Huiyu Zhou "Multi-attention aggregation network for remote sensing scene classification," Journal of Applied Remote Sensing 17(4), 046508 (28 November 2023). https://doi.org/10.1117/1.JRS.17.046508
Received: 16 July 2023; Accepted: 14 November 2023; Published: 28 November 2023
KEYWORDS
Remote sensing

Convolution

Education and training

Transformers

Scene classification

Semantics

Visualization
