TransUNetFormer: let hybrid convolutional neural network + transformer encoder and decoder provide powerful support for remote sensing image segmentation
31 January 2025
Siyong Liu, Yili Zhao
Abstract

We address the complexity of semantic segmentation of remote sensing images, a fundamental task in earth science research, and the importance of multi-scale information to it. We propose TransUNetFormer, a U-Net-like architecture that integrates convolutional neural network (CNN) + transformer fusion throughout both the encoder and the decoder, so that global contextual information and local feature details are both emphasized. TransUNetFormer achieves superior generalization for remote sensing image segmentation, particularly in capturing multi-scale features within its encoder-decoder architecture. The encoder follows design principles inspired by TransUNet, combining CNN and transformer components into an efficient hybrid. In the decoder, a CNN + transformer hybrid block, DP-hybrid, efficiently captures rich global-local features at each upsampling step. We introduce a fusion-concatenation module that dynamically generates weights during the interaction between the encoder and decoder, facilitating feature map fusion. Finally, an efficient feature refinement segmentation head is devised to purify shallow-stage encoder features and optimize the deepest global-local features in the decoder before fusing them into the output. Experimental results on two widely used datasets, ISPRS Potsdam and LoveDA Urban, show the effectiveness and potential of TransUNetFormer. To our knowledge, this is the first hybrid CNN + transformer network specifically designed for remote sensing image segmentation.
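
The abstract does not specify how the fusion-concatenation module computes its weights. As a rough illustration of the general idea of input-dependent weighting between an encoder skip feature and a decoder feature, the following minimal NumPy sketch may help. It is not the authors' module: the function name `gated_fusion`, the gate parameters `w_gate` and `b_gate`, and the choice of global average pooling followed by a sigmoid gate are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(enc, dec, w_gate, b_gate):
    """Blend encoder and decoder feature maps with input-dependent weights.

    Hypothetical sketch, not the paper's fusion-concatenation module.

    enc, dec : arrays of shape (C, H, W) at the same resolution
    w_gate   : array of shape (C, 2C), assumed learned gate weights
    b_gate   : array of shape (C,), assumed learned gate bias
    Returns the fused map of shape (C, H, W) and the per-channel weights.
    """
    if enc.shape != dec.shape:
        raise ValueError("encoder and decoder features must share a shape")

    # Summarize each channel of both inputs by global average pooling.
    descriptor = np.concatenate([enc.mean(axis=(1, 2)), dec.mean(axis=(1, 2))])

    # One weight in (0, 1) per channel, computed from the inputs themselves.
    alpha = sigmoid(w_gate @ descriptor + b_gate)

    # Convex combination: alpha favors the encoder, (1 - alpha) the decoder.
    a = alpha[:, None, None]
    fused = a * enc + (1.0 - a) * dec
    return fused, alpha


# Usage: with a zero gate every channel weight is 0.5, so the result
# is the plain average of the two feature maps.
enc = np.ones((4, 8, 8))
dec = 3.0 * np.ones((4, 8, 8))
fused, alpha = gated_fusion(enc, dec, np.zeros((4, 8)), np.zeros(4))
```

Because the weights are recomputed from the feature maps on every forward pass, the balance between skip-connection detail and decoder context can vary from image to image, which is the property the abstract attributes to dynamic weight generation.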

© 2025 SPIE and IS&T

Siyong Liu and Yili Zhao "TransUNetFormer: let hybrid convolutional neural network + transformer encoder and decoder provide powerful support for remote sensing image segmentation," Journal of Electronic Imaging 34(1), 013024 (31 January 2025). https://doi.org/10.1117/1.JEI.34.1.013024
Received: 10 October 2024; Accepted: 3 January 2025; Published: 31 January 2025
Keywords

Image segmentation, Transformers, Remote sensing, Semantics, Convolutional neural networks, Feature fusion, Head