We propose a novel deep learning approach that performs building semantic segmentation of large-scale textured 3D meshes, followed by polygonal extraction of footprints and heights. Extracting accurate individual building structures is challenging due to the complexity and variety of architecture and urban design, for which a single overhead image is not enough. Integrating elevation data from a 3D mesh makes it possible to better distinguish individual buildings in three-dimensional space. It also avoids the occlusion issues of oblique (non-nadir) imagery, where tall buildings mask smaller buildings behind them, a problem that is especially acute in urban areas. The proposed method transforms the input data from a 3D textured mesh into a true orthorectified RGB image by rendering both the color information and the depth information from a virtual camera looking straight down. The depth information is then converted into a normalized DSM (nDSM) by subtracting the Copernicus GDEM v3 30-meter Digital Elevation Model (DEM). Viewing the 3D textured mesh as a four-band raster image (RGB + nDSM) allows us to use a very efficient fully convolutional neural network based on the U-Net architecture for processing large-scale areas. The proposed method was evaluated on three urban areas in Brazil, the United States, and France, and yields a fourfold productivity improvement for building cartography in complex urban areas.
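As a minimal sketch of the input preparation described above (not the authors' code), the rendered depth can be treated as a DSM, reduced to an nDSM by subtracting the co-registered DEM, and stacked with the RGB render into the four-band tensor fed to the U-Net. The function name, array shapes, and the 100 m clipping height are illustrative assumptions; the DSM and DEM are assumed to be already resampled onto the same grid.

```python
# Hypothetical sketch: build the RGB + nDSM four-band raster.
import numpy as np

def make_four_band_input(rgb, dsm, dem, clip_height_m=100.0):
    """rgb: (H, W, 3) uint8 render from the nadir virtual camera.
    dsm: (H, W) float32 surface elevations from the rendered depth.
    dem: (H, W) float32 ground elevations (e.g. the 30 m DEM).
    Returns an (H, W, 4) float32 tensor with all channels in [0, 1]."""
    ndsm = np.clip(dsm - dem, 0.0, clip_height_m)  # height above ground
    ndsm = ndsm.astype(np.float32) / clip_height_m  # normalize to [0, 1]
    rgb_norm = rgb.astype(np.float32) / 255.0
    return np.dstack([rgb_norm, ndsm])              # (H, W, 4)
```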
Semantic segmentation of satellite images is used to automatically detect and classify objects of interest over very large areas. Training a neural network for this task generally requires large amounts of human-annotated ground-truth classification masks for each object class of interest. We aim to reduce the human time spent in the whole image segmentation process by learning generic features in an unsupervised manner. These features are then used to leverage sparse human annotations to compute a dense segmentation of the image. This is achieved by labeling groups of semantically similar pixels at once, instead of labeling each pixel almost individually with strokes. While we apply this method to satellite images, our approach is generic and can be applied to any image and to any class of objects in that image.
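One simple way to realize the densification step described above (a plausible stand-in, not necessarily the paper's exact method) is a nearest-neighbor classifier in the learned per-pixel feature space: every unannotated pixel receives the label of the most similar stroke-annotated pixels. The `features` array and the function below are hypothetical.

```python
# Hypothetical sketch: densify sparse stroke labels via k-NN in feature space.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def densify_labels(features, sparse_labels, k=5):
    """features: (H, W, D) per-pixel embeddings from an unsupervised encoder.
    sparse_labels: (H, W) int array, -1 where unannotated.
    Returns a dense (H, W) label map."""
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    labels = sparse_labels.ravel()
    annotated = labels != -1
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(flat[annotated], labels[annotated])  # fit on stroke pixels only
    return knn.predict(flat).reshape(h, w)       # label every pixel at once
```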
Automated 3D reconstruction of urban models from satellite images remains a challenging research topic, with many promising downstream applications such as telecommunications and urban simulation. To reconstruct a 3D city model from stereo pairs of satellite images, semi-automatic strategies are typically applied, based either on procedural modeling or on a combination of image processing and machine learning methods that infer scene geometry together with semantics. In both cases, human interaction still plays a key role, in particular for building rooftop extraction. In the last decade, the use of deep learning algorithms, notably convolutional neural networks (CNNs), has shown remarkable success in automatic image interpretation. We propose a CNN-based approach to automate building contour extraction, with the final purpose of automating the 3D urban reconstruction chain and improving the quality of the generated city models. The developed algorithm consists of three steps: 1) a mask-based normalization technique is applied to the input image; 2) a CNN is applied to obtain a raster map of buildings; 3) a polygonization algorithm processes the raster building map to output a set of building contours. We adopted a U-Net neural network for the building segmentation task and compare several U-Net architectures to retain the best-suited model. To train the models, we built a dataset of high-resolution satellite images over 15 different cities, together with the corresponding building masks. The experimental results show that the proposed approach predicts building polygons in a short time and generalizes well to diverse Earth areas. Combined with the existing LuxCarta reconstruction chain, the developed algorithm improves 3D urban scene modeling results and thus provides an important step towards the automatic reconstruction of 3D city scenes.
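The polygonization step (step 3) can be illustrated with standard geospatial tooling; the sketch below is an assumption-laden stand-in for the paper's algorithm, not LuxCarta's implementation. It vectorizes the binary building raster with rasterio and simplifies the contours with shapely; `min_area` and `tolerance` are illustrative values assuming a projected CRS in meters.

```python
# Hypothetical sketch: vectorize a binary building mask into polygons.
import numpy as np
from rasterio import features
from shapely.geometry import shape

def polygonize_buildings(mask, transform, min_area=25.0, tolerance=1.0):
    """mask: (H, W) uint8 binary raster (1 = building, 0 = background).
    transform: affine georeferencing transform of the raster.
    Returns a list of simplified shapely Polygons."""
    polygons = []
    for geom, value in features.shapes(mask, mask=mask.astype(bool),
                                       transform=transform):
        if value == 1:
            poly = shape(geom).simplify(tolerance)  # Douglas-Peucker
            if poly.area >= min_area:               # drop tiny blobs
                polygons.append(poly)
    return polygons
```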