Open Access Paper
Evaluating the effectiveness of improved digital soil maps, generated through a hybrid CNN-XGBoost approach, for estimating soil loss due to water erosion
13 September 2024
N. Tziolas, N. Samarinas, I. Tsividis, G. Zalidis
Proceedings Volume 13212, Tenth International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2024); 132120T (2024) https://doi.org/10.1117/12.3037235
Event: Tenth International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2024), 2024, Paphos, Cyprus
Abstract
Improved management of grazing resources has proven effective in mitigating soil erosion and enhancing carbon sequestration. Efficient monitoring of soil descriptors plays a crucial role in achieving this goal, as it provides valuable information for estimating soil loss by water erosion with the Revised Universal Soil Loss Equation (RUSLE) model. The accuracy of the RUSLE model depends on the quality of its input soil data, namely soil texture and organic carbon. However, existing soil spatial products are created with conventional machine learning methods that combine spaceborne spectral data with environmental covariates, resulting in moderate performance and coarse resolution. Novel approaches are therefore needed to tackle the challenge posed by the synergistic framework of data analytics, which requires effective fusion of multispectral data with environmental and topographical covariates. In this study, we explore the potential of a deep learning architecture to obtain a new data representation from spaceborne Sentinel-2 information for the regression task. Concurrently, we feed an eXtreme Gradient Boosting (XGBoost) regressor with the 128 features extracted by a convolutional neural network (CNN). In addition, 85 spatial layers representing landscape features and bioclimatic variables were used as input features to the XGBoost regressor. The CNN-XGBoost model was trained on a subset of 83 Greek soil samples corresponding to grassland from the LUCAS 2015 dataset. The generation of enhanced soil input layers, including clay and organic carbon, resulted in a reduction of RMSE. These spatial products were integrated into RUSLE to improve the soil erodibility factor, leading to a soil erosion layer with higher spatial resolution (10 m).
Mapping conducted at a study site with significant areas of grassland in Elassona, Greece, highlights the benefits of our approach compared with existing soil products.

1.

INTRODUCTION

Grasslands should be managed to promote both soil and rangeland quality simultaneously, especially in countries whose agricultural production relies on livestock farming. There is therefore a need for improved management of grazing resources, which can result in productive, biodiverse grasslands while also mitigating soil erosion and enhancing carbon sequestration [1]. In this context, recent studies highlight the need for efficient monitoring of key soil descriptors to estimate soil loss by water erosion more accurately with the Revised Universal Soil Loss Equation (RUSLE) model [2], while state-of-the-art Artificial Intelligence (AI) and data mining techniques are considered critical to achieving this goal [3]. More specifically, improved soil layers can be introduced into RUSLE's soil erodibility factor (K-factor), producing a more reliable soil erosion layer with improved spatial resolution.

Existing spatially explicit soil indicators offer moderate performance at coarse resolution and rely mainly on environmental covariates fed into machine learning models [4]. Several techniques, such as Random Forest and eXtreme Gradient Boosting (XGBoost), have yielded promising results, and deep learning techniques have also been used, some of them within a synergistic framework. In parallel, the Sentinel-2 satellite has been extensively employed to map soil texture from multispectral data. However, simple merging techniques may not fully exploit the complementary nature of these data sources, potentially resulting in information loss or misinterpretation.

Novel approaches are therefore needed to tackle the challenge posed by the synergistic framework of Earth Observation (EO) data analytics, which requires effective fusion of multispectral data with environmental and topographical covariates. In this study, we explore the potential of a deep learning architecture to obtain a new data representation from spaceborne Sentinel-2 information for the regression task. Concurrently, we employ an XGBoost regressor that uses features extracted by a convolutional neural network (CNN).

2.

MATERIALS AND METHODS

2.1

Study Area

For this work we selected a mountainous agricultural area in the Elassona region, Greece. The region has a Mediterranean climate (temperature: 30-35°C; precipitation: 600-800 mm) with continental influences due to its inland position and varied elevation. Owing to its fertile valleys and slopes, Elassona holds a significant share of Greece's livestock capital, particularly goats and sheep, some of which are free-grazing. Assessing environmental degradation is therefore crucial for the stakeholders involved.

2.2

Datasets

2.2.1

Multispectral data and environmental covariates

The Copernicus Sentinel-2 archive was used to access multispectral imagery from 2018 to 2023. Cloudy pixels were filtered with a 10% threshold to ensure data quality, and the mean value of each band was then computed to provide insights into temporal trends and variability in land surface reflectance. In addition, we derived several geo-covariates: vegetation variables such as the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) and Land Surface Temperature, both from MODIS, as well as climate data comprising mean temperature and yearly precipitation. Terrain factors such as a Digital Elevation Model (DEM) and its derivatives were also calculated to describe landscape topography. Further details on these covariates are provided in Table 1.
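The per-band compositing step can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: it assumes the imagery is already loaded as a NumPy array of shape (time, bands, rows, cols), that a per-pixel cloud-probability layer in percent is available for each acquisition, and that the 10% criterion means masking pixels whose cloud probability exceeds 10%. The function name `cloud_masked_band_means` is hypothetical.

```python
import numpy as np


def cloud_masked_band_means(stack, cloud_prob, threshold=10.0):
    """Per-band temporal mean of a Sentinel-2 stack after masking cloudy pixels.

    stack:      reflectance array, shape (time, bands, rows, cols)
    cloud_prob: cloud probability in percent, shape (time, rows, cols)
    threshold:  pixels with cloud probability above this value are dropped
                (assumed interpretation of the 10% criterion)
    Returns shape (bands, rows, cols). Pixels cloudy on every date become NaN
    (NumPy emits a RuntimeWarning for them).
    """
    stack = np.asarray(stack, dtype=float)
    cloudy = np.asarray(cloud_prob, dtype=float) > threshold  # (time, rows, cols)
    # Broadcast the mask over the band axis and replace cloudy values with NaN.
    masked = np.where(cloudy[:, np.newaxis, :, :], np.nan, stack)
    # nanmean ignores the masked observations when averaging over time.
    return np.nanmean(masked, axis=0)


# Toy example: 3 dates, 12 bands, a single pixel; the second date is cloudy.
values = np.array([0.10, 0.90, 0.30]).reshape(3, 1, 1, 1)
stack = np.broadcast_to(values, (3, 12, 1, 1))
cloud_prob = np.array([2.0, 85.0, 7.0]).reshape(3, 1, 1)
composite = cloud_masked_band_means(stack, cloud_prob)
print(composite.shape, composite[0, 0, 0])
```

The cloudy second acquisition is excluded, so each band's composite is the mean of the two clear observations.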

Table 1.

Geo-environmental covariates used in the current study

Covariates | Description | Source
---|---|---
Land Surface Temperature | Captures temperature variations across land surfaces, including daytime, nighttime, and their standard deviation | [5]
Vegetation | Indicates the presence and health of vegetation through variables such as FAPAR | MODIS
Climate Data | Mean temperature and yearly precipitation, which are essential climate indicators | [6]
Topographical | Elevation and slope of the terrain surface | Copernicus DEM
Soil | Soil characteristics such as texture, composition, and water content | [7]

2.2.2

Soil Data LUCAS

The soil data used in this study came from the LUCAS soil archive (2015 version). More specifically, the data include soil texture (clay, sand, silt) and soil organic carbon (SOC) content measurements from 338 points corresponding to the Grassland, Shrubland, and Woodland land cover classes. A detailed description of the LUCAS soil archive is provided by Orgiazzi et al. [8].

2.3

Methodological Approach

2.3.1

AI Approach

The proposed approach comprises two steps: data collection, followed by regression analysis with a hybrid CNN-XGBoost algorithm (Figure 1).

Figure 1.

Overall approach from data collection to regression


The CNN, acting as a feature generator, starts with an input layer that accepts one-dimensional inputs of length 12, corresponding to the Sentinel-2 bands. This is followed by two convolutional layers with 3×1 kernels and 48 and 24 filters, respectively, each followed by a Leaky ReLU activation. The resulting feature maps are then flattened and passed through two fully connected layers with 128 and 48 neurons, respectively, also with Leaky ReLU activations.

The spectral features generated by the CNN, together with the environmental covariates, were then fed into an XGBoost algorithm, which builds a series of decision trees sequentially. XGBoost is considered a powerful ensemble model because each new tree corrects the residual errors of the trees built before it. After a grid search, the following hyperparameters were selected for XGBoost: 'number of estimators' set to 60, 'minimum samples per split' to 4, 'minimum samples per leaf' to 2, 'maximum depth' to 4, and 'learning rate' to 0.05. For comparison with our approach, the same hyperparameters were used to calibrate a standalone XGBoost model that took the Sentinel-2 data together with the environmental covariates as input features. The models were trained over 50 random splits of the data. Regression performance was assessed with the Root Mean Squared Error (RMSE), the concordance correlation coefficient (CCC), and the Ratio of Performance to InterQuartile distance (RPIQ).
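To make the layer shapes concrete, the sketch below runs an untrained forward pass of the described CNN in plain NumPy. Several details are assumptions not stated in the text: 'valid' padding and stride 1 in the convolutions, a Leaky ReLU negative slope of 0.01, He-style random initialization, and taking the 128-neuron layer as the exported feature vector (following the 128 features mentioned in the abstract). In the actual workflow the weights would be learned during training; all function names are hypothetical.

```python
import numpy as np

N_BANDS = 12  # one value per Sentinel-2 band


def leaky_relu(x, slope=0.01):
    # The negative slope is an assumed value.
    return np.where(x > 0, x, slope * x)


def conv1d_valid(x, w, b):
    """1-D convolution, stride 1, no padding.
    x: (length, in_ch); w: (kernel, in_ch, out_ch); b: (out_ch,)."""
    k = w.shape[0]
    out_len = x.shape[0] - k + 1
    out = np.empty((out_len, w.shape[2]))
    for i in range(out_len):
        # Contract the window over (kernel, in_ch), leaving (out_ch,).
        out[i] = np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1])) + b
    return out


def init_params(seed=0):
    rng = np.random.default_rng(seed)

    def he(shape, fan_in):
        return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)

    flat = (N_BANDS - 4) * 24  # two 3x1 'valid' convs: 12 -> 10 -> 8 positions, 24 filters
    return {
        "c1_w": he((3, 1, 48), 3), "c1_b": np.zeros(48),
        "c2_w": he((3, 48, 24), 3 * 48), "c2_b": np.zeros(24),
        "d1_w": he((flat, 128), flat), "d1_b": np.zeros(128),
        "d2_w": he((128, 48), 128), "d2_b": np.zeros(48),
    }


def cnn_features(bands, p):
    """Forward pass for one sample; returns the 128- and 48-unit activations."""
    x = np.asarray(bands, dtype=float).reshape(N_BANDS, 1)  # (12, 1)
    h = leaky_relu(conv1d_valid(x, p["c1_w"], p["c1_b"]))   # (10, 48)
    h = leaky_relu(conv1d_valid(h, p["c2_w"], p["c2_b"]))   # (8, 24)
    h = h.reshape(-1)                                       # (192,)
    d128 = leaky_relu(h @ p["d1_w"] + p["d1_b"])            # (128,)
    d48 = leaky_relu(d128 @ p["d2_w"] + p["d2_b"])          # (48,)
    return d128, d48


params = init_params()
rng = np.random.default_rng(1)
spectra = rng.uniform(0.0, 0.5, N_BANDS)  # placeholder reflectances
covariates = rng.normal(size=85)          # placeholder for the 85 spatial layers
d128, d48 = cnn_features(spectra, params)
xgb_row = np.concatenate([d128, covariates])  # one input row for the XGBoost regressor
print(d128.shape, d48.shape, xgb_row.shape)
```

Under these assumptions each regressor input row has 128 + 85 = 213 columns. In the xgboost package's scikit-learn interface, `XGBRegressor` accepts `n_estimators=60`, `max_depth=4` and `learning_rate=0.05`; the 'minimum samples per split/leaf' settings are named after scikit-learn tree parameters and have no one-to-one XGBoost counterpart (the closest is `min_child_weight`).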

2.3.2

RUSLE Approach

The average annual soil loss was estimated with the empirical RUSLE equation [9], using improved AI geospatial layers and open-access EO datasets:

$$A = R \cdot K \cdot LS \cdot C \cdot P$$

where A is the average annual soil loss (ton/ha/yr) and the remaining factors are explained below:

  • R-factor—Rainfall erosivity

    For the R-factor (MJ·mm/ha/h/yr), the ERA5 dataset was used, covering a time series from 2007 to 2023 at 1 km spatial resolution, together with the equation of Wischmeier and Smith (1978) [10]:

    $$R = \sum_{m=1}^{12} 1.735 \times 10^{\left(1.5 \log_{10}\frac{P_m^{2}}{P_a} - 0.08188\right)}$$

    where Pm is the monthly precipitation in mm and Pa is the annual precipitation in mm.

  • K-factor—Soil erodibility

    The K-factor [(t·ha·h)/(ha·MJ·mm)] was calculated from the enhanced AI layers (SOC and soil texture) generated in this work, following the methodology proposed in [2] and the equation described in [11]:

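    $$K = \left[0.2 + 0.3\,e^{-0.0256\,Sand\left(1 - \frac{Silt}{100}\right)}\right]\left(\frac{Silt}{Clay + Silt}\right)^{0.3}\left[1 - \frac{0.25\,OC}{OC + e^{3.72 - 2.95\,OC}}\right]\left[1 - \frac{0.7\,SN}{SN + e^{-5.51 + 22.9\,SN}}\right]$$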

    where Sand, Silt, Clay and OC are the percentage contents of sand, silt, clay and organic carbon, and SN = 1 − Sand/100.

  • C-factor—Crop cover and management

    In this study, the C-factor is derived from the median value of the NDVI, calculated from bands B4 and B8 of multi-temporal Sentinel-2 imagery at 10 m spatial resolution, using the following expression:

    $$C = \exp\left(-\alpha \, \frac{NDVI}{b - NDVI}\right)$$

    where α and b are unitless parameters that determine the shape of the curve relating NDVI to the C-factor.

  • LS-factor—Slope length and steepness

    The RUSLE topographic factor (LS), which combines the slope length factor (L) and the slope steepness factor (S), was calculated from the Copernicus DEM (30 m) and a flow accumulation layer downloaded from https://github.com/davidbrochart/flow_acc_3s, a 3 arc-second flow accumulation derived from HydroSHEDS. Finally, the equations proposed by [12] were used:

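    $$L = \left(\frac{\lambda}{22.13}\right)^{m}$$
    $$m = \frac{\beta}{1 + \beta}$$
    $$\beta = \frac{\sin\theta / 0.0896}{3(\sin\theta)^{0.8} + 0.56}$$
    $$S = \begin{cases} 10.8\sin\theta + 0.03, & \tan\theta < 0.09 \\ 16.8\sin\theta - 0.50, & \tan\theta \geq 0.09 \end{cases}$$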

    where L is the slope length factor; S is the slope steepness factor; λ is the horizontal projected slope length (based on flow accumulation); m is a variable length-slope exponent; β is a factor that varies with slope gradient; and θ is the slope angle.

  • P-factor—Support practices

    In this study, the P-factor dataset developed by [13], with a spatial resolution of 1 km (https://esdac.jrc.ec.europa.eu/themes/support-practices-factor), was used.
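The factor definitions above can be combined into a per-pixel calculation, sketched below in NumPy under several assumptions: the R-factor uses the form commonly attributed to Wischmeier and Smith (1978), the K-factor uses the EPIC equation without any unit-conversion coefficient (a conversion may be needed depending on the unit convention adopted), α = 2 and b = 1 are example C-factor values frequently used in the literature but not stated here, and λ is supplied directly rather than derived from flow accumulation. All rasters are assumed co-registered; function names are hypothetical.

```python
import numpy as np


def r_factor(monthly_precip_mm):
    """Rainfall erosivity from 12 monthly totals (axis 0), assumed form:
    R = sum_m 1.735 * 10**(1.5*log10(Pm**2 / Pa) - 0.08188)."""
    pm = np.asarray(monthly_precip_mm, dtype=float)
    pa = pm.sum(axis=0)
    with np.errstate(divide="ignore"):  # dry months: log10(0) = -inf, term becomes 0
        terms = 1.735 * 10.0 ** (1.5 * np.log10(pm**2 / pa) - 0.08188)
    return terms.sum(axis=0)


def k_factor_epic(sand, silt, clay, oc):
    """EPIC soil erodibility from percentage sand, silt, clay and organic carbon."""
    sand, silt, clay, oc = (np.asarray(v, dtype=float) for v in (sand, silt, clay, oc))
    sn = 1.0 - sand / 100.0
    f_csand = 0.2 + 0.3 * np.exp(-0.0256 * sand * (1.0 - silt / 100.0))
    f_clsi = (silt / (clay + silt)) ** 0.3
    f_orgc = 1.0 - 0.25 * oc / (oc + np.exp(3.72 - 2.95 * oc))
    f_hisand = 1.0 - 0.7 * sn / (sn + np.exp(-5.51 + 22.9 * sn))
    return f_csand * f_clsi * f_orgc * f_hisand


def c_factor_ndvi(ndvi, alpha=2.0, b=1.0):
    """C = exp(-alpha * NDVI / (b - NDVI)); alpha and b are example values."""
    # Assumption: negative NDVI is clipped to 0 (C = 1) and NDVI is kept below b.
    ndvi = np.clip(np.asarray(ndvi, dtype=float), 0.0, b - 1e-6)
    return np.exp(-alpha * ndvi / (b - ndvi))


def ls_factor_mccool(slope_rad, slope_length_m):
    """L = (lambda/22.13)**m with m = beta/(1+beta); S piecewise in slope angle."""
    theta = np.asarray(slope_rad, dtype=float)
    lam = np.asarray(slope_length_m, dtype=float)
    sin_t = np.sin(theta)
    beta = (sin_t / 0.0896) / (3.0 * sin_t**0.8 + 0.56)
    m = beta / (1.0 + beta)
    L = (lam / 22.13) ** m
    S = np.where(np.tan(theta) < 0.09, 10.8 * sin_t + 0.03, 16.8 * sin_t - 0.50)
    return L * S


def soil_loss(R, K, LS, C, P):
    """RUSLE: A = R * K * LS * C * P, evaluated element-wise."""
    return R * K * LS * C * P


# Toy single-pixel example (all inputs are made-up values).
monthly = np.full(12, 50.0)  # 600 mm/yr spread evenly across months
A = soil_loss(r_factor(monthly), k_factor_epic(40, 40, 20, 1.0),
              ls_factor_mccool(np.arctan(0.2), 22.13), c_factor_ndvi(0.5), 1.0)
print(float(A))
```

Because every function works element-wise, the same calls apply unchanged to whole 2-D rasters once they share a grid.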

3.

RESULTS

3.1

AI Results

Table 2 presents the results obtained by the XGBoost and CNN-XGBoost approaches in the Elassona region. The CNN-XGBoost approach outperforms XGBoost in predicting SOC, clay, and sand, the only exception being silt, for which the standalone XGBoost achieves a lower RMSE. High SOC values cannot be predicted accurately; however, SOC contents above 2 g/kg are not characteristic of our region.

Table 2.

Evaluation metrics considering CNN-XGBoost approach and XGBoost competing method

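Parameter | XGBoost CCC | XGBoost RMSE | XGBoost RPIQ | CNN-XGBoost CCC | CNN-XGBoost RMSE | CNN-XGBoost RPIQ
---|---|---|---|---|---|---
Clay | 0.63 | 75.29 | 1.86 | 0.64 | 74.63 | 1.88
Sand | 0.58 | 137.96 | 1.96 | 0.61 | 133.75 | 2.02
Silt | 0.55 | 87.75 | 2.17 | 0.51 | 88.21 | 2.13
SOC | 0.11 | 1.98 | 1.12 | 0.13 | 1.97 | 1.13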
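The three metrics in Table 2 can be computed as in the sketch below. It follows the usual definitions (Lin's CCC with population moments; RPIQ as the interquartile range of the observed values divided by the RMSE) rather than any code released with this work; the helper comparing the two RMSE columns of Table 2 is hypothetical.

```python
import numpy as np


def rmse(obs, pred):
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population variances/covariance)."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    cov = np.mean((obs - obs.mean()) * (pred - pred.mean()))
    return float(2.0 * cov / (obs.var() + pred.var() + (obs.mean() - pred.mean()) ** 2))


def rpiq(obs, pred):
    """Interquartile range of the observations divided by the RMSE."""
    q1, q3 = np.percentile(np.asarray(obs, dtype=float), [25, 75])
    return float((q3 - q1) / rmse(obs, pred))


def relative_rmse_change(rmse_reference, rmse_new):
    """Percentage RMSE reduction of a new model vs a reference (negative = worse)."""
    return 100.0 * (rmse_reference - rmse_new) / rmse_reference


# RMSE values from Table 2: (XGBoost, CNN-XGBoost)
table2_rmse = {"Clay": (75.29, 74.63), "Sand": (137.96, 133.75),
               "Silt": (87.75, 88.21), "SOC": (1.98, 1.97)}
for prop, (xgb, cnn_xgb) in table2_rmse.items():
    print(f"{prop}: {relative_rmse_change(xgb, cnn_xgb):+.1f}%")
```

For the Table 2 values, the loop prints +0.9% (clay), +3.1% (sand), -0.5% (silt) and +0.5% (SOC), where a positive sign means a lower RMSE for CNN-XGBoost.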

The best models were used to produce spatial representations of soil texture and SOC content. Figure 2 shows the maps alongside the distribution of estimated values within the region of interest. The outcomes align with the distribution patterns observed in the Greek soil data archive for these soil variables.

Figure 2.

Soil spatial descriptors as estimated by the CNN-XGBoost algorithm


3.2

Soil loss estimations

First, each RUSLE factor was calculated (see Section 2.3.2), and the multiplication of all factors produced the final soil erosion map at 10 m spatial resolution (Figure 3). Our study area is characterized mainly by low to medium rainfall erosivity values, while the LS-factor takes mainly high values due to the prevailing steep slopes combined with the dense stream network. The area has an average soil loss of 4.6 ton/ha/yr, with values ranging from 0 to 51 ton/ha/yr. Soil loss is below 5 ton/ha/yr over 71% of the total area, while 4% of the area suffers soil loss above 20 ton/ha/yr.
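Area statistics of this kind can be read directly from the soil loss raster, as in the following sketch. It assumes an equal-area grid (all 10 m cells have the same area, so pixel shares equal area shares) and NaN for no-data cells; the function name and the toy values are illustrative only.

```python
import numpy as np


def soil_loss_summary(soil_loss, low=5.0, high=20.0):
    """Mean, range and area shares of a soil loss raster (ton/ha/yr)."""
    a = np.asarray(soil_loss, dtype=float)
    a = a[np.isfinite(a)]  # drop no-data (NaN) cells
    return {
        "mean": float(a.mean()),
        "min": float(a.min()),
        "max": float(a.max()),
        f"share_below_{low:g}": float(np.mean(a < low)),
        f"share_above_{high:g}": float(np.mean(a > high)),
    }


toy = np.array([[0.0, 1.5, 4.0], [6.0, 25.0, np.nan]])
print(soil_loss_summary(toy))
```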

Figure 3.

Spatial distribution of the produced RUSLE factors: a) C-factor, b) R-factor, c) LS-factor, d) K-factor and e) soil erosion


Since spatial resolution is expected to be the primary distinction between our soil erosion map and currently available products, we re-evaluated the readily available products. To that end, we performed an additional simulation using SOC and soil texture layers from the SoilGrids platform [https://soilgrids.org/], keeping the same datasets for the remaining RUSLE factors (C, R, P and LS), which produced a final soil erosion map at 250 m resolution. Although the two soil erosion products show a similar pattern (Figure 4), there are critical differences, arising from variations in the spatial distribution of the soil layers and in map resolution, that can lead to erroneous estimations. For instance, certain areas within the sub-region categorized as having low soil loss (<1 ton/ha/yr) may exhibit significantly higher soil loss in the coarser-resolution product.
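One part of the resolution effect can be illustrated by aggregating a 10 m grid to 250 m, i.e. averaging 25 × 25 blocks, as sketched below. This is only an illustration under simplifying assumptions: the SoilGrids-based map in this study was computed from different soil layers, not by aggregating our 10 m output, and the toy factor values are made up. Because RUSLE multiplies its factors, soil loss computed from coarse-averaged factors generally differs from the block average of a fine-resolution soil loss map.

```python
import numpy as np


def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks (trailing partial blocks are trimmed)."""
    g = np.asarray(grid, dtype=float)
    rows = (g.shape[0] // factor) * factor
    cols = (g.shape[1] // factor) * factor
    g = g[:rows, :cols]
    return g.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))


factor = 250 // 10  # 10 m cells per 250 m cell along each axis
rng = np.random.default_rng(0)
K = rng.uniform(0.02, 0.05, (50, 50))  # toy soil erodibility on a 10 m grid
LS = rng.gamma(2.0, 2.0, (50, 50))     # toy, highly variable topographic factor

fine_then_aggregate = block_mean(K * LS, factor)  # multiply at 10 m, then average
aggregate_then_multiply = block_mean(K, factor) * block_mean(LS, factor)
print(fine_then_aggregate.shape)  # (2, 2)
print(np.abs(fine_then_aggregate - aggregate_then_multiply).max())
```

Each coarse cell also hides the within-block spread, so a 10 m cell with very low soil loss can fall inside a 250 m cell whose single value is much higher.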

Figure 4.

Soil erosion maps produced at different spatial resolutions, using the current soil spatial products from the SoilGrids platform (left) and the products generated by the proposed CNN-XGBoost approach (right)


4.

DISCUSSION

The accuracy of the soil texture estimates was deemed acceptable, while SOC content showed the lowest predictive performance, with an RMSE of 1.98 g/kg (Table 2). Compared with other studies in the literature, our performance for soil texture is similar to that of research performed at regional scale. A significant share of recent studies relied on data sourced directly from the specific region rather than from national archives, as in our case, and the differences in our results can be attributed to this difference in data sources. The absence of ground data should therefore be noted as a limitation of the current study. Combining field observations with EO data would enhance our ability to quantify and calibrate AI-based soil erosion models more effectively.

Based on our results, the Elassona region is generally characterized by low to moderate erosion levels (Figure 4). This is a significant result compared with current estimates, which carry significant uncertainties and yield higher soil loss values because they are long-term averages produced with empirical models. Given the emphasis that a set of policies places on soil ecosystem protection through dedicated monitoring, improved estimations such as those proposed here, which integrate AI and high spatial resolution data, should be utilized. Enhanced soil products have demonstrated notable improvements when integrated into physical process models, and this integration presents an avenue for further exploration, particularly in soil loss estimation [14]. The products illustrated in Figure 3 allow more timely and consistent estimations, facilitating the monitoring of soil loss at a scale suitable for proposing best practices. Our model, which effectively fused spectral information with environmental covariates, has also facilitated the interpretation of results, and Shapley analysis [15] can be employed to further clarify the contributions of different variables to the predictions. Moreover, further studies could explore techniques that integrate additional factors influencing soil erosion, such as management practices, through an interpretable approach [16] of post hoc analysis, in order to derive recommendations on the most effective management practices (e.g., cover crops, buffer strips) for reducing soil loss [17].
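As a minimal illustration of what a Shapley analysis [15] attributes, the sketch below computes exact Shapley values by enumerating all feature subsets, replacing "absent" features with a baseline vector (one of several possible Shapley formulations). The toy model and values are hypothetical. Exact enumeration scales as 2^n and is practical only for a few features; for tree ensembles such as XGBoost, tree-specific algorithms such as the `TreeExplainer` in the `shap` package are normally used instead.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(subset):
        z = baseline.copy()
        idx = list(subset)
        z[idx] = x[idx]  # features in the coalition take their actual values
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def toy_model(z):
    # Hypothetical model: additive effect of feature 0, interaction of features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)
print(phi, phi.sum(), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term is split equally between the two features that form it.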

5.

CONCLUSION

In this study, we proposed an approach to enhance the spatial representation of soil loss estimation by water erosion by leveraging deep learning techniques. Our approach integrated a CNN, which handles multispectral data from Sentinel-2, with an XGBoost regressor complemented by landscape features and bioclimatic variables. By training our CNN-XGBoost model on Greek soil samples from the LUCAS 2015 dataset, we improved the soil input layers, reducing RMSE by approximately 5%. These enhanced spatial products were integrated into the RUSLE framework, improving the soil erodibility factor and yielding a soil erosion layer at 10 m spatial resolution. Our mapping in Elassona, Greece, provided evidence of the efficacy of our approach compared with existing soil products, which overestimate the current situation and carry significant uncertainties. The approach shows strong potential for soil erosion monitoring and for evaluating the impact of various management practices and restoration policies.

ACKNOWLEDGMENTS

The research leading to these results was carried out within the Earthgraze project, which received funding from M16.1 of the Rural Development Program 2014-2020, Ministry of Rural Development and Food of the Hellenic Republic.

REFERENCES

[1] 

Petermann, J. S. and Buzhdygan, O. Y., “Grassland biodiversity,” Current Biology, 31 (19), R1195–R1201 (2021). https://doi.org/10.1016/j.cub.2021.06.060

[2] 

Samarinas, N., Tsakiridis, N. L., Kalopesa, E., and Zalidis, G. C., “Soil loss estimation by water erosion in agricultural areas introducing artificial intelligence geospatial layers into the rusle model,” Land, 13, 174 (2024). https://doi.org/10.3390/land13020174

[3] 

Kalopesa, E., Tsakiridis, N. L., Boletos, G., Tziolas, N., and Zalidis, G. C., “The greek soil data cube in support of generating soil-related analysis ready data,” [IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium], IEEE (2023). https://doi.org/10.1109/IGARSS52108.2023.10281582

[4] 

Poggio, L., de Sousa, L. M., Batjes, N. H., Heuvelink, G. B. M., Kempen, B., Ribeiro, E., and Rossiter, D., “Soilgrids 2.0: producing soil information for the globe with quantified spatial uncertainty,” SOIL, 7 (1), 217–240 (2021). https://doi.org/10.5194/soil-7-217-2021

[5] 

Hengl, T. and Parente, L., “Long-term MODIS LST day-time and night-time temperatures, sd and differences at 1 km based on the 2000–2020 time series,” (2022).

[6] 

Karger, D. N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R. W., Zimmermann, N. E., Linder, H. P., and Kessler, M., “Climatologies at high resolution for the earth’s land surface areas,” Scientific Data, 4 (2017), 2161.

[7] 

Hengl, T., “Global landform and lithology class at 250 m based on the USGS global ecosystem map,” (2018).

[8] 

Orgiazzi, A., Ballabio, C., Panagos, P., Jones, A., and Fernández-Ugalde, O., “Lucas soil, the largest expandable soil dataset for europe: a review,” European Journal of Soil Science, 69 (1), 140–153 (2018). https://doi.org/10.1111/ejss.2018.69.issue-1

[9] 

Renard, K. G., Foster, G. R., Weesies, G. A., McCool, D., Yoder, D., et al., “Predicting soil erosion by water: a guide to conservation planning with the revised universal soil loss equation (rusle),” Agriculture Handbook (Washington), (703), (1997).

[10] 

Wischmeier, W. H. and Smith, D. D., “Predicting rainfall erosion losses: a guide to conservation planning,” Agriculture Handbook (United States Dept. of Agriculture), 537 (1978).

[11] 

Sharpley, A. N. and Williams, J. R., “EPIC: erosion/productivity impact calculator, 1. Model documentation,” United States Department of Agriculture Technical Bulletin Number 1768, USDA-ARS, Washington DC (1990).

[12] 

McCool, D. K., Brown, L. C., Foster, G. R., Mutchler, C. K., and Meyer, L. D., “Revised slope steepness factor for the universal soil loss equation,” Transactions of the ASAE, 30 (5), 1387–1396 (1987). https://doi.org/10.13031/2013.30576

[13] 

Panagos, P., Borrelli, P., Meusburger, K., van der Zanden, E. H., Poesen, J., and Alewell, C., “Modelling the effect of support practices (p-factor) on the reduction of soil erosion by water at european scale,” Environmental Science & Policy, 51, 23–34 (2015). https://doi.org/10.1016/j.envsci.2015.03.012

[14] 

Samarinas, N., Tziolas, N., and Zalidis, G., “Improved estimations of nitrate and sediment concentrations based on swat simulations and annual updated land cover products from a deep learning classification algorithm,” ISPRS International Journal of Geo-Information, 9 (10), (2020). https://doi.org/10.3390/ijgi9100576

[15] 

Sundararajan, M. and Najmi, A., “The many shapley values for model explanation,” CoRR, abs/1908.08474 (2019).

[16] 

Arrieta, A. B., Rodríguez, N. D., Ser, J. D., Bennetot, A., Tabik, S., Barbado, A., García, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., and Herrera, F., “Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI,” CoRR, abs/1910.10045 (2019).

[17] 

Schuler, J. and Sattler, C., “The estimation of agricultural policy effects on soil erosion: an application for the bio-economic model modam,” Land Use Policy, 27 (1), 61–69 (2010). https://doi.org/10.1016/j.landusepol.2008.05.001
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
KEYWORDS
Soil science

Spatial resolution

Artificial intelligence

Data archive systems

Data modeling

Sand

Temperature metrology
