An Evaluation of Some Machine Learning Algorithms as Tools for Predicting Soil Characteristics Based on Their Spectral Response in the Vis‑NIR Range
Corresponding Author(s): Stanisław Gruszczyński
Geomatics and Environmental Engineering, Vol. 15 No. 1 (2021)
Abstract
Using the Land Use and Coverage Area frame Survey (LUCAS) database of European soil surface layer properties, statistical and machine learning predictive models were compared for six key soil characteristics: clay content, pH in CaCl2, soil organic carbon (SOC) content, calcium carbonate (CaCO3) content, nitrogen (N) content and cation exchange capacity (CEC). The models were built on the spectral responses of the soils in the visible (Vis) and near-infrared (NIR) ranges. Standard methods of relationship modeling were used: stepwise regression, partial least squares regression and linear regression on inputs obtained from principal component analysis. Various machine learning algorithms were then trained on the inputs extracted by these statistical algorithms. The usefulness of the models was assessed by comparing their coefficients of determination, root mean square errors and distributions of residuals. In cross-validation, the root mean square errors of estimation for the stacked model combining the multilayer perceptron and the distributed random forest were approximately as follows: 4.5% for clay content; 0.35 for pH; 7.5 g/kg (0.75% by weight) for SOC; 19 g/kg for CaCO3 content; 0.50 g/kg for N content; and 3.5 cmol(+)/kg for CEC.
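The evaluation criteria named in the abstract (coefficient of determination, root mean square error and residuals, all computed on cross-validated predictions) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the data are synthetic rather than LUCAS spectra, a one-variable least-squares line stands in for any of the compared models, and every function and variable name here is hypothetical.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error, in the same units as the observed values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (a stand-in for any model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_predictions(x, y, k=5):
    """Predict every sample from a model fitted on the other k - 1 folds."""
    predictions = [0.0] * len(x)
    for fold in range(k):
        test = [i for i in range(len(x)) if i % k == fold]
        train = [i for i in range(len(x)) if i % k != fold]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = a + b * x[i]
    return predictions


# Synthetic data (assumption, not LUCAS): one hypothetical spectral feature
# and a noisy target loosely resembling a soil property.
rng = random.Random(0)
feature = [rng.uniform(0.0, 1.0) for _ in range(200)]
target = [10.0 + 40.0 * f + rng.gauss(0.0, 4.5) for f in feature]

predicted = cross_validated_predictions(feature, target, k=5)
residuals = [o - p for o, p in zip(target, predicted)]

print(f"RMSE = {rmse(target, predicted):.2f}")
print(f"R^2  = {r_squared(target, predicted):.3f}")
print(f"mean residual = {sum(residuals) / len(residuals):.3f}")
```

Because each prediction comes from a model that never saw that sample, the resulting RMSE is an out-of-sample estimate expressed in the units of the target, which is why the errors in the abstract are reported in %, g/kg and cmol(+)/kg.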