Sweet corn yield prediction using machine learning models and field-level data
Abstract
Modern technologies, the acquisition of large amounts of crop management and weather data, and advances in computing are reshaping agriculture. Together they make it possible to extract insights from data and to predict yield more accurately. This study uses a historical US sweet corn dataset to: (a) evaluate the performance of machine learning models in predicting sweet corn yield and (b) identify the variables that most influence crop yield predictions. The dataset comprised field-level records spanning more than a quarter century (1992–2018) from two primary commercial production regions for processing sweet corn, namely the Upper Midwest and the Pacific Northwest. Several machine learning models were trained to predict field-level sweet corn yield from 67 variables describing crop genetics, management, weather, and soil factors. The random forest model outperformed all other trained models, with the lowest RMSE (3.29 Mt/ha) and the highest Pearson’s correlation coefficient (0.77) between predicted and observed yields. Variable importance plots showed that the three most influential predictor variables were year (time), location (space), and seed source (genetics). Season-long total precipitation and average minimum temperature during anthesis were the two most important weather variables for yield prediction. This is the first report of using fine-scale (time and space) crop data and advanced data analytics to gain insights into commercial sweet corn production.
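The two evaluation metrics named in the abstract, RMSE and Pearson’s correlation coefficient between predicted and observed yields, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the four yield values are hypothetical, made up only to show how the metrics are computed.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the yields."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(observed, predicted):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sum of cross-products of deviations from the means
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_o * var_p)


# Hypothetical field-level yields (illustrative values, not from the study)
observed = [14.0, 18.0, 16.0, 20.0]
predicted = [15.0, 17.0, 16.0, 19.0]

error = rmse(observed, predicted)
correlation = pearson_r(observed, predicted)
print(f"RMSE = {error:.2f}, Pearson r = {correlation:.2f}")
```

A lower RMSE means predictions sit closer to observed yields on average, while a Pearson coefficient nearer to 1 means predicted and observed yields rise and fall together; the abstract reports both because a model can do well on one and poorly on the other.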