Mineral Prospectivity Mapping based on Isolation Forest and Random Forest: Implication for the Existence of Spatial Signature of Mineralization in Outliers

Springer Science and Business Media LLC, Volume 31, Pages 1981-1999, 2021
Shuai Zhang1,2, Emmanuel John M. Carranza2, Keyan Xiao3, Hantao Wei3, Fan Yang4, Zhenghui Chen3, Nan Li3, Jie Xiang3
1China Aero Geophysical Survey and Remote Sensing Center for Natural Resources, Beijing, China
2School of Agricultural, Earth and Environmental Sciences, University of KwaZulu-Natal, Durban, South Africa
3MLR Key Laboratory of Metallogeny and Mineral Resource Assessment, Institute of Mineral Resources, Chinese Academy of Geological Sciences, Beijing, China
4Key Laboratory of Mineral Resources in Western China (Gansu Province), School of Earth Sciences, Lanzhou University, Lanzhou, China

Abstract

Data-driven mineral prospectivity mapping (MPM) traditionally uses known mineralized locations and randomly chosen non-mineralized locations as training samples. In this paper, we used (a) variable importance and partial dependence plots, which enable interpretation of random forest (RF) models, and (b) the synthetic minority over-sampling technique (SMOTE) to investigate the efficacy of outlier-based training samples for data-driven MPM, in contrast to the traditional practice of using known mineralized locations as positive training samples. The prediction maps obtained by RF modeling with different sets of training samples suggest a bias toward known mineralized locations in data-driven MPM. The proposed outlier-based approach combines unsupervised and supervised learning. The unsupervised step detects outliers; the supervised step uses the resulting outliers as positive training samples to investigate two questions: first, whether the prospective area, or spatial signature of the existing mineral system, can be delineated while avoiding the bias that arises from known mineralized locations; and second, whether the outliers have a coherent spatial signature, which would justify their use as positive training samples for data-driven MPM by RF modeling. Analyses of receiver operating characteristic curves and of correlations among the resulting prediction maps imply that outliers derived by isolation forest show a spatial signature consistent with that of the known mineralized locations and were thus effective in narrowing down prospective target areas, just as in traditional data-driven MPM.
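The unsupervised step described above can be illustrated with a minimal pure-Python sketch of isolation-forest scoring. It is not the authors' implementation: the data are synthetic, and the names `cells`, `anomaly_scores`, and `n_positive` are hypothetical. Each "cell" stands in for a map location described by two evidence-layer values, and the highest-scoring cells are flagged as candidate positive training samples. The supervised RF step that would consume these labels is omitted.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively partition rows with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only features that still vary within this node can be split.
    candidates = [j for j in range(len(rows[0]))
                  if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not candidates:
        return {"size": len(rows)}
    feature = rng.choice(candidates)
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(row, node):
    """Number of splits needed to reach a leaf, plus a correction for leaf size."""
    depth = 0
    while "feature" in node:
        side = "left" if row[node["feature"]] < node["split"] else "right"
        node = node[side]
        depth += 1
    return depth + c_factor(node["size"])


def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score in (0, 1); values closer to 1 mean a row is isolated after fewer splits."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]


# Synthetic data (an assumption for illustration only): 200 background cells
# plus 4 cells with extreme evidence values.
data_rng = random.Random(42)
background = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
extreme = [(9.0, 9.0), (10.0, -9.0), (-9.0, 10.0), (-10.0, -10.0)]
cells = background + extreme

scores = anomaly_scores(cells)

# Flag the highest-scoring cells as candidate positive training samples.
n_positive = 4
ranked = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)
positive = set(ranked[:n_positive])
labels = [1 if i in positive else 0 for i in range(len(cells))]
```

In a workflow like the one the abstract describes, `labels` would then replace known mineralized locations as the positive class when training a supervised classifier. How many top-scoring cells to keep is a modeling choice that this sketch fixes arbitrarily.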
