Exploration of data science techniques to predict fatigue strength of steel from composition and processing parameters

Ankit Agrawal1, Parijat D Deshpande2, Ahmet Cecen3, Gautham P Basavarsu2, Alok N Choudhary1, Surya R Kalidindi3,4
1Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, USA
2Tata Research Development and Design Centre, Tata Consultancy Services, Pune, India
3School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, USA
4Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, USA

Abstract

This paper describes the use of data analytics tools for predicting the fatigue strength of steels. Several physics-based as well as data-driven approaches have been used to arrive at correlations between various properties of alloys and their compositions and manufacturing process parameters. Data-driven approaches are of significant interest to materials engineers, especially for extreme-value properties such as cyclic fatigue, where current state-of-the-art physics-based models have severe limitations. Unfortunately, documented successes in these efforts are limited. In this paper, we explore the application of different data science techniques, including feature selection and predictive modeling, to the fatigue properties of steels, utilizing data from the National Institute for Materials Science (NIMS) public-domain database, and present a systematic end-to-end framework for exploring materials informatics. Results demonstrate that several advanced data analytics techniques, such as neural networks, decision trees, and multivariate polynomial regression, can achieve significant improvement in prediction accuracy over previous efforts, with R² values over 0.97. The results demonstrate the utility of such data mining tools for ranking the composition and process parameters in order of their potential for predicting the fatigue strength of steels, and for developing predictive models of fatigue strength from these parameters.
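To make the workflow summarized above concrete, the sketch below shows, in schematic form, how the two steps described in the abstract (ranking composition and processing parameters, then fitting regression models such as decision trees, polynomial regression, and neural networks, scored by R²) might be set up in Python with scikit-learn. This is an illustrative sketch only, not the authors' implementation or toolchain; the file name nims_fatigue.csv, the target column name Fatigue, and all model settings are assumptions made for the example.

    # Illustrative sketch (not the authors' code): rank input parameters and fit a few
    # of the regression models named in the abstract on a NIMS-style table whose columns
    # are composition/processing parameters plus a fatigue-strength target.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures, StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.neural_network import MLPRegressor

    data = pd.read_csv("nims_fatigue.csv")      # hypothetical export of the NIMS fatigue table
    X = data.drop(columns=["Fatigue"])          # composition + processing parameters
    y = data["Fatigue"]                         # fatigue strength (target), column name assumed

    # Step 1: rank parameters by their importance for predicting fatigue strength,
    # here using decision-tree feature importances as one possible ranking criterion.
    tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)
    ranking = pd.Series(tree.feature_importances_, index=X.columns).sort_values(ascending=False)
    print(ranking.head(10))

    # Step 2: fit several predictive models and report cross-validated R²,
    # the accuracy measure quoted in the abstract.
    models = {
        "polynomial regression": make_pipeline(
            StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()),
        "decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
        "neural network": make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(f"{name}: mean R² = {scores.mean():.3f}")

The particular models, hyperparameters, and cross-validation scheme here are placeholders; the point is only the overall shape of a feature-ranking plus predictive-modeling pipeline of the kind the abstract describes.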
