Knowledge Discovery: Methods from data mining and machine learning

Social Science Research - Tập 110 - Trang 102817 - 2023
Xiaoling Shu1, Yiwan Ye1
1University of California, Davis, USA

Tài liệu tham khảo

Aizawa, 2021, Decomposition of improvements in infant mortality in asian developing countries over three decades, Demography, 58, 137, 10.1215/00703370-8931544 Akaike, 1977 Anand, 1998 Anyadike-Danes, 2010, My brilliant career: characterizing the early labor market trajectories of British women from generation X, Socio. Methods Res., 38, 482, 10.1177/0049124110362968 Arpino, 2022, What tears couples apart: a machine learning analysis of union dissolution in Germany, Demography, 59, 161, 10.1215/00703370-9648346 Athey, 2015, A measure of robustness to misspecification, Am. Econ. Rev., 105, 476, 10.1257/aer.p20151020 Athey, 2016, Recursive partitioning for heterogeneous causal effects, Proc. Natl. Acad. Sci., 113, 7353, 10.1073/pnas.1510489113 Bacher, 2000, A probabilistic clustering model for variables of mixed type, Qual. Quantity, 34, 223, 10.1023/A:1004759101388 Bail, 2008, The configuration of symbolic boundaries against immigrants in Europe, Am. Socio. Rev., 73, 37, 10.1177/000312240807300103 Bankes, 2002, Agent-based modeling: a revolution, Proc. Natl. Acad. Sci. USA, 99, 7199, 10.1073/pnas.072081299 Billari, 2006, Timing, sequencing, and quantum of life course events: a machine learning approach, Eur. J. Popul., 22, 37, 10.1007/s10680-005-5549-0 2014 Bond, 2012, A 61-million-person experiment in social influence and political mobilization, Nature, 489, 295, 10.1038/nature11421 Bonikowski, 2016, Varieties of American popular nationalism, Am. Socio. Rev., 81, 949, 10.1177/0003122416663683 Brand, 2021, Uncovering sociological effect heterogeneity using tree-based machine learning, Socio. Methodol., 51, 189, 10.1177/0081175021993503 Brand, 2023, Recent developments in causal inference and machine learning, Annu. Rev. Sociol., 10.1146/annurev-soc-030420-015345 Breiman, 2001, Statistical modeling: two cultures (with discussion), Stat. Sci., 16, 199, 10.1214/ss/1009213726 Breiman, 2001, Random forests, Mach. Learn., 45, 5, 10.1023/A:1010933404324 Breiman, 1984 Clogg, 1995, Latent class models” in Conte, 2016, Computational social and behavioral science Deza, 2006 Diamond, 2013, Genetic matching for estimating causal effects: a general multivariate matching method for achieving balance in observational studies, Rev. Econ. Stat., 95, 932, 10.1162/REST_a_00318 Donoho, 2017, 50 Years of data science, J. Comput. Graph Stat., 26, 745, 10.1080/10618600.2017.1384734 Dumbill, 2013, A revolution that will transform how we live, work, and think: an interview with the author of big data, Big Data, 1, 73, 10.1089/big.2013.0016 Epstein, 2006, Remarks on the foundations of agent-based generative social science, Handb. Comput. Econ., 2, 1585, 10.1016/S1574-0021(05)02034-4 Fayyad, 1996, Knowledge discovery and data mining: towards a unifying framework, KDD-96 Proceedings, 82 Frye, 2015, Ideals as anchors for relationship experiences, Am. Socio. Rev., 80, 496, 10.1177/0003122415581333 Garip, 2012 Garip, 2017 Garson, 1998 Gilbert, 2006, Emerging artificial societies through learning, J. Artif. Soc. Soc. Simulat., 9, 9 Glymour, 1997, Statistical themes and lessons for data mining, Data Min. Knowl. Discov., 1, 11, 10.1023/A:1009773905005 Goldberger, 2017 Gondal, 2022, Multiplexity as a lens to investigate the cultural meanings of interpersonal ties, Soc. Network., 68, 209, 10.1016/j.socnet.2021.07.002 Gorunescu, 2011 Hagenaars, 2002 Han, 2018 Hand, 2001 Hedt, 2011, Health indicators: eliminating bias from convenience sampling estimators, Stat. Med., 30, 560, 10.1002/sim.3920 Heiberger, 2021, Facets of Specialization and its Relation to Career Success: An Analysis of U.S. Sociology, 1980 to 2015." American Sociological Review, 86, 1164 Hofman, 2017, Prediction and explanation in social systems, Science, 355, 486, 10.1126/science.aal3856 Holton, 2017 Hu, 2021, Analysis of heterogeneity effects: opportunities and challenges of machine learning, Sociol. Stud. ImageNet Kim, 2018, Evaluating sampling methods for content analysis of twitter data, Social Media + Soc., 4, 10.1177/2056305118772836 Kramer, 2014, Experimental evidence of massive-scale emotional contagion through social networks, Proc. Natl. Acad. Sci. USA, 111, 8788, 10.1073/pnas.1320040111 Lazer, 2009, Computational social science, Science, 323, 721, 10.1126/science.1167742 Lee, 2017, Social disadvantage, severe child abuse, and biological profiles in adulthood, J. Health Soc. Behav., 58, 371, 10.1177/0022146516685370 Levenshtein, 1966, Binary codes capable of correcting deletions, insertions, and reversals, Dokl. Phys., 10, 707 Lundberg, 2022 Luma-Osmani, 2020, 48 MacKay, 2003 Manyika, 2011 Mason, 2014, Computational social science and social computing, Mach. Learn., 95, 257, 10.1007/s10994-013-5426-8 Mauro, 2016, A formal definition of big data based on its essential features, Libr. Rev., 65, 122, 10.1108/LR-06-2015-0061 Michel, 2011, The google books team, joseph P. Pickett, dale hoiberg, dan clancy, peter norvig, jon orwant, steven pinker, martin A nowak, erez lieberman aiden, Quantit. Anal. Cult. Using Millions Digitized Books.” Sci., 331, 176 Molina, 2019, Machine learning for sociology, Annu. Rev. Sociol., 45, 27, 10.1146/annurev-soc-073117-041106 Moody, 2004, The structure of a social science collaboration network: disciplinary cohesion from 1963 to 1999, Am. Socio. Rev., 69, 213, 10.1177/000312240406900204 Morgan, 2015 Muthén, 2004, Latent variable analysis: growth mixture modeling and related techniques for longitudinal data Neal, 1992, Connectionist learning of belief networks, Artif. Intell., 56, 71, 10.1016/0004-3702(92)90065-6 Nelson, 2021, Cycles of conflict, a century of continuity: the impact of persistent place-based political logics on women’s movement form, Am. J. Sociol., 127, 10.1086/714915 Nelson, 2020, Computational grounded theory: a methodological framework, Socio. Methods Res., 49, 3, 10.1177/0049124117729703 Pavlova, 2020, Mental health discourse and social media: which mechanisms of cultural power drive discourse on twitter, Soc. Sci. Med., 263, 10.1016/j.socscimed.2020.113250 Peterson, 2014, Convenience samples of college students and research reproducibility, J. Bus. Res., 67, 1035, 10.1016/j.jbusres.2013.08.010 Provost, 2013 Reitermanova, 2010, Data Splitting, WDS’10 Proceedings of Contributed Papers, 1, 31 Rigobon, 2019, Winning models for GPA, grit, and layoff in the fragile families challenge, Socius, 5, 1, 10.1177/2378023118820418 Ross, 1986, Induction of decision trees, Mach. Learn., 1, 81, 10.1007/BF00116251 Salganik, 2020, Measuring the predictability of life outcomes with a scientific mass collaboration, Proc. Natl. Acad. Sci. USA, 117, 8398, 10.1073/pnas.1915006117 Samuel, 1959, Some studies in machine learning using the game of checkers, IBM J. Res. Dev., 3, 210, 10.1147/rd.33.0210 Scarborough, 2020, Gendered places: the dimensions of local gender norms across the United States, Gend. Soc., 34, 705, 10.1177/0891243220948220 Seife, 2015, Big data: the revolution is digitized, Nature, 518, 480, 10.1038/518480a Scarborough, 2021, The intersection of racial and gender attitudes, 1977 through 2018, Am. Socio. Rev., 86, 823, 10.1177/00031224211033582 Scarborough, 2019, Attitudes and the stalled gender revolution: egalitarianism, traditionalism, and ambivalence from 1977 through 2016, Gend. Soc., 33, 173, 10.1177/0891243218809604 Shu, 2003 Shu, 2020 Sianes, 2014, Rating the rich: an ordinal classification to determine which rich countries are helping poorer ones the most, Soc. Indicat. Res., 116, 47, 10.1007/s11205-013-0270-6 Soehl, 2021, How legacies of geopolitical trauma shape popular nationalism today, Am. Socio. Rev., 86, 406, 10.1177/00031224211011981 Van de Rijt, 2013, Only 15 minutes? The social stratification of fame in printed media, Am. Socio. Rev., 78, 266, 10.1177/0003122413480362 Watts, 2013, Computational social science: exciting progress and future directions, The Bridge on Frontiers of Engineering, 43, 5 Wager, 2018, Estimation and inference of heterogeneous treatment effects using random forests, J. Am. Stat. Assoc., 113, 1228, 10.1080/01621459.2017.1319839 Westreich, 2010, Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression, J. Clin. Epidemiol., 63, 826, 10.1016/j.jclinepi.2009.11.020 Winton, 2021, A multi-group Analysis of convenience samples: free, cheap, friendly, and fancy sources, Int. J. Soc. Res. Methodol., 1 Witten, 2011 Wyss, 2014, The role of prediction modeling in propensity score estimation: an evaluation of logistic regression, bCART, and the covariate-balancing propensity score, Am. J. Epidemiol., 180, 645, 10.1093/aje/kwu181 Xu, 2021, Detecting suicide risk using knowledge-aware natural language processing and counseling service data, Soc. Sci. Med., 283, 10.1016/j.socscimed.2021.114176 Zhang, 2019, CASM: a deep learning approach for identifying collective action events with text and image data from social media, Socio. Methodol., 49, 1, 10.1177/0081175019860244 Zhang, 2022, Image clustering: an unsupervised approach to categorize visual data in social science research, Socio. Methods Res., 10.1177/00491241221082603 Zhang, 2016, Tweet sarcasm detection using deep neural network.” Paper presented at the COLING 2016 - 26th International Conference on Computational Linguistics, Proceedings of COLING 2016: Technical Papers, 2449