“Bad smells” in software analytics papers