A Learning Analytics Approach to Identify Students at Risk of Dropout: A Case Study with a Technical Distance Education Course

Applied Sciences, Vol. 10, No. 11, p. 3998
Emanuel Marques Queiroga (1,2), João Ladislau Barbará Lopes (2), Kristofer S. Kappel (1), Marílton Sanchotene de Aguiar (1), Ricardo Araújo (1), Roberto Muñoz (3), Rodolfo Villarroel (4), Cristian Cechinel (5)
(1) Centro de Desenvolvimento Tecnológico (CDTEC), Universidade Federal de Pelotas (UFPel), Pelotas 96010610, Brazil
(2) Instituto Federal de Educação, Ciência e Tecnologia Sul-rio-Grandense (IFSul), Pelotas 96015560, Brazil
(3) Escuela de Ingeniería Informática, Universidad de Valparaíso, Valparaíso 2362735, Chile
(4) Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
(5) Centro de Ciências, Tecnologias e Saúde (CTS), Universidade Federal de Santa Catarina (UFSC), Araranguá 88906072, Brazil

Abstract

Contemporary education is a vast field concerned with the performance of education systems. In formal e-learning contexts, student dropout is considered one of the main problems and has received much attention from the learning analytics research community, which has reported several approaches to building models for the early prediction of at-risk students. However, maximizing the performance of these predictions remains a considerable challenge. In this work, we developed a solution that uses only students' interactions with the virtual learning environment (VLE), and features derived from them, to predict at-risk students early in a 103-week Brazilian technical high school course delivered through distance education. To maximize results, we developed an elitist genetic algorithm, inspired by Darwin's theory of natural selection, for hyperparameter tuning. With the proposed technique, we predicted at-risk students with an Area Under the Receiver Operating Characteristic Curve (AUROC) above 0.75 in the initial weeks of the course. The results demonstrate the viability of using interaction counts and derived features to generate prediction models in contexts where access to demographic data is restricted. Applying a genetic algorithm to tune classifier hyperparameters can improve their performance compared with other techniques.
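
The abstract does not specify which derived features were used, so the following Python sketch only illustrates the general idea under stated assumptions: interaction logs arrive as hypothetical `(student_id, week)` pairs, they are aggregated into weekly counts, and the feature vector for early prediction at week k is built from weeks 1 to k only. The helper names `weekly_counts` and `derived_features`, and the particular derived features (cumulative total, weekly mean, last week-over-week change, number of inactive weeks), are illustrative assumptions rather than the authors' feature set.

```python
from collections import defaultdict


def weekly_counts(events, n_weeks):
    """Aggregate raw VLE log events into weekly interaction counts.

    events: iterable of (student_id, week) pairs with weeks numbered
    1..n_weeks (a hypothetical log format). Events outside that range
    are ignored. Students with no events do not appear in the result.
    """
    counts = defaultdict(lambda: [0] * n_weeks)
    for student, week in events:
        if 1 <= week <= n_weeks:
            counts[student][week - 1] += 1
    return dict(counts)


def derived_features(weekly, up_to_week):
    """Build a feature dict from weeks 1..up_to_week only, so that a model
    for week k never sees data from later weeks (early prediction).

    The derived features are illustrative assumptions, not the paper's set.
    """
    if not 1 <= up_to_week <= len(weekly):
        raise ValueError("up_to_week must lie in 1..len(weekly)")
    window = weekly[:up_to_week]
    total = sum(window)
    return {
        "weekly": list(window),                   # raw counts per week
        "total": total,                           # cumulative interactions
        "mean": total / up_to_week,               # average per week so far
        # change between the two most recent weeks (0 if only one week)
        "last_change": (window[-1] - window[-2]) if up_to_week > 1 else 0,
        "zero_weeks": sum(1 for c in window if c == 0),  # inactive weeks
    }


if __name__ == "__main__":
    log = [("s1", 1), ("s1", 1), ("s1", 2), ("s2", 2), ("s1", 3)]
    counts = weekly_counts(log, n_weeks=4)
    print(counts)  # {'s1': [2, 1, 1, 0], 's2': [0, 1, 0, 0]}
    print(derived_features(counts["s2"], up_to_week=3))
```

Training one model per cut-off week k on `derived_features(counts[s], k)` matches the "initial weeks" setting described in the abstract; students with no logged events would need to be added by the caller with an all-zero count list.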

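The abstract likewise gives no details of the authors' elitist genetic algorithm. The Python sketch below shows one common form of elitist GA for tuning hyperparameters over a discrete search space: the `n_elite` best individuals pass unchanged into the next generation, and the remaining individuals are produced by tournament selection, uniform crossover, and random mutation. The function names, the search space, the toy fitness function (a hypothetical stand-in for "train the classifier with these hyperparameters and return its validation AUROC"), and all parameter defaults are assumptions. An `auroc` helper computes the metric reported in the abstract using the Mann-Whitney U formulation.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive (at-risk)
    example is scored higher than a randomly chosen negative one; ties
    count as one half (Mann-Whitney U formulation, O(n_pos * n_neg))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def elitist_ga(space, fitness, pop_size=20, generations=30,
               n_elite=2, mutation_rate=0.1, tournament_size=3, seed=0):
    """Maximise `fitness` over a discrete hyperparameter space.

    space: dict mapping each hyperparameter name to its candidate values.
    fitness: callable taking a {name: value} dict and returning a float.
    Returns (best_score, best_params) among all evaluated individuals.
    """
    rng = random.Random(seed)
    names = list(space)

    def random_individual():
        return {k: rng.choice(space[k]) for k in names}

    def tournament(scored):
        # pick the fittest of a few randomly sampled individuals
        contenders = rng.sample(scored, min(tournament_size, len(scored)))
        return max(contenders, key=lambda t: t[0])[1]

    def crossover(a, b):
        # uniform crossover: each gene is inherited from either parent
        return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in names}

    def mutate(ind):
        # resample each gene with probability `mutation_rate`
        return {k: (rng.choice(space[k]) if rng.random() < mutation_rate else v)
                for k, v in ind.items()}

    population = [random_individual() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda t: t[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        # elitism: the top n_elite individuals survive unchanged
        elite = [ind for _, ind in scored[:n_elite]]
        children = [mutate(crossover(tournament(scored), tournament(scored)))
                    for _ in range(pop_size - n_elite)]
        population = elite + children
    return best


if __name__ == "__main__":
    # hypothetical search space for a tree-ensemble classifier
    space = {"max_depth": list(range(2, 11)),
             "n_estimators": [50, 100, 200, 300, 400, 500]}

    def toy_fitness(p):
        # stand-in for validation AUROC; peaks at depth 6 with 200 trees
        return (0.8 - 0.01 * (p["max_depth"] - 6) ** 2
                - 1e-6 * (p["n_estimators"] - 200) ** 2)

    print(elitist_ga(space, toy_fitness))
    print(auroc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.5]))
```

In a real pipeline, `fitness` would train a classifier on the week-k features with the candidate hyperparameters and return `auroc` on held-out students. Because elites are re-evaluated every generation, caching their scores would avoid retraining identical models.
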