A survey of machine learning approaches for student dropout prediction in online courses

A taxonomy of student modelling approaches and prediction strategies.


The recent diffusion of online education (both MOOCs and e-courses) has led to increased economic and scientific interest in e-learning environments. As widely documented, online students have a much higher chance of dropping out than those attending conventional classrooms. It is of paramount interest for institutions, students, and faculty members to find more effective methodologies to mitigate withdrawals. Following the growing attention to the Student Dropout Prediction (SDP) problem, the literature has seen a significant increase in contributions on this subject. In this survey, we present an in-depth analysis of the state-of-the-art literature in the field of SDP, from the central, but not exclusive, perspective of machine learning predictive algorithms. Our main contributions are the following: (i) we propose a comprehensive hierarchical classification of the existing literature that follows the workflow of design choices in SDP; (ii) to facilitate comparative analysis, we introduce a formal notation that describes in a uniform way the alternative dropout models investigated by researchers in the field; (iii) we analyse other relevant aspects to which the literature has given less attention, such as evaluation metrics, gathered data, and privacy concerns; (iv) we pay specific attention to deep sequential machine learning methods, recently proposed by some contributors, which represent one of the most effective solutions in this area. Overall, our survey gives novice readers practical guidance on design choices and directs researchers to the most promising approaches, highlighting current limitations and open challenges in the field.

In ACM Computing Surveys, Volume 53, Issue 3