Predicting Student Performance Categories using Interpretable Machine Learning for Early Academic Quality Intervention

Authors

Sultan AlSultan and Walid Karamti, Qassim University, Saudi Arabia

Abstract

Early identification of students at risk of weak academic performance is essential for effective support and quality assurance in higher education. However, many educational prediction studies rely on variables that are direct components of the final outcome, which may lead to information leakage and overly optimistic performance estimates. This study proposes an interpretable machine learning framework for predicting student performance categories using academic, demographic, behavioral, health, engagement, and support-related variables. The target variable is represented by five ordered categories: Poor, Needs Improvement, Satisfactory, Good, and Excellent. A signal-enhanced modeling subset consisting of 16,892 student records and 42 variables was constructed through a correlation-guided filtering process. To support realistic evaluation, direct outcome-score variables were removed from the main training setting. Data preprocessing included type-specific encoding, imputation, standardization, and SMOTENC balancing applied only to the training set. Random Forest, Gradient Boosting, and LightGBM were trained and compared. LightGBM achieved the strongest overall performance with 44.4% accuracy, 44.0% precision, 40.0% recall, 41.5% F1-score, and 72.2% multiclass AUC. The confusion matrix showed that most errors occurred between neighboring categories, suggesting that the model captured meaningful ordinal structure even when exact five-class classification remained difficult. SHAP analysis identified GPA, attendance rate, research involvement, high school GPA, access to academic resources, and entrance examination score as the most influential predictors. The proposed framework provides interpretable evidence that can support academic advisors, instructors, and quality units in early intervention planning.
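The observation that most errors fall between neighboring categories can be quantified by comparing exact accuracy with a "within one category" accuracy computed from the confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `exact_and_adjacent_accuracy` is hypothetical, and the counts in `toy_cm` are made up purely for illustration and are not the study's results.

```python
def exact_and_adjacent_accuracy(cm):
    """Return (exact accuracy, within-one-category accuracy).

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j, with classes listed in their ordinal order.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Exact hits lie on the diagonal.
    exact = sum(cm[i][i] for i in range(n))
    # Adjacent hits also include cells one step off the diagonal.
    within_one = sum(
        cm[i][j] for i in range(n) for j in range(n) if abs(i - j) <= 1
    )
    return exact / total, within_one / total


# Category order taken from the abstract.
labels = ["Poor", "Needs Improvement", "Satisfactory", "Good", "Excellent"]

# Made-up counts for illustration only; NOT the paper's results.
toy_cm = [
    [5, 3, 1, 0, 0],
    [2, 6, 3, 1, 0],
    [1, 3, 7, 3, 1],
    [0, 1, 3, 6, 2],
    [0, 0, 1, 3, 5],
]

exact_acc, adjacent_acc = exact_and_adjacent_accuracy(toy_cm)
print(f"exact: {exact_acc:.3f}, within one category: {adjacent_acc:.3f}")
```

A large gap between the two values indicates that misclassifications concentrate in neighboring categories, which is the pattern the abstract describes.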

Keywords

Student Performance Prediction, Machine Learning, Learning Analytics, Educational Data Mining, Academic Quality Assurance, Explainable AI

Volume 16, Number 11