Machine learning strategies for predicting Alzheimer’s disease progression

Alzheimer’s disease (AD) represents a significant global health challenge, affecting millions of individuals worldwide through progressive cognitive decline and behavioral changes. The burden extends beyond patients to caregivers and healthcare systems. While traditional diagnostic methods such as neuroimaging pose financial obstacles, emerging non-imaging techniques show promise, and machine learning has emerged as a transformative approach for enhancing both diagnosis and management. This study aims to develop a robust multi-class classification model using random forest (RF) and extreme gradient boosting (XGBoost) algorithms on non-imaging data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), with emphasis on the Australian Imaging, Biomarkers, and Lifestyle (AIBL) Study of Aging. Extensive data analysis was conducted, including feature importance analysis and feature selection, to improve interpretability and classification accuracy. Synthetic oversampling was applied to address class imbalance. The findings indicate the superiority of the tuned RF model, which achieved 90% accuracy, precision, recall, and F1 score. In addition, cost-effective diagnostic variables were explored, with neuropsychology assessment variables alone demonstrating high accuracy (90%). This research contributes to early AD detection, personalized treatment, and optimized resource allocation.
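
To make the workflow concrete, the sketch below shows one possible Python implementation of the pipeline described above: random-forest feature importance for selection, synthetic minority oversampling on the training split only, randomized hyperparameter search, and a multi-class classification report. It is a minimal illustration, not the study's actual code; the file path, target column, feature cut-off, and hyperparameter grid are all assumptions.

```python
# Hypothetical sketch of the multi-class classification workflow.
# Column names and the CSV path are placeholders, not actual AIBL/ADNI fields.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = pd.read_csv("aibl_nonimaging.csv")                    # placeholder path
X, y = df.drop(columns=["diagnosis"]), df["diagnosis"]     # placeholder target

# Hold out a test set before any resampling to avoid data leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Rank features with a preliminary random forest and keep the strongest ones.
ranker = RandomForestClassifier(n_estimators=200, random_state=42)
ranker.fit(X_train, y_train)
importances = pd.Series(ranker.feature_importances_, index=X_train.columns)
selected = importances.nlargest(20).index                  # illustrative cut-off

# Oversample minority classes in the training data only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train[selected], y_train)

# Randomized search over a small, illustrative hyperparameter grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
    n_iter=5, cv=5, scoring="f1_macro", random_state=42,
)
search.fit(X_res, y_res)

# Report per-class precision, recall, and F1 on the untouched test split.
print(classification_report(y_test, search.predict(X_test[selected])))
```

An XGBClassifier from the xgboost package could be substituted for the random forest in the same search to reproduce the comparison with extreme gradient boosting.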