AccScience Publishing / DP / Volume 2 / Issue 3 / DOI: 10.36922/DP025270031
ARTICLE

Machine learning strategies for predicting Alzheimer’s disease progression

Adhinrag Kalarikkal Induchudan1, Kevin Curran1*
1 Faculty of Computing, Engineering and The Built Environment, School of Computing, Engineering, and Intelligent Systems, Ulster University, Londonderry, Ireland
DP 2025, 2(3), 025270031 https://doi.org/10.36922/DP025270031
Received: 3 July 2025 | Revised: 23 July 2025 | Accepted: 1 August 2025 | Published online: 21 August 2025
© 2025 by the Author(s). This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Abstract

Alzheimer’s disease (AD) represents a significant global health challenge, affecting millions of individuals worldwide through progressive cognitive decline and behavioral changes. The burden extends beyond patients to caregivers and healthcare systems. While traditional imaging-based diagnostic methods pose financial obstacles, emerging non-imaging techniques show promise, and machine learning has emerged as a transformative approach for enhancing both diagnosis and management. This study develops a robust multi-class classification model using random forest (RF) and extreme gradient boosting (XGBoost) algorithms on non-imaging data in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) format, drawn from the Australian Imaging, Biomarkers and Lifestyle (AIBL) Study of Ageing. Extensive data analysis was conducted, including feature importance and feature selection, to improve interpretability and classification accuracy, and synthetic minority oversampling was applied to address class imbalance. The findings indicate the superiority of the tuned RF model, which achieved 90% accuracy, precision, recall, and F1 scores. In addition, cost-effective diagnostic variables were explored, with neuropsychological assessment variables alone demonstrating exceptional accuracy (90%). This research contributes to early AD detection, personalized treatment, and optimized resource allocation.
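The pipeline described above can be illustrated with a minimal scikit-learn sketch on synthetic data. This is not the authors' code: the three-class labels (standing in for healthy control / mild cognitive impairment / AD), the feature counts, and the hyperparameter grid are all illustrative assumptions, and simple random oversampling stands in for the SMOTE-style synthetic oversampling used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.utils import resample

# Imbalanced 3-class problem standing in for the clinical labels.
X, y = make_classification(
    n_samples=1500, n_features=20, n_informative=8,
    n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0)

# Oversample minority classes on the TRAINING split only;
# oversampling before the split would leak test information.
counts = np.bincount(y_tr)
parts_X, parts_y = [], []
for cls in range(len(counts)):
    Xc = resample(X_tr[y_tr == cls], n_samples=counts.max(),
                  random_state=0)
    parts_X.append(Xc)
    parts_y.append(np.full(counts.max(), cls))
X_bal, y_bal = np.vstack(parts_X), np.concatenate(parts_y)

# Tune the random forest with a small randomized search.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    n_iter=4, cv=3, random_state=0)
search.fit(X_bal, y_bal)

# Evaluate on the untouched test split with the paper's metrics.
pred = search.predict(X_te)
print(f"accuracy={accuracy_score(y_te, pred):.2f} "
      f"macro-F1={f1_score(y_te, pred, average='macro'):.2f}")
```

The same train/oversample/tune/evaluate loop applies unchanged if the estimator is swapped for an XGBoost classifier, which is how the two models in the study can be compared on equal footing.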

Keywords
Alzheimer’s disease
Machine learning
Python classification model
Non-imaging data
Random forest
Extreme gradient boosting
Australian Imaging, Biomarkers and Lifestyle Study of Ageing
Diagnosis
Funding
None.
Conflict of interest
The authors declare that they have no competing interests.
Design+, Electronic ISSN: 3060-8953 Published by AccScience Publishing