Machine learning strategies for predicting Alzheimer’s disease progression

Alzheimer’s disease (AD) represents a significant global health challenge, affecting millions of individuals worldwide through progressive cognitive decline and behavioral changes. The burden extends beyond patients to caregivers and healthcare systems. While traditional diagnostic methods such as neuroimaging pose financial obstacles, emerging non-imaging techniques show promise, and machine learning has emerged as a transformative approach for enhancing both diagnosis and management. This study aims to develop a robust multi-class classification model using random forest (RF) and extreme gradient boosting (XGBoost) algorithms on non-imaging data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), with emphasis on the Australian Imaging, Biomarkers, and Lifestyle (AIBL) Study of Aging. Extensive data analysis was conducted, including feature importance analysis and feature selection, to improve interpretability and classification accuracy. Synthetic oversampling was applied to address class imbalance. The findings indicate the superiority of the tuned RF model, which achieved 90% accuracy, precision, recall, and F1 score. In addition, cost-effective diagnostic variables were explored, with neuropsychology assessment variables alone demonstrating high accuracy (90%). This research contributes to early AD detection, personalized treatment, and optimized resource allocation.
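
To make the workflow concrete, the sketch below shows one possible Python implementation of the pipeline described above: random-forest feature importance for selection, synthetic minority oversampling on the training split only, randomized hyperparameter search, and a multi-class classification report. It is a minimal illustration, not the study's actual code; the file path, target column, feature cut-off, and hyperparameter grid are all assumptions.

```python
# Hypothetical sketch of the multi-class classification workflow.
# Column names and the CSV path are placeholders, not actual AIBL/ADNI fields.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import RandomizedSearchCV, train_test_split

df = pd.read_csv("aibl_nonimaging.csv")                    # placeholder path
X, y = df.drop(columns=["diagnosis"]), df["diagnosis"]     # placeholder target

# Hold out a test set before any resampling to avoid data leakage.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Rank features with a preliminary random forest and keep the strongest ones.
ranker = RandomForestClassifier(n_estimators=200, random_state=42)
ranker.fit(X_train, y_train)
importances = pd.Series(ranker.feature_importances_, index=X_train.columns)
selected = importances.nlargest(20).index                  # illustrative cut-off

# Oversample minority classes in the training data only.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train[selected], y_train)

# Randomized search over a small, illustrative hyperparameter grid.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={"n_estimators": [200, 500], "max_depth": [None, 10, 20]},
    n_iter=5, cv=5, scoring="f1_macro", random_state=42,
)
search.fit(X_res, y_res)

# Report per-class precision, recall, and F1 on the untouched test split.
print(classification_report(y_test, search.predict(X_test[selected])))
```

An XGBClassifier from the xgboost package could be substituted for the random forest in the same search to reproduce the comparison with extreme gradient boosting.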