Validation strategies for automated MRI-based classification of Alzheimer’s disease using deep feature extraction and machine learning
Alzheimer’s disease (AD) is the most prevalent cause of dementia worldwide, yet early and accurate diagnosis remains a significant clinical challenge. This study systematically evaluates machine learning (ML) models for AD classification on a shared magnetic resonance imaging (MRI) dataset, addressing both binary (e.g., healthy vs. demented) and multiclass (four-stage dementia) tasks. MRI scans were preprocessed, and deep features were extracted with a pretrained GoogLeNet architecture. Classification was performed with support vector machines (SVMs) for binary tasks and an error-correcting output codes (ECOC) framework for multiclass tasks. Model performance was assessed with both hold-out validation and k-fold (5- and 10-fold) cross-validation (CV) to ensure robust evaluation. Results indicate that CV yields substantially higher and more reliable accuracy estimates than the hold-out method, with binary classification achieving up to 84% accuracy (10-fold CV) and multiclass classification reaching 87% (5-fold CV). The model showed high specificity and precision, particularly for moderate-stage AD, but lower and less stable performance for early-stage (very mild) cases. Learning curve analysis confirmed improved generalization with more training data and minimal overfitting in well-validated models. However, our findings also reveal that high performance metrics alone are insufficient to judge clinical utility. For example, a hold-out model distinguishing healthy subjects from moderate AD achieved a seemingly impressive 99% test accuracy, but its learning curves revealed severe overfitting and likely data leakage, indicating that such results would not generalize to new patients. In contrast, the healthy vs. mild AD task, with a more modest test accuracy of ~70%, showed well-behaved learning dynamics and genuine generalization.
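The comparison between hold-out and k-fold validation, together with the learning-curve check, can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' pipeline: synthetic features from make_classification stand in for the GoogLeNet deep features, the 70/30 hold-out ratio, linear kernel, and random seeds are assumptions, and the multiclass ECOC step uses scikit-learn's OutputCodeClassifier, which draws a random code book (other ECOC implementations, such as MATLAB's fitcecoc, default to one-vs-one coding). The helper names make_features, binary_model, multiclass_model, compare_strategies, and learning_curve_gap are hypothetical.

```python
# Minimal sketch (assumption: deep features are already extracted).
# Synthetic features stand in for real GoogLeNet/MRI features here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    StratifiedKFold, cross_val_score, learning_curve, train_test_split)
from sklearn.multiclass import OutputCodeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_features(n_classes, n_samples=400, n_features=64, seed=0):
    """Hypothetical stand-in for a deep-feature matrix X and labels y."""
    return make_classification(
        n_samples=n_samples, n_features=n_features, n_informative=16,
        n_classes=n_classes, n_clusters_per_class=1, random_state=seed)


def binary_model():
    # Linear SVM on standardized features for binary tasks.
    return make_pipeline(StandardScaler(), SVC(kernel="linear"))


def multiclass_model():
    # ECOC wrapper around binary SVM learners; code_size=2.0 gives each
    # class a random codeword of length 2 * n_classes (an assumption).
    return make_pipeline(
        StandardScaler(),
        OutputCodeClassifier(SVC(kernel="linear"), code_size=2.0,
                             random_state=0))


def compare_strategies(model, X, y, seed=0):
    """Accuracy from one 70/30 hold-out split vs. 5- and 10-fold CV."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    results = {"holdout": model.fit(X_tr, y_tr).score(X_te, y_te)}
    for k in (5, 10):
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
        scores = cross_val_score(model, X, y, cv=cv)
        results[f"{k}-fold"] = (scores.mean(), scores.std())  # mean, spread
    return results


def learning_curve_gap(model, X, y, seed=0):
    """Mean train accuracy minus mean validation accuracy per train size."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, cv=cv, train_sizes=np.linspace(0.2, 1.0, 5))
    gap = train_scores.mean(axis=1) - val_scores.mean(axis=1)
    return sizes, gap


if __name__ == "__main__":
    X2, y2 = make_features(n_classes=2)
    X4, y4 = make_features(n_classes=4)
    print("binary:", compare_strategies(binary_model(), X2, y2))
    print("multiclass:", compare_strategies(multiclass_model(), X4, y4))
    sizes, gap = learning_curve_gap(binary_model(), X2, y2)
    for n, g in zip(sizes, gap):
        print(f"n_train={n}: train-val gap={g:.3f}")
```

A large train-validation gap that does not shrink as the training set grows is the overfitting signature described above, whereas a gap that narrows while validation accuracy rises points to genuine generalization. If augmented copies of the same scan can land in both training and test folds, image-level splitting leaks information; splitting by source scan (for example with scikit-learn's GroupKFold) is one way to guard against this.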
These results show that high reported accuracy is clinically meaningful only when it is supported by healthy training dynamics; otherwise, models risk underperforming in real-world clinical settings regardless of their reported metrics. We therefore advocate rigorous validation and learning curve analysis as prerequisites for any clinically actionable ML tool, together with transparent per-class performance reporting, so that ML models can genuinely support early detection and staging of AD in clinical practice.
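Per-class reporting of the kind advocated here can be derived from a multiclass confusion matrix by treating each class one-vs-rest. The sketch below uses only the standard library; the row/column convention, the class names, and the toy counts in the usage block are assumptions invented for illustration, not results from this study, and per_class_report is a hypothetical helper name.

```python
# One-vs-rest metrics per class from a multiclass confusion matrix.
# Assumed convention: cm[i][j] counts samples with true class i that
# were predicted as class j.


def per_class_report(cm, labels):
    """Return {label: {precision, recall, specificity, f1}} per class."""
    n = len(labels)
    total = sum(sum(row) for row in cm)
    report = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true other
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "specificity": specificity, "f1": f1}
    return report


if __name__ == "__main__":
    # Hypothetical toy counts for illustration only (not study results).
    labels = ["non-demented", "very mild", "mild", "moderate"]
    cm = [
        [50, 8, 2, 0],
        [10, 35, 5, 0],
        [3, 6, 40, 1],
        [0, 0, 1, 49],
    ]
    accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
    print(f"overall accuracy = {accuracy:.2f}")
    for label, m in per_class_report(cm, labels).items():
        print(f"{label:>12}: " + ", ".join(f"{k}={v:.2f}" for k, v in m.items()))
```

In this toy matrix, overall accuracy is about 83%, yet recall for the very mild class is 0.70 while the moderate class reaches 0.98: exactly the kind of disparity that a single aggregate accuracy figure hides.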

