Validation strategies for automated MRI-based classification of Alzheimer’s disease using deep feature extraction and machine learning

¹ Department of Osteopathic Manipulative Medicine, College of Osteopathic Medicine, New York Institute of Technology, Old Westbury, New York, United States of America

AIH, 025360073 https://doi.org/10.36922/AIH025360073

Received: 5 September 2025 | Revised: 25 October 2025 | Accepted: 29 October 2025 | Published online: 12 November 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )

Abstract

Alzheimer’s disease (AD) is the most prevalent cause of dementia worldwide, yet early and accurate diagnosis remains a significant clinical challenge. This study systematically evaluates machine learning (ML) models for AD classification using a shared magnetic resonance imaging (MRI) dataset, focusing on both binary (e.g., healthy vs. demented) and multiclass (four-stage dementia) tasks. MRI scans were preprocessed, and deep features were extracted using a pretrained GoogLeNet architecture. Classification was performed using support vector machines for binary tasks and error-correcting output codes (ECOC) for multiclass tasks. Model performance was assessed using both hold-out and k-fold (5- and 10-fold) cross-validation (CV) strategies to ensure robust evaluation. Results indicate that CV yields substantially higher and more reliable accuracy than the hold-out method, with binary classification achieving up to 84% accuracy (10-fold CV) and multiclass classification reaching 87% (5-fold CV). The model demonstrated high specificity and precision, particularly for moderate-stage AD, but lower and less stable performance for early-stage (very mild) cases. Learning curve analysis confirmed improved generalization with increased training data and minimal overfitting in well-validated models. However, our findings also reveal that high values of performance metrics alone are insufficient to judge clinical utility. For example, a hold-out model distinguishing healthy from moderate Alzheimer’s achieved a seemingly impressive 99% test accuracy, but learning curves revealed severe overfitting and likely data leakage, indicating that such results would not generalize to new patients. In contrast, the healthy vs. mild Alzheimer’s task, with a more modest ~70% test accuracy, demonstrated well-behaved learning dynamics and genuine generalization.
Taken together, these results show that high reported accuracy is clinically meaningful only when supported by healthy training dynamics; otherwise, models risk underperforming in real-world clinical settings regardless of their reported metrics. We advocate for rigorous validation and learning curve analysis as prerequisites for any clinically actionable ML tool, together with transparent per-class performance reporting, to ensure that ML models can truly support early detection and staging of AD in clinical practice.
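The contrast between hold-out and k-fold evaluation described above can be illustrated with a minimal sketch. It is not the study's pipeline: the feature vectors are synthetic stand-ins for GoogLeNet features, a nearest-centroid rule stands in for the SVM and ECOC classifiers, and every function name below is hypothetical. It only shows how a single hold-out split yields one accuracy figure, whereas stratified k-fold CV yields a mean and a spread across folds.

```python
import random
from statistics import mean, pstdev


def make_synthetic_features(n_per_class=60, n_classes=4, dim=8, seed=0):
    """Synthetic stand-in for deep features: one Gaussian blob per class."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            X.append([rng.gauss(0.6 * c, 1.0) for _ in range(dim)])
            y.append(c)
    return X, y


def stratified_folds(y, k, seed=0):
    """Partition sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y, train_idx):
    """Nearest-centroid model (assumption: a stand-in for SVM/ECOC)."""
    sums, counts = {}, {}
    for i in train_idx:
        c = y[i]
        if c not in sums:
            sums[c], counts[c] = [0.0] * len(X[i]), 0
        sums[c] = [s + v for s, v in zip(sums[c], X[i])]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(X, y, train_idx, test_idx):
    """Fit on train_idx only, then score on the untouched test_idx."""
    centroids = fit_centroids(X, y, train_idx)
    hits = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def holdout_accuracy(X, y, test_fraction=0.2, seed=0):
    """Single stratified split: one fold is the test set, the rest is training."""
    folds = stratified_folds(y, round(1 / test_fraction), seed)
    train_idx = [i for fold in folds[1:] for i in fold]
    return accuracy(X, y, train_idx, folds[0])


def kfold_accuracies(X, y, k, seed=0):
    """Each fold serves as the test set exactly once."""
    folds = stratified_folds(y, k, seed)
    scores = []
    for f in range(k):
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        scores.append(accuracy(X, y, train_idx, folds[f]))
    return scores


X, y = make_synthetic_features()
holdout = holdout_accuracy(X, y)
cv5 = kfold_accuracies(X, y, k=5)
cv10 = kfold_accuracies(X, y, k=10)
print(f"hold-out: {holdout:.3f}")
print(f"5-fold:   mean={mean(cv5):.3f} sd={pstdev(cv5):.3f}")
print(f"10-fold:  mean={mean(cv10):.3f} sd={pstdev(cv10):.3f}")
```

In this sketch the hold-out score is simply the first of the five fold scores, which is why a single split can look better or worse than the fold average by chance. If a dataset contains augmented copies of the same source image, a split that ignores this (as the one above does) could place near-duplicates on both sides of the train/test boundary; grouping indices by source image before forming folds would be one way to guard against that kind of leakage.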

Keywords

Alzheimer’s disease

Magnetic resonance imaging

Machine learning

Deep learning

Dementia classification

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing