An ensemble-driven machine learning framework for robust heart disease prediction using dimensionality reduction and comparative evaluation
Coronary heart disease is a major cardiovascular condition characterized by reduced coronary blood flow, often due to atherosclerotic narrowing of the coronary arteries. Early identification of these abnormalities is critical to reducing healthcare costs and improving patient survival rates. Despite considerable progress in predictive modeling, many existing machine learning models suffer from overfitting: they perform well on the data they were developed and tested on but fail to generalize to larger, more heterogeneous patient populations. This limitation diminishes their clinical validity and applicability in medical diagnosis. To address these issues, this study presents an ensemble-based machine learning approach for the early prediction of heart disease. The proposed system combines several supervised learning algorithms to increase predictive accuracy and generalization performance. Decision tree, support vector machine, and random forest classifiers, along with classifiers trained on principal component analysis-transformed features, were compared. The framework involves systematic data preprocessing, dimensionality reduction, and class imbalance management to ensure methodological robustness and consistent diagnostic outcomes. Experimental results demonstrate that the ensemble-based approach achieves superior predictive performance while reducing data bias. These findings highlight the potential of machine learning-based diagnostic systems to support clinical decision-making, improve healthcare resource allocation, and enhance patient outcomes by enabling timely and accurate cardiovascular risk identification.
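The workflow outlined above (preprocessing, dimensionality reduction, class imbalance management, and comparison of individual classifiers with an ensemble) can be illustrated with a minimal scikit-learn sketch. Several choices here are assumptions rather than details taken from the study: the data are synthetic, the ensemble is a hard-voting combination, class imbalance is handled with balanced class weights, PCA retains 95% of the variance, and models are scored by mean F1 under stratified 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a heart disease table (assumption: 13 features and
# a roughly 70/30 class split); replace with real patient data.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           weights=[0.7, 0.3], random_state=0)


def base_models():
    """Return fresh copies of the three base classifiers.

    class_weight="balanced" is one possible way to manage class imbalance;
    the study may use a different technique.
    """
    return {
        "decision_tree": DecisionTreeClassifier(
            max_depth=4, class_weight="balanced", random_state=0),
        "svm": SVC(kernel="rbf", class_weight="balanced", random_state=0),
        "random_forest": RandomForestClassifier(
            n_estimators=100, class_weight="balanced", random_state=0),
    }


def with_preprocessing(clf):
    """Wrap a classifier so scaling and PCA are refit inside each CV fold,
    which keeps validation data from leaking into the transforms."""
    return make_pipeline(StandardScaler(), PCA(n_components=0.95), clf)


# Individual classifiers on PCA-transformed features.
candidates = {name: with_preprocessing(clf)
              for name, clf in base_models().items()}

# Hard-voting ensemble of the same three model types (an assumed ensemble
# strategy, chosen for illustration).
candidates["voting_ensemble"] = with_preprocessing(
    VotingClassifier(estimators=list(base_models().items()), voting="hard"))

# Stratified folds preserve the class ratio in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
          for name, model in candidates.items()}

for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```

Placing the scaler and PCA inside the pipeline means each fold estimates the transforms from its own training portion only, which is one common guard against the optimistic estimates associated with overfitting. The scores on synthetic data say nothing about the results reported in the study.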
