ORIGINAL RESEARCH ARTICLE

Interpretability analysis of deep models for COVID-19 detection

Daniel Peixoto Pinto da Silva1, Edresson Casanova2, Lucas Rafael Stefanel Gris3, Marcelo Matheus Gauy4*, Arnaldo Candido Junior5, Marcelo Finger4, Flaviane Romani Fernandes Svartman6, Beatriz Raposo de Medeiros7, Marcus Vinícius Moreira Martins8, Sandra Maria Aluísio2, Larissa Cristina Berti9, João Paulo Teixeira10
1 Academic Department of Computing, Federal University of Technology – Paraná, Medianeira, Paraná, Brazil
2 Department of Computer Science, Institute of Mathematical and Computer Sciences, University of São Paulo, São Carlos, São Paulo, Brazil
3 Institute of Informatics, Federal University of Goiás, Goiânia, Goiás, Brazil
4 Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, São Paulo, Brazil
5 Department of Computing and Statistics, Institute of Biosciences, Humanities and Exact Sciences, São Paulo State University, São José do Rio Preto, São Paulo, Brazil
6 Department of Classical and Vernacular Literature, Faculty of Philosophy, Language, Literature and Human Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
7 Department of Linguistics, Faculty of Philosophy, Language, Literature and Human Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
8 Department of Literature and Linguistics, University of the State of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
9 Department of Speech Therapy, Faculty of Philosophy and Sciences, São Paulo State University, Marília, São Paulo, Brazil
10 Department of Electronics, Research Centre in Digitalization and Intelligent Robotics (CeDRI), Instituto Politécnico de Bragança, Bragança, Portugal
AIH 2024, 1(3), 114–126; https://doi.org/10.36922/aih.2992
Submitted: 21 February 2024 | Accepted: 17 June 2024 | Published: 30 July 2024
© 2024 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Abstract

During the coronavirus disease 2019 (COVID-19) pandemic, various research disciplines collaborated to address the impacts of severe acute respiratory syndrome coronavirus-2 infections. This paper presents an interpretability analysis of a convolutional neural network-based model for COVID-19 detection from audio data. We examine the input features that drive the model's decisions, including spectrograms, fundamental frequency (F0), F0 standard deviation, sex, and age. We then study the model's decision patterns by generating heat maps that visualize where the model focuses during the decision-making process. Adopting an explainable artificial intelligence approach, we show that the examined models can make unbiased decisions even when the training-set audio contains noise, provided appropriate preprocessing steps are undertaken. Our top-performing model achieves a detection accuracy of 94.44%. Our analysis indicates that the models prioritize high-energy regions of the spectrogram during the decision process, particularly those associated with prosodic domains, while also effectively using F0 for COVID-19 detection.
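The acoustic features named in the abstract (spectrogram, F0, and F0 standard deviation) can be illustrated with a minimal numpy-only sketch. The window, hop size, pitch range, and autocorrelation-based pitch tracker below are illustrative assumptions for a synthetic tone, not the paper's actual preprocessing pipeline:

```python
import numpy as np

def spectrogram(y, n_fft=512, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(n_fft)
    frames = [y[i:i + n_fft] * window for i in range(0, len(y) - n_fft, hop)]
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq_bins, n_frames)

def f0_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Crude per-frame F0 estimate: pick the autocorrelation peak in [fmin, fmax]."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag range for the pitch band
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 120.0 * t)  # one second of a synthetic 120 Hz "voice"

S = spectrogram(y)
f0s = [f0_autocorr(y[i:i + 1024], sr) for i in range(0, len(y) - 1024, 1024)]
print(S.shape, round(float(np.mean(f0s)), 1), round(float(np.std(f0s)), 2))
```

For the 120 Hz tone, the mean of `f0s` lands near 120 Hz and its standard deviation near zero; on real speech, the F0 standard deviation becomes an informative feature in its own right, as the abstract notes.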

Keywords
Coronavirus disease 2019 detection
Voice processing
Gradient-weighted class activation mapping
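Gradient-weighted class activation mapping (Grad-CAM), listed among the keywords, reduces at its core to weighting a convolutional layer's feature maps by the globally averaged gradients of the class score, then keeping only positive evidence. A minimal numpy sketch on toy tensors (the shapes and random inputs are illustrative assumptions, not the paper's network):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heat map from one conv layer.

    activations: (K, H, W) feature maps A^k
    gradients:   (K, H, W) d(class score)/dA^k
    """
    # alpha_k: global-average-pool each channel's gradient over the spatial dims
    alphas = gradients.mean(axis=(1, 2))                              # (K,)
    # weighted sum of feature maps, ReLU keeps only positively contributing regions
    cam = np.maximum((alphas[:, None, None] * activations).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam /= cam.max()                                              # scale to [0, 1]
    return cam

rng = np.random.default_rng(0)
A = rng.random((8, 4, 4))            # toy conv-layer activations
G = rng.standard_normal((8, 4, 4))   # toy gradients of the target class score
heat = grad_cam(A, G)
print(heat.shape)
```

Overlaying such a map (upsampled to the input resolution) on a spectrogram is what lets one see whether the model attends to high-energy, prosodically relevant regions.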
Funding
This work was supported by FAPESP grants 2022/16374-6 (MMG), 2020/06443-5 (SPIRA), and 2023/00488-5 (SPIRA-BM) and by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Conflict of interest
The authors declare that they have no competing interests.
Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing