Large language models-in-the-loop: Leveraging expert small artificial intelligence models for multilingual anonymization and de-identification of protected health information

Murat Gunay¹*, Bunyamin Keles², Raife Hizlan¹


¹ Department of Research and Development, AI Handed LLC, Lewes, Delaware, United States of America

² Department of Health Management, Hacettepe University Institute of Social Sciences, Ankara, Turkey

AIH 2026, 3(1), 138–151; https://doi.org/10.36922/AIH025120021

Received: 19 March 2025 | Revised: 21 August 2025 | Accepted: 26 August 2025 | Published online: 19 September 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).


Abstract

The rise of chronic diseases and pandemics such as COVID-19 has emphasized the need for effective patient data processing while ensuring privacy through anonymization and de-identification of protected health information. Anonymized data facilitates research without compromising patient confidentiality. This paper introduces expert small artificial intelligence (AI) models developed using the large language model (LLM)-in-the-loop methodology to meet the demand for domain-specific named entity recognition (NER) models for de-identification. These models overcome the privacy risks associated with LLMs used through application programming interfaces by eliminating the need to transmit or store sensitive data. More importantly, they consistently outperform LLMs in de-identification tasks, offering superior performance and reliability. Our de-identification NER models, developed in eight languages (English, German, Italian, French, Romanian, Turkish, Spanish, and Arabic), achieved average macro-F1 scores of 0.931, 0.960, 0.955, 0.937, 0.930, 0.963, 0.957, and 0.922, respectively. These results establish our de-identification NER models as the most accurate healthcare anonymization solutions, surpassing existing small models and even general-purpose LLMs, such as GPT-4o. While Part I of this series introduced the LLM-in-the-loop methodology for biomedical document translation, this second paper showcases its success in developing cost-effective expert small NER models for de-identification tasks. Our findings lay the groundwork for future healthcare AI innovations, including biomedical entity and relation extraction, demonstrating the value of specialized models for domain-specific challenges.
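For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F1 score can be computed: F1 is calculated separately for each label and then averaged with equal weight, so rare entity types count as much as frequent ones. The function name `macro_f1`, the token-level granularity, and the toy labels are assumptions for illustration only; the abstract does not specify the exact evaluation protocol used in the paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # predicted p where it did not belong
            fn[g] += 1          # missed an instance of g
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy token-level labels (hypothetical, not taken from the paper's data).
gold = ["NAME", "O", "DATE", "O", "NAME", "O"]
pred = ["NAME", "O", "O", "O", "NAME", "DATE"]
score = macro_f1(gold, pred)
print(f"macro-F1 = {score:.3f}")
```

In this toy case the single DATE token is missed, so its per-label F1 is zero and pulls the macro average down sharply even though most tokens are labeled correctly. This sensitivity to rare labels is why macro averaging is a demanding measure for de-identification.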

Keywords

De-identification

Health Insurance Portability and Accountability Act

Protected health information

Patient safety

Large language models-in-the-loop

Anonymization

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.



Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing