Retrieval-augmented generation and agentic AI in healthcare: Applications, challenges, and future directions

¹ Department of Information Science and Intelligent Systems, Faculty of Science and Engineering, Tokushima University, Information Science South Building, 2-1 Minamijyousanjima-cho, Tokushima, Japan

AIH, 026170040 https://doi.org/10.36922/AIH026170040

Received: 24 April 2026 | Revised: 25 May 2026 | Accepted: 29 May 2026 | Published online: 26 June 2026

© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).

Abstract

Large language models have shown growing potential in healthcare, but their clinical use remains constrained by hallucinations, outdated knowledge, limited traceability, and weak alignment with evidence-based practice. Retrieval-augmented generation (RAG) and emerging agentic artificial intelligence (AI) frameworks offer a promising path to address these limitations by combining external medical knowledge, iterative reasoning, and tool-supported decision support. This review examines the progression from foundational medical RAG systems to more adaptive agentic frameworks, with emphasis on retrieval design, knowledge integration, reasoning control, and healthcare-oriented deployment. Key application areas, including medical question answering, radiology support, rare disease reasoning, and clinical decision support, are reviewed alongside the technical and practical challenges that continue to limit routine use. These challenges include retrieval noise, factual inconsistency, verification difficulties, privacy and governance concerns, and the lack of standardized evaluation. The review also highlights future directions in multimodal integration, human oversight, evidence grounding, and clinically responsible implementation. By organizing current developments around applications, challenges, and future directions, this article provides a healthcare-oriented perspective on the opportunities and limitations of RAG and agentic AI.
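The combination of external knowledge retrieval, iterative reasoning, and evidence grounding described in the abstract can be illustrated with a deliberately small sketch. It assumes a toy in-memory corpus (the passages are illustrative placeholders, not clinical guidance), uses Okapi BM25 as one common lexical retriever, and substitutes an extractive stand-in for the LLM generation step. The evidence threshold (`min_score`), the retry budget (`max_steps`), the stopword list, and the query-rewrite table (`SYNONYMS`) are hypothetical choices made for illustration; this is not the method of any specific system covered in the review.

```python
import math
import re
from collections import Counter

# Toy corpus standing in for an external medical knowledge source.
# Passages are illustrative placeholders, not clinical guidance.
CORPUS = {
    "doc1": "Metformin is commonly used as an initial oral therapy for type 2 diabetes.",
    "doc2": "Chest radiographs can show consolidation in patients with pneumonia.",
    "doc3": "Rare diseases often require genetic testing to confirm a diagnosis.",
}

# Hypothetical query-rewrite rules used by the retry (reformulation) step.
SYNONYMS = {"x-ray": "radiographs", "t2d": "type 2 diabetes"}

# Hypothetical minimal stopword list so function words do not count as evidence.
STOPWORDS = {"a", "an", "and", "as", "can", "does", "for", "in", "is", "of", "the", "to", "what", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class BM25:
    """Okapi BM25 ranking over a small in-memory corpus."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
        self.avgdl = sum(len(t) for t in self.tokens.values()) / len(self.tokens)
        n = len(self.tokens)
        df = Counter(term for toks in self.tokens.values() for term in set(toks))
        # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, doc_id: str) -> float:
        toks = self.tokens[doc_id]
        tf = Counter(toks)
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * len(toks) / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def search(self, query: str, k: int = 2) -> list[tuple[str, float]]:
        scored = [(doc_id, self.score(query, doc_id)) for doc_id in self.tokens]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def answer(query: str, index: BM25, min_score: float = 1.0, max_steps: int = 2) -> dict:
    """Retrieve evidence; if it is too weak, rewrite the query and retry; else refuse."""
    q = query.lower()
    for step in range(1, max_steps + 1):
        hits = [doc_id for doc_id, s in index.search(q) if s >= min_score]
        if hits:
            # Extractive stand-in for an LLM call: a real system would prompt a
            # model with these passages and require it to cite their IDs.
            text = " ".join(index.docs[d] for d in hits)
            return {"answer": text, "citations": hits, "steps": step}
        # Reformulate the query with the hypothetical rewrite rules before retrying.
        for short, expanded in SYNONYMS.items():
            q = q.replace(short, expanded)
    # No passage cleared the evidence threshold: refuse rather than guess.
    return {"answer": None, "citations": [], "steps": max_steps}


if __name__ == "__main__":
    index = BM25(CORPUS)
    for question in ["What is first-line therapy for T2D?", "What causes migraine?"]:
        print(question, "->", answer(question, index))
```

In this sketch, the first question retrieves only weak evidence until the abbreviation is expanded, so it is answered with a citation on the second step; the second question finds no supporting passage and returns no answer and no citations. A deployed system would replace the fixed rewrite table with model-driven query planning and would verify that the generated text is actually supported by the cited passages.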

Keywords

Retrieval-augmented generation

Agentic AI

Healthcare

Clinical decision support

Medical question answering

Evidence grounding

Funding

This work was supported by JSPS KAKENHI (Grant Number JP26K14963).

Conflict of interest

The authors declare no competing interests.

Artificial Intelligence in Health, Electronic ISSN: 3029-2387, Print ISSN: 3041-0894. Published by AccScience Publishing.