Online First | DOI: 10.36922/AIH026170040
REVIEW ARTICLE

Retrieval-augmented generation and agentic AI in healthcare: Applications, challenges, and future directions

Bihong Zhu¹, Xin Kang¹*, Kazuyuki Matsumoto¹, Minoru Yoshida¹
¹ Department of Information Science and Intelligent Systems, Faculty of Science and Engineering, Tokushima University, Information Science South Building, 2-1 Minamijyousanjima-cho, Tokushima, Japan
Received: 24 April 2026 | Revised: 25 May 2026 | Accepted: 29 May 2026 | Published online: 26 June 2026
© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License ( https://creativecommons.org/licenses/by/4.0/ )
Abstract

Large language models have shown growing potential in healthcare, but their clinical use remains constrained by hallucinations, outdated knowledge, limited traceability, and weak alignment with evidence-based practice. Retrieval-augmented generation (RAG) and emerging agentic artificial intelligence (AI) frameworks offer a promising path to address these limitations by combining external medical knowledge, iterative reasoning, and tool-supported decision support. This review examines the progression from foundational medical RAG systems to more adaptive agentic frameworks, with emphasis on retrieval design, knowledge integration, reasoning control, and healthcare-oriented deployment. Key application areas, including medical question answering, radiology support, rare disease reasoning, and clinical decision support, are reviewed alongside the technical and practical challenges that continue to limit routine use. These challenges include retrieval noise, factual inconsistency, verification difficulties, privacy and governance concerns, and the lack of standardized evaluation. The review also highlights future directions in multimodal integration, human oversight, evidence grounding, and clinically responsible implementation. By organizing current developments around applications, challenges, and future directions, this article provides a healthcare-oriented perspective on the opportunities and limitations of RAG and agentic AI.

Keywords
Retrieval-augmented generation
Agentic AI
Healthcare
Clinical decision support
Medical question answering
Evidence grounding
Funding
This work was supported by JSPS KAKENHI (Grant Number JP26K14963).
Conflict of interest
The authors declare no competing interests.
Artificial Intelligence in Health, Electronic ISSN: 3029-2387 Print ISSN: 3041-0894, Published by AccScience Publishing