Development and analysis of medical instruction-tuning for Japanese large language models

Issey Sukeda1*, Masahiro Suzuki2, Hiroki Sakaji3, Satoshi Kodera1
1 Department of Cardiovascular Medicine, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
2 Department of Systems Innovation, School of Engineering, The University of Tokyo, Bunkyo, Tokyo, Japan
3 Faculty of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido, Japan
AIH 2024, 1(2), 107–116
Submitted: 10 January 2024 | Accepted: 13 March 2024 | Published: 8 April 2024
© 2024 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International License.

Amid the ongoing wave of impact driven by large language models (LLMs) such as ChatGPT, adapting LLMs to the medical domain has emerged as a crucial research frontier. Because mainstream LLMs tend to be designed for general-purpose applications, constructing a medical LLM through domain adaptation remains a major challenge. Although instruction-tuning, particularly when based on low-rank adaptation (LoRA), has recently become a common strategy for fine-tuning LLMs, its precise role in domain adaptation remains unknown. Here, we investigated how LoRA-based instruction-tuning improves performance on Japanese medical question-answering tasks through a multifaceted evaluation of multiple-choice questions that includes scoring based on “Exact match” and “Gestalt distance” in addition to conventional accuracy. Our findings suggest that LoRA-based instruction-tuning can partially incorporate domain-specific knowledge into LLMs, with larger models showing more pronounced effects. Furthermore, our results underscore the potential of adapting English-centric models for Japanese applications in domain adaptation, while also highlighting the persisting limitations of Japanese-centric models. This initiative represents a pioneering effort toward enabling medical institutions to fine-tune and operate models without relying on external services.
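To make the two scoring notions named above concrete, the following is a minimal sketch rather than the authors' evaluation code. It assumes that “Exact match” means the generated answer string equals the reference after trimming whitespace, and that “Gestalt distance” is one minus the similarity ratio from Gestalt (Ratcliff/Obershelp-style) pattern matching, as implemented by Python's standard `difflib.SequenceMatcher`. The function names `exact_match` and `gestalt_distance` are hypothetical and introduced only for illustration.

```python
from difflib import SequenceMatcher


def exact_match(prediction: str, reference: str) -> bool:
    """Return True when the prediction equals the reference.

    Assumption: surrounding whitespace is ignored; no other
    normalization is applied.
    """
    return prediction.strip() == reference.strip()


def gestalt_distance(prediction: str, reference: str) -> float:
    """Return 1 - similarity ratio, so 0.0 means identical strings.

    Assumption: the distance is defined as one minus the ratio from
    difflib's Gestalt-style matcher; the paper's exact definition may differ.
    """
    ratio = SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()
    return 1.0 - ratio


if __name__ == "__main__":
    # Illustrative strings only; not taken from the paper's dataset.
    reference = "abcd"
    for prediction in ["abcd", "abce", "wxyz"]:
        print(prediction, exact_match(prediction, reference),
              gestalt_distance(prediction, reference))
```

Under these assumptions, a distance-style score gives partial credit to near-miss answers that an exact-match criterion would count as wrong, which is one reason to report both alongside accuracy.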

Keywords
Medical large language models
Domain adaptation
Low-rank adaptation
Funding
This study was supported by the Japan Agency for Medical Research and Development (Grant Number: JP23hk0102078h0003).
Conflict of interest
The authors declare they have no competing interests.
