Talking slide avatars: Open-source multimodal communication approach for teaching

¹ School of Mathematics and Computer Science, College of Business, Engineering and Technology, Kentucky State University, Frankfort, Kentucky, United States of America

AC, 026140017 https://doi.org/10.36922/AC026140017

Received: 30 March 2026 | Revised: 13 May 2026 | Accepted: 15 May 2026 | Published online: 5 June 2026

© 2026 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/).

Abstract

Slide-based teaching is widely used in higher education. Yet, in online, hybrid, and asynchronous contexts, slides often lose instructor presence, narrative continuity, and expressive framing that help learners connect with course content. A full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study presents a practice-based implementation and analytic reflection of an open-source workflow for creating talking slide avatars. The workflow integrates OpenVoice for text-to-speech and authorized voice-style conversion with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a short script and an authorized or synthetic portrait image into a narrated video for slide decks or HyperText Markup Language-based lecture materials. Rather than treating this workflow only as a technical solution, the study frames talking slide avatars as multimodal communication artifacts at the intersection of digital pedagogy, aesthetic education, and art–technology practice. The paper documents the production pipeline, analyzes communicative and aesthetic affordances, and proposes practical guidelines for script length, image selection, pacing, disclosure, accessibility, consent, and ethical use. Its contribution is not a validated learning intervention but an educator-oriented open-source production model and communication design framework. The study concludes that short, transparent, and carefully designed avatars may provide a reusable communication layer for introductions, transitions, reminders, and recaps when used selectively and with appropriate ethical safeguards.

Keywords

Artificial intelligence avatar

Multimodal communication

Instructional video

Art and technology

Higher education

Talking head synthesis

Funding

None.

Conflict of interest

The author declares no competing interests.

Arts & Communication, Electronic ISSN: 2972-4090. Published by AccScience Publishing.