Decentralized reinforcement learning for scalable embodied intelligence in robotic swarms

² Department of Computer Science and Engineering, College of Engineering and Technology, Dr V. R. Godhania Institute of Engineering, IT, and Management, Porbandar, Gujarat, India

EIR, 025380008 https://doi.org/10.36922/EIR025380008

Received: 21 September 2025 | Revised: 10 October 2025 | Accepted: 14 October 2025 | Published online: 31 October 2025

© 2025 by the Author(s). This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)

Abstract

Realizing scalable embodied intelligence in robotic swarms is a fundamental challenge in robotics and artificial intelligence, hindered by the limitations of conventional decentralized multi-agent reinforcement learning approaches. This paper introduces a novel integrated framework for decentralized reinforcement learning that addresses these limitations through three key innovations: (i) a dynamic hypergraph convolutional communication protocol for bandwidth-efficient coordination; (ii) a hierarchical policy network with recurrent state estimation for managing partial observability and enabling scalability; and (iii) a federated-learning-inspired training paradigm for enhanced robustness and sim-to-real transfer. Extensive experimental evaluation demonstrates that our approach achieves a 78% reduction in communication overhead compared with state-of-the-art baselines while maintaining superior task performance in simulated swarms of up to 50 agents. The framework is also robust to the sim-to-real gap, showing only a 16–22% performance drop when transferred from simulation to reality, compared with 45–68% for baseline methods. Physical validation on a swarm of 15 nano drones confirms the practical efficacy of our approach, with an 85% success rate in dynamic target pursuit tasks. Statistical analysis confirms the significance of these improvements (p < 0.001). Together, these results establish a new state of the art for deploying scalable, communication-efficient, and robust embodied intelligence in robotic swarms.
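To make the idea of hypergraph-based communication concrete, the following is a minimal illustrative sketch and not the protocol proposed in the article, whose equations are not given in this abstract. It assumes the simplest weight-free form of hypergraph message passing: each group of agents (a hyperedge) pools its members' features into one shared message, and each agent then averages the messages of the groups it belongs to. The function names, the fixed `groups` list, and the example features are all hypothetical; a dynamic scheme would recompute the groups at each step and add learned weights.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hypergraph_aggregate(
    features: Dict[int, Vector],
    hyperedges: List[List[int]],
) -> Dict[int, Vector]:
    """One round of node -> hyperedge -> node mean aggregation.

    Hypothetical helper: a weight-free, row-normalized hypergraph
    convolution step, used here only for illustration.
    """
    # Stage 1: each hyperedge (group of agents) pools its members' features
    # into a single shared message.
    edge_msgs = [mean([features[i] for i in edge]) for edge in hyperedges]

    # Stage 2: each agent averages the messages of the groups it belongs to.
    updated: Dict[int, Vector] = {}
    for agent, own in features.items():
        incident = [m for m, edge in zip(edge_msgs, hyperedges) if agent in edge]
        # An agent that belongs to no group keeps its own features unchanged.
        updated[agent] = mean(incident) if incident else list(own)
    return updated


if __name__ == "__main__":
    # Assumed toy data: four agents with 2-D features, two overlapping groups.
    feats = {0: [1.0, 0.0], 1: [3.0, 0.0], 2: [5.0, 6.0], 3: [9.0, 9.0]}
    groups = [[0, 1, 2], [2, 3]]
    print(hypergraph_aggregate(feats, groups))
```

In this sketch a group of k agents shares one pooled message instead of exchanging k(k-1) pairwise messages, which is the intuition behind using hyperedges for bandwidth-efficient coordination.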

Keywords

Robotic swarms

Embodied intelligence

Decentralized reinforcement learning

Multi-agent systems

Hypergraph neural networks

Federated learning

Sim-to-real transfer

Scalable autonomy

Funding

None.

Conflict of interest

The authors declare that they have no competing interests.

Embodied Intelligence and Robotics, Published by AccScience Publishing