JACIII Vol.30 No.3 pp. 781-794
(2026)

Research Paper:

Perceptual Interaction System of Elderly Care Robot Based on Multimodal Large Language Models and Distributed Computing

Aihui Wang, Kuozhan Wang, Yan Wang, Xuebin Yue, Hengyi Li, and Yao Yao

School of Automation and Electrical Engineering, Zhongyuan University of Technology
No.41 Zhongyuan Road, Zhengzhou 450007, China

Corresponding author

Received: June 30, 2025
Accepted: January 7, 2026
Published: May 20, 2026
Keywords: Pepper robot, vision-language model, large language model, environmental perception, human–robot interaction
Abstract

Population aging is increasing the demand for elderly care robots. However, current robots have limited environmental perception and interaction capabilities, which makes it difficult for them to meet practical application needs. We propose a perceptual interaction system for elderly care robots that combines multimodal large language models with distributed computing. The vision-language models and the large language model are deployed on a server, while the robot performs environmental perception and kinematic solving. This division allocates computational resources between the server and the robot and enables complex human–robot interactions such as dialogue, visual question answering (VQA), and object retrieval assistance. Experimental results show that the VQA module achieves an accuracy of 53.16% on the COCO-QA dataset and 66.7% on the VQA-v2 dataset. The mean average precision (mAP) of the ZSD module on the COCO val 2014 dataset is 43.8%. These models were deployed on the robotic system in a constrained, simulated living room setting. The average response time of the robotic interaction system was 1.346 seconds. We also collected feedback from 10 participating users to verify the feasibility of the robotic system in a home setting.
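
The split between robot and server described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The handler names, the JSON message layout, and the stub model outputs are all hypothetical, and the network transport is replaced by a direct function call so that the example stays self-contained. It shows only the general pattern: the robot side packages a task request, the server side routes it to a model, and the robot side records the round-trip time.

```python
import json
import time
from statistics import mean

# Hypothetical stand-ins for the server-side models. The paper's actual
# vision-language models and large language model are not reproduced here.
def _dialogue(payload):
    return {"reply": "echo: " + payload["text"]}

def _vqa(payload):
    return {"answer": "unknown"}

def _detect(payload):
    return {"boxes": []}

SERVER_HANDLERS = {"dialogue": _dialogue, "vqa": _vqa, "detect": _detect}

def server_handle(raw_request: str) -> str:
    """Server side: decode a JSON request and run the requested model stub."""
    request = json.loads(raw_request)
    handler = SERVER_HANDLERS[request["task"]]
    return json.dumps(handler(request["payload"]))

def robot_request(task: str, payload: dict):
    """Robot side: send one request and return (result, elapsed seconds)."""
    start = time.perf_counter()
    # Assumption: a real deployment would send this over the network;
    # here the server function is called directly.
    raw_response = server_handle(json.dumps({"task": task, "payload": payload}))
    elapsed = time.perf_counter() - start
    return json.loads(raw_response), elapsed

def average_response_time(requests):
    """Mean round-trip time over a list of (task, payload) pairs."""
    return mean(robot_request(task, payload)[1] for task, payload in requests)
```

In this sketch, `average_response_time` only times the in-process stubs. The 1.346-second figure in the abstract refers to the authors' full system and cannot be reproduced from this example.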

Elderly care robot system architecture

Cite this article as:
A. Wang, K. Wang, Y. Wang, X. Yue, H. Li, and Y. Yao, “Perceptual Interaction System of Elderly Care Robot Based on Multimodal Large Language Models and Distributed Computing,” J. Adv. Comput. Intell. Intell. Inform., Vol.30 No.3, pp. 781-794, 2026.

Last updated on May. 20, 2026