
JACIII Vol.27 No.4 pp. 710-719 (2023)
doi: 10.20965/jaciii.2023.p0710

Research Paper:

Multimodal Facial Emotion Recognition Using Improved Convolution Neural Networks Model

Chinonso Paschal Udeh*,**,***, Luefeng Chen*,**,***,†, Sheng Du*,**,***, Min Li*,**,***, and Min Wu*,**,***

*School of Automation, China University of Geosciences
No.388 Lumo Road, Hongshan District, Wuhan 430074, China

**Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems
No.388 Lumo Road, Hongshan District, Wuhan 430074, China

***Engineering Research Center of Intelligent Technology for Geo-Exploration, Ministry of Education
No.388 Lumo Road, Hongshan District, Wuhan 430074, China

†Corresponding author

Received: February 16, 2023
Accepted: April 28, 2023
Published: July 20, 2023
Keywords: emotion recognition, facial expression, head pose, convolution neural network, stochastic gradient descent
Abstract

In the quest for human-robot interaction (HRI), robots that can recognize, learn, and analyze emotions play a significant role in human perception, attention, decision-making, and social communication. However, accurate emotion recognition in HRI remains a challenge, because multiple sources of information coexist when multimodal facial expressions and head poses are processed by combined convolutional neural networks (CNNs) and deep learning. This research analyzes and improves the robustness of emotion recognition and proposes a novel approach to the problem that traditional deep neural networks fall into poor local optima when their weights are optimized with standard methods. The proposed approach adaptively finds better network weights through a hybrid genetic algorithm with stochastic gradient descent (HGASGD), which combines the inherent, implicit parallelism of the genetic algorithm with the global optimization capability of stochastic gradient descent (SGD). Experiments show the effectiveness of the proposed approach in providing complete emotion recognition through a combination of multimodal data, CNNs, and HGASGD, indicating that it is a powerful tool for achieving interaction between humans and robots. To validate and test its effectiveness, the performance and reliability of the proposed approach and two variants of HGASGD-based facial emotion recognition (FER) are compared on a large dataset of facial images. The approach integrates multimodal information from facial expressions and head poses, enabling the system to recognize emotions more accurately. The results show that CNN-HGASGD outperforms CNN-SGD and other existing state-of-the-art methods in terms of FER.
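
The hybrid optimizer described in the abstract can be pictured as a genetic-algorithm outer loop whose individuals are network weight vectors, each locally refined by a few SGD steps before selection, crossover, and mutation. The following is a minimal, illustrative Python sketch of that idea only; it is not the authors' implementation, and the tiny logistic model, the synthetic "multimodal" features, and all hyperparameters are placeholder assumptions.

```python
# Illustrative sketch of a hybrid GA + SGD (HGASGD-style) weight search.
# NOTE: this is NOT the authors' implementation; the linear model, synthetic
# data, and all hyperparameters below are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "multimodal" data: concatenated facial-expression and
# head-pose feature vectors with binary emotion labels.
X = rng.normal(size=(200, 16))
y = (X[:, :8].sum(axis=1) > 0).astype(float)

def forward(w, X):
    # Single-layer logistic model standing in for the CNN classifier head.
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def loss(w, X, y):
    p = np.clip(forward(w, X), 1e-7, 1 - 1e-7)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def sgd_refine(w, X, y, steps=20, lr=0.1, batch=32):
    # Local refinement: a few mini-batch gradient steps per individual.
    w = w.copy()
    for _ in range(steps):
        idx = rng.choice(len(X), batch, replace=False)
        p = forward(w, X[idx])
        w -= lr * (X[idx].T @ (p - y[idx]) / batch)
    return w

# GA outer loop: a population of weight vectors; fitness = negative loss
# measured after each individual has been refined by SGD.
pop = [rng.normal(scale=0.5, size=X.shape[1]) for _ in range(12)]
for gen in range(15):
    pop = [sgd_refine(w, X, y) for w in pop]               # exploit (SGD)
    fitness = np.array([-loss(w, X, y) for w in pop])
    order = np.argsort(fitness)[::-1]
    parents = [pop[i] for i in order[:6]]                  # selection
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = rng.choice(len(parents), 2, replace=False)
        mask = rng.random(X.shape[1]) < 0.5                 # uniform crossover
        child = np.where(mask, parents[a], parents[b])
        child = child + rng.normal(scale=0.05, size=child.shape)  # mutation
        children.append(child)
    pop = parents + children                                # explore (GA)

best = min(pop, key=lambda w: loss(w, X, y))
print("final training loss:", round(loss(best, X, y), 4))
```

In the paper's setting, the individuals would instead encode CNN weights and the fitness would reflect recognition accuracy on the fused facial-expression and head-pose inputs; the sketch only shows how GA exploration and SGD exploitation can be interleaved.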

Multimodal facial emotion understanding for HRI with multi-measures using HGASGD

Cite this article as:
C. Udeh, L. Chen, S. Du, M. Li, and M. Wu, “Multimodal Facial Emotion Recognition Using Improved Convolution Neural Networks Model,” J. Adv. Comput. Intell. Intell. Inform., Vol.27 No.4, pp. 710-719, 2023.

