Research Paper:
A Hybrid Optimization Approach for 3D Multi-Camera Human Pose Estimation
Masatoshi Eguchi, Takenori Obo, and Naoyuki Kubota
Department of Mechanical System Engineering, Graduate School of Systems Design, Tokyo Metropolitan University
6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan
Corresponding author
This paper introduces a method for estimating 3D human joint angles using a hybrid optimization approach that integrates particle swarm optimization (PSO) with the steepest descent method to improve accuracy in both global and local search. While advances in motion capture technologies have made it easier to obtain 2D human joint position data, accurate estimation of 3D joint angles remains crucial for detailed behavior analysis. Our proposed method first applies PSO to obtain an initial estimate of the 3D joint angles from 2D joint positions. We then refine this estimate using the steepest descent method, improving the local search process. The convergence and accuracy of the algorithm are influenced by the grouping strategy in PSO, which is discussed in detail. Experimental results validate the effectiveness of our approach in enhancing the accuracy of 3D human pose estimation.
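To illustrate the general idea of the hybrid scheme, the following is a minimal sketch, not the paper's implementation: a generic PSO performs the global search over a candidate vector (here standing in for the joint-angle parameters), and its best solution is then refined locally by steepest descent using a central-difference gradient of the same objective. The objective, hyperparameters, and bounds are all illustrative assumptions.

```python
import random

def hybrid_pso_sd(f, dim, n_particles=20, iters=100, sd_steps=200,
                  lr=0.01, h=1e-5, bounds=(-5.0, 5.0), seed=0):
    """Minimize f over R^dim: PSO global search, then steepest-descent refinement.

    This is a generic illustrative sketch; the paper's actual objective
    (reprojection error of 3D joint angles against 2D joint positions)
    and its PSO grouping strategy are not reproduced here.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # --- PSO global search ---
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia / cognitive / social weights
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    # --- steepest-descent local refinement of the PSO result ---
    x = gbest[:]
    for _ in range(sd_steps):
        grad = []
        for d in range(dim):
            xp, xm = x[:], x[:]
            xp[d] += h
            xm[d] -= h
            grad.append((f(xp) - f(xm)) / (2 * h))  # central difference
        x = [x[d] - lr * grad[d] for d in range(dim)]
    return x, f(x)
```

On a smooth objective, PSO places the search near the global basin and steepest descent then converges quickly within it, which is the division of labor the hybrid approach relies on.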
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.