
JRM Vol.37 No.2 pp. 310-321 (2025)
doi: 10.20965/jrm.2025.p0310

Paper:

Multiview Object Pose Estimation Using Viewpoint Weight Based on Shared Object Representation

Kazuya Yabashi and Tsuyoshi Tasaki

Meijo University
1-501 Shiogamaguchi, Tempaku-ku, Nagoya, Aichi 468-8502, Japan

Received: September 19, 2024
Accepted: January 8, 2025
Published: April 20, 2025
Keywords: object pose estimation, multiview
Abstract

Product display is a promising application of industrial arm robots, and automating it requires object pose estimation. However, many objects in retail stores have simple shapes, and robots often rely on RGB images from a single viewpoint, so pose estimation accuracy can be low depending on the viewpoint. This paper therefore proposes a multiview pose estimation method that fuses features using a weight for each viewpoint. To calculate these weights, we focus on a shared object representation that expresses object poses through classification; the classification score for each class increases as pose estimation becomes easier. We thus developed a method that weights the features from each viewpoint by their classification scores, treated as confidence, and estimates the object pose from the fused features. We compared the pose estimation results with those of a conventional method that selects the most plausible pose from multiple single-view estimates. With the permissible angle error set to 30°, the success rate of our method was 68.0%, 8.2 points higher than that of the conventional method.
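
The confidence-weighted fusion described above can be illustrated with a short sketch. The Python/NumPy code below weights each viewpoint's feature vector by that view's classification score and sums the result; the function name, the use of the maximum class score as the confidence value, and the simple normalized weighted sum are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): fuse per-viewpoint features using
# classification scores as confidence weights.
import numpy as np

def fuse_features(features: np.ndarray, class_scores: np.ndarray) -> np.ndarray:
    """Fuse viewpoint features weighted by classification confidence.

    features:     (V, D) array, one D-dimensional feature per viewpoint.
    class_scores: (V, C) array, per-viewpoint scores over C pose classes.
    """
    # Assumption: take each view's maximum class score as its confidence.
    confidence = class_scores.max(axis=1)             # (V,)
    weights = confidence / confidence.sum()           # normalize over viewpoints
    return (weights[:, None] * features).sum(axis=0)  # (D,) fused feature

# Toy usage: 3 viewpoints, 16-dimensional features, 8 pose classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 16))
scores = rng.uniform(size=(3, 8))
fused = fuse_features(feats, scores)
print(fused.shape)  # (16,)

The fused feature would then be passed to a pose estimator, so that viewpoints from which the pose is easier to recognize contribute more to the final estimate.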

Feature fusion for multiview object pose estimation

Cite this article as:
K. Yabashi and T. Tasaki, “Multiview Object Pose Estimation Using Viewpoint Weight Based on Shared Object Representation,” J. Robot. Mechatron., Vol.37 No.2, pp. 310-321, 2025.
