Research Paper:
A Proposal for Cooking Motion Generation Based on Maximizing the Margin of Task Achievement
Koki Maruyama*, Kazuki Yamada**, Takahiro Suzuki**, Shuichi Akizuki**, and Manabu Hashimoto**

*Department of Engineering, Chukyo University
101-2 Yagoto-honmachi, Showa-ku, Nagoya, Aichi 466-8666, Japan
Corresponding author
**Graduate School of Engineering, Chukyo University
Nagoya, Japan
Cooking robots provide life assistance and should be able to perform suitable tasks from simple language instructions. Previous studies have employed large language models (LLMs) to build action plans in the upper symbolic layer, but many issues remain in connecting those plans to the lower physical layer. In scooping, for example, a motion that scoops food from a container can be generated, yet the actual task may still fail, because the executability of the desired task is not examined at the trajectory generation stage. In this study, we propose the “margin of task achievement,” comprising two components, the “margin for interference” and the “margin for execution,” to evaluate how well a task can be accomplished, and a method that uses it to determine the optimal trajectory and, when necessary, to switch tools. In an experiment in which three different tools were used to scoop green tea powder, the average success rate was 93.3%, which was 61.6 percentage points higher than when the margin of task achievement was not considered. These results confirm the validity of the proposed method.
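To make the selection logic described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: the `Candidate` structure, the placeholder margin functions, and the unweighted sum used to combine the two margins are all assumptions introduced purely for illustration.

```python
# Minimal sketch: score candidate (tool, trajectory) pairs by a combined
# "margin of task achievement" and pick the best one; choosing a candidate
# with a different tool corresponds to switching tools.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    tool: str                                      # e.g., "spoon", "spatula" (hypothetical)
    trajectory: List[Tuple[float, float, float]]   # end-effector waypoints (x, y, z)
    min_clearance: float                           # smallest tool-to-container clearance [m]
    reachability: float                            # 0..1, how comfortably the arm can execute it


def margin_for_interference(c: Candidate) -> float:
    # Placeholder: room the tool has before interfering with the container
    # or surrounding objects along the trajectory.
    return c.min_clearance


def margin_for_execution(c: Candidate) -> float:
    # Placeholder: slack the robot has to actually execute the motion
    # (reachability, joint limits, etc.).
    return c.reachability


def margin_of_task_achievement(c: Candidate) -> float:
    # The abstract states the margin comprises these two components;
    # an unweighted sum is assumed here for illustration only.
    return margin_for_interference(c) + margin_for_execution(c)


def select_candidate(candidates: List[Candidate]) -> Candidate:
    # Choose the trajectory/tool pair that maximizes the margin of task achievement.
    return max(candidates, key=margin_of_task_achievement)


if __name__ == "__main__":
    candidates = [
        Candidate("spoon", [(0.0, 0.0, 0.10), (0.0, 0.0, 0.02)], min_clearance=0.01, reachability=0.9),
        Candidate("ladle", [(0.1, 0.0, 0.10), (0.1, 0.0, 0.02)], min_clearance=0.04, reachability=0.7),
    ]
    best = select_candidate(candidates)
    print(f"Selected tool: {best.tool}")
```

In this toy example the ladle is selected because its larger clearance outweighs its slightly lower reachability; how the two margins are actually defined and weighted is left to the paper itself.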
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.