Paper:
Pitch-Cluster-Map Based Daily Sound Recognition for Mobile Robot Audition
Yoko Sasaki*, Masahito Kaneyoshi*, Satoshi Kagami*,
Hiroshi Mizoguchi*,**, and Tadashi Enomoto***
*Digital Human Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi, Kouto-ku, Tokyo 135-0064, Japan.
**Dept. of Mechanical Engineering, Tokyo University of Science, 2641 Yamazaki, Noda-shi, Chiba 278-8510, Japan.
***The Kansai Electric Power Co. Inc., 3-11-20 Nakoji, Amagasaki, Hyogo 661-0974, Japan.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
Copyright© 2010 by Fuji Technology Press Ltd. and Japan Society of Mechanical Engineers. All rights reserved.