JACIII Vol.11 No.10 pp. 1177-1183
doi: 10.20965/jaciii.2007.p1177


A Robotic Auditory System that Interacts with Musical Sounds and Human Voices

Hideyuki Sawada and Toshiya Takechi

Department of Intelligent Mechanical Systems Engineering, Faculty of Engineering, Kagawa University, 2217-20 Hayashi-cho, Takamatsu-city, Kagawa 761-0396, Japan

Received: November 13, 2006
Accepted: August 16, 2007
Published: December 20, 2007
Keywords: musical instruments, microphone array, sound identification, sound localization, mel cepstrum

Voice and sound are the primary media of human communication. Humans exchange information smoothly by voice under various conditions, such as in noisy environments or in the presence of multiple speakers. Although we are surrounded by various sounds, we can detect the location of a sound source in 3D space, extract a particular sound from a mixture of sounds, and recognize the source of a specific sound. Music, likewise, is composed of various sounds generated by musical instruments, and it directly affects our emotions and feelings. This paper introduces real-time detection and identification of a particular sound among multiple sound sources, using a microphone array and based on the location of the speaker and the tonal characteristics of the sound. The technique will also be applied to an adaptive auditory system for a robotic arm that interacts with humans.
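As a rough illustration of how a microphone array can locate a sound source, the sketch below estimates the arrival-time difference between two microphones by plain cross-correlation and converts it to a far-field arrival angle. It is not the authors' method, which the abstract does not specify. The function names, the speed of sound (343 m/s), the 16 kHz sample rate, the 0.2 m microphone spacing, and the synthetic white-noise signal are all assumptions chosen for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def estimate_delay(sig_a, sig_b, max_lag):
    """Return the lag (in samples) by which sig_b trails sig_a.

    Picks the lag in [-max_lag, max_lag] that maximizes the plain
    cross-correlation sum(sig_a[i] * sig_b[i + lag]).
    """
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        # Restrict i so that both i and i + lag are valid indices.
        start = max(0, -lag)
        stop = min(n, len(sig_b) - lag)
        score = sum(sig_a[i] * sig_b[i + lag] for i in range(start, stop))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing):
    """Convert a sample lag to a far-field arrival angle in degrees."""
    tau = lag / sample_rate  # delay in seconds
    # Clamp to the valid asin domain in case the lag overshoots slightly.
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))
    return math.degrees(math.asin(ratio))


# Synthetic check: white noise reaching microphone B five samples after A.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_lag = 5
mic_a = source
mic_b = [0.0] * true_lag + source[:-true_lag]

lag = estimate_delay(mic_a, mic_b, max_lag=20)
angle = delay_to_angle(lag, sample_rate=16000, mic_spacing=0.2)
print(lag, round(angle, 1))
```

Plain cross-correlation is used here only because it is the simplest form of delay estimation. A practical system in a noisy or reverberant room would more likely use a whitened (generalized) correlation and more than two microphones.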

Cite this article as:
Hideyuki Sawada and Toshiya Takechi, “A Robotic Auditory System that Interacts with Musical Sounds and Human Voices,” J. Adv. Comput. Intell. Intell. Inform., Vol.11, No.10, pp. 1177-1183, 2007.

