Automatic Baseball Video Tagging Based on Voice Pattern Prioritization and Recursive Model Localization

Komei Arasawa; Shun Hattori

doi:10.20965/jaciii.2017.p1262

JACIII Vol.21 No.7 pp. 1262-1279

(2017)

Komei Arasawa^* and Shun Hattori^**

^*Web Intelligence Time-Space (WITS) Laboratory, Graduate School of Engineering, Muroran Institute of Technology
27-1 Mizumoto-cho, Muroran, Hokkaido 050-8585, Japan

^**Web Intelligence Time-Space (WITS) Laboratory, College of Information and Systems, Muroran Institute of Technology
27-1 Mizumoto-cho, Muroran, Hokkaido 050-8585, Japan

Received: February 21, 2017

Accepted: August 31, 2017

Published: November 20, 2017

Keywords: tagging, automatic division, voice recognition, modelling, web text extraction

Abstract

To let viewers select only the specific scenes they want to watch in a baseball video and personalize its highlights sub-video, we require an Automatic Baseball Video Tagging system that automatically divides a baseball video into multiple sub-videos, one per at-bat scene, and appends tag information relevant to each at-bat scene. Toward developing such a system, previous papers proposed several Tagging algorithms that use ball-by-ball textual reports and voice recognition, and tried to refine the models for baseball games. To improve robustness, this paper proposes a novel Tagging method that utilizes multiple kinds of play-by-play comment patterns for voice recognition, which represent the situation of at-bat scenes, and takes their “Priority” into account. In addition, to search for a voice-recognized play-by-play comment at the start/end of at-bat scenes, this paper proposes a novel modelling method called “Local Modelling,” in addition to the “Global Modelling” used by the previous papers.
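The abstract does not give the details of the pattern prioritization or of Local Modelling, so the following is only a loose illustrative sketch of one possible reading: several comment patterns are searched in voice-recognized commentary near an expected time, and the highest-priority match wins. The pattern phrases, the fixed time window, and all names (`Utterance`, `PATTERNS`, `find_scene_start`) are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Utterance:
    time_sec: float  # position in the video, in seconds
    text: str        # voice-recognized commentary text


# Hypothetical comment patterns (assumption, not from the paper).
# A smaller number means a higher priority.
PATTERNS = [
    (1, "now batting"),
    (2, "steps in"),
    (3, "first pitch"),
]


def find_scene_start(
    utterances: Sequence[Utterance],
    expected_sec: float,
    window_sec: float,
) -> Optional[Utterance]:
    """Return the best-matching utterance near expected_sec, or None.

    Only utterances within window_sec of expected_sec are considered.
    Among those, the match with the highest-priority pattern wins;
    ties are broken by closeness to expected_sec.
    """
    best_key = None
    best = None
    for u in utterances:
        distance = abs(u.time_sec - expected_sec)
        if distance > window_sec:
            continue  # outside the local search window
        text = u.text.lower()
        for priority, phrase in PATTERNS:
            if phrase in text:
                key = (priority, distance)
                if best_key is None or key < best_key:
                    best_key, best = key, u
    return best


# Made-up sample commentary, for illustration only.
commentary = [
    Utterance(100.0, "He steps in against the lefty"),
    Utterance(130.0, "Now batting, number seven"),
    Utterance(400.0, "Now batting, the cleanup hitter"),
]
start = find_scene_start(commentary, expected_sec=110.0, window_sec=30.0)
```

In this sketch the utterance at 130 seconds is chosen over the closer one at 100 seconds because its pattern has a higher priority, while the utterance at 400 seconds is ignored because it lies outside the window. How the expected time and the window would actually be modelled is the part the paper addresses and is not reproduced here.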

Cite this article as:

K. Arasawa and S. Hattori, “Automatic Baseball Video Tagging Based on Voice Pattern Prioritization and Recursive Model Localization,” J. Adv. Comput. Intell. Intell. Inform., Vol.21 No.7, pp. 1262-1279, 2017.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
