Paper:
Cross-Media Retrieval Based on Query Modality and Semi-Supervised Regularization
Yihe Liu, Huaxiang Zhang†, Li Liu, Lili Meng, Yongxin Wang, and Xiao Dong
Department of Computer Science, Shandong Normal University
No. 1, University Road, Changqing District, Jinan 250358, China
†Corresponding author
Existing cross-media retrieval methods usually learn a single latent subspace shared by all retrieval tasks, which yields only suboptimal retrieval performance. In this paper, we propose a novel cross-media retrieval method based on Query Modality and Semi-supervised Regularization (QMSR). Taking retrieval between images and texts as an example, QMSR learns two pairs of mappings, one for each retrieval task (using images to search texts (Im2Te) and using texts to search images (Te2Im)), instead of a single pair. QMSR learns the two pairs of projections by jointly optimizing the correlation between images and texts and the semantic information of the query modality (image or text). It further integrates semi-supervised regularization and the structural information among both labeled and unlabeled data of the query modality to transform media objects from their original feature spaces into two different isomorphic subspaces (the Im2Te common subspace and the Te2Im common subspace). Experimental results show the effectiveness of the proposed method.
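The query-dependent use of two projection pairs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the projection matrices below are random placeholders standing in for learned ones, and the feature dimensions, the cosine-similarity ranking, and the names `retrieve`, `cross_media_search`, and `projections` are all assumptions introduced for illustration only.

```python
import numpy as np


def retrieve(query, gallery, query_proj, gallery_proj):
    """Rank gallery items by cosine similarity to the query in a common subspace."""
    q = query @ query_proj              # project the query, shape (k,)
    g = gallery @ gallery_proj          # project the gallery, shape (n, k)
    q = q / np.linalg.norm(q)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.argsort(-(g @ q))         # indices of gallery items, best match first


rng = np.random.default_rng(0)
d_img, d_txt, k = 8, 5, 3               # hypothetical feature and subspace dimensions

# Placeholder projections (assumption): one pair per retrieval task.
# In a trained model these would be learned; here they are random.
projections = {
    "Im2Te": {"image": rng.standard_normal((d_img, k)),
              "text": rng.standard_normal((d_txt, k))},
    "Te2Im": {"image": rng.standard_normal((d_img, k)),
              "text": rng.standard_normal((d_txt, k))},
}


def cross_media_search(query, query_modality, gallery):
    """Select the projection pair that matches the query modality, then rank."""
    if query_modality == "image":       # image query searches texts (Im2Te)
        pair = projections["Im2Te"]
        return retrieve(query, gallery, pair["image"], pair["text"])
    if query_modality == "text":        # text query searches images (Te2Im)
        pair = projections["Te2Im"]
        return retrieve(query, gallery, pair["text"], pair["image"])
    raise ValueError(f"unknown query modality: {query_modality!r}")


# Toy data: one image query against 4 texts, one text query against 6 images.
texts = rng.standard_normal((4, d_txt))
images = rng.standard_normal((6, d_img))
im2te_ranking = cross_media_search(rng.standard_normal(d_img), "image", texts)
te2im_ranking = cross_media_search(rng.standard_normal(d_txt), "text", images)
```

The point of the sketch is only the dispatch step: each retrieval task uses its own pair of projections and therefore its own common subspace, rather than sharing one subspace for both directions.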
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.