A Cross-Media Retrieval Algorithm Based on Consistency Preserving of Collaborative Representation

Fei Shang; Huaxiang Zhang; Jiande Sun; Li Liu; Hui Zeng

doi:10.20965/jaciii.2018.p0280


JACIII Vol.22 No.2 pp. 280-289 (2018)

Paper:



Fei Shang, Huaxiang Zhang^†, Jiande Sun, Li Liu, and Hui Zeng

Department of Computer Science, Shandong Normal University
No. 1, University Road, Changqing District, Jinan 250300, China

^†Corresponding author

Received: September 18, 2017

Accepted: February 1, 2018

Published: March 20, 2018

Keywords: cross-media retrieval, dictionary learning, collaborative representation, consistency preserving

Abstract

Unlike traditional methods that directly map different modalities into an isomorphic subspace for cross-media retrieval, this paper proposes a cross-media retrieval algorithm based on the consistency of collaborative representation (called CR-CMR). To measure the similarity between data from different modalities, CR-CMR first uses dictionary learning techniques to obtain homogeneous collaborative representations for texts and images. It then simultaneously considers the semantic consistency of the different modalities and maps the collaborative representation coefficients into an isomorphic semantic subspace to conduct cross-media retrieval. Experimental results on three state-of-the-art datasets show that the algorithm is effective.
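The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. This is not the CR-CMR algorithm itself: the abstract gives no objective function, so the sketch assumes the standard l2-regularized (ridge) form of collaborative coding, uses random matrices as stand-ins for learned dictionaries and real features, and replaces the consistency-preserving mapping with a plain ridge regression onto one-hot labels. All function names, dimensions, and the parameter `lam` are hypothetical.

```python
import numpy as np


def collaborative_codes(D, X, lam=0.1):
    """Code the columns of X over dictionary D with an l2 (ridge) penalty.

    Solves argmin_A ||X - D A||_F^2 + lam * ||A||_F^2 in closed form.
    D: (d, k) dictionary, X: (d, n) samples, returns A: (k, n).
    """
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ X)


def fit_semantic_map(A, Y, lam=0.1):
    """Ridge-regress codes A (k, n) onto one-hot labels Y (c, n).

    Returns W of shape (c, k) so that W @ A approximates Y.
    """
    k = A.shape[0]
    return np.linalg.solve(A @ A.T + lam * np.eye(k), A @ Y.T).T


def rank_by_cosine(query, gallery):
    """Return gallery column indices sorted by descending cosine similarity."""
    q = query / (np.linalg.norm(query) + 1e-12)
    g = gallery / (np.linalg.norm(gallery, axis=0, keepdims=True) + 1e-12)
    return np.argsort(-(q @ g))


# Toy data: random matrices stand in for real features and learned dictionaries.
rng = np.random.default_rng(0)
n, c, k = 30, 3, 8
labels = rng.integers(0, c, size=n)
Y = np.eye(c)[:, labels]                      # (c, n) one-hot labels
X_img = rng.normal(size=(20, n))              # hypothetical image features
X_txt = rng.normal(size=(10, n))              # hypothetical text features
D_img = rng.normal(size=(20, k))              # placeholder image dictionary
D_txt = rng.normal(size=(10, k))              # placeholder text dictionary

# Step 1: homogeneous (same length k) codes for both modalities.
A_img = collaborative_codes(D_img, X_img)
A_txt = collaborative_codes(D_txt, X_txt)

# Step 2: map both sets of codes into a shared label-indexed space.
S_img = fit_semantic_map(A_img, Y) @ A_img
S_txt = fit_semantic_map(A_txt, Y) @ A_txt

# Step 3: image-to-text retrieval, ranking all texts for the first image.
ranking = rank_by_cosine(S_img[:, 0], S_txt)
```

Because both modalities end up as vectors of the same length in the same space, a single similarity measure (cosine here, chosen for illustration) can compare an image query against text candidates and vice versa.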

Cite this article as:

F. Shang, H. Zhang, J. Sun, L. Liu, and H. Zeng, “A Cross-Media Retrieval Algorithm Based on Consistency Preserving of Collaborative Representation,” J. Adv. Comput. Intell. Intell. Inform., Vol.22 No.2, pp. 280-289, 2018.



This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
