Joint Graph Regularization in a Homogeneous Subspace for Cross-Media Retrieval
Yudan Qi and Huaxiang Zhang
Department of Computer Science, Shandong Normal University
No.1 University Road, Changqing District, Jinan, Shandong 250358, China
The heterogeneity of multimodal data is the main challenge in cross-media retrieval, and many methods have been developed to address it. Subspace learning is currently one of the mainstream approaches: it learns a latent shared subspace in which similarities between data from different modalities can be measured directly. However, most existing subspace learning algorithms rely only on supervised information, training on labeled data to obtain a single pair of mapping matrices. In this paper, we propose a semi-supervised cross-media retrieval method based on joint graph regularization in a homogeneous subspace (JGRHS), which makes full use of both labeled and unlabeled data. When learning the projection matrices, we jointly consider correlation analysis and semantic information so as to preserve the closeness of pairwise data and semantic consistency, and we use graph regularization to keep the learned transformations consistent with the similarity structure within each modality. Retrieval results on three datasets indicate that the proposed method performs well in both theoretical analysis and practical application.
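The core idea described above, projecting two modalities into a shared subspace while a graph Laplacian per modality preserves each modality's neighborhood structure, can be sketched on toy data. This is an illustrative simplification under stated assumptions, not the JGRHS algorithm itself: it omits the semantic (label) terms of the full method, and the feature dimensions, subspace dimension `dim`, weight `lam`, and step size `lr` are all made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: 20 image-text pairs with different feature dimensions
# (sizes here are illustrative assumptions, not dataset values).
X_img = rng.standard_normal((20, 8))   # image features
X_txt = rng.standard_normal((20, 5))   # text features

def knn_laplacian(X, k=3):
    """Unnormalized Laplacian of a symmetrized kNN graph over the rows of X."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.zeros_like(d)
    for i in range(len(X)):
        A[i, np.argsort(d[i])[1:k + 1]] = 1.0   # k nearest neighbors, skipping self
    A = np.maximum(A, A.T)                      # symmetrize the adjacency
    return np.diag(A.sum(1)) - A                # L = D - A

L_img, L_txt = knn_laplacian(X_img), knn_laplacian(X_txt)

dim, lam, lr = 4, 0.1, 0.002                    # shared-space dim, graph weight, step size
P_img = rng.standard_normal((8, dim)) * 0.1     # projection matrices to learn
P_txt = rng.standard_normal((5, dim)) * 0.1

gap0 = np.linalg.norm(X_img @ P_img - X_txt @ P_txt)
for _ in range(400):
    Zi, Zt = X_img @ P_img, X_txt @ P_txt
    # Gradient steps on ||Zi - Zt||_F^2 + lam*(tr(Zi^T L_img Zi) + tr(Zt^T L_txt Zt)):
    # pairwise closeness plus graph-regularized smoothness in each modality.
    P_img -= lr * (2 * X_img.T @ (Zi - Zt) + 2 * lam * X_img.T @ L_img @ Zi)
    P_txt -= lr * (2 * X_txt.T @ (Zt - Zi) + 2 * lam * X_txt.T @ L_txt @ Zt)

gap = np.linalg.norm(X_img @ P_img - X_txt @ P_txt)
print(round(float(gap), 3))
```

Note that without the semantic-consistency term, this stripped-down objective also admits the trivial all-zero projections; in the full method, the label-driven terms are what rule out that collapse while the graph regularizers keep neighbors in each modality close in the shared space.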
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.