Image Crowd Counting Using Convolutional Neural Network and Markov Random Field

Kang Han; Wanggen Wan; Haiyan Yao; Li Hou

doi:10.20965/jaciii.2017.p0632


JACIII Vol.21 No.4 pp. 632-638 (2017)


School of Communication and Information Engineering, Shanghai University
Institute of Smart City, Shanghai University
99 Shangda Road, BaoShan District, Shanghai 200444, China

Received: January 25, 2017
Accepted: May 10, 2017
Published: July 20, 2017

Keywords: crowd counting, convolutional neural network, Markov random field

Abstract

In this paper, we propose a method called Convolutional Neural Network-Markov Random Field (CNN-MRF) to estimate the crowd count in a still image. We first divide the dense crowd image into overlapping patches. A deep convolutional neural network then extracts features from each patch, and a fully connected neural network regresses the local crowd count of the patch from those features. Because adjacent patches overlap, their crowd counts are highly correlated. We exploit this correlation with a Markov random field to smooth the counting results of the local patches. Experiments show that our approach significantly outperforms the state-of-the-art methods on the UCF and Shanghaitech crowd counting datasets.
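The patch-and-smooth idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the MRF energy or the inference procedure, so the quadratic energy, the 4-connected patch grid, the Jacobi-style solver, and the names `patch_origins`, `smooth_counts`, and `lam` are all assumptions made for illustration. The CNN regressor is replaced by made-up per-patch counts.

```python
import numpy as np

def patch_origins(height, width, patch, stride):
    """Top-left corners of square patches on a regular grid.

    A stride smaller than the patch size makes neighbouring patches overlap.
    """
    ys = range(0, height - patch + 1, stride)
    xs = range(0, width - patch + 1, stride)
    return [(y, x) for y in ys for x in xs], (len(ys), len(xs))

def smooth_counts(counts, lam=1.0, iters=200):
    """Smooth a grid of per-patch counts with an assumed quadratic MRF energy.

    Minimises  sum_i (x_i - c_i)^2 + lam * sum_{i~j} (x_i - x_j)^2
    over a 4-connected grid using Jacobi-style fixed-point updates.
    """
    c = np.asarray(counts, dtype=float)
    x = c.copy()
    # Number of 4-neighbours of each cell (fewer on the border).
    deg = np.full(c.shape, 4.0)
    deg[0, :] -= 1
    deg[-1, :] -= 1
    deg[:, 0] -= 1
    deg[:, -1] -= 1
    for _ in range(iters):
        # Sum of the current estimates of each cell's neighbours.
        nb = np.zeros_like(x)
        nb[1:, :] += x[:-1, :]
        nb[:-1, :] += x[1:, :]
        nb[:, 1:] += x[:, :-1]
        nb[:, :-1] += x[:, 1:]
        # Stationarity condition of the energy, solved for x_i.
        x = (c + lam * nb) / (1.0 + lam * deg)
    return x

# Hypothetical usage: a 100x100 image, 50x50 patches, 50% overlap.
origins, grid_shape = patch_origins(height=100, width=100, patch=50, stride=25)
raw = np.full(grid_shape, 10.0)   # stand-in for CNN-regressed patch counts
raw[1, 1] = 40.0                  # made-up outlier estimate
smoothed = smooth_counts(raw, lam=1.0)
```

In this sketch a larger `lam` pulls each patch count more strongly towards its neighbours, which is one simple way to express the correlation between overlapping patches.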

Cite this article as:

K. Han, W. Wan, H. Yao, and L. Hou, “Image Crowd Counting Using Convolutional Neural Network and Markov Random Field,” J. Adv. Comput. Intell. Intell. Inform., Vol.21 No.4, pp. 632-638, 2017.


References

[1] S. A. M. Saleh, S. A. Suandi, and H. Ibrahim, “Recent survey on crowd density estimation and counting for visual surveillance,” Engineering Applications of Artificial Intelligence, Vol.41, pp. 103-114, 2015.
[2] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” 2005 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR’05), Vol. 1, pp. 886-893, 2005.
[3] R. Stewart, M. Andriluka, and A. Y. Ng, “End-to-end people detection in crowded scenes,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2325-2333, 2016.
[4] Y. Yasuoka, Y. Shinomiya, and Y. Hoshino, “Simulation of Human Detection System Using BRIEF and Neural Network,” J. Adv. Comput. Intell. Intell. Inform., Vol.20, No.7, pp. 1159-1164, 2016.
[5] J. Ma, Y. Dai, and K. Hirota, “A Survey of Video-Based Crowd Anomaly Detection in Dense Scenes,” J. Adv. Comput. Intell. Intell. Inform., Vol.21, No.2, pp. 235-246, 2017.
[6] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.24, No.7, pp. 971-987, 2002.
[7] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. of Computer Vision, Vol.60, No.2, pp. 91-110, 2004.
[8] Y. Zhang, D. Zhou, S. Chen, S. Gao, and Y. Ma, “Single-image crowd counting via multi-column convolutional neural network,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 589-597, 2016.
[9] H. Idrees, I. Saleemi, C. Seibert, and M. Shah, “Multi-source multi-scale counting in extremely dense crowd images,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2547-2554, 2013.
[10] M. Li, Z. Zhang, K. Huang, and T. Tan, “Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection,” 19th Int. Conf. on Pattern Recognition (ICPR 2008), pp. 1-4, 2008.
[11] A. M. Cheriyadat, B. L. Bhaduri, and R. J. Radke, “Detecting multiple moving objects in crowded environments with coherent motion regions,” IEEE Computer Society Conf. on Computer Vision and Pattern Recognition Workshops (CVPRW’08), pp. 1-8, 2008.
[12] G. J. Brostow and R. Cipolla, “Unsupervised Bayesian detection of independent motion in crowds,” 2006 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR’06), Vol. 1, pp. 594-601, 2006.
[13] V. Lempitsky and A. Zisserman, “Learning to count objects in images,” Advances in Neural Information Processing Systems, pp. 1324-1332, 2010.
[14] C. Shang, H. Ai, and B. Bai, “End-to-end crowd counting via joint learning local and global count,” 2016 IEEE Int. Conf. on Image Processing (ICIP), pp. 1215-1219, 2016.
[15] C. Zhang, H. Li, X. Wang, and X. Yang, “Cross-scene crowd counting via deep convolutional neural networks,” Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 833-841, 2015.
[16] M. Rodriguez, I. Laptev, J. Sivic, and J.-Y. Audibert, “Density-aware person detection and tracking in crowds,” 2011 Int. Conf. on Computer Vision, pp. 2423-2430, 2011.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint, arXiv:1512.03385, 2015.
[18] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient belief propagation for early vision,” Int. J. of Computer Vision, Vol.70, No.1, pp. 41-54, 2006.
[19] A. Vedaldi and K. Lenc, “MatConvNet: Convolutional neural networks for MATLAB,” Proc. of the 23rd ACM Int. Conf. on Multimedia, pp. 689-692, 2015.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
