Local Mixer with Prior Position for Cars’ Type Recognition
Bin Cao, Hongbin Ma, and Ying Jin
Beijing Institute of Technology
Haidian District, Beijing 100081, China
Deep learning has attracted wide attention owing to its successful application to vision tasks such as image classification and object detection. Because of the robustness and universality of deep learning, automotive manufacturing, a crucial part of the national economy, needs deep learning to make production lines more intelligent and efficient. However, some high-performing general-purpose deep learning models, such as ViT, TNT, and Swin Transformer, cannot meet the high-accuracy requirements of automotive manufacturing in a specific scene. On automotive production lines, engineers usually adopt careful designs that provide prior knowledge for designing deep learning models; specifically, the position of the target in an image is usually fixed. To take advantage of this prior position, this paper designs a local mixer with prior position to capture local features. Its main idea is to divide the whole feature map into window feature maps and concatenate them along the channel dimension, so that the convolution kernel parameters for each window feature map are independent of those for the others. In addition, an MLP is adopted as the global mixer to capture global features, and a pyramidal architecture with a CNN is adopted. Comprehensive experimental results demonstrate the effectiveness of the proposed model for car type recognition. In particular, the proposed model achieves 97.938% accuracy on our data set, surpassing some transformer-like models.
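The window-to-channel idea can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not specify: the feature map is a single (C, H, W) array whose height and width are divisible by a hypothetical window size `win`; the per-window convolution is taken to be a depthwise 3×3 convolution with zero padding; and the global mixer is approximated by an MLP-Mixer-style token-mixing MLP with ReLU. All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch under stated assumptions, not the authors' code.

def windows_to_channels(x, win):
    """Split a (C, H, W) map into non-overlapping win x win windows and
    stack them along the channel axis -> (nh * nw * C, win, win)."""
    c, h, w = x.shape
    assert h % win == 0 and w % win == 0, "H and W must be divisible by win"
    nh, nw = h // win, w // win
    # (C, nh, win, nw, win) -> (nh, nw, C, win, win): windows become channel groups
    t = x.reshape(c, nh, win, nw, win).transpose(1, 3, 0, 2, 4)
    return t.reshape(nh * nw * c, win, win)

def channels_to_windows(y, c, nh, nw):
    """Inverse of windows_to_channels: (nh * nw * C, win, win) -> (C, H, W)."""
    win = y.shape[-1]
    t = y.reshape(nh, nw, c, win, win).transpose(2, 0, 3, 1, 4)
    return t.reshape(c, nh * win, nw * win)

def depthwise_conv3x3(x, kernels):
    """3x3 cross-correlation with zero padding; channel k uses kernels[k] only."""
    n, h, w = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # each channel is one window, padded on its own
    out = np.zeros((n, h, w))
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * p[:, i:i + h, j:j + w]
    return out

def local_mixer(x, win, kernels):
    """Assumed local mixer: each (window, channel) pair has its own 3x3 kernel,
    so kernel parameters are not shared between windows."""
    c, h, w = x.shape
    nh, nw = h // win, w // win
    assert kernels.shape == (nh * nw * c, 3, 3)
    stacked = windows_to_channels(x, win)
    return channels_to_windows(depthwise_conv3x3(stacked, kernels), c, nh, nw)

def token_mixing_mlp(x, w1, w2):
    """Stand-in global mixer: an MLP over all H*W positions, shared across channels."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w)           # one row per channel
    hidden = np.maximum(tokens @ w1, 0.0)  # ReLU (MLP-Mixer itself uses GELU)
    return (hidden @ w2).reshape(c, h, w)

# Usage: 2 channels, 8x8 map, 4x4 windows -> 2 * 2 * 2 = 8 independent kernels
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8, 8))
k = rng.standard_normal((8, 3, 3))
local = local_mixer(x, 4, k)
glob = token_mixing_mlp(local, rng.standard_normal((64, 16)), rng.standard_normal((16, 64)))
print(local.shape, glob.shape)  # (2, 8, 8) (2, 8, 8)
```

Because every (window, channel) pair becomes its own channel before the depthwise convolution, each window region gets its own kernel, which is presumably how a fixed target position can be exploited. In this sketch the zero padding is applied per window, so no information crosses window boundaries in the local mixer; that boundary behavior follows from the assumed implementation and is not a documented property of the paper's model.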
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.