
JACIII Vol.29 No.1 pp. 152-157 (2025)
doi: 10.20965/jaciii.2025.p0152

Research Paper:

Research on the Recognition of Piano-Playing Notes by a Music Transcription Algorithm

Ruosi Guo*,† and Yongjian Zhu**

*College of Art, Hebei Agricultural University
No.289 Lingyusi Street, Baoding, Hebei 071000, China

†Corresponding author

**College of Music and Dance, Baoding University
No.3027 Qiyi East Road, Baoding, Hebei 071000, China

Received: September 4, 2024
Accepted: November 5, 2024
Published: January 20, 2025

Keywords: music transcription algorithm, piano, spectrogram, convolutional neural network, pitch
Abstract

As research on musical works deepens, music transcription algorithms have attracted increasing attention. This study examined the recognition of piano-playing notes using a music transcription algorithm. First, the characteristics of MelSpec, LogSpec, and the constant Q-transform (CQT) were briefly introduced. Then, a convolutional recurrent neural network (CRNN) transcription algorithm, comprising four convolutional blocks and one bidirectional long short-term memory (BiLSTM) structure, was designed. The recognition performance of this method was analyzed on the MAPS dataset. Among the input features, LogSpec yielded the best recognition performance for piano-playing notes. Within the CRNN structure, recognition performance was best when four convolutional blocks were used. Compared with the convolutional neural network (CNN), BiLSTM, and CNN-hidden Markov model algorithms, the CRNN algorithm achieved the best recognition results, with F1-values of 84.9%, 92.24%, and 79.27% for frames, notes, and offsets, respectively. These results verify that the CRNN transcription algorithm is effective for recognizing piano-playing notes and can be applied in practice.
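The three input representations named in the abstract can be computed with standard audio tooling. Below is a minimal Python sketch using the librosa library; the sampling rate, hop length, FFT size, and bin counts are illustrative assumptions, not the authors' settings.

# Feature extraction sketch for piano transcription (librosa).
# All parameter values here are assumptions for illustration.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, hop=512):
    y, sr = librosa.load(wav_path, sr=sr)

    # MelSpec: mel-scaled power spectrogram, compressed to dB.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop, n_mels=229)
    mel_spec = librosa.power_to_db(mel, ref=np.max)

    # LogSpec: log-magnitude STFT spectrogram.
    stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=hop))
    log_spec = librosa.amplitude_to_db(stft, ref=np.max)

    # CQT: constant Q-transform with 88 semitone bins starting at A0,
    # matching the piano's pitch range.
    cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                             fmin=librosa.note_to_hz("A0"),
                             n_bins=88, bins_per_octave=12))
    cqt_spec = librosa.amplitude_to_db(cqt, ref=np.max)

    return mel_spec, log_spec, cqt_spec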
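The CRNN topology described in the abstract (four convolutional blocks followed by one BiLSTM) can be sketched as follows in PyTorch. Channel counts, kernel sizes, pooling factors, and the hidden size are assumptions; the paper's exact hyperparameters are not reproduced here. The output is a per-frame activation over the 88 piano pitches.

# Minimal CRNN sketch: 4 conv blocks + 1 BiLSTM + per-frame pitch output.
# Hyperparameter values are assumptions for illustration.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool along frequency, keep time resolution
        )

    def forward(self, x):
        return self.block(x)

class CRNN(nn.Module):
    def __init__(self, n_bins=229, n_pitches=88, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            ConvBlock(1, 16), ConvBlock(16, 32),
            ConvBlock(32, 64), ConvBlock(64, 64),
        )
        feat_dim = 64 * (n_bins // 16)  # channels x frequency bins after 4 poolings
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_pitches)  # 2x hidden: BiLSTM concatenation

    def forward(self, x):                     # x: (batch, 1, time, freq)
        h = self.conv(x)                      # (batch, 64, time, freq // 16)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, feat_dim)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.out(h))     # (batch, time, 88) frame activations

# Usage: activations = CRNN()(torch.randn(2, 1, 100, 229))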
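Frame-, note-, and offset-level F1-values of the kind reported in the abstract are commonly computed with the mir_eval library. The snippet below shows note and note-with-offset scoring on hypothetical reference and estimated notes; the tolerances are mir_eval defaults and not necessarily those used in the paper. Frame-level scores can be computed analogously with mir_eval.multipitch.

# Note-level evaluation sketch with mir_eval; the note data are hypothetical.
import numpy as np
import mir_eval

# (onset, offset) intervals in seconds and pitches in Hz.
ref_intervals = np.array([[0.50, 1.00], [1.20, 1.80]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.51, 0.98], [1.22, 1.70]])
est_pitches = np.array([440.0, 523.25])

# Note F1: onset within 50 ms and pitch within a quarter tone (defaults);
# offset_ratio=None ignores offsets for the plain "note" score.
p, r, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    offset_ratio=None)

# Note-with-offset F1: additionally requires the offset to match.
p_off, r_off, f_off, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    offset_ratio=0.2)
print(f"note F1 = {f:.3f}, note-with-offset F1 = {f_off:.3f}")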

Cite this article as:
R. Guo and Y. Zhu, “Research on the Recognition of Piano-Playing Notes by a Music Transcription Algorithm,” J. Adv. Comput. Intell. Intell. Inform., Vol.29 No.1, pp. 152-157, 2025.
