
JACIII Vol.29 No.1 pp. 152-157 (2025)
doi: 10.20965/jaciii.2025.p0152

Research Paper:

Research on the Recognition of Piano-Playing Notes by a Music Transcription Algorithm

Ruosi Guo*,† and Yongjian Zhu**

*College of Art, Hebei Agricultural University
No.289 Lingyusi Street, Baoding, Hebei 071000, China

†Corresponding author

**College of Music and Dance, Baoding University
No.3027 Qiyi East Road, Baoding, Hebei 071000, China

Received: September 4, 2024
Accepted: November 5, 2024
Published: January 20, 2025

Keywords: music transcription algorithm, piano, spectrogram, convolutional neural network, pitch
Abstract

As research on musical works deepens, music transcription algorithms have attracted increasing attention. This study examined the recognition of piano-playing notes using a music transcription algorithm. First, the characteristics of MelSpec, LogSpec, and the constant Q-transform (CQT) were briefly introduced. Then, a convolutional recurrent neural network (CRNN) transcription algorithm, comprising four convolutional blocks and one bidirectional long short-term memory (BiLSTM) structure, was designed. The recognition performance of this method was analyzed on the MAPS dataset. Among the input features, LogSpec yielded the best recognition performance for piano-playing notes. Within the CRNN structure, recognition performance was best when four convolutional blocks were used. Compared with the convolutional neural network (CNN), BiLSTM, and CNN-hidden Markov model algorithms, the CRNN algorithm achieved the best recognition results, with F1-values of 84.9%, 92.24%, and 79.27% for frames, notes, and offsets, respectively. These results verify that the CRNN transcription algorithm is effective for recognizing piano-playing notes and can be applied in practice.
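The three input representations named in the abstract can be computed with standard audio tooling. Below is a minimal Python sketch using the librosa library; the sampling rate, hop length, FFT size, and bin counts are illustrative assumptions, not the authors' settings.

# Feature extraction sketch for piano transcription (librosa).
# All parameter values here are assumptions for illustration.
import numpy as np
import librosa

def extract_features(wav_path, sr=16000, hop=512):
    y, sr = librosa.load(wav_path, sr=sr)

    # MelSpec: mel-scaled power spectrogram, compressed to dB.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop, n_mels=229)
    mel_spec = librosa.power_to_db(mel, ref=np.max)

    # LogSpec: log-magnitude STFT spectrogram.
    stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=hop))
    log_spec = librosa.amplitude_to_db(stft, ref=np.max)

    # CQT: constant Q-transform with 88 semitone bins starting at A0,
    # matching the piano's pitch range.
    cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=hop,
                             fmin=librosa.note_to_hz("A0"),
                             n_bins=88, bins_per_octave=12))
    cqt_spec = librosa.amplitude_to_db(cqt, ref=np.max)

    return mel_spec, log_spec, cqt_spec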
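The CRNN topology described in the abstract (four convolutional blocks followed by one BiLSTM) can be sketched as follows in PyTorch. Channel counts, kernel sizes, pooling factors, and the hidden size are assumptions; the paper's exact hyperparameters are not reproduced here. The output is a per-frame activation over the 88 piano pitches.

# Minimal CRNN sketch: 4 conv blocks + 1 BiLSTM + per-frame pitch output.
# Hyperparameter values are assumptions for illustration.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),  # pool along frequency, keep time resolution
        )

    def forward(self, x):
        return self.block(x)

class CRNN(nn.Module):
    def __init__(self, n_bins=229, n_pitches=88, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            ConvBlock(1, 16), ConvBlock(16, 32),
            ConvBlock(32, 64), ConvBlock(64, 64),
        )
        feat_dim = 64 * (n_bins // 16)  # channels x frequency bins after 4 poolings
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_pitches)  # 2x hidden: BiLSTM concatenation

    def forward(self, x):                     # x: (batch, 1, time, freq)
        h = self.conv(x)                      # (batch, 64, time, freq // 16)
        h = h.permute(0, 2, 1, 3).flatten(2)  # (batch, time, feat_dim)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.out(h))     # (batch, time, 88) frame activations

# Usage: activations = CRNN()(torch.randn(2, 1, 100, 229))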
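Frame-, note-, and offset-level F1-values of the kind reported in the abstract are commonly computed with the mir_eval library. The snippet below shows note and note-with-offset scoring on hypothetical reference and estimated notes; the tolerances are mir_eval defaults and not necessarily those used in the paper. Frame-level scores can be computed analogously with mir_eval.multipitch.

# Note-level evaluation sketch with mir_eval; the note data are hypothetical.
import numpy as np
import mir_eval

# (onset, offset) intervals in seconds and pitches in Hz.
ref_intervals = np.array([[0.50, 1.00], [1.20, 1.80]])
ref_pitches = np.array([440.0, 523.25])
est_intervals = np.array([[0.51, 0.98], [1.22, 1.70]])
est_pitches = np.array([440.0, 523.25])

# Note F1: onset within 50 ms and pitch within a quarter tone (defaults);
# offset_ratio=None ignores offsets for the plain "note" score.
p, r, f, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    offset_ratio=None)

# Note-with-offset F1: additionally requires the offset to match.
p_off, r_off, f_off, _ = mir_eval.transcription.precision_recall_f1_overlap(
    ref_intervals, ref_pitches, est_intervals, est_pitches,
    offset_ratio=0.2)
print(f"note F1 = {f:.3f}, note-with-offset F1 = {f_off:.3f}")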

Cite this article as:
R. Guo and Y. Zhu, “Research on the Recognition of Piano-Playing Notes by a Music Transcription Algorithm,” J. Adv. Comput. Intell. Intell. Inform., Vol.29 No.1, pp. 152-157, 2025.
