Generation of Rating Matrix Based on Rational Behaviors of Users

Kenshin Moriyoshi; Hiroki Shibata; Yasufumi Takama

doi:10.20965/jaciii.2024.p0129

JACIII Vol.28 No.1 pp. 129-140

(2024)

Research Paper:

Kenshin Moriyoshi, Hiroki Shibata, and Yasufumi Takama

Tokyo Metropolitan University
6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan

Received: March 21, 2023

Accepted: September 6, 2023

Published: January 20, 2024

Keywords: recommendation, synthetic data, long-tail

Abstract

This paper proposes a method for generating a synthetic rating matrix based on users’ rational behavior, with the aim of producing a large-scale rating matrix at low cost. Collaborative filtering is one of the major techniques for recommender systems and is widely used because it can recommend items using only the history of ratings that users have given to items. However, collaborative filtering suffers from problems such as the cold-start problem and the sparsity problem, both of which are caused by a shortage of ratings in the database (rating matrix). These problems are particularly serious for services that have just started operation or do not have a large number of users. The proposed method first generates a rating matrix without missing values using users’ rating probabilities, which are obtained from the distribution of their actual ratings. The final synthetic rating matrix is then obtained by introducing missing values to adjust its sparsity. The validity of the proposed method is evaluated by comparing the distributions of several statistics of the synthetic rating matrix with those of the real data. The synthetic rating matrix is also evaluated by using it to make recommendations to actual users. The experimental results show that the proposed method can generate a synthetic rating matrix with statistics similar to those of the real data, and that recommendation models trained on the synthetic data achieve recall comparable to that of models trained on the real data when the real data are used as test data. Based on these results, this paper also attempts to generate synthetic rating matrices that contain richer information than the real data by increasing the number of users or reducing the sparsity of the rating matrix. The results of these additional experiments suggest that increasing the information contained in a rating matrix could improve recall.
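The two-stage procedure described in the abstract (fill a complete matrix from per-user rating probabilities, then introduce missing values) can be illustrated with a minimal Python sketch. This is not the authors’ implementation: the 5-level rating scale, the use of each user’s empirical rating frequencies as the rating probabilities, the uniformly random masking, and all function names (`rating_probabilities`, `generate_dense`, `sparsify`) are assumptions made here for illustration. The paper itself may choose which entries to remove in a different way.

```python
import random
from collections import Counter

LEVELS = (1, 2, 3, 4, 5)  # assumption: ratings on a 5-level scale


def rating_probabilities(observed):
    """Empirical probability of each rating level for one user.

    Assumes the user has at least one observed rating.
    """
    counts = Counter(observed)
    return [counts[level] / len(observed) for level in LEVELS]


def generate_dense(real_ratings, n_items, rng):
    """Stage 1: sample a rating for every (user, item) pair, so the
    resulting matrix has no missing values."""
    return [
        rng.choices(LEVELS, weights=rating_probabilities(obs), k=n_items)
        for obs in real_ratings
    ]


def sparsify(dense, density, rng):
    """Stage 2: keep round(density * cells) entries and set the rest to None.

    Assumption: entries to keep are chosen uniformly at random.
    """
    n_users, n_items = len(dense), len(dense[0])
    total = n_users * n_items
    keep = set(rng.sample(range(total), round(density * total)))
    return [
        [dense[u][i] if u * n_items + i in keep else None for i in range(n_items)]
        for u in range(n_users)
    ]


# Toy usage: two real users' observed ratings, ten synthetic items.
rng = random.Random(0)
real = [[5, 5, 4], [1, 2, 2, 3]]
dense = generate_dense(real, n_items=10, rng=rng)
synthetic = sparsify(dense, density=0.3, rng=rng)
```

Separating the two stages means the sparsity of the output can be tuned independently of the rating distribution, which matches the abstract’s later experiments that reduce sparsity or increase the number of users.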

Cite this article as:

K. Moriyoshi, H. Shibata, and Y. Takama, “Generation of Rating Matrix Based on Rational Behaviors of Users,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.1, pp. 129-140, 2024.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.