On the Optimal Hyperparameter Behavior in Bayesian Clustering
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology
G5-19, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan
In a probabilistic approach to cluster analysis, parametric models such as mixtures of Gaussian distributions are often used. Since the model parameter is unknown, both the parameter and the cluster labels must be estimated. Recently, the statistical properties of Bayesian clustering have been studied: the theoretical accuracy of its label estimation has been analyzed and found to be higher than that of the maximum-likelihood method, which is based on the expectation-maximization algorithm. However, the effect of the prior distribution on the clustering result remains unknown. The prior distribution has its own parameter, referred to as the hyperparameter. In the present paper, we theoretically and experimentally investigate the behavior of the optimal hyperparameter, and we propose a method for evaluating the clustering result based on optimization of the prior.