JACIII Vol.19 No.6 pp. 818-824
doi: 10.20965/jaciii.2015.p0818


On the Optimal Hyperparameter Behavior in Bayesian Clustering

Keisuke Yamazaki

Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology
G5-19, 4259 Nagatsuta, Midori-ku, Yokohama 226-8503, Japan

Received: May 20, 2015
Accepted: August 18, 2015
Online released: November 20, 2015
Published: November 20, 2015
Keywords: cluster analysis, Bayes statistics, unsupervised learning, asymptotic analysis

In a probabilistic approach to cluster analysis, parametric models such as a mixture of Gaussian distributions are often used. Since the parameter is unknown, both the parameter and the cluster labels must be estimated. Recently, the statistical properties of Bayesian clustering have been studied. The theoretical accuracy of its label estimation has been analyzed and found to be better than that of the maximum-likelihood method, which is based on the expectation-maximization algorithm. However, the effect of the prior distribution on the clustering result remains unknown. The prior distribution has its own parameter, called the hyperparameter. In the present paper, we theoretically and experimentally investigate the behavior of the optimal hyperparameter, and we propose an evaluation method for the clustering result based on prior optimization.
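To make the role of the hyperparameter concrete, the following is a minimal sketch and not the method of the paper. It assumes a toy setting chosen only for illustration: two Gaussian components with known means and unit variance, an unknown mixing ratio with a symmetric Dirichlet prior, and a single hyperparameter `alpha`. The mixing ratio is integrated out in closed form, and the posterior over all label assignments is computed by brute-force enumeration. The data `xs`, the means, and all function names are hypothetical.

```python
import itertools
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_dirichlet_multinomial(counts, alpha):
    """Log marginal probability of cluster counts under a symmetric
    Dirichlet(alpha) prior on the mixing ratio (ratio integrated out)."""
    k = len(counts)
    n = sum(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def label_posterior(xs, means, alpha):
    """Exact posterior over every label assignment, by enumeration.
    Only feasible for a handful of data points."""
    k = len(means)
    log_w = {}
    for labels in itertools.product(range(k), repeat=len(xs)):
        counts = [labels.count(j) for j in range(k)]
        lw = log_dirichlet_multinomial(counts, alpha)
        for x, y in zip(xs, labels):
            lw += math.log(normal_pdf(x, means[y]))
        log_w[labels] = lw
    top = max(log_w.values())  # subtract the max for numerical stability
    total = sum(math.exp(v - top) for v in log_w.values())
    return {y: math.exp(v - top) / total for y, v in log_w.items()}


# Hypothetical data: three points near -2, one near +2, one ambiguous (0.1).
xs = [-2.1, -1.8, -1.9, 0.1, 2.2]
means = [-2.0, 2.0]  # assumed known, purely for illustration

for alpha in (0.1, 1.0, 10.0):
    post = label_posterior(xs, means, alpha)
    # Marginal probability that the ambiguous point belongs to cluster 1.
    p = sum(prob for labels, prob in post.items() if labels[3] == 1)
    print(f"alpha={alpha}: P(label of x=0.1 is cluster 1) = {p:.3f}")
```

In this toy setting the ambiguous point is slightly closer to the second mean, yet a small `alpha` favors unbalanced cluster sizes and pulls it toward the larger cluster, while a large `alpha` lets the likelihood dominate. The estimated label therefore depends on the hyperparameter, which is the kind of effect the abstract refers to.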


