Testing Hypotheses on Simulated Data: Why Traditional Hypotheses-Testing Statistics Are Not Always Adequate for Simulated Data, and How to Modify Them
Richard Aló*, Vladik Kreinovich**, and Scott A. Starks**
*Center for Computational Sciences and Advanced Distributed Simulation, University of Houston-Downtown, One Main Street, Houston, TX 77002, USA
**Pan-American Center for Earth and Environmental Studies, University of Texas at El Paso, El Paso, TX 79968, USA
To check whether a new algorithm is better, researchers use traditional statistical techniques for hypothesis testing. In particular, when the results are inconclusive, they run more and more simulations (n_2 > n_1, n_3 > n_2, …, n_m > n_{m-1}) until the results become conclusive. In this paper, we point out that the resulting conclusions may be misleading. Indeed, in the traditional approach, we select a statistic and then choose a threshold for which the probability of this statistic “accidentally” exceeding this threshold is smaller than, say, 1%. With simulated data, it is very easy to run additional simulations with ever-larger n. The probability of error is still 1% for each n_i, but the probability that we reach an erroneous conclusion for at least one of the values n_i increases as m increases. We therefore design new statistical techniques oriented towards experiments on simulated data, techniques that guarantee that the error stays under, say, 1% no matter how many experiments we run.
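The effect described above can be illustrated with a small simulation. The sketch below is not the authors' procedure. It assumes, purely for illustration, that each simulated run yields a standard normal score under the null hypothesis, that a one-sided z-test is applied at each sample size, and that the sample sizes double at each step. The function name `familywise_error_rate`, the checkpoint values, and the "halving" thresholds (look i is tested at level 1%/2^(i+1), so the levels sum to at most 1% by the union bound) are all hypothetical choices, shown only as one simple way to keep the overall error bounded.

```python
import math
import random
from statistics import NormalDist


def familywise_error_rate(checkpoints, thresholds, trials=3000, seed=0):
    """Fraction of null-hypothesis runs in which the z-statistic exceeds
    its threshold at one or more of the given sample sizes."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(trials):
        total, n = 0.0, 0
        for n_i, z_crit in zip(checkpoints, thresholds):
            # Keep simulating until the sample size reaches n_i.
            while n < n_i:
                total += rng.gauss(0.0, 1.0)  # assumed null: mean 0, variance 1
                n += 1
            # One-sided z-test on all data collected so far.
            if total / math.sqrt(n) > z_crit:
                false_alarms += 1
                break  # the experimenter stops at the first "conclusive" result
    return false_alarms / trials


ALPHA = 0.01
checkpoints = [10, 20, 40, 80, 160, 320]  # hypothetical n_1 < n_2 < ... < n_m

# Traditional approach: the same 1% threshold is reused at every n_i.
fixed = [NormalDist().inv_cdf(1 - ALPHA)] * len(checkpoints)

# Illustrative correction (an assumption, not the paper's technique):
# look i gets level ALPHA / 2**(i + 1), so the levels sum to less than ALPHA.
halving = [NormalDist().inv_cdf(1 - ALPHA / 2 ** (i + 1))
           for i in range(len(checkpoints))]

single_look = familywise_error_rate(checkpoints[:1], fixed[:1])
repeated_looks = familywise_error_rate(checkpoints, fixed)
corrected_looks = familywise_error_rate(checkpoints, halving)

print(f"one look at 1%:          {single_look:.3f}")
print(f"six looks, each at 1%:   {repeated_looks:.3f}")
print(f"six looks, halved level: {corrected_looks:.3f}")
```

Under these assumptions, the estimated error rate for repeated looks should come out noticeably above the single-look rate, while the halved levels keep it near or below 1% at the cost of stricter thresholds for later looks.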
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.