Combining the Global and Partial Information for Distance-Based Time Series Classification and Clustering

Hui Zhang; Tu Bao Ho; Mao-Song Lin; Wei Huang

doi:10.20965/jaciii.2006.p0069

single-jc.php

« previous

JACIII Vol.10 No.1 pp. 69-76

doi: 10.20965/jaciii.2006.p0069

(2006)

Paper:

Views over last 60 days: 645

Combining the Global and Partial Information for Distance-Based Time Series Classification and Clustering

Hui Zhang^, Tu Bao Ho^, Mao-Song Lin^**,
and Wei Huang^*

^*School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1292, Japan

^**School of Computer Science, Southwest University of Science and Technology, Mianyang, Sichuan 621002, China

Received:

March 23, 2005

Accepted:

August 11, 2005

Published:

January 20, 2006

Keywords:

time series, Haar wavelets, classification, clustering, feature extraction

Abstract

Many time series representation schemes for classification and clustering have been proposed. Most of the proposed representation focuses on the prominent series by considering the global information of the time series. The partial information of time series that indicates the local change of time series is often ignored. Recently, researches shown that the partial information is also important for time series mining. However, the combination of these two types of information has not been well studied in the literature. Moreover, most of the proposed time series representation requires predefined parameters. The classification and clustering results are considerably influenced by the parameter settings, and, users often have difficulty in determining the parameters. We attack above two problems by exploiting the multi-scale property of wavelet decomposition. The main contributions of this work are: (1) extracting features combining the global information and partial information of time series (2) automatically choosing appropriate features, namely, features in an appropriate wavelet decomposition scale according to the concentration of wavelet coefficients within this scale. Experiments performed on several benchmark time series datasets justify the usefulness of the proposed approach.

Cite this article as:

H. Zhang, T. Ho, M. Lin, and W. Huang, “Combining the Global and Partial Information for Distance-Based Time Series Classification and Clustering,” J. Adv. Comput. Intell. Intell. Inform., Vol.10 No.1, pp. 69-76, 2006.

Data files:

References

[1] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient Similarity Search in Sequence Databases,” In Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms, pp. 69-84, October, 1993.
[2] C. S. Burrus, R. A. Gopinath, and H. Guo, “Introduction to Wavelets and Wavelet Transforms, A Primer,” Prentice Hall, Englewood Cliffs, NJ, 1997.
[3] K. Chan, A. W. Fu, and T. Y. Clement, “Harr Wavelets for Efficient Similarity Search of Time-Series: with and without Time Warping,” IEEE Trans. on Knowledge and Data Engineering, 15(3), pp. 686-705, 2003.
[4] R. R. Coifman, and M. V. Wickerhauser, “Entropy-Based Algorithms for Best Basis Selection,” IEEE Trans. on Information Theory, 38(2), pp. 713-718, 1992.
[5] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. on Information Theory, 41(3), pp. 613-627, 1995.
[6] D. L. Donoho, and I. M. Johnson, “Ideal spatial adaptation via wavelet shrinkage,” Biometrika, 81, pp. 425-455, 1994.
[7] R. Duda, P. Hart, and D. Stork, “Pattern Classification,” John Wiley & Sons, New York, second edition, 2001.
[8] M. Gavrilov, D. Anguelov, and P. Indyk, “Mining the Stock Market: Which Measure is Best?,” In Proceedings of The 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 487-496, August, 2000.
[9] P. Geurts, “Pattern Extraction for Time Series Classification,” In Proceedings of the Principles of Data Mining and Knowledge Discovery, 5th European Conference, pp. 115-127, September, 2001.
[10] T. B. Ho, T. D. Nguyen, S. Kawasaki, S. Q. Le, D. D. Nguyen, H. Yokoi, and K. Takabayashi, “Mining Hepatitis Data with Temporal Abstraction,” In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining, pp. 369-377, August, 2003.
[11] D. Jiang, C. Tang, and A. Zhang, “Cluster Analysis for Gene Expression Data: A Survey,” IEEE Trans. on Knowledge and Data Engineering, 16(11), pp. 1370-1386, 2004.
[12] X. Jin, Y. Lu, and C. Shi, “Similarity Measure Based on Partial Information of Time Series,” In Proceedings of The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 544-549, August, 2002.
[13] M. W. Kadous, “Learning Comprehensible Descriptions of Multivariate Time Series,” In Proceedings of the 6th International Conference on Machine Learning, pp. 454-463, September, 1999.
[14] T. Kalayci, and O. Ozdamar, “Wavelet Preprocessing for Automated Neural Network Detection of EEG Spikes,” IEEE Eng. in Medicine and Biology, 14, pp. 160-166, 1995.
[15] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality Reduction of Fast Similarity Search in Large Time Series Databases,” Journal of Knowledge and Information System, 3, pp. 263-286, 2000.
[16] E. Keogh, and T. Folias. “The UCR Time Series Data Mining Archive,”
http://www.cs.ucr.edu/˜eamonn/TSDMA/index.html ,
2002.
[17] E. Keogh, and S. Kasetty, “On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration,” Data Mining and Knowledge Discovery, 7(4), pp. 349-371, 2003.
[18] E. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards Parameter-Free Data Mining,” In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206-215, August, 2004.
[19] E. Keogh, and M. Pazzani, “An Enhanced Representation of Time Series which Allows Fast and Accurate Classification, Clustering and Relevance Feedback,” In Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, pp. 239-241, August, 1998.
[20] F. Korn, H. Jagadish, and C. Faloutsos, “Efficiently Supporting ad hoc Queries in Large Datasets of Time Sequences,” In Proceedings of The ACM SIGMOD International Conference on Management of Data, pp. 289-300, May, 1997.
[21] S. Lawrence, A. Back, A. Tsoi, and C. L. Giles, “The Gamma MLP – Using Multiple Temporal Resolutions for Improved Classification,” In Proceedings of the 1997 IEEE Workshop on Neural Networks for Signal Processing VII, pp. 256-265, Piscataway, NJ, 1997, IEEE Press.
[22] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms,” In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2-11, June, 2003.
[23] T. Mitchell, “Machine Learning,” McGraw-Hill, New York, 1997.
[24] S. Pitter, and S. V. Kamarthi, “Feature Extraction From Wavelet Coefficients for Pattern Recognition Tasks,” IEEE Trans. on Pattern Recognition and Machine Intelligence, 21(1), pp. 83-85, January, 1999.
[25] I. Popivanov, and R. J. Miller, “Similarity Search over Time-Series Data Using Wavelets,” In Proceedings of The 18th International Conference on Data Engineering, pp. 212-221, February, 2002.
[26] D. Rafiei, and A. Mendelzon, “Efficient Retrieval of Similar Time Sequences Using DFT,” In Proceedings of the 5th International Conference on Foundations of Data Organizations, pp. 249-257, 1998.
[27] C. A. Ratanamahatana, and E. Keogh, “Three Myths about Dynamic Time Warping,” In SIAM International Conference on Data Mining, Newport Beach, CA, April, 2005.
[28] J. F. Roddich, and M. Spiliopoulou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Trans. on Knowledge and Data Engineering, 14(4), pp. 750-767, 2002.
[29] C. Shahabi, X. Tian, and W. Zhao, “TSA-Tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data,” In Proceedings of the 12th International Conference on Scientific and Statistical Database Management, pp. 55-68, July, 2000.
[30] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, 27, pp. 379-423, 1948.
[31] Z. R. Struzik, and A. Siebes, “The Harr Wavelet Transform in the Time Series Similarity Paradigm,” In Proceedings of the 3rd European Conference on Principles of Data Mining and Knowledge Discovery, pp. 12-22, September, 1999.
[32] I. N. Tansel, C. Mekdeci, and C. McLaughlin, “Detection of Tool Failure in End Milling with Wavelet Transformations and Neural Networks (WT-NN),” International Journal of Machine Tolls and Manufacture, 35(8), pp. 1137-1147, 1995.
[33] Y. Yamada, E. Suzuki, H. Yokoi, and K. Takabayashi, “Decision-Tree Induction from Time-Series Data Based on Standard-Example Split Test,” In Proceedings of the 20th International Conference on Machine Learning (ICML03), pp. 840-847, August, 2003.
[34] B. K. Yi, and C. Faloustos, “Fast Time Sequence Indexing for arbitrary L_p norms,” In Proceedings of the 26th International Conference on Very Large Databases, pp. 385-394, September, 2000.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[1] [1] R. Agrawal, C. Faloutsos, and A. Swami, “Efficient Similarity Search in Sequence Databases,” In Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms, pp. 69-84, October, 1993.

[2] [2] C. S. Burrus, R. A. Gopinath, and H. Guo, “Introduction to Wavelets and Wavelet Transforms, A Primer,” Prentice Hall, Englewood Cliffs, NJ, 1997.

[3] [3] K. Chan, A. W. Fu, and T. Y. Clement, “Harr Wavelets for Efficient Similarity Search of Time-Series: with and without Time Warping,” IEEE Trans. on Knowledge and Data Engineering, 15(3), pp. 686-705, 2003.

[4] [4] R. R. Coifman, and M. V. Wickerhauser, “Entropy-Based Algorithms for Best Basis Selection,” IEEE Trans. on Information Theory, 38(2), pp. 713-718, 1992.

[5] [5] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. on Information Theory, 41(3), pp. 613-627, 1995.

[6] [6] D. L. Donoho, and I. M. Johnson, “Ideal spatial adaptation via wavelet shrinkage,” Biometrika, 81, pp. 425-455, 1994.

[7] [7] R. Duda, P. Hart, and D. Stork, “Pattern Classification,” John Wiley & Sons, New York, second edition, 2001.

[8] [8] M. Gavrilov, D. Anguelov, and P. Indyk, “Mining the Stock Market: Which Measure is Best?,” In Proceedings of The 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 487-496, August, 2000.

[9] [9] P. Geurts, “Pattern Extraction for Time Series Classification,” In Proceedings of the Principles of Data Mining and Knowledge Discovery, 5th European Conference, pp. 115-127, September, 2001.

[10] [10] T. B. Ho, T. D. Nguyen, S. Kawasaki, S. Q. Le, D. D. Nguyen, H. Yokoi, and K. Takabayashi, “Mining Hepatitis Data with Temporal Abstraction,” In Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining, pp. 369-377, August, 2003.

[11] [11] D. Jiang, C. Tang, and A. Zhang, “Cluster Analysis for Gene Expression Data: A Survey,” IEEE Trans. on Knowledge and Data Engineering, 16(11), pp. 1370-1386, 2004.

[12] [12] X. Jin, Y. Lu, and C. Shi, “Similarity Measure Based on Partial Information of Time Series,” In Proceedings of The 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 544-549, August, 2002.

[13] [13] M. W. Kadous, “Learning Comprehensible Descriptions of Multivariate Time Series,” In Proceedings of the 6th International Conference on Machine Learning, pp. 454-463, September, 1999.

[14] [14] T. Kalayci, and O. Ozdamar, “Wavelet Preprocessing for Automated Neural Network Detection of EEG Spikes,” IEEE Eng. in Medicine and Biology, 14, pp. 160-166, 1995.

[15] [15] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, “Dimensionality Reduction of Fast Similarity Search in Large Time Series Databases,” Journal of Knowledge and Information System, 3, pp. 263-286, 2000.

[16] [16] E. Keogh, and T. Folias. “The UCR Time Series Data Mining Archive,”
http://www.cs.ucr.edu/˜eamonn/TSDMA/index.html ,
2002.

[17] [17] E. Keogh, and S. Kasetty, “On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration,” Data Mining and Knowledge Discovery, 7(4), pp. 349-371, 2003.

[18] [18] E. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards Parameter-Free Data Mining,” In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 206-215, August, 2004.

[19] [19] E. Keogh, and M. Pazzani, “An Enhanced Representation of Time Series which Allows Fast and Accurate Classification, Clustering and Relevance Feedback,” In Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, pp. 239-241, August, 1998.

[20] [20] F. Korn, H. Jagadish, and C. Faloutsos, “Efficiently Supporting ad hoc Queries in Large Datasets of Time Sequences,” In Proceedings of The ACM SIGMOD International Conference on Management of Data, pp. 289-300, May, 1997.

[21] [21] S. Lawrence, A. Back, A. Tsoi, and C. L. Giles, “The Gamma MLP – Using Multiple Temporal Resolutions for Improved Classification,” In Proceedings of the 1997 IEEE Workshop on Neural Networks for Signal Processing VII, pp. 256-265, Piscataway, NJ, 1997, IEEE Press.

[22] [22] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, “A Symbolic Representation of Time Series, with Implications for Streaming Algorithms,” In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp. 2-11, June, 2003.

[23] [23] T. Mitchell, “Machine Learning,” McGraw-Hill, New York, 1997.

[24] [24] S. Pitter, and S. V. Kamarthi, “Feature Extraction From Wavelet Coefficients for Pattern Recognition Tasks,” IEEE Trans. on Pattern Recognition and Machine Intelligence, 21(1), pp. 83-85, January, 1999.

[25] [25] I. Popivanov, and R. J. Miller, “Similarity Search over Time-Series Data Using Wavelets,” In Proceedings of The 18th International Conference on Data Engineering, pp. 212-221, February, 2002.

[26] [26] D. Rafiei, and A. Mendelzon, “Efficient Retrieval of Similar Time Sequences Using DFT,” In Proceedings of the 5th International Conference on Foundations of Data Organizations, pp. 249-257, 1998.

[27] [27] C. A. Ratanamahatana, and E. Keogh, “Three Myths about Dynamic Time Warping,” In SIAM International Conference on Data Mining, Newport Beach, CA, April, 2005.

[28] [28] J. F. Roddich, and M. Spiliopoulou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Trans. on Knowledge and Data Engineering, 14(4), pp. 750-767, 2002.

[29] [29] C. Shahabi, X. Tian, and W. Zhao, “TSA-Tree: A Wavelet-Based Approach to Improve the Efficiency of Multi-Level Surprise and Trend Queries on Time-Series Data,” In Proceedings of the 12th International Conference on Scientific and Statistical Database Management, pp. 55-68, July, 2000.

[30] [30] C. E. Shannon, “A Mathematical Theory of Communication,” Bell System Technical Journal, 27, pp. 379-423, 1948.

[31] [31] Z. R. Struzik, and A. Siebes, “The Harr Wavelet Transform in the Time Series Similarity Paradigm,” In Proceedings of the 3rd European Conference on Principles of Data Mining and Knowledge Discovery, pp. 12-22, September, 1999.

[32] [32] I. N. Tansel, C. Mekdeci, and C. McLaughlin, “Detection of Tool Failure in End Milling with Wavelet Transformations and Neural Networks (WT-NN),” International Journal of Machine Tolls and Manufacture, 35(8), pp. 1137-1147, 1995.

[33] [33] Y. Yamada, E. Suzuki, H. Yokoi, and K. Takabayashi, “Decision-Tree Induction from Time-Series Data Based on Standard-Example Split Test,” In Proceedings of the 20th International Conference on Machine Learning (ICML03), pp. 840-847, August, 2003.

[34] [34] B. K. Yi, and C. Faloustos, “Fast Time Sequence Indexing for arbitrary L_p norms,” In Proceedings of the 26th International Conference on Very Large Databases, pp. 385-394, September, 2000.

Combining the Global and Partial Information for Distance-Based Time Series Classification and Clustering

Hui Zhang*, Tu Bao Ho*, Mao-Song Lin**, and Wei Huang*

Hui Zhang^, Tu Bao Ho^, Mao-Song Lin^**,
and Wei Huang^*