Paper:
Distance Measure for Symbolic Approximation Representation with Subsequence Direction for Time Series Data Mining
Tianyu Li, Fang-Yan Dong, and Kaoru Hirota
Department of Computational Intelligence & Systems Science, Tokyo Institute of Technology, G3-49, 4259 Nagatsuta, Midori-ku, Yokohama 226-8502, Japan
A distance measure is proposed for time series data mining based on symbolic aggregate approximation (SAX) with direction representation. It aims to increase the tightness of the lower bound to the Euclidean distance and to decrease the error rate of time series data mining tasks by adding a time series subsequence direction factor to the original SAX. Experiments on the public University of California, Riverside (UCR) time series datasets, which contain time series of diverse types, lengths, and sizes, show that the tightness of the proposed distance measure is on average 17.54% higher than that of the original SAX, and that classification error rates using SAX with direction representation are 16.22% lower than those obtained with the original SAX. The proposed approach lowers the classification error rate and could be applied to other time series data mining tasks, such as clustering, query by content, and motif discovery.
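The abstract does not specify how the subsequence direction factor is encoded, so the sketch below covers only the baseline it is compared against. That baseline is original SAX (z-normalization, Piecewise Aggregate Approximation, equiprobable Gaussian breakpoints) with Lin and Keogh's MINDIST lower bound. It also computes the tightness of lower bound (TLB), assuming the usual definition of MINDIST divided by the Euclidean distance between the z-normalized series. All function names are hypothetical. For simplicity, the series length n is assumed to be a multiple of the word length w.

```python
import bisect
import math
from statistics import NormalDist


def znorm(series):
    """Z-normalize to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < 1e-12:  # flat series: return zeros instead of dividing by ~0
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation; assumes len(series) is a multiple of w."""
    n = len(series)
    assert n % w == 0, "this sketch assumes n is a multiple of w"
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def breakpoints(a):
    """Equiprobable breakpoints under N(0, 1) for an alphabet of size a."""
    return [NormalDist().inv_cdf(i / a) for i in range(1, a)]


def sax(series, w, a):
    """Original SAX (no direction factor): symbol indices 0..a-1, one per segment."""
    bps = breakpoints(a)
    # bisect_right counts breakpoints <= v, i.e. the symbol whose interval holds v
    return [bisect.bisect_right(bps, v) for v in paa(znorm(series), w)]


def mindist(s1, s2, n, a):
    """MINDIST lower bound between two SAX words of equal length w."""
    bps = breakpoints(a)
    total = 0.0
    for r, c in zip(s1, s2):
        if abs(r - c) > 1:  # adjacent or equal symbols contribute 0
            lo, hi = min(r, c), max(r, c)
            total += (bps[hi - 1] - bps[lo]) ** 2
    return math.sqrt(n / len(s1)) * math.sqrt(total)


def euclidean(x, y):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def tightness(x, y, w, a):
    """TLB = MINDIST / Euclidean distance of the z-normalized series (0 if ED is 0)."""
    ed = euclidean(znorm(x), znorm(y))
    if ed == 0:
        return 0.0
    return mindist(sax(x, w, a), sax(y, w, a), len(x), a) / ed


if __name__ == "__main__":
    x = [math.sin(2 * math.pi * t / 32) for t in range(64)]
    y = [math.cos(2 * math.pi * t / 32) for t in range(64)]
    print(sax(x, 8, 4), sax(y, 8, 4), tightness(x, y, 8, 4))
```

Because MINDIST lower-bounds the Euclidean distance, TLB always lies in [0, 1], and values closer to 1 indicate a tighter bound. The proposed direction-aware measure is reported to raise this ratio relative to the baseline above. Its exact formula is not given in this abstract, so it is not reproduced here.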
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.