Effective Image Mining by Representing Color Histograms as Time Series
Zaher Al Aghbari
Department of Computer Science, University of Sharjah, UAE
Due to the wide spread of digital libraries and digital cameras, and the increased access to the WWW by individuals, the number of digital images in existence poses a great challenge. Easy access to such collections requires an index structure that facilitates random access to individual images and eases navigation among them. As these images are not annotated or associated with descriptions, existing systems represent them by their extracted low-level features.
In this paper, we demonstrate two image mining tasks, namely image classification and image clustering, which are preliminary steps in facilitating indexing and navigation. Both tasks are based on color distributions extracted from the images, which are then represented as time series. To make this representation more effective and efficient for the data mining tasks, we represent the time series using a new representation called SAX (Symbolic Aggregate approXimation). The SAX-based representation is very effective because it reduces dimensionality and supports a distance measure that lower-bounds the distance between the original time series. We demonstrate the feasibility of our approach experimentally.
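The pipeline described above (color distribution, time series, SAX word, lower-bounding distance) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the per-channel RGB binning, the segment count, the alphabet size, and all function names are assumptions chosen for the example. The breakpoints are the equiprobable cut points of a standard normal distribution, and `mindist` follows the usual SAX definition, in which adjacent symbols contribute zero distance.

```python
from bisect import bisect_right
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def color_histogram(pixels, bins_per_channel=4):
    """Concatenate per-channel histograms of (r, g, b) pixels (values 0..255).

    The resulting list is read left to right as a "time series".
    The binning scheme is an assumption made for this sketch.
    """
    hist = [0] * (3 * bins_per_channel)
    width = 256 // bins_per_channel
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bin_index = min(value // width, bins_per_channel - 1)
            hist[channel * bins_per_channel + bin_index] += 1
    return hist


def znormalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    mean, std = fmean(series), pstdev(series)
    return [(x - mean) / std if std > 0 else 0.0 for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-size segment.

    Assumes len(series) is divisible by segments.
    """
    size = len(series) // segments
    return [fmean(series[i * size:(i + 1) * size]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, segments, alphabet_size):
    """Return the SAX word of a series as integer symbols 0..alphabet_size-1."""
    cuts = breakpoints(alphabet_size)
    return [bisect_right(cuts, v) for v in paa(znormalize(series), segments)]


def mindist(word_a, word_b, original_length, alphabet_size):
    """Distance between two SAX words that lower-bounds the Euclidean
    distance between the corresponding z-normalized series."""
    cuts = breakpoints(alphabet_size)
    total = 0.0
    for a, b in zip(word_a, word_b):
        lo, hi = min(a, b), max(a, b)
        if hi - lo > 1:  # equal or adjacent symbols contribute nothing
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return sqrt(original_length / len(word_a)) * sqrt(total)


if __name__ == "__main__":
    # Two tiny synthetic "images": mostly red versus mostly blue pixels.
    reddish = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
    bluish = [(10, 10, 250)] * 90 + [(250, 10, 10)] * 10
    hist_r, hist_b = color_histogram(reddish), color_histogram(bluish)
    word_r = sax(hist_r, segments=6, alphabet_size=4)
    word_b = sax(hist_b, segments=6, alphabet_size=4)
    print(word_r, word_b, mindist(word_r, word_b, len(hist_r), 4))
```

In a sketch like this, classification or clustering would operate on the short SAX words rather than on the full histograms, and the lower-bounding property means that pruning candidates by `mindist` cannot discard a true nearest neighbor under the Euclidean distance on the normalized series.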