FzMail: Using FIS-CRM for E-mail Classification
Francisco P. Romero*, José A. Olivas*, and Pablo J. Garcés**
SMILe-ORETO Research Group (Soft Management of Internet e-Laboratory)
*Department of Information Systems and Technologies, Escuela Superior de Informática, Universidad de Castilla La Mancha, Paseo de la Universidad 4, 13071-Ciudad Real, Spain
**Department of Computer Science and Artificial Intelligence, Universidad de Alicante, Carretera San Vicente del Raspeig s/n, 03080 - Alicante, Spain
This work presents a brief summary of FIS-CRM (Fuzzy Interrelations and Synonymy Conceptual Representation Model) and its application to intelligent e-mail management. The FzMail tool is based on a soft computing methodology that automatically classifies the mailbox into a fuzzy, hierarchical structure of groups of “conceptually related” messages. FIS-CRM is used to represent messages conceptually, and it is also used when processing incoming messages so that the achieved conceptual organization is preserved. The aim is to make the best possible use of the concepts contained in these messages. To this end, we apply Fuzzy Deformable Prototypes to represent the document clusters. The effectiveness of the method has been demonstrated by applying these techniques in an information retrieval (IR) system. The test documents are e-mail messages from several distribution lists covering different subjects and languages.
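The idea of representing messages by concepts rather than by literal terms, and of letting a message belong to several groups at once, can be illustrated with a minimal Python sketch. It is not the FzMail implementation: the synonymy table, its degrees, the function names, the use of cosine similarity and the membership threshold are all assumptions made for illustration, and plain term vectors stand in for Fuzzy Deformable Prototypes.

```python
from collections import Counter
from math import sqrt

# Hypothetical fuzzy synonymy degrees in [0, 1]; illustrative values only.
SYNONYMY = {
    ("car", "automobile"): 0.9,
    ("mail", "message"): 0.6,
}

def synonymy_degree(a, b):
    """Symmetric lookup of the assumed synonymy degree between two terms."""
    if a == b:
        return 1.0
    return SYNONYMY.get((a, b), SYNONYMY.get((b, a), 0.0))

def concept_vector(text, vocabulary):
    """Weight each vocabulary term by the occurrences of the term itself
    and of its synonyms, scaled by the synonymy degree."""
    counts = Counter(text.lower().split())
    return {
        term: sum(n * synonymy_degree(term, word) for word, n in counts.items())
        for term in vocabulary
    }

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def soft_assign(message_vec, prototypes, threshold=0.3):
    """Return every group whose similarity reaches the threshold, so a
    message may belong to several groups with different degrees."""
    degrees = {name: cosine(message_vec, proto) for name, proto in prototypes.items()}
    return {name: d for name, d in degrees.items() if d >= threshold}

vocabulary = ["car", "automobile", "mail", "message"]
prototypes = {
    "vehicles": concept_vector("automobile", vocabulary),
    "correspondence": concept_vector("message", vocabulary),
}
memberships = soft_assign(concept_vector("car car mail", vocabulary), prototypes)
```

In this toy run the message never mentions "automobile" or "message", yet it obtains a non-zero membership in both groups because the weights of "car" and "mail" are shared with their assumed synonyms.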