Semantically Enhanced Code Clone Refinement Algorithm Based on Analysis of Multiple Detection Reports
Ricardo Sotolongo, Fangyan Dong, and Kaoru Hirota
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, G3-49, 4259 Nagatsuta, Midori-ku, Yokohama 226-8502, Japan
A code clone refinement algorithm is proposed that uses WordNet to semantically analyze the reports of multiple clone detection tools. It parses the reports of the different detection tools to find new clone specifications, and it refines the location of existing clones using the semantic information contained in the source code. The algorithm is applied to a real, complex software system and compared with three other well-known detection algorithms. It discovers 4888 more clone pairs than the average number detected by the other tools, and it makes the code clones 3 lines longer (for a subset of the same system, the results are proportional to the reduction in size). The objective is to provide a larger number of code clones, with more appropriate localization, for use in refactoring processes.
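The multi-report step can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a hypothetical clone-pair format, where each fragment is a file path plus an inclusive line range, and a hypothetical matching rule under which two reported pairs describe the same clone if both of their fragments overlap. Pairs reported by only one tool are kept as new clone specifications. Overlapping pairs are widened to the union of their line ranges. That widening is a naive stand-in for the WordNet-based semantic boundary refinement, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Fragment:
    """Hypothetical fragment format: file path plus inclusive 1-based line range."""
    path: str
    start: int
    end: int

    def overlaps(self, other: "Fragment") -> bool:
        # Same file and intersecting line ranges.
        return self.path == other.path and self.start <= other.end and other.start <= self.end

    def union(self, other: "Fragment") -> "Fragment":
        # Smallest range covering both fragments (assumes overlaps() is True).
        return Fragment(self.path, min(self.start, other.start), max(self.end, other.end))


ClonePair = tuple[Fragment, Fragment]


def same_clone(a: ClonePair, b: ClonePair) -> bool:
    """Assumed matching rule: both sides overlap, in either orientation."""
    direct = a[0].overlaps(b[0]) and a[1].overlaps(b[1])
    swapped = a[0].overlaps(b[1]) and a[1].overlaps(b[0])
    return direct or swapped


def merge_reports(reports: Iterable[Iterable[ClonePair]]) -> list[ClonePair]:
    """Combine clone pairs from several tools' reports into one deduplicated list."""
    merged: list[ClonePair] = []
    for report in reports:
        for pair in report:
            for i, existing in enumerate(merged):
                if same_clone(existing, pair):
                    # Orient the new pair so its sides line up with the existing pair.
                    if not (existing[0].overlaps(pair[0]) and existing[1].overlaps(pair[1])):
                        pair = (pair[1], pair[0])
                    # Naive boundary refinement: widen each side to the union.
                    merged[i] = (existing[0].union(pair[0]), existing[1].union(pair[1]))
                    break
            else:
                # No tool has reported this clone yet, so keep it as a new specification.
                merged.append(pair)
    return merged


# Usage with two made-up reports (hypothetical file names and line numbers).
tool_a = [(Fragment("a.c", 10, 20), Fragment("b.c", 30, 40))]
tool_b = [
    (Fragment("b.c", 28, 41), Fragment("a.c", 9, 19)),  # same clone, reversed order
    (Fragment("c.c", 1, 5), Fragment("d.c", 7, 11)),    # reported only by tool_b
]
result = merge_reports([tool_a, tool_b])
for first, second in result:
    print(first, second)
```

The overlap rule keeps the sketch short. A real merge would also need a tolerance for near-miss boundaries and a way to resolve a new pair that matches more than one existing pair.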