Mining Time-Interval Sequential Patterns with High Utility from Transaction Databases

Wen-Yen Wang; Anna Y.-Q. Huang

doi:10.20965/jaciii.2016.p1018


JACIII Vol.20 No.6 pp. 1018-1026 (2016)

Paper:


Wen-Yen Wang^* and Anna Y.-Q. Huang^**

^*Department of Information Engineering, Kun Shan University
No.195, Kunda Rd., Yongkang District, Tainan City 71070, Taiwan

^**Department of Computer Science and Information Engineering, National Central University
No. 300, Zhongda Rd., Zhongli District, Taoyuan City, Taiwan

Received: April 1, 2016

Accepted: October 3, 2016

Published: November 20, 2016

Keywords: time interval, sequential pattern mining, utility

Abstract

The purpose of time-interval sequential pattern mining is to help superstore business managers promote product sales. It discovers the time intervals that separate purchases of items: for example, if most customers purchase product item A, then buy item B r to s days later, and then buy item C t to u days after B, the intervals of r to s days and t to u days can be provided to business managers to facilitate informed marketing decisions. We treat these time intervals as patterns to be mined, in order to predict the purchasing time intervals between A and B, as well as between B and C. Nevertheless, little work has considered the significance of product items when mining time-interval sequential patterns. This work extends previous work by retaining high-utility time-interval patterns during pattern mining, so that the mined patterns more closely reflect actual business practice. Experimental results show the differences among three mining approaches that jointly consider item utility and the time intervals between purchased items. In addition to yielding more accurate patterns than the other two methods, the proposed UTMining_A method shortens execution times by delaying join processing and removing unnecessary records.
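To make the notion of a time-interval pattern with utility concrete, the following is a minimal sketch and not the UTMining_A method, whose details are not given here. It assumes a toy purchase format of (day, item, quantity), a hypothetical per-item profit table, utility defined as quantity times unit profit, and, when a customer matches a pattern in several ways, the occurrence with the highest utility. The names `match_utility`, `pattern_utility`, `profit`, and `db` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Purchase = Tuple[int, str, int]   # (day, item, quantity) -- assumed toy format
Step = Tuple[int, int, str]       # (min_gap_days, max_gap_days, next_item)


def match_utility(seq: List[Purchase], first_item: str,
                  steps: List[Step], profit: Dict[str, int]) -> Optional[int]:
    """Best utility of one customer's match of the pattern, or None if no match."""

    def extend(prev_day: int, k: int) -> Optional[int]:
        # Utility of matching steps[k:] given the previous item was bought on prev_day.
        if k == len(steps):
            return 0
        lo, hi, item = steps[k]
        best = None
        for day, it, qty in seq:
            if it == item and lo <= day - prev_day <= hi:
                rest = extend(day, k + 1)
                if rest is not None:
                    total = qty * profit[item] + rest
                    best = total if best is None else max(best, total)
        return best

    best = None
    for day, it, qty in seq:
        if it == first_item:
            rest = extend(day, 0)
            if rest is not None:
                total = qty * profit[it] + rest
                best = total if best is None else max(best, total)
    return best


def pattern_utility(db: Dict[str, List[Purchase]], first_item: str,
                    steps: List[Step], profit: Dict[str, int]) -> Tuple[int, int]:
    """Return (number of matching customers, summed utility over those customers)."""
    hits = [u for u in (match_utility(s, first_item, steps, profit)
                        for s in db.values()) if u is not None]
    return len(hits), sum(hits)


# Hypothetical data: unit profits and three customers' purchase histories.
profit = {"A": 5, "B": 2, "C": 10}
db = {
    "c1": [(1, "A", 1), (4, "B", 2), (10, "C", 1)],
    "c2": [(2, "A", 2), (5, "B", 1), (30, "C", 1)],
    "c3": [(1, "A", 1), (3, "B", 3), (8, "C", 2)],
}
# Pattern: A, then B 1 to 3 days later, then C 4 to 7 days after B.
support, utility = pattern_utility(db, "A", [(1, 3, "B"), (4, 7, "C")], profit)
```

On this toy data, customers c1 and c3 match the pattern (c2 buys C too late), so `support` is 2 and `utility` is 19 + 31 = 50. A high-utility miner would retain the pattern only if such a utility total meets a user-chosen threshold, rather than relying on the match count alone.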

Cite this article as:

W. Wang and A. Huang, “Mining Time-Interval Sequential Patterns with High Utility from Transaction Databases,” J. Adv. Comput. Intell. Intell. Inform., Vol.20 No.6, pp. 1018-1026, 2016.



This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
