A Signal-Representation-Based Parser to Extract Text-Based Information from the Web
Mu-Chun Su*1, Shao-Jui Wang*2, Chen-Ko Huang*3,
Pa-Chun Wang*4, *5, Fu-Hau Hsu*1, Shih-Chieh Lin*1,
and Yi-Zeng Hsieh*1
*1Department of Computer Science & Information Engineering, National Central University, Taiwan
*2Chunghwa Telecom Co., Ltd., Taiwan
*3Compal Electronics, Inc., Taiwan
*4Quality Management Center, Cathay General Hospital, Taiwan, R.O.C.
*5School of Medicine, Fu Jen Catholic University, Taiwan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.