Protein Entity Name Recognition Using Orthographic, Morphological and Proteinhood Features
Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara, and Izumi Suzuki
Graduate School of Engineering, Nagaoka University of Technology
1603-1 Kamitomioka-machi, Nagaoka, Niigata 940-2188, Japan
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.