Paper:
Protein Entity Name Recognition Using Orthographic, Morphological and Proteinhood Features
Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara, and Izumi Suzuki
Graduate School of Engineering, Nagaoka University of Technology
1603-1 Kamitomioka-machi, Nagaoka, Niigata 940-2188, Japan
- [1] PubMed, http://www.ncbi.nlm.nih.gov/pubmed
- [2] C. Friedman, P. Kra, H. Yu, M. Krauthammer, and A. Rzhetsky, “GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles,” Bioinformatics, Vol.17, No.Suppl.1, S74-S82, 2001.
- [3] M. Bundschus, M. Dejori, M. Stetter, V. Tresp, and H. P. Kriegel, “Extraction of semantic biomedical relations from text using conditional random fields,” BMC Bioinformatics, Vol.9, No.1, p.207, 2008.
- [4] T. C. Rindflesch, L. Tanabe, J. N. Weinstein, and L. Hunter, “EDGAR: extraction of drugs, genes and relations from the biomedical literature,” Pacific Symp. on Biocomputing, p. 517, NIH Public Access, 2000.
- [5] D. Zhou and Y. He, “Extracting interactions between proteins from the literature,” J. of Biomedical Informatics, Vol.41, No.2, pp. 393-407, 2008.
- [6] Q. C. Bui, S. Katrenko, and P. M. Sloot, “A hybrid approach to extract protein-protein interactions,” Bioinformatics, Vol.27, No.2, pp.259-265, 2011.
- [7] C. Blaschke, M. A. Andrade, C. A. Ouzounis, and A. Valencia, “Automatic extraction of biological information from scientific text: protein-protein interactions,” Ismb, Vol.7, pp. 60-67, 1999.
- [8] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, “Discovering patterns to extract protein-protein interactions from full texts,” Bioinformatics, Vol.20, No.18, pp. 3604-3612, 2004.
- [9] L. Ratinov and D. Roth, “Design challenges and misconceptions in named entity recognition,” Proc. of the 13th Conf. on Computational Natural Language Learning, pp. 147-155, Association for Computational Linguistics, 2009.
- [10] B. M. Sundheim, “Overview of results of the MUC-6 evaluation,” Proc. of a Workshop held at Vienna, Virginia: May 6-8, 1996, pp. 423-442, Association for Computational Linguistics, 1996.
- [11] L. Yang and Y. Zhou, “Exploring feature sets for two-phase biomedical named entity recognition using semi-CRFs,” Knowledge and Information Systems, pp. 1-15, 2014.
- [12] S. Sumathipala, K. Yamada, and M. Unehara, “Protein Name Classification Using Probabilistic Information of Orthographic and Morphological Features,” 22nd Symp. of SOFT Hokushinetsu Chapter, Nagaoka, Japan, 2013.
- [13] H. C. Kuo and K. I. Lin, “Extracting Protein Names from Biological Literature,” Advances in Computer Science: an Int. J. Vol.3, No.2, pp. 58-68, 2014.
- [14] G. Zhou, J. Zhang, J. Su, D. Shen, and C. Tan, “Recognizing names in biomedical texts: a machine learning approach,” Bioinformatics, Vol.20, No.7, pp. 1178-1190, 2004.
- [15] S. Tatar and I. Cicekli, “Two learning approaches for protein name extraction,” J. of Biomedical Informatics, Vol.42, No.6, pp. 1046-1055, 2009.
- [16] M. Krauthammer, A. Rzhetsky, P. Morozov, and C. Friedman, “Using BLAST for identifying gene and protein names in journal articles,” Gene, Vol.259, No.1, pp. 245-252, 2000.
- [17] T. Mitsumori, S. Fation, M. Murata, K. Doi, and H. Doi, “Gene/protein name recognition based on support vector machine using dictionary as features,” BMC Bioinformatics, Vol.6, No.Suppl.1, S8, 2005.
- [18] K. Seki and J. Mostafa, “A probabilistic model for identifying protein names and their name boundaries,” Proc. of the 2003 IEEE Bioinformatics Conf. 2003 (CSB 2003), pp. 251-258, 2003.
- [19] Y. F. Lin, T. H. Tsai, W. C. Chou, K. P. Wu, T. Y. Sung, and W. L. Hsu, “A maximum entropy approach to biomedical named entity recognition,” Proc. of the 4th ACM SIGKDD Workshop on Data Mining in Bioinformatics, Seattle, WA, pp. 56-61, 2004.
- [20] R. Bunescu, R. Ge, R. J. Kate, E. M. Marcotte, R. J. Mooney, A. Ramani, and Y. W. Wong, “Learning to extract proteins and their interactions from medline abstracts,” 2003.
- [21] Z. Ju, J. Wang, and F. Zhu, “Named entity recognition from biomedical text using SVM,” 2011 5th Int. Conf. on Bioinformatics and Biomedical Engineering (iCBBE), pp. 1-4, IEEE, 2011.
- [22] K. J. Lee, Y. S. Hwang, S. Kim, and H. C. Rim, “Biomedical named entity recognition using two-phase model based on SVMs,” J. of Biomedical Informatics, Vol.37, No.6, pp. 436-447, 2004.
- [23] S. Zhang and N. Elhadad, “Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts,” J. of Biomedical Informatics, Vol.46, No.6, pp. 1088-1098, 2013.
- [24] F. Zhu and B. Shen, “Combined SVM-CRFs for biological named entity recognition with maximal bidirectional squeezing,” PLoS ONE, Vol.7, No.6, e39230, 2012.
- [25] J. I. Kazama, T. Makino, Y. Ohta, and J. I. Tsujii, “Tuning support vector machines for biomedical named entity recognition,” Proc. of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain, Vol.3, pp. 1-8, Association for Computational Linguistics, 2002.
- [26] J. Patrick and Y. Wang, “Biomedical named entity recognition system,” Proc. of the 10th Australasian Document Computing Symp. (ADCS 2005), 2005.
- [27] B. Settles, “Biomedical named entity recognition using conditional random fields and rich feature sets,” Proc. of the Int. Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 104-107, Association for Computational Linguistics, 2004.
- [28] L. Li, R. Zhou, and D. Huang, “Two-phase biomedical named entity recognition using CRFs,” Computational Biology and Chemistry, Vol.33, No.4, pp. 334-338, 2009.
- [29] X. Liu, S. Zhang, F. Wei, and M. Zhou, “Recognizing named entities in tweets,” Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Vol.1, pp. 359-367, Association for Computational Linguistics, 2011.
- [30] H. L. Chieu and H. T. Ng, “Named entity recognition: a maximum entropy approach using global information,” Proc. of the 19th Int. Conf. on Computational Linguistics, Vol.1, pp. 1-7, Association for Computational Linguistics, 2002.
- [31] K. Kageura and B. Umino, “Methods of automatic term recognition: A review,” Terminology, Vol.3, No.2, pp. 259-289, 1996.
- [32] I. H. Witten and E. Frank, “Data Mining: Practical machine learning tools and techniques,” Morgan Kaufmann, 2005.
- [33] Genia, Term annotation, http://www.nactem.ac.uk/genia/genia-corpus/term-corpus (1st July 2015).
- [34] U.S. National Library of Medicine, MEDLINE®/PubMed® Resources, http://www.nlm.nih.gov/bsd/pmresources.html, 2006.
- [35] PubMed Help
- [36] Bethesda (MD): National Center for Biotechnology Information (US); 2005-. PubMed Help. [Updated Mar 25, 2014]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK3827/
- [37] J. D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier, “Introduction to the bio-entity recognition task at JNLPBA,” Proc. of the Int. Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 70-75, Association for Computational Linguistics, 2004.
- [38] G. F. Cooper and E. Herskovits, “A Bayesian method for the induction of probabilistic networks from data,” Machine Learning, Vol.9, No.4, pp. 309-347, 1992.
- [39] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions,” Machine Learning, Vol.37, No.3, pp. 297-336, 1999.
- [40] P. Harrington, “Machine learning in action,” Manning Publications Co., 2012.
- [41] V. Vapnik, “The nature of statistical learning theory,” Springer, 2000.
- [42] L. Breiman, “Random forests,” Machine Learning, Vol.45, No.1, pp.5-32, 2001.
- [43] L. Breiman, J. Friedman, R. Olshen, and C. J. Stone, “Classification and regression trees,” Wadsworth International Group, 1984.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.