Estimation of Protein Function Using Optimized Finite State Automaton Based on Accumulated Amino Acid Residue Scores

Shinji Chiba; Ken Sugawara

doi:10.20965/jaciii.2007.p1129

JACIII Vol.11 No.9 pp. 1129-1135 (2007)

Paper:

Shinji Chiba^* and Ken Sugawara^**

^*Department of Information Engineering, Sendai National College of Technology, 4-16-1 Ayashi-chuou, Aoba-ku, Sendai 989-3128, Japan

^**Department of Information Science, Tohoku Gakuin University, 2-1-1 Tenjinzawa, Izumi-ku, Sendai 981-3193, Japan

Received: December 12, 2006

Accepted: July 17, 2007

Published: November 20, 2007

Keywords: amino acid sequence, motif, pair-wise alignment, finite state automaton, estimation of protein function

Abstract

The function of an unknown protein is currently most effectively determined by retrieving similar sequences whose functions are known, and several effective techniques are based on such sequence retrieval. We propose a retrieval method that uses a finite state automaton (FSA). The FSA is built from accumulated amino acid residue scores that express a property of a protein family. We calculated the similarity between known and unknown protein sequences using the FSA and used this similarity to determine protein functions. To improve accuracy, we optimized the FSA using a genetic algorithm. Results from determining protein functions indicated that the proposed method was superior to general motif analysis.
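The abstract does not give the construction of the FSA, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a pre-aligned toy family, a simple linear chain of states (one per alignment column), and a hypothetical `threshold` parameter on the accumulated residue counts. All function names are invented for illustration. The genetic algorithm step is not shown; in a scheme like this, a parameter such as `threshold` is the kind of quantity an optimizer could tune.

```python
from collections import Counter


def accumulate_scores(aligned_family):
    """Count, per alignment column, how often each residue occurs."""
    length = len(aligned_family[0])
    columns = [Counter() for _ in range(length)]
    for seq in aligned_family:
        for i, residue in enumerate(seq):
            if residue != "-":  # skip gap characters
                columns[i][residue] += 1
    return columns


def build_chain_fsa(columns, threshold):
    """One state per column; a state accepts residues with count >= threshold.

    Assumption: a linear chain is used here for simplicity; the paper's
    automaton may have a different topology.
    """
    return [{r for r, c in col.items() if c >= threshold} for col in columns]


def similarity(fsa, query):
    """Best fraction of states matched over all ungapped windows of the query."""
    n = len(fsa)
    if len(query) < n:
        return 0.0
    best = 0
    for start in range(len(query) - n + 1):
        hits = sum(
            1 for i, accepted in enumerate(fsa) if query[start + i] in accepted
        )
        best = max(best, hits)
    return best / n


# Toy, made-up family of aligned sequences (not real protein data).
family = ["ACDEF", "ACDQF", "GCDEF"]
fsa = build_chain_fsa(accumulate_scores(family), threshold=2)
```

With this toy family, a query containing the conserved stretch scores highest, and a query that matches only some columns scores proportionally lower. Raising or lowering `threshold` changes which residues each state accepts, which is why it is a natural candidate for optimization.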

Cite this article as:

S. Chiba and K. Sugawara, “Estimation of Protein Function Using Optimized Finite State Automaton Based on Accumulated Amino Acid Residue Scores,” J. Adv. Comput. Intell. Intell. Inform., Vol.11 No.9, pp. 1129-1135, 2007.

References

[1] T. E. Creighton, “PROTEINS,” W. H. Freeman and Company, 1984.
[2] A. Konagaya, “Genome and Computer,” Kyoritsu Shuppan, 2000 (in Japanese).
[3] S. B. Needleman and C. D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins,” J. Mol. Biol., 48, pp. 443-453, 1970.
[4] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., 215, pp. 403-410, 1990.
[5] D. J. Lipman and W. R. Pearson, “Rapid and sensitive protein similarity searches,” Science, 227, pp. 1435-1441, 1985.
[6] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Res., 25, pp. 3389-3402, 1997.
[7] S. Chiba, K. Sugawara, and T. Watanabe, “Classification and Function Estimation of Protein by using Data Compression and Genetic Algorithm,” Proc. of Congress on Evolutionary Computation (CEC2001), pp. 839-844, 2001.
[8] S. Chiba and K. Sugawara, “Estimation of Protein Function with an Evolutionary Dictionary,” Proc. of 2002 IEEE World Congress on Computational Intelligence (WCCI2002), pp. 315-320, 2002.
[9] A. Krogh, “Hidden Markov models for labeled sequences,” Proc. of the 12th IAPR Int. Conf. on Pattern Recognition, pp. 140-144, IEEE Computer Society Press, 1994.
[10] T. Yada, M. Nakao, Y. Totoki, and K. Nakai, “Modeling and predicting transcriptional units of Escherichia coli genes using hidden Markov models,” Bioinformatics, 15, pp. 987-993, 1999.
[11] O. Gotoh, “An improved algorithm for matching biological sequences,” Journal of Molecular Biology, 162, pp. 705-708, 1982.
[12] http://kr.expasy.org/sprot/
[13] http://kr.expasy.org/prosite/

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
