Estimation of Protein Function Using Optimized Finite State Automaton Based on Accumulated Amino Acid Residue Scores
Shinji Chiba* and Ken Sugawara**
*Department of Information Engineering, Sendai National College of Technology, 4-16-1 Ayashi-chuou, Aoba-ku, Sendai 989-3128, Japan
**Department of Information Science, Tohoku Gakuin University, 2-1-1 Tenjinzawa, Izumi-ku, Sendai 981-3193, Japan
The function of an unknown protein is currently most effectively determined by retrieving similar known sequences, and several effective techniques are based on such sequence retrieval. We propose retrieval using a finite state automaton (FSA). The FSA is built from accumulated amino acid residue scores that express a property of a protein family. We calculate the similarity between known and unknown protein sequences using the FSA and use it to determine protein function. To improve accuracy, we optimize the FSA using a genetic algorithm. Results of protein function determination indicated that our proposal was superior to general motif analysis.
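The abstract does not specify how the FSA is constructed or scored, so the following is only a toy sketch of the general idea, not the authors' method. It assumes equal-length, pre-aligned family sequences, uses raw per-position residue counts as the "accumulated scores", and treats the automaton as a linear chain of states with one state per position. The names `build_scores` and `similarity` are hypothetical, and the genetic-algorithm optimization step is omitted.

```python
from collections import Counter


def build_scores(family):
    """Accumulate per-position residue counts over a protein family.

    Assumption: all sequences have the same length (already aligned).
    Position i of the result holds the weights of the transitions
    leaving state i, keyed by residue.
    """
    length = len(family[0])
    if any(len(seq) != length for seq in family):
        raise ValueError("toy sketch assumes equal-length sequences")
    return [Counter(seq[i] for seq in family) for i in range(length)]


def similarity(scores, query):
    """Return the best normalized score of any window of the query.

    Each window is fed through the chain of states; the weight of every
    transition taken is summed, then divided by the maximum achievable sum.
    """
    n = len(scores)
    if len(query) < n:
        return 0.0
    best_possible = sum(max(counts.values()) for counts in scores)
    best = 0
    for start in range(len(query) - n + 1):
        # Counter returns 0 for residues never seen at this position.
        total = sum(scores[i][query[start + i]] for i in range(n))
        best = max(best, total)
    return best / best_possible


if __name__ == "__main__":
    family = ["ACDE", "ACDF", "ACGE"]  # made-up sequences for illustration
    scores = build_scores(family)
    print(similarity(scores, "ACDE"))
    print(similarity(scores, "WWACGFWW"))
```

In this sketch an unknown sequence would be assigned to the family whose score table yields the highest similarity. A genetic algorithm could then tune the per-position weights, but how the paper encodes and evaluates the FSA is not stated in the abstract.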