JACIII Vol.11 No.9 pp. 1129-1135
doi: 10.20965/jaciii.2007.p1129


Estimation of Protein Function Using Optimized Finite State Automaton Based on Accumulated Amino Acid Residue Scores

Shinji Chiba* and Ken Sugawara**

*Department of Information Engineering, Sendai National College of Technology, 4-16-1 Ayashi-chuou, Aoba-ku, Sendai 989-3128, Japan

**Department of Information Science, Tohoku Gakuin University, 2-1-1 Tenjinzawa, Izumi-ku, Sendai 981-3193, Japan

Received: December 12, 2006
Accepted: July 17, 2007
Published: November 20, 2007
Keywords: amino acid sequence, motif, pair-wise alignment, finite state automaton, estimation of protein function

The function of an unknown protein is currently most effectively determined by retrieving similar sequences of known proteins, and several effective techniques are based on such sequence retrieval. We propose a retrieval method that uses a finite state automaton (FSA). The FSA is created from accumulated amino acid residue scores that express a property of a protein family. We calculated the similarity between known and unknown protein sequences using the FSA and used it to determine protein functions. To improve accuracy, we optimized the FSA using a genetic algorithm. Results from determining protein functions indicated that the proposed method was superior to general motif analysis.

Cite this article as:
Shinji Chiba and Ken Sugawara, “Estimation of Protein Function Using Optimized Finite State Automaton Based on Accumulated Amino Acid Residue Scores,” J. Adv. Comput. Intell. Intell. Inform., Vol.11, No.9, pp. 1129-1135, 2007.
