Improving the Prediction of Protein Structural Class for Low-Similarity Sequences by Incorporating Evolutionary and Structural Information
Liang Kong*,**, Lingfu Kong**, and Rong Jing**
*School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology
**School of Information Science and Engineering, Yanshan University
Predicting the structural class of a protein aids the study of protein function, regulation, and interactions. However, structural class prediction for low-similarity sequences (i.e., below 40% pairwise sequence similarity) remains a challenging problem. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. The method combines a support vector machine with integrated features derived from two sources: evolutionary information generated with the position-specific iterative basic local alignment search tool (PSI-BLAST), and predicted secondary structure. Prediction accuracies evaluated by jackknife tests are reported on two widely used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies of 89.3% and 87.9%, respectively, which are significantly higher than those achieved by state-of-the-art methods in protein structural class prediction. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.
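The jackknife test mentioned in the abstract can be illustrated with a minimal sketch: each sequence is held out in turn, the classifier is trained on the rest, and overall accuracy is the fraction of held-out sequences predicted correctly. The sketch below is not the authors' implementation. The helper names (`integrate_features`, `nearest_centroid_fit`, `jackknife_accuracy`) are hypothetical, the feature vectors are assumed to be precomputed, and a simple nearest-centroid classifier stands in for the support vector machine so that the example needs only the standard library.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Predictor = Callable[[Vector], str]


def integrate_features(evolutionary: Sequence[float],
                       structural: Sequence[float]) -> Vector:
    """Concatenate two precomputed feature groups into one vector.

    Hypothetical helper: the real features would come from PSI-BLAST
    profiles and predicted secondary structure.
    """
    return list(evolutionary) + list(structural)


def nearest_centroid_fit(X: List[Vector], y: List[str]) -> Predictor:
    """Stand-in classifier (NOT an SVM): predict the class whose mean
    feature vector is closest in squared Euclidean distance."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {label: [s / counts[label] for s in total]
                 for label, total in sums.items()}

    def predict(x: Vector) -> str:
        return min(
            centroids,
            key=lambda label: sum((a - b) ** 2
                                  for a, b in zip(x, centroids[label])),
        )

    return predict


def jackknife_accuracy(X: List[Vector], y: List[str],
                       fit: Callable[[List[Vector], List[str]], Predictor]
                       = nearest_centroid_fit) -> float:
    """Leave-one-out (jackknife) overall accuracy."""
    correct = 0
    for i in range(len(X)):
        # Train on every sample except the i-th, then test on the i-th.
        predict = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    # Toy, made-up feature values and labels, for illustration only.
    X = [
        integrate_features([0.9, 0.1], [0.8]),
        integrate_features([0.8, 0.2], [0.9]),
        integrate_features([0.1, 0.9], [0.1]),
        integrate_features([0.2, 0.8], [0.2]),
    ]
    y = ["all-alpha", "all-alpha", "all-beta", "all-beta"]
    print(f"Jackknife overall accuracy: {jackknife_accuracy(X, y):.3f}")
```

Because `jackknife_accuracy` accepts any `fit` function that returns a predictor, a support vector machine trained with a library such as LIBSVM could be substituted for the stand-in classifier without changing the evaluation loop.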