Institute of Computing Technology, Chinese Academy IR
Improving protein structure prediction using templates and sequence embedding | |
Wu, Fandi1,2,3; Jing, Xiaoyang2; Luo, Xiao2; Xu, Jinbo2 | |
2023 | |
Journal | BIOINFORMATICS |
ISSN | 1367-4803 |
Volume | 39, Issue: 1, Pages: 8 |
Abstract | Motivation: Protein structure prediction has been greatly improved by deep learning, but the contribution of different kinds of information is yet to be fully understood. This article studies the impact of two kinds of information on structure prediction: templates and multiple sequence alignment (MSA) embedding. Templates have been used by some methods before, such as AlphaFold2, RoseTTAFold and RaptorX. AlphaFold2 and RoseTTAFold only used templates detected by HHsearch, which may not perform very well on some targets. In addition, sequence embedding generated by pre-trained protein language models has not been fully explored for structure prediction. In this article, we study the impact of templates (including the number of templates, the template quality and how the templates are generated) on protein structure prediction accuracy, especially when the templates are detected by methods other than HHsearch. We also study the impact of sequence embedding (generated by MSATransformer and ESM-1b) on structure prediction. Results: We have implemented a deep learning method for protein structure prediction that may take templates and MSA embedding as extra inputs. We study the contribution of templates and MSA embedding to structure prediction accuracy. Our experimental results show that templates can improve structure prediction on 71 of 110 CASP13 (13th Critical Assessment of Structure Prediction) targets and 47 of 91 CASP14 targets, and templates are particularly useful for targets with similar templates. MSA embedding can improve structure prediction on 63 of 91 CASP14 targets and 87 of 183 Continuous Automated Model Evaluation (CAMEO) targets and is particularly useful for proteins with shallow MSAs. When both templates and MSA embedding are used, our method can predict correct folds (TM-score > 0.5) for 16 of 23 CASP14 FM targets and 14 of 18 CAMEO targets, outperforming RoseTTAFold by 5% and 7%, respectively.
Availability and implementation: Available at https://github.com/xluo233/RaptorXFold. Supplementary information: Supplementary data are available at Bioinformatics online. |
DOI | 10.1093/bioinformatics/btac723 |
Indexed In | SCI |
Language | English |
Funding | National Institutes of Health[R01GM089753] ; National Science Foundation[DBI1564955] ; CSC Scholarship ; NSF of China[61925208] ; NSF of China[62222214] ; NSF of China[U22A2028] ; CAS Project for Young Scientists in Basic Research[YSBR-029] ; Youth Innovation Promotion Association CAS ; Xplore Prize |
WOS Research Areas | Biochemistry & Molecular Biology ; Biotechnology & Applied Microbiology ; Computer Science ; Mathematical & Computational Biology ; Mathematics |
WOS Categories | Biochemical Research Methods ; Biotechnology & Applied Microbiology ; Computer Science, Interdisciplinary Applications ; Mathematical & Computational Biology ; Statistics & Probability |
WOS Accession Number | WOS:001025519200001 |
Publisher | OXFORD UNIV PRESS |
Document Type | Journal article |
Item Identifier | http://119.78.100.204/handle/2XEOYT63/21270 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding Author | Xu, Jinbo |
Author Affiliations | 1. Chinese Acad Sci, Inst Comp Technol, Beijing 626011, Peoples R China; 2. Toyota Technol Inst Chicago, Chicago, IL 60637 USA; 3. Univ Chinese Acad Sci, Beijing 100049, Peoples R China |
Recommended Citation (GB/T 7714) | Wu, Fandi,Jing, Xiaoyang,Luo, Xiao,et al. Improving protein structure prediction using templates and sequence embedding[J]. BIOINFORMATICS,2023,39(1):8. |
APA | Wu, Fandi,Jing, Xiaoyang,Luo, Xiao,&Xu, Jinbo.(2023).Improving protein structure prediction using templates and sequence embedding.BIOINFORMATICS,39(1),8. |
MLA | Wu, Fandi,et al."Improving protein structure prediction using templates and sequence embedding".BIOINFORMATICS 39.1(2023):8. |