CCPortal
DOI: 10.1073/pnas.2016239118
Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
Rives A.; Meier J.; Sercu T.; Goyal S.; Lin Z.; Liu J.; Guo D.; Ott M.; Zitnick C.L.; Ma J.; Fergus R.
Publication date: 2021
ISSN: 0027-8424
Volume: 118; Issue: 15
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity. The resulting model contains information about biological properties in its representations. The representations are learned from sequence data alone. The learned representation space has a multiscale organization reflecting structure from the level of biochemical properties of amino acids to remote homology of proteins. Information about secondary and tertiary structure is encoded in the representations and can be identified by linear projections. Representation learning produces features that generalize across a range of applications, enabling state-of-the-art supervised prediction of mutational effect and secondary structure and improving state-of-the-art features for long-range contact prediction. © 2021 National Academy of Sciences. All rights reserved.
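Illustrative sketch: the abstract states that secondary structure information encoded in the learned representations can be identified by linear projections. A minimal example of that workflow is sketched below, assuming the publicly released fair-esm package (facebookresearch/esm), which is not referenced in this record; the model checkpoint, example sequence, and probe dimensions are assumptions for illustration, and the probe weights are left untrained.

import torch
import esm

# Load a pretrained 33-layer protein language model (ESM-1b) and its alphabet.
model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
batch_converter = alphabet.get_batch_converter()
model.eval()

# Toy input: a single (label, amino-acid sequence) pair.
data = [("example_protein", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ")]
_, _, tokens = batch_converter(data)

# Extract per-residue representations from the final (33rd) layer.
with torch.no_grad():
    out = model(tokens, repr_layers=[33], return_contacts=False)
reps = out["representations"][33]   # shape: (batch, tokens, 1280)
per_residue = reps[0, 1:-1]         # drop the BOS/EOS tokens

# Hypothetical linear projection probe: map each 1280-d residue embedding to
# 3 secondary structure classes (helix / strand / coil). Shown untrained here;
# in practice its weights would be fit on labeled residues.
probe = torch.nn.Linear(1280, 3)
ss_logits = probe(per_residue)
print(ss_logits.shape)              # (sequence_length, 3)

In practice the linear probe would be trained on residue-level secondary structure labels; the point of the sketch is only that a single linear map over the fixed representations can expose this information, as the abstract describes.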
Keywords: Deep learning; Generative biology; Protein language model; Representation learning; Synthetic biology
Language: English
Source journal: Proceedings of the National Academy of Sciences of the United States of America
Document type: Journal article
Item identifier: http://gcip.llas.ac.cn/handle/2XKMVOVA/179861
Author affiliations: Facebook AI Research, New York, NY 10003, United States; Department of Computer Science, New York University, New York, NY 10012, United States; Harvard University, Cambridge, MA 02138, United States; Booth School of Business, University of Chicago, Chicago, IL 60637, United States; Yale Law School, New Haven, CT 06511, United States
Recommended citation:
GB/T 7714: Rives A., Meier J., Sercu T., et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences[J]. Proceedings of the National Academy of Sciences of the United States of America, 2021, 118(15).
APA: Rives A., Meier J., Sercu T., Goyal S., Lin Z., ... & Fergus R. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences of the United States of America, 118(15).
MLA: Rives A., et al. "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences." Proceedings of the National Academy of Sciences of the United States of America 118.15 (2021).
Files in this item:
No files are associated with this item.
