
major research in nature: alphafold draws a "family tree" of viruses and sheds light on their origins

2024-09-18


【introduction】beyond protein design and drug discovery, a paper recently published in nature has unlocked a new use for large biological models such as alphafold: revealing the kinship and evolutionary history of organisms.

in july this year, the esmfold team, which meta had disbanded, started over as the startup evolutionaryscale and released its latest life-science model, esm3, under the slogan "simulating 500 million years of evolution with a language model."

paper address: https://evolutionaryscale-public.s3.us-east-2.amazonaws.com/research/esm3.pdf

biologists were quick to pick up on this use.

in a number of recently published studies, scientists are using models such as alphafold and esmfold to redraw viral lineages and uncover some surprising "kinships."

these results not only shed light on the evolutionary history of virus families but could also help us respond better to future biological threats.

traditionally, scientists have worked out viral evolution by comparing genome sequences.

however, compared with mammals, viruses evolve at lightning speed, especially viruses whose genomes are made of rna. as a result, the genomes that need to be compared quickly become more numerous and more complex.

in addition, viral evolution is not driven by mutation alone: viruses can also acquire genetic material from other organisms, which makes their "kinship" even harder to establish. genetic sequences that look very different may conceal deep, distant relationships between viruses.

the shapes, or structures, of the proteins that viruses encode often change more slowly than their genes do. however, joe grove, a molecular virologist at the university of glasgow in the uk, said that before tools such as alphafold appeared, it was difficult to study and compare protein structures across an entire virus family using traditional methods.

in a paper recently published in nature, grove and his team used these large models to reconstruct the evolutionary history of the flaviviridae family from the structures of its glycoproteins.

paper address: https://www.nature.com/articles/s41586-024-07899-8

the flaviviridae family includes hepatitis c virus, dengue virus and zika virus, as well as several major animal pathogens and species that may pose new threats to human health.

how viruses enter cells

hepatitis c may be a less familiar infectious disease today, but the virus still causes hundreds of thousands of deaths every year, and there is still no vaccine against it.

to develop an effective hepatitis c vaccine, we need to understand which proteins (including glycoproteins) the virus uses to enter cells; these proteins also determine which hosts the virus can infect.

if you compare these proteins only at the sequence level, each virus's proteins look so different that meaningful connections are hard to find. the structure-prediction capabilities of large biological models can help get around this problem.
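to make the contrast between sequence-level and structure-level comparison concrete, here is a minimal python sketch. it is not the method used in the paper: it compares two toy proteins once by percent identity of pre-aligned sequences and once by distance-matrix rmsd (drmsd), a simple structural measure that needs no superposition. the sequences and coordinates are made-up illustrative values, not real viral data.

```python
import math
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """percent of aligned positions (gap columns skipped) with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def drmsd(coords_a, coords_b) -> float:
    """distance-matrix rmsd between two structures given as (x, y, z) points.

    compares every within-structure pairwise distance, so no rotation or
    superposition is needed; 0.0 means identical internal geometry.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two structures with the same number (>= 2) of points")
    sq_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


# hypothetical toy input: two aligned 4-residue "proteins" sharing no residues
seq_a, seq_b = "MKTA", "LRSG"
# toy alpha-carbon coordinates in angstroms; b is a copy of a rotated 90 degrees
# about the z axis and shifted, i.e. the same fold placed differently in space
coords_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.5)]
coords_b = [(10.0 - y, x, z) for x, y, z in coords_a]

print(percent_identity(seq_a, seq_b))        # 0.0: nothing in common by sequence
print(round(drmsd(coords_a, coords_b), 6))   # 0.0: identical shape
```

real studies use far more sophisticated structural aligners and scores, but the toy case shows the basic point: two proteins can share nothing at the sequence level and still have the same shape.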

the researchers used deepmind's alphafold 2 and esmfold, a structure-prediction tool developed by meta, to generate more than 33,000 predicted structures of proteins from 458 viruses in the flaviviridae family.

predicted structure of a hepatitis c virus glycoprotein

both alphafold and esmfold were used because the two models differ in one essential respect.

alphafold relies on a multiple sequence alignment of related proteins as input, but esmfold does not. it is a "protein language model" trained on tens of millions of protein sequences and needs only a single sequence as input. that makes it well suited to analyzing the most "mysterious" viruses, which have few known relatives from which to build an alignment.

the predicted structures revealed some unexpected connections: some seemingly unrelated relatives within the flaviviridae family use similar proteins as "keys" to enter cells.

for example, the system hepatitis c uses to infect cells is very similar to that of the pestiviruses, which include the virus behind classical swine fever as well as other animal pathogens.

the ai-assisted tools also show that the "entry system" used by hepatitis c and the pestiviruses is very different from those of other viruses. grove admits this is hard to explain: "for hepatitis c and its relatives, we don't know where their entry systems come from. maybe those viruses invented them a long time ago."
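one generic way to turn pairwise structural similarities like these into a "family tree" is hierarchical clustering. the python sketch below applies upgma (size-weighted average-linkage clustering) to a small distance matrix and returns the tree topology in newick format. this is an illustrative assumption, not necessarily how grove's team built their trees, and the four leaf names and distances are hypothetical toy values (for instance, one minus a normalized structural similarity score).

```python
def upgma(names, dist):
    """cluster leaves with upgma (size-weighted average linkage).

    names: distinct leaf labels; dist: symmetric matrix (list of lists) of
    pairwise distances. returns the rooted tree topology as a newick string;
    branch lengths are omitted to keep the sketch short.
    """
    size = {n: 1 for n in names}  # active cluster label -> number of leaves
    d = {
        frozenset((names[i], names[j])): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    while len(size) > 1:
        x, y = sorted(min(d, key=d.get))  # closest pair of active clusters
        merged = f"({x},{y})"
        nx, ny = size.pop(x), size.pop(y)
        # distance from the merged cluster = leaf-weighted mean of its parts
        for z in size:
            d[frozenset((merged, z))] = (
                nx * d[frozenset((x, z))] + ny * d[frozenset((y, z))]
            ) / (nx + ny)
        # forget every distance involving the two clusters just merged
        d = {k: v for k, v in d.items() if x not in k and y not in k}
        size[merged] = nx + ny
    (root,) = size
    return root + ";"


# hypothetical toy distances between four unnamed proteins; not data from the paper
names = ["a", "b", "c", "d"]
dist = [
    [0.0, 0.2, 0.8, 0.8],
    [0.2, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.4],
    [0.8, 0.8, 0.4, 0.0],
]
print(upgma(names, dist))  # ((a,b),(c,d));
```

upgma assumes a roughly constant rate of change across lineages, which is a strong simplification; it is used here only because it is short and easy to follow.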

obtaining "pirated" proteins from bacteria

beyond the pestiviruses, the predicted structures also pointed to two more "relatives", zika virus and dengue virus, whose entry proteins appear to share a common origin. in addition, flaviviruses seem to have "stolen" an enzyme from bacteria and put it to their own use.

predicted structure of a dengue virus protein, generated with colabfold (alphafold 2)

earlier, the team of virologist mary petrone at the university of sydney had found similar "theft" in an unusual flavivirus.

"gene theft may have played a much larger role in shaping the evolution of flaviviruses than we previously thought," she said.

david moi, a computational biologist at the university of lausanne in switzerland, also points out that, given the untapped potential of ai-assisted tools, the flavivirus research is just the tip of the iceberg.

with the help of artificial intelligence, the evolutionary histories of other viruses, and even of many cellular organisms, may be rewritten.

“we’re going to be retelling their stories with a new generation of tools. the evolutionary history of all these organisms needs to be updated now that we can see further back.”

for many unsolved mysteries in the life sciences, the power unleashed by ai is offering a first glimpse of answers, and we can look forward to the day these stories are rewritten.