Although underlying medical conditions play an important role, much about why the severity of COVID-19 varies so widely remains unclear.
A new study identifies dozens of genomic variations that may drive these hard-to-predict differences in clinical outcomes. According to the work, led by University of Pennsylvania scientists, genomic variants in four genes critical for SARS-CoV-2 infection, including the ACE2 gene, were targets of natural selection and are associated with health conditions seen in COVID-19 patients.
The study, which used genomic data from different global populations, suggests that these variants may have evolved in response to previous encounters with viruses similar to SARS-CoV-2. The team published the results in the journal Proceedings of the National Academy of Sciences.
“This study illustrates my lab’s approach to genomic studies: we use what happens in nature and signatures of natural selection to identify functionally important variants that impact health and disease,” says Sarah Tishkoff, co-author of the paper and Penn Integrates Knowledge University Professor with appointments in the Perelman School of Medicine and the School of Arts & Sciences. “Nature has already done a lot of the screening and can give us clues as to which parts of a gene like ACE2 are important for infection.”
While other groups have performed genome-wide association studies to identify genetic variants associated with COVID-19 severity, this study is the first to include ethnically diverse Africans and a highly diverse Penn Medicine BioBank data set. Including these often overlooked groups revealed new variants that could be clinically significant.
Signals of selection
Before COVID-19 was even declared a pandemic, Giorgio Sirugo of the Perelman School of Medicine hypothesized that there was a genetic basis for susceptibility to, or protection from, more serious consequences.
“The idea is really classic, that infectious diseases have a host genetic component,” says Sirugo, a co-corresponding author of the paper. He turned to Tishkoff and other colleagues to address the question using population genetics.
Researchers focused on a handful of genes known to play a role in how SARS-CoV-2 enters cells: ACE2, TMPRSS2, DPP4 and LY6E. They used genomic data from 2,012 ethnically diverse Africans, including people who practice traditional hunter-gatherer, herder and farmer lifestyles, and 15,977 people of European and African descent from the Penn Medicine BioBank, all of whom had associated electronic health record data available.
Looking for variations in these genes that showed evidence they were subject to evolutionary selection, the researchers found 41 variants in the ACE2 gene that affected the protein’s amino acid sequence. Although these variants were rare when the team looked at the pooled world population, three variants were common among a population of Central African hunter-gatherers.
“That really caught our eye,” says Tishkoff. “This is a group that lives in a tropical environment and continues to forage for bushmeat and spend a lot of time in the forest. You are likely to be exposed to all sorts of viruses brought by animals. And of course SARS-CoV-2 is said to have jumped from an animal to humans. Even if this population group had not been exposed to this exact virus in the past, they could have been exposed to similar types of viruses.”
In other words, these variants may have evolved because they offered protection against viruses with similarities to SARS-CoV-2. These variants showed signs of positive selection, further evidence that they confer a fitness advantage.
Signs of natural selection were present not only in the parts of the genome that code for ACE2 and other genes, but also in so-called regulatory regions that affect how and where these genes are expressed. Many of these variants appeared to have undergone what is known as purifying selection, which occurs when evolutionary forces select for the removal of variants with negative effects on fitness.
“We saw significant signals of natural selection in the regulatory regions of ACE2,” says Chao Zhang, a postdoc in Tishkoff’s lab and co-first author. “Personally, I think that’s going to be really important when you think about clinical outcomes.”
“From an African and especially Central African perspective, the discovery of three non-synonymous variants in ACE2 in indigenous populations of Cameroon is significant,” says Alfred K. Njamnshi, co-author and professor of neurology and neuroscience at the University of Yaoundé in Cameroon. “The regulatory variants found in ACE2 indicate targets of recent natural selection in some African populations, and this may have important implications for disease risk or resistance that warrant further investigation.”
Rare variants also likely play a role in health outcomes, the team notes, accounting for individual differences in disease severity. In East Asian populations, they found variations in the ACE2 regulatory region that could increase ACE2 expression, which could affect the extent to which SARS-CoV-2 infects host cells.
“To know for sure, we need to test the function of this variant and see if we can get evidence that changes in this region are related to the susceptibility and severity of COVID infection,” says Yuanqing Feng, another postdoc in the Tishkoff lab, who shared first authorship on the paper.
These variations in noncoding regions of the genome could also affect which organs the genes are expressed in, a relevant trait given the known effects of COVID-19 on the heart, brain, lungs, kidneys and other organs. Furthermore, the ACE2 receptor not only plays a role in binding to the SARS-CoV-2 spike protein; it is also involved in blood pressure regulation, so variants can affect health beyond COVID infection itself.
Beyond ACE2, signals of natural selection were also evident in the coding and regulatory regions of the TMPRSS2 gene, including variations that appear to have evolved after early human populations separated from other great apes. “There are a lot of human-specific substitutions in this protein, which is really intriguing,” Tishkoff says, suggesting that natural selection acted at these sites during human evolutionary history, after the human lineage diverged from its common ancestor with chimpanzees more than 5 million years ago. The team also identified dozens of other variants in the DPP4 and LY6E genes.
Connections between the genome and health
To determine the clinical relevance of these variants, the researchers used data from the Penn Medicine BioBank. Most of the analysis was conducted before the pandemic swept the United States, so COVID-19 outcomes were not yet part of patients’ medical records. However, because the biobank data includes genetic sequencing information, the researchers were able to look for the newly identified variants and determine whether they were linked to conditions considered relevant to COVID-19 infection.
“With our data, we can look at the variants identified by Sarah’s team and link them to clinical data,” says Anurag Verma of Penn’s Perelman School of Medicine, a co-first author of the publication.
The team found that certain variants in the coding regions they identified were associated with conditions that are related to or overlap with COVID-19, including respiratory disease, respiratory syncytial virus infection, and liver diseases.
Building on these initial findings, the researchers say that further exploration of key genetic variants could reveal a lot about how proteins related to COVID-19 or other diseases work.
“From a medical perspective, you could identify new therapeutic targets or even provide personalized medicine based on what variants a person had,” says Sirugo.
The team underscores the importance of including diverse populations in genomic studies, as some of the newly identified variants that could be clinically significant have been found only in African populations that had not previously been studied in this way.
“This is an extremely important and unique aspect of this study,” says Tishkoff.
Chao Zhang is a postdoctoral fellow in the Tishkoff lab at the University of Pennsylvania.
Anurag Verma is Associate Professor of Translational Medicine and Human Genetics at Penn’s Perelman School of Medicine and Associate Director of Clinical Informatics and Genomics at the Penn Medicine BioBank.
Yuanqing Feng is a postdoctoral fellow in the Tishkoff lab at Penn.
Giorgio Sirugo is a clinical scientist and senior research investigator at Penn’s Perelman School of Medicine.
Sarah Tishkoff is the David and Lyn Silfen University Professor in the Department of Genetics at the Perelman School of Medicine and the Department of Biology at the Penn School of Arts & Sciences and Director of the Penn Center for Global Genomics & Health Equity.
Zhang, Verma, Feng, Njamnshi, Sirugo and Tishkoff co-authored the study with Michael McQuillan, Matthew Hansen, Anastasia Lucas, Joseph Park, Alessia Ranciaro, Simon Thompson, Meagan A. Rubel, William Beggs, Daniel Rader and Marylyn D. Ritchie of the Perelman School of Medicine; Marcelo C. R. Melo and Cesar de la Fuente of Penn Medicine and Penn Engineering’s Machine Biology Group; Michael C. Campbell of the University of Southern California; Jibril Hirbo of Vanderbilt University; Sununguko Wata Mpoloka and Gaonyadiwe George Mokone from the University of Botswana; Thomas Nyambo from Kampala International University in Tanzania; Dawit Wolde Meskel and Guria Belay from Addis Ababa University; Charles Fokunang and Alfred K. Njamnshi from Yaoundé University in Cameroon; Sarah A. Omar from the Kenya Medical Research Institute; Scott M. Williams of Case Western Reserve University; and the Regeneron Genetics Center.
Sirugo and Tishkoff are co-corresponding authors, and Zhang, Feng, and Verma are co-first authors.
The study was supported in part by the National Institutes of Health (Grants X01HL139409, 1R35GM134957, R01GM113657, R01DK104339, R01AR076241, and R01LM010098) and the American Diabetes Association (Grant ADA 1-19-VSN-02).