Evolution of SARS-CoV-2 genomes based on continued positive selection

The ongoing coronavirus disease 2019 (COVID-19) pandemic, caused by the rapid spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has severely affected the global economy and healthcare systems.

Study: Continued positive selection drives evolution of SARS-CoV-2 genomes.

SARS-CoV-2 is an RNA virus with a high mutation rate, causing rapid evolution.


Natural selection plays an important role in the virulence and transmissibility of SARS-CoV-2 through specific adaptive mutations. Similar patterns have been reported for Zika and Ebola viruses. Since SARS-CoV-2 emerged in the Chinese city of Wuhan in 2019, it has accumulated numerous genetic mutations.

Researchers have studied and documented the entire genome of the virus and have so far reported approximately 29,735 nucleotide substitutions.

Scientists have observed that SARS-CoV-2 lineages show a high number of variations in transmission and clinical manifestations. Some of the SARS-CoV-2 variants show increased contagiousness compared to the original strain.

Additionally, some SARS-CoV-2 variants may evade immune responses triggered by COVID-19 vaccination or natural infection. Given these trait differences, the World Health Organization (WHO) has classified SARS-CoV-2 variants into variants of concern (VOC) and variants of interest (VOI).

Scientists have expressed the importance of predicting epidemic trends, formulating effective disease control strategies, and developing effective COVID-19 vaccines to protect individuals against the disease. Additionally, they highlighted the importance of understanding how natural selection drove the evolution of SARS-CoV-2 virulence and infectivity during the pandemic.

There is a research gap related to the identification of functional mutants that influence the evolution of epidemiological and pathogenic characteristics of SARS-CoV-2. Previous studies have reported two evolutionary hypotheses associated with the COVID-19 virus, which include the evolution of the virus within the animal host and in the human population after zoonotic transfer.

To date, evaluation of natural selection on SARS-CoV-2 has mainly focused on the host-switching phase, i.e. from animal to human. In these studies, researchers examined the sequence divergence between SARS-CoV-2 and closely related viruses, for example BatCoV-RaTG13. An effective method that copes with undetermined ancestral sequences, accounts for SARS-CoV-2 cluster infections, and reduces sampling bias is still lacking.

The majority of available studies have based their analyses on changes in the allele frequency of individual mutations, which may not have been caused by natural selection. Therefore, the researchers said it is important to identify the candidate mutant loci that are under natural selection. So far, a whole-genome screen of SARS-CoV-2 that maps the evolving landscape of functional mutations and clarifies its epidemiological effects has been missing.

About the study

Scientists recently used a new method to assess how natural selection acts on the evolution of SARS-CoV-2. They said that, compared to conventional methods, which are susceptible to founder effects, viral cluster infections and sampling bias in viral genomic data, the current method is significantly improved.

In this study, the researchers hypothesized that continued positive selection strongly influences SARS-CoV-2 genomes and that this plays an important role in shaping the dynamics of the COVID-19 pandemic. The study is available as a pre-proof in Genomics, Proteomics & Bioinformatics.

The researchers obtained SARS-CoV-2 sequences from the 2019 Novel Coronavirus Resource (2019nCoVR) and the Global Initiative on Sharing All Influenza Data (GISAID). They included 3,328,405 sequences from 169 countries. They used MUSCLE to align these sequences and identified nucleotide mutations by comparing each sequence with the reference sequence, i.e. the sequence of the original SARS-CoV-2 strain.
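The comparison against a reference can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned (for example by MUSCLE) and are available as equal-length strings with "-" marking gaps. The function name and the toy sequences are hypothetical and are not taken from the study.

```python
def call_substitutions(reference, sample):
    """Return (1-based reference position, ref base, sample base) per substitution.

    Both inputs are rows of the same alignment, so they must have equal length.
    Gaps ("-") and ambiguous bases (e.g. "N") are skipped, not counted.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    substitutions = []
    ref_pos = 0
    for ref_base, sample_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            ref_pos += 1  # count reference bases only, not alignment gaps
        if ref_base in valid and sample_base in valid and ref_base != sample_base:
            substitutions.append((ref_pos, ref_base, sample_base))
    return substitutions


# Toy aligned pair (made up for illustration).
toy_reference = "ACG-TAC"
toy_sample = "ATGATNG"
toy_calls = call_substitutions(toy_reference, toy_sample)
```

Positions are reported in reference coordinates, so an insertion in the sample does not shift the numbering of later sites.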

The scientists divided these viral sequences into clusters based on genomic similarity, which arises from global transmission and clustered outbreaks. They constructed a temporal and spatial landscape of mutations across the clusters, which helped determine which mutations are pathogenic and are associated with more severe or attenuated clinical symptoms.


The researchers compared the relative excess of non-synonymous and synonymous substitutions as an effective method to determine the effect of natural selection on SARS-CoV-2. This method is logically similar to the McDonald-Kreitman test in molecular evolution. The method proposed in the current study has been called the NSRF1 method, which compares genetic polymorphisms within a species.

NSRF1 is a new method that compares the relative abundance of non-synonymous and synonymous mutations (the Nm/Sm ratio) between mutations with high and low allele frequencies. The researchers examined whether the Nm/Sm ratio increases or decreases as the frequency of the mutant alleles rises. In this context, the scientists reasoned that mutations with higher frequencies have tended to be exposed to natural selection for longer.
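The idea of tracking the Nm/Sm ratio across allele frequencies can be sketched in Python. This is an illustration only, not the authors' implementation. It assumes Nm and Sm are the counts of non-synonymous and synonymous mutations, in line with the comparison of non-synonymous and synonymous substitutions described above. The function name, the frequency bins and the toy data are all hypothetical.

```python
from collections import Counter


def nm_sm_by_frequency(mutations, bin_edges):
    """Count non-synonymous (Nm) and synonymous (Sm) mutations per frequency bin.

    mutations: iterable of (allele_frequency, is_nonsynonymous) pairs.
    bin_edges: ascending boundaries, e.g. [0.0, 0.001, 0.01, 1.0].
    Returns one (low, high, Nm, Sm, ratio) tuple per bin; ratio is None if Sm == 0.
    """
    nm, sm = Counter(), Counter()
    n_bins = len(bin_edges) - 1
    for freq, is_nonsyn in mutations:
        for i in range(n_bins):
            low, high = bin_edges[i], bin_edges[i + 1]
            is_last = i == n_bins - 1
            # Bins are half-open [low, high); the last bin also includes its upper edge.
            if low <= freq < high or (is_last and freq == high):
                (nm if is_nonsyn else sm)[i] += 1
                break
    rows = []
    for i in range(n_bins):
        ratio = nm[i] / sm[i] if sm[i] else None
        rows.append((bin_edges[i], bin_edges[i + 1], nm[i], sm[i], ratio))
    return rows


# Made-up toy data: (allele frequency, True if non-synonymous).
toy_mutations = [
    (0.0005, True), (0.0005, False), (0.0002, False),
    (0.005, True), (0.005, True), (0.004, False),
    (0.2, True), (0.3, True), (0.5, True), (0.1, False),
]
toy_rows = nm_sm_by_frequency(toy_mutations, [0.0, 0.001, 0.01, 1.0])
toy_ratios = [row[4] for row in toy_rows]
```

In this toy input the ratio rises from the lowest to the highest frequency bin, the kind of trend usually read as a sign of positive selection. A ratio that falls with frequency would instead point to purifying selection removing non-synonymous changes before they become common.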

The results of the study indicate that ongoing positive selection is driving the adaptation of the virus to humans, increasing its transmissibility and helping it evade host antiviral immunity.

Final remarks

The authors stated that the proportional increase or decrease in the Nm/Sm ratio is a more effective indicator of natural selection. The study presented several lines of evidence showing that SARS-CoV-2 genomes have been constrained by purifying selection during the pandemic.

Importantly, the study provided a list of 556 mutations as putative target sites of natural selection. The scientists reported that mutations underlying divergence between clusters, or rising in frequency within clusters, increase pathogenicity and infectivity. This list provides a basis for future studies related to clinical treatment.

Martin E. Berry