Bayesian viral allele selection for SARS-CoV-2 genomic surveillance

In a recent study posted to the bioRxiv* preprint server, researchers assessed selection effects in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) using Bayesian viral allele selection.

Various studies have reported novel SARS-CoV-2 mutations that have been associated with increased transmissibility, increased binding to angiotensin-converting enzyme 2 (ACE2), and antibody evasion. However, the functional consequences of such mutations and their links to SARS-CoV-2 fitness are still unknown.

Study: Inferring selection effects in SARS-CoV-2 with Bayesian Viral Allele Selection. Image credit: NIAID

About the study

In the present study, researchers developed Bayesian viral allele selection to identify the genetic factors that drive differences in viral fitness and in the growth rates of SARS-CoV-2 variants.

The Bayesian viral allele selection (BVAS) method developed by the team computes a posterior inclusion probability (PIP) for each allele. Alleles with high PIPs are good candidates for influencing viral fitness. The team compared three methods based on viral diffusion: maximum a posteriori (MAP) estimation, BVAS, and Laplace.

The team evaluated the sensitivity of BVAS to hyperparameters such as the prior inclusion probability h and the prior precision τ. The value of the PIPs produced by BVAS was also demonstrated by examining the allele-level sensitivity and precision obtained when alleles with PIPs greater than 0.1 were counted as hits. Additionally, the team estimated the relative fitness of all SARS-CoV-2 variants by fitting the BVAS model to the allele frequencies observed in different regions.
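The thresholding step described above can be illustrated with a minimal sketch. The function name, allele labels, PIP values, and set of "causal" alleles below are hypothetical and chosen only for illustration; they are not taken from the study, and this is not the authors' code.

```python
def precision_and_sensitivity(pips, causal, threshold=0.1):
    """Call alleles with PIP > threshold hits and score them against a known causal set."""
    hits = {allele for allele, pip in pips.items() if pip > threshold}
    true_hits = hits & causal
    # Precision: fraction of hits that are truly causal.
    precision = len(true_hits) / len(hits) if hits else float("nan")
    # Sensitivity: fraction of causal alleles that were recovered as hits.
    sensitivity = len(true_hits) / len(causal) if causal else float("nan")
    return hits, precision, sensitivity


# Hypothetical PIPs and ground truth, e.g. from a simulation where the
# causal alleles are known in advance.
example_pips = {"allele_1": 0.95, "allele_2": 0.40, "allele_3": 0.08,
                "allele_4": 0.02, "allele_5": 0.15}
example_causal = {"allele_1", "allele_2", "allele_3"}

hits, precision, sensitivity = precision_and_sensitivity(example_pips, example_causal)
print(sorted(hits), precision, sensitivity)
```

In practice the causal set is only known in simulations, which is why this kind of scoring is typically done on simulated data.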


The results showed that, across the viral diffusion methods compared, the causal hit rate increased with the number of regions and decreased as the number of alleles increased. BVAS achieved the best hit rate among the methods, while MAP and Laplace performed markedly worse when the number of alleles was high.

BVAS results changed only slightly as τ varied over about four orders of magnitude; however, sensitivity decreased when τ was very high. The team also observed that the effective population size (ν) played a crucial role in the sensitivity of BVAS. Large values of ν indicated that increments in allele frequency were dominated by deterministic drift. Small values of ν, on the other hand, suggested that allele frequency increments show substantial variability on top of the deterministic drift. When the team counted alleles with PIPs greater than 0.1 as hits, BVAS achieved high precision, indicating that alleles with high PIPs were more likely to be causally associated with viral fitness. Moreover, the effective population size decreased 15-fold when the sampling rate (ρ) dropped from 64% to 1%.
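The role of ν can be made concrete with a small simulation. The sketch below assumes a generic diffusion-style update in which an allele frequency increment equals a deterministic drift term plus Gaussian noise whose variance scales as 1/ν. This form, and the parameter names `beta`, `x`, and `n`, are assumptions made for illustration and may differ from the exact likelihood used in the study.

```python
import random
import statistics


def simulate_increments(nu, beta=0.1, x=0.2, n=5000, seed=0):
    """Draw n allele frequency increments under an assumed diffusion-style model."""
    rng = random.Random(seed)
    drift = beta * x * (1.0 - x)             # deterministic part (assumed form)
    noise_sd = (x * (1.0 - x) / nu) ** 0.5   # noise shrinks as nu grows
    return [drift + noise_sd * rng.gauss(0.0, 1.0) for _ in range(n)]


for nu in (10, 10_000):
    increments = simulate_increments(nu)
    print(f"nu={nu}: mean={statistics.mean(increments):.4f}, "
          f"sd={statistics.stdev(increments):.4f}")
```

With a small ν the spread of the increments is much larger than the drift itself, whereas with a large ν the increments cluster tightly around the drift term, which matches the qualitative behavior described above.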

The fitness estimates for SARS-CoV-2 lineages showed that Omicron BA.2 was the fittest lineage, followed by Omicron BA.1, Delta, Alpha, and the wild-type variant. Notably, some Phylogenetic Assignment of Named Global Outbreak (PANGO) lineages contained diverse genotypes that corresponded to distinct growth rates. The team also noted that the Omicron variant had split into various sublineages whose fitness increased over time. The Omicron BA.2.12.2 sublineage was found to be the fittest, while the other BA.2 sublineages had comparable fitness levels.
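One simple way to see how allele-level effects could translate into a lineage ranking is sketched below. It assumes that a lineage's relative fitness is the sum of the effects of the mutations it carries; this additive form is an assumption of the sketch, and the mutation labels, lineage labels, and effect values are placeholders rather than estimates from the study.

```python
def lineage_fitness(mutations, effects):
    """Sum per-allele effects over a lineage's mutations (additive assumption)."""
    return sum(effects.get(mutation, 0.0) for mutation in mutations)


# Placeholder per-allele effects and lineage definitions, for illustration only.
effects = {"mut_a": 0.05, "mut_b": 0.02, "mut_c": -0.01}
lineages = {
    "lineage_1": {"mut_a"},
    "lineage_2": {"mut_a", "mut_b"},
    "lineage_3": {"mut_c"},
}

# Rank lineages from fittest to least fit under the additive assumption.
ranking = sorted(
    lineages,
    key=lambda name: lineage_fitness(lineages[name], effects),
    reverse=True,
)
print(ranking)
```

Under this assumption, two lineages that share most of their mutations end up with similar fitness, which is one way to read the comparable fitness levels reported for the BA.2 sublineages.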

A. Locations of the top 20 Spike hits, ranked by PIP, on the cryo-EM structure of an ACE2-bound Spike trimer (magenta) at 3.9 Angstrom resolution in the single RBD "up" conformation (Zhou et al., 2020). B. Magnified view of the RBD-ACE2 interface, showing the spatial proximity of S:R346, S:N339, S:N440, S:L452, S:S477, S:E484, and S:N501.

The team also identified recombinant lineages that arose from recombination between BA.1 and BA.2, and between Delta and BA.1. Of these, XN and XT were the fittest recombinants; however, their fitness was lower than that of BA.2 and higher than that of BA.1. Additionally, the fitness of existing recombinants such as XA through XT indicated that these recombinant lineages may not be a concern in the near future.

Analysis of SARS-CoV-2 mutations showed that the strongest selection signal was in the spike (S) protein, with the highest concentration of signals in the receptor-binding domain (RBD). Strong selection signals were also detected in the N-terminal domain (NTD) and at the furin cleavage site. In terms of effect size, the S:L452R mutation ranked highest; it is found in lineages BA.4/BA.5, B.1.427, and B.1.429. Additionally, the S:L452Q mutation was among the top-scoring hits and was found in the BA.2.12.2 variant.

Overall, the study results showed the importance of the Bayesian method of viral allele selection for understanding the selection effects of SARS-CoV-2 and its emerging variants.

*Important Notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

Martin E. Berry