The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. R. Soc. While pangolins could be acting as intermediate hosts for bat viruses to get into humansthey develop severe respiratory disease38 and commonly come into contact with people through traffickingthere is no evidence that pangolin infection is a requirement for bat viruses to cross into humans. SARS-CoV-2 is an appropriate name for the new coronavirus. . BEAST inferences made use of the BEAGLE v.3 library68 for efficient likelihood computations. Viruses 11, 174 (2019). We compare both MERS-CoV- and HCoV-OC43-centred prior distributions (Extended Data Fig. M.F.B., P.L. Specifically, we used a combination of six methods implemented in v.5.5 of RDP5 (ref. 874850). Xiao, K. et al. Proc. Regions AC were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. Nat Microbiol 5, 14081417 (2020). Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. Schierup, M. H. & Hein, J. Recombination and the molecular clock. There is a 90% DNA match between SARS CoV 2 and a coronavirus in pangolins. We used an uncorrelated relaxed clock model with log-normal distribution for all datasets, except for the low-diversity SARS data for which we specified a strict molecular clock model. Dudas, G., Carvalho, L. M., Rambaut, A. Since experts have suggested that pangolins may be the reservoir species for COVID-19, the scaly anteater has been catapulted into headlines, news reports, and conversationsand some are calling COVID-19 "the revenge of the . As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. Further information on research design is available in the Nature Research Reporting Summary linked to this article. 3 Priors and posteriors for evolutionary rate of SARS-CoV-2. Patino-Galindo, J. 04:20. and P.L.) (Yes, Pango is a tongue-in-cheek reference to pangolins, which were briefly suspected to have had a role in the coronavirus's originseveral of the team's computational tools are named after. In Extended Data Fig. CNN . To obtain Uncertainty measures are shown in Extended Data Fig. Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST42 v.1.10.4. Using these breakpoints, the longest putative non-recombining segment (nt1,88521,753) is 9.9kb long, and we call this region NRR2. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. Google Scholar. Evol. Zhang, Y.-Z. Microbiol. 2). In case of DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. 56, 152179 (1992). Biol. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk9,10. 31922087). 92, 433440 (2020). Now, the two researchers used genomic sequencing to compare the DNA of the new coronavirus in humans with that in animals and found a 99% match with pangolins. Extensive diversity of coronaviruses in bats from China. performed Srecombination analysis. Boxes show 95% HPD credible intervals. All authors contributed to analyses and interpretations. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Means and 95% HPD intervals are 0.080 [0.0580.101] and 0.530 [0.3040.780] for the patristic distances between SARS-CoV-2 and RaTG13 (green) and 0.143 [0.1090.180] and 0.154 [0.0930.231] for the patristic distances between SARS-CoV-2 and Pangolin 2019 (orange). Biol. Internet Explorer). 53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. Researchers in the UK had just set the scientific world . For the current pandemic, the novel pathogen identification component of outbreak response delivered on its promise, with viral identification and rapid genomic analysis providing a genome sequence and confirmation, within weeks, that the December 2019 outbreak first detected in Wuhan, China was caused by a coronavirus3. A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? In early January, the aetiological agent of the pneumonia cases was found to be a coronavirus3, subsequently named SARS-CoV-2 by an International Committee on Taxonomy of Viruses (ICTV) Study Group4 and also named hCoV-19 by Wu et al.5. 1c). Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Emerg. We extracted a similar number (n=35) of genomes from a MERS-CoV dataset analysed by Dudas et al.59 using the phylogenetic diversity analyser tool60 (v.0.5). Microbes Infect. It is RaTG13 that is more divergent in the variable-loop region (Extended Data Fig. collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. PLoS ONE 5, e10434 (2010). J. Virol. Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Wong, A. C. P., Li, X., Lau, S. K. P. & Woo, P. C. Y. Instead, similarity in codon usage metrics between the SARS-CoV-2 and eukaryotes analyzed was correlated with coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). A distinct name is needed for the new coronavirus. Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. N. Engl. J. Virol. 3). The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans47. We thank T. Bedford for providing M.F.B. Because these subclades had different phylogenetic relationships in regionD (Supplementary Fig. Ji, W., Wang, W., Zhao, X., Zai, J. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. Rev. These datasets were subjected to the same recombination masking approach as NRA3 and were characterized by a strong temporal signal (Fig. While there is evidence of positive selection in the sarbecovirus lineage leading to RaTG13/SARS-CoV-2 (ref. 4), but also by markedly different evolutionary rates. Biol. Scientists defined the pangolin lineage of this variant to be B.1.1.523 and it was originally recognized as a variant under monitoring on July 14, 2021. Duchene, S. et al. The existing diversity and dynamic process of recombination amongst lineages in the bat reservoir demonstrate how difficult it will be to identify viruses with potential to cause major human outbreaks before they emerge. Google Scholar. MC_UU_1201412). 21, 15081514 (2015). Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. Avian influenza a virus (H7N7) epidemic in The Netherlands in 2003: course of the epidemic and effectiveness of control measures. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5kb: an ORF1a subregion spanning nucleotides (nt) 3,6259,150 and the first half of ORF1b spanning nt13,29119,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins). The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Phylogenetic trees and exact breakpoints for all ten BFRs are shown in Supplementary Figs. The red and blue boxplots represent the divergence time estimates for SARS-CoV-2 (red) and the 2002-2003 SARS-CoV (blue) from their most closely related bat virus, with the light- and dark-colored versions based on the HCoV-OC43 and MERS-CoV centered priors, respectively. Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome. The virus then. 725422-ReservoirDOCS). Google Scholar. Lam, H. M., Ratmann, O. We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new . Menachery, V. D. et al. Evol. PubMedGoogle Scholar. To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. Anderson, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. J. Med. You are using a browser version with limited support for CSS. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. Virus Evol. The web application was developed by the Centre for Genomic Pathogen Surveillance. stand-alone pangolin work flows or Illumina DRAGEN COVID Lineage App (v3.5.5) following the default parameters. Mol. All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. 2 Lack of root-to-tip temporal signal in SARS-CoV-2. Trends Microbiol. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic14,15,16. Chernomor, O. et al. We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Press, 2009). Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). In our second stage, we wanted to construct non-recombinant regions where our approach to breakpoint identification was as conservative as possible. However, the coronavirus isolated from pangolin is similar at 99% in a specific region of the S protein, which corresponds to the 74 amino acids involved in the ACE (Angiotensin Converting Enzyme . Software package for assigning SARS-CoV-2 genome sequences to global lineages. Lond. With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges57. Mol. Specifically, progenitors of the RaTG13/SARS-CoV-2 lineage appear to have recombined with the Hong Kong clade (with inferred breakpoints at 11.9 and 20.8kb) to form the CoVZXC21/CoVZC45-lineage. Bioinformatics 28, 32483256 (2012). Wang, H., Pipes, L. & Nielsen, R. Synonymous mutations and the molecular evolution of SARS-Cov-2 origins. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. 3). Current sampling of pangolins does not implicate them as an intermediate host. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. Med. Extended Data Fig. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. Because 3SEQ identified ten BFRs >500nt, we used GARDs (v.2.5.0) inference on 10, 11 and 12 breakpoints. Nature 503, 535538 (2013). Hon, C. et al. Nature 579, 270273 (2020). & Andersen, K. G. Pandemics: spend on surveillance, not prediction. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. Aiewsakun, P. & Katzourakis, A. Time-dependent rate phenomenon in viruses. A single 3SEQ run on the genome alignment resulted in 67 out of 68sequences supporting some recombination in the past, with multiple candidate breakpoint ranges listed for each putative recombinant. MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. The unsampled diversity descended from the SARS-CoV-2/RaTG13 common ancestor forms a clade of bat sarbecoviruses with generalist propertieswith respect to their ability to infect a range of mammalian cellsthat facilitated its jump to humans and may do so again. Biol. A., Filip, I., AlQuraishi, M. & Rabadan, R. Recombination and lineage-specific mutations led to the emergence of SARS-CoV-2. 84, 31343146 (2010). TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. SARS-CoV-2 and RaTG13 are also exceptions because they were sampled from Hubei and Yunnan, respectively. Complete genome sequence data were downloaded from GenBank and ViPR; accession numbers of all 68sequences are available in Supplementary Table 4. Sarbecovirus, HCoV-OC43 and SARS-CoV data were assembled from GenBank to be as complete as possible, with sampling year as an inclusion criterion. We focused on these three non-recombining regions/alignments for divergence time estimation; this avoids inappropriate modelling of evolutionary processes with recombination on strictly bifurcating trees, which can result in different artefacts such as homoplasies that inflate branch lengths and lead to apparently longer evolutionary divergence times. Even before the COVID-19 pandemic, pangolins have been making headlines. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. From this perspective, it may be useful to perform surveillance for more closely related viruses to SARS-CoV-2 along the gradient from Yunnan to Hubei. PubMed Evol. A., Lytras, S., Singer, J. Phylogenies of subregions of NRR1 depict an appreciable degree of spatial structuring of the bat sarbecovirus population across different regions (Fig. Subsequently a bat sarbecovirusRaTG13, sampled from a Rhinolophus affinis horseshoe bat in 2013 in Yunnan Provincewas reported that clusters with SARS-CoV-2 in almost all genomic regions with approximately 96% genome sequence identity2. Virus Evol. Holmes, E. C., Dudas, G., Rambaut, A. Bioinformatics 22, 26882690 (2006). Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. Removal of five sequences that appear to be recombinants and two small subregions of BFRA was necessary to ensure that there were no phylogenetic incongruence signals among or within the three BFRs. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. Correspondence to Pangolin was developed to implement the dynamic nomenclature of SARS-CoV-2 lineages, known as the Pango nomenclature. It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. Trova, S. et al. 5). Nucleotide positions for phylogenetic inference are 147695, 9621,686 (first tree), 3,6259,150 (second tree, also BFR B), 9,26111,795 (third tree, also BFR C), 12,44319,638 (fourth tree) and 23,63124,633, 24,79525,847, 27,70228,843 and 29,57430,650 (fifth tree). Despite the high frequency of recombination among bat viruses, the block-like nature of the recombination patterns across the genome permits retrieval of a clean subalignment for phylogenetic analysis. The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus. RegionsAC had similar phylogenetic relationships among the southern China bat viruses (Yunnan, Guangxi and Guizhou provinces), the Hong Kong viruses, northern Chinese viruses (Jilin, Shanxi, Hebei and Henan provinces, including Shaanxi), pangolin viruses and the SARS-CoV-2 lineage. Ge, X. et al. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20,1.68), 0.35 (0.30,0.41) and 0.133 (0.129,0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. Unfortunately, a response that would achieve containment was not possible. Nature 583, 282285 (2020). Biol. performed recombination analysis for non-recombining regions1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity27,37. At present, we analyzed the diversity of SARS-CoV-2 viral genomes in India to know the evolutionary patterns of viruses in the country through their pangolin lineage and GISAID-Clade. Its genome is closest to that of severe acute respiratory syndrome-related coronaviruses from horseshoe bats, and its receptor-binding domain is closest to that of pangolin viruses. Next, we (1) collected all breakpoints into a single set, (2) complemented this set to generate a set of non-breakpoints, (3) grouped non-breakpoints into contiguous BFRs and (4) sorted these regions by length. Smuggled pangolins were carrying viruses closely related to the one sweeping the world, say scientists. the development of viral diversity. Evol. Get the most important science stories of the day, free in your inbox. Intragenomic rearrangements involving 5-untranslated region segments in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses, Crystal structure of the CoV-Y domain of SARS-CoV-2 nonstructural protein 3, Association of underlying comorbidities and progression of COVID-19 infection amongst 2586 patients hospitalised in the National Capital Region of India: a retrospective cohort study, Molecular characterization of horse nettle virus A, a new member of subgroup B of the genus Nepovirus, Molecular phylogeny of coronaviruses and host receptors among domestic and close-contact animals reveals subgenome-level conservation, crossover, and divergence. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages.