No evidence that coronavirus genetic sequences were fabricated, contrary to preprint by Li-Meng Yan and colleagues
There is no evidence supporting the claim by Yan et al. that genetic sequences of several coronaviruses were fabricated to support the hypothesis that SARS-CoV-2 arose naturally. Highly similar or identical gene and protein sequences are common among organisms that are evolutionarily related to each other. Therefore, it is expected that members of the coronavirus family share similar or identical genetic or protein features. Scientific evidence supports the hypothesis that the virus arose naturally in wildlife before it crossed over to humans.
Inadequate support: The preprint by Yan et al. offers no evidence to support their claim that the genetic sequences of other coronavirus strains were fabricated to support the hypothesis that SARS-CoV-2 arose naturally.
Incorrect: The fact that multiple coronavirus strains share highly similar or identical genetic or protein sequences is not evidence that those viruses were fabricated. Shared genetic or protein sequences are common among viruses that belong to the same family and indicate their evolutionary relatedness.
Uncertainty surrounding the origin of the novel coronavirus has provided fertile ground for breeding conspiracy theories, some of which Health Feedback previously found to be inaccurate and unsubstantiated (see here and here). The recent claim by virologist Li-Meng Yan that the SARS-CoV-2 virus is manmade is the latest in a long series of conspiracy theories stretching back to the beginning of the coronavirus pandemic.
On 14 September 2020, Yan and her colleagues published a preprint on the online repository Zenodo claiming that the SARS-CoV-2 virus is a product of genetic engineering. A preprint is a research paper that has not yet been peer-reviewed by other scientists. Experts who examined the preprint found that it was highly flawed and provided no supporting evidence for its claims, as detailed in this Health Feedback review.
Yan et al. published a second preprint on 8 October 2020 claiming that the virus is an “unrestricted bioweapon” and alleging that the genetic sequences of ten other coronaviruses are fabricated and do not exist in nature. Contrary to this claim, these ten coronaviruses, including RaTG13 (the closest known relative of SARS-CoV-2, with about 96% genome sequence identity to it) and some pangolin coronaviruses, were analyzed by other scientists and found to support the natural origin hypothesis for SARS-CoV-2[2-7]. The second preprint from Yan et al. has received more than 130,000 views on Zenodo since it was published, and was promoted by outlets known for publishing misinformation, such as Zero Hedge and National Pulse.
The alleged motivation for fabricating genetic sequences is related to one of the primary claims by Yan et al., specifically that the bat coronaviruses ZC45 and ZXC21 provided the genetic backbone for SARS-CoV-2. In support of this claim, Yan et al. point to the 100% identity in the envelope (E) protein sequence shared by these three viruses. The E protein is a small protein on the surface of the membrane that encloses the viral genome and is important for producing virus particles that can efficiently infect cells.
Firstly, the claim that the bat coronaviruses ZC45 and ZXC21 provided the genetic backbone to artificially create SARS-CoV-2 was presented in the first preprint by Yan et al. This claim was debunked by scientists, who pointed out that the genetic sequences of ZC45 and ZXC21 are very different from that of SARS-CoV-2. In fact, the virus ZC45 is “only 89% related to SARS-CoV-2,” said Stanley Perlman, a professor at the University of Iowa who studies coronaviruses, in this FactCheck.org article:
“Perlman said it would be nearly impossible to make the reverse genetics system needed to manipulate the virus and ‘changing its sequence to arrive at SARS-CoV-2 would be virtually impossible since it would not be known how to manipulate the virus.’”
Kristian Andersen, a professor at Scripps Research who studies the evolution of viruses including SARS-CoV-2, also pointed out the incongruency of the claim on Twitter: “This simply can’t be true – there are more than 3,500 nucleotide differences between SARS-CoV-2 and these viruses.”
Marvin Reitz, a virologist at the University of Maryland, put it more bluntly in his review of the first preprint: “[I]t still would require more than 3,000 nucleotide substitutions [for ZC45] to become SARS-CoV-2. This is not even slightly credible; it beggars reason.”
A response by scientists at the Johns Hopkins University Center for Health Security also provides a detailed rebuttal of the claims made by Yan et al. in their first preprint. It likewise highlights the implausibility of using ZC45 and ZXC21 as the genetic backbone for SARS-CoV-2.
In short, ZC45 and ZXC21 are very different from SARS-CoV-2 in terms of genome identity. Altering a backbone from either of the two to transform it into the genome of SARS-CoV-2 would require a feat of genetic engineering that is extremely difficult, if not impossible, to accomplish with current technology.
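The scale of the difference can be checked with rough arithmetic. The sketch below is only a back-of-the-envelope estimate, not the method used by any of the scientists quoted above. It assumes a genome length of about 29,900 nucleotides (an approximate figure for SARS-CoV-2 that does not appear in this review) and uses the identity percentages cited in this review. Exact counts depend on how the genomes are aligned and how insertions and deletions are counted, so published figures can differ.

```python
# Rough estimate of how many nucleotide positions differ between two genomes,
# given an overall percent identity.
# ASSUMPTION: a genome length of ~29,900 nucleotides (approximate, not taken
# from this review). The 89% and 96% figures are the ones cited in the review.

def estimated_differences(genome_length: int, percent_identity: float) -> int:
    """Return the approximate number of differing positions."""
    return round(genome_length * (100.0 - percent_identity) / 100.0)


GENOME_LENGTH = 29_900  # assumed, approximate

zc45_differences = estimated_differences(GENOME_LENGTH, 89.0)
ratg13_differences = estimated_differences(GENOME_LENGTH, 96.0)

print("ZC45 vs SARS-CoV-2 (approx.):", zc45_differences)
print("RaTG13 vs SARS-CoV-2 (approx.):", ratg13_differences)
```

Under these assumptions, an 89% identity corresponds to roughly 3,300 differing positions, which is in line with the "more than 3,000 nucleotide substitutions" described by the experts.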
Based on their spurious initial assumption that ZC45 and ZXC21 provided the genetic backbone for SARS-CoV-2, Yan et al. claim that the genetic sequences of RaTG13 and the other coronaviruses were fabricated to obscure the link between SARS-CoV-2 and ZC45/ZXC21, and that RaTG13 and the other coronaviruses do not exist. To support this claim, they point to the observation that all these viruses also have an E protein sequence that is 100% identical to that of ZC45 and ZXC21.
The argument by Yan et al. that the genetic sequences of some coronaviruses were fabricated to support the hypothesis that SARS-CoV-2 arose naturally does not hold up to scrutiny. In a Business Insider interview, Emma Hodcroft, a postdoctoral fellow at the University of Basel and co-developer of the Nextstrain project that studies the evolution of pathogens, including SARS-CoV-2, pointed out that “most of the samples that Yan’s group says are fake predate the start of the pandemic.” Hodcroft also explained:
“‘This accusation implies there were years of coordination and fake sequence generation,’ Hodcroft said, adding: ‘This is an incredible claim, and would require a significant evidence burden to back it up, which is missing from the paper.’”
Virologists have also analyzed the genome sequence of RaTG13 and found it to be authentic and supported by good-quality data.
Although some coronaviruses share certain identical genetic sequences with SARS-CoV-2, this is not evidence that the other coronaviruses were fabricated. Instead, similar or identical genetic and protein sequences of coronaviruses are evidence of their evolutionary relatedness, which is expected since these viruses all belong to the coronavirus family. Specifically, the E protein sequences of SARS-CoV-2, RaTG13, and the other coronaviruses analyzed in the preprint by Yan et al. are indeed identical to those of ZC45 and ZXC21, but this in itself does not indicate that RaTG13 and the other coronaviruses were fabricated to mimic the E protein sequence of ZC45 and ZXC21.
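To illustrate why identity in one short sequence says little about the rest of a genome, the minimal sketch below computes percent identity between pre-aligned sequences of equal length. The sequences are made-up toy strings, not real viral data, and the function name `percent_identity` is introduced here purely for illustration. Real comparisons would rely on a proper alignment tool.

```python
# Percent identity between two pre-aligned sequences of equal length.
# ASSUMPTION: all sequences below are invented toy examples, not real
# coronavirus sequences.

def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# A short, conserved region that is identical in both toy "viruses".
short_region_a = "MKTLLVAGSA"
short_region_b = "MKTLLVAGSA"

# A longer toy region in which the same two "viruses" differ at 3 positions.
long_region_a = "ATGCATGCATGCATGCATGC"
long_region_b = "ATGCATGAATGCTTGCATGG"

short_identity = percent_identity(short_region_a, short_region_b)
long_identity = percent_identity(long_region_a, long_region_b)

print("short region identity:", short_identity)
print("long region identity:", long_identity)
```

In this toy case the short region is fully identical while the longer region is not, mirroring how related viruses can share an identical small protein and still differ substantially across the whole genome.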
Lastly, both preprints by Yan and her co-authors exhibit certain features of concern, one of which is the listing of their affiliations as the Rule of Law Society and the Rule of Law Foundation. These two organizations have no prior experience in conducting biological research and are linked to Stephen Bannon and Wengui Guo, both of whom have published COVID-19 misinformation in the past. In addition, Yan’s co-authors used pseudonyms in the preprints, according to this CNN report:
“Yan’s three co-authors in both papers — Shu Kang, Jie Guan and Shanchang Hu — are pseudonyms, a source told CNN. It’s a practice that is highly unusual in such research and generally discouraged due to the resulting lack of accountability and transparency, experts told CNN. The source didn’t know why the use of pseudonyms wasn’t disclosed in the papers.”
Overall, the claims in the second preprint by Yan and her colleagues are as ill-founded as the claims made in their first preprint. Evidence supporting claims that the virus was engineered is lacking. In contrast, scientific analyses support the hypothesis that SARS-CoV-2 arose naturally in wildlife before crossing over to humans during a zoonotic infection (transmission of pathogens from animals/insects to humans). There are numerous examples of emerging zoonotic pathogens causing disease outbreaks throughout human history and across the world.
Health Feedback published an Insight article analyzing the evidence for different hypotheses regarding the origins of the coronavirus.
UPDATE (28 Oct. 2020):
This review was updated to include additional information regarding the use of pseudonyms by Yan’s co-authors, as reported in this 21 October CNN article.
- 1 – Zhou et al. (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature.
- 2 – Andersen et al. (2020) The proximal origin of SARS-CoV-2. Nature Medicine.
- 3 – Zhou et al. (2020) A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein. Current Biology.
- 4 – Boni et al. (2020) Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. Nature Microbiology.
- 5 – Lam et al. (2020) Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. Nature.
- 6 – Liu et al. (2020) Are pangolins the intermediate host of the 2019 novel coronavirus (SARS-CoV-2)? PLoS Pathogens.
- 7 – Xiao et al. (2020) Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. Nature.
- 8 – Schoeman and Fielding. (2019) Coronavirus envelope protein: current knowledge. Virology Journal.
- 9 – Morens and Fauci. (2020) Emerging Pandemic Diseases: How We Got to COVID-19. Cell.