Short gene sequence shared by SARS-CoV-2 and a sequence patented by Moderna is also found in other organisms; not evidence that the virus was engineered
The claim is based on the observation that the gene for the SARS-CoV-2 spike protein and a genetic sequence modified and patented by the pharmaceutical company Moderna share a 19-nucleotide-long segment. However, this sequence is neither unique nor specific to manmade sequences. It can be found in other living things, showing that the sequence occurs naturally. The presence of a short, identical gene sequence isn’t evidence that the virus was engineered.
Misleading: The presence of a short, identical sequence in a patented gene sequence and the SARS-CoV-2 virus on its own isn’t evidence that the virus originated from a laboratory. In fact, this sequence can be found in other living things, showing that it occurs naturally.
Claims that the virus responsible for COVID-19, SARS-CoV-2, was engineered in a lab went viral again in February 2022, after a Daily Mail article reported the findings of a study published in the journal Frontiers in Virology on 22 February 2022.
The Daily Mail article received more than 40,000 user engagements on Facebook. Others, such as the Facebook page for the political activist group Young Americans for Liberty and the website Not the Bee, promoted the Daily Mail article, serving as a springboard for such claims by social media users.
The study reported that a small snippet of the gene for the SARS-CoV-2 spike protein was a 100% match with a segment of a modified human gene sequence patented about three years ago by Moderna. The researchers wrote that “The absence of [the sequence showing a 100% match] from any eukaryotic or viral genome in the BLAST database makes recombination in an intermediate host an unlikely explanation for its presence in SARS-CoV-2”. They postulated that the sequence was acquired by a SARS-like coronavirus when it infected human cell cultures grown in a laboratory, thus producing SARS-CoV-2. This hypothesis paved the way for renewed claims that the virus was engineered in a lab or was leaked from a lab.
In its headline, the Daily Mail, in the fashion of “just asking questions”, implied that the match is evidence that the virus SARS-CoV-2 is manmade, writing of the study: “More evidence Covid was tinkered with in a lab?” As we will explain below, this is misleading and fails to account for the evidence we already have that runs counter to the claim that SARS-CoV-2 was engineered in a lab.
What the study reported
Moderna’s patent US 9587003, which was filed in 2017, includes a modified gene sequence given the number 11652. The complete sequence is available on GenBank, a database of nucleic acid sequences maintained by the U.S. National Center for Biotechnology Information (NCBI). According to the study, the modified gene is based on a human gene named MSH3, which is involved in DNA repair. This is unsurprising, as the title of Moderna’s patent signaled that the company intended to use the modified gene sequences in cancer research.
The Moderna patent in question.
The sequence identified in the study, CTCCTCGGCGGGCACGTAG, is 19 nucleotides long. Nucleotides are the building blocks of nucleic acids like DNA and RNA. The authors stated that they didn’t find this sequence in any eukaryotic or viral genomes, except SARS-CoV-2. The word “eukaryote” refers to organisms, such as animals and plants, whose cells contain a nucleus and other membrane-bound organelles.
What made it appear more intriguing to the authors was the fact that the sequence occurred in the part of the spike protein gene that encodes the furin cleavage site (FCS), a short sequence of amino acids that can increase the virus’s ability to infect human cells. The presence of an FCS is not unique to SARS-CoV-2; other coronaviruses found in the wild, including MERS-CoV, also possess an FCS. The particular FCS in SARS-CoV-2 comprises the following amino acids, written in one-letter abbreviations: PRRARSV (see Figure 1).
Figure 1. Protein sequence alignments between the spike protein of SARS-CoV-2 and the spike protein of different coronaviruses isolated in the wild.
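To make the link between the 19 nucleotides and the FCS concrete, the following minimal Python sketch translates the sequence reported in the study using the standard genetic code. The helper name `translate` and the reading-frame offset of 2 are our own assumptions for illustration; the offset was chosen because it yields the first five amino acids of the FCS (PRRAR).

```python
# Standard genetic code, built compactly (codons ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES), AMINO_ACIDS
    )
}

def translate(dna: str, offset: int = 0) -> str:
    """Translate DNA from `offset`, ignoring any incomplete trailing codon."""
    dna = dna[offset:]
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# The 19-nucleotide sequence reported in the study.
SEGMENT = "CTCCTCGGCGGGCACGTAG"

# Assumed reading frame: skip the first two nucleotides.
fcs_fragment = translate(SEGMENT, offset=2)
print(fcs_fragment)  # PRRAR
```

Only five complete codons fit in this frame, so the snippet recovers PRRAR; the remaining amino acids of the FCS are encoded by nucleotides outside the 19-nucleotide segment.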
The identical sequence detected in SARS-CoV-2 and the modified MSH3 gene occurs in nature and can be found in other animals
Overall, the claim seems to be founded on the belief that because the sequence in the spike protein of SARS-CoV-2 was identical to a manmade gene sequence, the sequence couldn’t have occurred by chance, and must therefore have been designed.
However, as scientists showed using the same search tools as the authors, this 19-nucleotide-long sequence occurs naturally in other living things. For example, the sequence is present in eukaryotes, such as a species of bird, contrary to the authors’ statement that it cannot be found in eukaryotes. This raises the question of whether the authors simply failed to check for matches in other organisms.
It's cleary just a coincidence. Anybody willing to join me in writing a paper about the "highly unusual" 19nt perfect match between the SARS-CoV-2 FCS insert and the chimney swift Chaetura pelagica APOPT1 mRNA? I'm sure this would make a great case for some mumbo jumbo conspiracy pic.twitter.com/SMJ5XvDLgM
— Marco Gerdol (@MGerdol) February 22, 2022
It is also present in a bacterium, although bacteria aren’t eukaryotes:
And here is an identical blast match to a sequence in a mycobacterium. These are not 'unique patented sequences' that are not found in nature. pic.twitter.com/rkUtL3OLAo
— Aris Katzourakis (@ArisKatzourakis) February 24, 2022
In short, the 19-nucleotide sequence isn’t unique to the modified MSH3 gene patented by Moderna and isn’t uniquely manmade, as it can occur in nature.
Using the NCBI’s BLAST tool, we can also find genes from other organisms, such as the chinchilla, that are highly similar to the modified MSH3 gene patented by Moderna (see Figure 2).
Figure 2. BLAST results for sequences producing significant similarity to sequence 11652 patented by Moderna (GenBank accession number KH664781.1). Note the column Percent Identity. Among the top hits is the human MSH3 gene.
In summary, the assumption that the most likely explanation for the FCS in SARS-CoV-2 is that it was derived from the modified MSH3 gene isn’t supported by the evidence, given that this particular sequence already occurs in many different living things naturally.
In addition, the 19-nucleotide sequence doesn’t produce amino acids corresponding to an FCS in the context of the modified MSH3 DNA, as pointed out by bioinformatician Moreno Colaiacovo.
Moreover, if we translate the sequence using a tool like ExPASy translate, we notice that the short sequence does not encode for PRRAR in this protein, but rather YVPAE (which is not a furin cleavage site). The translation is from 5' to 3' in frame 1. 5/ pic.twitter.com/ARqQmIY0Yj
— Moreno Colaiacovo 🇺🇦 (@emmecola) December 8, 2021
This is because the 19-nucleotide sequence is present on the reverse complement strand of the DNA. Double-stranded DNA comprises two strands, which are complementary to each other and run in opposite directions (see Figure 3 below). The two ends of each strand are labeled 5’ and 3’. The production of protein from mRNA, which carries the instructions encoded in DNA, is based on the sequence of the sense strand read from 5’ to 3’, not on its reverse complement, the antisense strand. This is why the 19-nucleotide sequence, which is on the reverse complement strand, isn’t what gets translated into protein and doesn’t lead to a furin cleavage site.
Figure 3. Illustration of transcription (production of mRNA from DNA). By the U.S. National Human Genome Research Institute. Note that the sequence of the RNA transcript is identical to that of the sense strand, except for the replacement of thymine (T) by uracil (U), as is typical of RNA.
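The effect of reading the opposite strand can be sketched in a few lines of Python. The helper names `reverse_complement` and `translate` are hypothetical, and the reading-frame offset of 1 is an assumption chosen to reproduce the frame reported in the tweet above (YVPAE); it was not re-derived here from the full patented sequence.

```python
# Standard genetic code, built compactly (codons ordered T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def translate(dna: str, offset: int = 0) -> str:
    """Translate DNA from `offset`, ignoring any incomplete trailing codon."""
    dna = dna[offset:]
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# The 19-nucleotide sequence as it reads in the SARS-CoV-2 genome.
SEGMENT = "CTCCTCGGCGGGCACGTAG"

# What the sense strand of the patented gene carries at this position.
sense_strand = reverse_complement(SEGMENT)
print(sense_strand)  # CTACGTGCCCGCCGAGGAG

# Assumed reading frame (offset 1), matching the frame cited in the tweet.
peptide = translate(sense_strand, offset=1)
print(peptide)  # YVPAEE
```

Under these assumptions the translated stretch begins with YVPAE rather than PRRAR, which is consistent with the point that the shared nucleotides don't encode a furin cleavage site in the patented sequence.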
This isn’t the first time that a very brief nucleic acid sequence in SARS-CoV-2, which happened to be identical to already-known genetic sequences, was used to support claims that the virus was engineered in the laboratory. In 2020, a similar false claim that SARS-CoV-2 contained sequences from the human immunodeficiency virus was also made on the same spurious basis. Scientists cautioned that such short, identical sequences are commonly shared by many living things, and on their own don’t provide evidence of lab engineering.
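A rough calculation illustrates why short matches like this are unremarkable. The sketch below assumes a deliberately crude model in which every nucleotide is independent and the four bases are equally likely, which real genomes don't follow, and it uses a hypothetical database size of one trillion nucleotides purely for illustration, not a measured figure for any real database.

```python
# Crude model: each nucleotide is independent, A/C/G/T equally likely.
SEGMENT_LENGTH = 19

# Chance that one given position matches one specific 19-nucleotide sequence.
p_match = 0.25 ** SEGMENT_LENGTH

# HYPOTHETICAL database size, for illustration only.
database_nucleotides = 1e12

# A match can occur on either strand, so each position counts twice.
expected_hits = 2 * database_nucleotides * p_match

print(f"Chance of a match at one position: {p_match:.2e}")
print(f"Expected chance matches in the database: {expected_hits:.1f}")
```

Under these assumptions, several exact matches would be expected from chance alone, so finding the same 19 nucleotides in unrelated organisms is not surprising.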
The evidence available so far hasn’t shown signs that the virus was engineered. For starters, there isn’t a known coronavirus that is genetically similar enough to SARS-CoV-2 to be a plausible candidate for genetic modification. The closest known relative to SARS-CoV-2, a bat coronavirus named RaTG13, is 96% identical to SARS-CoV-2. But in evolutionary terms, even this level of similarity would still require RaTG13 to undergo decades of evolution in order to produce SARS-CoV-2. In addition, there are numerous technical obstacles that an attempt at genetic modification would have to overcome. Experts consider a natural origin for the virus to be the most likely. At the moment, there isn’t sufficient evidence to rule out the possibility that SARS-CoV-2 is a naturally occurring virus that leaked from a lab, and scientists are still working to determine the origin of SARS-CoV-2.
- 1 – Ambati et al. (2022) MSH3 Homology and Potential Recombination Link to SARS-CoV-2 Furin Cleavage Site. Frontiers in Virology.
- 2 – Zhou et al. (2020) A Novel Bat Coronavirus Closely Related to SARS-CoV-2 Contains Natural Insertions at the S1/S2 Cleavage Site of the Spike Protein. Current Biology.
- 3 – Holmes et al. (2021) The origins of SARS-CoV-2: A critical review. Cell.
- 4 – Zhou et al. (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature.