No, “HIV insertions” were not identified in the 2019 coronavirus, contrary to claims based on a questionable bioinformatics study
This claim is based on a study which compared extremely short protein sequences between the 2019 novel coronavirus and HIV, a practice likely to give false positives. The study’s authors also neglected to check for potential similarities between 2019-nCoV and other organisms. As it turned out, these short protein sequences were similar to sequences in many other organisms, not just HIV, thereby refuting the claim that 2019-nCoV contains genetic material from HIV.
Inaccurate: The so-called “unique” protein sequence insertions found in the 2019 novel coronavirus can be found in many other organisms, not just HIV.
Misrepresents a complex reality: The similarity between 2019-nCoV and HIV was detected using extremely short protein sequences, a practice that often gives rise to false positive results.
The claim that the 2019 novel coronavirus (2019-nCoV) contains artificially introduced insertions from the human immunodeficiency virus (HIV), thereby proving it is man-made, went viral on Facebook and other social media platforms like Twitter in early February 2020. One of the most widely shared articles making this claim appeared on websites that have published misinformation in the past, such as ZeroHedge and Infowars, and was written by an anonymous person under the fictitious name of Tyler Durden.
The basis for this claim is a preprint of a research study uploaded to bioRxiv on 2 February 2020. The paper, titled “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag”, had not been peer-reviewed and was soon withdrawn by the authors after the study received extensive criticism from other scientists.
Nevertheless, the paper’s findings have been used by many on social media as evidence to support the baseless assertion that 2019-nCoV is a man-made virus. The authors had reported finding “4 insertions in the spike glycoprotein (S) which are unique to 2019-nCoV and are not present in other coronaviruses”. The authors further asserted that “all of [these inserts] have identity/similarity to amino acids residues in key structural proteins of HIV-1 [which] is unlikely to be fortuitous in nature”.
However, the authors failed to check the similarity of the four insertions to the protein sequences of other organisms besides HIV, explained Aaron Irving, a virologist and senior research fellow at Duke-NUS Medical School. “As it turned out, the region is also homologous to many unrelated sequences”, Irving said. “As such, the conclusions drawn from the data are no longer valid and there are many open-ended questions regarding this region highlighted.” [See Irving’s full comment below.]
Criticism of the study’s findings by experts has been published in several other outlets. In a Forbes article, Arinjay Banerjee, a postdoctoral fellow at McMaster University who has studied coronaviruses, said that:
“The authors compared very short regions of proteins in the novel coronavirus and concluded that the small segments of proteins were similar to segments in HIV proteins. Comparing very short segments can often generate false positives and it is difficult to make these conclusions using small protein segments.”
Researchers also took to Twitter to demonstrate this problem first-hand. Trevor Bedford, a faculty member at the Fred Hutchinson Cancer Research Center who studies viral evolution, re-analyzed the gene and protein sequences used by the authors and found that the so-called “unique” inserts appeared in many other organisms, including Cryptosporidium and Plasmodium malariae, which cause cryptosporidiosis and malaria, respectively.
In summary, there is no reason to believe that these inserts are unique to either HIV or 2019-nCoV, or that their presence in 2019-nCoV indicates they were inserted from HIV. This is because the sequences analyzed by the study authors were so short that it is easy to find similarities to a wide variety of organisms. An analogy would be to search for a short word, like “train”, in a search engine and claim that the contents in the search results must be identical or similar to each other based on that one word.
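The arithmetic behind this point can be sketched in a few lines of Python. This is a rough illustration rather than the analysis performed by any of the researchers quoted here: it assumes all 20 standard amino acids occur with equal frequency, counts only exact matches, and uses a hypothetical database size and hypothetical query lengths chosen purely for illustration.

```python
# Back-of-the-envelope estimate of how often a short protein sequence
# matches a large database purely by chance.
# Assumption: all 20 standard amino acids are equally likely at every
# position (real proteins are not uniform, so this is only a rough guide).

ALPHABET_SIZE = 20


def expected_chance_matches(query_length: int, database_residues: int) -> float:
    """Expected number of exact chance matches to one fixed query.

    query_length: number of amino acids in the query sequence.
    database_residues: total number of residues searched, treated here
        as a single long random sequence.
    """
    # Number of positions at which the query could start.
    positions = max(database_residues - query_length + 1, 0)
    # Probability that a random window equals the query exactly.
    match_probability = 1 / ALPHABET_SIZE ** query_length
    return positions * match_probability


if __name__ == "__main__":
    database_residues = 10**9  # hypothetical database size
    for query_length in (6, 8, 12, 30):  # hypothetical query lengths
        expected = expected_chance_matches(query_length, database_residues)
        print(f"length {query_length:2d}: about {expected:.3g} chance matches expected")
```

Under these assumptions, a six-residue query is expected to match a database of this size several times by chance alone, while the expected count falls off steeply as the query gets longer. Similarity searches that tolerate mismatches are more permissive than the exact matching modeled here, so they would be expected to return even more chance hits for short queries.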
Aaron Irving, Senior Research Fellow, Duke-NUS Medical School
It’s easier to believe misinformation when it is mixed with truth. The region highlighted in the pre-print is indeed an insertion in nCoV-2019 relative to its bat ancestors and indeed it has high identity to the HIV gp120/gag. However, the authors chose to align only this small region and not do a basic check on whether there were other sequences which were also homologous (showing high degree of similarity/identity). As it turned out, the region is also homologous to many unrelated sequences. As such, the conclusions drawn from the data are no longer valid and there are many open-ended questions regarding this region highlighted. I see the authors themselves agree with this criticism by other scientists and have voluntarily withdrawn their preprint pending a much deeper investigation.
Several competing hypotheses have been proposed to explain where the novel coronavirus actually came from. Health Feedback investigated the three most widespread origin stories for the novel coronavirus (engineered, lab-leak or natural infection), and examined the evidence for and against each proposed hypothesis in an Insight article.