Analysis of SARS-CoV-2 sequencing data has produced numerous key findings about the virus, and additional sequence data from specimens collected around the world will be needed to develop effective strategies for managing and eliminating COVID-19. Scientists worldwide are working to generate and share this essential knowledge, which is being applied to diagnosis and disease control.
Characteristics of the Genomes of Betacoronaviruses and SARS-CoV-2
The subfamily Coronavirinae of the family Coronaviridae contains the enveloped, positive-sense, single-stranded RNA viruses known as betacoronaviruses. Their genomes, which range from 27 to 32 kb, are the largest among RNA viruses. Each genome encodes polyproteins that are cleaved by proteolysis into nonstructural proteins with a variety of functions, including the viral proteases (3CL, PL) and the RNA-dependent RNA polymerase (RdRP), which are essential for transcription and replication. A study published in Cell in early 2020 analyzed the mechanism of viral gene expression and the architecture of the SARS-CoV-2 transcriptome.
Before the discovery of SARS-CoV-2, the human betacoronaviruses comprised endemic human coronaviruses that cause respiratory tract illnesses (such as OC43 and HKU1) and epidemic human coronaviruses: the Middle East Respiratory Syndrome coronavirus (MERS-CoV) and the Severe Acute Respiratory Syndrome coronavirus (SARS-CoV), both of which are believed to have jumped from animals to humans. In January 2020, once an RNA virus was identified as the cause of the illness that would later be named COVID-19, researchers quickly sequenced its genome. The virus shared 79.0 percent sequence identity with SARS-CoV and 86.7 to 89 percent with SARS-like coronaviruses originating in bats, but only 50 percent with MERS-CoV. The International Committee on Taxonomy of Viruses (ICTV) assigned the name SARS-CoV-2 to the new virus. Although these similarities imply that bats are a viral reservoir, the ecological distance between bats and humans suggests that other mammalian species may have served as “intermediate” or “amplifying” hosts.
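The sequence identity figures quoted above come from comparing aligned genomes position by position. The following is a minimal sketch of that arithmetic, not the method used in the original analyses. The function name percent_identity and the two short sequences are made up for illustration, and skipping gapped columns is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns where both sequences agree.

    Columns with a gap ('-') in either sequence are skipped. This is an
    assumed convention; other tools count gaps as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: not counted either way
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real viral sequence).
query = "ATGGCTAGCTTAGGCA-TCG"
subject = "ATGGCTTGCTTAGGAACTCG"
identity = percent_identity(query, subject)
print(f"{identity:.1f} percent identity")
```

In practice the two genomes would first be aligned with a dedicated alignment tool, and the result depends on how gaps and ambiguous bases are treated.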
One of the most notable characteristics of coronaviruses is their inherent proofreading mechanism. The replication machinery of RNA viruses often has a high error rate, resulting in quasispecies: a population of viruses that share the same host but have acquired distinct genetic changes through replication errors. Coronaviruses, however, encode a protein with a proofreading function, nonstructural protein 14 (nsp14). This mechanism is hypothesized to be essential for coronaviruses because of their large and complex genomes; without it, the high mutation rates associated with RNA virus replication would be detrimental to viral viability. Although the mutation rate of coronaviruses (including SARS-CoV-2) is around 10 times lower than that of other RNA viruses, these viruses still acquire mutations as they pass from one host to another. Epidemiologists estimate a mutation rate of 33 genomic changes per year for SARS-CoV-2. Scientists use these mutations to assign each SARS-CoV-2 genome to a lineage or clade. One such system, published by British scientists in Nature Microbiology, designates a virus strain as belonging to one of two lineages (A or B), followed by numerical values assigned on the basis of phylogenetic evidence of emergence from an ancestral lineage into a distinct geographical population.
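The hierarchical naming described above, a root lineage A or B followed by numerical levels, can be illustrated with a short sketch. The helper names lineage_ancestors and is_descendant are hypothetical, and the labels such as "B.1.1" are used only as examples of the format. The published scheme has additional rules that this sketch ignores.

```python
from typing import List


def lineage_ancestors(lineage: str) -> List[str]:
    """Return the chain from the root lineage (A or B) down to `lineage`."""
    parts = lineage.strip().split(".")
    root, numbers = parts[0], parts[1:]
    if root not in ("A", "B"):
        raise ValueError(f"root lineage must be A or B, got {root!r}")
    if not all(n.isdigit() for n in numbers):
        raise ValueError(f"sub-lineage levels must be numeric: {lineage!r}")
    # Each prefix of the dotted name is one step further from the root.
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def is_descendant(child: str, ancestor: str) -> bool:
    """True if `child` equals `ancestor` or sits below it in the hierarchy."""
    return ancestor in lineage_ancestors(child)


print(lineage_ancestors("B.1.1"))     # ['B', 'B.1', 'B.1.1']
print(is_descendant("B.1.1", "B.1"))  # True
print(is_descendant("A.2", "B"))      # False
```

Comparing whole dotted prefixes, not raw string prefixes, keeps a label such as "B.10" from being mistaken for a descendant of "B.1".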
Global Collaborations for SARS-CoV-2 Sequence Data Collection and Analysis
After the publication of the first SARS-CoV-2 genome, scientists around the world quickly recognized the urgent need to collect as much genetic information on as many SARS-CoV-2 strains as possible. At the onset of the pandemic, several research groups worked to establish their own methods for obtaining SARS-CoV-2 sequence data from positive cultures or clinical specimens, and multiple strategies have been employed. The Advancing Real-Time Infection Control Network (ARTIC) developed a protocol for SARS-CoV-2 whole-genome sequencing (WGS) on the sequencing platforms of Oxford Nanopore Technologies, which helped to streamline the sequencing process. The method has since been extended to other sequencing platforms, allowing more researchers to examine the virus’s genome.
During a pandemic, pathogen sequencing data must be made accessible to the public through databases. To inform public health and research decisions, the World Health Organization (WHO) strongly promotes access to sequencing data during outbreaks. GISAID hosts one of the largest curated international sources of SARS-CoV-2 sequencing data. As of September 2020, more than one hundred thousand full SARS-CoV-2 genomic sequences, together with the contextual information (metadata) associated with each genome, had been uploaded and shared on the GISAID SARS-CoV-2 Genomic Epidemiology (EpiCoV) platform.
In the United States, NCBI continues to lead initiatives to make SARS-CoV-2 sequencing data accessible and shareable. Through the NCBI SARS-CoV-2 Resources page, researchers can rapidly submit complete or partial SARS-CoV-2 sequence data to the GenBank or Sequence Read Archive (SRA) databases. In addition, the Advanced Molecular Detection (AMD) program at the CDC launched the SARS-CoV-2 Sequencing for Public Health Emergency Response, Epidemiology, and Surveillance (SPHERES) effort. This nationwide collaboration provides a forum for public health organizations and other stakeholders to discuss developments in genomic epidemiology methods and processes, shared problems, and related topics. Some U.S. states also have networks of public health laboratories that collaborate to share sequencing data and track the spread of SARS-CoV-2 strains.
Using WGS Data to Modify COVID-19 Diagnostics and Create Therapeutics
SARS-CoV-2 sequencing data enable scientists to design novel targets for molecular assays and to monitor mutations that may lower the sensitivity of existing tests. For instance, GISAID routinely checks commonly used diagnostic primers against high-quality genomes in its repository to track mutations that may affect clinical diagnostic testing.
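The idea of checking a primer against genomes can be sketched as a simple mismatch scan. This is an illustrative sketch only and does not describe GISAID's actual pipeline. The function name best_primer_site and all sequences are hypothetical toy data, and only the given strand is scanned.

```python
from typing import List, Tuple


def best_primer_site(primer: str, genome: str) -> Tuple[int, List[int]]:
    """Find the genome offset where `primer` matches with the fewest mismatches.

    Returns (offset, mismatch_positions), with positions indexed within the
    primer. Only the given strand is scanned; a reverse primer would need to
    be reverse-complemented first.
    """
    primer, genome = primer.upper(), genome.upper()
    if not primer or len(primer) > len(genome):
        raise ValueError("primer must be non-empty and no longer than genome")
    best_offset = 0
    best_mismatches = list(range(len(primer)))  # worst case: every base differs
    for offset in range(len(genome) - len(primer) + 1):
        window = genome[offset : offset + len(primer)]
        mismatches = [i for i, (p, g) in enumerate(zip(primer, window)) if p != g]
        if len(mismatches) < len(best_mismatches):
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


# Toy data (hypothetical): the variant carries one substitution in the site.
primer = "ACGTTGCAAGTC"
reference = "TTAGC" + "ACGTTGCAAGTC" + "GGATA"
variant = "TTAGC" + "ACGTTACAAGTC" + "GGATA"

print(best_primer_site(primer, reference))  # (5, [])
print(best_primer_site(primer, variant))    # (5, [5])
```

A real surveillance check would run such a comparison across many genomes and weight mismatches by position, since mismatches near the 3' end of a primer tend to matter most for amplification.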
In addition, the availability of SARS-CoV-2 sequence data enables researchers to identify potential therapeutic targets and provides a foundation for epitope mapping, structural modeling, and the prediction of immune responses to the virus, all of which can help guide the development of therapeutics and vaccines. Scientists have used WGS data for epitope mapping and structural modeling since the start of the pandemic.