Scientists around the world are racing against time to find a cure or at least an effective treatment for COVID-19 and there are huge amounts of money being spent to achieve that. However, before fighting your enemy you must know them, and that’s why genome sequencing is so important these days.
COVID-19 is a disease caused by the SARS-CoV-2 virus, is a strain of the severe acute respiratory syndrome-related coronavirus (SARS-CoV) which are viruses of zoonotic origins. It is a a positive-sense single-stranded RNA virus(or (+)ssRNA virus), which means that it’s composed by a single strand of RNA coded from 5’ to 3’ (just like the majority of living things in this world) and it mainly enters human cells through the angiotensin converting enzyme 2(ACE2) receptor.
During the pandemics, several SARS-CoV-2 genomes have been sequenced in order to understand the virus mutation rate, the different strains and which would be the best molecular targets to aim when designing a new treatment. Genome sequencing is a time and resource consuming activity that unveils the nucleotide sequence of a given sample (in this case, the virus) and allows it to analyze all the potentially encoded proteins within the viral RNA.
The length of the SARS-CoV genome is over 30 Kb, in other words, more than 30 million nucleotides. This entails that a genome sequence project should have to guess the identity of each the nucleotides composing the genome but also the way they are ordered! Moreover, the current sequencing technology allows for determining only 200 – 3000 nucleotides per run (these segments are called reads), which means that the task is analogous to having an encyclopedia divided in paper strips of more of less 100 words and trying to recreate it from scratch.
There are two ways of overcoming this giant biological jigsaw puzzle:
The first one involves having a reference genome beforehand to help the assembly of the thousands of reads obtained in the sequencing steps, in the case of the SARS-CoV-2, fortunately, other similar viruses had been sequenced before, therefore it was reasonable to use those genomes as templates to map the new reads.
However, when no similar organism genome is available, the task becomes significantly harder, because there are no template to complete the puzzle. In those cases, a de novo assembly is attempted, which involves the matching of overlapping ends of each read and form contigs. Several contigs are then ordered to form scaffolds, which are the best approximation to a whole genome that this process can achieve.
Finally, when several genomes are sequenced, the sequencing processes for the next genomes become easier and easier, because several templates are available for guiding the subsequent projects. That’s why it’s crucial that research groups make their genome assemblies publicly available, in order to contribute to the massive endeavor that science worldwide is undertaking right now.
By Fabian Hernandez Portunus Co-founder – Biomedicine researcher, BS Pharmacy and Bioinformatics MSc student. Fabián’s focus is on molecular biology and biological data analysis.