URL: https://covalign.pasteur.cloud/

COVID-ALIGN - HCOV-19 GENOME ALIGNMENT

Since the emergence of the hCoV-19 virus (or SARS-CoV-2) responsible for the
COVID-19 pandemic, unprecedented efforts are taking place across the world to
sequence genomes of this virus and share the data. As of today (9/21/2020),
GISAID (Shu et al., 2017) provides access to more than 105,000 full genomes,
compared with ~23,000 for the NCBI and the EBI. The first genomes were sequenced
in China by the end of December 2019. Their number first increased slowly and
then rapidly when the pandemic appeared on all continents. Submissions of
several thousand sequences to GISAID in a single day have become common.
Moreover, some genomes may be submitted incomplete, with sequencing and assembly
errors. These
characteristics pose major challenges to bioinformatics, notably that of
multiple sequence alignment (MSA; Chatzou et al., 2016), which is crucial for
subsequent analyses (phylogeny, transmission clusters, mutation study,
structure, etc.).

To address this challenge, we use a profile HMM-based approach (Durbin et al.,
1998), which is the norm for HIV (www.hiv.lanl.gov), and is particularly well
suited to hCoV-19, as its genome is highly conserved, without known
recombination in human hosts (Xiaolu et al., 2020; De Maio et al., 2020). Using
a profile, adding new data to an existing MSA takes computing time that is
linear in the number of input genomes, because each new genome is aligned to the
profile independently of the others. Moreover, profile-based MSA has proved to
be very accurate (Earl et al., 2014; Nute and Warnow, 2016). This approach is
implemented in COVID-Align, which is available as a web service and via
Docker.
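
The linear-time property can be illustrated with a minimal Python sketch. It is
not the COVID-Align implementation: for brevity it swaps the profile HMM for a
plain Needleman-Wunsch global alignment against a single reference sequence,
and the function names and scoring parameters are hypothetical. What it shares
with the profile approach is that every genome is aligned on its own and
projected onto fixed reference columns, so adding k genomes costs k independent
alignments and never touches the rows already in the MSA.

```python
def align_to_reference(ref, query, match=1, mismatch=-1, gap=-2):
    """Globally align `query` to `ref` (Needleman-Wunsch, linear gap cost)
    and return `query` projected onto the columns of `ref`.

    Deletions relative to the reference become '-'; insertions relative to
    the reference are dropped, so the result always has len(ref) characters.
    """
    n, m = len(ref), len(query)
    # score[i][j] = best score for aligning ref[:i] with query[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ref[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, emitting one character
    # per reference position.
    projected = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ref[i - 1] == query[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            projected.append(query[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            projected.append("-")  # reference base with no query counterpart
            i -= 1
        else:
            j -= 1  # query base inserted relative to the reference: dropped
    return "".join(reversed(projected))


def add_to_msa(msa, ref, new_seqs):
    """Add sequences to an existing reference-anchored MSA (dict name -> row).

    One independent alignment per new sequence; existing rows are untouched.
    """
    for name, seq in new_seqs.items():
        msa[name] = align_to_reference(ref, seq)
    return msa
```

For example, `add_to_msa({}, "ACGTACGT", {"s1": "ACGACGT"})` yields a row of the
same length as the reference, with a gap where the query lacks a base. The
quadratic dynamic programming used here is only practical for toy inputs; it
stands in for the profile alignment step purely to show why the total cost
grows linearly with the number of genomes.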

We need your help to improve this web service. Please send your comments and/or
suggestions to: frederic[dot]lemoine[at]pasteur[dot]fr and
olivier[dot]gascuel[at]pasteur[dot]fr,
Evolutionary Bioinformatics unit, C3BI USR 3756, Institut Pasteur and CNRS,
Paris, France


MORE INFORMATION

 * To learn more about COVID-ALIGN, please read our help page;
 * To infer trees from aligned sequences, do not hesitate to use NGPhylogeny.fr


EXAMPLE DATASET

 * An example analysis is available here. The example is composed of 7 aligned
   sequences and the automatically added reference sequence (GISAID ID:
   EPI_ISL_402124). Among the aligned sequences, there are five samples from
   human hosts. Three of them belong to clade G, which is associated with the
   SNP 23403A>G: EPI_ISL_417851 from Iceland, EPI_ISL_421509 from France and
   EPI_ISL_427121 from Australia. The other two samples do not belong to the G
   clade: EPI_ISL_418955 from the USA and EPI_ISL_413214 from Australia. These
   sequences were selected because they show variation in the region coding for
   the RBD of the Spike protein. Furthermore, there are two samples from animal
   hosts: EPI_ISL_402131 being the bat isolate RaTG13 and EPI_ISL_402131 being a
   sample from pangolin.
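
Once sequences are aligned, a marker such as 23403A>G can be checked by reading
the alignment column that corresponds to that reference position. The Python
sketch below shows one way to do this; the function names are hypothetical, and
it assumes that the SNP coordinate is 1-based and expressed in the coordinates
of the reference row of the alignment, which should be verified for the
reference actually used.

```python
def ref_position_to_column(aligned_ref, pos):
    """Return the 0-based alignment column holding the `pos`-th (1-based)
    non-gap character of the aligned reference row."""
    seen = 0
    for col, base in enumerate(aligned_ref):
        if base != "-":
            seen += 1
            if seen == pos:
                return col
    raise ValueError(f"reference has fewer than {pos} bases")


def base_at(aligned_ref, aligned_seq, pos):
    """Base of `aligned_seq` at reference position `pos` ('-' if deleted)."""
    return aligned_seq[ref_position_to_column(aligned_ref, pos)].upper()


def carries_variant(aligned_ref, aligned_seq, pos, alt):
    """True if `aligned_seq` shows base `alt` at reference position `pos`."""
    return base_at(aligned_ref, aligned_seq, pos) == alt.upper()


# Toy rows (not real genomes): the reference row has a gap in column 2,
# so reference position 4 sits in alignment column 4.
toy_ref = "AC-GT"
toy_seq = "ACTGG"
toy_has_variant = carries_variant(toy_ref, toy_seq, 4, "G")
```

With a real alignment, the same call would take the reference row, a sample
row, the position 23403 and the base "G"; if the reference row contains no
gaps, the column index is simply the position minus one.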


REFERENCES

If you use this web service, please cite:

      Frederic Lemoine, Luc Blassel, Jakub Voznica, Olivier Gascuel,
      COVID-Align: Accurate online alignment of hCoV-19 genomes using a profile HMM,
      Bioinformatics, btaa871, https://doi.org/10.1093/bioinformatics/btaa871
    



