February 21, 2020




Nextstrain build for novel coronavirus (nCoV)

repo name nextstrain/ncov
repo link https://github.com/nextstrain/ncov
homepage https://nextstrain.org/ncov
language Python
size (curr.) 2191 kB
stars (curr.) 461
created 2020-01-19
license MIT License

This is a Nextstrain build for novel coronavirus (nCoV), visible at nextstrain.org/ncov.


The nCoV genomes were generously shared via GISAID. We gratefully acknowledge the Authors, Originating and Submitting laboratories of the genetic sequence and metadata made available through GISAID on which this research is based. For a full list of attributions please see the metadata file.

Situation Report Translations

We welcome translations of the situation reports (narratives) into languages other than English (in particular for countries affected by the outbreak), and have been very impressed with the contributions so far. Please get in touch if you can help.

We suggest creating a branch for each language after each release of the English version. Unfortunately, this means that translated changes are not visible on nextstrain.org until they are released, but we are working on improving this.

The situation reports are generated from Markdown files (such as this one for 2020-01-25).

Current translations:

Language | Translator(s) | Latest version released
Mandarin | Alvin X. Han, Fengjun Zhang, Wei Ding | 2020-01-30
Spanish | Ch. Julian Villabona-Arenas | 2020-01-30
Portuguese | Glaucio Santos, Anderson Brito | 2020-01-30
French | Etienne Simon-Lorière, Pierre Barrat-Charlaix | 2020-01-30
German | Vielen Dank, Nicola Müller, Richard Neher | 2020-01-30
Russian | Ivan Aksamentov, Vadim Puller | 2020-01-30


We welcome contributions from the community to make this effort as useful as possible to as many people as possible. If you spot errors or inaccuracies, please file an issue or make a pull request, or get in touch over email at hello@nextstrain.org or on Twitter at @nextstrain.


In order to run the Nextstrain build you must provide data/sequences.fasta and data/metadata.tsv. nCoV genomes are not included in this repo, as many of them are protected by the terms of GISAID sharing, so they must be supplied by the user. To do so, register for an account at GISAID, navigate to EpiCoV and click “Download”. This should produce the file gisaid_cov2020_sequences.fasta. Move this file to ncov/data/gisaid_cov2020_sequences.fasta and then run

./scripts/normalize_gisaid_fasta.sh data/gisaid_cov2020_sequences.fasta data/sequences.fasta

This should provide all the data needed for the build process described below, as metadata for these viruses already exists in data/metadata.tsv.
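Before starting a long build, it can be worth confirming that every downloaded sequence really does have a metadata row. The sketch below is not part of the repository; it assumes that the normalized FASTA headers match the values in a metadata column named strain exactly, and the function names are hypothetical.

```python
import csv
from typing import Iterable, List, Set


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Return sequence names from FASTA lines (the header text after '>')."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def metadata_strains(lines: Iterable[str], column: str = "strain") -> Set[str]:
    """Return the set of values in one column of a tab-separated metadata file.

    Assumes the metadata has a header row containing `column`.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[column] for row in reader}


def missing_metadata(fasta_lines: Iterable[str],
                     metadata_lines: Iterable[str],
                     column: str = "strain") -> List[str]:
    """List FASTA sequence names that have no matching metadata row."""
    known = metadata_strains(metadata_lines, column)
    return [name for name in fasta_names(fasta_lines) if name not in known]
```

Because the functions accept any iterable of lines, open file handles can be passed directly, e.g. open data/sequences.fasta and data/metadata.tsv (the latter with newline="") and call missing_metadata on the two handles; an empty list means every sequence has metadata.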

After this, the entire build can be regenerated by running

snakemake -p

with a local Nextstrain installation or by running

nextstrain build .

with a containerized Nextstrain installation.

The resulting output JSON at auspice/ncov.json can be visualized by running auspice view --datasetDir auspice (local installation) or nextstrain view auspice/ (containerized installation).

This requires Augur version >=6.3.0, released Feb 13, 2020.
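A build script might check the installed Augur version up front. This is a minimal sketch, assuming the version string looks like "augur 6.3.0" (as printed by augur --version); the helper names are hypothetical.

```python
import re
from typing import Tuple

MINIMUM_AUGUR = (6, 3, 0)  # minimum version stated in the README


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract the first X.Y.Z version number from a string such as 'augur 6.3.0'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def augur_is_recent_enough(version_output: str) -> bool:
    """Compare as integer tuples, so 6.10.0 correctly counts as newer than 6.3.0."""
    return parse_version(version_output) >= MINIMUM_AUGUR
```

Comparing integer tuples rather than raw strings avoids the trap where "6.10.0" sorts before "6.3.0" lexically. The input could come from subprocess.run(["augur", "--version"], capture_output=True, text=True).stdout.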

Genomic epidemiology notes

Site numbering and genome structure use Wuhan-Hu-1/2019 as the reference. The phylogeny is rooted relative to early samples from Wuhan. Temporal resolution assumes a nucleotide substitution rate of 0.8 × 10^-3 substitutions per site per year. SNPs present in the first and last few bases of the alignment were masked as likely sequencing artifacts.
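To make the assumed clock rate concrete, multiplying it by the genome length gives the expected number of substitutions per genome per year. The sketch below does that arithmetic and also shows one way terminal alignment columns could be masked. The genome length of 29,903 nt (the length of the Wuhan-Hu-1 reference record) and the mask widths are assumptions for illustration; the README does not say how many bases were masked or how.

```python
RATE = 0.8e-3          # substitutions per site per year (from the README)
GENOME_LENGTH = 29903  # assumed: length of the Wuhan-Hu-1 reference record


def expected_substitutions(rate: float, sites: int, years: float) -> float:
    """Expected substitutions accumulated across `sites` over `years`."""
    return rate * sites * years


def mask_ends(aligned: str, n_start: int, n_end: int, mask_char: str = "N") -> str:
    """Replace the first n_start and last n_end alignment columns with mask_char.

    The widths are hypothetical parameters; the length of the sequence is preserved.
    """
    if n_start < 0 or n_end < 0 or n_start + n_end > len(aligned):
        raise ValueError("invalid mask widths for this sequence")
    middle = aligned[n_start:len(aligned) - n_end]
    return mask_char * n_start + middle + mask_char * n_end
```

With these numbers, expected_substitutions(RATE, GENOME_LENGTH, 1.0) is about 23.9, i.e. roughly two substitutions per genome per month, which is why only a handful of differences separate closely sampled genomes.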

Developer notes

API access to the evolutionary analysis that powers the visualization at nextstrain.org/ncov is available at http://data.nextstrain.org/ncov.json. Schema information here.
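As a rough illustration of consuming that endpoint, the sketch below downloads the JSON and lists the tip (leaf) names of the tree. It assumes the Auspice v2 layout, in which the tree sits under a top-level "tree" key and each node has a "name" and an optional "children" list; the endpoint may also have moved since this README was written.

```python
import json
from typing import List
from urllib.request import urlopen

NCOV_URL = "http://data.nextstrain.org/ncov.json"  # endpoint given in the README


def load_dataset(url: str = NCOV_URL) -> dict:
    """Fetch and parse the dataset JSON (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def tip_names(tree: dict) -> List[str]:
    """Collect leaf names in pre-order, iteratively to avoid deep recursion."""
    names: List[str] = []
    stack = [tree]
    while stack:
        node = stack.pop()
        children = node.get("children")
        if children:
            # Push in reverse so the first child is visited first.
            stack.extend(reversed(children))
        else:
            names.append(node["name"])
    return names
```

Typical use would be data = load_dataset() followed by len(tip_names(data["tree"])) to count the sampled genomes. An explicit stack is used because large phylogenies can be deeper than Python's default recursion limit.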
