covid19-data/covid19-data
COVID-19 workflows and datasets.
repo name | covid19-data/covid19-data |
repo link | https://github.com/covid19-data/covid19-data |
homepage | |
language | Jupyter Notebook |
size (curr.) | 1631 kB |
stars (curr.) | 27 |
created | 2020-03-05 |
license | |
2019-nCoV Data Processing Pipelines and datasets
We are looking for volunteers who want to contribute to the cleaning of the raw datasets!
This repository hosts workflows that process several data sources, together with the resulting cleaned datasets for COVID-19 cases across the world.
Data sources
Currently used
- WHO
- Processed by Our World in Data: https://ourworldindata.org/coronavirus-source-data
- Tableau: Tableau cleans the JHU CSSE dataset and provides a tidy-formatted dataset. However, as of now, it does not address the data consistency issues in the raw dataset.
- World Bank country population data and country metadata
- Wikipedia ISO 3166 country code data
Not used at the moment
- COVID-19 daily report by JHU: This has many consistency issues regarding country names and the aggregation of US data. The aggregation mechanism is not transparent.
- https://www.worldometers.info/coronavirus/
Data Outputs and Usages
Daily case data based on WHO report, cleaned by Our World in Data
output/cases/cases_WHO.csv
: This file takes the CSV dataset cleaned by the Our World in Data team and identifies each country by its ISO 3166 Alpha-3 code. It also fills in missing dates so that the series for every country starts on the same date (Jan. 21st). You may want to combine it with the country-level metadata or the alternative country names described below.
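The date-filling step described above can be sketched with pandas as follows. This is only an illustration, not the repository's actual workflow: the column names (`country_code`, `date`, `total_cases`), the helper `fill_missing_dates`, the toy values, and the choice to forward-fill cumulative counts are all assumptions.

```python
import pandas as pd


def fill_missing_dates(cases, start, end):
    """Put every country on the same daily date range.

    Hypothetical column names: 'country_code', 'date', 'total_cases'.
    """
    cases = cases.copy()
    cases["date"] = pd.to_datetime(cases["date"])

    # One row per (country, day) for the whole period.
    full_index = pd.MultiIndex.from_product(
        [cases["country_code"].unique(), pd.date_range(start, end, freq="D")],
        names=["country_code", "date"],
    )
    filled = (
        cases.set_index(["country_code", "date"])
        .reindex(full_index)
        .sort_index()
    )

    # Assumption: counts are cumulative, so carry the last known value
    # forward within each country; days before the first report become 0.
    filled["total_cases"] = (
        filled.groupby(level="country_code")["total_cases"]
        .ffill()
        .fillna(0)
        .astype(int)
    )
    return filled.reset_index()


# Toy input with gaps (values are made up for illustration).
raw = pd.DataFrame(
    {
        "country_code": ["KOR", "KOR", "ITA"],
        "date": ["2020-01-21", "2020-01-23", "2020-01-23"],
        "total_cases": [1, 2, 3],
    }
)
filled = fill_missing_dates(raw, "2020-01-21", "2020-01-23")
print(filled)
```

After this step every country has the same number of rows, which makes it straightforward to pivot the data into a date-by-country table.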
Daily case data in JSON
cntry_stat_owid.json
- Used in an interactive visualization of case fatality rate of COVID-19
- Website source code: https://github.com/covid19-data/covid19-dashboard
- Visualization source code on ObservableHQ: https://observablehq.com/@yy/covid-19-fatality-rate and https://observablehq.com/@yy/covid-19-trends
- An example to create case time series charts in ObservableHQ by benjyz
Country name conversion table
output/metadata/country/country_name_code.csv
: a conversion table from country name to code (ISO 3166 Alpha 3). Note that multiple names point to the same code.
output/metadata/country/country_code_name.csv
: a conversion table from country code (ISO 3166 Alpha 3) to country name. The shortest country names are picked from the above dataset.
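As a rough sketch of how the two tables relate, the code-to-name table can be derived from the name-to-code table by keeping the shortest name for each code. The column names (`country_name`, `country_code`), the helper `shortest_name_per_code`, and the toy rows below are assumptions for illustration; the real CSV headers may differ, and how ties between equally short names are broken is not specified in the source.

```python
import pandas as pd

# Toy name-to-code table: several names can map to the same code.
name_to_code = pd.DataFrame(
    {
        "country_name": [
            "South Korea",
            "Korea, Republic of",
            "Republic of Korea",
            "Italy",
        ],
        "country_code": ["KOR", "KOR", "KOR", "ITA"],
    }
)


def shortest_name_per_code(table):
    """Keep one row per code, choosing the shortest country name."""
    return (
        table.assign(name_length=table["country_name"].str.len())
        .sort_values(["country_code", "name_length"])
        .drop_duplicates("country_code")  # first row per code is the shortest
        .drop(columns="name_length")
        .reset_index(drop=True)
    )


code_to_name_table = shortest_name_per_code(name_to_code)

# Plain dictionaries are convenient for lookups in both directions.
name_lookup = dict(zip(name_to_code["country_name"], name_to_code["country_code"]))
code_lookup = dict(
    zip(code_to_name_table["country_code"], code_to_name_table["country_name"])
)
print(name_lookup["Korea, Republic of"], code_lookup["KOR"])
```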
Country metadata
output/metadata/country/country_metadata.csv
: Country metadata, such as population, region, and income group, indexed by the ISO 3166 Alpha 3 codes.
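Because the metadata is keyed by ISO 3166 Alpha 3 code, it can be joined onto the case data with a single merge. The sketch below assumes hypothetical column names (`country_code`, `total_cases`, `population`, `region`, `income_group`) and uses made-up toy values, including a fake code `XXX`; check the actual CSV headers before adapting it.

```python
import pandas as pd

# Toy case data (made-up values); 'XXX' has no metadata on purpose.
cases = pd.DataFrame(
    {
        "country_code": ["KOR", "ITA", "XXX"],
        "date": ["2020-03-01", "2020-03-01", "2020-03-01"],
        "total_cases": [100, 200, 5],
    }
)

# Toy metadata keyed by country code (made-up values).
metadata = pd.DataFrame(
    {
        "country_code": ["KOR", "ITA"],
        "population": [50_000_000, 60_000_000],
        "region": ["East Asia & Pacific", "Europe & Central Asia"],
        "income_group": ["High income", "High income"],
    }
)

# Left join keeps every case row; validate guards against duplicate
# codes in the metadata, and indicator marks rows with no match.
merged = cases.merge(
    metadata,
    on="country_code",
    how="left",
    validate="many_to_one",
    indicator=True,
)
unmatched_codes = sorted(
    merged.loc[merged["_merge"] == "left_only", "country_code"].unique()
)

# Example derived column: cases per million inhabitants.
merged["cases_per_million"] = merged["total_cases"] * 1e6 / merged["population"]
print(merged[["country_code", "total_cases", "cases_per_million"]])
print("codes without metadata:", unmatched_codes)
```

Listing the unmatched codes is a cheap way to spot country names that were converted to an unexpected code upstream.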
Location data
coordinates.csv
: Latitude and longitude data from the JHU dataset (unreliable).
Usage
Install pandas and snakemake using conda:
conda install -c bioconda -c conda-forge snakemake pandas
or pip:
pip install pandas snakemake