datasciencecampus/mobility-report-data-extractor
Scripts to extract data from the COVID-19 Google Community Mobility Reports
repo name | datasciencecampus/mobility-report-data-extractor |
repo link | https://github.com/datasciencecampus/mobility-report-data-extractor |
homepage | |
language | Python |
size (curr.) | 233 kB |
stars (curr.) | 40 |
created | 2020-04-04 |
license | MIT License |
mobius - Mobility Report graph extractor
PLEASE READ: As of 16/04/2020 Google have released the data in CSV format. This tool will not be maintained going forward.
A tool for extracting every graph from any of Google's COVID-19 Community Mobility Reports (182 reports) into comma-separated values (CSV) files. This code was developed at speed against the COVID-19 Community Mobility Report PDF documents published on Friday 3rd April 2020.
Updates
10/04/2020: PDF and SVGs updated for the Friday 10th of April 2020 release of data.
16/04/2020: PDF and SVGs updated for the Wednesday 15th of April 2020 release of data.
Installation
We provide a Python requirements.txt file as well as a Poetry setup for dependency management.
We recommend creating a virtual environment before installing dependencies.
To install with pip:
pip install -r requirements.txt
To install with poetry:
poetry install
External Dependencies
This project uses Rtree, which in turn depends on spatialindex (libspatialindex).
On OSX this can require a separate installation:
brew install spatialindex
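If you are unsure whether the spatialindex library was picked up, a quick check from Python can help. This is a minimal sketch, not part of mobius: it assumes only Rtree's rtree.index.Index interface (insert and intersection) and simply reports whether the library loads and answers a trivial query.

```python
def spatialindex_available() -> bool:
    """Return True if Rtree imports and can answer a trivial query."""
    try:
        # Importing rtree loads the libspatialindex C library; depending on
        # the Rtree version, a missing library shows up as ImportError or OSError.
        from rtree import index
    except (ImportError, OSError):
        return False
    try:
        idx = index.Index()  # in-memory 2D index
        idx.insert(0, (0.0, 0.0, 1.0, 1.0))  # box as (minx, miny, maxx, maxy)
        # A point inside the box should return the id we inserted.
        return list(idx.intersection((0.5, 0.5, 0.5, 0.5))) == [0]
    except Exception:
        return False


if __name__ == "__main__":
    print("spatialindex OK" if spatialindex_available() else "spatialindex missing")
```

If this prints "spatialindex missing" on OSX, the brew install step above is the usual fix.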
Usage
TLDR:
# Check what report dates are available
python ./mobius.py dt
# Check if a country is available, with an option to select a date (not specifying a date will show all results)
python ./mobius.py svg <DATE>
# Download PDF and SVG
python ./mobius.py download <COUNTRY_CODE> <DATE>
# Process the PDF and SVG
python ./mobius.py summary <INPUT_PDF> <OUTPUT_FOLDER> <DATES_FILE>
python ./mobius.py full <INPUT_PDF> <INPUT_SVG> <OUTPUT_FOLDER> <DATES_FILE>
Note: DATES_FILE refers to the lookup file in the config directory named dates_lookup_xxxx_xx_xx.csv, where the xs mark the release date (YYYY_MM_DD) of the reports you are extracting the data from.
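Because the lookup file name encodes the release date, it can be built from (or parsed back into) a date. This is a hypothetical helper, not part of mobius; it assumes only the dates_lookup_YYYY_MM_DD.csv naming in the config directory described above.

```python
from datetime import date, datetime
from pathlib import Path

PREFIX = "dates_lookup_"


def dates_lookup_path(release: date, config_dir: str = "config") -> Path:
    """Build the lookup file path for a report release date (YYYY_MM_DD)."""
    return Path(config_dir) / f"{PREFIX}{release:%Y_%m_%d}.csv"


def release_date(lookup_file: str) -> date:
    """Recover the release date encoded in a lookup file name."""
    stem = Path(lookup_file).stem  # e.g. "dates_lookup_2020_04_10"
    if not stem.startswith(PREFIX):
        raise ValueError(f"not a dates lookup file: {lookup_file}")
    return datetime.strptime(stem[len(PREFIX):], "%Y_%m_%d").date()
```

Under this naming, the Friday 10th April 2020 release would map to config/dates_lookup_2020_04_10.csv.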
Full command list
Usage: mobius.py [OPTIONS] COMMAND [ARGS]...
Downloader and processor for Google mobility reports
Options:
--help Show this message and exit.
Commands:
download Download pdf and svg for a given country using the country code
dt List all the dates reports are available for
full Produce full CSV of trend data from PDF/SVG input
pdf List all the PDFs available in the buckets
proc Process a given country SVG
summary Produce summary CSV of regional headline figures from CSV
svg List all the SVGs available in the buckets
- Check which dates reports are available for using the mobius.py dt command:
Usage: mobius.py dt
List the dates reports are available for
Options:
--help Show this message and exit.
- Check for, and download, SVG and PDF files using the mobius.py svg, mobius.py pdf and mobius.py download commands:
Use the mobius.py command line tool to list all the available countries for PDF/SVG, and then download them with the commands below.
Usage: mobius.py svg [OPTIONS] [DATE]
List the SVGs available in the buckets for the given date. If no date given, a list of all available SVGs is returned
Options:
--help Show this message and exit.
<DATE> Lists all the SVGs published on the date
Usage: mobius.py pdf [OPTIONS] [DATE]
List the PDFs available in the buckets for the given date. If no date given, a list of all available PDFs is returned
Options:
--help Show this message and exit.
Usage: mobius.py download [OPTIONS] COUNTRY_CODE [DATE]
Download PDF and SVG for a given country using the country code and date. If no DATE argument given all available PDFs and SVGs for the given country will be downloaded
Options:
--help Show this message and exit.
- Run the mobius.py summary command:
Usage: mobius.py summary [OPTIONS] INPUT_PDF OUTPUT_FOLDER DATES_FILE
Produce summary CSV of regional headline figures from CSV
Options:
--help Show this message and exit.
Specify the input PDF file for the individual country as INPUT_PDF, the output folder where you want the CSV to be saved as OUTPUT_FOLDER (e.g. ./output), and the dates lookup file as DATES_FILE.
Creates a summary CSV of the regional headline figures at <OUTPUT_FOLDER>/<INPUT_PDF_BASENAME>_summary.csv.
- Run the mobius.py full command:
Usage: mobius.py full [OPTIONS] INPUT_PDF INPUT_SVG OUTPUT_FOLDER DATES_FILE
Produce full CSV of trend data from PDF/SVG input
Options:
--help Show this message and exit.
Specify the input PDF/SVG files for the individual country as INPUT_PDF/INPUT_SVG, and the output folder where you want the CSV to be saved (e.g. ./output).
Pass in the dates lookup file as DATES_FILE (e.g. ./config/dates_lookup_<date>.csv); it is used to convert plot coordinates to dates. <date> is in YYYY_MM_DD format, and all dates available for download can be listed with the mobius.py dt command.
Creates a full CSV, joined to the data extracted from the SVG plots, at <OUTPUT_FOLDER>/<INPUT_PDF_BASENAME>.csv.
The command also gives a short summary of any discrepancies between the summary figures and the data extracted from the SVG plots.
- (Alternative) Run the mobius.py proc command:
Usage: mobius.py proc [OPTIONS] INPUT_LOCATION OUTPUT_FOLDER [DATES_FILE]
Process a given country SVG
Options:
-f, --folder TEXT If provided will overwrite the output folder name
-s, --svgs Enables saving of SVGs that get extracted
-c, --csvs Enables saving of CSVs that get extracted
-p, --plots Enables creation and saving of additional PNG plots
--help Show this message and exit.
Specify the input file for the individual country as INPUT_LOCATION and the output folder where you want the CSV and other files to be saved (e.g. ./output).
Optionally pass in a custom dates lookup file (e.g. ./config/dates_lookup.csv), used to convert coordinates to dates.
To save simple matplotlib PNG plots as well as CSV files, use the -p flag.
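The commands above can also be driven from a Python script. The sketch below builds argument lists for summary, full and proc exactly as their usage lines show, and computes the output paths described for summary and full. It is a hypothetical helper, not part of mobius: it assumes you run it from the repository root (so ./mobius.py resolves), treats <INPUT_PDF_BASENAME> as the PDF file name without its .pdf extension, and the file names used with it are placeholders.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional

MOBIUS = "./mobius.py"  # assumption: working directory is the repository root


def summary_cmd(pdf: str, out: str, dates_file: str) -> List[str]:
    """argv for: mobius.py summary INPUT_PDF OUTPUT_FOLDER DATES_FILE"""
    return [sys.executable, MOBIUS, "summary", pdf, out, dates_file]


def full_cmd(pdf: str, svg: str, out: str, dates_file: str) -> List[str]:
    """argv for: mobius.py full INPUT_PDF INPUT_SVG OUTPUT_FOLDER DATES_FILE"""
    return [sys.executable, MOBIUS, "full", pdf, svg, out, dates_file]


def proc_cmd(location: str, out: str, dates_file: Optional[str] = None,
             folder: Optional[str] = None, svgs: bool = False,
             csvs: bool = False, plots: bool = False) -> List[str]:
    """argv for mobius.py proc, using only the options listed in its help."""
    cmd = [sys.executable, MOBIUS, "proc"]
    if folder:
        cmd += ["-f", folder]  # overwrite the output folder name
    if svgs:
        cmd.append("-s")  # save extracted SVGs
    if csvs:
        cmd.append("-c")  # save extracted CSVs
    if plots:
        cmd.append("-p")  # save matplotlib PNG plots
    cmd += [location, out]
    if dates_file:
        cmd.append(dates_file)  # optional DATES_FILE argument
    return cmd


def expected_outputs(pdf: str, out: str) -> Dict[str, Path]:
    """Output paths for summary and full, assuming BASENAME drops '.pdf'."""
    base = Path(pdf).stem
    return {"summary": Path(out) / f"{base}_summary.csv",
            "full": Path(out) / f"{base}.csv"}


def run(cmd: List[str]) -> None:
    """Execute one command; raises CalledProcessError if mobius.py fails."""
    subprocess.run(cmd, check=True)
```

Building argument lists instead of shell strings avoids quoting problems when paths contain spaces.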
Data format
Each CSV from proc will be saved to ./output/<COUNTRY_CODE>/csv, numbered from 1.csv. As of the COVID-19 Community Mobility Reports released on Friday 3rd April 2020, CSV files 1.csv to 6.csv relate to the country-level graphs in the original PDF (pages one and two). Each subsequent set of 6 CSV files (e.g. 7.csv to 12.csv) relates to a regional area.
Each set of 6 files follows the order:
1. Retail & recreation
2. Grocery & pharmacy
3. Parks
4. Transit stations
5. Workplaces
6. Residential
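Since the CSV numbering is purely positional, a small lookup makes it easier to label the files. This sketch assumes the 3 April 2020 layout described above (six country-level files, then six per regional area, in the category order listed) holds for the reports you process; describe_csv and list_csvs are hypothetical helpers, not part of mobius.

```python
from pathlib import Path
from typing import List, Tuple

# Category order within each set of 6 files, as listed above.
CATEGORIES = (
    "Retail & recreation",
    "Grocery & pharmacy",
    "Parks",
    "Transit stations",
    "Workplaces",
    "Residential",
)


def describe_csv(n: int) -> Tuple[int, str]:
    """Map the number in '<n>.csv' to (area_index, category).

    area_index 0 is the country level (1.csv to 6.csv), 1 is the first
    regional area (7.csv to 12.csv), and so on.
    """
    if n < 1:
        raise ValueError("CSV numbering starts at 1")
    area, offset = divmod(n - 1, len(CATEGORIES))
    return area, CATEGORIES[offset]


def list_csvs(csv_dir: str) -> List[Path]:
    """List numbered CSVs in numeric order (so 10.csv sorts after 9.csv)."""
    files = [p for p in Path(csv_dir).glob("*.csv") if p.stem.isdigit()]
    return sorted(files, key=lambda p: int(p.stem))
```

For example, describe_csv(14) returns (2, "Grocery & pharmacy"): the second category of the second regional area. Sorting by the integer stem matters because a plain string sort would place 10.csv before 2.csv.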
G20 CSV Datasets
CSV datasets for G20 countries (except Russia and China) can be found at the Data Science Campus' Google Mobility Reports Data repository.
Creating your own SVG
To create your own SVG from a PDF, we recommend using Affinity Designer, because it flattens the SVG, which mobius.py proc requires. Affinity Designer also closes point features in SVG files, which other programmes do not. Use the following steps:
- Load the PDF document into Affinity Designer.
- Click Load all pages.
- Choose File > Export > SVG (for print) if using version 1.6.3, or File > Export > SVG (digital - high quality) for version 1.8.3.
- Select Area: Whole Document.
- Save the SVG file to ./svgs.
Utility script
To run through all available countries, see run_all.sh. See run_all.sh help for usage.
# Create summary and full CSVs for all countries returned by ./mobius.py svg
./run_all.sh
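run_all.sh itself is not reproduced in this README, so the following is not a port of it. It is a hypothetical Python sketch of the same batch idea: it assumes downloaded PDFs and SVGs sit in two folders and share a file stem (e.g. GB.pdf and GB.svg), pairs them, and builds one mobius.py full command per country without executing anything. The folder names, the dates file and the naming scheme are all assumptions.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def pair_reports(pdf_dir: str, svg_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each PDF with the SVG sharing its stem (assumed naming scheme)."""
    svgs = {p.stem: p for p in Path(svg_dir).glob("*.svg")}
    return [(pdf, svgs[pdf.stem])
            for pdf in sorted(Path(pdf_dir).glob("*.pdf"))
            if pdf.stem in svgs]


def full_commands(pairs: Iterable[Tuple[Path, Path]], out_dir: str,
                  dates_file: str) -> List[List[str]]:
    """Build one 'mobius.py full' argv per pair (a dry run; nothing executes)."""
    return [[sys.executable, "./mobius.py", "full", str(pdf), str(svg),
             out_dir, dates_file]
            for pdf, svg in pairs]


if __name__ == "__main__":
    # Hypothetical locations: ./pdfs for PDFs, ./svgs for SVGs.
    pairs = pair_reports("./pdfs", "./svgs")
    for cmd in full_commands(pairs, "./output", "config/dates_lookup_2020_04_10.csv"):
        print(" ".join(cmd))
```

Passing each command to subprocess.run(cmd, check=True) would execute it; printing them first is a cheap way to check the pairing.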
Contributing
For any suggestions or issues, please use the Issues template. We welcome collaborators: to help us with this work, fork the repository and open a pull request when you have added a feature or fixed a bug. Thanks!