April 10, 2020


datasciencecampus/mobility-report-data-extractor

Scripts to extract data from the COVID-19 Google Community Mobility Reports

repo name datasciencecampus/mobility-report-data-extractor
repo link https://github.com/datasciencecampus/mobility-report-data-extractor
homepage
language Python
size (curr.) 233 kB
stars (curr.) 40
created 2020-04-04
license MIT License

mobius - Mobility Report graph extractor

PLEASE READ: As of 16/04/2020 Google have released the data in CSV format. This tool will not be maintained going forward.

For extracting every graph from any of Google's COVID-19 Community Mobility Reports (182) into comma-separated value (CSV) files. This code was developed at speed against the COVID-19 Community Mobility Report PDF documents published on Friday 3rd of April 2020.

Updates

10/04/2020: PDF and SVGs updated for the Friday 10th of April 2020 release of data.

16/04/2020: PDF and SVGs updated for the Wednesday 15th of April 2020 release of data.

Installation

We provide the python requirements.txt file as well as a poetry setup for dependency management.

We recommend using a virtual environment before installing dependencies.

To install with pip:

pip install -r requirements.txt

To install with poetry:

poetry install

External Dependencies

This project uses Rtree which in turn depends on spatialindex.

On OSX this can require separate installation: brew install spatialindex

Usage

TLDR:

# Check what report dates are available
python ./mobius.py dt

# Check if a country is available, with an option to select a date (not specifying a date will show all results)
python ./mobius.py ls <DATE>

# Download PDF and SVG
python ./mobius.py download <COUNTRY_CODE> <DATE>

# Process the PDF and SVG
python ./mobius.py summary <INPUT_PDF> <OUTPUT_FOLDER> <DATES_FILE>
python ./mobius.py full <INPUT_PDF> <INPUT_SVG> <OUTPUT_FOLDER> <DATES_FILE>

Note: DATES_FILE refers to the lookup file in the config directory named dates_lookup_xxxx_xx_xx.csv, where the x's mark the release date of the reports you are extracting the data from.
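As a minimal sketch of the naming rule in the note above, the lookup file path can be derived from a release date. The helper name dates_lookup_path and the default config directory are assumptions for illustration; they are not part of mobius.py, and the sketch does not check that a file for the given date actually exists.

```python
from datetime import date
from pathlib import Path


def dates_lookup_path(release: date, config_dir: str = "config") -> Path:
    """Build the dates lookup path for a report release date.

    Hypothetical helper: it only applies the
    dates_lookup_YYYY_MM_DD.csv naming pattern described above.
    """
    return Path(config_dir) / f"dates_lookup_{release:%Y_%m_%d}.csv"


# Example: the reports released on Friday 10th of April 2020
print(dates_lookup_path(date(2020, 4, 10)).as_posix())
```

The resulting path is what you would pass as DATES_FILE to the summary and full commands.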

Full command list

Usage: mobius.py [OPTIONS] COMMAND [ARGS]...

  Downloader and processor for Google mobility reports

Options:
  --help  Show this message and exit.

Commands:
  download  Download pdf and svg for a given country using the country code
  dt        List all the dates reports are available for
  full      Produce full CSV of trend data from PDF/SVG input
  pdf       List all the PDFs available in the buckets
  proc      Process a given country SVG
  summary   Produce summary CSV of regional headline figures from CSV
  svg       List all the SVGs available in the buckets
  
  1. Check what dates reports are available for using mobius.py dt command:

    Usage: mobius.py dt
    
      List the dates reports are available for
    
    Options:
      --help  Show this message and exit.
    
  2. Check for, and download, SVG and PDF files using mobius.py svg and mobius.py download commands:

    Use the mobius.py command line tool to list all the available countries for PDF/SVG and then download them with the two helpful commands below.

    Usage: mobius.py svg [OPTIONS] [DATE]
    
      List the SVGs available in the buckets for the given date. If no date given, a list of all available SVGs is returned
    
    Options:
      --help  Show this message and exit.
      <DATE> Lists all the SVGs published on the date
    
    
    Usage: mobius.py pdf [OPTIONS] [DATE]
    
      List the PDFs available in the buckets for the given date. If no date given, a list of all available PDFs is returned
    
    Options:
      --help  Show this message and exit.
    
    
    Usage: mobius.py download [OPTIONS] COUNTRY_CODE [DATE]
    
      Download PDF and SVG for a given country using the country code and date. If no DATE argument is given, all available PDFs and SVGs for the given country will be downloaded
    
    Options:
      --help  Show this message and exit.
    
  3. Run the mobius.py summary command

    Usage: mobius.py summary [OPTIONS] INPUT_PDF OUTPUT_FOLDER DATES_FILE
    
      Produce summary CSV of regional headline figures from CSV
    
    Options:
      --help  Show this message and exit.
    

Specify the input PDF file for the individual country as the INPUT_PDF, and the output folder where you want the CSV to be saved to (e.g. ./output).

Creates a summary CSV joined to the data extracted from the SVG plots <OUTPUT_FOLDER>/<INPUT_PDF_BASENAME>_summary.csv.

  4. Run the mobius.py full command

    Usage: mobius.py full [OPTIONS] INPUT_PDF INPUT_SVG OUTPUT_FOLDER DATES_FILE
    
      Produce full CSV of trend data from PDF/SVG input
    
    Options:
      --help                 Show this message and exit.
    
    

Specify the input pdf/svg file for the individual country as the INPUT_PDF/INPUT_SVG, and the output folder where you want the CSV to be saved to (e.g. ./output).

Pass in the dates lookup file (e.g. ./config/dates_lookup_<date>.csv) - used to convert coordinates to dates. <date> is in the YYYY_MM_DD format, and all dates available for download can be listed using the mobius.py dt command.

Creates a full CSV joined to the data extracted from the SVG plots <OUTPUT_FOLDER>/<INPUT_PDF_BASENAME>.csv.

The command gives a short summary of any discrepancies between the summary figures and the data extracted from the SVG plots.
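The output naming for the summary and full commands can be sketched as follows. This is an illustration only: expected_outputs is a hypothetical helper, and it assumes that <INPUT_PDF_BASENAME> means the PDF file name without its directory or .pdf extension, which the text above does not state explicitly.

```python
from pathlib import Path


def expected_outputs(input_pdf: str, output_folder: str) -> dict:
    """Return the CSV paths the summary and full commands are described as writing.

    Assumption: the basename is the PDF file name without its extension.
    """
    base = Path(input_pdf).stem
    out = Path(output_folder)
    return {
        "summary": out / f"{base}_summary.csv",  # from: mobius.py summary
        "full": out / f"{base}.csv",             # from: mobius.py full
    }


# "example_report.pdf" is a placeholder file name, not a real report
for name, path in expected_outputs("pdfs/example_report.pdf", "./output").items():
    print(name, path.as_posix())
```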

  5. (Alternative) Run the mobius.py proc command

    Usage: mobius.py proc [OPTIONS] INPUT_LOCATION OUTPUT_FOLDER [DATES_FILE]
    
      Process a given country SVG
    
    Options:
      -f, --folder TEXT  If provided will overwrite the output folder name
      -s, --svgs         Enables saving of SVGs that get extracted
      -c, --csvs         Enables saving of CSVs that get extracted
      -p, --plots        Enables creation and saving of additional PNG plots
      --help             Show this message and exit.
    

Specify the input file for the individual country as the INPUT_LOCATION and the output folder where you want the CSV and other files to be saved to (e.g. ./output).

Optionally pass in a custom dates lookup file (e.g. ./config/dates_lookup.csv) - used to convert coordinates to dates.

If you want simple matplotlib PNG plots to be saved as well as the CSV files, use the -p flag.
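If you drive proc from a script, the argument list can be assembled as in the sketch below. The function build_proc_command is hypothetical; only the flags and the argument order come from the usage text above, and the sketch assumes mobius.py sits in the current directory.

```python
import sys


def build_proc_command(input_location, output_folder, dates_file=None,
                       folder=None, svgs=False, csvs=False, plots=False):
    """Assemble an argument list for `mobius.py proc` (illustrative only)."""
    cmd = [sys.executable, "./mobius.py", "proc"]
    if folder:
        cmd += ["--folder", folder]  # overrides the output folder name
    if svgs:
        cmd.append("--svgs")         # save the extracted SVGs
    if csvs:
        cmd.append("--csvs")         # save the extracted CSVs
    if plots:
        cmd.append("--plots")        # also save PNG plots
    cmd += [input_location, output_folder]
    if dates_file:
        cmd.append(dates_file)       # optional DATES_FILE argument
    return cmd


# "./svgs/example.svg" is a placeholder input path
print(build_proc_command("./svgs/example.svg", "./output", csvs=True, plots=True))
```

The returned list can be passed to subprocess.run(cmd, check=True) to run the command.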

Data format

Each CSV from proc will be saved to (./output/<COUNTRY_CODE>/csv), starting at 1.csv. As of the COVID-19 Community Mobility Reports released on Friday 3rd April 2020, CSV files 1.csv to 6.csv relate to the country-level graphs in the original PDF (pages one and two). Then each set of 6 CSV files (e.g., 7.csv to 12.csv) will relate to a regional area.

Each set of 6 files follows the order:

  1. Retail & recreation
  2. Grocery & pharmacy
  3. Parks
  4. Transit stations
  5. Workplaces
  6. Residential
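The numbering scheme above can be sketched as a small lookup. The function describe_csv is a hypothetical helper, not part of mobius.py, and it assumes the layout described for the 3rd April 2020 reports: six country-level files first, then six files per regional area.

```python
CATEGORIES = [
    "Retail & recreation",
    "Grocery & pharmacy",
    "Parks",
    "Transit stations",
    "Workplaces",
    "Residential",
]


def describe_csv(n: int):
    """Map a CSV number (1.csv, 2.csv, ...) to (group, category).

    Group 0 is the country level; group k >= 1 is the k-th regional
    area in the order it appears in the report.
    """
    if n < 1:
        raise ValueError("CSV numbering starts at 1")
    group, position = divmod(n - 1, len(CATEGORIES))
    return group, CATEGORIES[position]


print(describe_csv(1))  # (0, 'Retail & recreation')
print(describe_csv(9))  # (1, 'Parks')
```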

G20 CSV Datasets

CSV datasets for G20 countries (except Russia and China) can be found at the Data Science Campus' Google Mobility Reports Data repository.

Creating your own SVG

  1. To create your own SVG from a PDF, we recommend using Affinity Designer. This is because Affinity Designer flattens the SVG, which is required for mobius.py proc. Affinity Designer also closes point features in SVG files, which other programmes do not. Use the following steps:

    1. Load in PDF document to Affinity Designer.
    2. Click Load all pages.
    3. File > Export > SVG (for print) if using version 1.6.3 or File > Export > SVG (digital - high quality) for version 1.8.3
    4. Select Area: Whole Document.
    5. Save the SVG file to (./svgs).

Utility script

To run through all available countries see run_all.sh. See run_all.sh help for usage.

# Create summary and full CSVs for all countries returned by ./mobius.py svg
./run_all.sh

Contributing

For any suggestions or issues, please use the Issues template. We welcome collaborators. To help us with this work, fork the repository and open a Pull Request when you have added a feature or fixed a bug. Thanks!
