April 9, 2021

eddwebster/football_analytics

A collection of football analytics projects, data, and analysis by Edd Webster (@eddwebster), with links to publicly available resources in the football analytics community.

repo name eddwebster/football_analytics
repo link https://github.com/eddwebster/football_analytics
homepage
language Jupyter Notebook
size (curr.) 555151 kB
stars (curr.) 90
created 2020-09-01
license

Edd Webster Football Analytics

This repository is a public space for the football analytics projects by Edd Webster and a list of publicly available resources published by the football analytics community.

I am currently rewriting this README to include links not only to my own work but also to a concise list of learning resources, data sources, libraries, papers, blogs, podcasts, etc., created by all those who have contributed to the football analytics community. This is still in progress and needs further editing, but most of the content is now available below. If you can think of any resources that I’ve missed, feel free to create a pull request or send me a message. Credits to the Soccer Analytics Handbook by Devin Pleuler, the Awesome Soccer Analytics by Matias Mascioto, and Jan Van Haaren’s Soccer Analytics 2020 Review, which were all used to plug gaps in the list once it was published.

If you like the repo, please feel free to give it a :star: (top right). Cheers!

:wave: About This Repository and Author

Please note, all the work produced in this repository is mine and/or credited to the publicly produced code, data, and/or libraries used, and is in no way related to the work and analysis I produce for my employers.

For more information about this repository and the author, I’m available through all the following channels:

:clipboard: Contents:

:notebook_with_decorative_cover: Notebooks

For code, see the notebooks subfolder, in which the workflow is divided into the following:

  1. Webscraping;
  2. Data Parsing;
  3. Data Engineering;
  4. Machine Learning; and
  5. Data Analysis - projects include working with Tracking data, constructing VAEP models (as introduced by SciSports), building xG models using Logistic Regression, Decision Trees and XGBoost, and analysing player similarity using PCA and Factor Analysis.
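
As a rough illustration of the logistic-regression approach to xG mentioned in item 5, the sketch below derives two commonly used shot features (distance to goal and the angle subtended by the goalposts) and passes them through a logistic function. The coordinate system, function names, and coefficients are assumptions made for this example; the coefficients are hand-picked rather than fitted, and this is not the model built in the notebooks.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts on a standard goal


def shot_features(x, y):
    """Return (distance, angle) for a shot taken at (x, y).

    Assumed coordinates: the goal centre is at (0, 0), x is the distance
    from the goal line in metres, and y is the lateral offset in metres.
    """
    distance = math.hypot(x, y)
    # Angle between the lines from the shot location to each post,
    # computed from the cross and dot products of those two vectors.
    angle = math.atan2(GOAL_WIDTH * x, x**2 + y**2 - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


def xg_estimate(x, y, intercept=-1.0, b_distance=-0.1, b_angle=1.5):
    """Logistic model on distance and angle.

    The default coefficients are illustrative placeholders, not values
    fitted to real shot data.
    """
    distance, angle = shot_features(x, y)
    linear = intercept + b_distance * distance + b_angle * angle
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    for location in [(6.0, 0.0), (11.0, 0.0), (25.0, 10.0)]:
        print(location, round(xg_estimate(*location), 3))
```

In practice the coefficients would be estimated from labelled shots (goal or no goal), for example with a logistic regression fit, and further features such as body part or assist type could be added.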

:bar_chart: Data Visualisation and Tableau

For Tableau dashboards produced using the data engineered in the notebooks in this repository, please see my Tableau Public profile: public.tableau.com/profile/edd.webster.

  • WSL dashboards and analysis [link];
  • ‘Big 5’ European leagues dashboards and analysis [link];
  • EFL dashboards and analysis [link];
  • StrataBet Chance dashboards and analysis [link]; and
  • Opta #mcfcanalytics dashboards and analysis [link].

:floppy_disk: Data Sources

The following data sources have been used in this repository. Due to GitHub's 100 MB file size limit, all engineered datasets prepared in this repository have been exported and made publicly available to view and download in Google Drive. Please see the following [link]. The code in this repository should nonetheless enable you to scrape, parse, and engineer the datasets into the format in which I have analysed and visualised the data in this repo.

Data sources featured in this repository include:

:classical_building: Libraries

The Python libraries used in this repository include:

:bookmark_tabs: Resources

Getting Started with Football Analytics:

:student: Tutorials

Python

R

Tableau

For a YouTube playlist of Tableau-football videos and tutorials that I have collated from various sources including the Tableau Football User Group, Rob Carroll, and Tom Goodall, see the following [link].

Excel

PowerPoint

Other Sports

:floppy_disk: Data Sources

Publicly available data sources and datasets relating to football, including Tracking data, Event data, aggregated player performance data, detailed match statistics, injury records, transfer values, and more.

Documentation

[TO ADD HERE]

Data Companies

Data Providers
Tracking
Video / Performance Analysis

:classical_building: Libraries

Python

  • codeball - data driven tactical and video analysis of soccer games;
  • Football Packing - a Python package to calculate packing rate for a given pass in football by Samira Kumar. This is a variation of the metric created by Impect;
  • kloppy - a Python package providing (de)serializers for soccer tracking and event data, standardized data models, filters, and transformers designed to make working with different tracking and event data a breeze;
  • matplotsoccer - a Python library for visualising soccer event data by Tom Decroos;
  • mplsoccer - a Python library for drawing soccer/football pitches in Matplotlib and loading StatsBomb open-data by Andrew Rowlinson;
  • nayra - an API that allows you to track soccer players from camera inputs and evaluate them with an Expected Discounted Goal (EDG) Agent. See the Evaluating Soccer Player paper by Paul Garnier and Théophane Gregoir;
  • northpitch - a Python football plotting library that sits on top of Matplotlib by Devin Pleuler;
  • PCA_Player_Finder by Parth Athale;
  • PySport including PySport Soccer - collection of open-source sport packages including many of those mentioned in this section, by Koen Vossen;
  • PyWaffle - an open source, MIT-licensed Python package for plotting waffle charts by Peter McKeever;
  • Scrape-FBref-data - a Python library to scrape StatsBomb data via FBref by Parth Athale, which in turn was updated from Christopher Martin’s repository;
  • statsbombapi - a Python API wrapper and dataclasses for Statsbomb data;
  • statsbombpy - a Python library written by Francisco Goitia to access StatsBomb data;
  • statsbomb-parser - Python library to convert StatsBomb’s JSON data into easy-to-use CSV format;
  • socceraction - a Python library for valuing the individual actions performed by soccer players. Includes an Expected Threat (xT) implementation by Tom Decroos et al.;
  • soccermix - a soft clustering technique based on mixture models that decomposes event stream data into a number of prototypical actions of a specific type, location, and direction by Tom Decroos and ML-KULeuven;
  • soccer_xg - a Python package for training and analyzing expected goals (xG) models in football;
  • soccerplots - a Python package that can be used for making visualizations for football analytics by Slothfulwave;
  • sync.soccer - a Python package to synchronise football datasets, so that an event in one dataset is matched to the corresponding event or snapshot in the other by Marek Kwiatkowski. This repository contains an implementation that aligns Opta’s (now STATS Perform’s) F24 feeds to ChyronHego’s Tracab files. More formats may be added in the future. See the following blog post for methodology [link];
  • tmscrape - a Python TransferMarkt webscraper by danzn1; and
  • Tyrone Mings - a Python TransferMarkt webscraper by FCrSTATS.
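
To make the packing idea from the Football Packing entry above more concrete, here is a minimal sketch that counts how many opponents a forward pass bypasses. It is a deliberately simplified, one-dimensional reading of the concept and not the algorithm implemented by the Football Packing package or by Impect; the function name, arguments, and coordinate convention are assumptions made for this example.

```python
def packing_rate(passer_x, receiver_x, opponent_xs):
    """Count opponents bypassed by a pass.

    Assumption: the attacking direction is increasing x, so an opponent is
    'bypassed' if they were goal-side of the passer before the pass but are
    no longer goal-side of the receiver after it.
    """
    if receiver_x <= passer_x:
        # Sideways and backward passes bypass nobody in this simple version.
        return 0
    return sum(1 for ox in opponent_xs if passer_x < ox < receiver_x)


if __name__ == "__main__":
    # Hypothetical x-coordinates (metres from the attacking team's own goal line).
    opponents = [35.0, 50.0, 70.0, 20.0]
    print(packing_rate(30.0, 60.0, opponents))
```

A fuller implementation would work in two dimensions and account for the direction each defender is facing, which this sketch ignores.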

R

  • ggsoccer - a soccer visualisation library in R from Ben Torvaney;
  • worldfootballR - an R package that allows users to extract world football results and player statistics from FBref, and valuations and transfer data from TransferMarkt.com, by Jason Zivkovic; and
  • understatr - an R package to scrape data from Understat.

GitHub Repositories

Python

R

Apps

Video analysis

:page_with_curl: Papers

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

2021

2020

2019

2018

2017

2016

2015

2014

2011

1997

:books: Written Pieces

Many of these blog posts are recommended in Sam Gregory’s Best Football Analytics Pieces piece and Tom Worville’s “What’s the best Football Analytics piece you’ve ever read?”.

:pencil2: Blogs and Data Analytics Websites

Newsletters

:newspaper: News Articles

:vhs: Videos

I have collated over 800 Sports Analytics / Data Science videos into a single YouTube playlist, originally for my own viewing, which may also be useful to you; see [link]. For Football-specific Data Science lectures and seminars, see [link]. For a Tableau Football-specific playlist, see [link].

:man_teacher: Webinars and Lectures

Ted Talks

Documentaries

Match Highlights

Others

:tv: YouTube Channels

:books: Books

Magazines:

:loud_sound: Podcasts

Spotify and YouTube links used where available.

Football Analytics Podcasts

Notable Episodes (including non-football-data-specific podcasts)

:man_technologist: Notable Figures / Twitter Accounts

Career Advice

:spiral_calendar: Events and Conferences

Competitions

Includes non-football competitions.

Courses

:briefcase: Jobs

:key: Key Concepts

References to resources organised by topic.

Expected Goals (xG) Modeling

Videos

For a playlist of Expected Goals related videos available on YouTube, see the following playlist I have created [link].

Webinars and Lectures
Tutorials
Notable Models
Written pieces

For a list of Expected Goals literature collated by Keith Lyons, see the following [link].

Libraries
GitHub repos
Podcasts

Tweets

  • The benefits of including fake data in an Expected Goals model [link].

Tracking data

Possession Value (PV) Frameworks

Expected Threat (xT)

[TO ADD]
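
Until resources are listed here, the following minimal sketch illustrates the usual formulation of xT: the value of a zone is the probability of shooting from it times the probability of scoring, plus the probability of moving the ball times the transition-weighted value of the destination zones, solved by repeated iteration. The three-zone pitch and every probability below are made-up values for illustration only, and the function and parameter names are assumptions rather than the API of any library listed in this README.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transition, iterations=100):
    """Iteratively solve xT(z) = s_z * g_z + m_z * sum_t T[z][t] * xT(t).

    shoot_prob[z]    probability of shooting from zone z
    goal_prob[z]     probability that a shot from zone z scores
    move_prob[z]     probability of moving the ball from zone z
    transition[z][t] probability that a move from z ends in zone t
                     (rows may sum to less than 1; the rest is lost possession)
    """
    n = len(shoot_prob)
    xt = [0.0] * n
    for _ in range(iterations):
        xt = [
            shoot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][t] * xt[t] for t in range(n))
            for z in range(n)
        ]
    return xt


if __name__ == "__main__":
    # Hypothetical zones: 0 = defensive third, 1 = middle third, 2 = attacking third.
    shoot = [0.0, 0.05, 0.30]
    goal = [0.0, 0.02, 0.12]
    move = [1.0, 0.95, 0.70]
    trans = [
        [0.5, 0.3, 0.0],
        [0.2, 0.4, 0.2],
        [0.0, 0.3, 0.4],
    ]
    print([round(v, 4) for v in expected_threat(shoot, goal, move, trans)])
```

Real implementations estimate these probabilities from event data on a much finer grid of pitch zones.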

Valuing Actions by Estimating Probabilities (VAEP)
Goals Added (g+)

Player Similarity Analysis

[TO ADD]

Player Comparison and Similarity Analysis

[TO ADD]

Reinforcement Learning for Football Simulation

:grey_question: Miscellaneous

Credits

Credits to the Soccer Analytics Handbook by Devin Pleuler, the Awesome Soccer Analytics by Matias Mascioto, and Jan Van Haaren’s Soccer Analytics 2020 Review, which were all used to plug gaps in the list once it was published.
