February 13, 2020

ksachdeva/rethinking-tensorflow-probability

Statistical Rethinking (2nd Ed) with Tensorflow Probability

repo name ksachdeva/rethinking-tensorflow-probability
repo link https://github.com/ksachdeva/rethinking-tensorflow-probability
homepage https://ksachdeva.github.io/rethinking-tensorflow-probability/
language Jupyter Notebook
size (curr.) 83276 kB
stars (curr.) 59
created 2020-01-14
license Apache License 2.0

Statistical Rethinking (2nd Edition) with Tensorflow Probability

This repository provides Jupyter notebooks that port various R code fragments found in the chapters of Statistical Rethinking (2nd Edition) by Professor Richard McElreath to Python, using the TensorFlow Probability framework.

Note - These notebooks are based on the 8th December 2019 draft of the book. I will update them once the book is released.

Misc Notes

  • Why TensorFlow Probability? There are many great probabilistic programming frameworks (PPLs) out there. I especially like NumPyro and PyMC3 (and PyMC4). There are two main reasons why I chose to do this exercise in TFP.

    • The first and main reason is to avoid relying on the magic of the libraries. Higher-level libraries sometimes hide details that one needs in order to truly understand the subject. As a matter of fact, working with TFP has made me more appreciative of these high-level libraries: they not only provide great helpers but also make the code easy to read and reuse.
    • The second is that I have other investments in the TensorFlow ecosystem, so I am not keen on switching to PyTorch even though I really like what the Pyro team has done. I am hoping that PyMC4 will be a great alternative.

    For production use, I strongly recommend using one of these higher-level libraries, i.e. NumPyro, PyMC3 or PyMC4.

  • What worked? Well, of course this book is the best there is in this area. The community is also great. I got quick responses from the TensorFlow Probability team whenever I asked questions on the TFP Google group.

  • What was hard? This may be a tad subjective, because I am challenged when it comes to manipulating shapes (high-dimensional arrays). I find NumPy difficult, and TensorFlow is much harder still when it comes to working with multi-dimensional arrays. This is one of the main problems I have faced and continue to face. Another problem is that the stack traces generated by TFP can be really difficult to understand. This is mostly a side effect of graphs, which make debugging difficult. Quite often things would work as long as I used only 1 chain, but working with multiple chains requires paying special attention to the shapes/batches of the various tensors/distributions.

  • Visualization. I used ArviZ, which meant converting the output of the various sampling procedures to the format/structure it requires. This led me to discover and learn xarray. It was well worth doing and made plotting the graphs easy.
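The shape issues described above can be sketched in a few lines. This is only an illustration, not code from the notebooks: it uses plain NumPy as a stand-in for TFP tensors, and the names `log_normal_pdf` and `joint_log_prob` are hypothetical. It assumes the usual conventions that a multi-chain sampler needs one log-probability per chain, that TFP's MCMC output puts the draw axis first, and that ArviZ expects the chain axis first.

```python
import numpy as np


def log_normal_pdf(x, loc, scale):
    """Elementwise log density of Normal(loc, scale); the inputs broadcast."""
    return (-0.5 * ((x - loc) / scale) ** 2
            - np.log(scale)
            - 0.5 * np.log(2.0 * np.pi))


def joint_log_prob(mu, data, scale=1.0):
    """Return one log-likelihood value per chain (hypothetical helper).

    mu   : shape [num_chains], one candidate mean per chain
    data : shape [num_obs]
    """
    mu = np.asarray(mu, dtype=float)
    data = np.asarray(data, dtype=float)
    # data[:, None] is [num_obs, 1] and mu[None, :] is [1, num_chains],
    # so the broadcast result is [num_obs, num_chains].
    per_obs = log_normal_pdf(data[:, None], mu[None, :], scale)
    # Reduce over observations only; the chain axis must survive.
    return per_obs.sum(axis=0)


data = np.array([0.5, 1.0, 1.5])
chain_states = np.array([0.0, 1.0, 2.0, 3.0])  # 4 chains
lp = joint_log_prob(chain_states, data)
print(lp.shape)  # (4,)

# Sampler-style output laid out as [num_draws, num_chains] ...
num_draws, num_chains = 5, 4
draws_first = np.arange(num_draws * num_chains, dtype=float).reshape(
    num_draws, num_chains)
# ... swapped to the [num_chains, num_draws] layout that ArviZ expects.
chains_first = np.swapaxes(draws_first, 0, 1)
print(chains_first.shape)  # (4, 5)
```

With a single chain, `mu` is a scalar and a careless `sum()` over everything still gives the right answer, which is why single-chain code often appears to work until a second chain is added.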

Work Remaining/Pending/TODO

There are a few code cells in various notebooks that are still not working. I do plan to investigate and fix/finish them. Chapter 14 in particular is not working. Any help is appreciated.

In the majority of the chapters, the book uses quadratic approximation (quap), whereas I have used HMC everywhere. I plan to change this as well by implementing the quadratic (Laplace) approximation.
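For readers unfamiliar with the idea, a quadratic approximation finds the posterior mode and fits a Gaussian whose standard deviation comes from the curvature of the log posterior at that mode. The following is a minimal one-parameter sketch in pure Python, not the planned TFP implementation. It assumes the book's globe-tossing data (6 water in 9 tosses) with a flat prior, and every function name in it is hypothetical.

```python
import math


def log_posterior(p, water=6, tosses=9):
    """Unnormalised log posterior of p: binomial likelihood, flat prior."""
    return water * math.log(p) + (tosses - water) * math.log(1.0 - p)


def first_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def quadratic_approximation(f, start=0.5, tol=1e-10, max_iter=100):
    """Locate the mode with Newton's method, then read the sd off the curvature."""
    x = start
    for _ in range(max_iter):
        step = first_derivative(f, x) / second_derivative(f, x)
        x -= step
        if abs(step) < tol:
            break
    # For a Gaussian, the second derivative of the log density is -1 / sd**2.
    sd = math.sqrt(-1.0 / second_derivative(f, x))
    return x, sd


mode, sd = quadratic_approximation(log_posterior)
print(round(mode, 2), round(sd, 2))  # 0.67 0.16
```

A multi-parameter version would replace the scalar second derivative with the Hessian matrix and invert it to obtain a covariance matrix; that part is not shown here.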

Chapters

If you prefer a read-only view of the notebooks (HTML pages), use this link - https://ksachdeva.github.io/rethinking-tensorflow-probability/

If you want to run the notebooks locally -

# install the requirements
pip install -r requirements.txt
# install jupyter in your virtual environment
pip install -r requirements-extra.txt

If you prefer to run the notebooks in Binder, click the Binder badge in the repository README.

Clicking on the links will open the notebooks in Google Colab

Acknowledgements

My immense gratitude goes to Professor Richard McElreath for writing such a wonderful book. His method of teaching has made the somewhat difficult subject of Bayesian statistics approachable, interesting and, to some extent, fun as well. We need more educators like you, Sir!

Another person I want to thank is Du Phan (https://github.com/fehiepsi). He is the main author of NumPyro, a great framework for Bayesian analysis. He has ported Statistical Rethinking (2nd Ed) to NumPyro, and his notebooks were not only inspirational but also a great help to me in creating graphs. I borrowed most of his code fragments when it came to plotting the figures using matplotlib.
