November 12, 2020

bradleyboehmke/data-science-learning-resources

A collection of machine learning resources that I've found helpful (I only post what I've read!)


repo name	bradleyboehmke/data-science-learning-resources
repo link	https://github.com/bradleyboehmke/data-science-learning-resources
homepage
language
size (curr.)	27505 kB
stars (curr.)	409
created	2018-11-28
license

Data Science Learning Resources

Programming

General

The Pragmatic Programmer (Book)
Clean Code (Book)

Python

R

R for Data Science (Book)
Advanced R (Book)
R Markdown: The Definitive Guide (Book)
bookdown: Authoring Books and Technical Documents with R Markdown (Book)
Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)
Automated Data Collection with R (Book)
Introduction to Data Science (Book)

Spark

Code Packaging

Style Guide, Readability, Best Practices

The Art of Readable Code (Book)
The Tidyverse Style Guide (Online book)
PEP 8 – Style Guide for Python Code (Online guide)
Guidelines for code reviews (README)
Code Review Best Practices (Blog post)

Testing

Testing R Code (Book)
Python Testing with pytest (Book)
Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)
Test-Driven Development (Book)

Machine Learning

General

Introduction to Statistical Learning (Book)
Applied Predictive Modeling (Book)
Elements of Statistical Learning (Book)
Computer Age Statistical Inference (Book)
Statistical Modeling: The Two Cultures (Paper)
Deep Learning (Book)
Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)
Hands-On Machine Learning with R (Book)
Google’s Machine Learning Crash Course (MOOC)

Unsupervised Modeling

ISLR: Ch. 10.3 Clustering Methods (Book chapter)
A K-Means Clustering Algorithm (Paper)
Generalized Low Rank Models (Paper)
Deep Learning Ch. 15 Autoencoders (Book chapter)
Hands-On Machine Learning with Scikit-Learn Ch. 15 Autoencoders (Book chapter | GitHub resource)
Sparse autoencoder (Andrew Ng CS294A lecture notes)

A/B Testing

Lessons from Running Thousands of A/B Tests (Online presentation with many references)
Online Controlled Experiments at Large Scale (Paper)
Peeking at A/B Tests (Paper)
Multi-armed Bandit (Online tutorial)
A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
Evaluating Retrieval Performance using Clickthrough Data (Paper)

Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines (Friedman’s original paper)
APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
Notes on the earth package (Paper)

K-Nearest Neighbor

k-Nearest neighbour classifiers (Paper)
APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)

Random Forests

Gradient Boosting Machines

How to explain gradient boosting (Online tutorial)
Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
Trevor Hastie - Data Science of GBM (2013) (slides)
Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)

Deep Learning

Deep Learning with R (Book)
Deep Learning with Python (Book)
Deep Learning Specialization (MOOC)
keras.rstudio.com (Online articles & tutorials)
blogs.rstudio.com/tensorflow (Online articles & tutorials)
Illustrated Guide to Recurrent Neural Networks (Blog)
Illustrated Guide on Vanishing Gradients (Blog)
Illustrated Guide to LSTMs and GRUs (Blog)
Understanding LSTMs (Blog)
Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)
The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)
Revisiting Small Batch Training for Deep Neural Networks (Paper)
On Loss Functions for Deep Neural Networks in Classification (Paper)
Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
Efficient BackProp (Paper)
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)
Cyclical Learning Rates for Training Neural Networks (Paper)
A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)

Ensembles / Model Stacking / Super Learners

Ensemble Methods in Machine Learning (Paper)
Stacked Regressions (Paper)
Super Learner (Paper)

Natural Language Processing / Text Mining

Text Mining with R (Book)
Probabilistic Topic Models (Paper)
The Illustrated Word2vec (Online tutorial)
Sebastian Ruder’s series on Word Embeddings (Online articles & tutorials)
Neural Models for Information Retrieval (Paper)
Why do we use word embeddings in NLP? (Blog)

Tuning

Feature Engineering

Feature Selection

Feature Selection with the Boruta Package (Paper)
APM: Ch. 19 An Introduction to Feature Selection (Book chapter)

Machine Learning Interpretability

Auto ML

Benchmarking

The Design and Analysis of Benchmark Experiments (Paper)
Szilard Pafka’s ML Benchmarking Research (GitHub resources)
Data-driven advice for applying machine learning to bioinformatics problems (Paper)

Resampling Procedures

Productionalization

150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
Hidden Technical Debt in Machine Learning Systems (Paper)
Deep Learning in Production (GitHub resources)

Leadership & Strategy

