bradleyboehmke/data-science-learning-resources
A collection of machine learning resources that I've found helpful (I only post what I've read!)
repo name | bradleyboehmke/data-science-learning-resources |
repo link | https://github.com/bradleyboehmke/data-science-learning-resources |
homepage | |
language | |
size (curr.) | 27505 kB |
stars (curr.) | 409 |
created | 2018-11-28 |
license | |
Data Science Learning Resources
Programming
General
- The Pragmatic Programmer (Book)
- Clean Code (Book)
Python
- A Whirlwind Tour of Python (Book)
- Python Data Science Handbook (Book)
- Python Tricks (Book)
- Learning Python (Book)
- Effective Python (Book)
R
- R for Data Science (Book)
- Advanced R (Book)
- R Markdown: The Definitive Guide (Book)
- bookdown: Authoring Books and Technical Documents with R Markdown (Book)
- Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving (Book)
- Automated Data Collection with R (Book)
- Introduction to Data Science (Book)
Spark
- Spark: The Definitive Guide: Big Data Processing Made Simple (Book)
- Learning Spark: Lightning-Fast Big Data Analysis (Book)
- Mastering Spark with R: The Complete Guide to Large-Scale Analysis and Modeling (Book)
Command Line
- The Missing Semester of Your CS Education (Online course)
- Learning the bash Shell (Book)
- The Art of the Command Line (GitHub resources)
- explainshell.com (Online help)
Containers
- Docker tips & tricks or just useful commands (Online article)
- Rocker: R configurations for Docker (GitHub resources)
- Docker and Python: making them play nicely and securely for Data Science and ML (PyCon Talk)
Functional Programming
- An Introduction to the Basic Principles of Functional Programming (Online article)
- R for Data Science, Ch. 21 (Book)
- Advanced R, Ch. 9 (Book)
- Jenny Bryan’s purrr tutorials (Online tutorial)
- Foundations of Functional Programming with purrr (DataCamp)
- Intermediate Functional Programming with purrr (DataCamp)
Version Control
- Excuse me, do you have a moment to talk about version control? (Paper)
- Happy Git and GitHub for the useR (Book)
- Learn Git (Online tutorial)
- Git Commit Message Style Guide (Online guide)
Code Packaging
Style Guide, Readability, Best Practices
- The Art of Readable Code (Book)
- The Tidyverse Style Guide (Online book)
- PEP 8 – Style Guide for Python Code (Online guide)
- Guidelines for code reviews (README)
- Code Review Best Practices (Blog post)
Testing
- Testing R Code (Book)
- Python Testing with pytest (Book)
- Multiply your Testing Effectiveness with Parameterized Testing (PyCon Talk)
- Test-Driven Development (Book)
Machine Learning
General
- Introduction to Statistical Learning (Book)
- Applied Predictive Modeling (Book)
- Elements of Statistical Learning (Book)
- Computer Age Statistical Inference (Book)
- Statistical Modeling: The Two Cultures (Paper)
- Deep Learning (Book)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow (Book | GitHub)
- Hands-On Machine Learning with R (Book)
- Google’s Machine Learning Crash Course (MOOC)
Unsupervised Modeling
- ISLR: Ch. 10.3 Clustering Methods (Book chapter)
- A K-Means Clustering Algorithm (Paper)
- Generalized Low Rank Models (Paper)
- Deep Learning Ch. 14 Autoencoders (Book chapter)
- Hands-On Machine Learning with Scikit-Learn & TensorFlow: Ch. 15 Autoencoders (Book chapter | GitHub resource)
- Sparse autoencoder (Andrew Ng CS294A lecture notes)
A/B Testing
- Lessons from Running Thousands of A/B Tests (Online presentation with many references)
- Online Controlled Experiments at Large Scale (Paper)
- Peeking at A/B Tests (Paper)
- Multi-armed Bandit (Online tutorial)
- A Modern Bayesian Look at the Multi-armed Bandit (Paper behind above online tutorial)
- Predicting Search Satisfaction Metrics with Interleaved Comparisons (Paper)
- Evaluating Retrieval Performance using Clickthrough Data (Paper)
Multivariate Adaptive Regression Splines
- Multivariate Adaptive Regression Splines (Friedman’s original paper)
- APM: Ch. 7.2 Multivariate Adaptive Regression Splines (Book chapter)
- ESL: Ch. 9.4 Multivariate Adaptive Regression Splines (Book chapter)
- Notes on the earth package (Paper)
K-Nearest Neighbor
- k-Nearest neighbour classifiers (Paper)
- APM: Ch. 7.4 & 13.5 K-Nearest Neighbors (Book chapter)
- ESL: Ch. 13.3 k-Nearest-Neighbor Classifiers (Book chapter)
Random Forests
- An Introduction to Recursive Partitioning Using the RPART Routines (Paper)
- Random Forests - Leo Breiman’s original research paper (Paper)
Gradient Boosting Machines
- How to explain gradient boosting (Online tutorial)
- Trevor Hastie - Gradient Boosting & Random Forests at H2O World 2014 (YouTube)
- Trevor Hastie - Data Science of GBM (2013) (slides)
- Mark Landry - Gradient Boosting Method and Random Forest at H2O World 2015 (YouTube)
- Peter Prettenhofer - Gradient Boosted Regression Trees in scikit-learn at PyData London 2014 (YouTube)
- Alexey Natekin and Alois Knoll - Gradient boosting machines, a tutorial (Paper)
Deep Learning
- Deep Learning with R (Book)
- Deep Learning with Python (Book)
- Deep Learning Specialization (MOOC)
- keras.rstudio.com (Online articles & tutorials)
- blogs.rstudio.com/tensorflow (Online articles & tutorials)
- Illustrated Guide to Recurrent Neural Networks (Blog)
- Illustrated Guide on Vanishing Gradients (Blog)
- Illustrated Guide to LSTMs and GRUs (Blog)
- Understanding LSTMs (Blog)
- Rohan & Lenny: Recurrent Neural Networks & LSTMs (Blog)
- The Unreasonable Effectiveness of Recurrent Neural Networks (Blog)
- Revisiting Small Batch Training for Deep Neural Networks (Paper)
- On Loss Functions for Deep Neural Networks in Classification (Paper)
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper)
- Efficient BackProp (Paper)
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (Paper)
- Cyclical Learning Rates for Training Neural Networks (Paper)
- A Disciplined Approach to Neural Network Hyperparameters: Part 1 – Learning Rate, Batch Size, Momentum, and Weight Decay (Paper)
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (Paper)
Ensembles / Model Stacking / Super Learners
- Ensemble Methods in Machine Learning (Paper)
- Stacked Regressions (Paper)
- Super Learner (Paper)
Natural Language Processing / Text Mining
- Text Mining with R (Book)
- Probabilistic Topic Models (Paper)
- The Illustrated Word2vec (Online tutorial)
- Sebastian Ruder’s series on Word Embeddings (Online articles & tutorials)
- Neural Models for Information Retrieval (Paper)
- Why do we use word embeddings in NLP? (Blog)
Tuning
- Hyperparameters and Tuning Strategies for Random Forest (Paper)
- Tunability: Importance of Hyperparameters of Machine Learning Algorithms (Paper)
- Machine Learning Benchmarks and Random Forest Regression (Paper)
- Random Search for Hyperparameter Optimization (Paper)
Feature Engineering
- Feature Engineering for Machine Learning (Book)
- Feature Engineering and Selection: A Practical Approach for Predictive Models (Book)
Feature Selection
- Feature Selection with the Boruta Package (Paper)
- APM: Ch. 19 An Introduction to Feature Selection (Book chapter)
Machine Learning Interpretability
- Scott Lundberg’s presentation on SHAP
- H2O.ai Machine Learning Interpretability Resources (GitHub resources)
- Patrick Hall’s Awesome Machine Learning Interpretability Resources (GitHub resources)
- Interpretable Machine Learning (Book)
- Visualizing the Feature Importance for Black Box Models (Paper)
- A Simple and Effective Model-Based Variable Importance Measure (Paper)
- Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation (Paper)
- pdp: An R Package for Constructing Partial Dependence Plots (Paper)
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier (Paper)
- A Unified Approach to Interpreting Model Predictions (Paper)
- Consistent Individualized Feature Attribution for Tree Ensembles (Paper)
- On the Art and Science of Machine Learning Explanations (Paper)
- Explanation in artificial intelligence: Insights from the social sciences (Paper)
- Please Stop Permuting Features: An Explanation and Alternatives (Paper)
- A Stratification Approach to Partial Dependence for Codependent Variables (Paper)
- Explaining Machine Learning Classifiers through Diverse Counterfactual Examples (Paper)
Auto ML
- A Review of Automatic Selection Methods for Machine Learning Algorithms and Hyperparameter Values (Paper)
- Learning Multiple Defaults for Machine Learning Algorithms (Paper)
Benchmarking
- The Design and Analysis of Benchmark Experiments (Paper)
- Szilard Pafka’s ML Benchmarking Research (GitHub resources)
- Data-driven advice for applying machine learning to bioinformatics problems (Paper)
Resampling Procedures
- Futility Analysis in the Cross-Validation of Machine Learning Models (Paper)
- Estimating Classification Error Rate: Repeated Cross-validation, Repeated Hold-out, and Bootstrap (Paper)
Productionalization
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper)
- Hidden Technical Debt in Machine Learning Systems (Paper)
- Deep Learning in Production (GitHub resources)