February 7, 2020

# davidrosenberg/mlcourse

Machine learning course materials.

• Repo name: davidrosenberg/mlcourse
• Homepage: https://davidrosenberg.github.io/ml2019
• Language: Jupyter Notebook
• Size (current): 356314 kB
• Stars (current): 314
• Created: 2015-10-11

## Notable Changes from 2017FOML to 2018

• Elaborated on the case against sparsity in the lecture on elastic net, to complement the reasons for sparsity on the slide “Lasso Gives Feature Sparsity: So What?”.
• Added a note on conditional expectations, since many students find the notation confusing.
• Added a note on the correlated features theorem for elastic net, which is essentially a translation of Zou and Hastie’s 2005 paper “Regularization and variable selection via the elastic net” into the notation of our class, dropping an unnecessary centering condition and using a more standard definition of correlation.
• Changes to the EM algorithm presentation: added several diagrams (slides 10-14) to give the general idea of a variational method, and made explicit that the marginal log-likelihood is exactly the pointwise supremum over the variational lower bounds (slides 31 and 32).
• Treatment of the representer theorem is now well before any mention of kernels, and is described as an interesting consequence of basic linear algebra: “Look how the solution always lies in the subspace spanned by the data. That’s interesting (and obvious with enough practice). We can now constrain our optimization problem to this subspace…”
• The kernel methods lecture was rewritten to significantly reduce references to the feature map. When we’re just talking about kernelization, it seems like unneeded extra notation.
• Replaced the 1-hour crash course in Lagrangian duality with a 10-minute summary of Lagrangian duality, which I actually never presented and left as optional reading.
• Added a brief note on Thompson sampling for Bernoulli Bandits as a fun application for our unit on Bayesian statistics.
• Significant improvement of the programming problem for lasso regression in Homework #2.
• New written and programming problems on logistic regression in Homework #5 (showing the equivalence of the ERM and the conditional probability model formulations, as well as implementing regularized logistic regression).
• New homework (Homework #7) on backpropagation (with Philipp Meerkamp and Pierre Garapon).
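As a companion to the Thompson sampling note mentioned above, here is a minimal sketch (not from the course materials) of Thompson sampling for Bernoulli bandits with Beta(1, 1) priors; the arm probabilities, horizon, and seed below are illustrative assumptions:

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    Each arm's posterior over its success probability is
    Beta(successes + 1, failures + 1); each round we draw one sample per
    posterior and pull the arm whose sample is largest.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(horizon):
        # Sample a plausible success probability for each arm from its posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Pull the chosen arm and update its posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes, failures

# Three hypothetical arms; the sampler should concentrate pulls on arm 2.
succ, fail = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
pulls = [s + f for s, f in zip(succ, fail)]
best_arm = max(range(3), key=lambda a: pulls[a])
```

The appeal for a Bayesian statistics unit is that exploration falls out of posterior sampling: early on the Beta posteriors are wide and all arms get tried, and as counts accumulate the posteriors concentrate and pulls shift to the best arm.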

## Notable Changes from 2016 to 2017

• New lecture on geometric approach to SVMs (Brett)
• New lecture on principal component analysis (Brett)
• Added slide on k-means++ (Brett)
• Added slides on explicit feature vector for 1-dim RBF kernel
• Created notebook to regenerate the buggy lasso/elastic net plots from Hastie’s book (Vlad)
• An L2 constraint for linear models gives Lipschitz continuity of the prediction function (Thanks to Brian Dalessandro for pointing this out to me).
• Expanded discussion of L1/L2/ElasticNet with correlated random variables (Thanks Brett for the figures)
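To illustrate the slides on the explicit feature vector for the 1-dim RBF kernel: the Taylor expansion of exp(xy/σ²) gives an infinite-dimensional feature map whose truncation already matches the kernel closely. This sketch (my own, not from the slides) checks the truncated inner product against the exact kernel:

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Exact 1-dim RBF kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2))."""
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

def rbf_features(x, sigma=1.0, degree=20):
    """Truncated explicit feature vector for the 1-dim RBF kernel.

    From k(x, y) = exp(-x^2/2s^2) exp(-y^2/2s^2) exp(xy/s^2) and the Taylor
    series of the last factor, the n-th feature is
    exp(-x^2 / (2 sigma^2)) * x^n / (sigma^n * sqrt(n!)).
    """
    return [math.exp(-x ** 2 / (2 * sigma ** 2)) * x ** n
            / (sigma ** n * math.sqrt(math.factorial(n)))
            for n in range(degree)]

def approx_kernel(x, y, sigma=1.0, degree=20):
    """Inner product of truncated feature vectors: approximates k(x, y)."""
    fx = rbf_features(x, sigma, degree)
    fy = rbf_features(y, sigma, degree)
    return sum(a * b for a, b in zip(fx, fy))

exact = rbf_kernel(0.7, -0.3)
approx = approx_kernel(0.7, -0.3)
```

Because the series terms decay like (xy/σ²)ⁿ/n!, even a degree-20 truncation agrees with the exact kernel to high precision for inputs of moderate size.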

## Possible Future Topics

### Basic Techniques

• Gaussian processes
• MCMC (or at least Gibbs sampling)
• Importance sampling
• Density ratio estimation (for covariate shift, anomaly detection, conditional probability modeling)
• Local methods (knn, locally weighted regression, etc.)

### Applications

• Collaborative filtering / matrix factorization (building on this lecture on matrix factorization and Brett’s lecture on PCA)
• Learning to rank and associated concepts
• Bandits / learning from logged data?
• Generalized additive models for interpretable nonlinear fits (smoothing way, basis function way, and gradient boosting way)
• Automated hyperparameter search (with GPs, random, hyperband,…)
• Active learning
• Domain shift / covariate shift adaptation
• Reinforcement learning (minimal path to REINFORCE)

#### Latent Variable Models

• PPCA / Factor Analysis and non-Gaussian generalizations
• Personality types as an example of factor analysis, if we can get data?
• Variational Autoencoders
• Latent Dirichlet Allocation / topic models
• Generative models for images and text (where we care about the human-perceived quality of what’s generated rather than the likelihood given to test examples) (GANs and friends)

#### Bayesian Models

• Relevance vector machines
• BART
• Gaussian process regression and conditional probability models

### Other

• Class imbalance
• Black box feature importance measures (building on Ben’s 2018 lecture)
• Quantile regression and conditional prediction intervals (perhaps integrated into homework on loss functions)
• More depth on basic neural networks: weight initialization, vanishing / exploding gradient, possibly batch normalization
• Finish up ‘structured prediction’ with beam search / Viterbi
• Give the probabilistic analogue with MEMMs/CRFs
• Generative vs discriminative (Ng & Jordan’s naive Bayes vs logistic regression, plus new experiments including regularization)
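For the quantile regression idea above (as a homework on loss functions), the key fact is that minimizing the pinball loss with a constant prediction recovers the sample quantile. A minimal sketch, with function names and data of my own choosing:

```python
def pinball_loss(y_true, y_pred, tau):
    """Pinball (quantile) loss for target quantile tau in (0, 1).

    Charges tau per unit of under-prediction and (1 - tau) per unit of
    over-prediction, so the minimizer is pulled toward the tau-quantile.
    """
    r = y_true - y_pred
    return tau * r if r >= 0 else (tau - 1) * r

def best_constant(ys, tau, candidates):
    """Constant prediction minimizing average pinball loss over candidates."""
    return min(candidates, key=lambda c: sum(pinball_loss(y, c, tau) for y in ys))

# On the data 1..100, the pinball minimizer lands at the empirical quantile.
ys = list(range(1, 101))
median_hat = best_constant(ys, 0.5, ys)  # near the median
q90_hat = best_constant(ys, 0.9, ys)     # near the 90th percentile
```

Fitting a model to minimize pinball loss at several values of tau then yields conditional prediction intervals, e.g. the tau = 0.05 and tau = 0.95 fits bracket a 90% interval.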