pdeguzman96/data_cleaning_workshop
Created the contents of this repo originally for a workshop I gave at UCLA
repo name | pdeguzman96/data_cleaning_workshop |
repo link | https://github.com/pdeguzman96/data_cleaning_workshop |
homepage | |
language | Jupyter Notebook |
size (curr.) | 5904 kB |
stars (curr.) | 63 |
created | 2020-01-02 |
license | |
Machine Learning Project Checklist
Summary: This checklist was created to help ML students/practitioners structure their projects and problems in a way that makes sense to me.
When I just got started learning Python for Machine Learning and worked on my first few projects, I found it very overwhelming because…
- it was difficult to remember all of the steps I needed to take in order to make my data ML-friendly,
- I couldn’t easily remember the functions, methods, and estimators from pandas, numpy, and sklearn, and
- it was tedious and time-consuming to try to understand large (>50 feature) datasets
So, I created the ML checklist (Pictured Below) to be a handy tool for whenever I start to feel lost creating an ML project.
In this repo, I also created…
ml_project_checklist_template.ipynb
: (Pictured below) a Jupyter .ipynb that you can use as a template for your project or Kaggle competitiondata_cleaning_for_ml_lab_EXERCISES.ipynb
: An exercises/lab that you can finish for data cleaning practice, originally made for a workshop that I gavedata_cleaning_for_ml_lab_SOLUTIONS.ipynb
: A solutions file for the exercises I give aboveboston.csv
andcambridge.csv
: Airbnb datasets from here used for the exercises- I also included a PDF version of the checklist.
I hope you find these resources as useful as I do!
Happy learning :).