interpretml/interpret
Fit interpretable machine learning models. Explain blackbox machine learning.
| | |
|---|---|
| repo name | interpretml/interpret |
| repo link | https://github.com/interpretml/interpret |
| homepage | |
| language | C++ |
| size (curr.) | 3049 kB |
| stars (curr.) | 2399 |
| created | 2019-05-03 |
| license | MIT License |
InterpretML - Alpha Release
In the beginning machines learned in darkness, and data scientists struggled in the void to explain them.
Let there be light.
InterpretML is an open-source Python package for training interpretable machine learning models and explaining blackbox systems. Interpretability is essential for:
- Model debugging - Why did my model make this mistake?
- Detecting bias - Does my model discriminate?
- Human-AI cooperation - How can I understand and trust the model’s decisions?
- Regulatory compliance - Does my model satisfy legal requirements?
- High-risk applications - Healthcare, finance, judicial, …
Historically, the most interpretable machine learning models were not very accurate, and the most accurate models were not very interpretable. Microsoft Research has developed an algorithm called the Explainable Boosting Machine (EBM)* which has both high accuracy and interpretability. EBM uses modern machine learning techniques like bagging and boosting to breathe new life into traditional GAMs (Generalized Additive Models). This makes them as accurate as random forests and gradient boosted trees, and also enhances their intelligibility and editability.
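Concretely, EBMs fit the standard additive model form. The sketch below uses textbook GAM/GA2M notation rather than anything implementation-specific: g is the link function (e.g. logit for classification), and each f_i is a learned per-feature shape function that can be plotted and inspected directly.

```latex
% GAM: the prediction is a sum of per-feature shape functions f_i.
g(\mathbb{E}[y]) = \beta_0 + \sum_i f_i(x_i)

% GA2M extends this with a small set of pairwise interaction terms:
g(\mathbb{E}[y]) = \beta_0 + \sum_i f_i(x_i) + \sum_{(i,j)} f_{ij}(x_i, x_j)
```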
Notebook for reproducing table
| Dataset/AUROC | Domain | Logistic Regression | Random Forest | XGBoost | Explainable Boosting Machine |
|---|---|---|---|---|---|
| Adult Income | Finance | .907±.003 | .903±.002 | .922±.002 | .928±.002 |
| Heart Disease | Medical | .895±.030 | .890±.008 | .870±.014 | .916±.010 |
| Breast Cancer | Medical | .995±.005 | .992±.009 | .995±.006 | .995±.006 |
| Telecom Churn | Business | .804±.015 | .824±.002 | .850±.006 | .851±.005 |
| Credit Fraud | Security | .979±.002 | .950±.007 | .981±.003 | .975±.005 |
In addition to EBM, InterpretML also supports methods like LIME, SHAP, linear models, partial dependence, decision trees, and rule lists. The package makes it easy to compare and contrast models to find the best one for your needs.
* EBM is a fast implementation of GA2M. Details on the algorithm can be found here.
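As a sketch of how one of the blackbox explainers mentioned above is invoked: here, `blackbox_model`, `X_train`, `X_test`, and `y_test` are placeholders for your own fitted model and data, and the `LimeTabular` constructor arguments follow the alpha-era examples, so they may differ in later releases.

```python
from interpret.blackbox import LimeTabular
from interpret import show

# Blackbox explainers need a predict function, and optionally a dataset.
lime = LimeTabular(predict_fn=blackbox_model.predict_proba, data=X_train)

# Pick a few instances to explain; labels are optional.
lime_local = lime.explain_local(X_test[:5], y_test[:5])
show(lime_local)
```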
Installation
Python 3.5+ | Linux, Mac OS X, Windows
```sh
pip install -U interpret
```
Getting Started
Let’s fit an Explainable Boosting Machine
```python
from interpret.glassbox import ExplainableBoostingClassifier

ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)   # X_train, y_train: your training features and labels
# EBM supports pandas dataframes, numpy arrays, and handles "string" data natively.
```
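The snippet above assumes `X_train` and `y_train` already exist. One way to create them for a quick demo, assuming scikit-learn is available (it is installed alongside interpret):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Any tabular dataset works; this built-in one is just for a quick try-out.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
```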
Understand the model
```python
from interpret import show

ebm_global = ebm.explain_global()
show(ebm_global)
```
Understand individual predictions
```python
ebm_local = ebm.explain_local(X_test, y_test)
show(ebm_local)
```
And if you have multiple models, compare them
```python
# Each element is an explanation object (e.g., from explain_global()).
show([logistic_regression, decision_tree])
```
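As a sketch of how those explanation objects could be produced, using the scikit-learn-style glassbox wrappers that interpret ships (the variable names below simply mirror the call above):

```python
from interpret.glassbox import LogisticRegression, ClassificationTree
from interpret import show

# Fit two more glassbox models on the same training data.
lr = LogisticRegression()
lr.fit(X_train, y_train)
tree = ClassificationTree()
tree.fit(X_train, y_train)

# Compare their global explanations side by side in the dashboard.
logistic_regression = lr.explain_global()
decision_tree = tree.explain_global()
show([logistic_regression, decision_tree])
```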
Example Notebooks
- Interpretable machine learning models for binary classification
- Interpretable machine learning models for regression
- Blackbox interpretability for binary classification
- Blackbox interpretability for regression
Roadmap
Currently we’re working on:
- R language interface (work in progress: basic EBM classification is available via the ebm_classify and ebm_predict_proba functions, though predictions are currently slightly less accurate than in Python; plotting is not yet included, but generic R plotting tools can do a basic job of visualizing EBM models)
- Missing Values Support
- Improved Categorical Encoding
- Interaction effect purification (see citations for details)
…and lots more! Get in touch to find out more.
Contributing
If you are interested in contributing directly to the code base, please see CONTRIBUTING.md.
Acknowledgements
InterpretML was originally created by Samuel Jenkins, Harsha Nori, Paul Koch, and Rich Caruana (equal contributions).
Many people have supported us along the way. Check out ACKNOWLEDGEMENTS.md!
We also build on top of many great packages. Please check them out!
plotly | dash | scikit-learn | lime | shap | salib | skope-rules | treeinterpreter | gevent | joblib | pytest | jupyter
Citations
Paper link
External links
- A gentle introduction to GA2Ms, a white box model
- On Model Explainability: From LIME, SHAP, to Explainable Boosting
- Benchmarking and MLI experiments on the Adult dataset
- Dealing with Imbalanced Data (Mortgage loans defaults)
- Kaggle PGA Tour analysis by GAM
- Interpretable Prediction of Goals in Soccer
- Explaining Model Pipelines With InterpretML
- Explain Your Model with Microsoft’s InterpretML
Contact us
There are multiple ways to get in touch:
- Email us at interpret@microsoft.com
- Or, feel free to raise a GitHub issue
If a tree fell in your random forest, would anyone notice?