June 4, 2020


microsoft/hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.


repo name	microsoft/hummingbird
repo link	https://github.com/microsoft/hummingbird
homepage
language	Python
size (curr.)	1281 kB
stars (curr.)	1091
created	2020-03-12
license	MIT License

Hummingbird

Introduction

Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to seamlessly leverage neural network frameworks (such as PyTorch) to accelerate traditional ML models. Thanks to Hummingbird, users can benefit from: (1) all the current and future optimizations implemented in neural network frameworks; (2) native hardware acceleration; (3) a single platform that supports both traditional and neural network models; and (4) all of this without having to re-engineer their models.

Currently, you can use Hummingbird to convert your trained traditional ML models into PyTorch. Hummingbird supports a variety of tree-based classifiers and regressors. These models include scikit-learn Decision Trees and Random Forest, and also LightGBM and XGBoost Classifiers/Regressors. Support for other neural network backends (e.g., ONNX, TVM) and models is on our roadmap.
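To make the idea of "compiling a tree into tensor computations" more concrete, here is a minimal NumPy sketch that evaluates one tiny, hand-written decision tree using only matrix multiplications and comparisons. It is an illustration under stated assumptions, not necessarily how Hummingbird implements the conversion: the tree, the matrix names (A, B, C, D, E) and both helper functions are hypothetical and chosen for this example only.

```python
import numpy as np

# Hypothetical tree used only for illustration:
#   node 0: x[0] < 0.5 ? go to node 1 : leaf 2 (class 0)
#   node 1: x[1] < 0.3 ? leaf 0 (class 0) : leaf 1 (class 1)

# A[f, i] = 1 if internal node i tests feature f
A = np.array([[1, 0],
              [0, 1]], dtype=np.float32)
# B[i] = threshold of internal node i
B = np.array([0.5, 0.3], dtype=np.float32)
# C[i, l] = 1 if leaf l is in the left subtree of node i,
#          -1 if it is in the right subtree, 0 if it is not below node i
C = np.array([[1, 1, -1],
              [1, -1, 0]], dtype=np.float32)
# D[l] = number of "go left" decisions on the path from the root to leaf l
D = np.array([2, 1, 0], dtype=np.float32)
# E[l, c] = 1 if leaf l predicts class c
E = np.array([[1, 0],
              [0, 1],
              [1, 0]], dtype=np.float32)


def predict_tensor(X):
    """Evaluate the tree for a whole batch with matrix operations only."""
    went_left = (X @ A < B).astype(np.float32)           # one decision per internal node
    leaf_one_hot = ((went_left @ C) == D).astype(np.float32)  # exactly one leaf matches
    return (leaf_one_hot @ E).argmax(axis=1)


def predict_reference(X):
    """The same tree written as ordinary branching code, for comparison."""
    labels = []
    for x in X:
        if x[0] < 0.5:
            labels.append(0 if x[1] < 0.3 else 1)
        else:
            labels.append(0)
    return np.array(labels)


X_small = np.array([[0.2, 0.1], [0.2, 0.9], [0.8, 0.1], [0.8, 0.9]], dtype=np.float32)
print(predict_tensor(X_small))
print(predict_reference(X_small))
```

Because the batched version contains no data-dependent branching, it maps naturally onto the dense tensor operations that neural network frameworks and accelerators are optimized for.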

Installation

Hummingbird was tested on Python >= 3.5 on Linux, Windows and macOS machines. It is recommended to use a virtual environment (see the python3 venv doc or Using Python environments in VS Code).

Install the Hummingbird package:

pip install hummingbird-ml

If you require the optional dependencies lightgbm and xgboost, you can use:

pip install hummingbird-ml[extra]
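After installing, one quick way to see whether the optional dependencies are available is to ask Python whether it can locate them. The sketch below uses only the standard library; the helper name optional_dependency_status is hypothetical and not part of Hummingbird.

```python
from importlib.util import find_spec

# Import names of the optional dependencies mentioned above
OPTIONAL_PACKAGES = ("lightgbm", "xgboost")


def optional_dependency_status(packages=OPTIONAL_PACKAGES):
    """Map each package name to True if it can be found in this environment."""
    return {name: find_spec(name) is not None for name in packages}


status = optional_dependency_status()
for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```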

See also Troubleshooting for common problems.

Examples

See the notebooks section for examples that demonstrate use and speedups.

In general, Hummingbird syntax is intuitive and minimal. To run your traditional ML model on a DNN framework, you only need to import convert from hummingbird.ml and call convert(model, 'dnn_framework') in your code. Below is an example using a scikit-learn random forest model with PyTorch as the target framework.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from hummingbird.ml import convert

# Create some random data for binary classification
num_classes = 2
X = np.array(np.random.rand(100000, 28), dtype=np.float32)
y = np.random.randint(num_classes, size=100000)

# Create and train a model (scikit-learn RandomForestClassifier in this case)
skl_model = RandomForestClassifier(n_estimators=10, max_depth=10)
skl_model.fit(X, y)

# Use Hummingbird to convert the model to PyTorch
model = convert(skl_model, 'pytorch')

# Run predictions on CPU
model.predict(X)

# Run predictions on GPU
model.to('cuda')
model.predict(X)
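A natural follow-up is to check that the converted model reproduces the original model's predictions. The helper below is a small sketch for that purpose, assuming only that both objects expose a predict(X) method returning array-like labels, as in the example above. The name prediction_agreement and the stand-in _ThresholdModel class are hypothetical and exist only to keep the sketch runnable without a trained model.

```python
import numpy as np


def prediction_agreement(original, converted, X):
    """Return the fraction of rows on which both models predict the same label."""
    expected = np.asarray(original.predict(X))
    actual = np.asarray(converted.predict(X))
    if expected.shape != actual.shape:
        raise ValueError(f"shape mismatch: {expected.shape} vs {actual.shape}")
    return float(np.mean(expected == actual))


class _ThresholdModel:
    """Stand-in object with a predict() method (illustration only)."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        return (X[:, 0] > self.threshold).astype(np.int64)


X_demo = np.array([[0.1], [0.4], [0.6], [0.9]], dtype=np.float32)
same = prediction_agreement(_ThresholdModel(0.5), _ThresholdModel(0.5), X_demo)
shifted = prediction_agreement(_ThresholdModel(0.5), _ThresholdModel(0.3), X_demo)
print(same, shifted)
```

With the objects from the example above, the call would be prediction_agreement(skl_model, model, X). A value close to 1.0 suggests that the conversion preserved the model's behavior on that data.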

Documentation

The API documentation is here.

You can also read about Hummingbird in our blog post here.

For more details on the vision and on the technical details related to Hummingbird, please check our papers:

Taming Model Serving Complexity, Performance and Cost: A Compilation to Tensor Computations Approach. Supun Nakandala, Karla Saur, Gyeong-In Yu, Konstantinos Karanasos, Carlo Curino, Markus Weimer, Matteo Interlandi. Technical Report
Compiling Classical ML Pipelines into Tensor Computations for One-size-fits-all Prediction Serving. Supun Nakandala, Gyeong-In Yu, Markus Weimer, Matteo Interlandi. System for ML Workshop. NeurIPS 2019

Contributing

We welcome contributions! Please see the guide on Contributing.

Also, see our roadmap of planned features.

Community

Join our community!

For more formal enquiries, you can contact us.

Authors

Supun Nakandala
Matteo Interlandi
Karla Saur

License

MIT License
