October 9, 2019


BayesWitnesses/m2cgen

Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP) with zero dependencies


repo name	BayesWitnesses/m2cgen
repo link	https://github.com/BayesWitnesses/m2cgen
homepage
language	Python
size (curr.)	384 kB
stars (curr.)	1513
created	2019-01-13
license	MIT License

m2cgen

m2cgen (Model 2 Code Generator) is a lightweight library that transpiles trained statistical models into native code (Python, C, Java, Go, JavaScript, Visual Basic, C#, PowerShell, R, PHP, Dart).

Installation
Supported Languages
Supported Models
Classification Output
Usage
CLI
FAQ

Installation

Supported Python version is >= 3.5.

pip install m2cgen

Supported Languages

C
C#
Dart
Go
Java
JavaScript
PHP
PowerShell
Python
R
Visual Basic

Supported Models

	Classification	Regression
Linear	scikit-learn: LogisticRegression, LogisticRegressionCV, PassiveAggressiveClassifier, Perceptron, RidgeClassifier, RidgeClassifierCV, SGDClassifier; lightning: AdaGradClassifier, CDClassifier, FistaClassifier, SAGAClassifier, SAGClassifier, SDCAClassifier, SGDClassifier	scikit-learn: ARDRegression, BayesianRidge, ElasticNet, ElasticNetCV, HuberRegressor, Lars, LarsCV, Lasso, LassoCV, LassoLars, LassoLarsCV, LassoLarsIC, LinearRegression, OrthogonalMatchingPursuit, OrthogonalMatchingPursuitCV, PassiveAggressiveRegressor, RANSACRegressor (only supported regression estimators can be used as a base estimator), Ridge, RidgeCV, SGDRegressor, TheilSenRegressor; StatsModels: Generalized Least Squares (GLS), Generalized Least Squares with AR Errors (GLSAR), Ordinary Least Squares (OLS), Quantile Regression (QuantReg), Weighted Least Squares (WLS); lightning: AdaGradRegressor, CDRegressor, FistaRegressor, SAGARegressor, SAGRegressor, SDCARegressor
SVM	scikit-learn: LinearSVC, NuSVC, SVC; lightning: KernelSVC (binary only, multiclass is not supported yet), LinearSVC	scikit-learn: LinearSVR, NuSVR, SVR; lightning: LinearSVR
Tree	DecisionTreeClassifier, ExtraTreeClassifier	DecisionTreeRegressor, ExtraTreeRegressor
Random Forest	ExtraTreesClassifier, LGBMClassifier (rf booster only), RandomForestClassifier, XGBRFClassifier (binary only, multiclass is not supported yet)	ExtraTreesRegressor, LGBMRegressor (rf booster only), RandomForestRegressor, XGBRFRegressor
Boosting	LGBMClassifier (gbdt/dart/goss booster only), XGBClassifier (gbtree/gblinear booster only)	LGBMRegressor (gbdt/dart/goss booster only), XGBRegressor (gbtree/gblinear booster only)

Classification Output

Linear/Linear SVM

Binary

Scalar value; signed distance of the sample to the hyperplane for the second class.

Multiclass

Vector value; signed distance of the sample to the hyperplane for each class.

Comment

The output is consistent with the output of LinearClassifierMixin.decision_function.
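
To make the two output shapes concrete, here is a minimal pure-Python sketch of a linear decision function. It is not m2cgen's generated code: the weights, intercepts, and the decision_function helper are hypothetical values chosen for illustration. The raw score w . x + b is proportional to the signed distance to the hyperplane; a positive binary score points to the second class, and in the multiclass case the class with the highest score wins.

```python
def decision_function(weights, intercepts, x):
    """Return one raw score (w . x + b) per row of weights."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, intercepts)
    ]


x = [1.0, 3.0]  # hypothetical sample with two features

# Binary case: a single hyperplane, so the output is one scalar.
binary_score = decision_function([[2.0, -1.0]], [0.5], x)[0]
predicted_binary = 1 if binary_score > 0 else 0  # index of the predicted class

# Multiclass case: one hyperplane per class, so the output is a vector.
multi_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
multi_intercepts = [0.0, -1.0, 2.0]
multi_scores = decision_function(multi_weights, multi_intercepts, x)
predicted_multi = max(range(len(multi_scores)), key=multi_scores.__getitem__)
```
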

SVM

Binary

Scalar value; signed distance of the sample to the hyperplane for the second class.

Multiclass

Vector value; one-vs-one score for each class, shape (n_samples, n_classes * (n_classes-1) / 2).

Comment

The output is consistent with the output of BaseSVC.decision_function when the decision_function_shape is set to ovo.
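
The length of the one-vs-one vector follows from counting unordered class pairs. The sketch below only illustrates that count; the ovo_pairs helper and the class labels are hypothetical, and the pair ordering shown (first class against each later class, and so on) is an assumption rather than something stated above.

```python
from itertools import combinations


def ovo_pairs(classes):
    """List every unordered pair of classes, one per one-vs-one classifier."""
    return list(combinations(classes, 2))


classes = ["a", "b", "c", "d"]  # hypothetical class labels
pairs = ovo_pairs(classes)

# Number of scores per sample: n_classes * (n_classes - 1) / 2
n_scores = len(classes) * (len(classes) - 1) // 2
```

With four classes this gives six pairs, so the generated function would return six scores per sample.
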

Tree/Random Forest/XGBoost/LightGBM

Binary

Vector value; class probabilities.

Multiclass

Vector value; class probabilities.

Comment

The output is consistent with the output of the predict_proba method of DecisionTreeClassifier/ForestClassifier/XGBClassifier/LGBMClassifier.
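
Because tree-based classifiers return probabilities rather than a label, the caller has to pick the class. The following sketch uses a hand-written, hypothetical tree (the thresholds and probabilities are made up, and the code is not actual m2cgen output) to show a probability vector being returned and then reduced to a class index with an argmax.

```python
def score(input):
    """Hypothetical three-class decision tree returning class probabilities."""
    if input[0] <= 2.5:
        return [1.0, 0.0, 0.0]
    if input[1] <= 1.7:
        return [0.0, 0.9, 0.1]
    return [0.0, 0.05, 0.95]


def predict_class(probabilities):
    """Return the index of the most probable class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


probabilities = score([3.0, 2.0])
label = predict_class(probabilities)
```
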

Usage

Here’s a simple example of how a linear model trained in a Python environment can be represented in Java code:

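# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older release.
from sklearn.datasets import load_boston
from sklearn import linear_model
import m2cgen as m2c

# Load the Boston housing data: 13 features per sample.
boston = load_boston()
X, y = boston.data, boston.target

# Train an ordinary least squares model.
estimator = linear_model.LinearRegression()
estimator.fit(X, y)

# Transpile the trained model; the result is a string of Java source code.
code = m2c.export_to_java(estimator)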

Generated Java code:

public class Model {

    public static double score(double[] input) {
        return (((((((((((((36.45948838508965) + ((input[0]) * (-0.10801135783679647))) + ((input[1]) * (0.04642045836688297))) + ((input[2]) * (0.020558626367073608))) + ((input[3]) * (2.6867338193449406))) + ((input[4]) * (-17.76661122830004))) + ((input[5]) * (3.8098652068092163))) + ((input[6]) * (0.0006922246403454562))) + ((input[7]) * (-1.475566845600257))) + ((input[8]) * (0.30604947898516943))) + ((input[9]) * (-0.012334593916574394))) + ((input[10]) * (-0.9527472317072884))) + ((input[11]) * (0.009311683273794044))) + ((input[12]) * (-0.5247583778554867));
    }
}
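
The generated expression is just the intercept plus one coefficient-times-input term per feature. As a rough illustration of that structure, the toy function below builds a similarly nested expression string from an intercept and a list of coefficients. It is a sketch only: linear_to_java_expr is a hypothetical helper, not part of m2cgen, and it makes no claim about how the library builds its output internally.

```python
def linear_to_java_expr(intercept, coefs):
    """Build a Java-style expression string for intercept + sum(coef_i * input[i])."""
    expr = f"({intercept!r})"
    for i, coef in enumerate(coefs):
        if i > 0:
            # Wrap the running sum so each addition is explicitly parenthesized.
            expr = f"({expr})"
        expr = f"{expr} + ((input[{i}]) * ({coef!r}))"
    return expr


# Hypothetical two-feature model.
java_expr = linear_to_java_expr(1.5, [2.0, -0.5])
```
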

More examples of generated code for different models and languages are available in the project repository (https://github.com/BayesWitnesses/m2cgen).

CLI

m2cgen can be used as a CLI tool to generate code using serialized model objects (pickle protocol):

$ m2cgen <pickle_file> --language <language> [--indent <indent>] [--function_name <function_name>]
         [--class_name <class_name>] [--module_name <module_name>] [--package_name <package_name>]
         [--namespace <namespace>] [--recursion-limit <recursion_limit>]

Note that to unpickle a serialized model object, its class must be defined at the top level of an importable module in the unpickling environment.

Piping is also supported:

$ cat <pickle_file> | m2cgen --language <language>

FAQ

Q: Generation fails with the error RuntimeError: maximum recursion depth exceeded.

A: If this error occurs while generating code from an ensemble model, try reducing the number of trained estimators in that model. Alternatively, increase the maximum recursion depth with sys.setrecursionlimit(<new_depth>) or, when using the CLI, with the --recursion-limit option.
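
One way to raise the limit only for the duration of code generation is a small context manager around sys.setrecursionlimit. This is a sketch under assumptions: recursion_limit and nesting_depth are hypothetical helpers, the depths are arbitrary, and nesting_depth merely stands in for a deeply recursive call such as exporting a large ensemble.

```python
import sys
from contextlib import contextmanager


@contextmanager
def recursion_limit(new_depth):
    """Temporarily raise the interpreter recursion limit, then restore it."""
    old_depth = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old_depth, new_depth))
    try:
        yield
    finally:
        sys.setrecursionlimit(old_depth)


def nesting_depth(n):
    """Deliberately recursive stand-in for a deep model traversal."""
    return 0 if n == 0 else 1 + nesting_depth(n - 1)


with recursion_limit(20000):
    # In real use, a call such as m2c.export_to_java(estimator) would go here.
    depth = nesting_depth(3000)
```

Restoring the previous limit afterwards keeps the rest of the program protected against runaway recursion.
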

Q: Generation fails with the error ImportError: No module named <module_name_here> while transpiling a model from a serialized model object.

A: This error indicates that pickle cannot deserialize the model object. To unpickle a serialized model object, its class must be defined at the top level of an importable module in the unpickling environment. Installing the package that provides the model’s class definition should solve the problem.
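
The requirement follows from how pickle works: it stores a reference to the class (module name plus class name) rather than the class definition. The standard-library sketch below illustrates this with fractions.Fraction as a stand-in for a model class; make_local_object is a hypothetical helper showing that a class defined inside a function cannot be pickled, since it cannot be found by name.

```python
import pickle
from fractions import Fraction

# Pickle records where the class lives, not its source code.
payload = pickle.dumps(Fraction(1, 2))
stores_module_name = b"fractions" in payload
stores_class_name = b"Fraction" in payload

# Loading works here because the fractions module is importable.
restored = pickle.loads(payload)


def make_local_object():
    class LocalModel:  # defined inside a function, so not reachable by name
        pass

    return LocalModel()


try:
    pickle.dumps(make_local_object())
    local_pickle_ok = True
except (pickle.PicklingError, AttributeError):
    local_pickle_ok = False
```

The same lookup happens on load, which is why the environment running m2cgen needs the model's library installed.
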
