intel-analytics/analytics-zoo
Distributed TensorFlow, Keras, PyTorch and BigDL on Apache Spark
repo name | intel-analytics/analytics-zoo |
repo link | https://github.com/intel-analytics/analytics-zoo |
homepage | https://analytics-zoo.github.io |
language | Jupyter Notebook |
size (curr.) | 209380 kB |
stars (curr.) | 1246 |
created | 2017-05-05 |
license | Apache License 2.0 |
A unified Data Analytics and AI platform for distributed TensorFlow, Keras, PyTorch, Apache Spark/Flink and Ray
What is Analytics Zoo?
Analytics Zoo provides a unified data analytics and AI platform that seamlessly unites TensorFlow, Keras, PyTorch, Spark, Flink and Ray programs into an integrated pipeline, which can transparently scale from a laptop to large clusters to process production big data.
- Integrated Analytics and AI Pipelines for easily prototyping and deploying end-to-end AI applications.
  - Write TensorFlow or PyTorch inline with Spark code for distributed training and inference.
  - Native deep learning (TensorFlow/Keras/PyTorch/BigDL) support in Spark ML Pipelines.
  - Directly run Ray programs on a big data cluster through RayOnSpark.
  - Plain Java/Python APIs for (TensorFlow/PyTorch/BigDL/OpenVINO) model inference.
- High-Level ML Workflow that automates the process of building large-scale machine learning applications.
  - Automatically distributed Cluster Serving (for TensorFlow/PyTorch/Caffe/BigDL/OpenVINO models) with a simple pub/sub API.
  - Scalable AutoML for time series prediction (that automatically generates features, selects models and tunes hyperparameters).
- Built-in Algorithms and Models for Recommendation, Time Series, Computer Vision and NLP applications.
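The pub/sub style of serving mentioned above can be pictured with a small in-process sketch. This is not the Cluster Serving API: `ToyServing`, `enqueue`, `query` and `toy_model` are hypothetical names chosen for illustration, and a standard-library queue stands in for the real message broker and distributed workers.

```python
import queue
import threading


def toy_model(values):
    """Hypothetical stand-in for a served model: doubles every input value."""
    return [2 * v for v in values]


class ToyServing:
    """In-process pub/sub sketch: clients publish requests, a worker publishes results."""

    def __init__(self, model):
        self._model = model
        self._requests = queue.Queue()  # input channel
        self._results = {}              # output channel, keyed by request id
        self._done = {}                 # one event per request id
        self._lock = threading.Lock()
        # Daemon thread so the sketch does not block interpreter exit.
        threading.Thread(target=self._serve, daemon=True).start()

    def enqueue(self, request_id, values):
        """Publish a request; returns immediately."""
        with self._lock:
            self._done[request_id] = threading.Event()
        self._requests.put((request_id, values))

    def query(self, request_id, timeout=5.0):
        """Wait for the result published under request_id and return it."""
        with self._lock:
            event = self._done[request_id]
        if not event.wait(timeout):
            raise TimeoutError(f"no result for {request_id!r}")
        with self._lock:
            return self._results[request_id]

    def _serve(self):
        """Worker loop: consume requests, run the model, publish results."""
        while True:
            request_id, values = self._requests.get()
            output = self._model(values)
            with self._lock:
                self._results[request_id] = output
                event = self._done[request_id]
            event.set()


serving = ToyServing(toy_model)
serving.enqueue("req-1", [1, 2, 3])
result = serving.query("req-1")
print(result)
```

The point of the pattern is that the client only publishes inputs and subscribes to outputs; how many workers consume the queue, and where they run, is hidden behind the two calls.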
Why use Analytics Zoo?
You may want to develop your AI solutions using Analytics Zoo if:
- You want to easily prototype the entire end-to-end pipeline that applies AI models (e.g., TensorFlow, Keras, PyTorch, BigDL, OpenVINO, etc.) to production big data.
- You want to transparently scale your AI applications from a laptop to large clusters with “zero” code changes.
- You want to deploy your AI pipelines to existing YARN or K8s clusters without any modifications to the clusters.
- You want to automate the process of applying machine learning (such as feature engineering, hyperparameter tuning, model selection and distributed inference).
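As a rough illustration of what automating feature generation, model selection and hyperparameter tuning involves, the sketch below searches over a few toy forecasters and lag-window sizes and keeps the combination with the lowest validation error. It is plain Python, not Analytics Zoo's AutoML; the data, the model functions and every name in it are made up for the example.

```python
# Toy series with a linear trend (illustrative data only).
series = [float(i) for i in range(1, 21)]


def make_lag_features(values, window):
    """Feature generation: pair the previous `window` values with the next value."""
    return [(values[i - window:i], values[i]) for i in range(window, len(values))]


def mean_model(lags):
    """Forecast the average of the lag window."""
    return sum(lags) / len(lags)


def last_value_model(lags):
    """Forecast the most recent value."""
    return lags[-1]


def drift_model(lags):
    """Forecast by repeating the most recent step: last + (last - previous)."""
    return lags[-1] + (lags[-1] - lags[-2])


def mean_abs_error(model, samples):
    """Average absolute forecast error over (lags, target) pairs."""
    return sum(abs(model(x) - y) for x, y in samples) / len(samples)


candidates = {"mean": mean_model, "last_value": last_value_model, "drift": drift_model}
windows = [2, 3, 4]  # the single hyperparameter being tuned

results = []
for name, model in candidates.items():
    for window in windows:
        # Hold out the last five samples for validation.
        valid = make_lag_features(series, window)[-5:]
        results.append((name, window, mean_abs_error(model, valid)))

# Model selection: keep the configuration with the lowest validation error.
best_name, best_window, best_error = min(results, key=lambda r: r[2])
print(best_name, best_window, best_error)
```

A real AutoML system replaces the exhaustive loop with a smarter, distributed search and trains actual models, but the shape of the problem (generate features, score candidates, keep the best) is the same.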
How to use Analytics Zoo?
- Get started quickly with Analytics Zoo using the pre-built Docker image.
- Refer to the Python and Scala installation guides to install Analytics Zoo.
- Visit the documentation website for more information on Analytics Zoo.
- Check the Powered By & Presentations pages for real-world applications using Analytics Zoo.
- Join the Google Group (or subscribe to the mailing list) for questions and discussions on Analytics Zoo.