andkret/Cookbook
The Data Engineering Cookbook
repo name | andkret/Cookbook |
repo link | https://github.com/andkret/Cookbook |
homepage | https://andreaskretz.com/ |
language | |
size (curr.) | 21710 kB |
stars (curr.) | 6307 |
created | 2019-03-10 |
license | Apache License 2.0 |
This Book Is & Will Always Be Free! But Please Support What You Like!
- Amazon: Click Here and buy whatever you like from Amazon* (also check out my complete podcast gear and books)
- Patreon: Click Here to become a supporter on Patreon
- PayPal.me: Click Here to send some support (please include a message and I will read and answer it in the next video)
I’m Doing Data Engineer Coaching To Help You On Your Journey:
Do you need help becoming a Data Engineer and doing a personal project? I offer Data Engineer Coaching to help you on your journey. Go to my website teamdatascience.com to learn more.
Contents:
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Hands On Course
- Case Studies
- Best Practices Cloud Platforms
- 130+ Free Data Sources For Data Science
- 1001 Interview Questions
- Recommended Books and Courses
Full Table Of Contents:
Introduction
- What is this Cookbook
- Data Engineer vs Data Scientist
- My Data Science Platform Blueprint
- Who Companies Need
Basic Engineering Skills
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- Docker
- The Cloud
- Security Zone Design
Advanced Engineering Skills
- Data Science Platform
- Hadoop Platforms
- Connect
- Buffer
- Processing Frameworks
- Lambda and Kappa Architecture
- Batch Processing
- Stream Processing
- Should You do Stream or Batch Processing
- Is ETL still relevant for Analytics?
- MapReduce
- Apache Spark
- What is the Difference to MapReduce?
- How Spark Fits to Hadoop
- Spark vs Hadoop
- Spark and Hadoop a Perfect Fit
- Spark on YARN
- My Simple Rule of Thumb
- Available Languages
- Spark Driver Executor and SparkContext
- Spark Batch vs Stream processing
- How Spark uses Data From Hadoop
- What are RDDs and How to Use Them
- SparkSQL How and Why to Use It
- What are Dataframes and How to Use Them
- Machine Learning on Spark (TensorFlow)
- MLlib
- Spark Setup
- Spark Resource Management
- AWS Lambda
- Apache Flink
- Elasticsearch
- Apache Drill
- StreamSets
- Store
- Visualize
- Machine Learning
- How to do Machine Learning in production
- Why machine learning in production is harder than you think
- Models Do Not Work Forever
- Where are The Platforms That Support Machine Learning
- Training Parameter Management
- How to Convince People That Machine Learning Works
- No Rules No Physical Models
- You Have The Data. Use It!
- Data is Stronger Than Opinions
- AWS Sagemaker
Hands On Course
- What We Want To Do
- Thoughts On Choosing A Development Environment
- A Look Into the Twitter API
- Ingesting Tweets with Apache Nifi
- Writing from Nifi to Apache Kafka
- Apache Zeppelin Data Processing
- Switch Processing from Zeppelin to Spark
Case Studies
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @Ebay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @Paypal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
Best Practices Cloud Platforms
130+ Free Data Sources For Data Science
- General And Academic
- Content Marketing
- Crime
- Drugs
- Education
- Entertainment
- Environmental And Weather Data
- Financial And Economic Data
- Government And World
- Health
- Human Rights
- Labor And Employment Data
- Politics
- Retail
- Social
- Travel And Transportation
- Various Portals
- Source Articles and Blog Posts
- Free Data Sources Data Science
1001 Interview Questions
Recommended Books and Courses
How To Contribute
If you have some cool links or topics for the cookbook, please become a contributor.
Simply fork the repo, add your ideas, and create a pull request. You can also open an issue and put your thoughts there.
Please use the “Issues” function for comments.
Support
Everything is free, but please support what you like! Join my Patreon and become a plumber yourself: Link to my Patreon
Or support me through Paypal.me and send a message that I will read on the next livestream: Link to my Paypal.me/feedthestream
Important Links
Subscribe to my Plumbers of Data Science YouTube channel for regular updates: Link to YouTube
Check out my blog and get updated via mail by joining my mailing list: andreaskretz.com
I have a Medium publication where you can publish your data engineer articles to reach more people: Medium publication