adilkhash/Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
repo name | adilkhash/Data-Engineering-HowTo |
repo link | https://github.com/adilkhash/Data-Engineering-HowTo |
size (curr.) | 36 kB |
stars (curr.) | 1180 |
created | 2019-03-28 |
How To Become a Data Engineer
Useful articles
- The AI Hierarchy of Needs
- The Rise of the Data Engineer
- The Downfall of the Data Engineer
- A Beginner’s Guide to Data Engineering
- Functional Data Engineering — a modern paradigm for batch data processing
- How to become a Data Engineer (in Russian)
- Introduction to Apache Airflow (in Russian)
Talks
- Data Engineering Principles - Build frameworks not pipelines by Gatis Seja
- Functional Data Engineering - A Set of Best Practices by Maxime Beauchemin
- Advanced Data Engineering Patterns with Apache Airflow by Maxime Beauchemin
- Creating a Data Engineering Culture by Jesse Anderson
Algorithms & Data Structures
- Algorithmic Toolbox (in Russian)
- Data Structures (in Russian)
- Data Structures & Algorithms Specialization on Coursera
- Algorithms Specialization from Stanford on Coursera
SQL
- Comprehensive SQL Tutorial by Mode Analytics
- SQL Practice on Leetcode
- Modern SQL, a website about modern SQL syntax
Programming
- Scala School by Twitter
- Fluent Python, an intermediate-level book about Python
- Intro to Scala (in Russian) on Stepik by Tinkoff Bank
- The Hitchhiker’s Guide to Python by Kenneth Reitz & Tanya Schlusser
- Learn Python 3 The Hard Way by Zed A. Shaw
Databases
- Intro to Database Systems by Carnegie Mellon University
- Advanced Database Systems by Carnegie Mellon University
- On Disk IO
Distributed Systems
- Distributed systems for fun and profit by Mikito Takada
- Distributed Systems by Maarten van Steen & Andrew S. Tanenbaum
- CS 436: Distributed Computer Systems by University of Waterloo
- Distributed consensus reading list maintained by Heidi Howard from University of Cambridge
Books
- Designing Data-Intensive Applications by Martin Kleppmann
- Introduction to Algorithms by Thomas Cormen
- The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
- Star Schema: The Complete Reference
- Database Internals: A Deep Dive into How Distributed Data Systems Work
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
- A Philosophy of Software Design
Courses
- Big Data for Data Engineers Specialization by Yandex
- Data Engineering on Google Cloud Platform Specialization by Google
- Data Engineer Nanodegree by Udacity
- Data Engineering with Python by DataCamp
Blogs
- Martin Kleppmann, author of Designing Data-Intensive Applications
- BaseDS by Vaidehi Joshi, about distributed systems
Tools
- Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
- Apache Spark is a unified analytics engine for large-scale data processing
- Apache Kafka is a distributed streaming platform
- Luigi is a Python package that helps you build complex pipelines of batch jobs
- Dagster.io is a system for building modern data applications
- Prefect includes everything you need to create and run data applications
- Metaflow helps you build and manage real-life data science projects with ease
Cloud Platforms
Communities
- Data Engineering - Telegram chat about data engineering
- Data Engineering Subreddit - subreddit about data engineering
Data Engineering Jobs
Other
Newsletters & Digests
- DataEng Telegram channel - Telegram channel about data engineering (rus/eng)
- Data Eng Weekly - Your weekly Data Engineering news
- SF Data Weekly - A weekly email of useful links for people interested in building data platforms
- Data Elixir - An email newsletter that keeps you on top of the tools and trends in data science