November 21, 2019

adilkhash/Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

repo name adilkhash/Data-Engineering-HowTo
repo link https://github.com/adilkhash/Data-Engineering-HowTo
homepage
language
size (curr.) 36 kB
stars (curr.) 1180
created 2019-03-28
license

How To Become a Data Engineer

Useful articles

Talks

Algorithms & Data Structures

SQL

Programming

Databases

Distributed Systems

Books

Courses

Blogs

  • Martin Kleppmann, author of Designing Data-Intensive Applications
  • BaseDS by Vaidehi Joshi, about distributed systems

Tools

  • Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
  • Apache Spark is a unified analytics engine for large-scale data processing
  • Apache Kafka is a distributed streaming platform
  • Luigi is a Python package that helps you build complex pipelines of batch jobs.
  • Dagster.io is a system for building modern data applications.
  • Prefect includes everything you need to create and run data applications.
  • Metaflow helps you build and manage real-life data science projects with ease

Cloud Platforms

Communities

Data Engineering Jobs

Other

Newsletters & Digests

  • DataEng Telegram channel - A channel about data engineering (Russian/English)
  • Data Eng Weekly - Your weekly Data Engineering news
  • SF Data Weekly - A weekly email of useful links for people interested in building data platforms
  • Data Elixir - An email newsletter that keeps you on top of the tools and trends in data science