Build your first data warehouse with Airflow on GCP

What are the steps in building a data warehouse? What cloud technology should you use? How do you use Airflow to orchestrate your pipeline? By Tuan Nguyen.

In this project, we will build a data warehouse on Google Cloud Platform that helps answer common business questions and power dashboards. You will experience firsthand how to build a DAG for a common data engineering task: extract data from sources, load it into a data sink, and transform and model it for business consumption.
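To make the extract-load-transform flow concrete, here is a minimal sketch of such a DAG using Airflow's Google provider operators. The bucket, dataset, table names, and SQL are hypothetical placeholders, not taken from the article; the original repo's DAG will differ.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Hypothetical resource names -- substitute your own bucket, dataset, and tables.
GCS_BUCKET = "my-raw-data-bucket"
BQ_STAGING_TABLE = "my_project.staging.events"
BQ_FACT_TABLE = "my_project.warehouse.fact_events"

with DAG(
    dag_id="gcs_to_bigquery_warehouse",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    # Extract/load: copy raw CSV files from Cloud Storage into a BigQuery staging table.
    load_raw = GCSToBigQueryOperator(
        task_id="load_raw_to_staging",
        bucket=GCS_BUCKET,
        source_objects=["events/{{ ds }}/*.csv"],
        destination_project_dataset_table=BQ_STAGING_TABLE,
        source_format="CSV",
        skip_leading_rows=1,
        write_disposition="WRITE_TRUNCATE",
    )

    # Transform/model: run SQL that reshapes staging rows into a fact table for consumption.
    build_fact = BigQueryInsertJobOperator(
        task_id="build_fact_table",
        configuration={
            "query": {
                "query": f"""
                    CREATE OR REPLACE TABLE `{BQ_FACT_TABLE}` AS
                    SELECT event_id, user_id, event_type, DATE(event_ts) AS event_date
                    FROM `{BQ_STAGING_TABLE}`
                """,
                "useLegacySql": False,
            }
        },
    )

    load_raw >> build_fact
```

The article walks through the full version of this pipeline, including the infrastructure setup and data modeling decisions behind the staging and fact tables.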

The article is split into:

  • Why Google Cloud Platform?
    • Cost
    • Ease of use
  • Business objective
  • The dataset
  • Data modeling
  • Architecture
  • Set up the infrastructure
  • Data pipeline

… and much more. The author walks through the many steps of designing and deploying a data warehouse on GCP using Airflow as the orchestrator. You will also get the source code, which you can reference in the accompanying GitHub repo. Excellent for anybody in data science!

[Read More]

Tags: google cloud, gcp, big-data, cio, data-science