Course Overview

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. Because workflows are defined as code, it becomes easier to automate, version, and maintain data pipelines, ETL (Extract, Transform, Load) processes, and other recurring tasks.

Airflow's flexible, extensible approach to scheduling, monitoring, and managing tasks has made it a popular choice for orchestrating data pipelines, ETL processes, and many other automation scenarios.

Learning Outcomes

  • You’ll learn to design, define, and schedule complex workflows using directed acyclic graphs (DAGs), the foundation for orchestrating tasks and their dependencies.
  • You’ll use Airflow’s scheduling capabilities, including cron-like expressions and interval-based triggers, to control when and how often your workflows run.
  • You’ll integrate Airflow with external systems, databases, cloud services, and APIs, enabling you to automate a wide range of tasks and operations.
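The interval-based trigger model mentioned above can be illustrated without Airflow at all: each DAG run covers a data interval, a run's logical date is the start of that interval, and the run only fires once the interval has closed. A pure-stdlib sketch (the function name is ours, not Airflow's):

```python
from datetime import datetime, timedelta


def data_intervals(start: datetime, interval: timedelta, n: int):
    """Yield (logical_date, interval_end) for the first n scheduled runs.

    Mirrors Airflow's convention: the logical date is the *start* of the
    data interval, and the run actually fires after the interval ends.
    """
    for i in range(n):
        left = start + i * interval
        yield left, left + interval


runs = list(data_intervals(datetime(2024, 1, 1), timedelta(days=1), 3))
# The first daily run has logical date 2024-01-01 but fires on 2024-01-02,
# once its day of data is complete.
```

This "run after the interval closes" behavior is a frequent point of confusion for newcomers and is treated explicitly in the scheduling portion of the course.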

Course Outline

Days 1-2

  • Why Airflow?
  • Architecture
  • Workloads
  • DAGs and DAG runs
  • Tasks
  • Operators
  • Sensors
  • Control Flow
  • User Interface
  • Executor
  • XComs
  • Variables

Days 3-4

  • Airflow in Kubernetes
  • Deploying Spark Jobs
  • Security
  • Logging & Monitoring
  • Lineage
  • Listeners
  • DAG Serialization
  • Scheduler
  • Pools
  • Cluster Policies
  • Priority Weights

Day 5

  • Deployment
  • Sample Application
  • Individual and Group Work Presentations
  • Final Exam

Skill Level

Intermediate

Suitable For

Anyone who needs to automate, schedule, and manage workflows involving data processing, data movement, task execution, and other complex operations.

Prerequisites

  • BIGDATA-103 – Python for Data Engineers
  • DEVOPS-101 – Docker and Kubernetes
  • BIGDATA-202 – Apache Spark

Duration

5 days
