Course Overview
Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It lets you define and manage complex data workflows as code, making it easier to automate data pipelines, ETL (Extract, Transform, Load) processes, and other recurring tasks.
Its ability to schedule, monitor, and manage tasks in a flexible and extensible way makes it a popular choice for orchestrating data pipelines, ETL processes, and many other automation scenarios.
Learning Outcomes
- You’ll learn to design, define, and schedule complex workflows as directed acyclic graphs (DAGs), the foundation for orchestrating tasks and their dependencies.
- You’ll learn to use Airflow’s scheduling capabilities, including cron-like expressions and interval-based triggers, to control when and how often your workflows run.
- You’ll gain the ability to integrate Airflow with external systems, databases, cloud services, and APIs, enabling you to automate a wide range of tasks and operations.
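The DAG model named in the first outcome can be illustrated with a short pure-Python sketch. This is not Airflow's API, just the underlying idea it builds on: tasks form a directed acyclic graph, and each task may run only after all of its upstream dependencies have completed. The task names here are a hypothetical ETL pipeline, not part of the course material.

```python
# Sketch of the DAG idea behind Airflow, using only the standard library.
# Each key is a task; its value is the set of upstream tasks it depends on.
from graphlib import TopologicalSorter

dag = {
    "extract": set(),            # no dependencies
    "transform": {"extract"},    # runs after extract
    "load": {"transform"},       # runs after transform
    "notify": {"load"},          # runs after load
}

# A topological order is one valid execution order respecting all dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow expresses the same structure with operators and `>>` dependency arrows, and additionally handles scheduling, retries, and monitoring around it; those topics are covered in the course outline below.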
Course Outline
Day 1-2
- Why Airflow?
- Architecture
- Workloads
- DAGs and DAG runs
- Tasks
- Operators
- Sensors
- Control Flow
- User Interface
- Executor
- XComs
- Variables
Day 3-4
- Airflow in Kubernetes
- Deploying Spark Jobs
- Security
- Logging & Monitoring
- Lineage
- Listeners
- DAG Serialization
- Scheduler
- Pools
- Cluster Policies
- Priority Weights
Day 5
- Deployment
- Sample Application
- Individual and Group Work Presentations
- Final Exam
Skill Level
Suitable For
This course is suitable for anyone who needs to automate, schedule, and manage workflows involving data processing, data movement, task execution, and other complex operations.
Prerequisites
- BIGDATA-103 – Python for Data Engineers
- DEVOPS-101 – Docker and Kubernetes
- BIGDATA-202 – Apache Spark
Duration
5 days
Related Topics