Introduction
-
What Is Airflow?
Airflow is a platform to programmatically author, schedule and monitor workflows.
-
Why Airflow?
- Scalable - Message queue is used to orchestrate workers.
- Dynamic - Workflows are expressed in Python code.
- Extensible - Define your own abstractions for your domain.
-
Architecture
-
Getting Started
Installation
-
Create Virtual Environment
$ conda create -n airflow python=3.7
-
Activate Virtual Environment
$ conda activate airflow
-
Install Airflow
$ pip install apache-airflow
-
Validate Airflow Installation
$ airflow version
(airflow) rd$ airflow version
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/  v1.10.9
-
Getting Started
Configuration
-
Set AIRFLOW_HOME environment variable
Replace “rd” with your user id
Open .bash_profile
$ vim ~/.bash_profile
Update .bash_profile by adding the following line:
export AIRFLOW_HOME="/Users/rd/dev/airflow_home/"
Force shell to reload with new environment variable
$ source ~/.bash_profile
Validate env var
$ echo $AIRFLOW_HOME
/Users/rd/dev/airflow_home/
-
Create Directories
Replace “rd” with your user id
# create a directory for AIRFLOW_HOME
$ cd ~/dev/ && mkdir airflow_home
# create dags folder
$ mkdir $AIRFLOW_HOME/dags
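With the dags folder in place, a DAG file can be dropped into it for Airflow to pick up. Below is a minimal sketch (the file name hello_world.py and the task names are made up for illustration) using the Airflow 1.10.x import paths; it defines a single daily BashOperator task. It requires an Airflow installation to run.

```python
# $AIRFLOW_HOME/dags/hello_world.py -- minimal example DAG (hypothetical file name)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator  # 1.10.x import path

# Default arguments applied to every task in this DAG.
default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 3, 1),
}

# The DAG object ties tasks together; schedule_interval accepts cron presets.
dag = DAG(
    dag_id="hello_world",
    default_args=default_args,
    schedule_interval="@daily",
)

# A single task that runs a shell command.
hello = BashOperator(
    task_id="say_hello",
    bash_command="echo 'Hello, Airflow!'",
    dag=dag,
)
```

Once the scheduler parses the file, hello_world should appear in the DAG list in the UI.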
-
Initialize
$ airflow initdb
$ cd $AIRFLOW_HOME
$ tree .
The directory structure of $AIRFLOW_HOME should look something like this after running the airflow initdb command:
├── airflow.cfg
├── airflow.db
├── dags
├── logs
│   └── scheduler
│       ├── 2020-03-03
│       └── latest -> /Users/rd/dev/airflow_home/logs/scheduler/2020-03-03
└── unittests.cfg
-
Getting Started
Launch
-
Start the webserver
$ airflow webserver
-
Start the scheduler
Open another Terminal tab or window.
$ conda activate airflow
$ airflow scheduler
-
Open UI
Visit http://localhost:8080 in a browser.
-
Dags
-
What Is It?
A DAG is a Directed Acyclic Graph: a collection of tasks with directed dependencies and no cycles.
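To make "directed" and "acyclic" concrete, here is a small pure-Python sketch (no Airflow required; the task names are illustrative, not real Airflow identifiers) that stores task dependencies as edges and derives a valid execution order with a topological sort, which only exists when the graph has no cycles:

```python
# A DAG as an adjacency list: each task maps to the tasks that depend on it.
edges = {
    "extract": ["transform"],
    "transform": ["load"],
    "load": [],
}

def topological_order(graph):
    """Return tasks in dependency order (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    # Count incoming edges for every node.
    indegree = {node: 0 for node in graph}
    for downstream in graph.values():
        for node in downstream:
            indegree[node] += 1

    # Start with nodes that have no upstream dependencies.
    ready = [node for node, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for downstream in graph[node]:
            indegree[downstream] -= 1
            if indegree[downstream] == 0:
                ready.append(downstream)

    if len(order) != len(graph):
        raise ValueError("cycle detected: not a DAG")
    return order

print(topological_order(edges))  # ['extract', 'transform', 'load']
```

This is essentially what a scheduler does: it only starts a task once everything upstream of it has finished, which is well-defined precisely because the graph is acyclic.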
-
How Does It Work?
- Demo UI
- Review sample DAGs.