
Introduction

-

What Is Airflow?

Airflow is a platform to programmatically author, schedule and monitor workflows.

-

Why Airflow?

-

Architecture

[Figure: Airflow architecture diagram]

-

Getting Started

Installation

-

Create Virtual Environment

$ conda create -n airflow python=3.7

-

Activate Virtual Environment

$ conda activate airflow

-

Install Airflow

$ pip install apache-airflow

-

Validate Airflow Installation

$ airflow version
  ____________       _____________
 ____    |__( )_________  __/__  /________      _-
-___  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/  v1.10.9

-

Getting Started

Configuration

-

Set AIRFLOW_HOME environment variable

Replace “rd” with your user id

Open .bash_profile

$ vim ~/.bash_profile

Update .bash_profile by adding the following line:

export AIRFLOW_HOME="/Users/rd/dev/airflow_home/"

Force shell to reload with new environment variable

$ source ~/.bash_profile

Validate env var

$ echo $AIRFLOW_HOME 
/Users/rd/dev/airflow_home/

-

Create Directories

Replace “rd” with your user id

# create a directory for AIRFLOW_HOME
$ cd ~/dev/ && mkdir airflow_home
# create dags folder
$ mkdir $AIRFLOW_HOME/dags

-

Initialize

$ airflow initdb
$ cd $AIRFLOW_HOME
$ tree .

The directory structure of $AIRFLOW_HOME should look something like this after running the airflow initdb command:

├── airflow.cfg
├── airflow.db
├── dags
├── logs
│   └── scheduler
│       ├── 2020-03-03
│       └── latest -> /Users/rd/dev/airflow_home/logs/scheduler/2020-03-03
└── unittests.cfg

-

Getting Started

Launch

-

Start webserver

$ airflow webserver

-

Start the scheduler

Open another Terminal tab or window.

$ conda activate airflow
$ airflow scheduler

-

Open UI

Visit http://localhost:8080 in a browser.

-

DAGs

-

What Is It?

A DAG is a Directed Acyclic Graph. In Airflow, a DAG is a collection of tasks whose dependencies point in one direction and never form a cycle, so every task can run after all of its upstream tasks have finished.
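As a toy illustration (plain Python, no Airflow required; the task names are made up for this example), a DAG can be modeled as tasks with directed dependency edges, and "acyclic" means no task depends on itself through any chain:

```python
# A tiny DAG modeled as a dict: task -> list of downstream tasks.
# Task names here are illustrative, not part of Airflow's API.
dag = {
    "extract": ["transform"],
    "transform": ["load"],
    "load": [],
}

def is_acyclic(graph):
    """Return True if the directed graph contains no cycles (i.e. it is a DAG)."""
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited / in progress / fully explored
    color = {node: WHITE for node in graph}

    def visit(node):
        if color[node] == GRAY:    # back edge found -> cycle
            return False
        if color[node] == BLACK:   # already explored, no cycle through here
            return True
        color[node] = GRAY
        if not all(visit(n) for n in graph[node]):
            return False
        color[node] = BLACK
        return True

    return all(visit(node) for node in graph)

print(is_acyclic(dag))         # True: extract -> transform -> load
dag["load"].append("extract")  # introduce a cycle
print(is_acyclic(dag))         # False: no longer a valid DAG
```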

-

How Does It Work?
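At a high level, the scheduler repeatedly starts tasks whose upstream dependencies have all completed. A minimal sketch of that ordering in plain Python (Kahn's topological sort; task names are illustrative and this is not Airflow's actual scheduler code):

```python
from collections import deque

# upstream -> downstream edges for a small fan-out/fan-in pipeline
deps = {
    "extract": ["transform_a", "transform_b"],
    "transform_a": ["load"],
    "transform_b": ["load"],
    "load": [],
}

def run_order(graph):
    """Return tasks in an order where every task runs after its upstreams."""
    indegree = {node: 0 for node in graph}   # number of pending upstreams
    for downstream in graph.values():
        for node in downstream:
            indegree[node] += 1
    ready = deque(n for n, d in indegree.items() if d == 0)  # no pending upstreams
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)                  # "execute" the task
        for downstream in graph[task]:      # downstreams lose one pending upstream
            indegree[downstream] -= 1
            if indegree[downstream] == 0:
                ready.append(downstream)
    if len(order) != len(graph):
        raise ValueError("cycle detected: not a DAG")
    return order

print(run_order(deps))  # ['extract', 'transform_a', 'transform_b', 'load']
```

Note how the two transform tasks become runnable at the same time once `extract` finishes; in real Airflow the executor can run such independent tasks in parallel.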

-

The End

Parrot