A Beginner’s Guide to Apache Airflow — Part 3: Intro and Setup
Set up Airflow with Docker in 15 minutes
This is my third episode on the #dataseries.
Previously — on my #dataseries
1. A Beginner’s Guide to DBT (data build tool) — Part 1: Introduction
2. A Beginner’s Guide to DBT (data build tool) — Part 2: Setup guide & tips
Table of Contents
1. What is Airflow?
1.1 — Definition
1.2 — Quick start
2. Prerequisites
2.1 — What is Docker?
2.2 — My Noodle Analogy
2.3 — Install Docker
2.3.1 — Install docker
2.3.2 — Install docker compose
3. Install Airflow
3.1 — Initialising environment
3.2 — Running Airflow
3.3 — Accessing the environment
4. Final Word
1. What is Airflow?
1.1 Definition
- “Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies.”
- There are a few Airflow alternatives on the market, such as Dagster or Prefect.
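Since the definition above leans on DAGs, here is a toy, pure-Python sketch of what "a DAG of tasks executed while following the specified dependencies" means. This is just an illustration, not Airflow code; the task names are made up:

```python
# A tiny, Airflow-free illustration of a DAG of tasks.
# Each task lists the tasks it depends on (its "upstream" tasks).
deps = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "notify": ["load"],
}

def run_order(deps):
    """Return a valid execution order (Kahn's algorithm)."""
    remaining = {t: set(u) for t, u in deps.items()}
    order = []
    while remaining:
        # Tasks whose upstream dependencies have all completed.
        ready = [t for t, u in remaining.items() if not u]
        if not ready:
            raise ValueError("cycle detected, so this is not a DAG")
        for t in sorted(ready):
            order.append(t)
            del remaining[t]
        for u in remaining.values():
            u.difference_update(ready)
    return order

print(run_order(deps))  # ['extract', 'transform', 'load', 'notify']
```

The Airflow scheduler does essentially this, plus scheduling, retries, and distributing the tasks across workers.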
1.2 Quick Start
- From the official docs, there are two ways to set up Airflow — running Airflow locally, or running Airflow in Docker. Personally, I think the latter is the faster way to start Airflow.
- You can see more in the documentation: Quick start — Airflow
2. Prerequisites
Before you begin, you will need to install Docker Community Edition (CE) and Docker Compose v1.27.0 or newer on your workstation.
2.1 What is Docker?
“Docker is an open platform for developing, shipping, and running applications. Docker enables developers to package applications into containers to run the code in any environment.”
Okay, if I were you, I wouldn’t understand what the hell that’s about! So I spent days and nights (just kidding, I have a day job too!) doing a lot of research, and I finally understood Docker in simple terms by making up a simple analogy, as follows.
2.2 The Noodle Analogy
There are 3 key concepts we should be looking at in this early stage — docker image, docker container and docker compose.
Suppose you are an instant noodle lover who loves to make your own recipes, so you create an awesome recipe with unique ingredients and a method for making an awesome bowl of noodles. So what do you do next? You pack it nicely in an easily readable format, called the noodle recipe — this is similar to a docker image. It contains the specific instructions for your instant noodle cup.
After packing, you send it to your 24-hour supermarket, where anyone can enjoy your awesome noodle bowl for free. This 24-hour supermarket is the docker registry. Say your shiba inu (I love dogs) likes to eat noodles. He can go to the supermarket and get one package of your awesome noodles. After he gets home, he just needs to follow the instructions, which is similar to running the command docker run. After following the instructions, he finally has an awesome bowl of noodles — this is similar to having a docker container. And this docker container will be running some application.
Now on to docker compose. Okay, now let’s imagine you organise a noodle festival for people to come and enjoy. Instead of having just 1 bowl of noodles (one docker image), you have hundreds of bowls with hundreds of recipes to follow. Hence we need an orchestration tool that makes the process smoother and easier for everyone. This is where docker compose comes into place.
Okie let’s go back to the documentation language.
A container is a runnable instance of an image. An image is a read-only template with instructions for creating a Docker container. Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
In summary,
Docker helps you run your application in a containerised environment without worrying about dependencies and other configuration. Docker is a containerisation tool. Docker Compose is an orchestration tool that makes spinning up multi-container distributed applications with Docker an effortless task.
2.3 Install Docker
You can download Docker via the link below.
3. Install Airflow
- To deploy Airflow on Docker Compose, you should fetch docker-compose.yaml:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml'
Fun fact: relating to my noodle example above, this docker-compose.yaml is the script (config file) of the orchestration guy (docker compose).
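To give you a feel for what that config file contains, here is a heavily trimmed sketch of the kind of services it defines. The real file from the Airflow docs defines more services (workers, init, flower) and many more settings, so treat this as an illustration only, not a working compose file:

```yaml
# Trimmed illustration of the structure of Airflow's docker-compose.yaml.
# The real file defines more services and options.
services:
  postgres:            # metadata database
    image: postgres:13
  redis:               # message broker for the Celery workers
    image: redis:latest
  airflow-webserver:   # serves the UI at localhost:8080
    image: apache/airflow:2.0.1
    ports:
      - "8080:8080"
  airflow-scheduler:   # triggers your tasks on schedule
    image: apache/airflow:2.0.1
```

Each service becomes one container — one "bowl of noodles" the orchestration guy keeps track of.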
3.1 Initialising Environment
- On Linux, run
mkdir ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
- On MacOS and other operating systems, you may get a warning that AIRFLOW_UID is not set; you can safely ignore it. Then, on all operating systems, initialise the database by running:
docker-compose up airflow-init
Take note: if on a Mac, just make sure Docker is started first.
cmd + space, type "docker" into the search bar, then enter
- In case you hit this error:
Error: docker-apache-airflow-201_airflow-webserver_1 exited with code 137
Exit code 137 means the container was killed, usually because Docker ran out of memory. Solution: try increasing the memory of Docker (image below).
The final result should look like this
3.2 Running Airflow
- Run
docker-compose up
- Open a second terminal using
command + T
- Check the condition of the containers with docker ps and make sure no containers are in an unhealthy condition
3.3 Accessing the environment
After starting Airflow, you can interact with it in 3 ways:
- by running CLI commands.
- via a browser using the web interface.
- using the REST API.
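If you are curious about the REST API route, here is a minimal sketch using only the Python standard library. It assumes Airflow is already up at localhost:8080 with the default airflow/airflow account, and that the compose setup enables the basic-auth API backend (the official file does):

```python
import base64
import json
import urllib.request

def basic_auth_header(user: str, password: str) -> str:
    """Build the HTTP Basic Auth header value for the given credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"

def list_dags(base_url: str = "http://localhost:8080",
              user: str = "airflow", password: str = "airflow") -> dict:
    """Call Airflow's stable REST API endpoint GET /api/v1/dags."""
    req = urllib.request.Request(
        f"{base_url}/api/v1/dags",
        headers={"Authorization": basic_auth_header(user, password)},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage, once Airflow is running:
#   dag_ids = [d["dag_id"] for d in list_dags()["dags"]]
```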
I chose the web interface via a browser.
- Go to your web browser (e.g. Google Chrome) and type http://localhost:8080. The default account has the login airflow and the password airflow.
AND this is what success looks like
Final Word
Woooo, that’s it for today. You have finished running Airflow with Docker, and I hope you finished within 15 minutes. The image above is the Airflow UI, and I highly encourage you to play around with it.
In my next episode, I will cover the Airflow basics and how it can orchestrate your scheduled dbt run.
I’ll be back soon so stay tuned!