A Beginner’s Guide to Apache Airflow — Part 3: Intro and Setup
This is my third episode on the #dataseries.
1. What is Airflow?
- “Airflow is a platform to programmatically author, schedule and monitor workflows. Use airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The airflow scheduler executes your tasks on an array of workers while following the specified dependencies.”
- There are a few Airflow alternatives on the market, such as Dagster or Prefect.
1.2 Quick Start
- From the official docs, there are two ways to set up Airflow: running it locally, or running it in Docker. Personally, I think the Docker route is the faster way to get started.
- You can see more at the Documentation: Quick start — Airflow
2.1 What is Docker?
“Docker is an open platform for developing, shipping, and running applications. Docker enables developers to package applications into containers to run the code in any environment.”
Okay, if I were you, I wouldn’t understand what the hell that’s about either! So I spent days and nights (just kidding, I have a day job too!) doing a lot of research, and I finally understood what Docker is in simple terms by making up the following analogy.
2.2 The Noodle Analogy
There are 3 main key concepts we should be looking at at this early stage: docker image, docker container, and docker compose.
Suppose you are an instant noodle lover who loves to make your own recipes, so you create an awesome recipe with unique ingredients and a method for making an awesome bowl of noodles. So what do you do next? You pack it up in a nice, easily readable format called a noodle recipe. This is similar to a docker image: it contains your specific instructions for the instant noodle cup.
After packing, you send it to your 24-hour supermarket, where anyone can enjoy your awesome noodle bowl for free. This 24-hour supermarket is the docker registry. Now suppose your shiba inu (I love dogs) likes to eat noodles. He can go to the supermarket and get one package of your awesome noodles. After he gets home, he just needs to follow the instructions, which is similar to running the command docker run. After following the instructions, he finally has an awesome bowl of noodles. This is similar to having a docker container, and this docker container will be running some application.
Now we come to docker compose. Okay, now let’s imagine you organise a noodle festival for people to come and enjoy. Instead of having just 1 bowl of noodles (one docker image), you have hundreds of bowls with hundreds of recipes to follow. Hence we need an orchestration tool which makes the process smoother and easier for everyone. This is where docker compose comes into place.
Okay, let’s go back to the documentation’s language.
A container is a runnable instance of an image. An image is a read-only template with instructions for creating a Docker container. Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.
Docker helps you run your application in a containerised environment without worrying about dependencies and other configuration. Docker is a containerisation tool. Docker Compose is an orchestration tool that makes spinning up multi-container distributed applications with Docker an effortless task.
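To make the quoted docs a bit more concrete, here is a minimal, hypothetical docker-compose.yaml sketch. The service names and images are made up for illustration (this is not the Airflow stack); it just shows how one YAML file describes several containers that Compose starts together with a single command.

```yaml
# Hypothetical two-service Compose file (illustration only, not Airflow's).
services:
  webserver:
    image: nginx:latest        # the image is the "recipe"
    ports:
      - "8080:80"
  database:
    image: postgres:13         # each running service becomes a container
    environment:
      POSTGRES_PASSWORD: example
```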
2.3 Install Docker
You can download Docker from the link inserted below.
3. Install Airflow
- To deploy Airflow on Docker Compose, you should fetch the docker-compose.yaml file from the official documentation.
Fun fact: relating to my noodle example above, this docker-compose.yaml is the script (config file) of the orchestration guy (docker compose).
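Fetching the file can be done with curl, following the URL pattern the official docs use. The version number below is only an example, not necessarily the latest — check the docs for the release you want; the fallback echo is just so the command fails gracefully offline.

```shell
# Download the official docker-compose.yaml for a pinned Airflow version.
# 2.3.0 is an example version, not a recommendation — check the docs.
AIRFLOW_VERSION="2.3.0"
curl -LfO "https://airflow.apache.org/docs/apache-airflow/${AIRFLOW_VERSION}/docker-compose.yaml" \
  || echo "download failed (check your network)"
```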
3.1 Initialising Environment
- On Linux, run
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
- On macOS and other operating systems, you may get a warning that AIRFLOW_UID is not set; you can safely ignore it, or create the .env file manually as shown above.
- Then, on any OS, initialise the database and create the first user account by running:
docker-compose up airflow-init
Take note: if you are on a Mac, just make sure Docker is started first.
cmd + space, type "docker" into the search bar, and hit enter.
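If you are not sure whether the Docker daemon is actually up before running docker-compose, a quick check from the terminal looks like this (a small convenience sketch, not an official command — docker info simply fails when the daemon is not reachable):

```shell
# Print a friendly message depending on whether the Docker daemon responds.
docker info > /dev/null 2>&1 && echo "Docker is running" || echo "Start Docker Desktop first"
```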
- In case you run into this error:
docker-apache-airflow-201_airflow-webserver_1 exited with code 137
Exit code 137 usually means the container was killed because it ran out of memory. Solution: try increasing Docker's memory allocation (image below).
The final result should look like this
3.2 Running Airflow
- Open a second terminal using command + T.
- Check the condition of the containers and make sure none of them are in an unhealthy state.
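One way to do that check is to scan the docker ps output for the word "unhealthy". The helper function below is my own little sketch (not an Airflow or Docker command) — once the stack is up you would pipe into it with docker ps | check_health:

```shell
# Flag unhealthy containers in `docker ps` output.
# Usage once the stack is running:  docker ps | check_health
check_health() {
  if grep -q "unhealthy"; then
    echo "some containers are unhealthy"
  else
    echo "all containers look healthy"
  fi
}
```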
3.3 Accessing the environment
After starting Airflow, you can interact with it in 3 ways: by running CLI commands, via a browser using the web interface, or through the REST API.
I chose the web interface via a browser.
- Go to your web browser (e.g. Google Chrome) and open
http://localhost:8080. The default account has the login
airflow and the password airflow.
AND this is what success looks like
Woooo, that’s it for today. You have finished running Airflow with Docker. I hope you finished within 15 minutes. The image above is the Airflow UI, and I highly encourage you to play around with it.
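And if you are curious about the third way of interacting with Airflow (the REST API), you can poke it from a terminal too. This sketch assumes the quick-start defaults — basic auth with login airflow and password airflow on localhost:8080 — so adjust it if you changed anything; the fallback echo just keeps it harmless when the stack is down:

```shell
# List DAGs via the REST API, using the quick-start's default credentials.
curl -s -u "airflow:airflow" "http://localhost:8080/api/v1/dags" \
  || echo "Airflow is not reachable on localhost:8080"
```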
In my next episode, I will cover the Airflow basics and how it can orchestrate your scheduled dbt run.
I’ll be back soon so stay tuned!