A Beginner’s Guide to Apache Airflow — Part 3: Intro and Setup

Set up Airflow with Docker in 15 minutes

This is the third episode of my #dataseries.

Previously — on my #dataseries
1. A Beginner’s Guide to DBT (data build tool) — Part 1: Introduction
2. A Beginner’s Guide to DBT (data build tool) — Part 2: Setup guide & tips

1. What is Airflow?

1.1 Definition

  • “Airflow is a platform to programmatically author, schedule and monitor workflows. Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies.”
  • There are a few Airflow alternatives on the market, such as Dagster and Prefect.
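To make the DAG idea a bit more concrete, here is a minimal sketch in plain Python. It does not use the Airflow API at all; it only relies on the standard library's graphlib module (Python 3.9+). The task names (extract, transform, load, report) are made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is exactly what the "acyclic" in DAG rules out.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # "extract" always comes first, "transform" second
```

A scheduler like Airflow's does a lot more than this (retries, time-based scheduling, workers), but the core promise is the same: a task only runs after the tasks it depends on.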

1.2 Quick Start

  • According to the official docs, there are two ways to set up Airflow: running it locally or running it in Docker. Personally, I think the latter is the faster way to get started.
  • You can see more at the Documentation: Quick start — Airflow

2. Prerequisites

Before you begin, you will need to install Docker Community Edition (CE) and Docker Compose v1.27.0 or newer on your workstation.

2.1 What is Docker?

“Docker is an open platform for developing, shipping, and running applications. Docker enables developers to package applications into containers to run the code in any environment.”

Okay, if I were you, I wouldn’t understand what the hell that is about! So I spent days and nights (just kidding, I have a day job too!) doing a lot of research, and I finally understood what Docker is in simple terms by making up a simple analogy as follows.

2.2 The Noodle Analogy

There are 3 key concepts we should look at in this early stage: docker image, docker container and docker compose.

Suppose you are an instant noodle lover who loves to create your own recipes, so you come up with an awesome recipe with unique ingredients and a method for making an awesome bowl of noodles. So what do you do next? You pack it nicely in an easily readable format, which we call the noodle recipe. This is similar to a docker image. It contains your specific instructions, packed in the instant noodle cup.

After packing, you send it to your 24-hour supermarket, where anyone can get your awesome noodles for free. This 24-hour supermarket is the docker registry. Suppose your shiba inu (I love dogs) likes to eat noodles. He can go to the supermarket and get one package of your awesome noodles. After he gets home, he just needs to follow the instructions, which is similar to running the command docker run. After following the instructions, he finally has an awesome bowl of noodles. This is similar to having a docker container, and this docker container will be running some application.

Now we come to docker compose. Imagine you organise a noodle festival for people to come and enjoy. Instead of having just 1 bowl of noodles (one docker container), you have hundreds of bowls with hundreds of recipes to follow. Hence we need an orchestration tool that makes the process smoother and easier for everyone. This is where docker compose comes into play.

Okie let’s go back to the documentation language.

A container is a runnable instance of an image. An image is a read-only template with instructions for creating a Docker container. Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.

In summary,

Docker helps you run your application in a containerised environment without worrying about dependencies and other configurations. Docker is a containerisation tool. Docker Compose is an orchestration tool that makes spinning up multi-container applications with Docker an effortless task.

2.3 Install Docker

You can download Docker from the links below.

2.3.1 Install Docker Desktop

2.3.2 Install Docker Compose

3. Install Airflow

  • To deploy Airflow on Docker Compose, you should fetch docker-compose.yaml

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.0.1/docker-compose.yaml'

Fun fact: relating to my noodle example above, this docker-compose.yaml is the script (config file) of the orchestration guy (docker compose).

3.1 Initialising Environment

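  • On Linux, first create the dags, logs and plugins folders and a .env file holding your user ID (see the official quick start for the exact commands).
  • Then, on macOS and every other operating system (Linux included), run: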

docker-compose up airflow-init

Take note: if you are on a Mac, make sure Docker is running first.
Press cmd + space, type "docker" into the search bar, then hit enter.
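For Linux users, the official quick start asks for three folders (dags, logs and plugins) and a .env file carrying your user ID before you initialise. Here is a rough Python sketch of that preparation step. I am assuming the variable names AIRFLOW_UID and AIRFLOW_GID from the 2.0.1 quick start; the helper name prepare_airflow_dir is my own and not part of Airflow.

```python
from pathlib import Path


def prepare_airflow_dir(base: Path, uid: int) -> Path:
    """Create the folders the compose file mounts and write the .env file.

    `base` is the folder that holds docker-compose.yaml (assumption).
    `uid` is the host user ID, so files created in the mounted folders
    end up owned by you rather than by root.
    """
    for name in ("dags", "logs", "plugins"):
        (base / name).mkdir(parents=True, exist_ok=True)

    env_file = base / ".env"
    env_file.write_text(f"AIRFLOW_UID={uid}\nAIRFLOW_GID=0\n")
    return env_file
```

On Linux you could call it as prepare_airflow_dir(Path.cwd(), os.getuid()) from the folder where you downloaded docker-compose.yaml, after importing os.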

  • In case you get this error:

Error: docker-apache-airflow-201_airflow-webserver_1 exited with code 137

Solution: try increasing the memory allocated to Docker (image below). The Airflow docs recommend at least 4GB for the Docker engine.
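Why does more memory help? By convention, an exit code above 128 means the process was killed by a signal, and 137 = 128 + 9, where signal 9 is SIGKILL. That is what a container typically gets when it runs out of memory. The small sketch below decodes such codes; the function name describe_exit_code is made up, and it assumes a Unix-like system (Linux or macOS) where SIGKILL exists.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain an exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a signal number known on this platform.
            return f"exited with status {code}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(137))  # killed by SIGKILL (signal 9)
```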

The final result should look like this

3.2 Running Airflow

  • Run docker-compose up
  • Open a second terminal using command + T
  • Check the condition of the containers (for example with docker ps) and make sure that none of them is in an unhealthy condition
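If you would rather not eyeball the list, the check can be scripted. This is only a sketch: it assumes the docker CLI is on your PATH and that the status column marks failing containers with the text "(unhealthy)", which is how docker ps reports containers that define a health check. The container names in the example are invented.

```python
import subprocess


def docker_ps() -> str:
    """Return one 'name<TAB>status' line per running container."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def unhealthy_containers(ps_output: str) -> list[str]:
    """Pick out the names of containers whose status says (unhealthy)."""
    bad = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if "(unhealthy)" in status:
            bad.append(name)
    return bad


# Example with made-up output, so it runs without Docker:
sample = "web_1\tUp 2 minutes (healthy)\nworker_1\tUp 2 minutes (unhealthy)\n"
print(unhealthy_containers(sample))  # ['worker_1']
```

With Docker running, unhealthy_containers(docker_ps()) should return an empty list once everything has settled.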

3.3 Accessing the environment

After starting Airflow, you can interact with it in 3 ways: by running CLI commands, via a browser using the web interface, or using the REST API.

I chose the browser and the web interface.

  • Open your web browser (e.g. Google Chrome) and go to http://localhost:8080. The default account has the login airflow and the password airflow.

AND this is what success looks like
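Besides looking at the UI, you can ask the webserver how it is doing. The Airflow 2 webserver exposes a /health endpoint that reports the status of the metadatabase and the scheduler as JSON. The sketch below assumes the default address http://localhost:8080 from this guide; the function names are my own.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url: str = "http://localhost:8080") -> dict:
    """Download the JSON document served at <base_url>/health."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def all_healthy(payload: dict) -> bool:
    """True only if every reported component says its status is healthy."""
    return bool(payload) and all(
        part.get("status") == "healthy" for part in payload.values()
    )


# Illustrative payload in the shape the endpoint returns:
example = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "healthy"},
}
print(all_healthy(example))  # True
```

Once the containers are up, all_healthy(fetch_health()) is a quick way to confirm the same thing from a script.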

Final Word

Woooo, that’s it for today. You have finished running Airflow with Docker. I hope you finished within 15 minutes. The image above is the Airflow UI, and I highly encourage you to play around with it.

In my next episode, I will cover the Airflow basics and how it can orchestrate your scheduled dbt run.

I’ll be back soon so stay tuned!
