
Detailed steps to quickly build Airflow+MySQL using Docker

To install Apache Airflow 2.9.3 with Docker, configure it to use a MySQL database, and ensure data persistence, we can use Docker Compose. Here are the detailed steps:

Step 1: Create a project directory

Create a new directory in your working directory to store all relevant configuration files and scripts.

mkdir airflow-mysql
cd airflow-mysql

Step 2: Create the docker-compose.yml file

Create a file named docker-compose.yml in the project directory with the following content:

version: '3.7'
services:
  mysql:
    image: mysql:8.0.27
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: airflow
      MYSQL_USER: airflow
      MYSQL_PASSWORD: airflowpassword
      MYSQL_CHARSET: utf8mb4
      MYSQL_COLLATION: utf8mb4_general_ci
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql
  airflow-webserver:
    image: apache/airflow:2.9.3
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflowpassword@mysql:3306/airflow
      AIRFLOW__CORE__FERNET_KEY: 'YOUR_FERNET_KEY'
      AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
      AIRFLOW__WEBSERVER__RBAC: 'true'
    depends_on:
      - mysql
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: ["bash", "-c", "airflow db init && airflow users create -r Admin -u admin -p admin -e admin@ -f Admin -l User && airflow webserver"]
  airflow-scheduler:
    image: apache/airflow:2.9.3
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflowpassword@mysql:3306/airflow
      AIRFLOW__CORE__FERNET_KEY: 'YOUR_FERNET_KEY'
      AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
      AIRFLOW__WEBSERVER__RBAC: 'true'
    depends_on:
      - mysql
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: ["bash", "-c", "airflow scheduler"]
volumes:
  mysql_data:

Please note that YOUR_FERNET_KEY needs to be replaced with an actual Fernet key, which can be generated with the following command:

python -c "from  import Fernet; print(Fernet.generate_key().decode())"

Step 3: Create a directory structure

Create directories for Airflow's DAGs, logs, and plugins:

mkdir -p dags logs plugins
chmod -R 777 dags logs plugins

Step 4: Start Docker Compose

Run the following command in the project directory to start all services:

docker-compose up -d

Step 5: Check the service status

You can view the running containers using the following command:

docker-compose ps

Step 6: Access the Airflow Web UI

Open a browser and go to http://localhost:8080; you should see the Airflow login page. Log in with the following default credentials:

  • username: admin
  • password: admin

Explanation

Service definition

  • mysql: Uses the MySQL 8.0.27 image, sets the database name, user, and password, and persists data to the mysql_data volume.
  • airflow-webserver: Uses the Airflow 2.9.3 image, configures the connection string to MySQL, initializes the database, creates an administrator user, and then starts the Airflow webserver.
  • airflow-scheduler: Uses the Airflow 2.9.3 image, configures the connection string to MySQL, and starts the Airflow scheduler.

Data persistence

  • MySQL data is persisted through the Docker volume mysql_data.
  • Airflow's DAGs, logs, and plugins are bind-mounted to the host's ./dags, ./logs, and ./plugins directories.

Other configurations

Environment variable description

  • AIRFLOW__CORE__EXECUTOR: LocalExecutor indicates the use of the local executor.
  • AIRFLOW__CORE__SQL_ALCHEMY_CONN: Sets the connection string Airflow uses to connect to MySQL. The format is mysql+mysqldb://<username>:<password>@<host>:<port>/<database>.
  • AIRFLOW__CORE__FERNET_KEY: The key used to encrypt connection passwords. It can be generated with python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())".
  • AIRFLOW__CORE__LOAD_EXAMPLES: Set to false so the example DAGs are not loaded, keeping the environment clean.
  • AIRFLOW__WEBSERVER__RBAC: Enables role-based access control.

Docker Compose Configuration Description

  • depends_on: Ensures the MySQL container is started before the Airflow services (a sketch of waiting until MySQL is actually ready follows this list).
  • volumes: Used to persist data and share files. MySQL data is stored in the mysql_data volume; Airflow's DAGs, logs, and plugins are bind-mounted to the host's ./dags, ./logs, and ./plugins directories.
  • command: Defines the command to run when the container starts. For the airflow-webserver service, it first initializes the database and creates an administrator user, then starts the Airflow webserver. For the airflow-scheduler service, it starts the Airflow scheduler.
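
Note that depends_on only waits for the MySQL container to be started, not for MySQL to actually accept connections. With a Docker Compose version that supports the Compose Specification, you can close that gap with a healthcheck; the following is a minimal sketch reusing the credentials from the docker-compose.yml above (the timing values are just examples):

  mysql:
    # ... same image, environment, ports, and volumes as above ...
    healthcheck:
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost", "-uairflow", "-pairflowpassword"]
      interval: 10s
      timeout: 5s
      retries: 10
  airflow-webserver:
    # ... same configuration as above ...
    depends_on:
      mysql:
        condition: service_healthy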

Ensuring data persistence

Persisting data is the key to making sure nothing is lost when the services restart. We use a Docker volume to persist the MySQL data and bind mounts to persist Airflow's DAGs, logs, and plugins.
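
To see where the MySQL data actually lives, you can inspect the named volume. Docker Compose prefixes the volume name with the project (directory) name, so the exact name below is an assumption based on the airflow-mysql directory created in Step 1:

docker volume ls
docker volume inspect airflow-mysql_mysql_data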

Start and manage containers

Start the container

docker-compose up -d

Check container status

docker-compose ps

View logs

docker-compose logs -f

Stop and remove containers

docker-compose down

Further configuration and optimization

Security

Change the default password
The default administrator password is admin; it is recommended to change it immediately after the first login.
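
As a sketch (these airflow users subcommands exist in Airflow 2.x; the new username and password below are placeholders), you could create a new administrator and remove the default account from inside the webserver container:

docker-compose exec airflow-webserver airflow users create \
  -r Admin -u newadmin -p 'a-strong-password' -e admin@example.com -f Admin -l User
docker-compose exec airflow-webserver airflow users delete -u admin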

Use environment variables to protect sensitive information
Avoid writing sensitive information directly into the docker-compose.yml file; use Docker secrets or environment variables to protect it instead.
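
For example, Docker Compose interpolates ${...} variables from a .env file placed next to docker-compose.yml. The variable names below are only an illustration:

# .env (keep this file out of version control)
AIRFLOW_DB_PASSWORD=airflowpassword
AIRFLOW_FERNET_KEY=YOUR_FERNET_KEY

# docker-compose.yml (fragment)
    environment:
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:${AIRFLOW_DB_PASSWORD}@mysql:3306/airflow
      AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}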

Resource limitations

Depending on your hardware resources, you can set resource limits for the containers in docker-compose.yml:

airflow-webserver:
  ...
  deploy:
    resources:
      limits:
        cpus: '0.50'
        memory: '512M'
  ...

Log Management

To keep container log files from growing without bound, you can configure logging options:

airflow-webserver:
  ...
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
  ...

Backup and restore

Regularly back up the MySQL database and Airflow configuration to prevent data loss. You can do this with cron jobs or other backup tools.
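
As a sketch under the setup above (service name mysql, database airflow, and the credentials from docker-compose.yml; the dump file name is just an example):

# Dump the airflow database to a dated SQL file on the host (-T disables the pseudo-TTY so redirection works)
docker-compose exec -T mysql mysqldump -u airflow -pairflowpassword airflow > airflow_backup_$(date +%F).sql

# Restore a previously created dump back into the database
docker-compose exec -T mysql mysql -u airflow -pairflowpassword airflow < airflow_backup_2025-03-10.sql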

Extensibility

If you need to scale to multiple nodes, consider using the CeleryExecutor or KubernetesExecutor. The CeleryExecutor requires additional configuration and Redis or RabbitMQ as the message queue.
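
A rough sketch of the extra service a CeleryExecutor setup would add to docker-compose.yml (a redis service would also be needed, and the executor and broker settings would have to be applied to the webserver and scheduler as well; the values are examples, not a complete configuration):

  airflow-worker:
    image: apache/airflow:2.9.3
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflowpassword@mysql:3306/airflow
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:airflowpassword@mysql:3306/airflow
    depends_on:
      - mysql
      - redis
    command: ["bash", "-c", "airflow celery worker"]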

Example DAG creation

Create a simple example DAG file, e.g. example_dag.py, in the dags directory:

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}
with DAG(
    'example_dag',
    default_args=default_args,
    description='A simple example DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    t1 = BashOperator(
        task_id='print_date',
        bash_command='date',
    )
    t2 = BashOperator(
        task_id='sleep',
        bash_command='sleep 5',
    )
    t1 >> t2

This creates a simple DAG with two tasks: printing the current date and then sleeping for 5 seconds.
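
Once the file is saved under ./dags, the scheduler should pick it up automatically. As a quick check (using the service name from the docker-compose.yml above), you can list and trigger the DAG from the CLI:

docker-compose exec airflow-webserver airflow dags list
docker-compose exec airflow-webserver airflow dags trigger example_dag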

Monitoring and logging

Airflow provides rich monitoring and logging capabilities to help you track the execution status and performance of tasks.

Airflow Web UI
View the status, logs, and graphical representations of DAGs and tasks through the Web UI.

Log files
Check out the log files in the container for more details.

docker-compose logs airflow-webserver
docker-compose logs airflow-scheduler

Update and upgrade

Update Airflow and its related dependencies regularly to pick up the latest features and security patches.

Update Docker image

docker pull apache/airflow:2.9.3
docker-compose up -d
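
The image tag above is pinned to 2.9.3, so pulling only refreshes that tag. If you move to a newer Airflow version, the metadata database usually needs to be migrated before the services are restarted; a minimal sketch:

docker-compose run --rm airflow-webserver airflow db upgrade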

Update MySQL image

docker pull mysql:8.0.27
docker-compose up -d

Conclusion

With these steps, you can successfully deploy and configure Apache Airflow 2.9.3 with Docker and MySQL.

To verify that the Docker containers started successfully, you can use a series of commands to check their status and logs. Here are some commonly used commands and steps:

1. Check the container status

Use the docker ps command to view running containers:

docker ps

This command lists all running containers, including their container IDs, names, status, and other information.

If you want to view all containers (including stopped ones), you can use the -a option:

docker ps -a

2. View container log

Viewing the logs of a specific container helps you confirm whether it started properly and troubleshoot potential errors. Use the docker logs command:

docker logs <container_id_or_name>

You can also use the -f option to follow the log output in real time:

docker logs -f <container_id_or_name>

3. Verify using Docker Compose

If you are using Docker Compose, you can use the following command to view the status of all services:

docker-compose ps

View the logs of all services:

docker-compose logs

View the log in real time:

docker-compose logs -f

4. A concrete example

Suppose you have a container named airflow-webserver. The following steps verify that it started successfully:

Check container status

docker ps

You should see outputs similar to the following:

CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS         PORTS                    NAMES
abc123def456   apache/airflow:2.9.3    "/ airf…"   2 minutes ago   Up 2 minutes   0.0.0.0:8080->8080/tcp   airflow-webserver

If the container status is Up, the container is running.

View the container logs

docker logs airflow-webserver

You should see the startup log of Airflow Webserver and confirm that there is no error message.

Follow the logs in real time

docker logs -f airflow-webserver

Check the output of the container in real time to ensure there are no problems.

Verify using Docker Compose

docker-compose ps

You should see outputs similar to the following:

          Name                   Command           State                Ports
---------------------------------------------------------------------------------------
airflow-mysql_mysql_1       mysqld              Up             3306/tcp
airflow-mysql_webserver_1   / airflow w ...     Up (healthy)   0.0.0.0:8080->8080/tcp
airflow-mysql_scheduler_1   / airflow s ...     Up

View logs for all services

docker-compose logs

View all services' logs in real time

docker-compose logs -f

5. Health check

Some Docker images provide a health check; for those, you can see the health status in the STATUS column of the docker ps output. If the image supports health checks, you will see something like Up (healthy).
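
You can also add such a check to the airflow-webserver service yourself; a minimal sketch (the Airflow webserver exposes a /health endpoint and curl is available in the official image; the timing values are just examples):

  airflow-webserver:
    # ... same configuration as above ...
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5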

6. Access the Web UI

Finally, you can open the corresponding service URL in your browser to verify that the service is running normally. For example, for the Airflow webserver, go to http://localhost:8080; if you can see the Airflow login page, the webserver has started successfully.

Summary

These commands and steps let you verify that your Docker containers started successfully and troubleshoot potential errors by inspecting their status and logs.

This is the end of this article on the detailed steps for quickly building Airflow + MySQL with Docker. For more on building Airflow and MySQL with Docker, see my previous articles or the related articles below. I hope you find it helpful!