To install Apache Airflow 2.9.3 with Docker, configure it to use a MySQL database, and ensure data persistence, we can use Docker Compose. Here are the detailed steps:
Step 1: Create a project directory
Create a new directory in your working directory to store all relevant configuration files and scripts.
mkdir airflow-mysql
cd airflow-mysql
Step 2: Create the docker-compose.yml file
Create a file named docker-compose.yml in the project directory. The content of the file is as follows:
version: '3.7'

services:
  mysql:
    image: mysql:8.0.27
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: airflow
      MYSQL_USER: airflow
      MYSQL_PASSWORD: airflowpassword
      MYSQL_CHARSET: utf8mb4
      MYSQL_COLLATION: utf8mb4_general_ci
    ports:
      - "3306:3306"
    volumes:
      - mysql_data:/var/lib/mysql

  airflow-webserver:
    image: apache/airflow:2.9.3
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflowpassword@mysql:3306/airflow
      AIRFLOW__CORE__FERNET_KEY: 'YOUR_FERNET_KEY'
      AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
      AIRFLOW__WEBSERVER__RBAC: 'true'
    depends_on:
      - mysql
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: ["bash", "-c", "airflow db init && airflow users create -r Admin -u admin -p admin -e admin@example.com -f Admin -l User && airflow webserver"]

  airflow-scheduler:
    image: apache/airflow:2.9.3
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:airflowpassword@mysql:3306/airflow
      AIRFLOW__CORE__FERNET_KEY: 'YOUR_FERNET_KEY'
      AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
      AIRFLOW__WEBSERVER__RBAC: 'true'
    depends_on:
      - mysql
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: ["bash", "-c", "airflow scheduler"]

volumes:
  mysql_data:
Please note that YOUR_FERNET_KEY needs to be replaced with an actual Fernet key, which can be generated with the following command:
python -c "from import Fernet; print(Fernet.generate_key().decode())"
Step 3: Create a directory structure
Create directories for Airflow's DAGs, logs, and plugins:
mkdir -p dags logs plugins
chmod -R 777 dags logs plugins
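chmod -R 777 is the quickest way to avoid permission errors on the bind mounts, but a tighter alternative, and the approach used by the official Airflow Docker Compose example, is to run the Airflow containers as your host user: add user: "${AIRFLOW_UID:-50000}:0" to both Airflow services in docker-compose.yml and record your UID in a local .env file. A minimal sketch, assuming you adopt that variable name:

# store your host UID so Compose can substitute it into the service definitions
echo "AIRFLOW_UID=$(id -u)" > .env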
Step 4: Start Docker Compose
Run the following command in the project directory to start all services:
docker-compose up -d
Step 5: Check the service status
You can view the running containers using the following command:
docker-compose ps
Step 6: Access the Airflow Web UI
Open a browser and go to http://localhost:8080; you should see the Airflow login page. Log in with the following default credentials:
- Username: admin
- Password: admin
Explanation
Service definitions
- mysql: Uses the MySQL 8.0.27 image, sets the database name, user, and password, and persists the data to the mysql_data volume.
- airflow-webserver: Uses the Airflow 2.9.3 image, configures the connection string for MySQL, initializes the database, creates an administrator user, and then starts the Airflow webserver.
- airflow-scheduler: Uses the Airflow 2.9.3 image, configures the connection string for MySQL, and starts the Airflow scheduler.
Data persistence
- MySQL data is persisted in the mysql_data Docker volume.
- Airflow's DAGs, logs, and plugins are bind-mounted to the host directories ./dags, ./logs, and ./plugins.
Other configurations
Environment variable description
- AIRFLOW__CORE__EXECUTOR: LocalExecutor means the local executor is used.
- AIRFLOW__CORE__SQL_ALCHEMY_CONN: The connection string Airflow uses to connect to MySQL, in the format mysql+mysqldb://<username>:<password>@<host>:<port>/<database>.
- AIRFLOW__CORE__FERNET_KEY: The key used to encrypt connection passwords. It can be generated with python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())".
- AIRFLOW__CORE__LOAD_EXAMPLES: Set to false so that the example DAGs are not loaded, keeping the environment clean.
- AIRFLOW__WEBSERVER__RBAC: Enables role-based access control.
Docker Compose Configuration Description
- depends_on: Ensures the MySQL service is started before the Airflow services.
- volumes: Used to persist data and share files. MySQL data is stored in the mysql_data volume; Airflow's DAGs, logs, and plugins are bind-mounted to the host directories ./dags, ./logs, and ./plugins.
- command: Defines the command to run when the container starts. For the airflow-webserver service it first initializes the database and creates an administrator user, then starts the Airflow webserver; for the airflow-scheduler service it starts the Airflow scheduler.
Ensuring persistence
Persisting data is the key to ensuring that nothing is lost after a service restart. We use a Docker volume to persist MySQL data and bind mounts to persist Airflow's DAGs, logs, and plugins.
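To see where the MySQL data actually lives, you can inspect the named volume. Note that Docker Compose prefixes the volume name with the project name (by default the project directory name, so airflow-mysql here); adjust the name if docker volume ls shows something different:

docker volume ls
docker volume inspect airflow-mysql_mysql_data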
Start and manage containers
Start the container:
docker-compose up -d
Check container status:
docker-compose ps
View logs:
docker-compose logs -f
Stop and delete containers:
docker-compose down
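Note that docker-compose down on its own keeps the named mysql_data volume, so the database survives a teardown. Adding the -v flag also removes the volumes and therefore deletes the MySQL data; use it only when you intentionally want a clean slate:

docker-compose down -v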
Further configuration and optimization
Security
Change the default password:
The default administrator password is admin; it is recommended to change it immediately after the first login.
Use environment variables to protect sensitive information:
Avoid writing sensitive information such as passwords directly into docker-compose.yml; you can use Docker secrets or environment variables to protect it.
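One lightweight way to do this is Docker Compose variable substitution: keep the secrets in a local .env file that is excluded from version control and reference them from docker-compose.yml (Compose reads .env from the project directory automatically). A minimal sketch, with variable names chosen here purely for illustration:

.env (do not commit this file):
MYSQL_PASSWORD=airflowpassword
AIRFLOW_FERNET_KEY=your-generated-fernet-key

docker-compose.yml (excerpt):
  environment:
    MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:${MYSQL_PASSWORD}@mysql:3306/airflow
    AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}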
Resource limitations
Depending on your hardware resources, you can set resource limits for the containers:
airflow-webserver:
  ...
  deploy:
    resources:
      limits:
        cpus: '0.50'
        memory: '512M'
  ...
Log Management
To make sure that log files do not grow without bound, you can configure logging options:
airflow-webserver:
  ...
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"
  ...
Backup and restore
Regularly back up the MySQL database and Airflow configuration to prevent data loss. You can do this using cron jobs or other backup tools.
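For the database part, a simple approach is to dump it from the running container with mysqldump, using the service name and credentials from the compose file above; the file names below are just examples:

# create a dated SQL dump of the airflow database
docker-compose exec -T mysql mysqldump -u airflow -pairflowpassword airflow > airflow_backup_$(date +%F).sql

# restore from a previously created dump
docker-compose exec -T mysql mysql -u airflow -pairflowpassword airflow < airflow_backup_2024-01-01.sql

The -T flag disables the pseudo-TTY so that the shell redirection works; the same commands can be wrapped in a cron job.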
Extensibility
If you need to scale to multiple nodes, you can consider using CeleryExecutor or KubernetesExecutor. CeleryExecutor requires additional configuration and Redis/RabbitMQ as message queues.
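As a rough sketch of the direction only (not a complete configuration): switching to CeleryExecutor means adding a broker service such as Redis, one or more worker containers running airflow celery worker, and pointing every Airflow service at the broker and result backend, roughly:

  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:airflowpassword@mysql:3306/airflow

The official Airflow docker-compose.yaml is a good reference for a full CeleryExecutor setup.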
Example DAG creation
In the dags directory, create a simple example DAG file, for example example_dag.py:
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

with DAG(
    'example_dag',
    default_args=default_args,
    description='A simple example DAG',
    schedule_interval=timedelta(days=1),
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    t1 = BashOperator(
        task_id='print_date',
        bash_command='date',
    )
    t2 = BashOperator(
        task_id='sleep',
        bash_command='sleep 5',
    )
    t1 >> t2
This creates a simple DAG with two tasks: one prints the current date and the other sleeps for 5 seconds.
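Once the file is saved under ./dags, the scheduler picks it up on its next DAG directory scan. You can also check and exercise it from the CLI inside the containers, for example:

# list the DAGs the scheduler has parsed
docker-compose exec airflow-scheduler airflow dags list

# run a single task once, without the scheduler (useful for quick testing)
docker-compose exec airflow-scheduler airflow tasks test example_dag print_date 2023-01-01

# trigger a full DAG run
docker-compose exec airflow-webserver airflow dags trigger example_dag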
Monitoring and logging
Airflow provides rich monitoring and logging capabilities to help you track the execution status and performance of tasks.
Airflow Web UI:
View the status, logs, and graphical representations of DAGs and tasks through the Web UI.
Log files:
Check out the log files in the container for more details.
docker-compose logs airflow-webserver
docker-compose logs airflow-scheduler
Update and upgrade
Update Airflow and related dependencies regularly to get the latest features and security patches.
Update the Airflow image:
docker pull apache/airflow:2.9.3
docker-compose up -d
Update MySQL image:
docker pull mysql:8.0.27
docker-compose up -d
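If you later move to a newer Airflow release (rather than just re-pulling the same 2.9.3 tag), the metadata database schema may also need migrating before the services will start cleanly. A minimal sketch of that step:

# apply pending schema migrations (airflow db migrate is the newer name for this in 2.7+)
docker-compose run --rm airflow-webserver airflow db upgrade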
Conclusion
With these steps, you can successfully deploy and configure Apache Airflow 2.9.3 using Docker and MySQL, with the data persisted across restarts.
To verify that the Docker containers have started successfully, you can use a series of commands to check their status and logs. Here are some commonly used commands and steps:
1. Check the container status
Use the docker ps command to view the running containers:
docker ps
This command lists all running containers, including their container IDs, names, status, and so on.
If you want to view all containers (including stopped ones), use the -a option:
docker ps -a
2. View container logs
Viewing the logs of a specific container can help you understand whether it started properly and troubleshoot potential errors. Use the docker logs command:
docker logs <container_id_or_name>
You can also use the -f option to follow the log output in real time:
docker logs -f <container_id_or_name>
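With long logs it can help to filter the output for problems, for example (adjust the container name to whatever docker ps shows):

docker logs airflow-webserver 2>&1 | grep -iE "error|traceback"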
3. Verify using Docker Compose
If you are using Docker Compose, you can use the following command to view the status of all services:
docker-compose ps
View the logs of all services:
docker-compose logs
Follow the logs in real time:
docker-compose logs -f
4. Specific examples
Suppose you have a container named airflow-webserver. The following are the steps to verify that it started successfully:
Check container status:
docker ps
You should see output similar to the following:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
abc123def456 apache/airflow:2.9.3 "/ airf…" 2 minutes ago Up 2 minutes 0.0.0.0:8080->8080/tcp airflow-webserver
If the container status is Up, the container is running.
View the container logs:
docker logs airflow-webserver
You should see the Airflow webserver startup logs; confirm that there are no error messages.
Follow the logs in real time:
docker logs -f airflow-webserver
Check the output of the container in real time to ensure there are no problems.
Verify using Docker Compose:
docker-compose ps
You should see output similar to the following:
Name Command State Ports
-----------------------------------------------------------------------------------------------------------
airflow-mysql_mysql_1 mysqld Up 3306/tcp
airflow-mysql_webserver_1 / airflow w ... Up (healthy) 0.0.0.0:8080->8080/tcp
airflow-mysql_scheduler_1 / airflow s ... Up
View logs for all services:
docker-compose logs
View all services' logs in real time:
docker-compose logs -f
5. Health check
Some Docker images provide a health check; its result is shown in the STATUS column of the docker ps output. If the image supports health checks, you will see something like Up (healthy).
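If a service does not define a health check, you can add one yourself in docker-compose.yml. For the Airflow webserver, its /health endpoint is a natural target; the sketch below mirrors the check used in the official Airflow compose example (it assumes curl is available in the image, which it is in the official apache/airflow images):

  airflow-webserver:
    ...
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 5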
6. Access the Web UI
Finally, you can open the corresponding service URL in your browser to verify that the service is running normally. For example, for the Airflow webserver, go to http://localhost:8080; if you can see the Airflow login page, the webserver has started successfully.
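You can also query the webserver without a browser: its /health endpoint returns a small JSON document reporting the status of the metadatabase and the scheduler:

curl http://localhost:8080/health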
Summary
With these commands and steps you can effectively verify that the Docker containers started successfully and troubleshoot potential errors by inspecting their status and logs.