Apache Airflow is an open-source platform for orchestrating and scheduling tasks, well suited to creating, scheduling, and monitoring data workflows. Here are the basic steps to get started with Airflow:
1. Install Apache Airflow
You can install Airflow through the following command:
pip install apache-airflow
It is recommended to use a virtual environment to manage Airflow's dependencies.
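For example, on a Unix-like system, creating and activating a dedicated environment might look like this (the environment name is just an example):
python3 -m venv airflow-venv
source airflow-venv/bin/activate
pip install apache-airflow
The official installation guide additionally recommends installing against a constraints file so that you get a known-good set of dependency versions.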
2. Initialize the database
Airflow requires a database to store task execution state and other metadata. Initialize the database with the following command:
airflow db init
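By default, Airflow keeps its configuration file, logs, and SQLite metadata database under ~/airflow. If you want a different location, set the AIRFLOW_HOME environment variable before initializing (the path here is just an example):
export AIRFLOW_HOME=~/my-airflow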
3. Create a user
You need to create an administrator account to access Airflow's web interface:
airflow users create \
  --username admin \
  --password admin \
  --firstname Firstname \
  --lastname Lastname \
  --role Admin \
  --email admin@example.com
4. Start Airflow Scheduler and Web Server
Airflow consists of a scheduler (Scheduler) and a web server (Web Server). You need to start these two services separately:
Start the scheduler:
airflow scheduler
Start the web server:
airflow webserver
The web server runs on localhost:8080 by default, and you can access it through your browser.
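If port 8080 is already in use, you can start the web server on another port, for example:
airflow webserver --port 8081
Newer Airflow 2.x releases also ship an airflow standalone command that initializes the database, creates a login user, and starts the scheduler and web server in one step, which is convenient for local experimentation.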
5. Create a DAG (directed acyclic graph)
In Airflow, workflows are defined as DAGs (Directed Acyclic Graphs). A simple DAG looks like this:
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def my_task():
    print("This is a task")

default_args = {
    'start_date': datetime(2023, 9, 1),
    'retries': 1
}

with DAG(
    'my_dag',
    default_args=default_args,
    schedule_interval='@daily'
) as dag:
    task = PythonOperator(
        task_id='my_task',
        python_callable=my_task
    )
- The DAG is defined in Python code.
- default_args contains the default parameters for the tasks.
- PythonOperator is used to execute a Python function.
6. Set up task dependencies
You can define the execution order of tasks by setting their dependencies. For example:
task1 >> task2  # task1 runs first, then task2
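As a slightly larger sketch, tasks can also be fanned out and back in by putting them in a list. The task names and the placeholder callable below are made up for illustration:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def _do_work():
    # Placeholder body; a real task would extract, transform, or load data here
    print("working")

with DAG(
    'dependency_demo',
    start_date=datetime(2023, 9, 1),
    schedule_interval='@daily',
    catchup=False
) as dag:
    extract = PythonOperator(task_id='extract', python_callable=_do_work)
    transform_a = PythonOperator(task_id='transform_a', python_callable=_do_work)
    transform_b = PythonOperator(task_id='transform_b', python_callable=_do_work)
    load = PythonOperator(task_id='load', python_callable=_do_work)

    # extract runs first, then transform_a and transform_b in parallel, then load
    extract >> [transform_a, transform_b] >> load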
7. Put the DAG into the DAGs folder
Save the DAG file you defined to Airflow's DAGs folder. This folder is usually located at $AIRFLOW_HOME/dags/, or you can configure its location via the dags_folder setting in the airflow.cfg file.
8. Monitor DAG
By accessing Airflow's web interface, you can see all defined DAGs, view their execution status, trigger execution manually, and monitor the logs of each task.
9. Common Airflow operations
Trigger a DAG:
airflow dags trigger my_dag
List DAGs:
airflow dags list
List the tasks in a DAG:
airflow tasks list my_dag
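For quick debugging, you can also run a single task without the scheduler, for example airflow tasks test my_dag my_task 2023-09-01. In Airflow 2.5 and later, a DAG file can additionally be run directly as a script; a minimal sketch, assuming it is appended to the bottom of the my_dag example from step 5:

if __name__ == "__main__":
    dag.test()

Running python my_dag.py then executes the DAG in a single process, without the scheduler or web server.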
Airflow is a powerful scheduling and workflow management tool suitable for handling complex data pipelines and task dependencies.