
Apache Airflow usage steps

Apache Airflow is an open source platform for orchestrating and scheduling tasks. It is suitable for creating, scheduling, and monitoring data workflows. Here are the basic steps to use Airflow:

1. Install Apache Airflow

You can install Airflow through the following command:

pip install apache-airflow

It is recommended to use a virtual environment to manage Airflow's dependencies.
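
For example, a minimal setup using Python's built-in venv module might look like this (the environment name airflow-env is just a placeholder):

python -m venv airflow-env
source airflow-env/bin/activate
pip install apache-airflow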

2. Initialize the database

Airflow requires a database to store task execution state and other metadata. Initialize it with the following command:

airflow db init
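
By default this creates a SQLite database under $AIRFLOW_HOME. To use another database, point the connection string in airflow.cfg at it; the snippet below assumes a local PostgreSQL instance with placeholder credentials (in recent Airflow versions the option lives under [database], in older ones under [core]):

[database]
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow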

3. Create a user

You need to create an administrator account to access Airflow's web interface:

airflow users create \
    --username admin \
    --password admin \
    --firstname Firstname \
    --lastname Lastname \
    --role Admin \
    --email admin@example.com

4. Start Airflow Scheduler and Web Server

Airflow consists of a scheduler and a web server; you need to start these two services separately:

Start the scheduler:

airflow scheduler

Start the web server:

airflow webserver

The web server runs on localhost:8080 by default; you can access it through your browser.
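
If port 8080 is already in use, you can start the web server on another port (8081 below is an arbitrary choice):

airflow webserver --port 8081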

5. Create a DAG (directed acyclic graph)

In Airflow, workflows are defined as DAGs (Directed Acyclic Graphs). A simple DAG example is as follows:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def my_task():
    print("This is a task")

# Default parameters applied to every task in the DAG
default_args = {
    'start_date': datetime(2023, 9, 1),
    'retries': 1
}

with DAG(
    'my_dag',
    default_args=default_args,
    schedule_interval='@daily'
) as dag:
    # Wrap the Python function in a task
    task = PythonOperator(
        task_id='my_task',
        python_callable=my_task
    )
  • The DAG is defined in Python; default_args holds the default parameters applied to its tasks.
  • PythonOperator is used to run a Python function as a task.

6. Set up task dependencies

You can define the execution order of tasks by setting their dependencies. For example:

task1 >> task2  # task1 runs first, then task2
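
For instance, a minimal sketch with two placeholder tasks (extract and load are hypothetical names, not part of the example above) could look like this:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def extract():
    print("extracting data")

def load():
    print("loading data")

with DAG(
    'etl_dag',
    start_date=datetime(2023, 9, 1),
    schedule_interval='@daily',
    catchup=False  # do not backfill past runs
) as dag:
    task1 = PythonOperator(task_id='extract', python_callable=extract)
    task2 = PythonOperator(task_id='load', python_callable=load)

    task1 >> task2  # extract runs before load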

7. Put DAG into the DAGs folder

Save the DAG file you define to Airflow's DAGs folder. This folder is usually located at $AIRFLOW_HOME/dags/, and its location can be configured in airflow.cfg.
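
The folder location is controlled by the dags_folder option in airflow.cfg; the path below is only an example:

[core]
dags_folder = /home/user/airflow/dags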

8. Monitor DAG

By accessing Airflow's web interface, you can see all defined DAGs, view their execution status, trigger execution manually, and monitor the logs of each task.

9. Common Airflow operations

Trigger DAG:

airflow dags trigger my_dag

List DAGs:

airflow dags list

List the tasks in a DAG:

airflow tasks list my_dag
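
While developing a DAG, it can also be handy to run a single task in isolation without recording its state in the database; the date below is just an example execution date:

airflow tasks test my_dag my_task 2023-09-01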

Airflow is a powerful scheduling and workflow management tool suitable for handling complex data pipelines and task dependencies.
