Airflow

Making Apache Airflow Highly Available

In a previous post, we discussed Setting up an Apache Airflow Cluster. In this post we’ll talk about the shortcomings of a typical Apache Airflow Cluster and what can be done to provide a Highly Available Airflow Cluster.

A Typical Apache Airflow Cluster

In a typical multi-node Airflow cluster you can separate out all the major processes onto separate machines. Here are the main processes:

Web Server
A daemon which accepts HTTP requests and allows you to interact with Airflow via a Python Flask Web Application. It provides the ability to pause and unpause DAGs, manually trigger DAGs, view running DAGs, restart failed DAGs and much more.

Scheduler
A daemon which periodically polls to determine if any registered DAGs and/or Task Instances need to be triggered based on their schedules.

Executors/Workers
A daemon that handles starting up and managing one to many CeleryD processes to execute the desired tasks of a particular DAG.

High Availability in a…
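To make the division of labor between these daemons concrete, here is a minimal sketch of a DAG written against the Airflow 1.x Python API (the DAG id and task names are made up for illustration, not taken from the post): the Scheduler polls the schedule_interval to decide when a run is due, and each task instance is then handed to a Worker (e.g. a CeleryD process) for execution.

```python
# Minimal illustrative DAG (Airflow 1.x API); names are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="example_cluster_dag",      # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",        # what the Scheduler daemon evaluates
)

# Each task instance below is picked up and run by a Worker process.
extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

extract >> load                        # load runs only after extract succeeds
```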

Airflow

Setting up an Apache Airflow Cluster

In one of our previous blog posts, we described the process you should take when Installing and Configuring Apache Airflow. In this post, we will describe how to set up an Apache Airflow Cluster to run across multiple nodes. This will provide you with more computing power and higher availability for your Apache Airflow instance.

Airflow Daemons

A running instance of Airflow has a number of daemons that work together to provide the full functionality of Airflow. The daemons include the Web Server, Scheduler, Worker, Kerberos Ticket Renewer, Flower and others. Below are the primary ones you will need to have running for a production-quality Apache Airflow Cluster.

Web Server
A daemon which accepts HTTP requests and allows you to interact with Airflow via a Python Flask Web Application. It provides the ability to pause and unpause DAGs, manually trigger DAGs, view running DAGs, restart failed DAGs and much more. The Web Server Daemon starts up…
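As a rough sketch of how these daemons are typically distributed across nodes (this mapping is an assumption for illustration, not taken from the post), the following Python snippet groups the standard Airflow 1.x CLI commands by node role. The subprocess wrapper is purely illustrative; in practice each daemon is started and supervised directly on its own host.

```python
# Illustrative sketch: which Airflow 1.x daemons run on which node role.
# The role names and this launcher script are hypothetical; the CLI
# commands themselves (webserver, scheduler, worker, flower) are standard.
import subprocess
import sys

DAEMONS_BY_ROLE = {
    # Master node: serves the UI and schedules DagRuns / Task Instances.
    "master": [["airflow", "webserver", "-p", "8080"], ["airflow", "scheduler"]],
    # Worker nodes: pull queued task instances and run them via CeleryD processes.
    "worker": [["airflow", "worker"]],
    # Optional monitoring node: Flower, a web UI for the Celery workers.
    "monitor": [["airflow", "flower"]],
}

def start_daemons(role: str) -> None:
    """Launch every daemon assigned to this node's role."""
    for cmd in DAEMONS_BY_ROLE[role]:
        subprocess.Popen(cmd)

if __name__ == "__main__":
    start_daemons(sys.argv[1] if len(sys.argv) > 1 else "worker")
```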