Airflow

Making Apache Airflow Highly Available

In a previous post, we discussed Setting up an Apache Airflow Cluster. In this post we’ll talk about the shortcomings of a typical Apache Airflow Cluster and what can be done to provide a Highly Available Airflow Cluster.

A Typical Apache Airflow Cluster

In a typical multi-node Airflow cluster you can separate out all the major processes onto separate machines. Here are the main processes:

Web Server: A daemon which accepts HTTP requests and allows you to interact with Airflow via a Python Flask web application. It provides the ability to pause and unpause DAGs, manually trigger DAGs, view running DAGs, restart failed DAGs, and much more.

Scheduler: A daemon which periodically polls to determine whether any registered DAGs and/or Task Instances need to be triggered based on their schedules.

Executors/Workers: A daemon that handles starting up and managing one to many CeleryD processes to execute the desired tasks of a particular DAG.

High Availability in a…
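To make the division of labor concrete, here is a minimal sketch of the kind of DAG these daemons cooperate on; the DAG id, task, and schedule are illustrative, not from the original post, and the import paths assume the Airflow 1.x API current when this was written.

```python
# A minimal sketch of a DAG the scheduler daemon would pick up;
# all names and the schedule here are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

# The scheduler polls registered DAGs like this one and creates Task
# Instances each time the schedule_interval comes due.
dag = DAG(
    dag_id="example_hello",
    default_args=default_args,
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

# In a Celery setup, the worker daemons' CeleryD processes are what
# actually execute tasks like this one; the web server only provides
# the UI for pausing, triggering, and monitoring.
hello = BashOperator(task_id="say_hello", bash_command="echo hello", dag=dag)
```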

Airflow

Upgrading Apache Airflow Versions

In a previous post we explained how to Install and Configure Apache Airflow (a platform to programmatically author, schedule and monitor workflows). The technology is actively being worked on, and more and more features and bug fixes are being added to the project in the form of new releases. At some point, you will want to upgrade to take advantage of these new features. In this post we’ll go over the process that you should follow for upgrading Apache Airflow versions.

Note: You will need to separately make sure that your DAGs will be able to work on the new version of Airflow.

Upgrade Airflow

Note: These steps can also work to downgrade versions of Airflow.

Note: Execute all of this on all the instances in your Airflow Cluster (if you have more than one machine).

Gather information about your current environment and your target setup: Get the Airflow Home directory. Placeholder for…
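A minimal sketch of gathering the environment details this step asks for, assuming Airflow is importable in the active environment on each instance; it relies only on the documented AIRFLOW_HOME resolution (the environment variable if set, otherwise ~/airflow).

```python
# A sketch for collecting the current environment details before an
# upgrade; run it on every instance in the cluster.
import os

import airflow

# AIRFLOW_HOME resolves to the environment variable if set, otherwise
# Airflow's documented default of ~/airflow.
airflow_home = os.environ.get("AIRFLOW_HOME", os.path.expanduser("~/airflow"))

print("Current Airflow version:", airflow.__version__)
print("Airflow Home:", airflow_home)
print("Config file:", os.path.join(airflow_home, "airflow.cfg"))
```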