In a previous post, we discussed Setting up an Apache Airflow Cluster. In this post, we’ll talk about the shortcomings of a typical Apache Airflow cluster and what can be done to provide a highly available Airflow cluster.

A Typical Apache Airflow Cluster

In a typical multi-node Airflow cluster, you can separate all the major processes onto separate machines. Here are the main processes:

Web Server
A daemon that accepts HTTP requests and allows you to interact with Airflow via a Python Flask web application. It provides the ability to pause and unpause DAGs, manually trigger DAGs, view running DAGs, restart failed DAGs and much more.

Scheduler
A daemon that periodically polls to determine whether any registered DAG and/or Task Instances need to be triggered based on their schedule.

Executors/Workers
A daemon that handles starting up and managing one or more CeleryD processes to execute the desired tasks of a particular DAG.

High Availability in a…
Category: Big Data
Imitation of Intelligence: Exploring Artificial Intelligence!
What is the difference between “calculate” and “compute”? I assure you, we are not going to dwell on such quintessential computing terms, which might bore some of us, even if the question gave that impression 😀 But the distinction goes to the crux of what we are going to go through. Calculation involves an arithmetic process. Computation covers the non-arithmetic steps of an algorithm that lead up to the calculation. You see where I am going with this, right? We can try to visualize every stage of data processing, from data collection and cleansing to processing and then transforming the data through mathematical operations, mapping it into something that makes more sense, i.e. “Insight”. But the intelligence behind such meaningful transformation used to come from human intervention, which now can be “Artificial” as…