Airflow

Upgrading Apache Airflow Versions

In a previous post we explained how to Install and Configure Apache Airflow (a platform to programmatically author, schedule and monitor workflows). The project is under active development, and each new release adds features and bug fixes. At some point, you will want to upgrade to take advantage of these new features. In this post we’ll go over the process that you should follow for upgrading Apache Airflow versions. Note: You will need to separately make sure that your DAGs will work on the new version of Airflow. Upgrade Airflow. Note: These steps can also be used to downgrade Airflow. Note: Execute all of these steps on every instance in your Airflow cluster (if you have more than one machine). Gather information about your current environment and your target setup: Get the Airflow Home directory. Placeholder for…

Analytics

Creating Custom Origin for Streamsets

StreamSets Data Collector: StreamSets Data Collector is a lightweight and powerful engine that streams data in real time. It allows you to build continuous data pipelines, each of which consumes record-oriented data from a single origin, optionally operates on those records in one or more processors, and writes data to one or more destinations. StreamSets Origin Stage: To define the flow of data for Data Collector, you configure a pipeline. A pipeline consists of stages that represent the origin and destination of the pipeline and any additional processing that you want to perform. An origin stage represents the source for the pipeline. For example, the pipeline in the SDC taxi data tutorial (https://streamsets.com/documentation/datacollector/latest/help/#Tutorial/Overview.html) uses the Directory origin, four processors and the Hadoop File System destination. StreamSets comes bundled with many origin stage components to connect with almost all commonly used data sources, and if you don’t find one for your source system, don’t worry, StreamSets APIs are…
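The pipeline model described above (a single origin, optional processors, one or more destinations) can be illustrated with a small conceptual sketch. This is plain Python and not the StreamSets API: `run_pipeline`, the record fields, and the tip calculation are all hypothetical names chosen for illustration only.

```python
from typing import Callable, Dict, Iterable, List

# A record is modeled as a plain dictionary (an assumption for this sketch).
Record = Dict[str, object]


def run_pipeline(
    origin: Iterable[Record],
    processors: List[Callable[[Record], Record]],
    destinations: List[Callable[[Record], None]],
) -> int:
    """Read records from one origin, apply each processor in order,
    then write the result to every destination. Returns the record count."""
    count = 0
    for record in origin:
        for process in processors:
            record = process(record)
        for write in destinations:
            write(record)
        count += 1
    return count


# Hypothetical usage: an in-memory origin, one processor, one destination.
sink: List[Record] = []
processed = run_pipeline(
    origin=[{"fare": 10.0}, {"fare": 4.5}],
    processors=[lambda r: {**r, "fare_with_tip": r["fare"] * 1.5}],
    destinations=[sink.append],
)
```

In this sketch the origin is just an iterable of records, which mirrors the idea that an origin stage is the only place where data enters the pipeline.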