So your company has some Big Data needs and has decided to use Hadoop to process all that data. As a developer, where do you start? You download and install Hadoop from Apache, get up and running fairly quickly, and begin writing your first MapReduce job. Pretty soon you realize you need a workflow engine like Oozie. Soon after that, you start to think HBase might be a good fit for what you are trying to accomplish, or that Hive could spare you from writing MapReduce jobs in Java. The Hadoop ecosystem has grown quite a bit, and manually installing each piece can become frustrating and time-consuming. A low-barrier way to become productive with Hadoop technologies quickly is to use a vendor distribution, such as the one from Cloudera. Since we use the Cloudera distribution at BlueCanary, the rest of this tutorial covers Cloudera’s distribution of Hadoop…