Growing Data in the Desert

The Phoenix Data Conference on October 29th, 2016 was the place to be for Arizona’s biggest big data event. Prominent companies in big data and analytics assembled to share their wisdom and experience with a mixed audience of experts and newcomers hungry to explore this hot and dynamic field. Speakers from Cloudera, Hortonworks, MapR, Amazon, and Microsoft, to name a few, walked through customer case studies and offered valuable tips for setting up high-performing big data environments. This was the third data conference in the Valley organized by Clairvoyant along with its sponsors, and the popularity and anticipation of the event have skyrocketed over the years.

It was impressive to see the scalability and breadth of support in Azure HDInsight, Microsoft’s Hadoop distribution, demonstrated by Brig Lamoreaux during his presentation on predictive maintenance for aerospace, where he went over the architecture and tools used for the customer’s use case. He walked us through the phases leading up to the predictive analytics, covering a pipeline that employed Kafka for stream processing and analysis with HDInsight powered by Cortana Intelligence, another Microsoft service, up to the final web applications for end users. Event Hubs, a stream ingestion solution, was also utilized.

Another exciting session for me was Hadoop Security Highlights, presented by Manish Ahluwalia and Scott Grzybowski from Cloudera, which refreshed my understanding of identity management, the triple A (authentication, authorization, and auditing), and transport layer security. The enterprise data hub (EDH) services, including Navigator for auditing and encryption, are powerful tools for tracking, accessing, and protecting data with high-performing key management. Fine-grained access control through ACLs and Sentry was also covered.

I have attended every conference organized by Clairvoyant, and this year’s PDC was highly successful, with a great turnout and excellent reviews. The presentations were personally valuable to me as a Hadoop administrator, touching on security, service performance, and the products demonstrated. I would like to thank the organizers and sponsors for providing such a great platform for networking and sharing with the community. Got to love the open source model in our industry!

Nithya Koka Hadoop Administrator
CLAIRVOYANT | Chandler, AZ

Installing RabbitMQ

RabbitMQ is a queueing service that implements the Advanced Message Queuing Protocol (AMQP). It is a fast and dependable open-source message server that supports a wide range of use cases, including reliable integration, content-based routing and global data delivery, and high-volume monitoring and data ingestion.

Additional Documentation:

General Install: https://www.rabbitmq.com/download.html

Setting up RabbitMQ Clustering: https://www.rabbitmq.com/clustering.html

Install RabbitMQ

Install RabbitMQ on Ubuntu

  1. Log in as root
  2. Install RabbitMQ Server
    apt-get install rabbitmq-server
  3. Verify status
    rabbitmqctl status
  4. Install RabbitMQ Web Interface
    rabbitmq-plugins enable rabbitmq_management
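
Note: On some older Ubuntu packages the broker needs a restart before the newly enabled management plugin starts listening on port 15672. A minimal check, assuming a local install (the curl call is just an optional way to confirm the UI responds):

    # Restart so the management plugin is picked up, then confirm the node is running
    service rabbitmq-server restart
    rabbitmqctl status

    # Optional: the management UI should answer on port 15672 (expect a 200)
    curl -s -o /dev/null -w "%{http_code}\n" http://localhost:15672/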

Install RabbitMQ on CentOS

  1. Log in as root
  2. Install RabbitMQ Server
    yum install epel-release
    yum install rabbitmq-server
  3. Verify status
    rabbitmqctl status
  4. Install RabbitMQ Web Interface
    rabbitmq-plugins enable rabbitmq_management
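
Note: Unlike the Ubuntu package, the RPM install does not necessarily start or enable the service for you. A minimal sketch; which pair of commands applies depends on whether the host uses systemd (CentOS 7) or SysV init (CentOS 6):

    # CentOS 7 (systemd)
    systemctl enable rabbitmq-server
    systemctl start rabbitmq-server

    # CentOS 6 (SysV init)
    chkconfig rabbitmq-server on
    service rabbitmq-server start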

Setting up and Running RabbitMQ as a Cluster

If you want to set up multiple machines to work as a RabbitMQ cluster, follow these instructions. Otherwise, follow the “Running RabbitMQ as Single Node” instructions to get it running on a single machine.

Note: The machines you want to use in the cluster need to be able to communicate with each other.
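
In practice this means every node must be able to resolve every other node’s hostname and reach it on the Erlang and AMQP ports. A quick sketch of what to check; the port numbers are the RabbitMQ 3.x defaults and the hostnames/IPs are placeholders:

    # Each node should resolve every other cluster member, e.g. via /etc/hosts:
    #   192.168.1.10  {MASTER_HOSTNAME}
    #   192.168.1.11  {NODE_HOSTNAME}

    # Ports that must be open between nodes (RabbitMQ 3.x defaults):
    #   4369  - epmd (Erlang port mapper)
    #   25672 - inter-node communication
    #   5672  - AMQP clients
    #   15672 - management UI

    # Spot-check connectivity from one node to another
    nc -z -w 2 {MASTER_HOSTNAME} 4369
    nc -z -w 2 {MASTER_HOSTNAME} 25672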

  1. Follow the above “Install RabbitMQ” steps on each node you want to add to the RabbitMQ cluster
  2. Ensure the RabbitMQ daemons are not running
    1. See the “Running RabbitMQ as Single Node” section below
  3. Choose one of the nodes as MASTER
  4. On the non-MASTER nodes, back up the .erlang.cookie file
    mv /var/lib/rabbitmq/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie.backup
  5. Copy the file “/var/lib/rabbitmq/.erlang.cookie” from the MASTER node to the other nodes and store it at the same location.
    • Be careful during this step. If you copy the contents of the .erlang.cookie file and use the vi or nano editor to update the non-MASTER nodes’ .erlang.cookie file, you may add a trailing newline character to the file. You are better off transferring the .erlang.cookie file itself from the MASTER node to the non-MASTER machines, for example with scp or an SFTP client (see the sketch after this list).
  6. Set permissions of the .erlang.cookie file
    chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
    chmod 600 /var/lib/rabbitmq/.erlang.cookie
  7. Start the RabbitMQ daemon on the MASTER node in detached mode (as root)
    rabbitmq-server -detached
  8. On each of the non-MASTER nodes, add the node to the cluster and start it up, one node at a time (as root)
    #Stop App
    rabbitmqctl stop_app
     
    #Add the current machine to the cluster
    rabbitmqctl join_cluster rabbit@{MASTER_HOSTNAME}
     
    #Startup
    rabbitmqctl start_app
     
    #Check Status
    rabbitmqctl cluster_status
    • The “Check Status” command should return something like the following:
      1. Cluster status of node rabbit@{NODE_HOSTNAME} ...
        [{nodes,[{disc,[rabbit@{MASTER_HOSTNAME},rabbit@{NODE_HOSTNAME}]}]},
         {running_nodes,[rabbit@{MASTER_HOSTNAME},rabbit@{NODE_HOSTNAME}]}]
  9. Setup HA/Replication between Nodes
    1. Access the management URL of one of the nodes (see the “Managing the RabbitMQ Instance(s)” section below)
    2. Click on the Admin tab on top
    3. Click on the Policies tab on the right
    4. Add an HA policy
      1. Name: ha-all
      2. Pattern:
        1. leave blank
      3. Definitions:
        1. ha-mode: all
        2. ha-sync-mode: automatic
      4. Priority: 0
    5. Verify it is set up correctly by navigating to the Queues section and clicking on one of the queues. You should see entries in the Node and Slaves sections. (A command-line alternative to this step is sketched after this list.)
  10. Set up a load balancer to balance requests between the nodes (an example HAProxy configuration is sketched after this list)
    • Port Forwarding
      1. Port 5672 (TCP) → Port 5672 (TCP)
      2. Port 15672 (HTTP) → Port 15672 (HTTP)
    • Health Check
      1. Protocol: HTTP
      2. Ping Port: 15672
      3. Ping Path: /
  11. Point all client processes at the load balancer
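
As noted in step 5, copying the /var/lib/rabbitmq/.erlang.cookie file as a file (rather than pasting its contents) avoids accidentally introducing a trailing newline. A minimal sketch, run on each non-MASTER node and assuming root SSH access to the MASTER node:

    # Pull the cookie file from the MASTER node byte-for-byte
    scp root@{MASTER_HOSTNAME}:/var/lib/rabbitmq/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie

    # Re-apply the ownership and permissions from step 6
    chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
    chmod 600 /var/lib/rabbitmq/.erlang.cookie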
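
The HA policy from step 9 can also be created from the command line instead of the management UI. A sketch, run on any cluster node; the pattern ".*" matches all queues, mirroring the intent of the blank pattern used in the UI, and "ha-all" is just the policy name from step 9:

    # Mirror all queues across all nodes and synchronize new mirrors automatically
    rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'

    # Confirm the policy was created
    rabbitmqctl list_policies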
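
For step 10, any TCP-capable load balancer will work. A minimal HAProxy sketch assuming a two-node cluster; the hostnames are the same placeholders used above, and timeout and health-check tuning are omitted for brevity:

    # /etc/haproxy/haproxy.cfg (excerpt)
    listen rabbitmq_amqp
        bind *:5672
        mode tcp
        balance roundrobin
        server rabbit1 {MASTER_HOSTNAME}:5672 check
        server rabbit2 {NODE_HOSTNAME}:5672 check

    listen rabbitmq_mgmt
        bind *:15672
        mode http
        balance roundrobin
        server rabbit1 {MASTER_HOSTNAME}:15672 check
        server rabbit2 {NODE_HOSTNAME}:15672 check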

Running RabbitMQ as Single Node

Start RabbitMQ

service rabbitmq-server start

Stop RabbitMQ

service rabbitmq-server stop

Restart RabbitMQ

service rabbitmq-server restart

Getting RabbitMQ Status

service rabbitmq-server status

Managing the RabbitMQ Instance(s)

Once the RabbitMQ daemons are started, you can visit the management web UI in a browser to monitor the RabbitMQ instance(s):

http://{ANY_RABBITMQ_NODE_HOSTNAME}:15672/

Default credentials: guest/guest

Note: Change the default password. Recent RabbitMQ releases also restrict the guest account to connections from localhost, so create a dedicated user for remote access (see the sketch below).
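
A sketch for creating a dedicated administrator account; the user name “admin” and the password are placeholders:

    # Create an administrative user for remote and management UI access
    rabbitmqctl add_user admin {STRONG_PASSWORD}
    rabbitmqctl set_user_tags admin administrator
    rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"

    # Optionally remove the default guest account
    rabbitmqctl delete_user guest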
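
Once a node is running and the management plugin is enabled, a quick end-to-end check is to push a message through a throwaway queue with the rabbitmqadmin tool that ships with the management plugin. It requires Python on the host and authenticates as guest/guest by default (pass --username and --password if you have changed or removed that account); the queue name below is just an example:

    # Download the CLI bundled with the management plugin
    wget http://localhost:15672/cli/rabbitmqadmin
    chmod +x rabbitmqadmin

    # Declare a test queue, publish a message, and read it back
    ./rabbitmqadmin declare queue name=smoke_test durable=false
    ./rabbitmqadmin publish exchange=amq.default routing_key=smoke_test payload="hello"
    ./rabbitmqadmin get queue=smoke_test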