Impala Load Balancing with Amazon Elastic Load Balancer

In a previous post, we explained how to configure a proxy server to provide load balancing for the Impala daemon. The proxy software used was HAProxy, a free, open source load balancer. This post demonstrates how to use Amazon’s Elastic Load Balancer (ELB) to load balance Impala when running in Amazon’s Elastic Compute Cloud (EC2).

Details

Like HAProxy, an Elastic Load Balancer is a reverse proxy that accepts incoming TCP connections and distributes them among a set of EC2 instances. This is done partly for fault tolerance and partly for load distribution. Cloudera’s Using Impala through a Proxy for High Availability explains where load balancing applies within Impala. To summarize, the proxy allows us to configure our Impala clients (Hue, Tableau, etc.) with a single hostname and port. This well-known hostname will not have to be changed if there were to be…
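To make the reverse-proxy idea concrete, here is a minimal sketch of how a load balancer might hand each incoming connection to the next healthy backend. It is an illustration only, not ELB's or HAProxy's actual implementation: the `Backend` and `RoundRobinProxy` names, the `impalad-N` hostnames, and the default port 21050 are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """One Impala daemon behind the proxy (hypothetical model)."""
    host: str
    port: int = 21050  # assumed Impala client port; adjust for your setup
    healthy: bool = True


class RoundRobinProxy:
    """Chooses a backend for each new connection, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0  # index of the backend to try first on the next call

    def route(self):
        """Return the backend that should receive the next connection."""
        n = len(self.backends)
        for offset in range(n):
            candidate = self.backends[(self._next + offset) % n]
            if candidate.healthy:
                # Resume the rotation just after the backend we picked.
                self._next = (self._next + offset + 1) % n
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    proxy = RoundRobinProxy(
        [Backend("impalad-1"), Backend("impalad-2"), Backend("impalad-3")]
    )
    print([proxy.route().host for _ in range(4)])

    # Simulate a failed daemon: clients keep using the same proxy address,
    # and new connections are simply routed around the unhealthy node.
    proxy.backends[1].healthy = False
    print([proxy.route().host for _ in range(4)])
```

The sketch covers both motivations mentioned above: connections are spread across the daemons, and a failed daemon is skipped without any change on the client side.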