Impala Load Balancer

  • Impala Load Balancer

                The Impala load balancer is a proxy that clients use to connect to the Impala service in a high-availability (HA) cluster. Consider a production cluster where many Impala jobs run at the same time. The load balancer evens out the load of those jobs by distributing their connections across different data nodes.

  • Why a Load Balancer is needed in Impala
  1. In a cluster without an Impala load balancer, each Impala job connects through one specific data node/daemon that is in the active state. You therefore need to keep track of which data nodes/daemons are up and running. When you run a job through Impala you have to pass this host value, or hard-code it. If that data node/daemon goes down, the Impala jobs start failing.
  2. When many jobs are submitted to one particular daemon/data node, it can be overwhelmed and go down. To restore the service, the Impala State Store and Catalog roles need to be deleted and reassigned to another daemon/data node.

                  To avoid these situations, an Impala load balancer is introduced. It makes the service highly available and reduces the chance of Impala job failures.

  • Advantages of a Load Balancer
    • A load balancer lets jobs connect to the proxy server directly, instead of tracking the hosts where impalad daemons/data nodes are running.
    • If a host running an impalad daemon becomes unavailable or goes down, connection requests still succeed, because jobs connect to the proxy server rather than to a particular data node.
    • Each Impala job runs on a coordinator node, which needs extra memory and CPU to process the query. With a load balancer, jobs are assigned to different coordinator nodes in round-robin order, so the coordination work is not concentrated on a single machine.
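
The round-robin assignment described above can be illustrated with a small shell simulation. This is only a sketch of the selection rule, not how any particular proxy is implemented. The host names are hypothetical placeholders, and a real load balancer would also run health checks and skip hosts that are down.

```shell
#!/bin/sh
# Hypothetical coordinator hosts behind the proxy (placeholder names).
COORDINATORS="impalad1.example.com impalad2.example.com impalad3.example.com"

# pick_coordinator N
#   Print the host that connection number N (0-based) would be routed to
#   under plain round robin: host index = N modulo number of hosts.
pick_coordinator() {
  conn=$1
  count=0
  for host in $COORDINATORS; do
    count=$((count + 1))
  done
  target=$((conn % count))
  i=0
  for host in $COORDINATORS; do
    if [ "$i" -eq "$target" ]; then
      printf '%s\n' "$host"
      return 0
    fi
    i=$((i + 1))
  done
}

# Show where five consecutive connections would land.
for n in 0 1 2 3 4; do
  printf 'connection %s -> %s\n' "$n" "$(pick_coordinator "$n")"
done
```

With three hosts, the fourth connection wraps around to the first host again, so no single coordinator receives every job.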
  • Does the Impala load balancer support Kerberos

                The Impala load balancer is compatible with Kerberos-authenticated connections.

  • How to run jobs using Load Balancer

                To run an Impala job, pass the address of the load balancer/proxy server in the Impala connect command. You can connect to Impala either from the command line or from a script.

Sample Queries:

                Using Direct Query:         Impala_connect -i -e ""

                Using File:                           Impala_connect -i -f ""
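
As a more concrete sketch, the commands above might look like this with the standard impala-shell client. It assumes that Impala_connect stands for an impala-shell invocation, which the original does not confirm. The load balancer host, the port, and the file path are hypothetical placeholders. impala-shell takes an inline query with -q and a query file with -f, and -k requests Kerberos authentication (after a successful kinit). The helper below only prints the command line, so it can be inspected before it is run.

```shell
#!/bin/sh
# Hypothetical load balancer address; replace with your proxy host and port.
LB_HOST="impala-lb.example.com"
LB_PORT="21000"   # assumed port on which the proxy forwards impala-shell traffic

# impala_cmd MODE ARG [kerberos]
#   MODE is -q (inline query) or -f (query file).
#   ARG is the query text or the path to the query file.
#   Prints the impala-shell command line instead of executing it (dry run).
impala_cmd() {
  mode=$1
  arg=$2
  kerberos=${3:-}
  cmd="impala-shell -i ${LB_HOST}:${LB_PORT}"
  # -k makes impala-shell authenticate with Kerberos.
  if [ "$kerberos" = "kerberos" ]; then
    cmd="$cmd -k"
  fi
  printf '%s %s "%s"\n' "$cmd" "$mode" "$arg"
}

# Direct query through the load balancer.
impala_cmd -q "SELECT 1"

# Query file through the load balancer on a Kerberized cluster.
impala_cmd -f "/path/to/queries.sql" kerberos
```

Because every command points at the proxy address, nothing in the job has to change when an individual impalad host is added, removed, or goes down.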
