Connect Tableau 9(Desktop version) with Hadoop Hive: Set-up and data analysis with Hive data source

Tableau supports connections to data stored in Cloudera distributions using Hive(Hiveserver1) and the data source's Hive ODBC driver.The main agenda of this post is to set-up Tableau 9.0 Desktop version to connect to Hadoop Hive(Cloudera distribution).
Prerequisite:-
1. Installation of cloudera distribution including Apache Hadoop CDH4u1.
2. Install Tableau 9 desktop version - Desktop version is not free,download trial version and install it.

Set-up Tableau 9 Desktop for Hadoop hive access

Step 1:  Download Hive ODBC driver, download latest version and install it in your machine where you have installed Tableau 9 desktop version. I have installed 32 bit of ODBC driver.

Step 2: After installation, do check whether driver is in place or not. Go to Start-> search for Open 32-bit ODBC Administrator.Click on "System DSN" tab and we find that Cloudera ODBC driver is in place.(Refer below diagram- Step 2)

Step 3: Now do DSN setup for Cloudera ODBC driver for Apache Hive. Click on Configure button and select Hive server Type as "Hive server 1", Host name is IP address of my machine where my cloudera setup is running.Keep port as 10000 and Database as default.(Refer above diagram- Step 3)

Step 4: Test hive connectivity, click on Test button at bottom - pop up should display "TESTS COMPLETED SUCCESSFULLY!".(Refer above diagram- Step 4)

Step 5: Connect Tableau with Hive via Cloudera ODBC driver 
  1. In Tableau Desktop, select the appropriate server and enter the information required to connect. Go to More servers -> Other databases (ODBC). 
  2. Now, select the schema and table(Step 1 and 2 in below diagram) followed by search for appropriate table (Step 3) of your interest to build your data source.
  3. Now drag or double click on listed table and create database join between appropriate fields.Here I have used two tables employee and department and performed inner join on department Id. (Step 4 and 5 in above diagram)
  4. Open workbook/sheet. Click "Sheet #"(left bottom), drag and drop fields(or right click on each field of interest and add to sheet) to build data analysis views.Below diagram shows data interpretation about Employee count vs Location and Average salary and Department.(Select Dname and Loc from Dimension & select salary from Measures). Apply function and create a new field Employee count - right click on employee_id -> create -> Calculated field. Use function "COUNT([Employee Id])" in text box . 
  1. Similarly, we can generate other statistics(side by side bar or packet bubble representation ) from these hive data source.


References:-
http://kb.tableau.com/articles/knowledgebase/hadoop-hive-connection

3 Comments

Previous Post Next Post