Nov 22, 2015

Textual description of firstImageUrl

Apache drill with Tableau 9 Desktop: Set-up and data analysis with JSON data source (SQL query against JSON)

Tableau is an interactive data visualization products focused on business intelligence and it provides support for integration with Hive, Apache drill and other big data storage or data processing engine. The main agenda of this post is to set-up Tableau 9 Desktop version and understand how to connect it to Apache Drill and explore multiple data formats with it.Apache drill.
1.Apache drill should be installed in distributed mode, if not first set-up Apache drill in distributed mode.
2.Install Tableau 9 desktop version - Desktop version is not free,download trial version and install it.
3.Apache ZooKeeper set-up, if not existing refer ZooKeeper installation 

Set-up Apache Drill with Tableau 9 Desktop

1. Download ODBC driver(provided by MapR) - Tableau 9.0 Desktop 64 bit can use either the 32-bit driver or the 64-bit driver. Download 32 bit driver installer
2. Install downloaded driver(keep default settings) and after installation verify the installation. Refer this for the same. Installing the ODBC Administrator installs Drill Explorer and the Tableau TDC file.
3. Configure ODBC driver - here we have to create a Data Source Name using ODBC Administrator app. Go to start-> find 32-bit ODBC Administrator. Refer this for more detail.
Click System DSN tab in ODBC Administrator window and click on configure. Update fields : Direct to Drillbit and port (IP address of my machine where Apache drill instance is running) and port number 31010 is default port used.Click on test and check for connectivity with Apache drill.
ODBC configuration for Apache Drill connection

4. Now connect Tableau to Drill via ODBC- Start tableau and create new data source of type(Other oracle ODBC).Click More servers -> Other oracle ODBC and select DSN as "MapR ODBC Driver for Drill DSN", update port number as 31010(default port unless modified).
5. Select schema(dfs.default) and double click on New Custom SQL. Execute following query in sequence - one for employee.json and another one for department.json. 
1. SELECT * FROM `dfs`.`default`.`./home/zytham/data/employee.json`
2. SELECT * FROM `dfs`.`default`.`./home/zytham/data/department.json`
These employee.json and department.json sample shipped with Apache drill, download it from here.

Click New custom SQL and execute both query mentioned above one after another.
6. After executing both SQL query against JSON, create join between employee and department based on department_id as shown below.

7. Click on Sheet 1(bottom left). right click on marital status and gender field from EMP_SQL_QUERY(Dimension section) and do "Add to sheet". Similarly, right click on salary from EMP_SQL_QUERY(Measures section) and do "Add to sheet". It will generate a bar graph automatically based on gender and marital status with total salary. Change total salary to average salary - Right click on Sum(salary) -> Measure -> Average.
8. Similarly, we can generate some other analysis chart based on - gender,educational level and average salary.Right click on these fields and add to sheet.  From show me panel, select side by side bar display option.

References :- 
Location: Hyderabad, Telangana, India