Connect JDBC client to Apache drill - Apache drill in distributed mode and connection via JDBC client

In previous post we discussed about installation of Apache drill in embedded mode and SQL query execution against various data sources.The main agenda of this post is to set-up Apache drill in pseudo distributed mode(Apache drill on single node) and connect it with JDBC client and execute query against JSON file.
Pre-requisites:- In order to set-up Apache drill we must have Java 7 + and ZooKeeper installed in our system. If not available, Refer Oracle JDK installation in Ubuntu for Java installation and ZooKeeper standalone setup.

Apache drill installation in distributed mode:- 

1.Download a stable distribution of Apache drill from here OR execute one of the following commands and get specified version of Apache drill.
zytham@ubuntu:~$ wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz

zytham@ubuntu:~$ curl -o apache-drill-1.1.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
2.Unzip downloaded distribution and move it to any location as per your convenience.Let's say we have placed Apache drill at location "/opt/drill".Now Apache drill home directory is /opt/drill.
zytham@ubuntu:~$ tar -xvzf ./Downloads/apache-drill-1.1.0.tar.gz
zytham@ubuntu:~$ mv apache-drill-1.1.0 /opt/drill
3.Update cluster ID and add Zookeeper details in <drill_home>/conf/drill-override.conf.Provide a unique cluster-id and the Zookeeper host names and port numbers in zk.connect(Keep default value - drillbits1 and 127.0.0.1:2181).After updating it should looks something like this:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "127.0.0.1:2181"
}
Note:- zk stands for ZooKeeper. zk.connect specify, ZooKeeper running on which address and default port 2181.
4. Go to home directory and start Apache drill in distributed mode. We have 3 ways to start drill in distributed mode, here we are using first approach- Using drillbit.sh script. Execute following command to start drill in distributed mode.
zytham@ubuntu:/opt/drill$ bin/drillbit.sh start
starting drillbit, logging to /opt/drill/log/drillbit.out
Note:-
1. By executing above command,drillbit daemon started on the given node where Apache drill is set-up.
2.With drillbit.sh script file, we can restart,stop or check running status of drill. general syntax is-
drillbit.sh [--config ] (start|stop|status|restart|autorestart) 

Starting Zookeeper:-

1. Go to ZooKeeper installation directory, say /opt/zookeeper.
2. Start ZooKeeper using following command.
zytham@ubuntu:/opt/zookeeper$ bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

JDBC client program for Apache drill:-Apache drill ships with a JDBC driver which can be used to connect Apache drill with any JDBC client.The driver is available at location "/opt/drill/jars/jdbc-driver/drill-jdbc-all-1.1.0.jar".Below is the sample JDBC client which connect to running instance Apache drill and displays data from employee.json present in classpath.
Create a Java project, a class file with sample code given below and add above jar in Build path.
package com.devinline.apachedrill;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ApacheDrillJDBCClient {
    static final String JDBC_DRIVER = "org.apache.drill.jdbc.Driver";
    static final String DB_URL = "jdbc:drill:zk=localhost:2181";

    static final String USER = "admin";
    static final String PASS = "admin";

    public static void main(String[] args) {
        Connection conn = null;
        Statement stmt = null;
        try{
            Class.forName(JDBC_DRIVER);
            conn = DriverManager.getConnection(DB_URL,USER,PASS);

            stmt = conn.createStatement();
            /* Perform a select on data in the classpath storage plugin. */
            String sql = "select employee_id, first_name,last_name,salary FROM cp.`employee.json` " +
              "where salary > 30000 and position_id=2";
            ResultSet rs = stmt.executeQuery(sql);
            System.out.println("EmployeeID" + " "+ "First Name " +"Last Name " + " "+ "Salary");
            System.out.println("-------------------------------------------------------------");
            while(rs.next()) {
                int employeeId  = rs.getInt("employee_id");
                String firstName = rs.getString("first_name");
                String lastName = rs.getString("last_name");
                String salary = rs.getString("salary");
                System.out.println(employeeId+ ":  "+ firstName+ " :  "+ lastName + " :  "+ salary);
            }

            rs.close();
            stmt.close();
            conn.close();
        } catch(SQLException se) {
            //Handle errors for JDBC
            se.printStackTrace();
        } catch(Exception e) {
            //Handle errors for Class.forName
            e.printStackTrace();
        } finally {
            try{
                if(stmt!=null)
                    stmt.close();
            } catch(SQLException se2) {
            }
            try {
                if(conn!=null)
                    conn.close();
            } catch(SQLException se) {
                se.printStackTrace();
            }
        }
    }
}
Explanation:- Above program is a simple JDBC client with DB_URL referring to Apache drill instance running as pseudo distributed cluster(Just one instance). Here "zk=localhost:2181" indicates that ZooKeeper is running locally at port number 2181 and drill instance is registered with ZooKeeper. Execute above program and it displays following results from employee.json file.
07:13:21.645 [main] DEBUG o.a.drill.common.config.NestedConfig - Loading configs at the following URLs [jar:file:/opt/drill/jars/jdbc-driver/drill-jdbc-all-1.1.0.jar!/drill-module.conf]
07:13:21.672 [main] DEBUG o.a.drill.common.config.NestedConfig - Setting up config object.
07:13:21.936 [main] DEBUG o.a.drill.common.config.NestedConfig - Config object initialized.
.................
..............................
07:13:32.751 [Client-1] DEBUG o.a.drill.exec.rpc.user.UserClient - Sending response with Sender 1054266969
EmployeeID First Name Last Name  Salary
-------------------------------------------------------------
2:  Derrick :  Whelply :  40000.0
4:  Michael :  Spence :  40000.0
5:  Maya :  Gutierrez :  35000.0
21:  Pedro :  Castillo :  35000.0
22:  Laurie :  Borges :  35000.0
....................
.................................
07:13:32.985 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down
07:13:32.986 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x1512a4eec010002 closed

Debugging:- 

Error: java.lang.IllegalStateException: No DrillbitEndpoint can be found -
Solution:- It indicates Apache drill instance is not running.Go to Apache drill home directory and execute following command. If drill is running, it gives message "drillbit is running" otherwise "drillbit not running."
zytham@ubuntu:/opt/drill$ bin/drillbit.sh status
drillbit not running.

Exception:java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_60]
Solution:- It suggest that ZooKeeper might not be running or host name is not resolved properly.Execute following command and verify that ZooKeeper is running or not.
zytham@ubuntu:/opt/zookeeper$ bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

Start ZooKeeper and Drill if not up and running, then execute client program. It should work like a charm.
Note:- JDBC client program does not work when drill is started in embedded mode, so always remember to start it with one of the three ways -  Using the drillbit.sh command , Starting the Drill Shell or Using an Ad-Hoc Connection to Drill


Read also:- Apache Drill installation in embedded mode (in Ubuntu 13.04) and SQL query execution in drill prompt

3 Comments

  1. Great article Nikhil.
    I have a problem with accessing Drill via JDBC.
    I have the same kind of program structure and works fine from the IDE and the Console, but when I try to execute the packaged .jar from within Cloudera environment via Job Designer it isn't working. Do you have any experience with executing the Drill JDBC - Java via Cloudera / MapR jobs ?

    ReplyDelete
  2. I must say that this is an excellent post, and I appreciate the information. You mention Connect JDBC client to Apache drill - Apache drill in distributed mode and connection via JDBC client. Your arguments are so strong that I want to read more of them. Now, though, I'm looking for the UK essay writing service for a school assignment. I gain from this, and I do my assignment faster. The rich material on the website might be useful to all students.

    ReplyDelete
Previous Post Next Post