Nov 23, 2015


Schema on write, Schema before read and Schema on the fly - Different levels of schema adherence/enforcement

Organizations dealing with large volumes of data are moving at a rapid pace, and so is their volume of data. In the past 10 years, they have shown more inclination towards non-relational databases over relational databases. With the advancement of NoSQL databases (MongoDB, Cassandra) and big data technologies (Hive, Apache Drill, HBase), data analysis can be performed as efficiently as it is with relational databases. The main agenda of this post is to discuss how schema enforcement is adopted in various relational/non-relational database technologies and the pros/cons associated with each of them. Schema declaration or discovery for any database/data source can be broadly classified into two categories:
  1. Schema declared in advance - Schema on write and Schema before read
  2. Schema discovered on the fly - Schema on the fly 
Overview of schema declaration/discovery paradigm

Schema declared in advance

Traditionally, relational databases like Oracle, DB2 (IBM), SQL Server (Microsoft), Vertica (HP), etc. enforce the very first thing to do - create the schema of database objects (tables, views, etc.) and then perform database operations like data loading. This design methodology is termed schema on write (the structure of data is verified against the schema when it is written to the database). Sometimes RDBMS modelling is termed prescriptive data modelling (1. create a static DB schema, 2. transform data into RDBMS format and 3. query data in RDBMS format).
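As a minimal sketch of schema on write (using generic ANSI SQL; the table and values below are hypothetical), the schema must be declared before any data can be loaded, and a row that does not conform is rejected at write time:

-- Schema is declared up front (schema on write).
CREATE TABLE employee (
    employee_id INT,
    first_name  VARCHAR(50),
    salary      DECIMAL(10,2)
);

-- A row that matches the declared schema is accepted.
INSERT INTO employee VALUES (1, 'Sheri', 80000.00);

-- A row that violates the declared schema fails at write time.
INSERT INTO employee VALUES (2, 'Derrick', 'forty thousand');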

Nov 22, 2015


Apache drill with Tableau 9 Desktop: Set-up and data analysis with JSON data source (SQL query against JSON)

Tableau is an interactive data visualization product focused on business intelligence, and it provides support for integration with Hive, Apache Drill and other big data storage or data processing engines. The main agenda of this post is to set up Tableau 9 Desktop and understand how to connect it to Apache Drill and explore multiple data formats with it.
Prerequisites:- 
1. Apache Drill should be installed in distributed mode; if not, first set up Apache Drill in distributed mode.
2. Install Tableau 9 Desktop version - the Desktop version is not free; download the trial version and install it.
3. Apache ZooKeeper set-up; if not existing, refer to ZooKeeper installation 

Set-up Apache Drill with Tableau 9 Desktop

1. Download the ODBC driver (provided by MapR) - Tableau 9.0 Desktop 64-bit can use either the 32-bit driver or the 64-bit driver. Download the 32-bit driver installer.
2. Install the downloaded driver (keep default settings) and verify the installation afterwards. Refer to this for the same. Installing the ODBC Administrator installs Drill Explorer and the Tableau TDC file.
3. Configure the ODBC driver - here we have to create a Data Source Name using the ODBC Administrator app. Go to Start -> find 32-bit ODBC Administrator. Refer to this for more detail.
Click the System DSN tab in the ODBC Administrator window and click Configure. Update the fields: Direct to Drillbit (the IP address of the machine where the Apache Drill instance is running) and the port number; 31010 is the default port. Click Test and check for connectivity with Apache Drill.
ODBC configuration for Apache Drill connection

4. Now connect Tableau to Drill via ODBC - start Tableau and create a new data source of type Other Databases (ODBC). Click More servers -> Other Databases (ODBC), select the DSN "MapR ODBC Driver for Drill DSN", and update the port number to 31010 (the default port unless modified).
5. Select the schema (dfs.default) and double-click New Custom SQL. Execute the following queries in sequence - one for employee.json and another for department.json. 
1. SELECT * FROM `dfs`.`default`.`./home/zytham/data/employee.json`
2. SELECT * FROM `dfs`.`default`.`./home/zytham/data/department.json`
These employee.json and department.json samples are shipped with Apache Drill; download them from here.

Click New Custom SQL and execute both queries mentioned above, one after another.
6. After executing both SQL queries against JSON, create a join between employee and department based on department_id, as shown below.
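For reference, the join Tableau builds can be expressed as the following Drill SQL over the same two file paths used above; this is only an illustrative sketch, since in Tableau the join itself is defined through the UI:

SELECT emp.employee_id, emp.first_name, emp.salary, dept.department_description
FROM `dfs`.`default`.`./home/zytham/data/employee.json` emp
JOIN `dfs`.`default`.`./home/zytham/data/department.json` dept
  ON emp.department_id = dept.department_id;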

Nov 21, 2015


Connect JDBC client to Apache drill - Apache drill in distributed mode and connection via JDBC client

In the previous post we discussed the installation of Apache Drill in embedded mode and SQL query execution against various data sources. The main agenda of this post is to set up Apache Drill in pseudo-distributed mode (Apache Drill on a single node), connect it with a JDBC client and execute a query against a JSON file.
Pre-requisites:- In order to set up Apache Drill we must have Java 7+ and ZooKeeper installed on our system. If not available, refer to Oracle JDK installation in Ubuntu for Java installation and ZooKeeper standalone setup.

Apache drill installation in distributed mode:- 

1. Download a stable distribution of Apache Drill from here OR execute one of the following commands to get the specified version of Apache Drill.
zytham@ubuntu:~$ wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz

zytham@ubuntu:~$ curl -o apache-drill-1.1.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
2. Unzip the downloaded distribution and move it to any location as per your convenience. Let's say we have placed Apache Drill at location "/opt/drill". Now the Apache Drill home directory is /opt/drill.
zytham@ubuntu:~$ tar -xvzf ./Downloads/apache-drill-1.1.0.tar.gz
zytham@ubuntu:~$ mv apache-drill-1.1.0 /opt/drill
3. Update the cluster ID and add ZooKeeper details in <drill_home>/conf/drill-override.conf. Provide a unique cluster-id and the ZooKeeper host names and port numbers in zk.connect (keep the default values - drillbits1 and 127.0.0.1:2181). After updating, it should look something like this:
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "127.0.0.1:2181"
}
Note:- zk stands for ZooKeeper. zk.connect specifies the address and port (default 2181) on which ZooKeeper is running.
4. Go to the Drill home directory and start Apache Drill in distributed mode. There are 3 ways to start Drill in distributed mode; here we use the first approach - the drillbit.sh script. Execute the following command to start Drill in distributed mode.
zytham@ubuntu:/opt/drill$ bin/drillbit.sh start
starting drillbit, logging to /opt/drill/log/drillbit.out
Note:-
1. By executing the above command, the drillbit daemon is started on the node where Apache Drill is set up.
2. With the drillbit.sh script, we can restart, stop or check the running status of Drill. The general syntax is:
drillbit.sh [--config <conf-dir>] (start|stop|status|restart|autorestart) 

Starting Zookeeper:-

1. Go to ZooKeeper installation directory, say /opt/zookeeper.
2. Start ZooKeeper using following command.
zytham@ubuntu:/opt/zookeeper$ bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

JDBC client program for Apache Drill:- Apache Drill ships with a JDBC driver which can be used to connect Apache Drill with any JDBC client. The driver is available at location "/opt/drill/jars/jdbc-driver/drill-jdbc-all-1.1.0.jar". Below is a sample JDBC client which connects to the running Apache Drill instance and displays data from employee.json present in the classpath.
Create a Java project and a class file with the sample code given below, and add the above jar to the build path.
package com.devinline.apachedrill;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class ApacheDrillJDBCClient {
    static final String JDBC_DRIVER = "org.apache.drill.jdbc.Driver";
    static final String DB_URL = "jdbc:drill:zk=localhost:2181";

    static final String USER = "admin";
    static final String PASS = "admin";

    public static void main(String[] args) {
        Connection conn = null;
        Statement stmt = null;
        try{
            Class.forName(JDBC_DRIVER);
            conn = DriverManager.getConnection(DB_URL,USER,PASS);

            stmt = conn.createStatement();
            /* Perform a select on data in the classpath storage plugin. */
            String sql = "select employee_id, first_name,last_name,salary FROM cp.`employee.json` " +
              "where salary > 30000 and position_id=2";
            ResultSet rs = stmt.executeQuery(sql);
            System.out.println("EmployeeID" + " "+ "First Name " +"Last Name " + " "+ "Salary");
            System.out.println("-------------------------------------------------------------");
            while(rs.next()) {
                int employeeId  = rs.getInt("employee_id");
                String firstName = rs.getString("first_name");
                String lastName = rs.getString("last_name");
                String salary = rs.getString("salary");
                System.out.println(employeeId+ ":  "+ firstName+ " :  "+ lastName + " :  "+ salary);
            }

            rs.close();
            stmt.close();
            conn.close();
        } catch(SQLException se) {
            //Handle errors for JDBC
            se.printStackTrace();
        } catch(Exception e) {
            //Handle errors for Class.forName
            e.printStackTrace();
        } finally {
            try{
                if(stmt!=null)
                    stmt.close();
            } catch(SQLException se2) {
            }
            try {
                if(conn!=null)
                    conn.close();
            } catch(SQLException se) {
                se.printStackTrace();
            }
        }
    }
}
Explanation:- The above program is a simple JDBC client with DB_URL referring to an Apache Drill instance running as a pseudo-distributed cluster (just one instance). Here "zk=localhost:2181" indicates that ZooKeeper is running locally on port 2181 and the Drill instance is registered with ZooKeeper. Execute the above program and it displays the following results from the employee.json file.
07:13:21.645 [main] DEBUG o.a.drill.common.config.NestedConfig - Loading configs at the following URLs [jar:file:/opt/drill/jars/jdbc-driver/drill-jdbc-all-1.1.0.jar!/drill-module.conf]
07:13:21.672 [main] DEBUG o.a.drill.common.config.NestedConfig - Setting up config object.
07:13:21.936 [main] DEBUG o.a.drill.common.config.NestedConfig - Config object initialized.
.................
..............................
07:13:32.751 [Client-1] DEBUG o.a.drill.exec.rpc.user.UserClient - Sending response with Sender 1054266969
EmployeeID First Name Last Name  Salary
-------------------------------------------------------------
2:  Derrick :  Whelply :  40000.0
4:  Michael :  Spence :  40000.0
5:  Maya :  Gutierrez :  35000.0
21:  Pedro :  Castillo :  35000.0
22:  Laurie :  Borges :  35000.0
....................
.................................
07:13:32.985 [main-EventThread] INFO  org.apache.zookeeper.ClientCnxn - EventThread shut down
07:13:32.986 [main] INFO  org.apache.zookeeper.ZooKeeper - Session: 0x1512a4eec010002 closed

Debugging:- 

Error: java.lang.IllegalStateException: No DrillbitEndpoint can be found -
Solution:- It indicates that the Apache Drill instance is not running. Go to the Apache Drill home directory and execute the following command. If Drill is running, it gives the message "drillbit is running"; otherwise "drillbit not running."
zytham@ubuntu:/opt/drill$ bin/drillbit.sh status
drillbit not running.
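Once the status command reports that the drillbit is running, an additional quick check (assuming default settings) is to query Drill's sys.drillbits system table from the Drill shell or any connected client; every drillbit registered with ZooKeeper should appear as a row:

SELECT * FROM sys.drillbits;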

Exception:java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_60]
Solution:- It suggests that ZooKeeper might not be running or the host name is not resolved properly. Execute the following command to verify whether ZooKeeper is running or not.
zytham@ubuntu:/opt/zookeeper$ bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

Start ZooKeeper and Drill if not up and running, then execute client program. It should work like a charm.
Note:- The JDBC client program does not work when Drill is started in embedded mode, so always remember to start Drill in distributed mode using one of the three ways - the drillbit.sh command, starting the Drill shell, or using an ad-hoc connection to Drill.


Read also:- Apache Drill installation in embedded mode (in Ubuntu 13.04) and SQL query execution in drill prompt

Nov 20, 2015


Apache Drill installation in embedded mode (in Ubuntu 13.04) and SQL query execution in drill prompt


Apache Drill is a distributed SQL engine designed for data-intensive distributed applications and interactive analysis of large-scale datasets. It follows a schema-free JSON document model similar to MongoDB and has a pluggable architecture that enables connectivity to multiple datastores. The Apache Drill project was inspired by Google's Dremel system, which is available as an infrastructure service called Google BigQuery. Key features of Apache Drill (schema-free SQL query engine for Hadoop, NoSQL and cloud storage) are-
  • Drill supports a variety of NoSQL databases and file systems like HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. 
  • Data from multiple datastores (NoSQL, HDFS or HBase) can be joined using a single query.
  • Drill processes the data in-situ (in place) without requiring users to define schemas or transform data.
  • Drill supports industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
The main agenda of this post is to install/set up Apache Drill on a Linux (Ubuntu) machine and execute sample SQL-like queries on non-relational data storage - JSON/CSV files. Apache Drill can be installed in embedded mode or distributed mode. In this post, we are going to install it in embedded mode. Read Apache Drill installation in distributed mode and JDBC client connection.
Note:- Oracle Java SE Development Kit (JDK) 7+ must be available on our system; if not available, refer to Oracle JDK installation in Ubuntu.

Apache drill installation in embedded mode 

1. Download a stable distribution of Apache Drill from here OR execute one of the following commands to get the specified version of Apache Drill.
zytham@ubuntu:~$ wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz

zytham@ubuntu:~$ curl -o apache-drill-1.1.0.tar.gz http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
2. Unzip the downloaded distribution and move it to any location as per your convenience. Let's say we have placed Apache Drill at location "/opt/drill". Now the Apache Drill installation directory is /opt/drill.
zytham@ubuntu:~$ tar -xvzf ./Downloads/apache-drill-1.1.0.tar.gz
zytham@ubuntu:~$ mv apache-drill-1.1.0 /opt/drill
3. Go to the Drill home directory /opt/drill and start Apache Drill in embedded mode using the following command. On success, the Drill prompt appears.
zytham@ubuntu:/opt/drill$ ./bin/drill-embedded 
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Nov 20, 2015 9:54:10 PM org.glassfish.jersey.server.ApplicationHandler initialize
INFO: Initiating Jersey application, version Jersey: 2.8 2014-04-29 01:25:26...
apache drill 1.0.0 
"say hello to my little drill"
0: jdbc:drill:zk=local> 
Error while starting the Drill shell in embedded mode:- This error occurs if sufficient memory is not available for the JVM to allocate. (I am running Apache Drill in a VM with a low memory configuration.)
zytham@ubuntu:/opt/drill$ ./bin/drill-embedded 
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006c0000000, 1431633920, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 1431633920 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /opt/drill/hs_err_pid3122.log
Solution:- In order to resolve this issue, we need to modify the values assigned to the properties DRILL_MAX_DIRECT_MEMORY and DRILL_HEAP in <DRILL_HOME>/conf/drill-env.sh.
  • Open drill-env.sh and update these two fields - DRILL_MAX_DIRECT_MEMORY and DRILL_HEAP - so that the memory allocated by the start-up script is reduced.
    DRILL_MAX_DIRECT_MEMORY="1G"
    DRILL_HEAP="512M"
  • Save it and again start drill shell in embedded mode. 

SQL query execution in drill prompt

From the Drill prompt, we can execute SQL queries on flat files (JSON, CSV, Parquet, etc.). Here we are using the sample JSON file employee.json shipped with Apache Drill, packaged in the Foodmart data JAR in Drill's classpath, available at <DRILL_HOME>/jars/3rdparty/foodmart-data-json-0.4.jar.
Note:- Below, cp.`employee.json` refers to the employee JSON file available in the classpath.
0: jdbc:drill:zk=local> show databases;
+---------------------+
|     SCHEMA_NAME     |
+---------------------+
| INFORMATION_SCHEMA  |
| cp.default          |
| dfs.default         |
| dfs.root            |
| dfs.tmp             |
| sys                 |
+---------------------+
6 rows selected (3.015 seconds)
0: jdbc:drill:zk=local> select employee_id, first_name,last_name,position_id,salary FROM cp.`employee.json` where salary > 30000;
+--------------+-------------+------------+--------------+----------+
| employee_id  | first_name  | last_name  | position_id  |  salary  |
+--------------+-------------+------------+--------------+----------+
| 1            | Sheri       | Nowmer     | 1            | 80000.0  |
| 2            | Derrick     | Whelply    | 2            | 40000.0  |
| 4            | Michael     | Spence     | 2            | 40000.0  |
| 5            | Maya        | Gutierrez  | 2            | 35000.0  |
| 10           | Darren      | Stanz      | 5            | 50000.0  |
| 21           | Pedro       | Castillo   | 2            | 35000.0  |
| 22           | Laurie      | Borges     | 2            | 35000.0  |
| 36           | Donna       | Arnold     | 7            | 45000.0  |
+--------------+-------------+------------+--------------+----------+
8 rows selected (0.231 seconds)
0: jdbc:drill:zk=local> select employee_id, first_name,last_name,position_id,salary FROM cp.`employee.json` where salary > 30000 and position_id=2;
+--------------+-------------+------------+--------------+----------+
| employee_id  | first_name  | last_name  | position_id  |  salary  |
+--------------+-------------+------------+--------------+----------+
| 2            | Derrick     | Whelply    | 2            | 40000.0  |
| 4            | Michael     | Spence     | 2            | 40000.0  |
| 5            | Maya        | Gutierrez  | 2            | 35000.0  |
| 21           | Pedro       | Castillo   | 2            | 35000.0  |
| 22           | Laurie      | Borges     | 2            | 35000.0  |
+--------------+-------------+------------+--------------+----------+
5 rows selected (0.276 seconds)
Note:- Here we have not specified any metadata information (of columns); this is one of the brightest features of Apache Drill. With Apache Drill we can directly query self-describing data (e.g., JSON, Parquet) without having to create and manage schemas. Drill is the only query engine that compiles and re-compiles queries at runtime. Drill's flexible JSON data model and on-the-fly schema discovery enable it to query self-describing data. This allows Drill to achieve high performance without knowing the structure of the data in advance. (Source: Apache Drill FAQ)

Database join operation:- In continuation, let's perform a join operation on the two JSON files (employee.json and department.json). First execute a query with aliasing, followed by a join of the two files to get department_description from department_id.
0: jdbc:drill:zk=local> select emp.employee_id, emp.first_name,emp.salary,emp.department_id FROM cp.`employee.json` emp where emp.salary <40000 and emp.salary>21000;
+--------------+-------------+----------+----------------+
| employee_id  | first_name  |  salary  | department_id  |
+--------------+-------------+----------+----------------+
| 5            | Maya        | 35000.0  | 1              |
| 6            | Roberta     | 25000.0  | 2              |
| 20           | Beverly     | 30000.0  | 1              |
| 21           | Pedro       | 35000.0  | 1              |
| 22           | Laurie      | 35000.0  | 1              |
+--------------+-------------+----------+----------------+
0: jdbc:drill:zk=local> select emp.employee_id, emp.first_name,emp.salary,emp.department_id,dept.department_description FROM cp.`employee.json` emp , cp.`department.json` dept where emp.salary <40000 and emp.salary>21000 and emp.department_id = dept.department_id;
+--------------+-------------+----------+----------------+-------------------------+
| employee_id  | first_name  |  salary  | department_id  | department_description  |
+--------------+-------------+----------+----------------+-------------------------+
| 5            | Maya        | 35000.0  | 1              | HQ General Management   |
| 6            | Roberta     | 25000.0  | 2              | HQ Information Systems  |
| 20           | Beverly     | 30000.0  | 1              | HQ General Management   |
| 21           | Pedro       | 35000.0  | 1              | HQ General Management   |
| 22           | Laurie      | 35000.0  | 1              | HQ General Management   |
+--------------+-------------+----------+----------------+-------------------------+

SQL query on CSV file:-
Create a CSV (comma-separated values) file at location /tmp/sample.csv and copy the following data into it (physical attributes of four people).
1, NIKS, 79 , 5.8
2, CKM , 83 , 5.7
3, RITZ, 85 ,5.10
4, YUNNUS, 71, 5.11
Individual columns are extracted from the CSV file by treating each row as a zero-indexed array: columns[0], columns[1] and so on.
0: jdbc:drill:zk=local> select * FROM dfs.`/tmp/sample.csv`;
+---------------------------------+
|             columns             |
+---------------------------------+
| ["1"," NIKS"," 79 "," 5.8 "]    |
| ["2"," CKM "," 83 "," 5.7"]     |
| ["3"," RITZ"," 85 ","5.10"]     |
| ["4"," YUNNUS"," 71"," 5.11 "]  |
+---------------------------------+
4 rows selected (0.189 seconds)
0: jdbc:drill:zk=local> select columns[0] as id, columns[1] name, columns[2] as weight, columns[3] as height FROM dfs.`/tmp/sample.csv`;
+-----+----------+---------+---------+
| id  |   name   | weight  | height  |
+-----+----------+---------+---------+
| 1   |  NIKS    |  79     |  5.8    |
| 2   |  CKM     |  83     |  5.7    |
| 3   |  RITZ    |  85     | 5.10    |
| 4   |  YUNNUS  |  71     |  5.11   |
+-----+----------+---------+---------+
4 rows selected (0.288 seconds)

Querying multiple data sources in a single query:-

In Apache Drill, multiple data sources (CSV, JSON, etc.) can be accessed and joined in a single query. The query below accesses the employee.json, department.json and sample.csv files and displays employee details along with physical attributes.
0: jdbc:drill:zk=local> select emp.employee_id, phy.columns[1] as Name ,dept.department_description,phy.columns[2] as Weight , phy.columns[3] as Height FROM cp.`employee.json` emp , cp.`department.json` dept, dfs.`/tmp/sample.csv` phy where CAST(emp.employee_id AS INT) =  CAST(phy.columns[0] AS INT) and emp.department_id = dept.department_id;
+--------------+----------+-------------------------+---------+---------+
| employee_id  |   Name   | department_description  | Weight  | Height  |
+--------------+----------+-------------------------+---------+---------+
| 1            |  NIKS    | HQ General Management   |  79     |  5.8    |
| 2            |  CKM     | HQ General Management   |  83     |  5.7    |
| 4            |  YUNNUS  | HQ General Management   |  71     |  5.11   |
+--------------+----------+-------------------------+---------+---------+
3 rows selected (0.679 seconds)
In the above result, Name, Weight and Height come from the CSV file and the other two fields come from the two different JSON files.
Explicit type casting is mandatory when we are joining across different sources; otherwise we will get an error something like this.
Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: BIGINT, Right type: VARCHAR. Add explicit casts to avoid this error

Apache Drill web console:- Apache Drill provides a web console, where we can customize data sources and add new storage plugins (e.g., add support for a MySQL database). Below is the web console URL with the default port 8047.
http://<IP address or host name>:8047
In a terminal, type ifconfig to get the IP (inet) address, update the above URL accordingly and open it in a browser.
Apache drill web console and configuration window 
In the web console diagram-1 above, cp refers to the classpath and dfs refers to the file system (local or others). When we click the update button of any storage plugin we get a configuration window where we can update connection properties, supported file formats and their extensions, and workspaces (where files are physically present).
While querying the CSV file in the above example, we created sample.csv at location /tmp/ because there is an entry for the workspace location /tmp/ with write permission enabled (refer to diagram-2). We can modify these settings as per our convenience.
Note:-
1. By convention, the /tmp/ workspace is used to create views, and they are created locally, as shown in the sketch after these notes.
CREATE OR REPLACE VIEW dfs.tmp.<VIEW_NAME> AS <SELECT_QUERY>
2. It is not mandatory to use the configured location in a SQL query; we can place the data source file at any location and specify it in the SQL query as follows:
SELECT * FROM dfs.`/users/zytham/Downloads/data/<data_source_json>`;
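A minimal sketch of note 1, using the default dfs.tmp workspace and the classpath employee.json sample; the view name is illustrative only:

CREATE OR REPLACE VIEW dfs.tmp.high_salary_employees AS
SELECT employee_id, first_name, salary FROM cp.`employee.json` WHERE salary > 30000;

SELECT * FROM dfs.tmp.high_salary_employees;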
Stopping Apache Drill:- 
Execute the following command to exit the prompt and return to the terminal.
0: jdbc:drill:zk=local> !quit

In the next post, we will see how to connect Apache Drill with a JDBC client and query in the same way we did from the Apache Drill prompt.

References :- Apache drill official documentation

Nov 17, 2015


Configuring SQuirreL SQL Client for DB2 database: Execute SQL query from SQuirreL SQL Client against DB2

SQuirreL SQL Client is a graphical SQL client written in Java that allows us to view the structure of a JDBC-compliant database, browse the data in tables, issue SQL commands, etc. Follow the steps below to configure SQuirreL for a DB2 database.
  1. Download SQuirreL SQL Client and Download DB2 driver.
  2. Start the SQuirreL client (it is a jar file, a plain Java application).
    > java  -jar  <SQUrrel_Jar>
  3. Open the Drivers list from the left menu and click the plus sign "Create a New Driver". Fill in the popup window as per the following guidance and diagram.
    Name = Any_name_as_per_convenience <IBM DB2 Universal Driver>
    Example URL =  jdbc:db2://<server_address>:<port>/<dbname>
    website URL = www.ibm.com (optional)
    Class Name  = com.ibm.db2.jcc.DB2Driver
  4. Click Extra Class Path tab and browse DB2 driver downloaded above and click Ok.
  5. Create an alias out of the driver created above. Open the Aliases list and click the plus sign - Create a new Alias. Fill out the popup window, test the connection and press Ok.
    Name = Any_name_as_per_convenience
    Driver = select the driver created above (IBM DB2 Universal Driver)
    UserName = <username>
    Password = <password>
Now we are ready to execute SQL commands from SQuirreL SQL Client. Right-click on the DB2 alias and click Connect. It opens a worksheet where we can execute SQL commands.
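As a quick sanity check once the worksheet opens, a trivial query against DB2's built-in dummy table confirms that the connection works (assuming a standard DB2 catalog):

SELECT CURRENT DATE FROM SYSIBM.SYSDUMMY1;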


Nov 15, 2015


Connect Tableau 9(Desktop version) with Hadoop Hive: Set-up and data analysis with Hive data source

Tableau supports connections to data stored in Cloudera distributions using Hive (HiveServer1) and the data source's Hive ODBC driver. The main agenda of this post is to set up Tableau 9.0 Desktop to connect to Hadoop Hive (Cloudera distribution).
Prerequisite:-
1. Installation of the Cloudera distribution including Apache Hadoop CDH4u1.
2. Install Tableau 9 Desktop version - the Desktop version is not free; download the trial version and install it.

Set-up Tableau 9 Desktop for Hadoop hive access

Step 1: Download the Hive ODBC driver (latest version) and install it on the machine where you have installed Tableau 9 Desktop. I have installed the 32-bit ODBC driver.

Step 2: After installation, check whether the driver is in place or not. Go to Start -> search for the 32-bit ODBC Administrator. Click on the "System DSN" tab and we find that the Cloudera ODBC driver is in place. (Refer to the diagram below - Step 2)

Step 3: Now do the DSN setup for the Cloudera ODBC driver for Apache Hive. Click on the Configure button and select the Hive server type as "Hive Server 1"; the host name is the IP address of the machine where the Cloudera setup is running. Keep the port as 10000 and the database as default. (Refer to the diagram above - Step 3)

Step 4: Test Hive connectivity - click the Test button at the bottom; a pop-up should display "TESTS COMPLETED SUCCESSFULLY!". (Refer to the diagram above - Step 4)

Step 5: Connect Tableau with Hive via Cloudera ODBC driver 
  1. In Tableau Desktop, select the appropriate server and enter the information required to connect. Go to More servers -> Other databases (ODBC). 
  2. Now select the schema and table (steps 1 and 2 in the diagram below), then search for the table of interest (step 3) to build your data source.
  3. Drag or double-click the listed tables and create a database join between the appropriate fields. Here I have used the two tables employee and department and performed an inner join on department id. (Steps 4 and 5 in the diagram above)
  4. Open a workbook/sheet. Click "Sheet #" (bottom left), then drag and drop fields (or right-click each field of interest and add it to the sheet) to build data analysis views. The diagram below shows data interpretation of employee count vs location and average salary vs department (select Dname and Loc from Dimensions and salary from Measures). To create a new field Employee count, right-click on employee_id -> Create -> Calculated Field and use the function "COUNT([Employee Id])" in the text box.
  5. Similarly, we can generate other statistics (side-by-side bars or packed bubble representations) from this Hive data source.


References:-
http://kb.tableau.com/articles/knowledgebase/hadoop-hive-connection

Nov 13, 2015


Apache Hive - Introduction, Installation and database operations

Apache Hive:- It is a data warehouse infrastructure built on top of HDFS and MapReduce (HDFS provides distributed storage and MapReduce is the execution model) and facilitates querying and managing large datasets residing in distributed storage. It provides an SQL-like language called HiveQL which transparently converts queries into MapReduce, Apache Tez and Spark jobs. It allows programmers to plug in their custom mappers and reducers for sophisticated analysis that may not be supported by the built-in capabilities of the language. Apache Hive does not restrict you to any particular file format and works equally well even with customized file formats. Read more about Apache Hive here and a nice discussion about it.
The main agenda of this post is to set up Hive on a Linux (Ubuntu) machine that already has a Hadoop setup running and to execute SQL-like queries in the Hive environment. Since Apache Hive is built on top of Apache Hadoop, a Hadoop setup is a prerequisite for running Hive.
Step 1:- Download a stable release of Hive from here. I have downloaded apache-hive-1.2.1-bin.tar.gz and unzipped it; the folder it creates is apache-hive-1.2.1-bin.
zytham@ubuntu:~/Downloads$ tar -xvzf apache-hive-1.2.1-bin.tar.gz
Step 2:- Move the unzipped folder to some convenient location. I moved it to /usr/local/hive.
zytham@ubuntu:~/Downloads$ sudo mv apache-hive-1.2.1-bin /usr/local/hive
Step 3:- Open the .bashrc file and update it with the following entries.
#for Hive
export HADOOP_HOME=/usr/local/hadoop2.6.1
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH
Step 4:- Start Hadoop services. Hive internally uses Hadoop (or other computation engines). Here we are relying on Hadoop.
hduser1@ubuntu:/usr/local/hadoop2.6.1/sbin$ ./start-all.sh
Step 5:- Create two directories which are used by Hive as the "location of default database for the warehouse" and "to restore the HBase table snapshot" - the default location of these directories is specified in the configuration file hive-default.xml.template under $HIVE_HOME/conf/ (search for the property names hive.metastore.warehouse.dir and hive.hbase.snapshot.restoredir).
Execute the following commands to create the two directories.
hduser1@ubuntu:/usr/local/hadoop2.6.1$ ./bin/hadoop fs -mkdir -p /user/hive/warehouse
hduser1@ubuntu:/usr/local/hadoop2.6.1$ ./bin/hadoop fs -mkdir -p /tmp
Note:- The above commands are executed in the hduser1 context, since I have installed Hadoop under user hduser1.
Step 6:- Hadoop is installed in the context of user hduser1; give write permission to other users so that Hive queries can be executed from other user contexts. Execute the following commands for the same.
hduser1@ubuntu:/usr/local/hadoop2.6.1$ ./bin/hadoop fs -chmod -R a+w /user/hive/warehouse
hduser1@ubuntu:/usr/local/hadoop2.6.1$ ./bin/hadoop fs -chmod -R a+w /tmp
Step 7:- Start the Apache Hive prompt: just type hive in a terminal and press enter.
zytham@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.099 seconds, Fetched: 1 row(s)
Note:- 1. Error : "[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected"
The reason behind it is that Hive has upgraded to Jline2 but jline 0.94 exists in the Hadoop lib.
Solution:- Execute the following command in the terminal, followed by the hive shell command.
zytham@ubuntu:~$ export HADOOP_USER_CLASSPATH_FIRST=true
zytham@ubuntu:~$ hive
Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
Refer to this for more errors and resolutions in Hive.
2. Press Ctrl + C to exit from the Hive prompt and return to the terminal.

Database operations - HiveQL(Hive query language)  

HiveQL is a dialect of SQL, heavily influenced by MySQL. Although SQL does not provide a solution for all big data problems, it is an efficient and powerful tool for analysts with a strong SQL background and only modest Java knowledge to analyse high volumes of data.
Internally, HiveQL is translated into a series of MapReduce jobs or other jobs depending on how the property hive.execution.engine is configured, as in the sketch below. Other possible execution engines are Apache Tez and Spark.
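A minimal sketch of switching the engine from the Hive shell; mr is the classic MapReduce default, while the tez and spark values only work if those runtimes are actually installed and configured on the cluster:

SET hive.execution.engine;         -- show the current value
SET hive.execution.engine=mr;      -- classic MapReduce
SET hive.execution.engine=tez;     -- requires Apache Tez to be set up
SET hive.execution.engine=spark;   -- requires Hive on Spark to be configured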

Hive shell and interaction with Hive:- Start the Hive shell (just type hive in a terminal and press enter). First create a new schema (database), get inside it (use that schema), and then create a table.
hive> create database retails;
OK
Time taken: 1.219 seconds
hive> use retails;
OK
Time taken: 0.144 seconds
hive> create table customers(name string, company string, id int);
OK
Time taken: 0.341 seconds
hive> show tables;
OK
customers
Time taken: 0.137 seconds, Fetched: 1 row(s)
hive> show databases;
OK
default
retails
Time taken: 0.168 seconds, Fetched: 2 row(s)
The above table creation query is nothing more than a plain SQL query. The main purpose of using Hive, though, is to deal with high volumes of data (log files, retail/sales data, forecasting data) and analyse that raw data with HiveQL - analysis which would otherwise usually be done by writing MapReduce programs.
Let's create a table employee, load data into it from a text file, and then execute HiveQL queries on the employee table.
hive> create table employee(employee_id int, first_name string, email string) ROW FORMAT DELIMITED FIELDS TERMINATED by '|';
OK
Time taken: 0.377 seconds
hive> show tables;
OK
customers
employee
Time taken: 0.137 seconds, Fetched: 2 row(s)
hive> LOAD DATA LOCAL INPATH '/home/zytham/Documents/employee_record.txt' OVERWRITE INTO TABLE employee;
Loading data to table retails.employee
Table retails.employee stats: [numFiles=1, numRows=0, totalSize=257, rawDataSize=0]
OK
Time taken: 0.551 seconds
hive> select * from employee;
OK
16 john john@devinline.com
17 robert robert@devinline.com
18 andrew andrew@devinline.com
19 katty katty@devinline.com
27 edward edward@devinline.com
29 alan alan@devinline.com
31 kerry kerry@devinline.com
34 tom tom@devinline.com
35 zack zack@devinline.com
Time taken: 0.263 seconds, Fetched: 9 row(s)
The above create table query creates a table employee with 3 columns, and the ROW FORMAT clause tells Hive that each row in the data file is pipe-delimited (|) text. Hive expects three fields in each row, corresponding to the table columns, with fields separated by | (pipe) and rows by newlines.
The LOAD DATA ... query tells Hive to put the specified local file (employee_record.txt) into the table's warehouse directory named employee. (Remember, Hive does not mandate any particular file format, so whatever input file is provided is simply copied to the warehouse directory, even if it does not match the table column definitions; any mismatch will surface while querying.) The OVERWRITE keyword wipes out existing files in the warehouse directory and adds the specified one.
Note:-
1. Along with normal text files, we can also load map files and JSON files into the warehouse directory and analyse them using known UDFs (user-defined functions) such as get_json_object and json_tuple, or a JSON SerDe.
2. If no schema name is specified (if we do not execute use retails and try to create a table), the table is created in the default schema.
3. Hive creates one directory for each table under /user/hive/warehouse; execute the hadoop dfs -ls command shown later in this post (outside the Hive shell) to see the directory structure.
4. Interactive vs non-interactive mode:- Until now we executed queries in the Hive shell and the results were displayed in the same place; this mode of execution is termed interactive mode.
With the -f option we can run the Hive shell in non-interactive mode, where we specify the SQL in a script file. Create a script file, say script.hive, and copy the following SQL queries into that file.
show databases;
use retails;
show tables;
select * from employee;
LOAD DATA LOCAL INPATH '/home/zytham/Documents/employee_record.txt' INTO TABLE employee;
select * from employee;
From the terminal, execute the following command to run the script file in non-interactive mode in Hive.
zytham@ubuntu:~$ hive -f script.hive 

Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
OK
default
retails
Time taken: 1.063 seconds, Fetched: 2 row(s)
OK
Time taken: 0.086 seconds
OK
customers
employee
Time taken: 0.132 seconds, Fetched: 2 row(s)
OK
16 john john@devinline.com
17 robert robert@devinline.com
18 andrew andrew@devinline.com
19 katty katty@devinline.com
27 edward edward@devinline.com
29 alan alan@devinline.com
31 kerry kerry@devinline.com
34 tom tom@devinline.com
35 zack zack@devinline.com
Time taken: 1.091 seconds, Fetched: 9 row(s)
Loading data to table retails.employee
Table retails.employee stats: [numFiles=2, numRows=0, totalSize=514, rawDataSize=0]
OK
Time taken: 1.014 seconds
OK
16 john john@devinline.com
17 robert robert@devinline.com
18 andrew andrew@devinline.com
19 katty katty@devinline.com
27 edward edward@devinline.com
29 alan alan@devinline.com
31 kerry kerry@devinline.com
34 tom tom@devinline.com
35 zack zack@devinline.com
16 john john@devinline.com
17 robert robert@devinline.com
18 andrew andrew@devinline.com
19 katty katty@devinline.com
27 edward edward@devinline.com
29 alan alan@devinline.com
31 kerry kerry@devinline.com
34 tom tom@devinline.com
35 zack zack@devinline.com
Time taken: 0.203 seconds, Fetched: 18 row(s)
 
Note:- The command hive -f script.hive instructs Hive to execute the file script.hive in non-interactive mode, and the result of each statement is displayed. One important point that needs special mention here: in the script file the OVERWRITE keyword is missing from the LOAD DATA ... query, so the new file is copied into the warehouse directory employee without deleting the existing one, and on querying select * from employee; results from both files are retrieved and displayed. We can verify the same with the following (since the file names are the same, the second copy is stored with the suffix copy_1).
zytham@ubuntu:/usr/local/hadoop2.6.1$ ./bin/hadoop dfs -ls /user/hive/warehouse/retails.db/employee
Found 2 items
-rwxrwxrwx   1 zytham supergroup        257 2015-11-14 09:31 /user/hive/warehouse/retails.db/employee/employee_record.txt
-rwxrwxrwx   1 zytham supergroup        257 2015-11-14 10:40 /user/hive/warehouse/retails.db/employee/employee_record_copy_1.txt

Managed tables VS External tables

By default, tables created in Hive are managed by Hive - on executing a LOAD ... query, Hive moves the data (files) into the warehouse directory of that table, and it deletes both data and metadata on executing a DROP command. As shown earlier, the following commands create a table and copy a file into the warehouse directory.
create table employee(employee_id int, first_name string, email string) ROW FORMAT DELIMITED FIELDS TERMINATED by '|';
LOAD DATA LOCAL INPATH '/home/zytham/Documents/employee_record.txt' OVERWRITE INTO TABLE employee;
On executing the DROP command, the data along with the metadata is deleted.
DROP Table employee;
Hive also provides the flexibility to create an external table and lets the user manage the data itself. While creating an external table, the location of the external table is specified, and Hive does not care whether the specified location exists or not (even if the location does not exist, the table will be created successfully).
When a DROP command is executed on an external table, only the metadata is wiped out, not the actual data. This feature allows data to be created lazily after creating the table.
hive> CREATE EXTERNAL TABLE external_customers (customer_id STRING, Company String, Email String, Place string) ROW FORMAT DELIMITED FIELDS TERMINATED by '|'  LOCATION '/tmp/hive/externalTable' ;
OK
Time taken: 0.275 seconds
hive> show tables;
OK
customers
employee
external_customers
Time taken: 0.253 seconds, Fetched: 3 row(s)

hive> LOAD DATA LOCAL INPATH '/home/zytham/Documents/customers_record.txt' INTO TABLE external_customers;
Loading data to table retails.external_customers
Table retails.external_customers stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.464 seconds

hive> select * from external_customers;
OK
16 john john@devinline.com Maricopa County
17 robert robert@devinline.com Taos County
18 andrew andrew@devinline.com Pleasant Valley
19 katty katty@devinline.com Kit Carson County
27 edward edward@devinline.com Barry County
29 alan alan@devinline.com Lakeview
31 kerry kerry@devinline.com Bledsoe County
34 tom tom@devinline.com Kit Carson County
35 zack zack@devinline.com Taos County
Time taken: 0.197 seconds, Fetched: 18 row(s)
Here we have created an external table (EXTERNAL keyword) and loaded data locally. Since it is an external table, no file is copied by Hive into the table location "/tmp/hive/externalTable" (in contrast to the earlier table creation, where the data file was copied into the warehouse directory).
zytham@ubuntu:/usr/local/hadoop2.6.1$ ls -l /tmp/hive/externalTable
total 0

ALTER, DROP, TRUNCATE, Create table as SELECT :- 

A table's definition can be changed after the table has been created - the table's directory structure can be changed (an external table's underlying directory cannot be changed). New columns can be added, but the data associated with them cannot be updated (Hive does not allow updating existing records; instead it is always recommended to create a new table with the desired columns and populate it with a SELECT statement, as in the sketch below).
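A minimal sketch of that create-table-as-select approach, reusing the employee table created earlier; the new table and the derived column are illustrative only:

CREATE TABLE employee_with_domain AS
SELECT employee_id, first_name, email, 'devinline.com' AS email_domain
FROM employee;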
hive> create table dummy(id string);
OK
Time taken: 0.203 seconds

hive> show tables;
OK
customers
dummy
employee
external_customers
Time taken: 0.139 seconds, Fetched: 4 row(s)

hive> alter table dummy RENAME TO dummy_1;
OK
Time taken: 0.657 seconds
hive> show tables;
OK
customers
dummy_1
employee
external_customers
Time taken: 0.158 seconds, Fetched: 4 row(s)

hive> alter table dummy_1 ADD COLUMNS(place String);
OK
Time taken: 0.264 seconds

Using the DROP command, both data and metadata are wiped out for managed tables, while only metadata is wiped out for external tables. In managed tables, using the TRUNCATE command we can keep the table definition and delete the table's data. TRUNCATE does not work with external tables; instead we need to use dfs -rmr (or the newer rm -r) to remove the external table directory.
hive> Truncate table dummy_1;
OK
Time taken: 0.401 seconds

hive> Truncate external_customers;
FAILED: ParseException line 1:9 missing TABLE at 'external_customers' near '<EOF>'
hive> Truncate table external_customers;
FAILED: SemanticException [Error 10146]: Cannot truncate non-managed table external_customers.

hive> dfs -rmr  /tmp/hive/externalTable;
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted /tmp/hive/externalTable

Create a table with the same schema as another existing table using the LIKE keyword.
hive> create table employee_duplicate Like employee;
OK
Time taken: 0.412 seconds
hive> show tables;
OK
customers
dummy_1
employee
employee_duplicate
external_customers
Time taken: 0.169 seconds, Fetched: 5 row(s)

Debugging Hive - Error and its solution

  • Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Call From ubuntu/127.0.1.1 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    Solution:- Verify that Hadoop services are running.
    Go to <Hadoop_Home>/sbin/ and execute ./start-all.sh.
  • [ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    Solution:- Hive has upgraded to Jline2 but jline 0.9x exists in the Hadoop lib. Execute the following command in the terminal before starting Hive.
    zytham@ubuntu:/tmp$ export HADOOP_USER_CLASSPATH_FIRST=true


References :-
1. Apache hive confluence
2. Apache Hadoop by Tom White

Nov 9, 2015


Custom ReentrantLock implementation - implement Lock interface and a use case discussion

In the previous post, we discussed the pros and cons of Lock (java.util.concurrent.locks) over synchronized methods and statements, and concluded that the Lock interface provides higher programmatic control and is sometimes more efficient too. The main agenda of this post is to create a custom ReentrantLock class by implementing a Lock interface and to use that custom class to showcase a ticket booking use case where the lock is acquired and released using the lock() and unlock() methods respectively.
The Javadoc of the Lock interface discusses three forms of lock acquisition (interruptible, non-interruptible, and timed); however, for the sake of simplicity we assume that no thread will be interrupted and we will just focus on the syntax and semantics of the lock() and unlock() methods. Let's create a custom lock interface consisting of three methods as follows:-
/*
 * For sake of simplicity - assume custom lock interface having only three
 * methods.
 */
interface CustomLock {
 public void lock();
 public boolean tryLock();
 public void unlock();
}

lock() method sample code:- 

/*lock method implementation of CustomLock interface*/
public synchronized void lock() {
 /*
  * Acquires the lock if it is not held by another thread and set lock
  * hold count to 1.
  */
 if (lockHoldCount == 0) {
  lockHoldCount++;
  threadId = Thread.currentThread().getId();
 }
 /*
  * If current thread already holds lock then hold count is increased by
  * 1 - Chain locking.
  */
 else if (lockHoldCount > 0
   && threadId == Thread.currentThread().getId()) {
  lockHoldCount++;
 }
 // If the lock is held by another thread then the current
 // thread waits for another thread to release lock.
 else {
  // Wait in a loop to guard against spurious wake-ups; proceed only
  // once the holding thread has released the lock via notify().
  while (lockHoldCount != 0) {
   try {
    wait();
   } catch (InterruptedException e) {
    e.printStackTrace();
   }
  }
  lockHoldCount++;
  threadId = Thread.currentThread().getId();
 }
}
Explanation:- lockHoldCount is an instance variable of the class implementing the lock interface. If the lock() method is called from the context of any thread and lockHoldCount is 0, then lockHoldCount is incremented; if lockHoldCount is non-zero and the lock has already been acquired by the calling thread, then the hold count is incremented (chain locking, i.e. re-entrancy). Otherwise, the thread waits (in a loop) until another thread executes the notify() method and releases the lock.

unlock() method sample code:- 

/*unlock method is executed to release lock*/
public synchronized void unlock() {
 /*
  * If current thread is not holding the lock, if unlock is called it
  * throws IllegalMonitorStateException.
  */
 if (lockHoldCount == 0)
  throw new IllegalMonitorStateException();
 /* If lock is held, decrement lock hold count by 1 */
 lockHoldCount--;

 /*
  * If lockHoldCount is 0, lock is released and waiting thread is
  * notified.
  */
 if (lockHoldCount == 0)
  notify();
}
Explanation:- Decrement lockHoldCount if it is greater than zero; otherwise throw IllegalMonitorStateException (it indicates the lock was not acquired and the thread is trying to release it). 

tryLock() method sample code:-

/*tryLock method returns true if lock is available and calls lock() internally*/
public synchronized boolean tryLock() {
 /*
  * Acquires the lock if it is not held by another thread and // returns
  * true
  */
 if (lockHoldCount == 0) {
  lock();
  return true;
 }
 // If lock is held by another thread then method return false.
 else
  return false;
}

Concrete CustomReentrantLock class implementing CustomLock interface:-

Create a class file, name it CustomReentrantLock.java, and copy/paste the following sample code.
/**
 * http://www.devinline.com
 */
package com.devinline.locking;

/*
 * For sake of simplicity - assume custom lock interface having only three
 * methods.
 */
interface CustomLock {
 public void lock();

 public boolean tryLock();

 public void unlock();
}

public class CustomReentrantLock implements CustomLock {
 /* Maintain number of locks acquired by a thread */
 int lockHoldCount;

 /* Id of thread which is currently holding the lock. */
 long threadId;

 /**
  * Creates an instance of CustomReentrantLock and Initial lock hold count is
  * 0.
  */
 CustomReentrantLock() {
  lockHoldCount = 0;
 }

 @Override
 public synchronized void lock() {
  /*
   * Acquires the lock if it is not held by another thread and set lock
   * hold count to 1.
   */
  if (lockHoldCount == 0) {
   lockHoldCount++;
   threadId = Thread.currentThread().getId();
  }
  /*
   * If current thread already holds lock then hold count is increased by
   * 1 - Chain locking.
   */
  else if (lockHoldCount > 0
    && threadId == Thread.currentThread().getId()) {
   lockHoldCount++;
  }
  // If the lock is held by another thread then the current
  // thread waits for another thread to release lock.
  else {
   // Wait in a loop to guard against spurious wake-ups; proceed only
   // once the holding thread has released the lock via notify().
   while (lockHoldCount != 0) {
    try {
     wait();
    } catch (InterruptedException e) {
     e.printStackTrace();
    }
   }
   lockHoldCount++;
   threadId = Thread.currentThread().getId();
  }
 }

 @Override
 public synchronized void unlock() {
  /*
   * If current thread is not holding the lock, if unlock is called it
   * throws IllegalMonitorStateException.
   */
  if (lockHoldCount == 0)
   throw new IllegalMonitorStateException();
  /* If lock is held, decrement lock hold count by 1 */
  lockHoldCount--;

  /*
   * If lockHoldCount is 0, lock is released and waiting thread is
   * notified.
   */
  if (lockHoldCount == 0)
   notify();

 }

 @Override
 public synchronized boolean tryLock() {
  /*
   * Acquires the lock if it is not held by another thread and // returns
   * true
   */
  if (lockHoldCount == 0) {
   lock();
   return true;
  }
  // If lock is held by another thread then method return false.
  else
   return false;
 }
}

Ticket booking use case implementation - use lock()/unlock() of CustomReentrantLock  

The main goal of this use case is to use the CustomReentrantLock class to acquire and release a lock while multiple threads compete for a shared, scarce resource. Below is the driver program which creates 5 threads (customers booking tickets); these 5 threads try to acquire the lock and book a ticket, and out of the 5 only 3 threads will be able to book a ticket successfully (since only 3 tickets are available).
package com.devinline.locking;

public class TicketBookingDriverProgram {

 public static void main(String[] args) {
  CustomLock customLock = new CustomReentrantLock();
  TicketBookingIRCTC myRunnable = new TicketBookingIRCTC(customLock, 3);
  new Thread(myRunnable, "Passanger-1 Nikhil").start();
  new Thread(myRunnable, "Passanger-2 Ranjan").start();
  new Thread(myRunnable, "Passanger-3 Yunnus").start();
  new Thread(myRunnable, "Passanger-4 CKM").start();
  new Thread(myRunnable, "Passanger-5 Ritz").start();
 }
}

class TicketBookingIRCTC implements Runnable {

 int ticketsAvailable; // scarce resource
 CustomLock customLock;

 public TicketBookingIRCTC(CustomLock customLock, int totalTicket) {
  this.customLock = customLock;
  ticketsAvailable = totalTicket;
 }

 public void run() {

  System.out.println("Waiting to book ticket : "
    + Thread.currentThread().getName());
  /* get hold of lock for booking ticket */
  customLock.lock();

  if (ticketsAvailable > 0) {
   System.out.println("Ticket booking started  for : "
     + Thread.currentThread().getName());

   // Ticket booking takes about 1 sec, so sleep for 1 sec
   try {
    Thread.sleep(1000);
   } catch (Exception e) {
   }
   /* Update available ticket count */
   ticketsAvailable--;
   System.out.println("Congratulation!!, Ticket BOOKED "
     + "successfully for : " + Thread.currentThread().getName());
   System.out.println("currently ticketsAvailable = "
     + ticketsAvailable);
  } else {
   ticketsAvailable--;
   System.out.println("Sorry!! Ticket NOT BOOKED for : "
     + Thread.currentThread().getName()
     + ". Current booking status is Waiting list(W/L): "
     + Math.abs(ticketsAvailable));
  }
  customLock.unlock();
 }
}
=====================Sample output====================
Waiting to book ticket : Passanger-1 Nikhil
Ticket booking started  for : Passanger-1 Nikhil
Waiting to book ticket : Passanger-2 Ranjan
Waiting to book ticket : Passanger-5 Ritz
Waiting to book ticket : Passanger-3 Yunnus
Waiting to book ticket : Passanger-4 CKM
Congratulation!!, Ticket BOOKED successfully for : Passanger-1 Nikhil
currently ticketsAvailable = 2
Ticket booking started  for : Passanger-2 Ranjan
Congratulation!!, Ticket BOOKED successfully for : Passanger-2 Ranjan
currently ticketsAvailable = 1
Ticket booking started  for : Passanger-5 Ritz
Congratulation!!, Ticket BOOKED successfully for : Passanger-5 Ritz
currently ticketsAvailable = 0
Sorry!! Ticket NOT BOOKED for : Passanger-3 Yunnus. Current booking status is Waiting list(W/L): 1
Sorry!! Ticket NOT BOOKED for : Passanger-4 CKM. Current booking status is Waiting list(W/L): 2
=====================================================
Explanation:- The constructor of class TicketBookingIRCTC initializes ticketsAvailable to 3, and the main thread spawns five threads to book the three tickets. Whichever thread the OS schedules first calls the lock method, acquires the lock, books a ticket and decrements the ticketsAvailable count.
When the ticketsAvailable count becomes zero, the else block is executed and no ticket is booked for the other two threads (they are shown as being on the waiting list).