Jan 27, 2016

Print pattern 1 2 3 1 2 3 using three different threads

Problem statement:- Print the sequence 1 2 3 1 2 3 .... repeatedly using three different threads.
Coordinating the three threads on a shared condition (whose turn it is to print) solves the problem. Here is a similar problem that coordinates two threads - Print Even/Odd numbers using two different threads.

Below is sample code that prints the sequence 1 2 3 using three different threads. Each thread waits on a shared monitor until its turn flag is set, prints its number, passes the turn to the next thread and notifies the others.
package com.devinline.thread;

public class SequenceDisplay {

 /**
  * devinline.com
  */
 static Object monitor = new Object();

 // Turn flags: the thread whose flag is true prints next; the other threads wait on monitor.
 static boolean one = true;
 static boolean two = false;
 static boolean three = false;

 public static void main(String[] args) {

  Thread t1 = new Thread(new SequenceDisplayImpl(1));
  Thread t2 = new Thread(new SequenceDisplayImpl(2));
  Thread t3 = new Thread(new SequenceDisplayImpl(3));
  t1.start();
  t2.start();
  t3.start();

 }

 static class SequenceDisplayImpl implements Runnable {

  int threadId;

  SequenceDisplayImpl(int threadId) {
   this.threadId = threadId;
  }

  public void run() {
   print();
  }

  private void print() {
   try {
    while (true) {
     Thread.sleep(500);
     synchronized (monitor) {
      if (1 == threadId) {
       if (!one) {
        monitor.wait();
       } else {
        System.out.print(threadId + " ");
        one = false;
        two = true;
        three = false;
        monitor.notifyAll();
       }
      }
      if (2 == threadId) {
       if (!two) {
        monitor.wait();
       } else {
        System.out.print(threadId + " ");
        one = false;
        two = false;
        three = true;
        monitor.notifyAll();
       }
      }
      if (3 == threadId) {
       if (!three) {
        monitor.wait();
       } else {
        System.out.print(threadId + " ");
        one = true;
        two = false;
        three = false;
        monitor.notifyAll();
       }
      }
     }
    }
   } catch (InterruptedException e) {
    e.printStackTrace();
   }

  }

 }

}
=====Sample output======
1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 .......
======================

Read also:- Print Even and Odd numbers using two different threads

Jan 9, 2016


Development and deployment of Spark applications using SBT(Scala build tool)

Apache Spark is an in-memory computation framework in the Hadoop ecosystem. Spark allows developers to write application code in Scala, Python, R and Java. The main agenda of this post is to write a Spark application in Scala and build and deploy it using SBT (Scala build tool).

Prerequisite:- Apache Spark and Scala should be installed. Here I am using Spark 1.5.2 and Scala 2.10.6. First we will install SBT, then configure the assembly plugin required for the build, and finally create a sample Spark application. An internet connection is mandatory the first time the project is packaged.

How to check whether Spark and Scala are set up:
zytham@ubuntu:~$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
Note:- If SPARK_HOME/bin is not in the PATH variable, go to SPARK_HOME/bin and execute the spark-shell command. If you do not get the prompt shown above, first install Scala and Apache Spark, then follow this tutorial.

1. SBT installation:-
SBT is an open source build tool for Scala and Java projects, similar to Java's Maven or Ant, and is the de facto build tool for the Scala community. Execute the following command to download the SBT tar ball; we will extract it next.
zytham@ubuntu:~$ wget https://dl.bintray.com/sbt/native-packages/sbt/0.13.8/sbt-0.13.8.tgz
.....
Length: 1059183 (1.0M) [application/unknown]
Saving to: ‘sbt-0.13.8.tgz’

100%[======================================>] 10,59,183   17.0KB/s   in 26s    

2016-01-09 21:49:11 (39.5 KB/s) - ‘sbt-0.13.8.tgz’ saved [1059183/1059183]
Extract the tar ball using the following command.
zytham@ubuntu:~$ tar -xvf sbt-0.13.8.tgz 
sbt/
sbt/conf/
sbt/conf/sbtconfig.txt
sbt/conf/sbtopts
sbt/bin/
sbt/bin/sbt.bat
sbt/bin/sbt
sbt/bin/sbt-launch.jar
sbt/bin/sbt-launch-lib.bash
Move the extracted sbt directory to some location (here /opt) and verify that all SBT files are in place.
zytham@ubuntu:~$ sudo mv sbt /opt
zytham@ubuntu:~$ cd /opt/
zytham@ubuntu:/opt$ ls
data                drill    eclipse.desktop  sbt        spark        zookeeper
datastax-ddc-3.2.1  eclipse  gnuplot-5.0.1    scala2.10  spark-1.5.2
In order to create and build projects from any directory using sbt, we need to add the sbt executable to the PATH shell variable. Add sbt's bin directory to the PATH variable in the .bashrc file:
zytham@ubuntu:/opt/spark-1.5.2/bin$ gedit ~/.bashrc
Add these two lines at the end of the file.
export SBT_HOME=/opt/sbt/
export PATH=$SBT_HOME/bin:$PATH

2. Install sbt assembly plugin:- 
sbt-assembly is an sbt plugin that creates a single JAR of our project together with all of its dependencies, except Hadoop and Spark dependencies (these are termed "provided" dependencies and are supplied by the cluster itself at runtime). SBT keeps plugin definitions in a plugins file, and we need to add an entry to that file for each new plugin (similar to adding a dependency in Maven's pom.xml).
There are two places where the plugin definition can live (create the file if it doesn't exist). We can use either:
  • the global file (for version 0.13 and up) at ~/.sbt/0.13/plugins/plugins.sbt
    OR 
  • the project-specific file at PROJECT_HOME/project/plugins.sbt 
Here we are using the global definition file. Since the plugin definition file does not exist yet, create a new file plugins.sbt, add the sbt-assembly entry to it, and press <Ctrl+D> to save and exit.
zytham@ubuntu:/opt$ mkdir -p ~/.sbt/0.13/plugins
zytham@ubuntu:/opt$ cat >> ~/.sbt/0.13/plugins/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
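
For reference, here is a minimal build.sbt sketch (illustrative, not part of the original tutorial) showing how the Spark dependency can be marked "provided" so that sbt-assembly leaves it out of the fat JAR; the project name and versions are just examples and should match your own setup.

name := "WordCount Spark Application"

version := "1.0"

scalaVersion := "2.10.6"

// Spark is supplied by the cluster at runtime, so mark it "provided";
// the sbt-assembly "assembly" task then excludes it from the assembled JAR.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"

The simple example later in this post uses "sbt package" (which never bundles dependencies), so its build file declares spark-core without the provided scope.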

3. Creating sample spark application:- Word count example
Load an input file and create an RDD. Count the occurrences of each word and save the (word, count) pairs to a text file.
  1. Create a project directory named "WordCountExample" with the directory structure src/main/scala/ under it.
    zytham@ubuntu:~$ mkdir WordCountExample
    zytham@ubuntu:~$ cd WordCountExample/
    zytham@ubuntu:~/WordCountExample$ mkdir -p src/main/scala
    
  2. Create a scala file with following code lines.
    zytham@ubuntu:~/WordCountExample$ cd src/main/scala
    zytham@ubuntu:~/WordCountExample/src/main/scala$ gedit Wordcount.scala
    
    Copy the sample code lines below into Wordcount.scala
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
    object WordCount {
      def main(args: Array[String]) = {
    
        //Start the Spark context
        val conf = new SparkConf()
          .setAppName("WordCount")
          .setMaster("local")
        val sc = new SparkContext(conf)
    
        //Read some example file to a test RDD
        val test = sc.textFile("input.txt")
    
        test.flatMap { line => //for each line
          line.split(" ") //split the line in word by word.
        }
          .map { word => //for each word
            (word, 1) //Return a key/value tuple, with the word as key and 1 as value
          }
          .reduceByKey(_ + _) //Sum all of the value with same key
          .saveAsTextFile("output.txt") //Save to a text file
    
        //Stop the Spark context
        sc.stop
      }
    }
    
  3. In the project home directory, create a .sbt configuration file with the following lines.
    zytham@ubuntu:~/WordCountExample/src/main/scala$ cd ~/WordCountExample/
    zytham@ubuntu:~/WordCountExample$ gedit WordcountExample.sbt
    
    Configuration file lines
    name := "WordCount Spark Application"
    version := "1.0"
    scalaVersion := "2.10.6"
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2"
  4. View project directory structure and files.
    zytham@ubuntu:~/WordCountExample$ find .
    .
    ./WordcountExample.sbt
    ./src
    ./src/main
    ./src/main/scala
    ./src/main/scala/Wordcount.scala
    
4. Build/package using sbt:-
zytham@ubuntu:~/WordCountExample$ sbt package
[info] Loading global plugins from /home/zytham/.sbt/0.13/plugins
.....
[info] Compiling 1 Scala source to /home/zytham/WordCountExample/target/scala-2.10/classes...
[info] Packaging /home/zytham/WordCountExample/target/scala-2.10/wordcount-spark-application_2.10-1.0.jar ...
[info] Done packaging.
[success] Total time: 101 s, completed Jan 31, 2016 11:42:25 AM
Note:- It may take some time, since it downloads some jar files; an internet connection is mandatory. On a successful build it creates a jar file (wordcount-spark-application_2.10-1.0.jar) at location "<Project_home>/target/scala-2.10". (The names of the directory and jar file might differ depending on what is configured in the configuration file WordcountExample.sbt.)

5. Deploy generated jar/Submit job to spark cluster:- 
The spark-submit executable (present in <SPARK_HOME>/bin) is used to submit a job to the Spark cluster. Download the input file from here and place it in the project directory (the directory from which spark-submit is run), then use the following command.
zytham@ubuntu:~/WordCountExample$ spark-submit --class "WordCount" --master local[2] target/scala-2.10/wordcount-spark-application_2.10-1.0.jar 
On successful execution, an output directory named "output.txt" is created, and its file part-00000 contains the (word, count) pairs. Execute the following commands to see and verify the output.
zytham@ubuntu:~/WordCountExample$ cd output.txt/
zytham@ubuntu:~/WordCountExample/output.txt$ ls
part-00000  _SUCCESS
zytham@ubuntu:~/WordCountExample/output.txt$ head -10 part-00000 
(spark,2)
(is,1)
(Learn,1)
(This,1)
(time,1)

Explanation of word count example:- On applying the flatMap function on the RDD test, each line is split on spaces and an array of strings is obtained. The map step then converts each word into a (word, 1) key/value tuple (a collection of tuples is produced). Finally, reduceByKey aggregates the tuples by key, and the output (each unique word and its count) is written to a file. Let's take an example and understand the flow of the methods used in the program above. Suppose input.txt has two lines:
This is spark time
Learn spark
Flow of methods used in word count example
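As a rough sketch of that flow for the two lines above (the inline comments show illustrative intermediate values, not actual program output; collect() is only safe for small results because it brings all data to the driver):

val counts = test
  .flatMap(line => line.split(" ")) // "This","is","spark","time","Learn","spark"
  .map(word => (word, 1))           // ("This",1),("is",1),("spark",1),("time",1),("Learn",1),("spark",1)
  .reduceByKey(_ + _)               // ("spark",2),("is",1),("Learn",1),("This",1),("time",1)

counts.collect().foreach(println)   // print each (word, count) pair on the console instead of saving to a file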
An Eclipse project can also be created using sbt; we just need to add an entry for sbteclipse in the plugin configuration file ~/.sbt/0.13/plugins/plugins.sbt
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
and using "sbt eclipse" command  instead of "sbt package" eclipse project can be created.
zytham@ubuntu:~/WordCountExample$ sbt eclipse
[info] Loading global plugins from /home/zytham/.sbt/0.13/plugins
[info] Set current project to WordCount Spark Application (in build file:/home/zytham/WordCountExample/)
[info] About to create Eclipse project files for your project(s).
[info] Successfully created Eclipse project files for project(s):
[info] WordCount Spark Application
Now, in Scala IDE, we can import this Spark application and execute it from there too.

Download Scala IDE:-
Execute the following commands to download and extract the tar ball.
zytham@ubuntu:~/Downloads$ wget http://downloads.typesafe.com/scalaide-pack/4.1.1-vfinal-luna-211-20150728/scala-SDK-4.1.1-vfinal-2.11-linux.gtk.x86_64.tar.gz
zytham@ubuntu:~/Downloads$ tar -xvf scala-SDK-4.1.1-vfinal-2.11-linux.gtk.x86_64.tar.gz
To run the Eclipse IDE, execute the following command from the directory where it has been extracted.
zytham@ubuntu:~/Downloads$ ~/eclipse/eclipse

Jan 2, 2016

Install Scala and Apache spark in Linux(Ubuntu)

Scala is a prerequisite for Apache Spark installation. Let's install Scala first, followed by Apache Spark.

Scala installation:-  

We can set up Scala either by downloading the tar ball and extracting it, or by downloading the .deb package and installing it. Follow either one of the approaches below.
Approach-1 (Downloading tar ball):-
1. Download the Scala distribution scala-2.10.6.tgz, or execute the following command to download the tar ball.
zytham@ubuntu:~/Downloads$ wget http://www.scala-lang.org/files/archive/scala-2.10.6.tgz
2. Extract the downloaded tar ball and place it under, say, /opt/scala2.10.
zytham@ubuntu:~$ tar -xvf scala-2.10.6.tgz
zytham@ubuntu:~$ sudo mv scala-2.10.6 /opt/scala2.10
3. Add Scala's bin directory to the PATH environment variable. By doing so, the Scala shell and other executables can be used from any directory. Open the .bashrc file and add the following lines at the bottom of the file.
zytham@ubuntu:~$ gedit ~/.bashrc
Add the lines below to the .bashrc file
# for scala
export SCALA_HOME=/opt/scala2.10/
export PATH=$PATH:/opt/scala2.10/bin
4. Test installation using command "scala -version". 
zytham@ubuntu:/opt$ source ~/.bashrc
zytham@ubuntu:/opt$ scala -version
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
Note:- The source ~/.bashrc command updates the $PATH environment variable immediately for the current session.
Approach-2 (Downloading .deb version):-
1. Remove the Scala library (if it exists).
sudo apt-get remove scala-library scala
2. Get .deb file for Scala
wget http://www.scala-lang.org/files/archive/scala-2.11.7.deb
3. Install the .deb package.
sudo dpkg -i scala-2.11.7.deb
4. Add Scala to your PATH variable by adding this line to the .bashrc file in your home directory.
PATH="$PATH:/opt/scala/bin"
5. Test the installation. Execute the following command; it displays version details.
scala -version

Apache spark installation:-

1. Download an Apache Spark pre-built version, or execute the following command to download the same.
zytham@ubuntu:~/Downloads$ wget http://www.apache.org/dyn/closer.lua/spark/spark-1.5.2/spark-1.5.2.tgz
2. Extract the tar ball and place it at some location, e.g. /opt/spark-1.5.2.
zytham@ubuntu:~$ tar -xvf spark-1.5.2.tgz
zytham@ubuntu:~$ sudo mv spark-1.5.2 /opt/spark-1.5.2/
3. Update the PATH variable so that the Spark shell and other executables can be used from any directory.
zytham@ubuntu:~$ gedit ~/.bashrc
Add the lines below to the .bashrc file
#for SPARK
export SPARK_HOME=/opt/spark-1.5.2/
export PATH=$SPARK_HOME/bin:$PATH
4. Test the Spark installation. Execute the following command; we should get a Scala prompt with the Spark context and SQL context available.
zytham@ubuntu:~$ source ~/.bashrc
zytham@ubuntu:~$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6be50b35

scala> 
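
As an optional quick smoke test inside spark-shell (a minimal sketch; any small local collection works), run a trivial job and check that it returns a result:

scala> sc.parallelize(1 to 100).sum()   // sums the numbers 1..100; should evaluate to 5050.0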

Jan 1, 2016


Setup Apache Spark in eclipse(Scala IDE) : Word count example using Apache spark in Scala IDE

Apache Spark is a well-known in-memory computing engine for processing big data workloads. Scala IDE (an Eclipse-based project) can be used to develop Spark applications. The main agenda of this post is to set up a development environment for Spark applications in Scala IDE and run a word count example.

Download Scala IDE:- 
Scala IDE is an Eclipse-based IDE which provides a very intuitive development environment for Scala and Spark applications. Download Scala IDE and install it.

Create a Maven project:-
Maven is a popular package management tool for Java-based languages that lets us pull in libraries from public repositories. We can use Maven itself to build our project, or use other tools like Scala's sbt or Gradle.
1. Go to File -> New -> Project -> Maven Project and create a Maven project. Fill in the Group Id and Artifact Id and click Finish.
Group Id = com.devinline.spark and Artifact Id = SparkSample

2. Update pom.xml:- Download the pom.xml sample and use it to update the pom.xml of the Maven project above. It contains the Spark dependency entry, which will be downloaded during the build.

3. Add Scala Nature to this project :- 
Right click on the project -> Configure -> Add Scala Nature.

4. Update Scala compiler version for Spark:- 
Scala IDE by default uses the latest version (2.11) of the Scala compiler; however, Spark 1.5.2 is built against version 2.10, so we need to select the appropriate version in the IDE.
Right click on the project -> Properties -> Scala Compiler -> update the Scala installation version to 2.10.5
  
5. Remove Scala Library Container from build path :- (Optional)
The required Scala jars are already pulled in via spark-core (through pom.xml), so having the Scala library on the build path twice is not required.
Right click on the project -> Build Path -> Configure Build Path and remove the Scala Library Container.

6. Rename the source folder src/main/java to src/main/scala (Right click -> Refactor -> Rename). Now create a package under it named com.devinline.spark.

7. Create a Scala object named WordCount under the package created above.
Right click on the package -> New -> Scala Object and enter WordCount at the end of the Name field.

8. Update WordCount.scala with the following code lines.
package com.devinline.spark
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD.rddToPairRDDFunctions
object WordCount {
  def main(args: Array[String]) = {

    //Start the Spark context
    val conf = new SparkConf()
      .setAppName("WordCount")
      .setMaster("local")
    val sc = new SparkContext(conf)

    //Read some example file to a test RDD
    val test = sc.textFile("input.txt")

    test.flatMap { line => //for each line
      line.split(" ") //split the line in word by word.
    }
      .map { word => //for each word
        (word, 1) //Return a key/value tuple, with the word as key and 1 as value
      }
      .reduceByKey(_ + _) //Sum all of the value with same key
      .saveAsTextFile("output.txt") //Save to a text file

    //Stop the Spark context
    sc.stop
  }
}
Explanation:- On applying the flatMap function on the RDD test, each line is split on spaces and an array of strings is obtained. The map step then converts each word into a (word, 1) key/value tuple (a collection of tuples is produced). Finally, reduceByKey aggregates the tuples by key, and the output (each unique word and its count) is written to a file. Let's take an example and understand the flow of the methods used in the program above. Suppose input.txt has two lines:
 This is spark time
 Learn spark
Flow of methods used in word count example

9. Download the sample input file and place it at some location of your convenience. Modify the location of input.txt in the sample code above accordingly (sc.textFile("<Your_input.txt_Location>")).

10. Execute the word count program:- Right click on WordCount.scala -> Run As -> Scala Application. It should create an output directory output.txt containing two files: part-00000 and _SUCCESS.
Sample output in part-00000 is :-
(spark,2)
(is,1)
(Learn,1)
(This,1)
(time,1)