Jan 2, 2016

Install Scala and Apache Spark on Linux (Ubuntu)

Scala is a prerequisite for Apache Spark installation. Let's install Scala first, followed by Apache Spark.

Scala installation:-  

We can set up Scala either by downloading the .deb package and installing it, or by downloading the Scala tarball and extracting it. Follow either one of the approaches below.
Approach-1 (Downloading the tarball):-
1. Download the Scala distribution scala-2.10.6.tgz, or execute the following command to download the tarball.
zytham@ubuntu:~/Downloads$ wget http://www.scala-lang.org/files/archive/scala-2.10.6.tgz
2. Extract the downloaded tarball and move it to, say, /opt/scala2.10.
zytham@ubuntu:~$ tar -xvf scala-2.10.6.tgz
zytham@ubuntu:~$ sudo mv scala-2.10.6 /opt/scala2.10
3. Add Scala's bin directory to the PATH environment variable. By doing so, the Scala shell and other executables can be used from any directory. Open the .bashrc file and add the following lines at the bottom of the file.
zytham@ubuntu:~$ gedit ~/.bashrc
Add the lines below to the .bashrc file
# for scala
export SCALA_HOME=/opt/scala2.10/
export PATH=$PATH:/opt/scala2.10/bin
4. Test the installation using the command "scala -version".
zytham@ubuntu:/opt$ source ~/.bashrc
zytham@ubuntu:/opt$ scala -version
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
Note:- The source ~/.bashrc command updates the PATH environment variable immediately for the current session.
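If these steps are re-run (or scripted), the export lines can pile up in .bashrc. A small sketch of an idempotent append: the grep -qxF guard only adds a line when it is not already present verbatim. It is shown here against a stand-in file so it can be tried safely; point BASHRC at ~/.bashrc on a real machine.

```shell
# append each export line only if the file does not already contain it verbatim
# BASHRC is a stand-in path for this demo; use ~/.bashrc on a real machine
BASHRC=/tmp/bashrc-demo
touch "$BASHRC"
for line in 'export SCALA_HOME=/opt/scala2.10/' \
            'export PATH=$PATH:/opt/scala2.10/bin'; do
  grep -qxF "$line" "$BASHRC" || echo "$line" >> "$BASHRC"
done
grep -c 'scala2.10' "$BASHRC"   # 2, even if the loop is run again
```

Running the loop a second time leaves the file unchanged, so the snippet is safe to keep in a setup script.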
Approach-2 (Downloading the .deb version):-
1. Remove the existing Scala installation (if any).
sudo apt-get remove scala-library scala
2. Get the .deb file for Scala.
wget http://www.scala-lang.org/files/archive/scala-2.11.7.deb
3. Install the .deb package.
sudo dpkg -i scala-2.11.7.deb
4. If needed, add Scala's bin directory to the PATH variable in the .bashrc file in your home directory. (The .deb package normally installs scala on the PATH already, so this step can usually be skipped.)
5. Test the installation. Execute the following command; it should display the version details.
scala -version
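Whichever approach was used, the version check can be scripted, for example to confirm a 2.10.x Scala, the line the Spark 1.5.x pre-built binaries are compiled against. A sketch that pulls the version number out of the `scala -version` output; the sample line below is hard-coded so the parsing can be tried anywhere, and on a real machine you would pipe `scala -version 2>&1` in instead:

```shell
# extract the version number from `scala -version` output
# sample_line is hard-coded for illustration; replace with: scala -version 2>&1
sample_line='Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL'
version=$(printf '%s\n' "$sample_line" | awk '{print $5}')
echo "$version"              # 2.10.6
case "$version" in
  2.10.*) echo "OK: Scala 2.10 line" ;;
  *)      echo "Warning: unexpected Scala version" ;;
esac
```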

Apache Spark installation:-

1. Download the Apache Spark 1.5.2 release, or execute the following command to download it.
zytham@ubuntu:~/Downloads$ wget https://archive.apache.org/dist/spark/spark-1.5.2/spark-1.5.2.tgz
2. Extract the tarball and place it at some location, e.g. /opt/spark-1.5.2.
zytham@ubuntu:~$ tar -xvf spark-1.5.2.tgz
zytham@ubuntu:~$ sudo mv spark-1.5.2 /opt/spark-1.5.2/
3. Update the PATH variable so that the Spark shell and other executables can be used from any directory.
zytham@ubuntu:~$ gedit ~/.bashrc
Add the lines below to the .bashrc file
#for SPARK
export SPARK_HOME=/opt/spark-1.5.2/
export PATH=$PATH:$SPARK_HOME/bin
4. Test the Spark installation. Execute the following command; we should get a Scala prompt with the Spark context and SQL context available.
zytham@ubuntu:~$ source ~/.bashrc
zytham@ubuntu:~$ spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6be50b35
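If spark-shell fails to start, a quick sanity check is to confirm that the directories the .bashrc variables point to actually exist. A minimal sketch; the /opt paths are the ones assumed throughout this post and are used as defaults when the variables are unset:

```shell
# report whether each install directory from this post exists;
# defaults match the paths used above and are overridden by the environment
: "${SCALA_HOME:=/opt/scala2.10/}"
: "${SPARK_HOME:=/opt/spark-1.5.2/}"
for dir in "$SCALA_HOME" "$SPARK_HOME"; do
  if [ -d "$dir" ]; then
    echo "found:   $dir"
  else
    echo "missing: $dir"
  fi
done
```

A "missing" line usually means the tarball was extracted somewhere else or the mv step was skipped.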


Location: Hyderabad, Telangana, India