Download Apache Spark:-
Download a pre-built version of Apache Spark and unzip it into a directory of your choice. I have placed it at the following location: E:\spark-1.5.2-bin-hadoop2.6.
Note:- It is also possible to download the source code and build it using Maven or SBT. Refer to the Spark download page for other download options.
Download and install Scala:-
Download the Scala executables and install them. Scala is a prerequisite for working with Apache Spark, because Spark itself is written in Scala. Here, Scala is installed at "C:\Program Files (x86)\scala".
Set up SCALA_HOME and HADOOP_HOME:-
Once Spark and Scala are installed, configure the environment variables SCALA_HOME and HADOOP_HOME.
SCALA_HOME = C:\Program Files (x86)\scala
Download winutils.exe and unzip it. Set HADOOP_HOME to the directory that contains the bin directory (the parent of bin, not bin itself).
HADOOP_HOME = E:\dev\hadoop\hadoop-common-2.2.0-bin-master
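Before moving on, it can help to confirm that both variables are visible to new processes. The following is a minimal sketch, not part of Spark: the helper name check_spark_env is hypothetical, and it assumes winutils.exe sits in the bin directory directly under HADOOP_HOME, as described above. Open a new cmd window after setting the variables, otherwise the old values are still in effect.

```python
import os


def check_spark_env(env):
    """Return a list of problems found in the given environment mapping.

    Hypothetical helper for this tutorial; it only inspects the two
    variables configured above and the expected winutils.exe location.
    """
    problems = []
    for name in ("SCALA_HOME", "HADOOP_HOME"):
        value = env.get(name)
        if not value:
            problems.append(name + " is not set")
        elif not os.path.isdir(value):
            problems.append(name + " points to a missing directory: " + value)

    # Assumption: winutils.exe lives in %HADOOP_HOME%\bin
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home and os.path.isdir(hadoop_home):
        winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
        if not os.path.isfile(winutils):
            problems.append("winutils.exe not found at " + winutils)
    return problems


if __name__ == "__main__":
    issues = check_spark_env(os.environ)
    for issue in issues:
        print(issue)
    if not issues:
        print("SCALA_HOME and HADOOP_HOME look fine")
```

An empty result means both variables point to existing directories and winutils.exe was found where expected.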
Update the PATH environment variable:-
Add the Spark bin directory to the PATH environment variable so that the Scala or Python shell can be started from any location, without changing to the bin directory every time.
Start Spark's shells (Scala or Python version):-
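To confirm that the PATH change took effect, one option is a small check like the sketch below. The function name dir_on_path is hypothetical, and the Spark location is the one used in this post; adjust it to your own install directory. The comparison uses Windows path rules (case-insensitive, trailing backslash ignored).

```python
import ntpath
import os


def dir_on_path(directory, path_value):
    """Return True if directory is one of the ';'-separated PATH entries.

    Hypothetical helper; entries are compared using Windows path rules.
    """
    def norm(p):
        # Drop surrounding quotes/whitespace, then normalise case and slashes
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    entries = [e for e in path_value.split(";") if e.strip()]
    return any(norm(e) == target for e in entries)


if __name__ == "__main__":
    # Assumption: Spark was unzipped to the location used in this post
    spark_bin = r"E:\spark-1.5.2-bin-hadoop2.6\bin"
    print(dir_on_path(spark_bin, os.environ.get("PATH", "")))
```

If this prints False in a freshly opened cmd window, the PATH entry is missing or points to the Spark root instead of its bin directory.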
Python version : pyspark
Scala version : spark-shell
Start cmd, type pyspark and press Enter. If the steps above were followed correctly, the Python version of the Spark shell should open.
Similarly, we can start the Scala version of the Spark shell by typing spark-shell and pressing Enter in cmd.
Note:- At this point the console may show an error about write permission on the Hive directory. It can be ignored for now; we can still execute Spark APIs and learn Apache Spark.
Sample API execution in the Python or Scala shell:-
Create an RDD, display the total number of lines in the file, and then display the first line of that file.
In the Python version of the Spark shell:
>>> lines = sc.textFile("README.md") # Create an RDD called lines
>>> lines.count() # Count the number of items in this RDD
>>> lines.first() # First item in this RDD, i.e. first line of README.md
u'# Apache Spark'
Note:- If you execute the same set of commands, the console will be flooded with log lines. I have suppressed them by changing the log level to warning.
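One way to change the log level is to copy conf\log4j.properties.template to conf\log4j.properties in the Spark directory and change the log4j.rootCategory line from INFO to WARN. The sketch below automates that edit; it assumes the template shipped with the distribution contains a log4j.rootCategory line, and both the function name set_root_log_level and the install path are illustrative.

```python
import os


def set_root_log_level(template_text, level="WARN"):
    """Return the properties text with the root logger level replaced.

    Hypothetical helper: only the log4j.rootCategory line is changed,
    and the appenders listed after the level are kept as they are.
    """
    out = []
    for line in template_text.splitlines():
        if line.strip().startswith("log4j.rootCategory="):
            _, _, rest = line.partition("=")
            appenders = [a.strip() for a in rest.split(",")[1:]]
            line = "log4j.rootCategory=" + ", ".join([level] + appenders)
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Assumption: Spark was unzipped to the location used in this post
    conf_dir = r"E:\spark-1.5.2-bin-hadoop2.6\conf"
    template = os.path.join(conf_dir, "log4j.properties.template")
    if os.path.isfile(template):
        with open(template) as src:
            text = set_root_log_level(src.read())
        with open(os.path.join(conf_dir, "log4j.properties"), "w") as dst:
            dst.write(text)
```

Restart the shell after writing the file so that the new log level is picked up.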