Apache Spark is a general-purpose cluster computing system for processing big data workloads. It can work with Hadoop HDFS, Amazon EC2, and other persistence/storage systems, including the local file system. For learning Apache Spark, it is easiest to set it up in standalone mode and start executing Spark APIs in the Scala, Python, or R shell. In this post we will set up Spark on Windows and execute some Spark APIs.
Download Apache Spark:-
Download a pre-built version of Apache Spark and unzip it to some directory. I have placed it at the following location: E:\spark-1.5.2-bin-hadoop2.6.
Note:- It is also possible to download the source code and build it with Maven or SBT. Refer to this for other download options.
Download and install Scala:-
Download the Scala executables and install them. Scala is a prerequisite for working with Apache Spark, as Spark itself is written in Scala. In my case Scala is installed at "C:\Program Files (x86)\scala".
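To confirm the installation, you can check the version from cmd (this assumes the installer added Scala's bin directory to PATH, which it does by default):
scala -version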
Set up SCALA_HOME and HADOOP_HOME:-
Once we are done with the installation of Spark and Scala, configure environment variables for SCALA_HOME and HADOOP_HOME.
SCALA_HOME = C:\Program Files (x86)\scala
For now we do not want to set up a full Hadoop installation; we just want to learn Apache Spark. So we download winutils.exe, unzip it, and set HADOOP_HOME to the path up to (but not including) the bin directory.
HADOOP_HOME = E:\dev\hadoop\hadoop-common-2.2.0-bin-master
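A quick way to verify that both variables are set is to echo them from a fresh cmd window:
echo %SCALA_HOME%
echo %HADOOP_HOME%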
Update the PATH environment variable:-
Add Spark's bin directory to the PATH environment variable so that the Scala or Python shell can be started without changing into the bin directory every time.
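For example, with Spark unzipped at the location used above, the entry to append to PATH would be:
PATH = %PATH%;E:\spark-1.5.2-bin-hadoop2.6\bin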
Start Spark's shells (Scala or Python version):-
Python version : pyspark
Scala version : spark-shell
Start cmd, type pyspark, and press Enter. If we have followed the steps properly, it should open the Python version of the Spark shell, as shown below.
Similarly, we can start the Scala version of the Spark shell by typing spark-shell and pressing Enter in cmd.
Note:- Here we may get an error on the console regarding write permission on the hive scratch directory. We can ignore it and start executing Spark APIs to learn Apache Spark.
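If you would rather silence that error than ignore it, a commonly used workaround (a sketch, assuming the default scratch directory C:\tmp\hive and winutils.exe under %HADOOP_HOME%\bin) is to grant the directory write permission from cmd:
%HADOOP_HOME%\bin\winutils.exe chmod 777 C:\tmp\hive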
Sample API execution in the Python or Scala shell:-
Create an RDD, display the total number of lines in a file, followed by the first line of that file.
In the Python version of the Spark shell:
>>> lines = sc.textFile("README.md") # Create an RDD called lines
>>> lines.count() # Count the number of items in this RDD
98
>>> lines.first() # First item in this RDD, i.e. first line of README.md
u'# Apache Spark'
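We can also try a simple transformation on the same RDD. The snippet below is a minimal sketch that keeps only the lines mentioning Spark and counts them (the resulting number depends on the contents of README.md):
>>> sparkLines = lines.filter(lambda line: "Spark" in line)  # Transformation: new RDD of matching lines
>>> sparkLines.count()  # Action: triggers the actual computation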
Note:- If you execute the same set of commands, the console will be flooded with log lines. I have suppressed them by changing the log level to warning.
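One way to do this from inside the shell (setLogLevel is available on the SparkContext from Spark 1.4 onwards) is:
>>> sc.setLogLevel("WARN")  # From now on, only WARN and above are printed
Alternatively, copy conf/log4j.properties.template to conf/log4j.properties inside the Spark directory and change log4j.rootCategory from INFO to WARN.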