To write MapReduce programs in Eclipse, we need to place the hadoop2x-eclipse-plugin jar inside the plugins directory of the Eclipse installation. The main agenda of this post is to generate the hadoop2x-eclipse-plugin jar and run a sample Hadoop program in Eclipse. The post is divided into three parts: install Eclipse on Ubuntu 13.04, generate the hadoop2x-eclipse-plugin jar, and finally run a sample MapReduce program in Eclipse. Part 2 may be skipped, as I have already generated the jar: Download hadoop2x-eclipse-plugin jar.
Install Eclipse in Ubuntu 13.04
1. First check whether you need the 64-bit or the 32-bit distribution, then download the matching Eclipse distribution and place it wherever is convenient. To check whether the machine is 32-bit or 64-bit, run either of the following commands (uname -m or getconf LONG_BIT):
zytham@ubuntu:~$ uname -m
x86_64
zytham@ubuntu:~$ getconf LONG_BIT
64
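2. Extract the downloaded Eclipse archive:
zytham@ubuntu:~$ tar -zxvf eclipse-jee-juno-SR2-linux-gtk-x86_64.tar.gz
3. Move the extracted eclipse directory to /opt:
zytham@ubuntu:~$ sudo mv eclipse /opt/
4. Start Eclipse:
zytham@ubuntu:/opt$ /opt/eclipse/eclipse -clean &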
Generate hadoop2x-eclipse-plugin jar
1. Download the hadoop2x-eclipse-plugin project and extract it at some convenient location. Say hadoop2x-eclipse-plugin-master is the name of your extracted directory.
zytham@ubuntu:~/Downloads$ tar -zxvf hadoop2x-eclipse-plugin-master.tar
2. Now, using the "ant" build tool, we build the downloaded project and generate the Hadoop jar for Eclipse.
zytham@ubuntu:~/Downloads$ cd hadoop2x-eclipse-plugin-master/
zytham@ubuntu:~/Downloads/hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin
zytham@ubuntu:~/Downloads/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin$ ant jar -Dversion=2.6.1 -Dhadoop.version=2.6.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/local/hadoop2.6.1
Note: If you do not want to build the Eclipse plugin jar, or your build failed, Download hadoop-eclipse-plugin-2.6.1.jar
Run a sample MapReduce program in Eclipse
- Add hadoop-eclipse-plugin-2.6.1.jar to the plugins directory of the Eclipse installation (/opt/eclipse/plugins). Now start Eclipse using this command: /opt/eclipse/eclipse -clean &
- If you have added the hadoop-eclipse-plugin jar correctly, right after opening Eclipse you should see a "DFS Locations" node in the Project Explorer section (shown in the following diagram).
- Create a MapReduce project in Eclipse. Go to File -> New -> Project and select the Map/Reduce Project type from the wizard, as shown in the above diagram (right side).
Give a valid project name and configure the Hadoop installation directory. Click Next, and on the Java Settings page mark the check box "Allow output folders in source folders" (as highlighted in the following diagram). Click Finish, and we will have a MapReduce project in the Project Explorer.
- Here we are going to run the word count example. Create a class (say WordCountSampleExample.java) in the given project and copy the following word count example into it.
The input to this MapReduce program is input.txt (download from here and place it in the project home directory), and the output is stored in the output directory configured next.
Passing input and output as program arguments: Right-click on the project and go to Run As -> Run Configurations. Click on the Arguments tab and add input.txt output (separated by a space) in it (as shown in the following diagram).
Read in detail how to pass program arguments and VM arguments in Eclipse.
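As a rough illustration of how the two program arguments configured above reach the program, the sketch below reads them from the args array of main. The class name ProgramArgsSketch and its usage message are hypothetical and only for illustration; the word count class used in this post may handle its arguments differently.

```java
// Hypothetical helper, not part of the word count example from this post.
final class ProgramArgsSketch {

    // Returns {inputPath, outputDir}, or throws if the run configuration
    // did not supply exactly two space-separated program arguments.
    static String[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: <input path> <output directory>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        try {
            String[] paths = parse(args);
            System.out.println("input  = " + paths[0]);
            System.out.println("output = " + paths[1]);
        } catch (IllegalArgumentException e) {
            // Missing or extra arguments: report the expected form and stop.
            System.err.println(e.getMessage());
        }
    }
}
```

With "input.txt output" entered in the Arguments tab, args[0] would be input.txt and args[1] would be output.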
Run the MapReduce program: Right-click on the class and choose Run As -> Run on Hadoop.
After successful execution, an output directory is created and the word counts are stored in the file part-r-00000. In the sample input file, Key occurs 3 times, so the output shows Key 3; similarly, = occurs 6 times in the input, and the output reports a count of 6 for it.
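To make the counting concrete, here is a minimal plain-Java sketch of what word count does in its map and reduce steps. It deliberately does not use the Hadoop API and is not the WordCountSampleExample class from this post. The class name WordCountSketch and the three sample lines are my own assumptions, chosen so that Key appears 3 times and = appears 6 times.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java illustration of the word count idea; no Hadoop classes are used.
final class WordCountSketch {

    // "Map" step: emit one (token, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // "Reduce" step: sum the values emitted for each distinct key.
    // A TreeMap keeps the keys in sorted order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) {
            all.addAll(map(line));
        }
        return reduce(all);
    }

    public static void main(String[] args) {
        // Assumed sample lines, not the real contents of input.txt.
        List<String> lines = Arrays.asList("Key = value", "Key = = other", "Key = = =");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

For these assumed lines the sketch reports a count of 3 for Key and 6 for =, which is the same kind of word-to-count pairing that the job writes to its output file.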
HDFS location access via the Eclipse plugin
1. Open the 'Map/Reduce' perspective: go to Window --> Open Perspective --> Other and select the 'Map/Reduce' perspective.
2. Right-click on the Map/Reduce Locations tab and create a new Hadoop location.
3. Configure the DFS location in the following window as follows:
- Location name: Give any valid name.
- Map/Reduce(V2) master: Address of the Map/Reduce master node (where the Job Tracker is running).
Host name: Find the IP address of the node (machine) where the Hadoop service is running, using ifconfig.
If Hadoop is installed locally, use localhost as the host.
Port: To find the port associated with the Job Tracker, open http://192.168.213.133:8088/conf or http://localhost:8088/conf in a browser and search for the property named "mapreduce.jobtracker.http.address"; the value associated with it gives the port. For me, the port number is 50030.
- DFS master: Address of the Distributed FileSystem master node (where the Name Node is running).
Host name: By default, it takes the same address as the Map/Reduce(V2) master host name; change it accordingly if the file system is running on a different node.
Port: To find the port number, search for the property named "fs.defaultFS" in http://192.168.213.133:8088/conf or http://localhost:8088/conf; the value associated with it gives the DFS master port. For me, the port number is 54310.
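If it helps, the host and port fields can be read off the two property values with a few lines of Java. This is only a sketch under assumptions: the class name HadoopEndpointSketch is hypothetical, and the sample values hdfs://localhost:54310 and 0.0.0.0:50030 are assumed shapes of the property values that match the port numbers mentioned above, not output copied from a real /conf page.

```java
import java.net.URI;

// Hypothetical helper: splits Hadoop address values into the host and port
// fields that the "New Hadoop location" dialog asks for.
final class HadoopEndpointSketch {

    // fs.defaultFS is a URI, for example hdfs://localhost:54310 (assumed value).
    static String hostOf(String defaultFs) {
        return URI.create(defaultFs).getHost();
    }

    // Returns -1 if the URI carries no explicit port.
    static int portOf(String defaultFs) {
        return URI.create(defaultFs).getPort();
    }

    // Address properties such as mapreduce.jobtracker.http.address are
    // assumed here to have the form host:port.
    static int portOfHostPort(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected host:port but got " + hostPort);
        }
        return Integer.parseInt(hostPort.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://localhost:54310"; // assumed sample value
        String trackerHttp = "0.0.0.0:50030";        // assumed sample value

        System.out.println("DFS master host: " + hostOf(defaultFs));
        System.out.println("DFS master port: " + portOf(defaultFs));
        System.out.println("Map/Reduce master port: " + portOfHostPort(trackerHttp));
    }
}
```

Replace the two sample strings with the values found on your own /conf page before copying the results into the dialog.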