Oct 19, 2015

Mapreduce program in eclipse - Generate hadoop2x-eclipse-plugin and configure with eclipse

In order to write a map-reduce program in Eclipse, we need to place the hadoop2x-eclipse-plugin jar inside the plugins directory of the Eclipse installation. The main agenda of this post is to generate the hadoop2x-eclipse-plugin jar and run a sample Hadoop program in Eclipse. The post is divided into three parts: install Eclipse on Ubuntu 13.04, generate the hadoop2x-eclipse-plugin jar, and finally run a sample map-reduce program in Eclipse. Part 2 may be skipped, as I have already generated the hadoop2x-eclipse-plugin jar: Download hadoop2x-eclipse-plugin jar.

Install eclipse in Ubuntu 13.04

1. First check whether you need the 64-bit or 32-bit distribution by running one of the following commands (uname -m or getconf LONG_BIT), then download the Eclipse distribution accordingly and place it wherever convenient:
zytham@ubuntu:~$ uname -m
x86_64
zytham@ubuntu:~$ getconf LONG_BIT
64
2. Now extract the downloaded distribution (eclipse-jee-juno-SR2-linux-gtk-x86_64.tar.gz) using the following command. It creates a directory eclipse in the current directory.
zytham@ubuntu:~$ tar -zxvf eclipse-jee-juno-SR2-linux-gtk-x86_64.tar.gz
3. Move the extracted folder "eclipse" to /opt using the following command. It creates a new directory /opt/eclipse.
zytham@ubuntu:~$ sudo mv eclipse /opt/
4. Eclipse is now set up on our machine and can be launched from the shell using the following command:
zytham@ubuntu:/opt$ /opt/eclipse/eclipse -clean &

5. To add Eclipse to the Unity launcher, refer this.

Generate hadoop2x-eclipse-plugin jar

1. Download the hadoop2x-eclipse-plugin project and extract it at some convenient location; say hadoop2x-eclipse-plugin-master is the extracted directory name.
zytham@ubuntu:~/Downloads$ tar -zxvf hadoop2x-eclipse-plugin-master.tar
2. Now, using the "ant" build tool, we build the downloaded project and generate the Hadoop plugin jar for Eclipse.
zytham@ubuntu:~/Downloads$ cd hadoop2x-eclipse-plugin-master/
zytham@ubuntu:~/Downloads/hadoop2x-eclipse-plugin-master$ cd src/contrib/eclipse-plugin
zytham@ubuntu:~/Downloads/hadoop2x-eclipse-plugin-master/src/contrib/eclipse-plugin$ ant jar -Dversion=2.6.1 -Dhadoop.version=2.6.1 -Declipse.home=/opt/eclipse -Dhadoop.home=/usr/local/hadoop2.6.1
It will take some time; once the build succeeds, the final jar is generated at the following location: hadoop2x-eclipse-plugin-master/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.1.jar
Note:- If you do not want to build the Eclipse plugin jar, or your build failed, Download hadoop-eclipse-plugin-2.6.1.jar

Run sample map reduce program in eclipse

  1. Add hadoop-eclipse-plugin-2.6.1.jar to the plugins directory of the Eclipse installation (/opt/eclipse/plugins), then start Eclipse using this command:   /opt/eclipse/eclipse -clean &
  2. If you have added hadoop-eclipse-plugin correctly, then right after opening Eclipse you should see a "DFS Locations" node in the Project Explorer section (shown in the following diagram).
  3. Create a Map/Reduce project in Eclipse. Go to File -> New -> Project and select the Map/Reduce project type from the wizard, as shown in the above diagram (right side).
    Give a valid project name and configure the Hadoop installation directory. Click Next and, on the Java settings page, check "Allow output folders in source folders" (highlighted in the following diagram). Click Finish and the map-reduce project appears in the Project Explorer.
  4. Here we are going to run the word count example. Create a class (say WordCountSampleExample.java) in the given project and copy the following word count example.

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountSampleExample {
 /*Map class which the job uses to execute its map method*/
 public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
                   throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
 }

 /*Reduce class which the job uses to execute its reduce method*/
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
 }

 public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

  /*Create a job with the name wordCountExample*/
    Job job = new Job(conf, "wordCountExample");

  /*Handle string and int the Hadoop way: for string Hadoop uses
    the Text class and for int it uses IntWritable*/
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    /*Configure the mapper and reducer classes, whose map and reduce methods the job executes*/
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);

  /*Input and output formats set as text*/
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

  /*addInputPath - passes the input file path to the job - here passed as a program argument*/
    FileInputFormat.addInputPath(job, new Path(args[0]));
  /*setOutputPath - passes the output path to the job - here passed as a program argument*/
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

  /*Submit the job to the cluster and wait for it to finish.*/
    job.waitForCompletion(true);
 }

}

The input to this map-reduce program is input.txt (download from here and place it in the project home directory), and the output is stored in the output directory configured next.

Passing input and output as program arguments:-
 Right-click on the project and go to Run as -> Run configurations. Click the Arguments tab and add input.txt output (separated by a space), as shown in the following diagram.
Read in detail how to pass program arguments and VM arguments in Eclipse.
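Note that main reads args[0] and args[1] directly, so launching the job without both arguments fails with an ArrayIndexOutOfBoundsException. A small defensive sketch (the ArgsGuard class and its usage message are my own additions, not part of the original program):

```java
// Hypothetical guard, not part of the original program: validate the two
// program arguments before handing them to the job.
public class ArgsGuard {

    // Returns null when exactly two arguments are supplied, else a usage message.
    public static String validate(String[] args) {
        if (args == null || args.length != 2) {
            return "Usage: WordCountSampleExample <input path> <output path>";
        }
        return null;
    }

    public static void main(String[] args) {
        String error = validate(args);
        if (error != null) {
            System.err.println(error);
            System.exit(2);
        }
        System.out.println("input=" + args[0] + ", output=" + args[1]);
    }
}
```

The same check could be inlined at the top of WordCountSampleExample's main before the Configuration is created.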

Run map-reduce program :-  Right click on the class and Run as -> Run on hadoop.
After successful execution, an output directory is created and the word counts are stored in the file part-r-00000. Below are the input and output file contents: "Key" occurs 3 times in the input, so the output shows Key with count 3; similarly, "=" occurs 6 times in the input files, which the output reflects.
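To see why the counts come out this way, the mapper-plus-reducer logic can be simulated locally with plain JDK classes: tokenize each line, emit (word, 1), then sum the ones per word. This is only a sketch of the same computation; the sample lines are hypothetical stand-ins for input.txt.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Local simulation of what the Map and Reduce classes above compute together.
public class WordCountLocal {
    public static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Same tokenization as the mapper: whitespace-delimited tokens.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // merge() does what the reducer does: sum the 1s per word.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = { "hello world", "hello hadoop" };
        System.out.println(count(sample)); // prints {hello=2, world=1, hadoop=1}
    }
}
```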

HDFS location access via eclipse plugin:- 

1. Open 'Map/Reduce' perspective.
    Go to Window --> Open Perspective --> Other and select the 'Map/Reduce' perspective.
2. Right click on Map/Reduce Locations tab and create New Hadoop location.
3. Configure DFS location in following window as follows:-
  • Location name - Give any valid name.
  • Map/Reduce(V2) master : Address of the Map/Reduce master node (where the Job Tracker is running).
    Host name - Find the IP address of the node (machine) where the Hadoop service is running, using ifconfig.
    hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ifconfig
    Or, if Hadoop is installed locally, use localhost for the host.
    Port:- To find the port associated with the Job Tracker, open http://192.168.213.133:8088/conf or http://localhost:8088/conf in a browser and search for the property name "mapreduce.jobtracker.http.address"; the value associated with it gives the port. For me it looks like this; the port number is 50030.
    <property>
     <name>mapreduce.jobtracker.http.address</name>
     <value>0.0.0.0:50030</value>
     <source>mapred-default.xml</source>
    </property>
    
  • DFS master:- Address of the Distributed FileSystem master node (where the Name Node is running).
    Host name:- By default it takes the same address as the Map/Reduce(V2) master host name; change it accordingly if the file system is running on a different node.
    Port :- To find the port number, search for the property name "fs.defaultFS" in http://192.168.213.133:8088/conf or http://localhost:8088/conf; the value associated with it gives the DFS master port. For me it appears like this; the port is 54310.
    <property>
     <name>fs.defaultFS</name>
     <value>hdfs://hostname:54310</value>
     <source>core-site.xml</source>
    </property>
    
Refer to the following diagram and configure accordingly. Once we have configured the location, we are connected to the DFS and can view the tree structure of stored files.
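As a sanity check on the two ports pulled from the /conf page above, the property values can also be parsed programmatically. A minimal sketch using only the JDK; the XML and URI literals simply mirror the snippets shown above, and the PortCheck class name is my own:

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PortCheck {
    // Pulls the text of the <value> element out of a <property> block.
    public static String valueOf(String propertyXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(propertyXml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("value").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String jobTracker = "<property>"
                + "<name>mapreduce.jobtracker.http.address</name>"
                + "<value>0.0.0.0:50030</value>"
                + "<source>mapred-default.xml</source>"
                + "</property>";
        String value = valueOf(jobTracker);
        // The port is everything after the last ':' in the value.
        System.out.println(value.substring(value.lastIndexOf(':') + 1)); // prints 50030

        // fs.defaultFS is a URI, so java.net.URI can split host and port directly.
        URI fs = URI.create("hdfs://hostname:54310");
        System.out.println(fs.getHost() + ":" + fs.getPort()); // prints hostname:54310
    }
}
```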
