The input data is a tab-separated text file. In the schema, sex is at the 4th position and salary at the 9th position. Download the sample input file.
100 Steven King M SKING 515.123.4567 17-JUN-03 AD_PRES 25798.9 90
Expected output:-
F Total: 291800.0 :: Average: 7117.073
M Total: 424363.34 :: Average: 6333.7812
We can think of this problem in terms of the database SQL query "SELECT SEX, SUM(SALARY), AVG(SALARY) FROM EMPLOYEES1 GROUP BY SEX", and the same can be solved with HQL in Hive. In the context of map/reduce, we have to write a mapper class (map method) and a reducer class (reduce method).
In the map method, the input file is processed line by line: each line is split on tabs, and sex and salary are extracted. The extracted sex and salary are written to the context object. The output of the mapper has the sex (M or F) as key and the salary as value; the framework then groups these by key, so the reducer receives, for each sex, a list of salaries:
<M, sal1, sal2, sal3, ...>
<F, sal1, sal2, sal3, ...>
In the reduce method, the salary list is iterated over, and the total and average are computed. The total and average salary are written to the context as Text against the sex key (M or F), as implemented in the sample code below.
Note:- Between the map and reduce tasks, the Hadoop framework performs a shuffle and sort based on the key. This can be verified from the output of this map/reduce program: in the output file, the record for F is followed by the record for M (F comes first in lexicographical order).
Sample Code:-
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * @author http://www.devinline.com
 */
public class AverageAndTotalSalaryCompute {
    /*
     * Data schema (tab separated):
     * 100 Steven King M SKING 515.123.4567 17-JUN-03 AD_PRES 25798.9 90
     * Sex is at the 4th position and salary at the 9th position.
     */
    public static class MapperClass extends
            Mapper<LongWritable, Text, Text, FloatWritable> {
        public void map(LongWritable key, Text empRecord, Context con)
                throws IOException, InterruptedException {
            // Split the record on tabs and pick out sex (index 3) and salary (index 8)
            String[] word = empRecord.toString().split("\\t");
            String sex = word[3];
            try {
                float salary = Float.parseFloat(word[8]);
                // Emit (sex, salary) for every employee record
                con.write(new Text(sex), new FloatWritable(salary));
            } catch (Exception e) {
                // Skip malformed records
                e.printStackTrace();
            }
        }
    }

    public static class ReducerClass extends
            Reducer<Text, FloatWritable, Text, Text> {
        public void reduce(Text key, Iterable<FloatWritable> valueList,
                Context con) throws IOException, InterruptedException {
            try {
                float total = 0;
                int count = 0;
                // Sum all salaries for this sex and count the records
                for (FloatWritable var : valueList) {
                    total += var.get();
                    count++;
                }
                float avg = total / count;
                String out = "Total: " + total + " :: " + "Average: " + avg;
                con.write(key, new Text(out));
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        try {
            Job job = Job.getInstance(conf, "FindAverageAndTotalSalary");
            job.setJarByClass(AverageAndTotalSalaryCompute.class);
            job.setMapperClass(MapperClass.class);
            job.setReducerClass(ReducerClass.class);
            // Mapper emits (Text, FloatWritable); reducer emits (Text, Text)
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(FloatWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            // Path p1 = new Path(args[0]);
            // Path p2 = new Path(args[1]);
            // FileInputFormat.addInputPath(job, p1);
            // FileOutputFormat.setOutputPath(job, p2);
            Path pathInput = new Path(
                    "hdfs://192.168.213.133:54310/user/hduser1/employee_records.txt");
            Path pathOutputDir = new Path(
                    "hdfs://192.168.213.133:54310/user/hduser1/testfs/output_mapred00");
            FileInputFormat.addInputPath(job, pathInput);
            FileOutputFormat.setOutputPath(job, pathOutputDir);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (IOException e) {
            e.printStackTrace();
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }
}
In the main method, the Job object uses input and output directories on HDFS, so start the Hadoop services (<hadoop_home>/sbin/start-all.sh). Copy the input file from the local file system to HDFS and change the input location accordingly, or uncomment the 4 commented lines in the main method and pass input and output locations on the local file system (commenting out the HDFS file references).
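For reference, the copy step might look like the following, assuming the sample file is saved locally as employee_records.txt and the same HDFS user directory as in the code (adjust paths to your own setup):
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -mkdir -p /user/hduser1
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -put employee_records.txt /user/hduser1/employee_records.txt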
Execute the above program unit (Right click -> Run -> Run as hadoop) and verify the output using the following command.
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -cat /user/hduser1/testfs/output_mapred00/part-r-00000
F Total: 291800.0 :: Average: 7117.073
M Total: 424363.34 :: Average: 6333.7812
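Alternatively, instead of running from Eclipse, the program can be packaged into a jar and submitted from the command line, provided the commented lines in the main method are uncommented so that the input and output paths are read from args. The jar name and output directory below are only examples (the output directory must not already exist):
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop jar AverageAndTotalSalaryCompute.jar AverageAndTotalSalaryCompute /user/hduser1/employee_records.txt /user/hduser1/testfs/output_mapred01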