MapReduce, a data processing framework (engine), can be used to analyse many kinds of data sources (logs, feedback, sales details, etc.). In the previous post, we analysed time-temperature statistics and generated a report with the max/min temperature for various cities. In this post we will analyse customer feedback/review comments for various mobile phones, stored in a text file, and work out which mobile could be a good buy.
Note:- The data used for the sample program is fictitious and ONLY for educational purposes; it does not convey any message about whether a product is good or bad.
Problem statement:- Using MapReduce, analyse a text file storing customer feedback about various mobile phones from various vendors, and separate the positive and negative comments into separate files, grouped by mobile phone and price. For each mobile set, also display the total number of comments. Download sample input file.
Input schema :- <Mobile_set_detail><TAB><Price><TAB><Vendor><TAB><Comment>
Example:- Lenovo A6000 Plus Rs. 7,499.00 Flipkart Satisfied with phone.
Expected output:-
In positiveFeedback_file : <Mobile_detail><TAB><Comment_count><TAB><All_comments_separated by ||>
Apple Iphone 4s - 16 Gb:Rs. 12,243.00 Comments(2) Amazingly smooth and has a much better battery life. || good for style and long term uses. ||
In negativeFeedback_file : <Mobile_detail><TAB><Comment_count><TAB><All_comments_separated by ||>
Lenovo VIBE P1m (Black, 16 GB):Rs. 7,999 Comments(3) Poor service so do not buy. ||Poor service so do not buy. ||Do not prefer and not reccomend ||
As part of solving this use case we will learn about -
1. Life cycle of the mapper and reducer classes - setup() -> map()/reduce() -> cleanup()
For each mapper/reducer task, these three methods always execute in the same order: setup() runs once before the first record, map()/reduce() runs once per input record (per key, for reduce), and cleanup() runs once at the end. The setup() method provides an opportunity to prepare or modify input or supporting data for the mapper or reducer class.
In the cleanup() method, resources can be released.
2. MultipleOutputs - more than one reducer output file can be generated using MultipleOutputs
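The call order described in point 1 can be sketched in plain Java, without a Hadoop cluster. This is only an illustration: SketchTask and LifecycleSketch are hypothetical names, not Hadoop classes, and the run() method is modelled on the order in which the framework invokes setup, map and cleanup for a task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a mapper task: NOT the Hadoop API, only the call order.
abstract class SketchTask {
    protected void setup() {}
    protected abstract void map(String record);
    protected void cleanup() {}

    // setup once, map once per record, cleanup once at the end.
    final void run(List<String> records) {
        setup();
        try {
            for (String record : records) {
                map(record);
            }
        } finally {
            cleanup(); // still runs if a record throws
        }
    }
}

class LifecycleSketch extends SketchTask {
    final List<String> calls = new ArrayList<>();

    @Override protected void setup() { calls.add("setup"); }
    @Override protected void map(String record) { calls.add("map:" + record); }
    @Override protected void cleanup() { calls.add("cleanup"); }

    // Runs the task over the given records and returns the recorded call order.
    static List<String> trace(List<String> records) {
        LifecycleSketch task = new LifecycleSketch();
        task.run(records);
        return task.calls;
    }

    public static void main(String[] args) {
        // prints [setup, map:line1, map:line2, cleanup]
        System.out.println(trace(Arrays.asList("line1", "line2")));
    }
}
```

Because setup() and cleanup() run once per task rather than once per record, they are the natural place for one-time work such as loading word lists or closing output writers.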
Sample code for mapper, reducer and driver class
Mapper class :- In the mapper class below, the input file is read and the map method is executed for each line. It parses the input line and writes the result to the context. Both key and value are of type Text.

    import java.io.IOException;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    /*
     * Mapper executes in sequence for each task : setup -> map -> cleanup
     */
    class ReviewMapperClass extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Input line: <Mobile_set_detail>\t<Price>\t<Vendor>\t<Comment>
            String[] fields = value.toString().split("\\t");
            if (fields.length < 4) {
                return; // skip malformed lines instead of failing the task
            }
            String productId = fields[0];
            String price = fields[1];
            String feedback = fields[3];
            // Key = product detail + price, value = the comment text
            String mapperKey = productId + ":" + price;
            context.write(new Text(mapperKey), new Text(feedback));
        }
    }
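The parsing step of the mapper can be tried on its own before running a job. The following is a minimal sketch, assuming the tab-separated input schema given above; FeedbackLineParser is a hypothetical helper name, and returning null for short lines is a choice made for this illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper: reproduces only the mapper's line parsing, without Hadoop types.
class FeedbackLineParser {
    // Returns (product + ":" + price, comment), or null when the line has fewer than four fields.
    static Map.Entry<String, String> parse(String line) {
        String[] fields = line.split("\t", -1);
        if (fields.length < 4) {
            return null;
        }
        String mapperKey = fields[0] + ":" + fields[1]; // the vendor (fields[2]) is not used
        return new SimpleEntry<>(mapperKey, fields[3]);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv =
                parse("Lenovo A6000 Plus\tRs. 7,499.00\tFlipkart\tSatisfied with phone.");
        System.out.println(kv.getKey() + " => " + kv.getValue());
    }
}
```

All comments for the same phone and price share one key, so they arrive together in a single reduce call.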
Reducer class:- In the reducer class, the setup() method creates the list of positive and negative words (positive and negative comments are separated based on these words).
In the reduce method, the feedback list is iterated and each comment is matched against POS_QUALIFY_PATTERN, which is built from wordList.get(0) (the positive comment words), and against NEG_QUALIFY_PATTERN, which is built from wordList.get(1) (the negative comment words). If a match is found, the comment is appended to the corresponding comment string (sbfPos/sbfNeg) and its count is incremented. A comment that matches both patterns is written to neither file.
Once the for loop terminates, both outputs (positiveReview and negativeReview) are written with the comment count and the comment string.
    import java.io.IOException;
    import java.util.LinkedList;
    import java.util.List;
    import java.util.regex.Pattern;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    /*
     * Reducer executes on mapper output in sequence : setup -> reduce -> cleanup.
     * Both setup and cleanup are overridden here.
     */
    class ReviewReducerClass extends Reducer<Text, Text, Text, Text> {
        private MultipleOutputs<Text, Text> multiOutput;
        private final List<String> wordList = new LinkedList<String>();
        private Pattern posQualifyPattern;
        private Pattern negQualifyPattern;

        @Override
        protected void setup(Context context) {
            multiOutput = new MultipleOutputs<Text, Text>(context);
            Configuration conf = context.getConfiguration();
            wordList.add(conf.get("positiveWords"));
            wordList.add(conf.get("negativeWords"));
            // Compile the patterns once per task instead of once per key
            final String POS_QUALIFY_PATTERN = "(" + wordList.get(0) + ")";
            final String NEG_QUALIFY_PATTERN = "(" + wordList.get(1) + ")";
            posQualifyPattern = Pattern.compile(POS_QUALIFY_PATTERN, Pattern.CASE_INSENSITIVE);
            negQualifyPattern = Pattern.compile(NEG_QUALIFY_PATTERN, Pattern.CASE_INSENSITIVE);
        }

        @Override
        public void reduce(Text key, Iterable<Text> feedbackList, Context con)
                throws IOException, InterruptedException {
            int countPos = 0;
            int countNeg = 0;
            StringBuilder sbfPos = new StringBuilder();
            StringBuilder sbfNeg = new StringBuilder();
            for (Text strVal : feedbackList) {
                String comment = strVal.toString();
                boolean positive = posQualifyPattern.matcher(comment).find();
                boolean negative = negQualifyPattern.matcher(comment).find();
                if (positive && !negative) {
                    sbfPos.append(comment).append(" || ");
                    countPos++;
                } else if (!positive && negative) {
                    sbfNeg.append(comment).append("||");
                    countNeg++;
                }
                // Comments matching both word lists (or neither) are skipped
            }
            /* Write to both the positive and the negative feedback file */
            if (countPos != 0) {
                multiOutput.write(PositiveAndNegativeReview.positiveReview,
                        new Text(key.toString() + " Comments(" + countPos + ")"),
                        new Text(sbfPos.toString()));
            }
            if (countNeg != 0) {
                multiOutput.write(PositiveAndNegativeReview.negativeReview,
                        new Text(key.toString() + " Comments(" + countNeg + ")"),
                        new Text(sbfNeg.toString()));
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Close MultipleOutputs so that buffered records are flushed to the named files
            multiOutput.close();
        }
    }
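The matching rule used in the reducer can be checked in isolation with plain java.util.regex. This is a minimal sketch under stated assumptions: FeedbackClassifier and its Label enum are hypothetical names, and the word lists are shortened versions of the ones the driver passes through the Configuration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the reducer's positive/negative decision without Hadoop types.
class FeedbackClassifier {
    enum Label { POSITIVE, NEGATIVE, SKIPPED }

    // Shortened word lists (assumption); each alternative keeps its trailing space.
    static final Pattern POSITIVE_WORDS =
            Pattern.compile("(good |satisfied |smooth |stylish )", Pattern.CASE_INSENSITIVE);
    static final Pattern NEGATIVE_WORDS =
            Pattern.compile("(not good |do not |poor |not satisfied )", Pattern.CASE_INSENSITIVE);

    static Label classify(String comment) {
        boolean positive = POSITIVE_WORDS.matcher(comment).find();
        boolean negative = NEGATIVE_WORDS.matcher(comment).find();
        if (positive && !negative) {
            return Label.POSITIVE;
        }
        if (negative && !positive) {
            return Label.NEGATIVE;
        }
        return Label.SKIPPED; // matched both lists, or neither
    }

    public static void main(String[] args) {
        System.out.println(classify("Satisfied with phone."));
        System.out.println(classify("Poor service so do not buy."));
        System.out.println(classify("Not good at all"));
    }
}
```

Two properties of this rule are worth knowing. A comment such as "Not good at all" contains both "good " and "not good ", so it is skipped rather than counted as negative; checking the negative list first would be one way to change that. And because most words carry a trailing space, a word directly followed by punctuation (for example "smooth.") does not match.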
Driver class :-
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class PositiveAndNegativeReview {
        public static String positiveReview = "positiveReview";
        public static String negativeReview = "negativeReview";

        /**
         * Uses of setup and cleanup in Mapper and Reducer -
         */
        public static void main(String[] args)
                throws IOException, ClassNotFoundException, InterruptedException {
            // Pipe-separated alternatives; the reducer turns each list into a regex
            final String POSITIVE_WORD = "good |satisfied |classic|class|happy |thanks | recommend |good to go|best |rocking |yo |fancy |stylish |must buy | amazing |smooth |awesome |damn good ";
            final String NEGATIVE_WORD = "not good |Do not |donot |poor | not satisfied |very poor|not happy |worst | not recommend |do noy buy|not-satisfied|waste |bad | false |not stylish |should not buy |not amazing | not smooth |wasted |damn bad ";
            Configuration conf = new Configuration();
            conf.set("positiveWords", POSITIVE_WORD);
            conf.set("negativeWords", NEGATIVE_WORD);

            Job job = Job.getInstance(conf, "Filer file with good feedback!!");
            job.setMapperClass(ReviewMapperClass.class);
            job.setReducerClass(ReviewReducerClass.class);
            // Was ReviewFilterForBestBuy.class, which is not defined in this post
            job.setJarByClass(PositiveAndNegativeReview.class);
            /*
             * Set below four property carefully otherwise job fails silently
             * after first context.write
             */
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            /* Optional, it's good to set */
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            /* Multiple output setting */
            MultipleOutputs.addNamedOutput(job, negativeReview,
                    TextOutputFormat.class, Text.class, Text.class);
            MultipleOutputs.addNamedOutput(job, positiveReview,
                    TextOutputFormat.class, Text.class, Text.class);
            Path pathInput = new Path(
                    "hdfs://localhost:54310/user/hduser1/feedbackPosNeg.txt");
            Path pathOutputDir = new Path(
                    "hdfs://localhost:54310/user/hduser1/testfs/output_dir_feedback");
            FileInputFormat.setInputPaths(job, pathInput);
            FileOutputFormat.setOutputPath(job, pathOutputDir);
            // Exit code 0 signals success, non-zero signals failure
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Start the Hadoop services (./start-all.sh from the sbin directory) and execute the driver program. Verify the output directory - it should contain two files (negativeReview-r-00000 and positiveReview-r-00000). Download sample output file.
    hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop fs -cat /user/hduser1/testfs/output_dir_feedback/positiveReview-r-00000
    Apple Iphone 4s - 16 Gb - Black:Rs. 12,617.00 Comments(2) Yo like it. || Amazingly smooth and has a much better battery life. ||
    Apple iPhone 5s 40 16GB 41:Rs. 38,269.00 Comments(1) Good phone. ||
    Lenovo A2010 (Black, 8 GB):Rs. 4,990 Comments(4) Very stylish and fancy. || Very stylish and fancy. || Good phone. || Very good in low end. ||
Mapreduce usecase discussion is very nice and understandable.
However, the algorithm to match positive and negative feedback can be improved.