Download input data from the following URLs:-
Download each of the following text files and store them in a directory of your choice. For me, they are downloaded in /home/zytham/Downloads/hadoop_data.
1. http://www.gutenberg.org/cache/epub/20417/pg20417.txt
2. http://www.gutenberg.org/files/5000/5000-8.txt
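If you prefer to script the download instead of saving the files from a browser, a minimal Python sketch could look like the following (the destination directory is my local path; adjust it to yours):

```python
import os
import urllib.request

# Gutenberg input files used in this walkthrough
URLS = [
    "http://www.gutenberg.org/cache/epub/20417/pg20417.txt",
    "http://www.gutenberg.org/files/5000/5000-8.txt",
]

def local_name(url):
    # Derive the local file name from the last path segment of the URL
    return url.rsplit("/", 1)[-1]

def download_all(dest_dir):
    # Create the target directory if needed, then fetch each file
    os.makedirs(dest_dir, exist_ok=True)
    for url in URLS:
        urllib.request.urlretrieve(url, os.path.join(dest_dir, local_name(url)))

if __name__ == "__main__":
    download_all("/home/zytham/Downloads/hadoop_data")
```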
Upload input files to HDFS:-
Switch to hduser1 if you are not already in that context. Recall that during the Hadoop 2.6.1 installation on Ubuntu 13.04, we created hduser1 and set up Hadoop under that user.
Start Hadoop services:- First start the Hadoop cluster using the following commands:
hduser1@ubuntu:~$ cd /usr/local/hadoop2.6.1/sbin
hduser1@ubuntu:/usr/local/hadoop2.6.1/sbin$ ./start-all.sh
If you see a connection error like the following when running HDFS commands, the NameNode is not up; check the running daemons with jps and restart the cluster:
Call From ubuntu/127.0.1.1 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused;
Copy local files to HDFS:- Copy the downloaded files from /home/zytham/Downloads/hadoop_data to the Hadoop file system (a file system managed by Hadoop). Execute the following commands to create an HDFS directory and copy the files from the local file system into the newly created HDFS directory:
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hdfs dfs -mkdir -p /user/hduser1/hdfsdata/hadoop_data
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -copyFromLocal /home/zytham/Downloads/hadoop_data /user/hduser1/hdfsdata/hadoop_data
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -ls /user/hduser1/hdfsdata/hadoop_data
Note that an HDFS path is not part of the local file system, so you cannot cd into it; HDFS contents are accessed only through the hdfs/hadoop commands:
hduser1@ubuntu:/usr/local/hadoop2.6.1$ cd /user/hduser1/hdfsdata/hadoop_data
bash: cd: /user/hduser1/hdfsdata/hadoop_data: No such file or directory
Run the Hadoop MapReduce word count example:-
For convenience, I have created a word count sample program jar; download it and save it in a directory of your choice. I have placed it in "/home/zytham/hadoop_poc/WordcountSample.jar". Now execute the word count jar on the single-node Hadoop pseudo-cluster with the following command (here <classNameOfSampleJar> is the name of the main class inside the jar, WordCountExample in our case):
./hadoop jar <word_count_sample_jar> <classNameOfSampleJar> <Input_files_location> <Output_directory_location>
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop jar /home/zytham/hadoop_poc/WordcountSample.jar WordCountExample /user/hduser1/hdfsdata/hadoop_data /user/hduser1/wordcountOuput
15/10/04 15:29:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/04 15:29:36 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/10/04 15:29:36 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/10/04 15:29:37 INFO input.FileInputFormat: Total input paths to process : 3
..........................
..........................
15/10/04 15:29:43 INFO mapred.LocalJobRunner: reduce task executor complete.
15/10/04 15:29:43 INFO mapreduce.Job: map 100% reduce 100%
15/10/04 15:29:43 INFO mapreduce.Job: Job job_local884144492_0001 completed successfully
15/10/04 15:29:44 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=4011472
FILE: Number of bytes written=8420485
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11928267
HDFS: Number of bytes written=883509
HDFS: Number of read operations=37
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=78578
Map output records=629920
Map output bytes=6083556
Map output materialized bytes=1462980
Input split bytes=397
Combine input records=629920
Combine output records=101397
Reduce input groups=82616
Reduce shuffle bytes=1462980
Reduce input records=101397
Reduce output records=82616
Spilled Records=202794
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=180
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=807419904
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=3676562
File Output Format Counters
Bytes Written=883509
If you get output similar to the above, you are on the right track, and the output of this MapReduce program is stored in "/user/hduser1/wordcountOuput". We will now see the output processed by Hadoop.
First verify the output directory and see which files it contains (the empty _SUCCESS file is just a marker that the job completed successfully; part-r-00000 holds the reducer's output). Execute the following command:
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -ls /user/hduser1/wordcountOuput
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
15/10/04 15:33:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 hduser1 supergroup 0 2015-10-04 15:29 /user/hduser1/wordcountOuput/_SUCCESS
-rw-r--r-- 1 hduser1 supergroup 883509 2015-10-04 15:29 /user/hduser1/wordcountOuput/part-r-00000
Now, execute the following command to see the processed output in the terminal (the output shown below is only partial; scroll to see the complete output):
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -cat /user/hduser1/wordcountOuput/part-r-00000
........
.......
worst 10
worst. 1
worsted 2
worsted! 1
worsting 1
worth 36
worth. 5
worth._ 2
worthful 1
worthier 1
worthless. 1
worthy 21
worthy, 1
æsthetic 1
è 3
état_. 1
� 5
�: 1
�crit_ 1
�pieza; 1
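Notice that tokens such as "worth" and "worth." are counted separately: the classic WordCount splits each line on whitespace only, without stripping punctuation. A minimal Python sketch of the same map/reduce logic (my own illustration, not the actual code inside WordcountSample.jar):

```python
from collections import Counter

def word_count(lines):
    # Map phase: emit one count per whitespace-delimited token,
    # keeping punctuation attached, like Java's StringTokenizer does.
    counts = Counter()
    for line in lines:
        for token in line.split():
            counts[token] += 1
    # Reduce phase: Counter already sums the counts per distinct token.
    return counts

counts = word_count(["it is worth noting", "worth. is not worth"])
print(counts["worth"])   # 2 -- "worth." is a different key
print(counts["worth."])  # 1
```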
Using the "getmerge" command, we can download the MapReduce output to the local file system. Use the following command to merge the output files present in the HDFS output directory into a single local file:
hduser1@ubuntu:/usr/local/hadoop2.6.1/bin$ ./hadoop dfs -getmerge /user/hduser1/wordcountOuput /tmp/wordCountLocal
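Conceptually, getmerge concatenates the part files from the output directory into one local file. A rough Python equivalent, assuming the part files have already been copied to a local directory:

```python
import glob
import os

def getmerge(src_dir, dest_file):
    # Concatenate part files in sorted name order (part-r-00000,
    # part-r-00001, ...) into a single output file.
    with open(dest_file, "wb") as out:
        for part in sorted(glob.glob(os.path.join(src_dir, "part-*"))):
            with open(part, "rb") as f:
                out.write(f.read())
```

Our job ran with a single reducer, so there is only one part file here; with multiple reducers, merging becomes genuinely useful.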