Apache Hadoop Interview Questions - Set 1

1. What is Hadoop and how is it related to Big Data ?
Answer:- In 2012, Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."
Hadoop is a framework that allows distributed processing of large data sets across clusters of commodity systems using programming models such as MapReduce. (Commodity hardware means inexpensive systems without high-availability traits.)
As businesses expand, the volume of data also grows, and unstructured data gets dumped onto different machines for analysis. The major challenge is not storing this large amount of data but retrieving and analysing it, especially when the data sits on machines in different geographic locations.
This is where the Hadoop framework comes to the rescue. Hadoop can analyse data residing on different machines at different locations very quickly and in a very cost-effective way. It uses the MapReduce programming model, which lets it process data sets in parallel.
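To make the MapReduce model concrete, below is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API. The class names and the input/output paths passed on the command line are illustrative, not part of any particular cluster.

// WordCount.java - a minimal sketch of a Hadoop MapReduce job
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  // Mapper: emits (word, 1) for every word in the input split it is given
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }
  // Reducer: sums the counts for each word after the shuffle phase
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // combiner runs map-side to cut shuffle traffic
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each mapper works on one input split in parallel, so a file spread across many HDFS blocks is processed by many machines at once; the job would typically be launched with something like: $ hadoop jar wordcount.jar WordCount /input /output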

2. What is Hadoop ecosystem and its building block elements ?
Answer:- The Hadoop ecosystem refers to the various components of the Apache Hadoop software library, as well as to the accessories and tools provided by the Apache Software Foundation for these types of software projects, and to the ways that they work together.
Core components of Hadoop are:
1. MapReduce - a framework for parallel processing of vast amounts of data.
2. Hadoop Distributed File System (HDFS) - a sophisticated distributed file system.
3. YARN - the Hadoop cluster resource manager.
In addition to these core elements of Hadoop, Apache has also delivered other kinds of accessories or complementary tools for developers. These include Apache Hive, a data analysis tool; Apache Spark, a general engine for processing big data; Apache Pig, a data flow language; HBase, a database tool; and also Ambari, which can be considered a Hadoop ecosystem manager, as it helps to administer the use of these various Apache resources together.

3. What is the fundamental difference between classic Hadoop 1.x and Hadoop 2.x ?
Answer:- 
Hadoop 1.x:
- Limited to around 4,000 nodes per cluster.
- Supports only the MapReduce processing model.
- The Job Tracker is a bottleneck: it is responsible for resource management, scheduling and monitoring (MapReduce does both processing and cluster resource management).
- Map and Reduce slots are static; a given slot can run either a Map task or a Reduce task only.
- Only one namespace for managing HDFS.
- The single NameNode is a single point of failure, and recovering from a NameNode failure needs manual intervention.

Hadoop 2.x:
- Scales to potentially 10,000 nodes per cluster.
- Along with the MapReduce processing model, support is added for other distributed computing models (non-MR) such as Spark, Hama, Giraph, Message Passing Interface (MPI) and HBase coprocessors.
- YARN (Yet Another Resource Negotiator) does cluster resource management, while processing is done using different processing models; efficient cluster utilisation is achieved using YARN.
- Works on the concept of containers, which can run generic tasks.
- Multiple namespaces for managing HDFS (HDFS federation).
- The single point of failure is overcome with a standby NameNode; in case of NameNode failure, the cluster can be configured for automatic failover.

4. What is Job tracker and Task tracker. How are they used in Hadoop cluster ?
Answer:- Job Tracker is a daemon that runs on the master node (alongside the Namenode) for submitting and tracking MapReduce jobs in Hadoop. Some typical tasks of the Job Tracker are:
- Accepts jobs from clients
- It talks to the NameNode to determine the location of the data.
- It locates TaskTracker nodes with available slots at or near the data.
- It submits the work to the chosen Task Tracker nodes and monitors progress of each task by receiving heartbeat signals from Task tracker.
Task Tracker is a daemon that runs on the Datanodes. It accepts tasks - Map, Reduce and Shuffle operations - from a Job Tracker. Task Trackers manage the execution of individual tasks on the slave nodes. When a client submits a job, the Job Tracker initialises the job, divides the work and assigns it to different Task Trackers to perform MapReduce tasks. While performing this work, each Task Tracker simultaneously communicates with the Job Tracker by sending heartbeats. If the Job Tracker does not receive a heartbeat from a Task Tracker within the specified time, it assumes that the Task Tracker has crashed and assigns its tasks to another Task Tracker in the cluster.
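In classic Hadoop 1.x, Task Trackers and clients locate the Job Tracker through the mapred.job.tracker property in mapred-site.xml. A minimal sketch; the host and port below are placeholders:

<!-- mapred-site.xml (Hadoop 1.x): where Task Trackers and clients find the Job Tracker -->
<property>
  <name>mapred.job.tracker</name>
  <value>master-host:8021</value>  <!-- placeholder host:port -->
</property>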

5. Whats the relationship between Jobs and Tasks in Hadoop ?
Answer:- In Hadoop, jobs are submitted by clients, and each job is split into multiple tasks such as Map, Reduce and Shuffle tasks.

6. What is HDFS (Hadoop Distributed File System) ? Why is HDFS termed a block-structured file system ? What is the default HDFS block size ?
Answer:- HDFS is a file system designed for storing very large files. HDFS is highly fault-tolerant, offers high throughput, is suitable for applications with large data sets and streaming access to file system data, and can be built out of commodity hardware (inexpensive systems without high-availability traits).
HDFS is termed a block-structured file system because individual files are broken into blocks of a fixed size (the default HDFS block size is 128 MB). These blocks are stored across a cluster of one or more machines with data storage capacity. Changing the dfs.blocksize property in hdfs-site.xml changes the default block size for files subsequently placed into HDFS.
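A minimal hdfs-site.xml sketch for the property mentioned above (128 MB expressed in bytes; newer releases also accept suffixed values such as 128m):

<!-- hdfs-site.xml: default block size for newly written files -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>  <!-- 128 * 1024 * 1024 bytes -->
</property>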

7. Why are HDFS blocks large compared to disk blocks (the HDFS default block size is 128 MB, while a typical Unix/Linux file system block is only a few kilobytes) ?
Answer:- HDFS is designed for large data sets stored in a small number of large files rather than small amounts of data spread across many files. By storing files in large chunks of the HDFS block size, the seek time during read operations stays small relative to the time spent transferring data.
If a file is smaller than 128 MB, it only uses as much space as its own size on a given block; the rest of the underlying disk space remains available to other files.
If a particular file is 110 MB, will the HDFS block still consume 128 MB as the default size?
No, only 110 MB will be consumed by the HDFS block, and the remaining 18 MB is free to store something else.
Note:- In Hadoop 1 the default block size is 64 MB; in Hadoop 2 the default block size is 128 MB.
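One way to verify this on a running cluster is to print a file's length, block size and block list with the FS shell and fsck; the path below is a placeholder:

$ hadoop fs -stat "name=%n size=%b blocksize=%o replication=%r" /data/sample.txt
$ hdfs fsck /data/sample.txt -files -blocks

For a 110 MB file, fsck reports a single block whose length is 110 MB, even though the configured block size is 128 MB.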

8. What is significance of fault tolerance and high throughput in HDFS ?
Answer:- Fault tolerance:- In Hadoop, when we store a file it automatically gets replicated to two other locations as well (with the default replication factor of 3). So even if one or two of the systems collapse, the file is still available on a third system. The chance of data loss is therefore minimised, and data can be recovered if there is a failure at one node.
Throughput:- Throughput is the amount of work done per unit time. In HDFS, when a client submits a job, the work is divided and shared among different systems. All the systems execute the tasks assigned to them independently and in parallel, so the work is completed in a very short period of time. In this way HDFS provides good throughput.

9. What does "Replication factor" mean in Hadoop? What is default replication factor in HDFS ? How to modify default replication factor in HDFS ?
Answer:- The number of times a file is replicated in HDFS is termed the replication factor.
The default replication factor in HDFS is 3. Changing the dfs.replication property in hdfs-site.xml changes the default replication for files subsequently placed in HDFS.
The actual number of replicas can also be specified when the file is created; the default is used if replication is not specified at create time.
We can change the replication factor on a per-file basis, or for all files in a directory, using the Hadoop FS shell:
$ hadoop fs -setrep -w 3 /MyDir/file
$ hadoop fs -setrep -R -w 3 /RootDir/MyDir
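A replication factor can also be supplied at write time through the FS shell's generic -D option, and checked afterwards with -stat; the file names below are placeholders:

$ hadoop fs -D dfs.replication=2 -put localfile.txt /MyDir/localfile.txt
$ hadoop fs -stat %r /MyDir/localfile.txt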

10. What is Datanode and Namenode in HDFS ?
Answer:- Datanodes are the slaves which are deployed on each machine and provide the actual storage. These are responsible for serving read and write requests for the clients.
Namenode is the master node, on which the job tracker runs, and it stores metadata about the actual storage of data blocks so that it can manage the blocks present on the datanodes. The Namenode has to be a high-availability machine and should never be commodity hardware, because the entire HDFS relies on it.
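This division of roles is easy to see on a running cluster: the following command asks the Namenode for its metadata-based view of the cluster and lists each Datanode with its capacity and usage (output omitted here):

$ hdfs dfsadmin -report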

11. Can Namenode and Datanode system have same hardware configuration ?
Answer:- In a single-node cluster there is only one machine, so the Namenode and Datanode run on the same machine. However, in a production environment the Namenode and Datanodes are on different machines, and the Namenode should be a high-end, high-availability machine.

12. What is the fundamental difference between traditional RDBMS and Hadoop?
Answer:- A traditional RDBMS is used for transactional systems, whereas Hadoop is an approach to storing huge amounts of data in a distributed file system and processing it.
RDBMS:
- Data sizes are on the order of gigabytes.
- Supports interactive and batch access.
- Static schema.
- Nonlinear scaling.
- High integrity.
- Suitable for data that is read and written many times.
Hadoop:
- Data sizes are on the order of petabytes or zettabytes.
- Supports batch access only.
- Dynamic schema.
- Linear scaling.
- Low integrity.
- Suitable for write-once, read-many-times workloads.

13. What is secondary Namenode and what is its significance in Hadoop ?
Answer:- In Hadoop 1, the Namenode was a single point of failure. In order to keep a Hadoop system up and running, it is important to make the Namenode resilient to failure and able to recover from it. If the Namenode fails, no data access is possible from the datanodes, because the Namenode stores the metadata about the data blocks stored on the datanodes.
The main file written by the NameNode is called fsimage. This file is read into memory, and all future modifications to the filesystem are applied to this in-memory representation. The Namenode does not write out new versions of fsimage as changes are applied; instead, it writes another file called edits, which is a log of the changes made since the last version of fsimage was written.
The secondary Namenode is used to periodically merge the namespace image (fsimage) with the edit log, to prevent the edit log from becoming too large. The secondary Namenode usually runs on a separate physical machine because the merge requires plenty of CPU and as much memory as the Namenode itself. It maintains a copy of the merged namespace image, which can be used in the event of the Namenode failing. However, the state of the secondary Namenode lags that of the primary, so in the event of total failure of the primary, some data loss is almost certain.
Note:- The secondary Namenode is not a standby for the primary Namenode, so it is not a substitute for the Namenode.
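How often the secondary Namenode performs this merge (checkpoint) is configurable; a minimal hdfs-site.xml sketch, assuming Hadoop 2.x property names and their usual defaults:

<!-- hdfs-site.xml: checkpoint every hour, or after 1,000,000 uncheckpointed transactions -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>  <!-- seconds -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>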

14. What is the importance of heartbeat in HDFS ?
Answer:- A heartbeat is a signal indicating that a node is alive. A datanode sends heartbeats to the Namenode, and a task tracker sends heartbeats to the job tracker; the heartbeat interval is configurable, as sketched below.
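A minimal hdfs-site.xml sketch of the Datanode heartbeat settings (the values shown are the usual defaults; treat the exact property names as Hadoop 2.x assumptions):

<!-- hdfs-site.xml: Datanode-to-Namenode heartbeat interval -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>  <!-- seconds -->
</property>
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value>  <!-- milliseconds; used when deciding a Datanode is dead -->
</property>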
If the Namenode or job tracker does not receive a heartbeat within the expected interval, it decides that there is a problem with the datanode, or that the task tracker is unable to perform the assigned task.

15. What is HDFS cluster ?
Answer:- HDFS cluster is the name given to the whole configuration of master and slaves where data is stored. In other words, the collection of Datanode commodity machines together with the high-availability Namenode is collectively termed an HDFS cluster.

16. What is the communication channel between the client and the Namenode/Datanode ?
Answer:- The client communicates with the Namenode over Hadoop's RPC protocol (on top of TCP), and reads and writes block data directly from/to the Datanodes over a TCP streaming protocol. SSH is used only by the cluster start-up and shutdown scripts, not for client data access.

17. What is a rack ? What is the Replica Placement Policy ?
Answer:- A rack is a physical collection of datanodes stored at a single location. There can be multiple racks in a single location.
When a client wants to load a file into the cluster, the content of the file is divided into blocks, and for every block the Namenode provides information about three datanodes indicating where that block should be stored.
While placing the replicas, the key rule followed is: for every block of data, two copies will exist in one rack and the third copy in a different rack. This rule is known as the Replica Placement Policy.
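Replica placement can be inspected with fsck; the path below is a placeholder, and the -racks flag prints the rack of each replica:

$ hdfs fsck /data/sample.txt -files -blocks -locations -racks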
