Nov 8, 2015

Textual description of firstImageUrl

Read and Write operation in Hadoop 2.0 - An overview from API perspective

Hadoop has went thorough a substantial architectural and API changes from hadoop 1.x to hadoop 2.x. In order to improve performances of hadoop cluster some new concepts has been introduced like(Federated namenode, HA namenode and YARN, etc). The main agenda of this post is to give an overview of read and write operation in hadoop 2.0 from API perspective.Read Internals of write and read operation in hadoop.

Write operation Hadoop 2.0 

  1. The client creates the file by calling create() method on DistributedFileSystem.
  2. DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it.
    The namenode performs various checks to make sure the file doesn’t already exist and the client has the right permissions to create the file. If all these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException. TheDistributedFileSystem returns an FSDataOutputStream for the client to start writing data to datanode. FSDataOutputStream wraps a DFSOutputStream which handles communication with the datanodes and namenode.
  3. As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas. The list of datanodes forms a pipeline, and default replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.
  4. Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.
  5. DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.
  6. When the client has finished writing data, it calls close() on the stream.It flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete The namenode already knows which blocks the file is made up of , so it only has to wait for blocks to be minimally replicated before returning successfully.
    Below diagram summarises file write operation in hadoop.
    Client writing data to HDFS 

Read operation Hadoop 2.0 

  1. The client opens the file by calling open() method on DistributedFileSystem.
  2. DistributedFileSystem makes an RPC call to the namenode to determine location of datanodes where files is stored in form of blocks.For each blocks,the namenode returns address of datanodes(metadata of blocks and datanodes) that have a copy of block. Datanodes are sorted according to proximity(depending of network topology information).
    The DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O.
  3. The client then calls read() on the stream. DFSInputStream,which has stored the datanode addresses for the first few blocks in the file, then connects to the first (closest) datanode for the first block in the file. 
  4. Data is streamed from the datanode back to the client(in the form of packets) and read() is repeatedly called on the stream by client.
  5. When the end of the block is reached, DFSInputStream will close the connection to the datanode, then find the best datanode for the next block (Step 5 and 6)
  6. When the client has finished reading, it calls close() on the FSDataInputStream (step 7).
Client reading data from HDFS
  • During reading, if the DFSInputStream encounters an error while communicating with a datanode,it will try the next closest one for that block.It will also remember datanodes that have failed so that it doesn’t needlessly retry them for later blocks. 
  • The DFSInputStream also verifies checksums for the data transferred to it from the datanode. If a corrupted block is found, the DFSInputStream attempts to read a replica of the block from another datanode; it also reports the corrupted block to the namenode. 

Reference :- Hadoop definitive guide by Tom White.
Location: Hyderabad, Telangana, India