Mar 3, 2014


How HashMap works internally - Internal implementation of HashMap

HashMap is one of the concrete implementations of Map (an associative array data structure). HashMap stores each key/value pair as an Entry object (a static nested class managed by HashMap with the fields key, value, next and hash). The basic operations get() and put() run in O(1), i.e. constant time, assuming the hash function disperses the elements properly among the buckets; with a poor hashCode() implementation a bucket degenerates into a linked list and these operations slow down to O(n). This post focuses on the internal implementation and working of HashMap. It is one of interviewers' favorite questions when discussing collections with senior Java developers. As we progress, we will go through the building blocks of HashMap in detail and discuss the role each one plays in the implementation.
The first question that pops up is: what is the underlying data structure HashMap uses to store its key/value pairs? Answer: It is a resizable array of Entry objects (Entry[] table), which acts as the container for the key/value pairs. This table is also called the hash table, because the index at which an Entry object is stored is calculated by a hashing mechanism. Two factors control the resizing of the hash table (Entry[] table): initial capacity and load factor.
The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased.
When does the table size increase, i.e. when does rehashing occur? Answer: When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, its internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
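The resize rule above is plain arithmetic, so a small sketch may help. The class and method names below (ResizeThresholdDemo, threshold, capacityAfter) are made up for illustration and are not part of java.util.HashMap; the sketch only models the rule "double the capacity once the entry count exceeds capacity * load factor".

```java
public class ResizeThresholdDemo {

    // threshold = capacity * load factor, truncated to an int
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Hypothetical model: add `entries` keys one by one and double the
    // capacity whenever the entry count exceeds the current threshold.
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold(capacity, loadFactor)) {
                capacity *= 2;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));         // 12
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32
    }
}
```

With the defaults (capacity 16, load factor 0.75) the threshold is 12, so in this model the 13th entry is the one that pushes the table to 32 buckets.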
The default initial capacity of this table is 16, and until Java 6 the table is allocated with that capacity inside the constructor; the initial capacity can be changed by using the appropriate constructor. Please note that from Java 7 update 40 an empty table is created, and the initialization of the table is moved out of the constructor to the line where the table is declared. When put() is called for the first time, inflateTable(threshold) is called to inflate the table. The sample code below from java.util.HashMap (Java 6) shows the Entry table (hash table) and the default constructor:
public class HashMap<K,V> extends AbstractMap<K,V>
                implements Map<K,V>, Cloneable, Serializable
 {
    // ....various CONSTANTS for default initial capacity(16),
    //                                 loadfactor(0.75f) etc.
   transient Entry[] table; //A resizeable table whose length
                           //MUST Always be a power of two.
   final float loadFactor;//The load factor for the hash table.
   int threshold; // The next size value at which
                    // to resize (capacity * load factor).
   public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    threshold = (int)(DEFAULT_INITIAL_CAPACITY
                           * DEFAULT_LOAD_FACTOR);
    table = new Entry[DEFAULT_INITIAL_CAPACITY];
    init();
   }
   // ... other supporting methods of HashMap
  }
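As a rough illustration of the "appropriate constructor" and the power-of-two comment in the code above, here is a small sketch. HashMap(int initialCapacity, float loadFactor) is the real public constructor; roundUpToPowerOfTwo is a hypothetical helper name of mine that mirrors, as far as I recall, the rounding loop in the Java 6 constructor.

```java
import java.util.HashMap;
import java.util.Map;

public class CapacityDemo {

    // Hypothetical helper: smallest power of two >= the requested capacity.
    // Java 6 uses a loop of this shape so that table.length stays a power of two.
    static int roundUpToPowerOfTwo(int requested) {
        int capacity = 1;
        while (capacity < requested) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        // Ask for room for about 100 entries and a load factor of 0.5.
        Map<String, Integer> scores = new HashMap<String, Integer>(100, 0.5f);
        scores.put("alice", 90);
        System.out.println(scores.get("alice"));      // 90
        System.out.println(roundUpToPowerOfTwo(100)); // 128
    }
}
```

The requested capacity is only a hint: the table length actually used is the next power of two, which is what makes the index calculation shown later a simple bit mask.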
Now we have an idea of where the key/value pairs are stored in the form of Entry objects. It's time to investigate the Entry class. Entry is a static nested class maintained inside HashMap and it has four fields: key, value, next and hash.
key/value - the user data being stored in HashMap.
hash - an integer value (the dispersed/modified hash code calculated by hash(key.hashCode())).
next - a reference of type Entry. It comes into the picture when a bucket of the hash table degenerates into a linked list: it points to the next Entry object stored at the same index. (We will revisit it later with a diagram.) Here is the sample code for the Entry class:
static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next;
    final int hash;
    //constructor for creating new Entry object
    Entry(int h, K k, V v, Entry<K,V> n) { 
       value = v;
       next = n;
       key = k;
       hash = h;
    } 
    // .....many more supporting methods
 }
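To see what the next field buys us, here is a minimal sketch of a chain of entries. Node is a simplified, hypothetical stand-in for HashMap.Entry (String keys only, no hash field), and find is an illustrative helper, not a JDK method. The sketch links the newer node in front of the older one, which is my understanding of what Java 6 does when it adds an entry to a bucket; treat that detail as an assumption.

```java
public class ChainDemo {

    // Simplified stand-in for HashMap.Entry (hypothetical, not the JDK class).
    static class Node {
        final String key;
        String value;
        Node next;

        Node(String key, String value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Walks the chain from head and returns the value for key, or null.
    static String find(Node head, String key) {
        for (Node n = head; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node older = new Node("a", "1", null);
        Node head = new Node("b", "2", older); // newer node points to the older one
        System.out.println(find(head, "a"));   // 1
        System.out.println(find(head, "b"));   // 2
        System.out.println(find(head, "c"));   // null
    }
}
```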
By now we have a fair idea of what an Entry object looks like and where it is stored. The next question is how Entry objects are stored and how we retrieve them: the put() and get() methods of HashMap. We will discuss put() first, followed by get().
Say it loudly!! HashMap, like Hashtable, works on a hashing mechanism. It means that to store an Entry object we first find the hashCode of the key and then the actual table index (bucket number) where the Entry object is stored. It is a three-step process:
Step 1: Calculate the hash code by calling the key's hashCode() method: key.hashCode() -> hashCode
Step 2: Disperse the bits of the hashCode to reduce collisions using hash(hashCode) -> modified hash value
Step 3: Finally, find the actual index of the table where the Entry object is stored by passing this dispersed hash value to indexFor(hash, tableLength).
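The three steps above can be sketched as follows. The bodies of hash() and indexFor() are reproduced from memory of the Java 6 source, so treat them as an approximation of that version rather than a reference; bucketOf and the class name are hypothetical helpers for this post.

```java
public class BucketIndexDemo {

    // Step 2: supplemental hash, as I recall it from Java 6's HashMap.
    // It mixes higher bits into the lower ones that indexFor() looks at.
    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Step 3: since length is a power of two, (length - 1) is a bit mask
    // and the result is always in the range 0 .. length - 1.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Hypothetical helper tying the three steps together.
    static int bucketOf(Object key, int tableLength) {
        int hashCode = key.hashCode();           // Step 1
        int dispersed = hash(hashCode);          // Step 2
        return indexFor(dispersed, tableLength); // Step 3
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> bucket " + bucketOf(key, 16));
        }
    }
}
```

The mask in indexFor() is the reason the table length must always be a power of two: only then does h & (length - 1) cover every bucket.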
Now we have found the index (bucket) location in the table, so there are two possibilities: either that location is empty or some element is already there (please note that for two distinct keys, the hashCode and the table index (bucket) can be the same). If some Entry is already there, we say that a collision has occurred in HashMap. Before creating an Entry object at that index position, HashMap checks whether an Entry object with the same hash and an equal key is already there. If such an Entry is found, its value is replaced with the new value, the old value is returned, and no new Entry object is created; otherwise a new Entry object is created and null is returned.
What will happen when the bucket number is the same but the key is different? In this case HashMap maintains a linked list at that bucket location and the next field of Entry comes into the picture. A new Entry object is created and linked to the Entry objects already in that bucket through next; in Java 6 the new Entry is placed at the head of the bucket and its next refers to the previously added Entry (see the diagram below). In such a scenario the put operation degenerates from O(1) to O(n). Below is the sample code of the put() operation, covering the index (bucket) location generation described above and the duplicate entry check in HashMap:
public V put(K key, V value) {
  if (key == null)
    return putForNullKey(value);
  int hash = hash(key.hashCode());// Step 1 and Step 2 
  int i = indexFor(hash, table.length);// Step 3
  //loop for Checking Entry with same key is there or not
  for (Entry<K,V> e = table[i]; e != null; e = e.next) { 
   Object k;
   if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
   {
     V oldValue = e.value;
     e.value = value;
     e.recordAccess(this);
     return oldValue;
   }
  }
  // ...... some other stuff
  addEntry(hash, key, value, i); //Create Entry Object
  return null;// return null, if new key and value is added.
 }
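The return value of put() is easy to check against the public API. The sketch below uses only java.util.HashMap; the class and method names (PutReturnDemo, putSameKeyTwice) are made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PutReturnDemo {

    // Returns { result of first put, result of second put, value stored afterwards }.
    static String[] putSameKeyTwice() {
        Map<String, String> capitals = new HashMap<String, String>();
        String first = capitals.put("India", "Delhi");      // new key: no old value
        String second = capitals.put("India", "New Delhi"); // same key: value replaced
        return new String[] { first, second, capitals.get("India") };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(putSameKeyTwice()));
        // prints [null, Delhi, New Delhi]
    }
}
```

The second put() does not create a new entry; it overwrites the value in the existing one and hands back the value it replaced.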
One important takeaway from the above discussion is that put() returns null if a new Entry object is created for the key/value pair; otherwise it returns the value previously associated with the key. Java puzzles related to this concept can be found here (Question 3 and Question 4). Consider the following diagram for a pictorial representation of our understanding:
Entry objects 1 and 2 belong to the same bucket location since their hash value (H1) is the same, so they form a linked list and the next of Entry 1 points to Entry 2. If only one element is present at a bucket location, its next points to null.
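A collision like the one described here can be forced on purpose. In the sketch below, BadKey is a hypothetical key type whose hashCode() is deliberately constant, so every instance ends up in the same bucket; the map still keeps the two entries apart because equals() tells the keys apart.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {

    // Hypothetical key with a deliberately poor (constant) hashCode().
    static final class BadKey {
        final String name;

        BadKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42; // every key collides
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    static Map<BadKey, String> build() {
        Map<BadKey, String> map = new HashMap<BadKey, String>();
        map.put(new BadKey("one"), "Entry 1");
        map.put(new BadKey("two"), "Entry 2");
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = build();
        System.out.println(map.size());                 // 2
        System.out.println(map.get(new BadKey("one"))); // Entry 1
        System.out.println(map.get(new BadKey("two"))); // Entry 2
    }
}
```

The map stays correct with such a key, it just loses its constant-time behaviour, because every lookup has to walk the single shared bucket.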
Before moving on to the get() operation, I would like to add my two cents on the put() operation. As we know, null is allowed as a key in HashMap, with a null or non-null value. How does HashMap deal with a null key? HashMap has a dedicated method, putForNullKey(value), to deal with the null key. When put() finds that the key is null, it simply calls this method. What does putForNullKey(value) do? It is important to note that the null key always maps to hash 0, and thus to index 0. In other words, an entry with a null key is always stored at index 0 of the hash table. As in the put() operation above, before creating a new Entry at bucket 0, putForNullKey() checks whether an entry with a null key already exists at index 0. If it finds one, it replaces the value and returns the old value associated with the null key; otherwise it creates a new Entry object at that location and returns null. Please note that only one null key is allowed in HashMap. Sample code of putForNullKey(value) can be found here.
We have understood how Entry objects are stored in the hash table using the hashing mechanism. The next question is: how are values retrieved from HashMap using the get(key) method?
When a get(key) request comes to HashMap, it calculates the hash code and finds the bucket location (using the same three-step process discussed above). Once the bucket location is known, it iterates over the linked list at that location: if the dispersed/modified hash value matches and the key is equal, it returns the value associated with the key. Otherwise it returns null (meaning the key was not found in the HashMap). Sample code for the get(key) method:
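The null-key behaviour can be observed through the public API alone. The sketch below does not look inside putForNullKey(); it only shows the visible effect, and the class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {

    // Returns { first put result, second put result, stored value, map size }.
    static String[] putNullKeyTwice() {
        Map<String, String> map = new HashMap<String, String>();
        String first = map.put(null, "a");  // no entry for the null key yet
        String second = map.put(null, "b"); // replaces the value of the single null key
        return new String[] { first, second, map.get(null), String.valueOf(map.size()) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(putNullKeyTwice()));
        // prints [null, a, b, 1]
    }
}
```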
public V get(Object key) {
   if (key == null)
     return getForNullKey();
   //Find dispersed/modified hashCode that
   //is stored in Entry object
    int hash = hash(key.hashCode());
   //Find bucket no and iterate over linkedlist
   for (Entry<K,V> e = table[indexFor(hash, table.length)]; 
         e != null;
         e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
      return e.value; //Return value associated with key
    }
  return null; //Return null, if key is not found in map
 }
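One practical consequence of the code above: a null result from get() does not by itself prove that the key is absent, because HashMap also allows null values. A small sketch using only the public API (the class and method names are hypothetical) shows how containsKey() tells the two cases apart.

```java
import java.util.HashMap;
import java.util.Map;

public class GetDemo {

    static Map<String, String> sample() {
        Map<String, String> map = new HashMap<String, String>();
        map.put("present", "value");
        map.put("nullValue", null); // key exists, value is null
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> map = sample();
        System.out.println(map.get("present"));           // value
        System.out.println(map.get("missing"));           // null (no such key)
        System.out.println(map.get("nullValue"));         // null (key mapped to null)
        System.out.println(map.containsKey("missing"));   // false
        System.out.println(map.containsKey("nullValue")); // true
    }
}
```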
What will happen when get(null) is called? Here a dedicated method, getForNullKey(), is used to look up the null key. The null key maps to index 0, so it iterates over the linked list at bucket location 0. If an entry with a null key is found, the associated value is returned; otherwise null is returned (indicating that no null key is present). Sample code of getForNullKey() can be found here.
Before winding up the get(key) operation, we need to discuss an important interview question: What is the role of equals() and hashCode() in the working of HashMap?
As we can see in the get(key) sample code above, key.hashCode() is used to find the hash code and then the bucket location. Similarly, the equals() method is used to check whether the requested key and a key in the linked list are the same. In simple words, hashCode() and equals() are used together to retrieve the correct Entry object, the one with the same hash code and an equal key as requested by get(key).
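To make the interplay concrete, here is a sketch with two hypothetical key classes: HashOnlyKey overrides hashCode() but not equals(), while ProperKey overrides both. Both reach the right bucket, but only the second can be found again through a different, equal-looking instance.

```java
import java.util.HashMap;
import java.util.Map;

public class EqualsHashCodeDemo {

    // hashCode() overridden, equals() inherited from Object (identity comparison).
    static final class HashOnlyKey {
        final int id;
        HashOnlyKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
    }

    // Both hashCode() and equals() are based on id.
    static final class ProperKey {
        final int id;
        ProperKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof ProperKey && ((ProperKey) o).id == id;
        }
    }

    static String lookupWithHashOnlyKey() {
        Map<HashOnlyKey, String> map = new HashMap<HashOnlyKey, String>();
        map.put(new HashOnlyKey(1), "v");
        return map.get(new HashOnlyKey(1)); // same bucket, but equals() fails
    }

    static String lookupWithProperKey() {
        Map<ProperKey, String> map = new HashMap<ProperKey, String>();
        map.put(new ProperKey(1), "v");
        return map.get(new ProperKey(1)); // same bucket and equals() succeeds
    }

    public static void main(String[] args) {
        System.out.println(lookupWithHashOnlyKey()); // null
        System.out.println(lookupWithProperKey());   // v
    }
}
```

hashCode() gets the lookup to the right bucket; equals() picks the right entry inside it. Break either one and get() stops finding what put() stored.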
This is all about the internal class structure of HashMap and its internal working.
Here we have discussed some important concepts and questions related to HashMap.
=====================End of article======================

Happy Learning
Nikhil
Location: Hyderabad, Telangana, India