Jan 18, 2018


TensorFlow and TensorBoard : Deep Learning terminology and tools

A machine learning algorithm is an algorithm that learns from data and applies that learning to make informed decisions. The most commonly known ML problems are classification, regression and clustering. In traditional ML-based systems, the inputs to the system (feature vectors) are chosen by experts (knowledgeable entities), which requires sound judgement.

Systems that figure out by themselves which features to pay attention to and use in feature vectors are termed "representation" ML-based systems. Deep learning systems are one type of representation system. In other words, deep learning algorithms learn which features (characteristics/attributes) deserve more attention, and each feature is represented by data points (a vector). Neural networks are the most common, de facto class of deep learning algorithms. They help find unknown patterns in massive data sets. "Neurons" are the most basic building blocks that actually "learn". The image below shows a high-level overview of where a deep learning algorithm fits.
Deep Learning (layers of interconnected neural networks) based binary classifier (Source: PluralSight)
What is the relation between machine learning and deep learning?
Deep learning is a subfield of machine learning. Machine learning gave artificial intelligence a big boost, and deep learning has accelerated the growth of AI even further. One way to see how much deep learning has influenced AI development: Google's deep learning program AlphaGo beat Go master Lee Sedol.

What is TensorFlow and TensorBoard - the de facto deep learning library and dashboard
The official TensorFlow documentation describes it as an open source software library for numerical computation using data flow graphs. Everything in TensorFlow is a graph: the nodes of the graph represent operations, and the data that flows through the graph and is transformed along the way is termed "tensors".
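
To make the "nodes are operations, data flows between them" idea concrete, here is a minimal sketch of a toy dataflow graph in plain Python. It does not use TensorFlow at all; the Node class and the constant helper are hypothetical names invented for this illustration.

```python
import math


class Node(object):
    """One operation in a toy dataflow graph (illustration only, not TensorFlow)."""

    def __init__(self, name, op, inputs=()):
        self.name = name
        self.op = op
        self.inputs = list(inputs)

    def run(self):
        # Evaluate the upstream nodes first, then apply this node's operation.
        values = [node.run() for node in self.inputs]
        return self.op(*values)


def constant(name, value):
    # A constant is simply a node with no inputs.
    return Node(name, lambda: value)


a = constant("a", 3.0)
b = constant("b", 4.0)
a_sq = Node("a_sq", lambda x: x * x, [a])
b_sq = Node("b_sq", lambda x: x * x, [b])
total = Node("total", lambda x, y: x + y, [a_sq, b_sq])
root = Node("root", math.sqrt, [total])

print(root.run())  # 5.0
```

Nothing is computed while the graph is being built; values only flow when run() is called on a node, which is roughly the role a session plays in the TensorFlow samples later in this post.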

Tensors : The central unit of data in TensorFlow. A tensor consists of a set of primitive values shaped into an array of any number of dimensions, e.g. [1,2,3], [[1,2],[2,3]], etc. Scalars (2, 3.5, etc.) are 0-D tensors, vectors such as [2,4,6] are 1-D tensors, and matrices are 2-D tensors.
Rank, shape and data type are the 3 important characteristics that define a tensor. The number of dimensions in a tensor is its rank, and the number of elements in each dimension is its shape.
Tensors                 Rank
4                       0
[1,2,3]                 1
[[1,3],[4,6]]           2
[[[1],[2],[4],[5]]]     3

Tensors                 Shape
4                       []
[1,2,3]                 [3]
[[1,4,3],[6,8,9]]       [2,3]
[[[1],[2],[4],[5]]]     [1,4,1]
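
Rank and shape are easy to check by hand. The sketch below computes both for plain nested Python lists, assuming the nesting is regular (every sub-list at a given depth has the same length). The helper names shape_of and rank_of are made up for this example and are not part of TensorFlow.

```python
def shape_of(value):
    """Return the shape of a regular nested list as a list of ints."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        # Assumes regular nesting, so the first element describes the rest.
        value = value[0]
    return shape


def rank_of(value):
    """The rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(value))


for tensor in [4, [1, 2, 3], [[1, 4, 3], [6, 8, 9]], [[[1], [2], [4], [5]]]]:
    print(rank_of(tensor), shape_of(tensor))
# 0 []
# 1 [3]
# 2 [2, 3]
# 3 [1, 4, 1]
```

Reading a shape from the outside in helps: [[1,4,3],[6,8,9]] is 2 rows of 3 elements each, so its shape is [2, 3].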
What makes TensorFlow so special, and why is it on the way to becoming the default library for machine learning?
  • TensorFlow provides a foundation for the development of new ML algorithms, and it supports large-scale distributed models from training to production. 
  • It provides a stable, efficient and performant Python API. 
  • TensorFlow goes hand in hand with TensorBoard, which gives a graphical (dashboard) view of the graph and its activity in enough detail for analysis.
TensorBoard is a suite of visualisation tools that we can use to visualise a TensorFlow graph, plot quantitative metrics about the execution of the graph, and show additional data such as images that pass through it.

Refer to the official TensorFlow guide to install and set up a TensorFlow environment. Once the environment is activated the shell prompt changes; open a Python prompt and execute a few commands (as shown below) to test the installation.
~ source /Users/n0r0082/TensorFlowWork/bin/activate
(TensorFlowWork) ➜  ~ python
Python 2.7.10 (default, Jul 15 2017, 17:16:57) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow')
>>> sess = tf.Session()
2018-01-16 23:26:11.178955: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
>>> print(sess.run(hello))
Hello, TensorFlow

To explore the usefulness of TensorBoard, we will perform some mathematical operations (square, power, square root and addition) using the TensorFlow Python API, construct a graph out of these operations and visualise it in TensorBoard.
import tensorflow as tf

#create constants
a = tf.constant(6.5,name='constant_a')
b = tf.constant(3.4,name='constant_b')
c = tf.constant(3.0,name='constant_c')
d = tf.constant(100.2,name='constant_d')

#compute square, power and Sqrt
square = tf.square(a,name="square_a")
power = tf.pow(b,c,name="power_b_c")
sqrt =tf.sqrt(d,name="sqrt_d")

#Add all above computed results 
final_sum = tf.add_n([square,power,sqrt], "total_sum")

#create session 
sess = tf.Session()

#Execute above operations using session object
print "square is ", sess.run(square)
print "power is ", sess.run(power)
print "sqrt is ", sess.run(sqrt)
print "total sum is ", sess.run(final_sum)


# Create a writer object for TensorBoard and save the operations as a graph at ./tensorBoardDir
writer = tf.summary.FileWriter('./tensorBoardDir',sess.graph)
#close resources used. 
writer.close()
sess.close()

Execute the above sample and check the results:
(TensorFlowWork) ➜  samples python sample-2.py
2018-01-17 15:08:16.781469: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
square is  42.25
power is  39.304005
sqrt is  10.0099945
total sum is  91.563995
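
As a sanity check, the same arithmetic can be reproduced with the standard math module, without TensorFlow. This is only a sketch; plain Python floats are 64-bit while the TensorFlow constants in the sample default to 32-bit floats, so the last printed digits can differ slightly from the output above.

```python
import math

# Same inputs as constant_a .. constant_d in the sample.
a, b, c, d = 6.5, 3.4, 3.0, 100.2

square = a ** 2          # counterpart of tf.square(a)
power = b ** c           # counterpart of tf.pow(b, c)
sqrt = math.sqrt(d)      # counterpart of tf.sqrt(d)
total = square + power + sqrt  # counterpart of tf.add_n([...])

print("square is", square)
print("power is", power)
print("sqrt is", sqrt)
print("total sum is", total)
```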

Using the following command we can start TensorBoard, which serves the dashboard at the URL it prints.
(TensorFlowWork) ➜  ~ tensorboard --logdir="tensorBoardDir"
TensorBoard 0.4.0rc3 at http://m-C02T75XBG8WN:6006 (Press CTRL+C to quit)

Launch the dashboard in a browser at "http://0.0.0.0:6006". The screenshot below shows each operation as a node and the data flowing from one operation to another as a tensor.

Each node represents an operation, and the outputs of the three operations (square, power and square root) are tensors that feed into another operation, final_sum.

Image recognition and processing with TensorFlow : An image is a collection of pixels, and each pixel contains some value. A coloured image is represented as RGB values (3 channels) for each pixel, while in a greyscale image each pixel stores an intensity value between 0.0 and 1.0 (1 channel). Here we will read an image, store it as a tensor with the TensorFlow library, and then transpose it, which swaps its height and width axes.
import tensorflow as tf
import matplotlib.image as mp_img
import matplotlib.pyplot as plot
import os

#input file 
filename = "./input.png"

#get image object from input file 
image = mp_img.imread(filename)

print "Image shape is: ", image.shape
print "Image array is: ", image

#plot.imshow(image)
#plot.show()

#create a variable for holding image data 
x = tf.Variable(image, name='x')

#initialize all global variables
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    #compute transpose of variable x (image matrix data)
    transpose = tf.image.transpose_image(x)
    #Run transpose data.
    result = sess.run(transpose)
    # Display shape of image - first two values should be swapped. 
    print "Transposed image shape: ", result.shape
    plot.imshow(result)
    plot.show()

Sample output: -
(TensorFlowWork) ➜  samples python sample-5.py
Image shape is:  (816, 866, 4)
Image array is:  [[[0.30980393 0.10980392 0.27058825 1.        ]
  [0.30588236 0.10196079 0.27058825 1.        ]
  [0.31764707 0.10980392 0.28235295 1.        ]
  ...
  [0.18431373 0.05490196 0.17254902 1.        ]
  [0.18039216 0.04705882 0.16470589 1.        ]
  [0.16862746 0.03921569 0.15686275 1.        ]]

 [[0.29803923 0.10196079 0.2627451  1.        ]
  [0.30588236 0.10588235 0.27058825 1.        ]
  [0.32156864 0.11372549 0.28627452 1.        ]
  ...
  [0.18039216 0.05098039 0.16862746 1.        ]
  [0.17254902 0.03921569 0.15686275 1.        ]
  [0.1882353  0.05882353 0.1764706  1.        ]]
  ...
  [0.07843138 0.02745098 0.09411765 1.        ]
  [0.07843138 0.02745098 0.09411765 1.        ]
  [0.07843138 0.02745098 0.09803922 1.        ]]]

Transposed image shape:  (866, 816, 4)

Image orientation comparison :- The first image is the original (un-comment lines #15 and #16 to generate it) and the second image is the transposed image.

Original image with shape (816,866,4), i.e. 4 channels.

Original image displayed via TensorFlow

Transposed image displayed via TensorFlow
Transposed image with shape (866,816,4), still with 4 channels. If the original axis indices are 0,1,2, the transpose swaps the first two axes, so the image is mirrored across its main diagonal (rows become columns) rather than simply rotated by 90°.
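
The axis swap can be illustrated on a tiny image held as nested Python lists indexed [row][column][channel]. This is a sketch of the idea only, not of how TensorFlow implements it; transpose_image here is a plain function written for this example.

```python
def transpose_image(image):
    """Swap the height and width axes of an image stored as [H][W][C] lists."""
    height = len(image)
    width = len(image[0])
    # The pixel at (row, col) moves to (col, row); channel values are untouched.
    return [[image[row][col] for row in range(height)] for col in range(width)]


# A 2 x 3 image with 3 channels per pixel.
image = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[9, 9, 9], [5, 5, 5], [0, 0, 0]],
]
flipped = transpose_image(image)

print(len(image), len(image[0]), len(image[0][0]))        # 2 3 3
print(len(flipped), len(flipped[0]), len(flipped[0][0]))  # 3 2 3
```

Transposing twice gives back the original image, which is a quick way to convince yourself that only the first two axes are exchanged.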

Note :- Using the TensorFlow library we can also resize and flip images.

===== ****** =====


Jan 13, 2018


Elasticsearch Tutorial : hands-on guide to learn elasticsearch

Metric and Bucketing aggregation: aggregation queries in elastic search

In previous posts (Boolean compound queries, Full Text Query, Query term and Source filtering, and Elasticsearch in filter context) we ran searches in query and filter context and retrieved documents and the specific fields of interest. For analytics we want a concise, result-oriented response instead of verbose documents. The main agenda of this post is to use Elasticsearch aggregations to execute analytical queries and get an appropriately concise response.

Prerequisite :-
1. Elasticsearch instance is up and running.
2. Prepare dataset for analytical queries.
  • Open https://www.json-generator.com/ in browser.
  • Copy and paste the sample template below to generate compact JSON, and save it in the file "employees.json":
    [
      '{{repeat(1000, 1000)}}',
      {
        name: '{{firstName()}} {{surname()}}',
        age: '{{integer(18, 75)}}',
        salary:'{{integer(10000,80000)}}',
        gender: '{{gender()}}',
        email: '{{email()}}',
        phone: '+1 {{phone()}}',
        street: '{{integer(100, 999)}} {{street()}}',
        city: '{{city()}}',
        state: '{{state()}}, {{integer(100, 10000)}}'
      }
    ]
3. Clean and format the JSON file (using Sublime Text find-and-replace in regex mode):
  --> Remove the opening [ and closing ] of the array from the file.
  --> Using regex replace, replace "},{" with "}\n{"
  --> Add index info before each payload: find {"name" and replace it with {"index" : {}}\n{"name"
Alternatively, download the cleaned and processed file.
4. Make sure the JSON file ends with a new line.
5. Using the "_bulk" API, load all documents into the index employees.
curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/employees/personal/_bulk?pretty&refresh' --data-binary @"employees.json"
6. Verify that 1000 documents are in place in the index employees (type name personal).
➜  Desktop curl -XGET 'localhost:9200/_cat/indices?v&pretty'                                                                                           
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   employee  LU8xvoyMRwi-0o5K2JCyMg   5   1        100            0     86.1kb         86.1kb
yellow open   employees 3MQomR4CSRCLYYzywkZ9vg   5   1       1000            0     94.1kb         94.1kb
yellow open   customers KSbOq8eySwGgvJdH7VfQWQ   5   1       1000            0    485.1kb        485.1kb
yellow open   products  Lf8I7-H1QPeU6DrDyHWb1A   5   1          8            0     17.5kb         17.5kb
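
The manual clean-up in steps 3 and 4 can also be scripted. Below is a minimal sketch that turns a list of documents into the newline-delimited body the "_bulk" API expects: an action line followed by the document itself, with a trailing newline at the end. The function name to_bulk_ndjson and the sample documents are invented for this example.

```python
import json


def to_bulk_ndjson(documents):
    """Return a _bulk body: one action line plus one source line per document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document source
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample; in practice the list would come from json.load()
# on the array generated with the template above.
sample = [
    {"name": "Ann Lee", "age": 30, "gender": "female"},
    {"name": "Bob Ray", "age": 41, "gender": "male"},
]
body = to_bulk_ndjson(sample)
print(body)
```

Writing body to a file gives something that can be passed to the curl command from step 5 via --data-binary.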

Metrics aggregations

Find average salary of all employees : The query below is an example of an aggregation query. It finds the average salary of all employees. Here "aggs" indicates that the request contains aggregations, "avg_age" is the name under which the result is returned (just a label, even though the value is an average salary), and "avg" is the aggregate function that computes the average. Finally, "field" indicates which field the average is computed on. Since an aggregation query is not interested in the documents themselves, size is set to 0.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
    "aggs" : {
        "avg_age" : {
             "avg" : {
                 "field" : "salary"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_age" : {
      "value" : 44966.073
    }
  }
}
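
When such queries are sent from a program rather than curl, the request body is usually built as a dictionary and the result read from the "aggregations" key of the response. The sketch below uses only the json module; metric_agg_body is a hypothetical helper, and the response string is a trimmed copy of the output shown above rather than a live call.

```python
import json


def metric_agg_body(result_name, function, field):
    """Build a size-0 search body holding a single metric aggregation."""
    return {"size": 0, "aggs": {result_name: {function: {"field": field}}}}


body = metric_agg_body("avg_salary", "avg", "salary")
print(json.dumps(body, indent=2))

# Trimmed copy of the response above, parsed the way a client would parse it.
response = json.loads('{"aggregations": {"avg_age": {"value": 44966.073}}}')
print(response["aggregations"]["avg_age"]["value"])  # 44966.073
```

The result name ("avg_salary" here, "avg_age" in the curl example) is chosen by the caller and is echoed back as the key under "aggregations".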

Aggregate query (metric aggregation) with search query
: The query below finds the average salary of all employees who are female or whose state value contains "Mississippi".
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
    "size" : 0,
     "query" : {
          "bool" : {
                "should": [
                    { "match": { "state": "Mississippi" } },
                    { "match": { "gender": "female" } }
                  ]
           }
     },
    "aggs" : {
        "avg_age" : {
             "avg" : {
                 "field" : "salary"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 483,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_age" : {
      "value" : 44222.351966873706
    }
  }
}

Elasticsearch stats aggregation
-  A wide range of statistics in one query. The query below uses the "stats" aggregation to compute statistics on the salary field. It returns the count, min, max, average and sum of salary in one go.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
    "size" : 0,
    "aggs" : {
        "age_stats" : {
             "stats" : {
                 "field" : "salary"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_stats" : {
      "count" : 1000,
      "min" : 10026.0,
      "max" : 79968.0,
      "avg" : 44966.073,
      "sum" : 4.4966073E7
    }
  }
}
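
For intuition, the five numbers in the "stats" response correspond to the following plain-Python computation over the field values. This is a local sketch on a made-up list of salaries, not a description of how Elasticsearch computes them across shards.

```python
def stats(values):
    """Compute count, min, max, avg and sum, mirroring the stats response keys."""
    values = list(values)
    count = len(values)
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)) if count else None,
        "max": float(max(values)) if count else None,
        "avg": total / count if count else None,
        "sum": total,
    }


# Hypothetical salaries, not taken from the generated dataset.
salaries = [10026, 25000, 44000, 79968]
result = stats(salaries)
print(result)
```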

Number of unique values of a given field (cardinality of field) : The query below finds the number of unique age values across all documents in the index. The "cardinality" aggregation returns the number of unique values of the age field.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
    "aggs" : {
        "age_count" : {
             "cardinality" : {
                 "field" : "age"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "value" : 58
    }
  }
}
Note :- Fielddata is disabled on text fields by default, so if we change the field name to "gender", Elasticsearch throws an exception - "Fielddata is disabled on text fields by default. Set fielddata=true on [gender] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."

What is "fielddata" in the context of Elasticsearch (ELS)?
In ELS, text field values used for aggregations, sorting, etc. are held in an in-memory data structure called fielddata. Fielddata is built on demand the first time a field is used for such an operation. Since fielddata on text fields takes up a lot of heap space, it is disabled by default on text fields.
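
The error message above speaks of "uninverting the inverted index". The toy sketch below shows what that means conceptually: a search index maps each term to the documents containing it, while aggregations need the opposite direction, from document to terms. The data structures here are plain dictionaries chosen for illustration and are not Elasticsearch's actual internals.

```python
from collections import defaultdict

# Hypothetical documents: id -> text value of one field.
docs = {1: "male", 2: "female", 3: "male"}

# Inverted index: term -> set of document ids (good for searching).
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)


def uninvert(inverted_index):
    """Rebuild a doc id -> terms mapping (good for aggregating and sorting)."""
    field_data = defaultdict(list)
    for term in sorted(inverted_index):
        for doc_id in inverted_index[term]:
            field_data[doc_id].append(term)
    return dict(field_data)


field_data = uninvert(inverted)
print(field_data[1], field_data[2], field_data[3])
```

The rebuilt mapping has to hold every term of every document in memory at once, which is why it gets expensive for large text fields.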

How to enable fielddata on a text field in Elasticsearch?

Using the "_mapping" API of Elasticsearch we can enable fielddata on a text field. The sample request below enables fielddata on gender.
➜  Desktop curl -XPUT 'localhost:9200/employees/_mapping/personal?pretty' -d'
{
  "properties": {
    "gender": {
      "type":     "text",
      "fielddata": true
    }
  }
}
' -H 'Content-Type: application/json'
{
  "acknowledged" : true
}

Once fielddata is enabled on gender, we can run the cardinality aggregation on this text field. The query below returns a cardinality of 2 for gender.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
    "aggs" : {
        "age_count" : {
             "cardinality" : {
                 "field" : "gender"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_count" : {
      "value" : 2
    }
  }
}
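
Locally, the cardinality of a field is just the size of the set of its values, as in the sketch below. One difference worth knowing: the Elasticsearch cardinality aggregation returns an approximate count (it is based on the HyperLogLog++ algorithm), whereas this hypothetical helper is exact and only practical for small in-memory data.

```python
def exact_cardinality(documents, field):
    """Count distinct values of a field across documents that have it."""
    return len(set(doc[field] for doc in documents if field in doc))


# Hypothetical documents for illustration.
employees = [
    {"age": 18, "gender": "male"},
    {"age": 30, "gender": "female"},
    {"age": 30, "gender": "male"},
    {"age": 75, "gender": "female"},
]
print(exact_cardinality(employees, "age"))     # 3
print(exact_cardinality(employees, "gender"))  # 2
```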

Bucketing aggregations

In Elasticsearch each document is indexed and associated with some type name (logical grouping). For analytics, we can logically group the indexed documents into buckets, where each bucket satisfies some criterion; this is termed bucketing aggregation. Bucketing aggregation is performed using the "_search" API.

Bucket aggregations by field values : The query below runs a terms aggregation and divides all documents in the index employees into two categories. All 1000 documents are split into two buckets of 524 and 476 documents.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
   "aggs" : {
        "gender_bucket" : {
             "terms" : {
                 "field" : "gender"
             }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 16,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "gender_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "male",
          "doc_count" : 524
        },
        {
          "key" : "female",
          "doc_count" : 476
        }
      ]
    }
  }
}
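
A terms aggregation is essentially a "count per distinct value". A rough local equivalent, sketched with collections.Counter on made-up documents, looks like this. The default of 10 buckets ordered by descending doc count mirrors the default behaviour of the terms aggregation; terms_buckets itself is a name invented for this example.

```python
from collections import Counter


def terms_buckets(documents, field, size=10):
    """Return the top `size` values of a field with their document counts."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]


# Hypothetical documents for illustration.
employees = [
    {"gender": "male"}, {"gender": "female"}, {"gender": "male"},
    {"gender": "male"}, {"gender": "female"},
]
buckets = terms_buckets(employees, "gender")
print(buckets)
```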

Range aggregation with key/value response : With "keyed" : true we indicate that the buckets should be returned as key/value pairs (an object keyed by range) rather than as an array. The query below returns its response in that form.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
   "aggs" : {
       "age_ranges" : {
           "range" : {
               "field" : "age",
               "keyed" : true,
               "ranges" : [
                   { "to" : 30 },
                   { "from" : 30, "to" : 40 },
                   { "from" : 40, "to" : 55 },
                   { "from" : 55 }
                ]
            }
        }
     }
}
' -H 'Content-Type: application/json'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_ranges" : {
      "buckets" : {
        "*-30.0" : {
          "to" : 30.0,
          "doc_count" : 216
        },
        "30.0-40.0" : {
          "from" : 30.0,
          "to" : 40.0,
          "doc_count" : 174
        },
        "40.0-55.0" : {
          "from" : 40.0,
          "to" : 55.0,
          "doc_count" : 248
        },
        "55.0-*" : {
          "from" : 55.0,
          "doc_count" : 362
        }
      }
    }
  }
} 
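Range aggregation with custom key value : Here we pass a key for each range in the request, and that key is used in the output ("young", "quarter-aged", etc.) instead of a generated key such as "30.0-40.0" as displayed above. Note that the "middle-aged" (45 to 65) and "senior" (from 55) ranges overlap, so documents with age 55 to 64 are counted in both buckets.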
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
   "aggs" : {
       "age_ranges" : {
           "range" : {
               "field" : "age",
               "keyed" : true,
               "ranges" : [
                   { "key": "young", "to" : 35 },
                   { "key": "quarter-aged", "from" : 35, "to" : 45 },
                   { "key": "middle-aged", "from" : 45, "to" : 65 },
                   { "key": "senior", "from" : 55 }
                ]
            }
        }
     }
}
' -H 'Content-Type: application/json'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "age_ranges" : {
      "buckets" : {
        "young" : {
          "to" : 35.0,
          "doc_count" : 302
        },
        "quarter-aged" : {
          "from" : 35.0,
          "to" : 45.0,
          "doc_count" : 169
        },
        "middle-aged" : {
          "from" : 45.0,
          "to" : 65.0,
          "doc_count" : 324
        },
        "senior" : {
          "from" : 55.0,
          "doc_count" : 362
        }
      }
    }
  }
}
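
The bucketing rule of the range aggregation is that "from" is inclusive and "to" is exclusive, and each range is evaluated independently, so a value can land in more than one bucket when ranges overlap. The sketch below applies the same rule to a hypothetical list of ages; range_buckets is a helper written for this example.

```python
def range_buckets(values, ranges):
    """Count values per range; "from" is inclusive, "to" is exclusive."""
    buckets = {}
    for spec in ranges:
        low = spec.get("from")
        high = spec.get("to")
        count = 0
        for value in values:
            if (low is None or value >= low) and (high is None or value < high):
                count += 1
        buckets[spec["key"]] = count
    return buckets


# Hypothetical ages; the ranges are the ones used in the query above.
ages = [18, 34, 35, 44, 45, 55, 60, 64, 65, 75]
ranges = [
    {"key": "young", "to": 35},
    {"key": "quarter-aged", "from": 35, "to": 45},
    {"key": "middle-aged", "from": 45, "to": 65},
    {"key": "senior", "from": 55},
]
counts = range_buckets(ages, ranges)
print(counts)
```

With these 10 ages the bucket counts add up to 13, because 55, 60 and 64 fall into both "middle-aged" and "senior".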

Nested aggregation - Multi-level aggregation 

Metric aggregation inside bucketing aggregation (two-level nesting) : Find the average age of each gender. The query below first performs a bucketing aggregation and then computes the average age within each bucket. The outer "aggs" keyword specifies the gender bucket and the inner "aggs" does the average computation.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
   "aggs" : {
        "gender_bucket" : {
             "terms" : {
                 "field" : "gender"
             },
             "aggs": {
                 "average_age": {
                      "avg": {
                          "field": "age"
                      }
                 }
              }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "gender_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "male",
          "doc_count" : 524,
          "average_age" : {
            "value" : 47.333969465648856
          }
        },
        {
          "key" : "female",
          "doc_count" : 476,
          "average_age" : {
            "value" : 45.71848739495798
          }
        }
      ]
    }
  }
}
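
The two levels correspond to "group first, then compute a metric per group". A local sketch of the same idea on a few hypothetical documents (average_by is a name made up for this example):

```python
from collections import defaultdict


def average_by(documents, group_field, value_field):
    """Group documents by one field and average another field per group."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[value_field])
    return {key: sum(vals) / float(len(vals)) for key, vals in groups.items()}


# Hypothetical documents for illustration.
employees = [
    {"gender": "male", "age": 30},
    {"gender": "male", "age": 50},
    {"gender": "female", "age": 44},
]
averages = average_by(employees, "gender", "age")
print(averages)
```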

Find average age of male and female within age ranges (three-level nesting) : The outermost "aggs" specifies a bucketing aggregation which divides the documents into two buckets (male and female). The second "aggs" splits each bucket into age ranges, and the innermost "aggs" finds the average age of each age range. Note that ages 50 to 54 fall in none of the ranges below, so the range doc counts do not add up to the gender totals.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?&pretty' -d'
{
   "size" : 0,
   "aggs" : {
        "gender_bucket" : {
             "terms" : {
                 "field" : "gender"
             },
             "aggs" : {
                 "age_ranges" : {
                     "range" : {
                         "field" : "age",
                         "keyed" : true,
                         "ranges" : [
                             { "key": "young", "to" : 35 },
                             { "key": "middle-aged", "from" : 35, "to" : 50 },
                             { "key": "senior", "from" : 55 }
                          ]
                      },
                      "aggs": {
                          "average_age": {
                               "avg": {
                                   "field": "age"
                               }
                          }
                       }
                  }
               }
         }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "gender_bucket" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "male",
          "doc_count" : 524,
          "age_ranges" : {
            "buckets" : {
              "young" : {
                "to" : 35.0,
                "doc_count" : 145,
                "average_age" : {
                  "value" : 26.048275862068966
                }
              },
              "middle-aged" : {
                "from" : 35.0,
                "to" : 50.0,
                "doc_count" : 135,
                "average_age" : {
                  "value" : 42.19259259259259
                }
              },
              "senior" : {
                "from" : 55.0,
                "doc_count" : 192,
                "average_age" : {
                  "value" : 65.734375
                }
              }
            }
          }
        },
        {
          "key" : "female",
          "doc_count" : 476,
          "age_ranges" : {
            "buckets" : {
              "young" : {
                "to" : 35.0,
                "doc_count" : 157,
                "average_age" : {
                  "value" : 25.840764331210192
                }
              },
              "middle-aged" : {
                "from" : 35.0,
                "to" : 50.0,
                "doc_count" : 112,
                "average_age" : {
                  "value" : 41.517857142857146
                }
              },
              "senior" : {
                "from" : 55.0,
                "doc_count" : 170,
                "average_age" : {
                  "value" : 65.5
                }
              }
            }
          }
        }
      ]
    }
  }
}
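
Deeply nested request bodies are easy to get wrong by hand. One possible approach, sketched below, is to describe the levels from outermost to innermost and let a small helper wrap them into nested "aggs" objects. The helper nest and its (name, type, params) tuple format are assumptions made for this example, not an Elasticsearch client API.

```python
import json


def nest(levels):
    """Build a nested "aggs" body from (name, type, params) levels, outermost first."""
    body = None
    for name, agg_type, params in reversed(levels):
        node = {agg_type: params}
        if body is not None:
            # Whatever was built so far becomes the sub-aggregation of this level.
            node["aggs"] = body
        body = {name: node}
    return {"size": 0, "aggs": body}


levels = [
    ("gender_bucket", "terms", {"field": "gender"}),
    ("age_ranges", "range", {
        "field": "age",
        "keyed": True,
        "ranges": [
            {"key": "young", "to": 35},
            {"key": "middle-aged", "from": 35, "to": 50},
            {"key": "senior", "from": 55},
        ],
    }),
    ("average_age", "avg", {"field": "age"}),
]
request_body = nest(levels)
print(json.dumps(request_body, indent=2))
```

The printed JSON has the same structure as the three-level query above and could be sent as the request body.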

Filter aggregation  

Find the average salary of all employees who belong to Minnesota. This is also a nested aggregation: first a filter is applied, and then an aggregation computes the average salary of the matching documents.
➜  Desktop curl -XPOST 'localhost:9200/employees/_search?size=0&pretty' -d'
{
    "aggs" : {
        "state" : {
            "filter" : { "term": { "state": "minnesota" } },
            "aggs" : {
                "avg_age" : { "avg" : { "field" : "salary" } }
            }
        }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "state" : {
      "doc_count" : 17,
      "avg_age" : {
        "value" : 43855.35294117647
      }
    }
  }
}
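
Note that the filter uses the lowercase term "minnesota". A term query is not analyzed, while the text in the state field is analyzed at index time (the default standard analyzer lowercases and tokenizes it), so the term has to match an indexed token exactly. The sketch below imitates this with a very rough stand-in for the analyzer; analyze and filter_avg are hypothetical helpers and the documents are made up.

```python
import re


def analyze(text):
    """Very rough stand-in for an analyzer: lowercase and split into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_avg(documents, field, term, value_field):
    """Keep documents whose analyzed field contains the exact term, then average."""
    matched = [doc for doc in documents if term in analyze(doc[field])]
    if not matched:
        return {"doc_count": 0, "avg": None}
    total = sum(doc[value_field] for doc in matched)
    return {"doc_count": len(matched), "avg": total / float(len(matched))}


# Hypothetical documents shaped like the generated dataset.
employees = [
    {"state": "Minnesota, 4532", "salary": 40000},
    {"state": "Ohio, 120", "salary": 50000},
    {"state": "Minnesota, 87", "salary": 60000},
]
print(filter_avg(employees, "state", "minnesota", "salary"))
print(filter_avg(employees, "state", "Minnesota", "salary"))
```

The second call finds nothing because the capitalised term does not equal any lowercased token, which is the same reason a term filter on "Minnesota" would return no documents for a text field.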

====******======

Jan 12, 2018


Elasticsearch in filter context and query context - How to retrieve documents using filter

In previous posts (Boolean compound queries, Full Text Query, and Query term and Source filtering) we performed search operations in query context and retrieved documents using the "query" keyword. Elasticsearch also provides search capability in "filter" context.

Filter context search operation

When a query is run in filter context, document relevance scores play no role: a filter simply answers Yes (the document is included in the response) or No (the document is excluded), and does not contribute to the score. In the first example below every hit has a score of 1.0.

Filter with boolean compound queries : The query below filters documents based on age and only allows documents with age in the range 30 to 35 (inclusive) in the response.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "age": {
            "gte": 30,
            "lte": 35
          }
        }
      }
    }
  },
  "size": 2,
  "_source" :["nam*", "age"]
}
' -H 'Content-Type: application/json'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 97,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "7n_g6mABB3_D7Pc85hFw",
        "_score" : 1.0,
        "_source" : {
          "name" : "Green Boyer",
          "age" : 32
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "G3_g6mABB3_D7Pc85hJw",
        "_score" : 1.0,
        "_source" : {
          "name" : "Randall Sutton",
          "age" : 31
        }
      }
    ]
  }
}

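Using filters along with search terms : 
The query below uses a filter inside a bool query: the "must" clause matches state "mississippi" in query context, and the filter keeps only documents with gender = female and age >= 45. The response contains the matching records with gender female and age of 45 or more, scored by the "must" clause.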

➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
  "query": {
    "bool": {
      "must":
        { "match": {
             "state":   "mississippi"
        }
      },
      "filter": [
        { "term":  { "gender": "female" }},
        { "range": { "age": { "gte": "45" }}}
      ]
    }
  },
  "size": 2,
  "_source" :["nam*", "age", "gend*"]
}
' -H 'Content-Type: application/json'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 4.326261,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "P3_g6mABB3_D7Pc85hJw",
        "_score" : 4.326261,
        "_source" : {
          "gender" : "female",
          "name" : "Regina Frederick",
          "age" : 49
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "J3_g6mABB3_D7Pc85hNx",
        "_score" : 4.326261,
        "_source" : {
          "gender" : "female",
          "name" : "Alfreda Kent",
          "age" : 75
        }
      }
    ]
  }
}
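
The split between the two contexts can be pictured as follows: filter clauses are yes/no checks that only decide whether a document is kept, while the "must" clause both has to match and produces the score used for ranking. The sketch below is a toy model of that split. The scoring function simply counts term occurrences and is in no way Elasticsearch's real relevance scoring; all names and documents are hypothetical.

```python
def bool_search(documents, must_score, filters):
    """Keep documents passing every filter, then rank them by the must score."""
    hits = []
    for doc in documents:
        # Filter context: plain yes/no, no influence on the score.
        if not all(check(doc) for check in filters):
            continue
        # Query context: the document must match, and the match is scored.
        score = must_score(doc)
        if score > 0:
            hits.append((score, doc))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


def state_score(doc):
    # Toy score: how many times the term appears in the state field.
    return doc["state"].lower().split().count("mississippi")


customers = [
    {"name": "Regina", "state": "mississippi 12", "gender": "female", "age": 49},
    {"name": "Tom", "state": "mississippi 7", "gender": "male", "age": 60},
    {"name": "Ann", "state": "ohio 3", "gender": "female", "age": 70},
    {"name": "Eve", "state": "mississippi mississippi", "gender": "female", "age": 45},
]
filters = [
    lambda doc: doc["gender"] == "female",
    lambda doc: doc["age"] >= 45,
]
results = bool_search(customers, state_score, filters)
print([(score, doc["name"]) for score, doc in results])
```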
 =======*****======