Jan 12, 2018

Elasticsearch Boolean compound queries: must, should and must_not

In the previous post we executed full-text search queries using "match", "match_phrase" and "match_phrase_prefix". The main agenda of this post is to explore the compound boolean query features of Elasticsearch.
A boolean query matches documents by combining multiple queries using boolean operators such as OR and AND. In Elasticsearch a boolean compound query is built from three constructs: must, should and must_not.
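  • must - The clause must appear in matching documents
  • should - The clause should appear in matching documents but is not strictly required; if the bool query has no must or filter clause, at least one should clause has to match
  • must_not - The clause must not appear in the matching documents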
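
To make the three constructs concrete before running them against Elasticsearch, here is a small in-memory sketch in Python. It is only a toy model of the semantics and not how Elasticsearch works internally: the helpers tokens, match and bool_query are hypothetical names, and splitting on whitespace is a rough stand-in for the standard analyzer.

```python
# Toy model of bool-query semantics; not how Elasticsearch is implemented.
def tokens(text):
    """Lowercase and split on whitespace (rough stand-in for an analyzer)."""
    return set(text.lower().split())

def match(field, query):
    """Predicate: true when the field shares at least one term with the query."""
    wanted = tokens(query)
    return lambda doc: bool(tokens(doc[field]) & wanted)

def bool_query(docs, must=(), should=(), must_not=()):
    hits = []
    for doc in docs:
        # must: every clause has to match (AND).
        if not all(clause(doc) for clause in must):
            continue
        # must_not: no clause may match.
        if any(clause(doc) for clause in must_not):
            continue
        # should: with no must clause, at least one should clause has to match (OR).
        if should and not must and not any(clause(doc) for clause in should):
            continue
        hits.append(doc)
    return hits

docs = [
    {"street": "521 Bushwick Avenue"},
    {"street": "566 Bushwick Court"},
    {"street": "159 Lester Court"},
]

both = bool_query(docs, must=[match("street", "bushwick"), match("street", "avenue")])
either = bool_query(docs, should=[match("street", "bushwick"), match("street", "avenue")])
neither = bool_query(docs, must_not=[match("street", "bushwick")])

print([d["street"] for d in both])
print([d["street"] for d in either])
print([d["street"] for d in neither])
```

Only the first street satisfies both must clauses, the first two satisfy at least one should clause, and the last one is the only street left after excluding "bushwick".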
Prerequisite:
1. Elasticsearch should be running, at least in standalone mode.
2. Set up customer documents in the index "customers". Refer to this and this post to index customer documents in bulk using the "_bulk" API.

Using must: Retrieves all documents that match all clauses under the must construct. The query below retrieves all documents whose street contains both "bushwick" and "avenue".
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty'  -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "street": "bushwick" } },
        { "match": { "street": "avenue" } }
      ]
    }
  },
  "_source": "st*"
}
' -H 'Content-Type: application/json'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 6.169423,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "FX_g6mABB3_D7Pc85hJw",
        "_score" : 6.169423,
        "_source" : {
          "street" : "521 Bushwick Avenue",
          "state" : "Nevada, 3794"
        }
      }
    ]
  }
}

Note: the must construct implicitly applies the AND operator and retrieves only documents matching all of its clauses.
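
As a side note, the request body for a must query is easy to generate from a list of terms. The Python sketch below builds it, together with an alternative form that uses the "operator" parameter of the match query and should select the same documents. The helper names must_all_terms and match_with_and are hypothetical, and the sketch only builds the JSON; it does not send it to Elasticsearch.

```python
import json

def must_all_terms(field, terms):
    """Build a bool/must body with one match clause per term (hypothetical helper)."""
    return {"query": {"bool": {"must": [{"match": {field: t}} for t in terms]}}}

def match_with_and(field, terms):
    """Build a single match query whose terms are combined with AND."""
    return {"query": {"match": {field: {"query": " ".join(terms), "operator": "and"}}}}

must_body = must_all_terms("street", ["bushwick", "avenue"])
and_body = match_with_and("street", ["bushwick", "avenue"])

# Either body can be passed to curl with -d.
print(json.dumps(must_body, indent=2))
print(json.dumps(and_body, indent=2))
```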

Using should: Retrieves all documents that match at least one of the clauses under the should construct. The query below retrieves all documents whose street contains "bushwick" or "avenue".
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty'  -d'
{
  "query": {
    "bool": {
      "should": [
        { "match": { "street": "bushwick" } },
        { "match": { "street": "avenue" } }
      ]
    }
  },
  "_source": "st*",
  "size": 2
}
' -H 'Content-Type: application/json'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 216,
    "max_score" : 6.169423,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "FX_g6mABB3_D7Pc85hJw",
        "_score" : 6.169423,
        "_source" : {
          "street" : "521 Bushwick Avenue",
          "state" : "Nevada, 3794"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "en_g6mABB3_D7Pc85hVz",
        "_score" : 4.9511213,
        "_source" : {
          "street" : "566 Bushwick Court",
          "state" : "New Hampshire, 1810"
        }
      }
    ]
  }
}
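Note:
- The should construct applies the OR operator and retrieves all documents where at least one of the match clauses is found. Documents matching more clauses score higher, which is why "521 Bushwick Avenue" (6.169423) ranks above "566 Bushwick Court" (4.9511213).
- The basic difference between must and should: must combines its clauses with AND, while should combines them with OR.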
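
The ranking effect of should can be sketched in a few lines of Python. This is a simplification under stated assumptions: counting matched clauses stands in for the real relevance score, and should_search and matched_clauses are hypothetical helper names. The minimum_should_match argument mirrors the bool query parameter of the same name, which sets how many should clauses a document has to satisfy.

```python
def terms(text):
    """Lowercase and split on whitespace (rough stand-in for an analyzer)."""
    return set(text.lower().split())

def matched_clauses(street, should_terms):
    """Count how many single-term should clauses the street satisfies."""
    street_terms = terms(street)
    return sum(1 for term in should_terms if term in street_terms)

def should_search(streets, should_terms, minimum_should_match=1):
    scored = [(matched_clauses(s, should_terms), s) for s in streets]
    kept = [(n, s) for n, s in scored if n >= minimum_should_match]
    # More matching clauses first; the count is a stand-in for the real score.
    return [s for n, s in sorted(kept, key=lambda pair: -pair[0])]

streets = ["566 Bushwick Court", "521 Bushwick Avenue", "159 Lester Court"]

ranked = should_search(streets, ["bushwick", "avenue"])
strict = should_search(streets, ["bushwick", "avenue"], minimum_should_match=2)

print(ranked)
print(strict)
```

With the default of one required clause the street matching both terms comes first; raising minimum_should_match to 2 makes should behave like must for this pair of clauses.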

Using must_not: It is a filtering construct; it leaves out every document that matches any of its clauses. The query below retrieves all documents whose street contains none of the names "mermaid", "Livonia", "broome" or "Kaufman". The result can be validated from the total number of returned documents: of the 1000 documents in the index, 4 contain one of these names (1000 - 4 = 996).
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty'  -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "street": "mermaid Livonia" } }, 
        { "match": { "street": "broome Kaufman" } }
      ]
    }
  },
  "_source": "st*",
  "size": 2
}
' -H 'Content-Type: application/json'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 996,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "tn_g6mABB3_D7Pc85hFv",
        "_score" : 1.0,
        "_source" : {
          "street" : "159 Lester Court",
          "state" : "New Jersey, 4742"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "uX_g6mABB3_D7Pc85hFv",
        "_score" : 1.0,
        "_source" : {
          "street" : "717 Ira Court",
          "state" : "Rhode Island, 3493"
        }
      }
    ]
  }
}

Note: The above query can also be written with a single match clause:
curl -XGET 'localhost:9200/customers/_search?pretty'  -d'
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "street": "mermaid Livonia broome Kaufman" } }
      ]
    }
  },
  "_source": "st*",
  "size": 2
}
' -H 'Content-Type: application/json'
Note: the match query on "mermaid Livonia broome Kaufman" applies the OR operator between its terms by default, so every document containing at least one of these terms is filtered out.
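
Why the two must_not forms give the same result can be checked with a small Python sketch. The street names in the list are made up for illustration (the four documents that matched are not shown in this post), and match and must_not are hypothetical helpers that only imitate the OR behaviour of a match query.

```python
def terms(text):
    """Lowercase and split on whitespace (rough stand-in for an analyzer)."""
    return set(text.lower().split())

def match(query):
    """Predicate: true when the street shares at least one term with the query (OR)."""
    wanted = terms(query)
    return lambda street: bool(terms(street) & wanted)

def must_not(streets, clauses):
    """Keep only the streets that match none of the clauses."""
    return [s for s in streets if not any(clause(s) for clause in clauses)]

# Made-up sample data, except for the last two streets taken from the response above.
streets = [
    "12 Mermaid Avenue",
    "34 Livonia Avenue",
    "56 Broome Street",
    "78 Kaufman Place",
    "159 Lester Court",
    "717 Ira Court",
]

two_clauses = must_not(streets, [match("mermaid Livonia"), match("broome Kaufman")])
one_clause = must_not(streets, [match("mermaid Livonia broome Kaufman")])

print(two_clauses)
print(one_clause)
```

Both calls drop the same four streets, because OR over two clauses of two terms each is the same condition as OR over all four terms.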

==== ****** ======