Sep 9, 2018

Elasticsearch Query DSL: Query Context and Filter Context (Elasticsearch and Kibana Dev Tools)

Elasticsearch provides a full Query DSL (Domain Specific Language) based on JSON to define queries. Query clauses behave differently depending on whether they are used in query context or filter context.

Query context

A query clause used in query context answers the question “How well does this document match this query clause?”. It decides whether a document matches, and it also calculates a relevance score for each matching document. That score is stored in the _score metadata field, and results are sorted by it by default. "_score" represents how well the document matches, relative to the other documents.
Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API. The query below runs in query context and returns all documents whose course_description matches the word science.
GET /courses/_search
{
  "query": {
    "match": { 
      "course_description": "science" 
    }
  }
}
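To see how Elasticsearch computes each _score, a search request can include the "explain" parameter. The sketch below repeats the query above with that flag set. The explanation attached to each hit depends on the Elasticsearch version and on the field's similarity settings (BM25 by default since Elasticsearch 5.0), so its exact structure is not reproduced here.
# Sketch: same courses index and field as above, with per-hit score explanations
GET /courses/_search
{
  "explain": true,
  "query": {
    "match": {
      "course_description": "science"
    }
  }
}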

Filter context

A query clause used in filter context can be viewed as a binary test with an outcome of 0 or 1. It answers the question “Does this document match this query clause?” The answer is a simple yes or no, and no relevance score is calculated.
Filter context is mostly used for filtering structured data, e.g. range queries (is a date within a given range?), status checks, and so on. Frequently used filters are cached automatically by Elasticsearch to speed up performance.
Filter context is in effect whenever a query clause is passed to a filter parameter. Examples are the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, and the filter aggregation. The query below runs in filter context and returns all course documents where students_enrolled >= 33.
GET /courses/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": { "students_enrolled": { "gte": 33 } }
      }
    }
  }
}
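The same condition can also go through the filter parameter of a constant_score query, which is another filter-context entry point listed above. The sketch below assumes the courses index used throughout. No relevance is computed, so every matching document gets the same _score, equal to boost (1.0 when boost is omitted). The boost value 1.2 is arbitrary.
# Sketch: filter context via constant_score; boost 1.2 is an arbitrary illustrative value
GET /courses/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": { "students_enrolled": { "gte": 33 } }
      },
      "boost": 1.2
    }
  }
}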
Note: The fundamental difference between query and filter context is this. Query context is associated with _score (a relevance score), whereas filter context is associated with a binary outcome (true or false).

Download this document and index it in Elasticsearch to follow the query and filter context examples below. Once all documents are indexed, verify them with a GET request; it should list 10 documents.
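Keep in mind that a plain search returns at most 10 hits by default (the size parameter defaults to 10), so a search on its own cannot confirm a larger total. The count API returns the number of documents directly. Assuming the index is named courses, as in every query in this post, the "count" field of its response should be 10.
# Sketch: assumes the documents were indexed into an index named courses
GET /courses/_count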

Query context and filter context query samples:

1. Only query context: it returns two documents, each with a _score based on its relevance.
GET /courses/_search
{
  "query": {
    "match": { 
      "course_description": "science" 
    }
  }
}

2. Only query context, with a placeholder for filter: a bool query combines several match clauses. The filter parameter, which is where filter context applies, is left empty here. All clauses therefore run in query context, and the query returns 4 documents scored by relevance. In general, a bool query combines one or more boolean clauses, each with a typed occurrence (must, filter, should or must_not).
GET /courses/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "professor.facutly_type": "full-time" }},
        { "match": { "professor.department": "finance" }}
      ],
      "filter": [ 
         
      ]
    }
  }
}

3. Query context with filter: add a filter criterion to the query above. The range filter removes one document, so the response contains 3 documents.
GET /courses/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "professor.facutly_type": "full-time" }},
        { "match": { "professor.department": "finance" }}
      ],
      "filter": [ 
         { "range":  { "students_enrolled": { "gte": 16 }}}
      ]
    }
  }
}
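To see the difference between the two contexts in practice, the range condition can be moved from filter into must. In this sketch, assuming the same data, the same three documents should be returned. A range query in query context gives every match the same constant score, so each _score should rise by the same amount. The clause is also no longer eligible for filter caching. Exact scores depend on your data and Elasticsearch version.
# Sketch: range moved into must, so it runs in query context and contributes to _score
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "professor.facutly_type": "full-time" }},
        { "match": { "professor.department": "finance" }},
        { "range": { "students_enrolled": { "gte": 16 }}}
      ]
    }
  }
}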

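4. Using the must_not clause: excluding documents whose course_description matches business reduces the response to 1 document. Like filter, must_not runs in filter context, so it does not contribute to the score.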
GET /courses/_search
{
  "query": { 
    "bool": { 
      "must": [
        { "match": { "professor.facutly_type": "full-time" }},
        { "match": { "professor.department": "finance" }}
      ],
      "must_not": [
        { "match": { "course_description": "business" }}
      ], 
      "filter": [ 
         { "range":  { "students_enrolled": { "gte": 16 }}}
      ]
    }
  }
}
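5. Using multi_match: runs a match query against several fields at once. Here the word computer is searched in both name and professor.department.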
GET /courses/_search
{
  "query": {
    "multi_match": {
      "query": "computer",
      "fields": ["name","professor.department"]
    }
  }
}
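Unless told otherwise, multi_match runs as the best_fields type, which scores each document by its best-matching field. Individual fields can be boosted with the caret syntax. This sketch assumes the same two fields. Matches in name count twice as much as matches in professor.department; the factor 2 is an arbitrary choice for illustration.
# Sketch: "name^2" boosts the name field; the factor 2 is arbitrary
GET /courses/_search
{
  "query": {
    "multi_match": {
      "query": "computer",
      "fields": ["name^2", "professor.department"],
      "type": "best_fields"
    }
  }
}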
6. Using match_phrase: the whole phrase must match, with its terms in the same order and next to each other. An incomplete or broken phrase (for example, one with a truncated word) returns no documents.
GET /courses/_search
{
  "query": {
    "match_phrase": {
      "course_description": "computer science introduction teaching"
    }
  }
}
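match_phrase also accepts a slop parameter. Slop is the number of positions terms may move and still count as a phrase match (0 by default). This sketch assumes a document contains the phrase used above. With slop set to 1, the shorter phrase computer science teaching should still match it, because teaching sits only one position further from science than the query expects.
# Sketch: slop 1 tolerates one intervening term
GET /courses/_search
{
  "query": {
    "match_phrase": {
      "course_description": {
        "query": "computer science teaching",
        "slop": 1
      }
    }
  }
}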
7. Using match_phrase_prefix: it works like match_phrase, except that the last term is treated as a prefix. A phrase whose last word is incomplete therefore still matches and returns documents.
GET /courses/_search
{
  "query": {
    "match_phrase_prefix": {
      "course_description": "computer science"
    }
  }
}
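The prefix behaviour is easiest to see with a truncated final word. In the sketch below, scien is expanded into at most max_expansions indexed terms that start with it (50 by default). The value 10 is an arbitrary illustration, and matches depend on which terms actually exist in course_description.
# Sketch: the last term "scien" is treated as a prefix; max_expansions 10 is arbitrary
GET /courses/_search
{
  "query": {
    "match_phrase_prefix": {
      "course_description": {
        "query": "computer scien",
        "max_expansions": 10
      }
    }
  }
}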
8. Using the range clause: gte stands for greater than or equal to, and lte for less than or equal to. The other options are gt (greater than) and lt (less than).
GET /courses/_search
{
  "query": {
    "range": {
      "students_enrolled": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}
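Because a range check is a yes/no question, it can also run in filter context, where it is not scored and can be cached. This sketch uses the exclusive bounds gt and lt. Unlike the query above, documents with exactly 20 or 30 students enrolled are excluded. Every hit gets a _score of 0 because the bool query has only a filter clause.
# Sketch: exclusive bounds in filter context (no scoring)
GET /courses/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "students_enrolled": { "gt": 20, "lt": 30 }
        }
      }
    }
  }
}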
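9. Using the should clause: should clauses are optional conditions. Each one a document matches raises its score, which helps bring the most relevant documents to the top. Because this bool query also has a must clause, the should clause is not required by default. If we remove "minimum_should_match", we get every document that satisfies must and must_not, ranked by score. Setting "minimum_should_match": 1 requires at least one should clause to match, which narrows the result to the most relevant documents.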
GET /courses/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "accounting"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "room": "e7"
          }
        }
      ],
      "should": [
        {
          "range": {
            "students_enrolled": {
              "gte": 10,
              "lte": 20
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
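The default changes when a bool query has no must or filter clause: at least one should clause must then match, even without minimum_should_match. This sketch reuses two conditions from the query above. A document is returned if its name matches accounting or it has between 10 and 20 students enrolled. Documents meeting both conditions score higher.
# Sketch: should-only bool query; at least one clause must match by default
GET /courses/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "accounting" }},
        { "range": { "students_enrolled": { "gte": 10, "lte": 20 }}}
      ]
    }
  }
}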
Location: Bengaluru, Karnataka, India