Filtering Document Contents using Elasticsearch Query DSL - Query term and Source filtering

In the previous posts, Search using query param and Search using request body, we executed various search queries, first via URL query parameters and then via the request body, and the output was verbose for almost every request. A filter can be applied to retrieve only the documents and fields that we are interested in. Filtering can be achieved in two ways: using a "term" query and using "source filtering".

Prerequisite:
Set up the customers index with sample documents. Refer to the prerequisite section of this post. For more details, refer to this.

Query terms to filter relevant documents of interest

Retrieve all documents in the customers index where the name is king. If we run this search as a query parameter, we get three records. However, we can retrieve only the relevant documents using a "term" query.
Using query param: three documents are returned. Besides the two expected ones, an irrelevant document matches because king appears in its address.

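Using query term: the term must exactly match a token stored in the inverted index. The term query does not analyze its input, while text is typically lowercased at index time, so we search for "king" and not "King".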
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query" : {
        "term" : { "name" : "king" }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.9511213,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "rn_g6mABB3_D7Pc85hJw",
        "_score" : 4.9511213,
        "_source" : {
          "name" : "King Bond",
          "age" : 75,
          "gender" : "male",
          "email" : "kingbond@comvex.com",
          "phone" : "+1 (881) 588-3032",
          "street" : "643 Vandervoort Avenue",
          "city" : "Woodlake",
          "state" : "Northern Mariana Islands, 351"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "lH_g6mABB3_D7Pc85hNx",
        "_score" : 4.9416423,
        "_source" : {
          "name" : "Atkinson King",
          "age" : 34,
          "gender" : "male",
          "email" : "atkinsonking@comvex.com",
          "phone" : "+1 (808) 415-3430",
          "street" : "601 Kent Avenue",
          "city" : "Robbins",
          "state" : "Georgia, 6021"
        }
      }
    ]
  }
}
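
To see why the term has to be the lowercase "king", here is a minimal Python sketch of an inverted index. It assumes the name field is analyzed by splitting on whitespace and lowercasing, which is only a simplification of what the standard analyzer does. The helpers analyze, build_index and term_query are hypothetical names used for illustration and are not part of Elasticsearch.

```python
from collections import defaultdict


def analyze(text):
    """Simplified stand-in for an analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, name in docs.items():
        for token in analyze(name):
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """Look the term up as-is, without analysis, the way a term query does."""
    return sorted(index.get(term, set()))


# Names taken from the sample output in this post; the ids are made up.
docs = {1: "King Bond", 2: "Atkinson King", 3: "Hayden Kline"}
index = build_index(docs)

print(term_query(index, "king"))  # [1, 2]
print(term_query(index, "King"))  # []
```

The second lookup finds nothing because only lowercased tokens are stored in the index, and the term is compared without any analysis.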

Source filtering to retrieve relevant fields of interest

Retrieve all docs without their source details, displaying only the Id: setting "_source" : false in the request body does the trick. Each hit is returned with only its metadata (index, type, Id and score). The verbose payload is left out, which saves network bandwidth and usually makes the response faster.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query" : {
        "term" : { "name" : "king" }
    },
    "_source" : false
}
' -H 'Content-Type: application/json'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.9511213,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "rn_g6mABB3_D7Pc85hJw",
        "_score" : 4.9511213
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "lH_g6mABB3_D7Pc85hNx",
        "_score" : 4.9416423
      }
    ]
  }
}
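
The same request can be built from a script. Below is a small sketch using only the Python standard library. The function name build_search_request and its parameters are assumptions made for this example, and it uses POST instead of GET (the _search endpoint accepts both). The request is only built and printed here, not sent.

```python
import json
import urllib.request


def build_search_request(host, index, term_field, term_value, source=False):
    """Build (but do not send) a _search request with a term query and a _source setting."""
    body = {
        "query": {"term": {term_field: term_value}},
        "_source": source,
    }
    return urllib.request.Request(
        url=f"http://{host}/{index}/_search?pretty",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("localhost:9200", "customers", "name", "king")
print(request.full_url)
print(request.data.decode("utf-8"))

# To run it against a live cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Passing a pattern string or a list of patterns as source, instead of False, gives the other source filtering variants.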

Retrieve docs using a wildcard pattern in the source field: display documents with only those source fields whose names start with "st" (st*).
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "_source" : "st*",
    "size": 2,
    "query" : {
        "term" : { "state" : "hawaii" }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 17,
    "max_score" : 4.674879,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "Dn_g6mABB3_D7Pc85hRx",
        "_score" : 4.674879,
        "_source" : {
          "street" : "454 Meadow Street",
          "state" : "Hawaii, 9588"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "MH_g6mABB3_D7Pc85hVy",
        "_score" : 4.674879,
        "_source" : {
          "street" : "226 Anchorage Place",
          "state" : "Hawaii, 4486"
        }
      }
    ]
  }
}
Note:
- A total of 17 documents matched, but only 2 are displayed, as requested by "size".
- st* matches every field whose name starts with "st", which is why we see street and state in the source section.
- Source filtering does not affect the relevance of documents (no change in document relevance ranking).
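
The effect of a wildcard pattern on the source can be imitated locally. The sketch below is only an approximation, assuming flat documents and using Python's fnmatch module for the matching (fnmatch also understands ? and [...], which go beyond the simple * used here). The function filter_by_patterns is a hypothetical helper, not an Elasticsearch API.

```python
from fnmatch import fnmatchcase


def filter_by_patterns(source, patterns):
    """Keep only the fields whose names match at least one wildcard pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in patterns)
    }


# Field values copied from one of the hits shown in this post.
doc = {
    "name": "Hayden Kline",
    "email": "haydenkline@comvex.com",
    "phone": "+1 (860) 410-3540",
    "street": "454 Meadow Street",
    "city": "Fairview",
    "state": "Hawaii, 9588",
}

print(filter_by_patterns(doc, "st*"))
print(filter_by_patterns(doc, ["nam*", "*ema*"]))
```

The first call keeps street and state, and the second keeps name and email. The document itself is never changed; only the returned view is smaller.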

Display documents with only name and email in the source: a list of patterns can be passed, and the query below keeps only the name and email fields in the source section.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "_source" : ["nam*", "*ema*"],
    "size": 1,
    "query" : {
        "term" : { "state" : "washington" }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 17,
    "max_score" : 5.2268524,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "On_g6mABB3_D7Pc85hVy",
        "_score" : 5.2268524,
        "_source" : {
          "name" : "Good Hatfield",
          "email" : "goodhatfield@comvex.com"
        }
      }
    ]
  }
}

Source filtering with include/exclude of source fields: list customers from the state hawaii, including name and email and explicitly excluding age and gender.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "_source": {
        "includes": ["na*", "em*"],
        "excludes": ["*der", "ag*"]
    },
    "size": 1,
    "query" : {
        "term" : { "state" : "hawaii" }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 17,
    "max_score" : 4.674879,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "Dn_g6mABB3_D7Pc85hRx",
        "_score" : 4.674879,
        "_source" : {
          "name" : "Hayden Kline",
          "email" : "haydenkline@comvex.com"
        }
      }
    ]
  }
}

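The query above does not really show the effect of excludes, because the includes already limit the source to name and email. Using only excludes displays all details of the customer except age and gender.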
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "_source": {
        "excludes": ["*der", "ag*"]
    },
    "size": 1,
    "query" : {
        "term" : { "state" : "hawaii" }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 17,
    "max_score" : 4.674879,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "Dn_g6mABB3_D7Pc85hRx",
        "_score" : 4.674879,
        "_source" : {
          "phone" : "+1 (860) 410-3540",
          "city" : "Fairview",
          "street" : "454 Meadow Street",
          "name" : "Hayden Kline",
          "state" : "Hawaii, 9588",
          "email" : "haydenkline@comvex.com"
        }
      }
    ]
  }
}
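
How includes and excludes combine can be sketched in a few lines of Python. This is a rough model under stated assumptions: documents are flat, matching is done with fnmatch, a field is kept if it matches the includes (or if no includes are given), and it is then dropped if it matches the excludes. The helper names are hypothetical and not part of Elasticsearch.

```python
from fnmatch import fnmatchcase


def matches_any(field, patterns):
    """Return True if the field name matches at least one wildcard pattern."""
    return any(fnmatchcase(field, pattern) for pattern in patterns)


def filter_source(source, includes=None, excludes=None):
    """Keep fields matching includes (all fields if none given), then drop excluded ones."""
    kept = {}
    for field, value in source.items():
        if includes and not matches_any(field, includes):
            continue
        if excludes and matches_any(field, excludes):
            continue
        kept[field] = value
    return kept


# Field values copied from the "King Bond" hit shown earlier in this post.
doc = {
    "name": "King Bond",
    "age": 75,
    "gender": "male",
    "email": "kingbond@comvex.com",
    "phone": "+1 (881) 588-3032",
    "street": "643 Vandervoort Avenue",
    "city": "Woodlake",
    "state": "Northern Mariana Islands, 351",
}

print(filter_source(doc, includes=["na*", "em*"], excludes=["*der", "ag*"]))
print(filter_source(doc, excludes=["*der", "ag*"]))
```

The first call returns only name and email, so the excludes have nothing left to remove. The second call returns every field except age and gender, which matches the two responses above.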
