Jan 12, 2018


Full Text Query in Elasticsearch: match, match_phrase, match_phrase_prefix

In the previous posts, Search using query param and request body and Query term and Source filtering, we got a fair understanding of how to query documents and filter their fields to retrieve only the fields of interest. In this post we will go through more advanced searching techniques using the match, match_phrase and match_phrase_prefix constructs provided by Elasticsearch.

Match keyword

"match" keyword is used with query and it hints search request to look for given value of the fields. It is not exact term match (as discussed in Query term and Source filtering). match keyword is used along with OR/AND logical operators.

Display/retrieve all documents whose name (first or last) contains <X>: The search query below finds all documents whose name contains "gibson".
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query": {
        "match" : {
            "name" : "gibson"
        }
    }
}
' -H 'Content-Type: application/json'
{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.9511213,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "xn_g6mABB3_D7Pc85hFv",
        "_score" : 4.9511213,
        "_source" : {
          "name" : "Sue Gibson",
          "age" : 60,
          "gender" : "female",
          "email" : "suegibson@comvex.com",
          "phone" : "+1 (919) 450-2888",
          "street" : "166 Newel Street",
          "city" : "Jacksonwald",
          "state" : "Northern Mariana Islands, 9865"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "2X_g6mABB3_D7Pc85hNx",
        "_score" : 4.804021,
        "_source" : {
          "name" : "Gibson Velasquez",
          "age" : 57,
          "gender" : "male",
          "email" : "gibsonvelasquez@comvex.com",
          "phone" : "+1 (906) 436-3683",
          "street" : "362 Beverly Road",
          "city" : "Dalton",
          "state" : "Texas, 8682"
        }
      }
    ]
  }
}
Note:- The match keyword asks the query to retrieve all docs whose name contains "gibson". Elasticsearch brings back two documents: the first record, with the higher score, has "gibson" as the second name, and the second document has "gibson" as the first name. Elasticsearch does not prefer the first name over the last name; the score depends on how often the term occurs, how long the field is and the term statistics of the shard that holds the document. Both names here contain two terms, so the small difference in score most likely comes from the two documents sitting on different shards.
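To see why the lower-case query gibson finds "Sue Gibson", and how this differs from an exact term match, here is a minimal sketch in Python. It assumes a toy analyzer that only lower-cases the text and splits it into runs of letters and digits (a simplification of the default standard analyzer); the function names analyze, term_query and match_query are hypothetical and are not Elasticsearch APIs.

```python
import re

def analyze(text):
    """Toy analyzer (assumption): lower-case, then keep runs of letters/digits."""
    return re.findall(r"[0-9a-z]+", text.lower())

def term_query(field_value, term):
    """Exact term lookup: the query term is NOT analyzed."""
    return term in analyze(field_value)

def match_query(field_value, text):
    """Full-text lookup: the query text is analyzed like the indexed text."""
    indexed_terms = analyze(field_value)
    return any(term in indexed_terms for term in analyze(text))

print(analyze("Sue Gibson"))                # ['sue', 'gibson']
print(term_query("Sue Gibson", "Gibson"))   # False: the indexed term is 'gibson'
print(match_query("Sue Gibson", "Gibson"))  # True: the query is lower-cased first
```

The sketch only models which documents match; it does not attempt to reproduce the relevance score.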

Match with OR operator :- Retrieve all documents where name contains either "Tyler" or "Macdonald". The query below retrieves all documents that have either of these two names. The "or" operator tells the match query to split the given value <tyler macdonald> into terms and retrieve every document in which at least one of them appears.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'{
    "query": {
        "match" : {
              "name" : {
                  "query" : "tyler macdonald",
                  "operator" : "or"
               }
        }
    }
}
' -H 'Content-Type: application/json'

{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 4.9416423,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "IX_g6mABB3_D7Pc85hRy",
        "_score" : 4.9416423,
        "_source" : {
          "name" : "Macdonald Perkins",
          "age" : 49,
          "gender" : "male",
          "email" : "macdonaldperkins@comvex.com",
          "phone" : "+1 (863) 559-2182",
          "street" : "687 Bayview Avenue",
          "city" : "Ola",
          "state" : "South Carolina, 2216"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "pH_g6mABB3_D7Pc85hRy",
        "_score" : 4.9416423,
        "_source" : {
          "name" : "Tyler Flores",
          "age" : 25,
          "gender" : "male",
          "email" : "tylerflores@comvex.com",
          "phone" : "+1 (977) 433-3222",
          "street" : "974 Sedgwick Place",
          "city" : "Vallonia",
          "state" : "Kansas, 1804"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "eH_g6mABB3_D7Pc85hNx",
        "_score" : 4.804021,
        "_source" : {
          "name" : "Kimberly Tyler",
          "age" : 50,
          "gender" : "female",
          "email" : "kimberlytyler@comvex.com",
          "phone" : "+1 (867) 568-3457",
          "street" : "679 Rugby Road",
          "city" : "Walton",
          "state" : "Alabama, 3785"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "mX_g6mABB3_D7Pc85hRy",
        "_score" : 4.804021,
        "_source" : {
          "name" : "Dotson Macdonald",
          "age" : 24,
          "gender" : "male",
          "email" : "dotsonmacdonald@comvex.com",
          "phone" : "+1 (874) 525-3190",
          "street" : "525 Boardwalk ",
          "city" : "Sylvanite",
          "state" : "Oklahoma, 2414"
        }
      }
    ]
  }
}
Note:- By default the match keyword uses the OR operator if none is specified.
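The shorthand form used in the first query and the expanded form with an explicit operator describe the same search when the operator is "or". As a small sketch, the request body can be built in Python before it is sent with curl or any HTTP client. The helper name match_body is hypothetical and is not part of any Elasticsearch client.

```python
import json

def match_body(field, text, operator="or"):
    """Build the request body of a match query in its expanded form.

    Hypothetical helper: it only assembles a dict, it does not contact a server.
    """
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}

# Same body as the "tyler macdonald" request above, with the default operator.
print(json.dumps(match_body("name", "tyler macdonald"), indent=2))
```

The printed JSON can be passed to curl with -d exactly like the hand-written bodies in this post.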

Query with "and" operator : The query below retrieves all documents where name contains both "arnold" and "knowles". It retrieves just one document.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'{
    "query": {
        "match" : {
              "name" : {
                  "query" : "arnold knowles",
                  "operator" : "and"
               }
        }
    },
    "_source": false
}
' -H 'Content-Type: application/json'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 9.902243,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "ln_g6mABB3_D7Pc85hRy",
        "_score" : 9.902243
      }
    ]
  }
}
Note:- "_source": false is added to the request to shorten the response, so each hit is displayed with just its id and score. Here we are interested only in the number of documents retrieved.
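The difference between the two operators can be illustrated with a toy model in Python. This is only a sketch: it assumes a simplified analyzer (lower-case, split into runs of letters and digits), ignores scoring, and the function names are hypothetical. The names in the list are taken from the OR response above.

```python
import re

def analyze(text):
    """Toy analyzer (assumption): lower-case, then keep runs of letters/digits."""
    return re.findall(r"[0-9a-z]+", text.lower())

def match(field_value, query, operator="or"):
    """Toy model of a match query against one field value."""
    doc_terms = set(analyze(field_value))
    found = [term in doc_terms for term in analyze(query)]
    if not found:
        return False
    # "and": every query term must be present; "or": one term is enough.
    return all(found) if operator == "and" else any(found)

names = ["Macdonald Perkins", "Tyler Flores", "Kimberly Tyler", "Dotson Macdonald"]
either = [n for n in names if match(n, "tyler flores", "or")]
both = [n for n in names if match(n, "tyler flores", "and")]
print(either)  # ['Tyler Flores', 'Kimberly Tyler']
print(both)    # ['Tyler Flores']
```

With "and", adding more terms to the query can only narrow the result, which is why the arnold knowles query returns a single document.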

Match keyword Default operator OR: When no operator is specified, the query applies the default OR operator. Searching for south carolina this way retrieves 53 documents in total, that is, every document where either south or carolina is found.

Match_phrase keyword

Retrieve all documents which match a given phrase (whole text): The query below retrieves all documents where south carolina is found as a whole, that is, with both terms next to each other and in this order. It retrieves 17 documents in total (fewer than the 53 retrieved above by the default OR query).
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query": {
        "match_phrase" : {
            "state" : "south carolina"
        }
    },
    "size":1
}
' -H 'Content-Type: application/json'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 17,
    "max_score" : 6.3648453,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "F3_g6mABB3_D7Pc85hJw",
        "_score" : 6.3648453,
        "_source" : {
          "name" : "Horton Mcclure",
          "age" : 59,
          "gender" : "male",
          "email" : "hortonmcclure@comvex.com",
          "phone" : "+1 (860) 507-2823",
          "street" : "991 Oakland Place",
          "city" : "Northchase",
          "state" : "South Carolina, 8608"
        }
      }
    ]
  }
}
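The phrase requirement can be sketched with a toy model in Python: the query terms must appear in the field next to each other and in the same order. The sketch assumes a simplified analyzer and the function names are hypothetical. The value "North Carolina, 1234" is a made-up example and does not come from the customers index.

```python
import re

def analyze(text):
    """Toy analyzer (assumption): lower-case, then keep runs of letters/digits."""
    return re.findall(r"[0-9a-z]+", text.lower())

def match_phrase(field_value, phrase):
    """Toy model: True if the phrase terms occur consecutively and in order."""
    doc = analyze(field_value)
    terms = analyze(phrase)
    n = len(terms)
    if n == 0:
        return False
    return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))

states = [
    "South Carolina, 8608",            # from the response above
    "Northern Mariana Islands, 9865",  # from the first response in this post
    "North Carolina, 1234",            # hypothetical value, for illustration only
]
print([s for s in states if match_phrase(s, "south carolina")])
# ['South Carolina, 8608']
```

A plain match query with the default OR operator would also accept the hypothetical "North Carolina, 1234", because it contains carolina; the phrase query does not.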

Match_phrase_prefix keyword

Retrieve all documents where state contains a word starting with "mi" - Michigan, Minnesota, Mississippi, Missouri etc.
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query": {
        "match_phrase_prefix" : {
            "state" : "mi"
        }
    },
    "size": 3,
    "_source" :"st*"
}
' -H 'Content-Type: application/json'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 86,
    "max_score" : 4.7598696,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "nn_g6mABB3_D7Pc85hNx",
        "_score" : 4.7598696,
        "_source" : {
          "street" : "457 Thomas Street",
          "state" : "Michigan, 2490"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "NH_g6mABB3_D7Pc85hVy",
        "_score" : 4.7598696,
        "_source" : {
          "street" : "106 Tompkins Place",
          "state" : "Michigan, 3205"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "P3_g6mABB3_D7Pc85hNx",
        "_score" : 4.6157947,
        "_source" : {
          "street" : "921 Judge Street",
          "state" : "Minnesota, 6838"
        }
      }
    ]
  }
}
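Retrieve all documents with prefix as word followed by space :
The query below retrieves all docs whose street name contains a word starting with < Sunnyside > and it displays two documents with street names <Sunnyside Avenue> and <Sunnyside Court>. Note that, assuming the default standard analyzer, the trailing space in the query is dropped during analysis, so "Sunnyside " and "Sunnyside" behave the same way here.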
➜  Desktop curl -XGET 'localhost:9200/customers/_search?pretty' -d'
{
    "query": {
        "match_phrase_prefix" : {
            "street" : "Sunnyside "
        }
    },
    "_source" :"st*"

}
' -H 'Content-Type: application/json'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 4.9511213,
    "hits" : [
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "G3_g6mABB3_D7Pc85hJw",
        "_score" : 4.9511213,
        "_source" : {
          "street" : "733 Sunnyside Avenue",
          "state" : "Oklahoma, 9311"
        }
      },
      {
        "_index" : "customers",
        "_type" : "vendors",
        "_id" : "CH_g6mABB3_D7Pc85hNx",
        "_score" : 4.800417,
        "_source" : {
          "street" : "864 Sunnyside Court",
          "state" : "New York, 1549"
        }
      }
    ]
  }
}
Note: match_phrase_prefix is widely used for suggestions (search-as-you-type), as it returns only documents that match the typed context.
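How the prefix variant behaves can be sketched with a toy model in Python: all query terms except the last must match exactly and consecutively, and the last term only has to be a prefix of the next word. The sketch assumes a simplified analyzer, the function names are hypothetical, and it ignores the limit Elasticsearch places on prefix expansions (max_expansions, 50 by default). The sample values are taken from the responses in this post.

```python
import re

def analyze(text):
    """Toy analyzer (assumption): lower-case, then keep runs of letters/digits."""
    return re.findall(r"[0-9a-z]+", text.lower())

def match_phrase_prefix(field_value, query):
    """Toy model: exact phrase match, except the last term is a prefix."""
    doc = analyze(field_value)
    terms = analyze(query)
    if not terms:
        return False
    *head, last = terms
    n = len(terms)
    for i in range(len(doc) - n + 1):
        window = doc[i:i + n]
        if window[:-1] == head and window[-1].startswith(last):
            return True
    return False

states = ["Michigan, 2490", "Minnesota, 6838", "Texas, 8682"]
streets = ["733 Sunnyside Avenue", "864 Sunnyside Court", "362 Beverly Road"]
print([s for s in states if match_phrase_prefix(s, "mi")])
# ['Michigan, 2490', 'Minnesota, 6838']
print([s for s in streets if match_phrase_prefix(s, "Sunnyside ")])
# ['733 Sunnyside Avenue', '864 Sunnyside Court']
print([s for s in streets if match_phrase_prefix(s, "sunnyside av")])
# ['733 Sunnyside Avenue']
```

The last call shows the search-as-you-type use case: as the user types more of the next word, the list of matching documents narrows.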

======  **** ======