Elasticsearch Bulk API and Aggregation DSL (Elasticsearch and Kibana Dev tool)

In previous posts (1, 2, 3) we covered how to index and delete documents. Suppose we have a large archive of data that needs to be loaded into Elasticsearch; indexing documents one by one is neither a viable nor an efficient solution. This is where the Elasticsearch Bulk API comes to the rescue: it makes it possible to perform many index/delete operations in a single API call, which can greatly increase indexing speed.
To demonstrate the Bulk API, download the data file. This file was generated using http://www.json-generator.com/, and this post explains how.

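Index documents using Bulk API: Using the "_bulk" API endpoint with the PUT command, we can index multiple documents efficiently. Only one row from the file is shown below for reference; add all the rows and execute the command. Each document line must be preceded by its own action line ({"index" : {}}), and the request body must end with a newline.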

PUT /employees/memmber/_bulk
{"index" : {}}
{"name" :"Hunt Morrow","age":47,"gender":"male","email":"huntmorrow@concility.com","phone":"+91 (996) 574-3591","address":"750 Lott Avenue","city":"Martinsville","state":"Maine, 6952"}
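The action-line/document-line pairing above can also be generated programmatically. The following is a minimal Python sketch, not part of the original data file workflow: the helper name build_bulk_body and the sample rows are assumptions made for illustration, and the resulting string would still have to be sent to the "_bulk" endpoint by a client of your choice.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line; the index and type are taken from the request URL.
        lines.append(json.dumps({"index": {}}))
        # Document source on the following line.
        lines.append(json.dumps(doc))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical rows standing in for the downloaded data file.
employees = [
    {"name": "Hunt Morrow", "age": 47, "gender": "male"},
    {"name": "Jane Doe", "age": 35, "gender": "female"},
]

bulk_body = build_bulk_body(employees)
print(bulk_body)
```

Two documents produce four lines, which is why a file with many rows stays a single request.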

Pagination of response documents: Using "from" and "size" we can fetch a specific slice of the matching documents in a search response.
The "size" parameter limits the number of documents returned; in the response it affects the length of the hits array.

GET /employees/memmber/_search
{
  "size" : 5,
  "query": {
    "match_all": {}
  }
}

The "from" parameter indicates the starting position; it works like an offset, so "from": 0 starts at the first matching document.

GET /employees/memmber/_search
{
  "from" : 0,
  "size" : 5,
  "query": {
    "match_all": {}
  }
}
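To page through results, "from" is usually derived from a page number. A small Python sketch of that arithmetic follows; the function names and the 1-based page convention are assumptions for illustration, not something Elasticsearch prescribes.

```python
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into the "from"/"size" pair of a search body."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {"from": (page - 1) * page_size, "size": page_size}

def search_body(page, page_size):
    """Build a match_all search body for the requested page."""
    body = page_to_from_size(page, page_size)
    body["query"] = {"match_all": {}}
    return body

first_page = search_body(1, 5)   # same body as the request above
third_page = search_body(3, 5)   # skips the first 10 documents
print(first_page)
print(third_page)
```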

Sorting response documents: Sorting is defined per field. Two special field names are available: _score sorts by relevance score and _doc sorts by index order. The order option can have the following values: asc (sort in ascending order) and desc (sort in descending order).

GET /employees/memmber/_search
{
  "from" : 0,
  "size" : 5,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

Note: Fielddata is disabled on text fields by default. If we try to sort on a text field such as name in the above query, we get an illegal_argument_exception with the reason "Fielddata is disabled on text fields by default. Set fielddata=true on [name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.". For more details on the sort clause, refer to this.
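The error message suggests using a keyword field instead. The Python sketch below builds such a request body, assuming the index was created with the default dynamic mapping so that name has a name.keyword sub-field (the same convention as gender.keyword used later in this post). The helper name sorted_search_body is hypothetical.

```python
import json

def sorted_search_body(field, order="asc", start=0, size=5):
    """Build a match_all search body sorted on a single field."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    return {
        "from": start,
        "size": size,
        "query": {"match_all": {}},
        "sort": [{field: {"order": order}}],
    }

# Sort on the keyword sub-field rather than on the text field itself.
by_name = sorted_search_body("name.keyword", "asc")
print(json.dumps(by_name, indent=2))
```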

Aggregation DSL(Search aggregation):

Count response documents: The "_count" endpoint returns the number of documents matching the query criteria.

GET /employees/memmber/_count
{
  "query": {
    "match": {
      "gender": "male"
    }
  }
}

Response:

{
  "count": 44,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  }
}

Aggregation based on text field: Terms aggregations work on exact values, not on analyzed text. Because gender is of type text, we have to use the keyword version of the field. Keyword fields are only searchable by their exact value. Since Elasticsearch 5, the string type has been split into two new types: text, which should be used for full-text search, and keyword, which should be used for keyword search.
The aggregation command below uses gender.keyword, indicating that it wants to match the exact value of gender (male and female). In the response we have aggregations -> genders -> buckets[] with two entries, female and male.

GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {"field":"gender.keyword"}
    }
  }
}

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
  ....
  ....
  },
  "aggregations": {
    "genders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "female",
          "doc_count": 56
        },
        {
          "key": "male",
          "doc_count": 44
        }
      ]
    }
  }
}
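Conceptually, a terms aggregation groups documents by the exact value of a field and counts each group. The Python sketch below mimics that locally on a few hypothetical documents; it is only an illustration of the bucket idea, not how Elasticsearch computes it internally, and the tie-breaking by key is a choice made for this sketch.

```python
from collections import Counter

def terms_buckets(docs, field):
    """Group documents by the exact value of `field` and count each group."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    # Largest bucket first; ties are broken by key to keep the output deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]

# Hypothetical sample documents.
sample = [
    {"name": "A", "gender": "female"},
    {"name": "B", "gender": "male"},
    {"name": "C", "gender": "female"},
]

buckets = terms_buckets(sample, "gender")
print(buckets)
```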

Aggregation (average) based on numeric field: Here the average age of the male and female groups is computed. Since age is numeric rather than text, we do not need a keyword version of the field. The aggregation query consists of two parts: "genders" creates the buckets and "avg_age" creates the metric. In the response, the buckets array has one entry per key (female and male), each carrying its avg_age metric.

GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
  ....
  ....
  },
  "aggregations": {
    "genders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "female",
          "doc_count": 56,
          "avg_age": {
            "value": 42.5
          }
        },
        {
          "key": "male",
          "doc_count": 44,
          "avg_age": {
            "value": 47.61363636363637
          }
        }
      ]
    }
  }
}
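Once the response has been parsed into a dictionary, the per-bucket metric can be pulled out with a short loop. A minimal Python sketch, using a trimmed copy of the aggregation part of the response above; the helper name metric_by_bucket is hypothetical.

```python
def metric_by_bucket(response, agg_name, metric_name):
    """Map each bucket key to the value of one single-value metric inside it."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket[metric_name]["value"] for bucket in buckets}

# Trimmed copy of the "aggregations" section of the response shown above.
response = {
    "aggregations": {
        "genders": {
            "buckets": [
                {"key": "female", "doc_count": 56, "avg_age": {"value": 42.5}},
                {"key": "male", "doc_count": 44, "avg_age": {"value": 47.61363636363637}},
            ]
        }
    }
}

avg_age = metric_by_bucket(response, "genders", "avg_age")
print(avg_age)
```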

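Find min and max (aggregation using "min" and "max"): Further metrics can be added alongside "avg_age" inside the same bucket. Note that the metric names "max_price" and "min_price" are only labels chosen in the request; here they hold the maximum and minimum age.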

GET /employees/memmber/_search
{
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
  ....
  ....
  },
  "aggregations": {
    "genders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "female",
          "doc_count": 56,
          "avg_age": {
            "value": 42.5
          },
          "max_price": {
            "value": 73
          },
          "min_price": {
            "value": 18
          }
        },
        {
          "key": "male",
          "doc_count": 44,
          "avg_age": {
            "value": 47.61363636363637
          },
          "max_price": {
            "value": 75
          },
          "min_price": {
            "value": 18
          }
        }
      ]
    }
  }
}

Aggregation with query: When a query is present, the aggregation is computed only over the documents that match it. Here only employees aged 30 to 60 (inclusive) are aggregated.

GET /employees/memmber/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 60
      }
    }
  }, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
  ....
  ....
  },
  "aggregations": {
    "genders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "female",
          "doc_count": 31,
          "avg_age": {
            "value": 40.483870967741936
          },
          "max_price": {
            "value": 58
          },
          "min_price": {
            "value": 30
          }
        },
        {
          "key": "male",
          "doc_count": 20,
          "avg_age": {
            "value": 46.2
          },
          "max_price": {
            "value": 60
          },
          "min_price": {
            "value": 31
          }
        }
      ]
    }
  }
}

Aggregation without retrieving documents: Setting "size": 0 returns no documents while the aggregation runs as usual. In the response the hits array is empty.

GET /employees/memmber/_search
{
  "size": 0, 
  "query": {
    "range": {
      "age": {
        "gte": 30,
        "lte": 60
      }
    }
  }, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        },
        "max_price" : {
          "max": {
            "field": "age"
          }
        },
        "min_price" : {
          "min": {
            "field": "age"
          }
        }
      }
    }
  }
}

Response:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 51,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "genders": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "female",
          "doc_count": 31,
          "avg_age": {
            "value": 40.483870967741936
          },
          "max_price": {
            "value": 58
          },
          "min_price": {
            "value": 30
          }
        },
        {
          "key": "male",
          "doc_count": 20,
          "avg_age": {
            "value": 46.2
          },
          "max_price": {
            "value": 60
          },
          "min_price": {
            "value": 31
          }
        }
      ]
    }
  }
}

Aggregation using "stats": Elasticsearch provides the "stats" aggregation, which returns a complete set of statistics (min, max, avg, count and sum) in one go. The aggregation query consists of two parts: "genders" creates the buckets and "stats_on_age" creates the metric. In the response, the buckets array has one entry per key (male/female), each with the metric data min, max, etc.

GET /employees/memmber/_search
{
  "size": 0, 
  "aggs": {
    "genders": {
      "terms": {
        "field":"gender.keyword"
      },
      "aggs": {
        "stats_on_age": {
          "stats": {
            "field": "age"
          }
        }
      }
    }
  }
}
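To make the five numbers concrete, here is a small Python sketch that computes the same kind of summary (count, min, max, avg, sum) locally. The ages are hypothetical and are not taken from the data file; the handling of an empty input is a choice made for this sketch.

```python
def stats(values):
    """Compute count, min, max, avg and sum for a list of numbers."""
    values = list(values)
    if not values:
        # Empty-input behaviour is an assumption of this sketch.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }

ages = [18, 30, 42, 50]  # hypothetical ages for one bucket
age_stats = stats(ages)
print(age_stats)
```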


Aggregation date range query: Download this card_sell_json file and index all documents using the following command.

PUT /sell/vehicle/_bulk
{"index" : {}}
{"company":"AUDI","price": 1508.81,"color":"blue","registered":"2016-07-27","model":"V4","condition":"Used","customer":{"name":"Juliet Reeves","email":"julietreeves@bunga.com","phone":"+1 (900) 479-2096","address":"677 Dunne Court, Buxton, Virgin Islands,3005","age":25}}

For each car model, count the documents whose registered date falls within each of the two date ranges.

GET /sell/vehicle/_search
{
  "size": 0, 
  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "sold_date_range": {
          "range": {
            "field": "registered",
            "ranges": [
              {
                "from": "2014-04-21",
                "to": "2016-02-01"
              },
              {
                "from": "2017-11-22",
                "to": "2018-09-01"
              }
            ]
          }
        }
      }
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 20,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_cars": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "V5",
          "doc_count": 10,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 3
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 2
              }
            ]
          }
        },
        {
          "key": "V4",
          "doc_count": 7,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 1
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 2
              }
            ]
          }
        },
        {
          "key": "V7",
          "doc_count": 3,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 1
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 1
              }
            ]
          }
        }
      ]
    }
  }
}
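One detail worth knowing when reading these counts: in a range aggregation each range includes its "from" value and excludes its "to" value. The Python sketch below reproduces that boundary rule locally with the two ranges from the request; the registered dates are hypothetical.

```python
from datetime import date

# The two ranges used in the request above.
RANGES = [
    (date(2014, 4, 21), date(2016, 2, 1)),
    (date(2017, 11, 22), date(2018, 9, 1)),
]

def range_counts(registered_dates, ranges=RANGES):
    """Count how many dates fall into each range ("from" inclusive, "to" exclusive)."""
    counts = [0] * len(ranges)
    for day in registered_dates:
        for i, (start, end) in enumerate(ranges):
            if start <= day < end:
                counts[i] += 1
    return counts

# Hypothetical registration dates.
dates = [
    date(2014, 4, 21),  # equals "from" of the first range: counted
    date(2016, 2, 1),   # equals "to" of the first range: not counted
    date(2016, 7, 27),  # falls between the two ranges: not counted
    date(2018, 1, 15),  # inside the second range: counted
]

counts = range_counts(dates)
print(counts)
```

Documents outside every range are simply not counted, which is why the range doc_count values in the response do not add up to the model's doc_count.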

Nested aggregation: For each car model and date range, also get the average price and the document count per company (make).

GET /sell/vehicle/_search
{
  "size": 0, 
  "aggs": {
    "popular_cars": {
      "terms": {
        "field": "model.keyword"
      },
      "aggs": {
        "sold_date_range": {
          "range": {
            "field": "registered",
            "ranges": [
              {
                "from": "2014-04-21",
                "to": "2016-02-01"
              },
              {
                "from": "2017-11-22",
                "to": "2018-09-01"
              }
            ]
          },
          "aggs": {
            "avg_price": {
              "avg": {
                "field": "price"
              }
            },
            "make" :{
              "terms": {
                "field": "company.keyword"
              }
            }
          }
        }
      }
    }
  }
}

Response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 20,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "popular_cars": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "V5",
          "doc_count": 10,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 3,
                "avg_price": {
                  "value": 2134.0633138020835
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "AUDI",
                      "doc_count": 2
                    },
                    {
                      "key": "BUGATTI",
                      "doc_count": 1
                    }
                  ]
                }
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 2,
                "avg_price": {
                  "value": 2626.8650512695312
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "AUDI",
                      "doc_count": 1
                    },
                    {
                      "key": "SUZUKI",
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "key": "V4",
          "doc_count": 7,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 1,
                "avg_price": {
                  "value": 1210.6199951171875
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "RENAULT",
                      "doc_count": 1
                    }
                  ]
                }
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 2,
                "avg_price": {
                  "value": 1932.9300537109375
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "AUDI",
                      "doc_count": 1
                    },
                    {
                      "key": "LAMBORGHINI",
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "key": "V7",
          "doc_count": 3,
          "sold_date_range": {
            "buckets": [
              {
                "key": "2014-04-21T00:00:00.000Z-2016-02-01T00:00:00.000Z",
                "from": 1398038400000,
                "from_as_string": "2014-04-21T00:00:00.000Z",
                "to": 1454284800000,
                "to_as_string": "2016-02-01T00:00:00.000Z",
                "doc_count": 1,
                "avg_price": {
                  "value": 2408.409912109375
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "LAMBORGHINI",
                      "doc_count": 1
                    }
                  ]
                }
              },
              {
                "key": "2017-11-22T00:00:00.000Z-2018-09-01T00:00:00.000Z",
                "from": 1511308800000,
                "from_as_string": "2017-11-22T00:00:00.000Z",
                "to": 1535760000000,
                "to_as_string": "2018-09-01T00:00:00.000Z",
                "doc_count": 1,
                "avg_price": {
                  "value": 3145.719970703125
                },
                "make": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "RENAULT",
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
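A deeply nested response like this is often easier to consume as flat rows. The following Python sketch walks model -> date range -> make and emits one row per make; the function name flatten_sales is hypothetical, and the sample is a trimmed copy of the V7 bucket from the response above.

```python
def flatten_sales(response):
    """Turn model -> date range -> make buckets into flat tuples."""
    rows = []
    for model in response["aggregations"]["popular_cars"]["buckets"]:
        for rng in model["sold_date_range"]["buckets"]:
            for make in rng["make"]["buckets"]:
                rows.append((
                    model["key"],
                    rng["from_as_string"][:10],  # keep only the date part
                    rng["to_as_string"][:10],
                    make["key"],
                    make["doc_count"],
                    rng["avg_price"]["value"],   # average is per range, not per make
                ))
    return rows

# Trimmed copy of the V7 bucket from the response shown above.
response = {"aggregations": {"popular_cars": {"buckets": [{
    "key": "V7",
    "doc_count": 3,
    "sold_date_range": {"buckets": [
        {"from_as_string": "2014-04-21T00:00:00.000Z",
         "to_as_string": "2016-02-01T00:00:00.000Z",
         "doc_count": 1,
         "avg_price": {"value": 2408.409912109375},
         "make": {"buckets": [{"key": "LAMBORGHINI", "doc_count": 1}]}},
        {"from_as_string": "2017-11-22T00:00:00.000Z",
         "to_as_string": "2018-09-01T00:00:00.000Z",
         "doc_count": 1,
         "avg_price": {"value": 3145.719970703125},
         "make": {"buckets": [{"key": "RENAULT", "doc_count": 1}]}},
    ]},
}]}}}

rows = flatten_sales(response)
for row in rows:
    print(row)
```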
