Jan 12, 2018

Create documents from JSON file using "_bulk" API of Elasticsearch - Bulk index documents from JSON file

In the introductory post we walked through basic CRUD operations in Elasticsearch (ELS) and saw how the "_bulk" API can perform multiple operations such as create, update and delete in one request.
In this post we will use the "_bulk" API to create and index multiple documents read from a JSON file. Elasticsearch expects the JSON input in a specific format: every document takes two lines, an index info line followed by the payload line.
<FIRST_LINE_INDEX_INFO> {"index" : { .... .... .... }}
<SECOND_LINE_PAYLOAD> {"name": "nikhil", "age": 30}
<FIRST_LINE_INDEX_INFO> {"index" : { .... .... .... }}
<SECOND_LINE_PAYLOAD> {"name": "ranjan", "age": 30}

Note
: The last line of the JSON file must be terminated by a newline, otherwise ELS rejects the request with the following error.
"illegal_argument_exception" : "The bulk request must be terminated by a newline [\n]"

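The two-line format and the newline rule can be checked locally before sending anything to ELS. The following is a minimal sketch, not part of Elasticsearch: the function name check_bulk_body is hypothetical, and it assumes the body contains only "index" actions, as in this post.

```python
import json


def check_bulk_body(body: str) -> int:
    """Validate an index-only _bulk body and return the number of documents.

    Hypothetical helper: it only mirrors the rules described in this post.
    """
    if not body.endswith("\n"):
        raise ValueError("bulk body must be terminated by a newline")
    # Drop the final newline, then split into individual lines.
    lines = body[:-1].split("\n")
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of index info line and payload line")
    for action_line, payload_line in zip(lines[0::2], lines[1::2]):
        action = json.loads(action_line)
        if not isinstance(action, dict) or list(action) != ["index"]:
            raise ValueError(f"unexpected index info line: {action_line}")
        # json.loads raises a ValueError subclass if the payload is malformed.
        json.loads(payload_line)
    return len(lines) // 2
```

A payload with a missing quote, such as {"name": "nikhil, "age": 30}, fails the json.loads call, so this kind of check catches typos in hand-edited files as well.
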
1. Visit the website http://www.json-generator.com/
2. Create customer details using the following template and save the compact form of the generated JSON in a file "customer.json"
[
  '{{repeat(100, 100)}}',
  {
    name: '{{firstName()}} {{surname()}}',
    age: '{{integer(18, 75)}}',
    gender: '{{gender()}}',
    email: '{{email()}}',
    phone: '+91 {{phone()}}',
    address: '{{integer(100, 999)}} {{street()}}',
    city: '{{city()}}',
    state: '{{state()}}, {{integer(100, 10000)}}'
  }
]
3. Clean and format the JSON file (using Sublime Text find-and-replace in regex mode):
  --> Remove [ and ] from the file.
  --> Using regex replace, replace "},{" with "}\n{"
  --> Prepend index info to each payload: find {"name" and replace it with {"index" : {}}\n{"name"
Alternatively, download the cleaned and processed file instead of processing it yourself.
4. Add a newline at the end of the JSON file.
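If you prefer a script over editor find-and-replace, steps 3 and 4 can be sketched in Python as below. This is only one possible approach: the function name array_to_bulk and the file name customer_raw.json are hypothetical, and it assumes the file saved from the generator is a valid JSON array.

```python
import json


def array_to_bulk(src_path: str, dst_path: str) -> int:
    """Rewrite a JSON array file in the two-line _bulk format.

    Returns the number of documents written. Hypothetical helper.
    """
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)  # parses the whole [ {...}, {...} ] array
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for doc in docs:
            # Index info line: empty, because index and type come from the URL.
            dst.write('{"index": {}}\n')
            # Payload line: compact JSON on a single line, newline terminated,
            # so the last line of the file always ends with "\n".
            dst.write(json.dumps(doc, separators=(",", ":")) + "\n")
    return len(docs)


# Example (file names are assumptions):
# array_to_bulk("customer_raw.json", "customer.json")
```

Parsing the array with the json module avoids one weakness of the regex approach: a "},{" sequence inside a string value would be split by the regex but is left alone here.
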

Bulk index documents from JSON file :

Execute the following command from the location where customer.json is saved. Here the "_bulk" API creates the "employee" index and creates 100 documents in it, marked with type name "personal".
curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/employee/personal/_bulk?pretty&refresh' --data-binary @"customer.json"
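The same request can also be sent from a script. Below is a rough Python equivalent of the curl command using only the standard library; the function names build_bulk_request and send_bulk are hypothetical, and the URL simply mirrors the one used in this post.

```python
import json
import urllib.request

# Assumption: same host, index and type as the curl command in this post.
BULK_URL = "http://localhost:9200/employee/personal/_bulk?refresh"


def build_bulk_request(path: str, url: str = BULK_URL) -> urllib.request.Request:
    """Build a POST request whose body is the raw bytes of the bulk file."""
    with open(path, "rb") as f:
        body = f.read()  # read as bytes so newlines are sent unchanged
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )


def send_bulk(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON response.

    urlopen raises urllib.error.HTTPError for a 4xx/5xx status, for example
    the 400 returned when the body is not newline terminated.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example (needs a running Elasticsearch node):
# result = send_bulk(build_bulk_request("customer.json"))
# print(result["errors"])
```

Reading the file in binary mode plays the same role as --data-binary in curl: the bytes, including the final newline, reach ELS untouched.
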

Now list the indices to validate docs.count for the employee index. The response below shows docs.count 100 for the employee index.
➜  Desktop curl -XGET 'localhost:9200/_cat/indices?v&pretty'
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   employee  LU8xvoyMRwi-0o5K2JCyMg   5   1        100            0     85.7kb         85.7kb
yellow open   products  AIA9n0qFQN6suaMG6kzYMw   5   1          6            0     25.2kb         25.2kb
yellow open   customers j5KPYo3mRGuf4ahPFgbF0g   5   1          2            0       16kb           16kb
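To check the count in a script instead of reading the table by eye, the text output can be parsed. This is a small sketch with hypothetical function names; it assumes the header row is present (the v parameter) and that every column is filled for every index, as in the output above.

```python
def parse_cat_indices(text: str) -> list:
    """Turn the whitespace-separated _cat/indices table into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def docs_count(text: str, index: str) -> int:
    """Return docs.count for the named index, or raise KeyError if absent."""
    for row in parse_cat_indices(text):
        if row["index"] == index:
            return int(row["docs.count"])
    raise KeyError(index)
```

For the output shown above, docs_count(output, "employee") gives 100.
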


What happens if the end of the JSON file is not terminated by a newline [\n]?
➜  Desktop curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/employee/personal/_bulk?pretty&refresh' --data-binary @"customer.json"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "The bulk request must be terminated by a newline [\n]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "The bulk request must be terminated by a newline [\n]"
  },
  "status" : 400
}
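The fix for this error is to add the missing newline to the file. A minimal Python sketch is shown below; the function name ensure_trailing_newline is hypothetical, and it modifies the file in place, so try it on a copy first.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append "\n" to the file if its last byte is not a newline.

    Returns True if a newline was added, False if nothing was changed.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already newline terminated
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


# Example (file name taken from this post):
# ensure_trailing_newline("customer.json")
```
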
==== ***** =======
Location: Bengaluru, Karnataka, India
