Jan 12, 2018

Create documents from JSON file using "_bulk" API of Elasticsearch - Bulk index documents from JSON file

In the introductory post we walked through basic CRUD operations in Elasticsearch (ELS) and saw how the "_bulk" API can perform multiple operations such as create, update and delete in one request.
In this post we will use the "_bulk" API to create multiple documents (JSON read from a file) and index them. Elasticsearch expects the JSON input in a specific newline-delimited format: each document takes two lines, an index info line followed by the payload line.
<FIRST_LINE_INDEX_INFO> {"index" : { .... .... .... }}
<SECOND_LINE_PAYLOAD> {"name": "nikhil", "age": 30}
<FIRST_LINE_INDEX_INFO> {"index" : { .... .... .... }}
<SECOND_LINE_PAYLOAD> {"name": "ranjan", "age": 30}

Note: The last line of the JSON file must be terminated by a newline, otherwise ELS rejects the request with the following exception.
"illegal_argument_exception" : "The bulk request must be terminated by a newline [\n]"

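The format rules above can be checked before sending anything to ELS. The following is a minimal sketch, not part of the original walkthrough: the function name check_bulk_body is hypothetical, and it assumes the body contains only "index" actions, so every action line is followed by exactly one payload line.

```python
import json


def check_bulk_body(body: str) -> list[str]:
    """Return a list of problems found in a _bulk body (empty list if none)."""
    problems = []
    # ELS rejects a bulk request whose last line is not newline terminated.
    if not body.endswith("\n"):
        problems.append("body is not terminated by a newline")
    lines = body.splitlines()
    # Assumption: index actions only, so lines come in action/payload pairs.
    if len(lines) % 2 != 0:
        problems.append("expected action/payload pairs, got an odd number of lines")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number} is not valid JSON: {exc.msg}")
            continue
        # Odd-numbered lines must be the index info line.
        is_action = isinstance(parsed, dict) and "index" in parsed
        if number % 2 == 1 and not is_action:
            problems.append(f"line {number} should be an index action line")
    return problems
```

Calling check_bulk_body on the contents of "customer.json" should return an empty list once the file has been cleaned as described in the steps below.
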
1. Visit the website http://www.json-generator.com/
2. Create customer details using the following template and save the compact form of the JSON in a file "customer.json"
[
  '{{repeat(100, 100)}}',
  {
    name: '{{firstName()}} {{surname()}}',
    age: '{{integer(18, 75)}}',
    gender: '{{gender()}}',
    email: '{{email()}}',
    phone: '+91 {{phone()}}',
    address: '{{integer(100, 999)}} {{street()}}',
    city: '{{city()}}',
    state: '{{state()}}, {{integer(100, 10000)}}'
  }
]
3. Clean and format the JSON file (using Sublime Text find-and-replace in regex mode):
  --> Remove the leading [ and the trailing ] from the file.
  --> Using regex replace, replace "},{" with "}\n{" so that each document sits on its own line.
  --> Add the index info before each payload: find {"name" and replace it with {"index" : {}}\n{"name"
Alternatively, download the already cleaned and processed file.
4. Make sure the JSON file ends with a newline.
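
Steps 3 and 4 can also be done with a short script instead of an editor. This is only a sketch of one possible approach: the function names array_to_bulk and convert_file are hypothetical, and it assumes the file saved from json-generator still contains the original JSON array.

```python
import json


def array_to_bulk(docs: list[dict]) -> str:
    """Turn a list of documents into a _bulk body made of index actions."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # index info line, empty as in step 3
        lines.append(json.dumps(doc))            # payload on a single line
    # Every line, including the last one, is terminated by a newline (step 4).
    return "".join(line + "\n" for line in lines)


def convert_file(src: str, dst: str) -> int:
    """Read a JSON array from src, write the bulk body to dst, return the doc count."""
    with open(src, encoding="utf-8") as handle:
        docs = json.load(handle)
    with open(dst, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(array_to_bulk(docs))
    return len(docs)
```

For example, convert_file("generated.json", "customer.json") would write the processed file, where "generated.json" is an assumed name for the raw download.
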

Bulk index documents from JSON file :

Execute the following command from the same location where customer.json is saved. Here the "_bulk" API creates the "employee" index and creates 100 documents in it, marked with the type name "personal".
curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/employee/personal/_bulk?pretty&refresh' --data-binary @"customer.json"
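
The same request can be issued from Python using only the standard library. This is a rough equivalent of the curl command above, not a replacement for it: the function names are hypothetical, and the default URL simply repeats the one used with curl. The file is read in binary mode so the bytes, including the final newline, are sent unchanged, which is what --data-binary does.

```python
import urllib.request

# Same endpoint as in the curl command; adjust host and port if ELS runs elsewhere.
BULK_URL = "http://localhost:9200/employee/personal/_bulk?pretty&refresh"


def build_bulk_request(body: bytes, url: str = BULK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a bulk body."""
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )


def send_bulk_file(path: str, url: str = BULK_URL) -> str:
    """Send the file at path to the _bulk endpoint and return the response text."""
    with open(path, "rb") as handle:
        body = handle.read()
    with urllib.request.urlopen(build_bulk_request(body, url)) as response:
        return response.read().decode("utf-8")
```

Calling send_bulk_file("customer.json") needs a running ELS node. Note that urlopen raises urllib.error.HTTPError for a 400 response, so a rejected bulk request surfaces as an exception rather than as returned text.
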

Now display the list of indices to validate docs.count for the employee index. The response below shows a docs.count of 100 for the employee index.
➜  Desktop curl -XGET 'localhost:9200/_cat/indices?v&pretty'                                                                                         
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   employee  LU8xvoyMRwi-0o5K2JCyMg   5   1        100            0     85.7kb         85.7kb
yellow open   products  AIA9n0qFQN6suaMG6kzYMw   5   1          6            0     25.2kb         25.2kb
yellow open   customers j5KPYo3mRGuf4ahPFgbF0g   5   1          2            0       16kb           16kb
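
If the check needs to be automated, docs.count can be read from the table output shown above. The helper below is a small sketch with a hypothetical name, docs_count. It assumes the output was requested with the v flag, so the header row is present, and that every row has a value in every column.

```python
def docs_count(cat_output: str, index_name: str) -> int:
    """Return docs.count for index_name from '_cat/indices?v' text output."""
    rows = [line.split() for line in cat_output.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    # Locate columns by header name instead of relying on fixed positions.
    name_col = header.index("index")
    count_col = header.index("docs.count")
    for row in data:
        if row[name_col] == index_name:
            return int(row[count_col])
    raise KeyError(index_name)
```

Pass only the table itself to the function, without the shell prompt line.
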


What happens if the end of the JSON file is not terminated by a newline [\n]?
➜  Desktop curl -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/employee/personal/_bulk?pretty&refresh' --data-binary @"customer.json"
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "The bulk request must be terminated by a newline [\n]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "The bulk request must be terminated by a newline [\n]"
  },
  "status" : 400
}
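
One way to avoid this error is to repair the file before sending it. The sketch below uses a hypothetical helper, ensure_trailing_newline, which appends a newline only when the last byte of the file is not already one.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file does not end with one. Return True if changed."""
    with open(path, "rb+") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # empty file: nothing to repair
        # Look at the last byte only, so large files are not read into memory.
        handle.seek(-1, os.SEEK_END)
        if handle.read(1) == b"\n":
            return False
        # After the read the position is at end of file, so this appends.
        handle.write(b"\n")
        return True
```

Running ensure_trailing_newline("customer.json") before the curl command leaves a correct file untouched and fixes one that is missing the final newline.
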