
I am new to Azure Data Factory and need to create a JSON file for an Elasticsearch bulk API upsert, with the following considerations:

  1. The input is in JSON format and will be used as the payload for the upsert API. Each row consists of an array and objects, and the number of objects is not the same for every row.
  2. I need to create a dynamic JSON output with 2 output rows for each input row (see the sample below).
  3. The output JSON file must end with a newline (`\n`).
  4. I have already tested the bulk upsert using Postman; I just need to generate the file dynamically using a pipeline activity, a data flow, or a PySpark notebook.
  5. I am also open to editing the data flow that creates the input JSON file (sourced from a Parquet file) so that it produces the desired output JSON directly.
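Since the bulk call already works in Postman, the same request can be reproduced from a notebook once the body has been generated. A minimal, standard-library-only sketch, assuming the bulk body is already available as a string; the URL is a hypothetical placeholder and authentication is left out. The `application/x-ndjson` content type and the newline-terminated final line follow the Elasticsearch `_bulk` API requirements.

```python
import urllib.request

# Hypothetical endpoint: replace with your cluster URL and add authentication.
BULK_URL = "http://localhost:9200/_bulk"


def build_bulk_request(body: str, url: str = BULK_URL) -> urllib.request.Request:
    """Wrap an NDJSON bulk body in a POST request like the one tested in Postman."""
    if not body.endswith("\n"):
        # The bulk API requires the final line to be newline-terminated.
        body += "\n"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(req)`. With up to 1M input rows, it is probably better to split the output into several smaller bulk requests than to send one very large payload.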

Sample:

Input JSON:

    row1: {"array1":["value","value","value"],"object2":"value2","object3":"value3","object4":"value4","object6":"value6"}
    row2: {"object2":"value2","object3":"value3","object4":"value4","object5":"value5","object6":"value6","object7":"value7"}

Notes on the input JSON:

  1. The array can contain multiple values.
  2. Values can be of any data type.
  3. If the array has no data, it is omitted from the input JSON.
  4. The input JSON can have up to 1M rows.
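Because note 2 allows values of any type while an Elasticsearch `_id` is a string, the `<object2_object3>` placeholder needs a defined text rendering. A small sketch, assuming `_id` is the value of `object2`, an underscore, then the value of `object3` (field names taken from the sample's placeholders). Strings are used verbatim and any other type is rendered as compact JSON; that is one possible convention, not a requirement. Note 3 does not matter here because the optional array is not part of the key.

```python
import json
from typing import Any, Mapping

# Assumption: "<object2_object3>" means "<value of object2>_<value of object3>".
ID_FIELDS = ("object2", "object3")


def id_part(value: Any) -> str:
    """Render one field value as text for use inside an Elasticsearch _id."""
    # Strings are kept verbatim; numbers, booleans, null, lists and objects
    # are rendered as compact JSON, so True becomes "true" rather than "True".
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"), ensure_ascii=False)


def make_doc_id(row: Mapping[str, Any]) -> str:
    """Build the _id for one input row; raises KeyError if a key field is missing."""
    return "_".join(id_part(row[name]) for name in ID_FIELDS)
```

For example, a row with `"object2": "value2"` and `"object3": 7` would get the `_id` `value2_7`.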

Output JSON:

    {"update": {"_id": <object2_object3>, "_index": <constant literal>}}   <--- _id built from object2 and object3 of input row1
    {"doc": <whole input json row1>, "doc_as_upsert": true}                <--- doc is the whole input row1
    {"update": {"_id": <object2_object3>, "_index": <constant literal>}}   <--- _id built from object2 and object3 of input row2
    {"doc": <whole input json row2>, "doc_as_upsert": true}                <--- doc is the whole input row2
                                                                           <--- file must end with a newline (\n)
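The transformation in the sample can be written as a streaming conversion, which keeps memory usage flat even at 1M rows. A minimal sketch using only the Python standard library, assuming the input file has one JSON object per line (JSON Lines, without the `row1:`/`row2:` labels), that `_id` is `<object2>_<object3>`, and that `my-index` is a hypothetical stand-in for the constant index name.

```python
import io
import json
import sys
from typing import Any, Dict, TextIO

INDEX_NAME = "my-index"  # hypothetical stand-in for <constant literal>


def make_doc_id(row: Dict[str, Any]) -> str:
    # Assumption: _id is "<object2>_<object3>"; non-strings rendered as JSON.
    parts = (row["object2"], row["object3"])
    return "_".join(p if isinstance(p, str) else json.dumps(p) for p in parts)


def write_bulk(src: TextIO, dst: TextIO, index: str = INDEX_NAME) -> int:
    """Convert JSON Lines from src into an Elasticsearch bulk body on dst.

    Each input row produces two lines (update action, then doc), and every
    line, including the last one, ends with a newline. Returns the row count.
    """
    count = 0
    for raw in src:  # iterate lazily so large inputs are never fully loaded
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines in the input
        row = json.loads(raw)
        action = {"update": {"_id": make_doc_id(row), "_index": index}}
        doc = {"doc": row, "doc_as_upsert": True}  # whole input row as doc
        dst.write(json.dumps(action, ensure_ascii=False) + "\n")
        dst.write(json.dumps(doc, ensure_ascii=False) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO(
        '{"array1":["value","value"],"object2":"value2","object3":"value3"}\n'
        '{"object2":"value2","object3":"value3","object5":"value5"}\n'
    )
    write_bulk(sample, sys.stdout)
```

Because every written line ends with `\n`, the file satisfies the trailing-newline requirement without a separate final write. Rows with and without `array1` are handled the same way, since the doc is simply the parsed row.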

Notes on the output JSON:

  1. Each input JSON row should produce 2 output JSON rows (the `update` action and the `doc`).
  2. The `doc` object should contain the whole input JSON row.
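To confirm that a generated file meets these notes (plus the trailing-newline requirement) before uploading it, a small checker can compare it against the input rows. A sketch, assuming the input rows are available as a list of dicts in the same order as the output; the function name `check_bulk` is hypothetical.

```python
import json
from typing import Any, Dict, List


def check_bulk(rows: List[Dict[str, Any]], bulk_text: str) -> List[str]:
    """Return the problems found; an empty list means the bulk body has two
    lines per input row, each doc equals its input row, and it ends with a newline."""
    problems: List[str] = []
    if not bulk_text.endswith("\n"):
        problems.append("body does not end with a newline")
    lines = bulk_text.splitlines()
    if len(lines) != 2 * len(rows):
        problems.append(f"expected {2 * len(rows)} lines, got {len(lines)}")
        return problems
    for i, row in enumerate(rows):
        action = json.loads(lines[2 * i])
        doc = json.loads(lines[2 * i + 1])
        if "update" not in action:
            problems.append(f"row {i}: line {2 * i + 1} is not an update action")
        if doc.get("doc") != row:
            problems.append(f"row {i}: doc is not the whole input row")
        if doc.get("doc_as_upsert") is not True:
            problems.append(f"row {i}: doc_as_upsert is not true")
    return problems
```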

Thanks for your help guys
