0

I have been looking at this for a few weeks (in the background) and am stumped on how to convert JSON data approximating a CSV into a tagged set using the NiFi JoltTransformJson processor. What I mean by this is to use the data from the first row of an array in the input as the JSON object name in the output.

As an example I have this input data:

[
  [
    "Company",
    "Retail Cost",
    "Percentage"
  ],
  [
    "ABC",
    "5,368.11",
    "17.09%"
  ],
  [
    "DEF",
    "101.47",
    "0.32%"
  ],
  [
    "GHI",
    "83.79",
    "0.27%"
  ]
]

and what I am trying to get as output is:

[
  {
    "Company": "ABC",
    "Retail Cost": "5,368.11",
    "Percentage": "17.09%"
  },
  {
    "Company": "DEF",
    "Retail Cost": "101.47",
    "Percentage": "0.32%"
  },
  {
    "Company": "GHI",
    "Retail Cost": "83.79",
    "Percentage": "0.27%"
  }
]

I see this as primarily 2 problems: getting access to the content of the first array and then making sure that the output data does not include that first array.

I would love to post a Jolt Specification showing myself getting somewhat close, but the closest gives me the correct shape of output without the correct content. It looks like this:

[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*": "[&1].&0"
      }
    }
  }
]

But it results in an output like this:

[ {
  "0" : "Company",
  "1" : "Retail Cost",
  "2" : "Percentage"
}, {
  "0" : "ABC",
  "1" : "5,368.11",
  "2" : "17.09%"
}, {
  "0" : "DEF",
  "1" : "101.47",
  "2" : "0.32%"
}, {
  "0" : "GHI",
  "1" : "83.79",
  "2" : "0.27%"
} ]

Which clearly has the wrong object name and it has 1 too many elements in the output.

Andy
  • 13,916
  • 1
  • 36
  • 78
Mark
  • 5
  • 1
  • 3

1 Answers1

4

Can do it, but wow it is hard to read / looks like terrible regex

Spec

[
  {
    // this does most of the work, but producs an output
    //  array with a null in the Zeroth space.
    "operation": "shift",
    "spec": {
      // match the first item in the outer array and do 
      //  nothing with it, because it is just "header" data
      //   e.g. "Company", "Retail Cost", "Percentage".
      // we need to reference it, but not pass it thru
      "0": null,
      // 
      // loop over all the rest of the items in the outer array
      "*": {
        // this is rather confusing
        // "*" means match the array indices of the innner array
        // and we will write the value at that index "ABC" etc
        // to "[&1].@(2,[0].[&])"
        // "[&1]" means make the ouput be an array, and at index
        //   &1, which is the index of the outer array we are
        //   currently in.
        // Then "lookup the key" (Company, Retail Cost) using
        //  @(2,[0].[&])
        // Which is go back up the tree to the root, then 
        //  come back down into the first item of the outer array
        //  and Index it by the by the array index of the current
        //  inner array that we are at.
        "*": "[&1].@(2,[0].[&])"
      }
    }
  },
  {
    // We know the first item in the array will be null / junk,
    //  because the first item in the input array was "header" info.
    // So we match the first item, and then accumulate everything
    //  into a new array
    "operation": "shift",
    "spec": {
      "0": null,
      "*": "[]"
    }
  }
]
Milo S
  • 4,466
  • 1
  • 19
  • 22
  • Thank you Milo... I can confirm that this works and I am learning a lot from it. Thank you very much for commenting the spec. Although I am confused, if the first shift operator emits a null, then why do we need the 2nd shift? I have tried removing it and I see the result. But I find it confusing because the two shift clauses look identical for the first row. So, if the first one did not remove it... why does the second one succeed? – Mark Oct 01 '17 at 06:03
  • Updated the spec with some more comments. Hopefully that makes it clear. That said, just try remove the first "0" : null and just running that first shift and see what it does. ;) – Milo S Oct 08 '17 at 21:18