0

Background: I work for a company that basically sells passes. Every order that is placed by the customer will contain N number of passes.

Issue: I have these JSON event-transaction files coming into a S3 bucket on a daily basis from DocumentDB (MongoDB). This JSON file is associated to the relevant type of event (insert, modify or delete) for every document key (which is an order in my case). The example below illustrates a "Insert" type of event that came through to the S3 bucket:

{
  "_id": {
    "_data": "11111111111111"
  },
  "operationType": "insert",
  "clusterTime": {
    "$timestamp": {
      "t": 11111111,
      "i": 1
    }
  },
  "ns": {
    "db": "abc",
    "coll": "abc"
  },
  "documentKey": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    }
  },
  "fullDocument": {
    "_id": {
      "$uuid": "abcabcabcabcabcabc"
    },
    "orderNumber": "1234567",
    "externalOrderId": "12345678",
    "orderDateTime": "2020-09-11T08:06:26Z[UTC]",
    "attraction": "abc",
    "entryDate": {
      "$date": 2020-09-13
    },
    "entryTime": {
      "$date": 04000000
    },
    "requestId": "abc",
    "ticketUrl": "abc",
    "tickets": [
      {
        "passId": "1111111",
        "externalTicketId": "1234567"
      },
      {
        "passId": "222222222",
        "externalTicketId": "122442492"
      }
    ],
    "_class": "abc"
  }
}

As we see above, every JSON file might contain N number of passes and every pass is - in turn - is associated to an external ticket id, which is a different column (as seen above). I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and Row Normalizer that could then transpose "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into 1 unique column "Pass" and I would have to have to apply a similar logic to the other column "External ticket id". The problem with that approach is that it is quite static, as in, I need to "tell" Pentaho how many Passes are coming in advance in the Json input step. However what if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?

ERR
  • 213
  • 1
  • 6
  • 17

1 Answers1

1

If you want a tabular output like

TicketUrl   Pass            ExternalTicketID
----------  ------          ----------------
abc         PassID1Value1    ExTicketIDvalue1
abc         PassID1Value2    ExTicketIDvalue2
abc         PassID1Value3    ExTicketIDvalue3

And make incoming value dynamic based on JSON input file values, then you can download this transformation Updated Link

I found everything work dynamic in JSON input.

Ahmmed
  • 1,038
  • 1
  • 5
  • 13
  • perfect thank you, I will take a look and let you know. – ERR Sep 14 '20 at 10:40
  • Hi I tried to use your approach but got this message: "JSON Input.0 - The data structure is not the same inside the resource! We found 13 values for json path [$..passId], which is different that the number returned for path [$.fullDocument.ticketUrl] (1 values). We MUST have the same number of values for all paths.". Do you know how to go about this? Thanks – ERR Sep 14 '20 at 11:16
  • I have updated the transformation. Please let me know if you want anything else. – Ahmmed Sep 14 '20 at 12:02
  • Hi Runa thank you for updating this. I have tried running the new approach but now am getting this other error message "Select values.0 - ERROR (version 7.0.0.0-25, build 1 from 2016-11-05 15.35.36 by buildguy) : Couldn't find field 'operation_type' in row!". It seems for some reason Pentaho can't find the column "operation_type" when using the 2 different Json input steps? – ERR Sep 14 '20 at 13:14
  • The error seems like 'select value' step did not get column operation_type. Please ensure the exact column name in 'select value' input hub. Also you can share your transformation so that i can look the issue. – Ahmmed Sep 14 '20 at 13:55
  • Sure, how can I upload the transformation for you to take a look? I tried to upload into your Google Drive but it didnt let me do so? Thank you – ERR Sep 14 '20 at 13:57
  • You can use filedropper.com – Ahmmed Sep 14 '20 at 14:11
  • Hi please find my transformation in here and let me know your findings: http://www.filedropper.com/test1_111 Thank you – ERR Sep 14 '20 at 14:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/221464/discussion-between-runa-and-err). – Ahmmed Sep 14 '20 at 16:07