1

I am using an InvokeHTTP processor that returns an array of JSON objects. The size of the response is between 2 and 3 GB. The response looks something like this:

[
{
  "id": 17,
  "name": "ONE by AOL: Video"
},
{
  "id": 63,
  "name": "Adform"
}
]

Downstream, I want to handle one JSON object at a time, because I need to apply some filters to each object and then save it to the database. Is SplitJson the right processor for this? Currently I get an out-of-memory exception when using SplitJson, even though I have given NiFi 5 GB in the bootstrap.conf file and 2 GB in the queue configs. I can go for more RAM if that's the only option.

gashu

1 Answer

2

You should use the record processors to avoid needing to split. You can use QueryRecord for filtering and PutDatabaseRecord for inserting to a database.
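To illustrate why record-at-a-time processing sidesteps the memory problem, here is a minimal Python sketch (not NiFi code, and not how NiFi's record readers are implemented) that walks a top-level JSON array and yields one object at a time, so only the current object plus a small read buffer is held in memory. The function name `iter_json_array` and the chunk size are hypothetical, and the sketch assumes every array element is a JSON object, as in the sample response above.

```python
import io
import json
from typing import Iterator, TextIO


def iter_json_array(stream: TextIO, chunk_size: int = 65536) -> Iterator[dict]:
    """Yield the objects of a top-level JSON array one at a time.

    Assumption: every element of the array is a JSON object.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    started = False  # becomes True once the opening '[' has been consumed
    while True:
        # Skip whitespace and the commas that separate elements.
        buffer = buffer.lstrip(" \t\r\n,")
        if not started and buffer.startswith("["):
            started, buffer = True, buffer[1:]
            continue
        if started and buffer.startswith("]"):
            return  # end of the array
        if started and buffer:
            try:
                record, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                pass  # the object is still incomplete; read more input below
            else:
                yield record
                buffer = buffer[end:]  # discard the text already consumed
                continue
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("input ended before the JSON array was closed")
        buffer += chunk


if __name__ == "__main__":
    sample = io.StringIO('[{"id": 17, "name": "ONE by AOL: Video"}, {"id": 63, "name": "Adform"}]')
    for record in iter_json_array(sample):
        print(record["id"], record["name"])
```

Splitting, by contrast, turns every element into its own unit that has to be tracked at once, which is where a 2-3 GB array becomes expensive.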

Bryan Bende
  • You mean to use QueryRecord right after InvokeHTTP? – gashu Sep 23 '19 at 13:45
  • Wherever you need to do the filtering you mentioned – Bryan Bende Sep 23 '19 at 15:15
  • One filter I have to apply is extracting a substring from the name field. E.g. from "name": "Livescore Championship 2017 - 2018 - Android (com.LivescoreChampions20172018.pro)" I have to extract the value com.LivescoreChampions20172018.pro. – gashu Sep 23 '19 at 20:19
  • I wrote this query in the QueryRecord processor: SELECT SUBSTRING(name, CHARINDEX('Android (', name) , CHARINDEX(')',name) - CHARINDEX('Android (', name) + Len(')')) as name from FLOWFILE WHERE name like '%Android (%' but it says "no match found for function charindex" – gashu Sep 23 '19 at 20:20
  • Do you know which SQL version the QueryRecord processor supports and what kinds of SQL queries we can write? Thanks for the help, BTW – gashu Sep 23 '19 at 20:24
  • I don't really know the version of SQL, but it uses Apache Calcite so maybe you can look into what type of SQL is supported by Calcite – Bryan Bende Sep 23 '19 at 20:27
  • As far as filtering, I was thinking of filtering as selecting records based on some condition, but in this case you are trying to modify records so it may be more appropriate to use UpdateRecord for that – Bryan Bende Sep 23 '19 at 20:28
  • for UpdateRecord you would do something like /name = and you can see all the record path statements here - https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html – Bryan Bende Sep 23 '19 at 20:29
  • Thanks, it works just fine. I used UpdateRecord as you said and I get no memory errors, even on bigger files – gashu Sep 25 '19 at 08:37
  • Should I add the template here so others can use it? – gashu Sep 25 '19 at 08:38
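The thread never shows the expression that was finally used in UpdateRecord, so the following is only a sketch of the extraction logic from the comments, written in Python rather than RecordPath. It pulls the text between "Android (" and the next ")" out of the name field. The function name `extract_android_package` and the regular expression are assumptions made for illustration. The RecordPath guide linked above lists `substringAfter` and `substringBefore` functions, which look like the natural building blocks for the same idea inside NiFi.

```python
import re
from typing import Optional

# Captures whatever sits between "Android (" and the next ")".
PACKAGE_IN_PARENS = re.compile(r"Android \(([^)]*)\)")


def extract_android_package(name: str) -> Optional[str]:
    """Return the package id embedded in the name, or None if there is none."""
    match = PACKAGE_IN_PARENS.search(name)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "Livescore Championship 2017 - 2018 - Android (com.LivescoreChampions20172018.pro)"
    print(extract_android_package(sample))
```

Returning None for names without an "Android (" part mirrors the WHERE name like '%Android (%' condition in the attempted query: such records can be skipped or left unchanged.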