I'm using NiFi to ingest data and push it to Kafka. I'm currently in a test phase and I'm using a large JSON file that contains 500K records.
Right now I have a GetFile processor to pick up the file and a SplitJson processor with this configuration:

JsonPath Expression: $..posts.*
This configuration works with a small file containing 50K records, but with the large file it crashes.
My JSON file looks like this, with the 500K records in the "posts" array:
{
  "meta": {
    "requestid": "request1000",
    "http_code": 200,
    "network": "twitter",
    "query_type": "realtime",
    "limit": 10,
    "page": 0
  },
  "posts": [
    {
      "network": "twitter",
      "posted": "posted1",
      "postid": "id1",
      "text": "text1",
      "lang": "lang1",
      "type": "type1",
      "sentiment": "sentiment1",
      "url": "url1"
    },
    {
      "network": "twitter",
      "posted": "posted2",
      "postid": "id2",
      "text": "text2",
      "lang": "lang2",
      "type": "type2",
      "sentiment": "sentiment2",
      "url": "url2"
    }
  ]
}
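With that JsonPath expression, what I expect is one flowfile per element of "posts". As a sanity check outside NiFi (a minimal sketch, assuming the file is saved locally as posts.json), this is what a single split should contain:

import json

# Load the sample and look at what one split flowfile would hold.
with open("posts.json") as f:
    data = json.load(f)

print(len(data["posts"]))                      # 500000 in the real file
print(json.dumps(data["posts"][0], indent=2))  # one post = one flowfile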
I read some documentation about this problem, but the threads I found deal with text files, and the answers suggest chaining several SplitText processors to split the file progressively. With a rigid structure like my JSON, I don't see how I can do that.
I'm looking for a solution that handles the 500K records reliably.
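For reference, this is roughly what I mean by splitting the file progressively: a minimal sketch outside NiFi, assuming the ijson streaming library and a hypothetical chunk size of 10K, that reads the "posts" array incrementally instead of loading all 500K records at once.

import json
import ijson  # streaming JSON parser, assumed installed

CHUNK_SIZE = 10000  # hypothetical batch size

def split_posts(path, chunk_size=CHUNK_SIZE):
    # Yield lists of at most chunk_size posts, read incrementally
    # so the whole 500K-record array is never held in memory.
    with open(path, "rb") as f:
        chunk = []
        for post in ijson.items(f, "posts.item"):
            chunk.append(post)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

# Each chunk could then become its own file / flowfile-sized batch.
for i, chunk in enumerate(split_posts("posts.json")):
    with open(f"posts_part_{i}.json", "w") as out:
        json.dump(chunk, out)

Something equivalent inside NiFi (splitting in stages rather than into 500K flowfiles at once) is what I'm after.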