
I need to split the JSON objects in a JSON array into individual JSON files using Apache NiFi and publish each one to a Kafka topic. The example input below is an array containing multiple JSON objects:

[
{
    "stops": "1 Stop",
    "ticket price": "301.20",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 3 hours 58 minutes",
    "airline": "Porter Airlines",
    "plane": "DE HAVILLAND DHC-8 DASH 8-400 DASH 8Q",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "6:40pm",
            "arrival_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "arrival_time": "7:58pm"
        },
        {
            "departure_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "8:30pm",
            "arrival_airport": "Toronto, ON, Canada (YTZ-Billy Bishop Toronto City)",
            "arrival_time": "9:38pm"
        }
    ],
    "plane code": "DH4",
    "id": "8e6c69c8-65e0-4f1b-b540-ae61abf8aa6d"
},
{
    "stops": "Nonstop",
    "ticket price": "390.95",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 35 minutes",
    "airline": "Air Canada",
    "plane": "Boeing 767-300",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:40pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "9:15pm"
        }
    ],
    "plane code": "763",
    "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
},
{
    "stops": "Nonstop",
    "ticket price": "391.33",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 30 minutes",
    "airline": "WestJet",
    "plane": "BOEING 737-700 (WINGLETS) PASSENGER",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:10pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "8:40pm"
        }
    ],
    "plane code": "73W",
    "id": "4d49c24b-6fb0-4f45-ba05-a3969ce7308a"
}
]

Desired output: individual JSON objects like the one below. I would like to post each JSON object to a Kafka topic.

{
        "stops": "Nonstop",
        "ticket price": "390.95",
        "days to departure": -1,
        "date of extraction": "03/22/2019",
        "departure": ", Halifax",
        "arrival": ", Toronto",
        "flight duration": "0 days 2 hours 35 minutes",
        "airline": "Air Canada",
        "plane": "Boeing 767-300",
        "timings": [
            {
                "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
                "departure_date": "03/22/2019",
                "departure_time": "7:40pm",
                "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
                "arrival_time": "9:15pm"
            }
        ],
        "plane code": "763",
        "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
    }
OneCricketeer
Meghashyam

2 Answers


You can use the SplitJson processor. It splits a JSON array of messages into individual messages, each becoming the content of its own flowfile. For example, if your JSON array contains 100 messages, the processor's split relationship will output 100 flowfiles, each containing one message.

Set the JsonPath Expression property to $.*

https://community.hortonworks.com/questions/183055/need-to-display-each-element-of-array-in-a-separat.html
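
To make the effect of the split concrete, here is a minimal sketch in plain Python. It is not NiFi code; it only imitates what splitting a top-level array with $.* produces, using a trimmed-down version of the question's data. The function name `split_top_level_array` is made up for this illustration.

```python
import json


def split_top_level_array(text):
    """Return each element of a top-level JSON array as its own JSON string.

    Illustration only: this imitates the result of splitting with `$.*`,
    it does not call NiFi.
    """
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a top-level JSON array")
    return [json.dumps(item) for item in items]


# Trimmed-down sample based on the array in the question.
sample = """
[
  {"airline": "Porter Airlines", "plane code": "DH4"},
  {"airline": "Air Canada", "plane code": "763"},
  {"airline": "WestJet", "plane code": "73W"}
]
"""

# One string per array element, analogous to one flowfile per message.
flowfiles = split_top_level_array(sample)
for body in flowfiles:
    print(body)
```

Each string in `flowfiles` corresponds to what would be the content of one flowfile, so three array elements give three separate messages.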

OneCricketeer
  • This is fine, but the issue with the SplitJson processor is that each item is now in an individual flow file, causing you to post individual messages to Kafka. Is there a way to keep these JSON objects in a single flow file, separated by newlines? – Vijay Kumar Jun 13 '19 at 14:59

This is an old post, but I still want to add my suggestions. First, @OneCricketeer is correct that you have to use the SplitJson processor for this, but the expression you give it is very important.

Given the JSON provided by @Meghashyam, I would suggest wrapping the array in a single object like below:

{"payload":[
{
    "stops": "1 Stop",
    "ticket price": "301.20",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 3 hours 58 minutes",
    "airline": "Porter Airlines",
    "plane": "DE HAVILLAND DHC-8 DASH 8-400 DASH 8Q",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "6:40pm",
            "arrival_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "arrival_time": "7:58pm"
        },
        {
            "departure_airport": "Ottawa, ON, Canada (YOW-Macdonald-Cartier Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "8:30pm",
            "arrival_airport": "Toronto, ON, Canada (YTZ-Billy Bishop Toronto City)",
            "arrival_time": "9:38pm"
        }
    ],
    "plane code": "DH4",
    "id": "8e6c69c8-65e0-4f1b-b540-ae61abf8aa6d"
},
{
    "stops": "Nonstop",
    "ticket price": "390.95",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 35 minutes",
    "airline": "Air Canada",
    "plane": "Boeing 767-300",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:40pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "9:15pm"
        }
    ],
    "plane code": "763",
    "id": "fc13c5cb-93d1-46f9-b496-abbf6faba85a"
},
{
    "stops": "Nonstop",
    "ticket price": "391.33",
    "days to departure": -1,
    "date of extraction": "03/22/2019",
    "departure": ", Halifax",
    "arrival": ", Toronto",
    "flight duration": "0 days 2 hours 30 minutes",
    "airline": "WestJet",
    "plane": "BOEING 737-700 (WINGLETS) PASSENGER",
    "timings": [
        {
            "departure_airport": "Halifax, NS, Canada (YHZ-Stanfield Intl.)",
            "departure_date": "03/22/2019",
            "departure_time": "7:10pm",
            "arrival_airport": "Toronto, ON, Canada (YYZ-Pearson Intl.)",
            "arrival_time": "8:40pm"
        }
    ],
    "plane code": "73W",
    "id": "4d49c24b-6fb0-4f45-ba05-a3969ce7308a"
}
]}

Now I am using a JSONPath finder to view the JSON structure. When we click on the payload object, we can see the array items under the path x.payload

In this case, you can use $.payload[*] as the expression in the processor and set the Execution option to Primary node under the Scheduling tab. This should queue up the individual items in the queue list. So basically we are parsing each element of the array object.
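
As a rough illustration of why the expression matters once the array is wrapped, the sketch below imitates the two JSONPath expressions in plain Python. It does not use NiFi or a JSONPath library; the helper names `select_wildcard` and `select_payload_items` are invented for this example, and the sample is a shortened version of the wrapped JSON above.

```python
import json


def select_wildcard(doc):
    """Imitate `$.*`: the direct children of the root.

    For an array that is its elements; for an object, its member values.
    """
    if isinstance(doc, list):
        return list(doc)
    if isinstance(doc, dict):
        return list(doc.values())
    return []


def select_payload_items(doc):
    """Imitate `$.payload[*]`: every element of the array under "payload"."""
    return list(doc["payload"])


# Shortened version of the wrapped document from this answer.
wrapped = json.loads("""
{"payload": [
  {"airline": "Porter Airlines", "plane code": "DH4"},
  {"airline": "Air Canada", "plane code": "763"},
  {"airline": "WestJet", "plane code": "73W"}
]}
""")

root_children = select_wildcard(wrapped)      # one item: the whole array
payload_items = select_payload_items(wrapped) # three items: one per flight

print(len(root_children), len(payload_items))
```

With the wrapper in place, the root has a single child (the entire array), so the path has to descend into payload to reach the individual flight objects.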

Dharman
ArjunArora