Combine json response in nifi

Question

We call an InvokeHTTP processor and get a JSON response. Example:

{
"id": "h569gcjhcm",
"doi": {
"id": "10.17632/h569gcjhcm.1",
"status": "allocated",
"prefix": "10.17632"
},
"name": "Data for: Flooding of the Caspian Sea at the intensification of Northern Hemisphere Glaciations",
"description": "Supplementary data for the Jeirankechmez section in Azerbaijan.\n\n- Appendix A contains all paleomagnetic data and interpretations of the Jeirankechmez section. This .dir file can be imported into the paleomagnetism.org webportal under \"Interpretation Portal\", \"Advanced Options\", \"Import Application Save\". For further details on the use of paleomagnetism.org please refer to the article by Koymans et al. (2016) - https://doi.org/10.1016/j.cageo.2016.05.007.\n- Appendix B contains the magnetic susceptibility data for the analysed samples, including geographic coordinates and stratigraphic levels.\n- Appendix C contains the 40Ar/39Ar data for the three analysed volcanic ash layers. ",
"version": 1,
"publish_date": "2019-01-29T12:51:38.090Z",
"data_licence": {
"id": "01d9c749-3c4d-4431-9df3-620b2dcfe144",
"short_name": "CC BY 4.0",
"full_name": "Creative Commons Attribution 4.0 International",
"description": "This dataset is licensed under a Creative Commons Attribution 4.0 International licence.\n\nWhat does this mean?\nYou can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.",
"url": "http://creativecommons.org/licenses/by/4.0",
"category": "Creative"
},
"contributors": [
{
"first_name": "Christiaan",
"last_name": "van Baak"
},
{
"first_name": "Marius",
"last_name": "Stoica"
},
{
"first_name": "Arjen",
"last_name": "Grothe"
},
{
"first_name": "Gareth",
"last_name": "Davies"
},
{
"profile_id": "72970719-95c8-341b-80d2-afa9e7154baf",
"first_name": "Wout",
"last_name": "Krijgsman"
},
{
"profile_id": "3a4bfe2c-4098-3859-9b88-789fa993e05a",
"first_name": "Keith",
"last_name": "Richards"
},
{
"profile_id": "f1660f3c-ebbd-3289-8240-1f4ea7913df4",
"first_name": "Klaudia",
"last_name": "Kuiper"
},
{
"first_name": "Elmira",
"last_name": "Aliyeva"
}
],
"versions": [
{
"version": 1,
"publish_date": "2019-01-29T12:51:38.090Z",
"available": true
}
],
"files": [
{
"filename": "Appendix_A_Jeirankechmez_pmag_interpretations.dir",
"id": "f2f4cba7-2411-4737-a9b2-f094db30dca1",
"content_details": {
"id": "994bc865-5300-4d76-a373-e528ccd830e8",
"sha256_hash": "2427c4b077372760973ce8224694f2a2ee5383c7f022ad818164d847a20e27cc",
"sha1_hash": "73792dc6d6eb2c1de1e04926ba5d4420dd0aaece",
"content_type": "application/x-director",
"size": 917022,
"created_date": "2019-01-03T00:00:00.000Z",
"download_expiry_time": "2019-01-29T13:52:25.729Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
},
{
"filename": "Appendix_B_Sample_locations_susceptibility.xlsx",
"id": "64241bf0-5279-49e8-a505-be9075b910e1",
"content_details": {
"id": "af8809d0-8e63-4599-abaa-e7af9ad39959",
"sha256_hash": "0588f44a0cbd477aa2798323e57ce0b2d4a118e767c0b1ffdc9eb1017e4d23c2",
"sha1_hash": "02e89f6f197ebf495e1e2c3d1aab250efc7545e7",
"content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"size": 24770,
"created_date": "2019-01-03T00:00:00.000Z",
"download_expiry_time": "2019-01-29T13:52:25.732Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
},
{
"filename": "Appendix_C_ArAr_data.xlsx",
"id": "2e912027-ff3f-48ad-98b9-b643b59ba0e3",
"content_details": {
"id": "4960377c-060d-41f6-b7af-150617d8ebeb",
"sha256_hash": "235dc32c1e99f350ee5c99908a5f5d72d1aeeab02f78c2e0181d585bd1880fa6",
"sha1_hash": "6483156e4577948cac5d2679eee862c76faed1c9",
"content_type": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
"size": 18510,
"created_date": "2019-01-03T00:00:00.000Z"
},
"metrics": {
"downloads": 0,
"previews": 0
}
}
],
"articles": [
{
"id": "10.1016/j.gloplacha.2019.01.007",
"title": "Flooding of the Caspian Sea at the intensification of Northern Hemisphere Glaciations",
"doi": "10.1016/j.gloplacha.2019.01.007",
"journal": {
"issn": "0921-8181",
"name": "Global and Planetary Change",
"url": "http://www.sciencedirect.com/science/journal/09218181"
}
}
],
"categories": [
{
"id": "http://com/vocabulary/OmniScience/Concept-170590667",
"label": "Geology"
},

{
"id": "http://data.elsevier.com/vocabulary/OmniScience/Concept-473860195",
"label": "Strontium Isotope"
}
],
"institutions": [ ],
"metrics": {

},
"available": true,
"related_links": [ ]
}

I use the profile_id of each contributor ($.contributors[*].profile_id) from the JSON above to call a second endpoint with another InvokeHTTP processor (https://api.xxx.com/profile/$.profile_id).

The JSON response from this endpoint:

"contributors": [
{
"profile_id": "cedferfiherhforhforf",
"first_name": "xxx",
"last_name": "van Baak",
"other_ids": [],
"Other info": "deeded"
}
]

I have to call this endpoint once per object in contributors (say there are 5 contributor objects, so I have to call the endpoint 5 times) and combine these 5 responses.
Then I have to merge the combined response into the main response shown above.

Welcome to StackOverflow. Have you tried anything yet? Please include an example of your coding attempts in the question if you have. Please include a [mcve] — tshimkus, Feb 11 '19 at 19:30
@tshimkus I have tried the following: InvokeHTTP (get the JSON response) --> SplitJson (on $.contributors[*]) --> EvaluateJsonPath (to extract the id) --> InvokeHTTP (calling a different endpoint using these ids). Now I have to combine these JSON responses and then merge the result with the first JSON response (from the first InvokeHTTP). – Monika Naagar, Feb 12 '19 at 12:56

score 1 · Answer 1 · answered Feb 11 '19 at 19:48

Just an example:

EvaluateJsonPath: extract "id" into an attribute so the pieces can be joined by it later
SplitJson: split your JSON by "contributors"
InvokeHTTP: call the endpoint for each split
MergeContent: merge by "id", using the fragment count set by SplitJson
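
To make the intended result of these steps concrete, here is a minimal sketch of the same split, look-up and merge logic in plain Python, outside NiFi. It is not NiFi configuration. The names `enrich_contributors` and `fake_fetch_profile` are hypothetical, the second endpoint is replaced by a stub so the example runs offline, and the assumption that each profile response is folded into its contributor object is only one possible reading of "merge".

```python
import json
from typing import Callable


def enrich_contributors(dataset: dict, fetch_profile: Callable[[str], dict]) -> dict:
    """Return a copy of dataset whose contributors carry their profile data."""
    enriched = []
    for contributor in dataset.get("contributors", []):  # SplitJson step
        profile_id = contributor.get("profile_id")       # EvaluateJsonPath step
        if profile_id is None:
            # Contributors without a profile_id cannot be looked up; keep as-is.
            enriched.append(dict(contributor))
            continue
        profile = fetch_profile(profile_id)               # second InvokeHTTP step
        enriched.append({**contributor, **profile})       # fold response into contributor
    # Put the combined list back into the main response (the merge step).
    return {**dataset, "contributors": enriched}


def fake_fetch_profile(profile_id: str) -> dict:
    """Hypothetical stand-in for the profile endpoint (assumed response shape)."""
    return {"profile_id": profile_id, "other_ids": [], "Other info": "deeded"}


# Trimmed-down version of the main response from the question.
main_response = {
    "id": "h569gcjhcm",
    "contributors": [
        {"first_name": "Christiaan", "last_name": "van Baak"},
        {
            "profile_id": "72970719-95c8-341b-80d2-afa9e7154baf",
            "first_name": "Wout",
            "last_name": "Krijgsman",
        },
    ],
}

result = enrich_contributors(main_response, fake_fetch_profile)
print(json.dumps(result, indent=2))
```

Contributors without a `profile_id` pass through unchanged here. In a NiFi flow, such splits would similarly need a path to the merge step that skips the second call.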


daggett


Thanks! How do I merge the main JSON response with these JSON responses? How do I keep the JSON format valid? How would NiFi identify which files to merge? – Monika Naagar Feb 12 '19 at 13:00
By extracting `id` into the `fragment.identifier` attribute before the split, all split files will carry it, which is required for the merge. By adding header=`[`, delimiter=`,` and footer=`]` you'll have valid JSON after the merge. Then you can do a Jolt transform to normalize the JSON... However, I would use a Groovy script for this algorithm; it will be simpler. – daggett Feb 12 '19 at 17:28
just an [example of groovy script](https://stackoverflow.com/questions/49578492/programmatically-provide-nifi-invokehttp-different-certificates/49591669#49591669). and in your case it could be easier. – daggett Feb 12 '19 at 17:41
can I use Correlation Attribute Name instead of fragment.identifier? – Monika Naagar Feb 13 '19 at 10:20
I am using Defragment as the Merge Strategy, so if I specify a header, delimiter and footer, NiFi will ignore them; NiFi uses these properties only when the merge strategy is Bin-Packing Algorithm. I want the Defragment strategy because I want to specify Correlation Attribute Name as id. – Monika Naagar Feb 13 '19 at 10:32
Use Merge Strategy = Defragment and Merge Format = Binary Concatenation. In this case header, footer and delimiter must be used. I'm sure there is an error in the documentation: `This property is valid only when using the binary-concatenation merge strategy` - it should say `format` instead of `strategy`. – daggett Feb 13 '19 at 12:05
MergeContent is not behaving the way I expected, or rather I am a bit confused by how it behaves. I am using Correlation Attribute Name in the MergeContent processor, with the id value for it, but it is not merging the flow files with the same id, and the flow files keep sitting in the input queue. – Monika Naagar Feb 13 '19 at 13:24
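
The effect of header=`[`, delimiter=`,` and footer=`]` can be illustrated outside NiFi with a small Python sketch. The function `binary_concatenate` is a hypothetical stand-in for what binary concatenation does to the fragment contents, and the two fragments are made-up profile responses.

```python
import json


def binary_concatenate(fragments, header="[", delimiter=",", footer="]"):
    """Join fragment contents the way header/delimiter/footer wrapping would."""
    return header + delimiter.join(fragments) + footer


# Made-up contents of two flow files returned by the profile endpoint.
fragments = [
    '{"profile_id": "a1", "first_name": "xxx"}',
    '{"profile_id": "b2", "first_name": "yyy"}',
]

merged = binary_concatenate(fragments)
profiles = json.loads(merged)  # parses: the merge result is a JSON array

# Without the wrapping, the objects are simply glued together.
without_wrapping = "".join(fragments)
try:
    json.loads(without_wrapping)
    plain_concat_is_valid = True
except json.JSONDecodeError:
    plain_concat_is_valid = False

print(len(profiles), plain_concat_is_valid)
```

The wrapped result parses as an array of objects, while plain concatenation of the same fragments is not valid JSON.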
what strategy are you using? – daggett Feb 13 '19 at 13:37
I am using Defragment – Monika Naagar Feb 13 '19 at 13:42
defragment uses `fragment.*` attributes of the flow file, and not Correlation Attribute Name – daggett Feb 13 '19 at 13:43
Will the flow file before I call SplitJson and the flow files after I call SplitJson have the same fragment.identifier? – Monika Naagar Feb 13 '19 at 13:50
It seems not: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.8.0/org.apache.nifi.processors.standard.SplitJson/index.html SplitJson generates `fragment.identifier` itself. You can keep it like this. – daggett Feb 13 '19 at 14:12
After the SplitJson processor I am calling a few more processors and then merging the flow files (SplitJson --> EvaluateJsonPath --> InvokeHTTP --> EvaluateJsonPath --> AttributesToJSON). After all these processors I am calling MergeContent. – Monika Naagar Feb 13 '19 at 14:14
I am new to NiFi. I don't know how to read attributes (i.e. fragment.identifier) in the MergeContent processor. – Monika Naagar Feb 13 '19 at 20:12
merge content reads this attribute from the flow file. you just have to ensure attribute was set before this processor. – daggett Feb 13 '19 at 21:56
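
As a rough mental model of how a Defragment-style merge uses the `fragment.*` attributes, consider the Python sketch below. It is a simplified illustration, not NiFi's actual implementation: the `defragment` function and the dictionary layout of a flow file are assumptions made for the example. It groups flow files by `fragment.identifier`, orders them by `fragment.index`, and only releases a group once `fragment.count` fragments have arrived.

```python
from collections import defaultdict


def defragment(flow_files):
    """Simplified model: merge only groups whose fragments have all arrived."""
    bins = defaultdict(list)
    for flow_file in flow_files:
        bins[flow_file["attributes"]["fragment.identifier"]].append(flow_file)

    merged, waiting = {}, {}
    for identifier, fragments in bins.items():
        # Attribute values are strings, so convert before comparing.
        expected = int(fragments[0]["attributes"]["fragment.count"])
        if len(fragments) == expected:
            ordered = sorted(
                fragments, key=lambda f: int(f["attributes"]["fragment.index"])
            )
            merged[identifier] = [f["content"] for f in ordered]
        else:
            # Incomplete group: stays queued until the missing fragments show up.
            waiting[identifier] = len(fragments)
    return merged, waiting


def make_flow_file(identifier, index, count, content):
    """Hypothetical helper that builds a flow-file-like dictionary."""
    return {
        "attributes": {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
        "content": content,
    }


flow_files = [
    make_flow_file("A", 1, 2, "a1"),  # arrives out of order
    make_flow_file("A", 0, 2, "a0"),
    make_flow_file("B", 0, 3, "b0"),
    make_flow_file("B", 2, 3, "b2"),  # fragment 1 of group B never arrives
]

merged, waiting = defragment(flow_files)
print(merged, waiting)
```

In this model group `A` is merged in index order, while group `B` stays in the queue because one of its three fragments is missing.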
If fragment.count does not match, MergeContent does not merge the flow files. Due to some processing, some of the flow files never reach MergeContent. How can I merge flow files that have the same fragment.identifier when the fragment count is not correct? I want to ignore the fragment count in MergeContent. Is there any way? – Monika Naagar Feb 15 '19 at 15:40
