I've never used Jolt Transform before and I'm not sure how to fix what I've done, so I apologize if this is actually a really easy fix.
I have two XML files (dummy versions below; the actual files contain PII) that I merged using MergeRecord in NiFi. Because the output is a single flow file containing an array of JSON objects, it was suggested that I use JoltTransform to merge them properly. I was pointed to this question on how to essentially do a streaming join in NiFi, which is what I needed.
While this works for the most part, I still have one issue. The fields at my "base" level (FatherID, FatherName, BirthDate, etc.) are turned into arrays with a trailing null, e.g. ["1234", null]. I need these to be plain values rather than arrays, because I'd like to reuse the combined schema I used in MergeRecord, which does not define those fields as arrays.
Is there something I need to change in the spec, or do I need to do another JoltTransform (which is fine)?
Input XML 1
<?xml version="1.0" encoding="UTF-8"?>
<FoundingFathers>
<FatherID>1234</FatherID>
<FatherName>George Washington</FatherName>
<ResidentialInformation>
<Name>Mount Vernon</Name>
<StreetAddress>3200 Mount Vernon Hwy</StreetAddress>
<City>Mt Vernon</City>
<State>VA</State>
<ZipCode>22121</ZipCode>
</ResidentialInformation>
<BirthDate>1732-02-22</BirthDate>
</FoundingFathers>
Input XML 2
<?xml version="1.0" encoding="UTF-8"?>
<DOC>
<DOCID>1234</DOCID>
<FATHERNAME>George Washington</FATHERNAME>
<RAW_TXT>George Washington lived in Mount Vernon in Mt Vernon, VA. The Washington family had owned land in the area since 1674. The original house was built in 1734 by Washington's father.</RAW_TXT>
<TXT>
<S>
<FATHERNAME>George Washington</FATHERNAME>
<ESTATENAME>Mount Vernon</ESTATENAME>
<ESTATEPLACE>VA</ESTATEPLACE>
</S>
<S>
<OWNER>Washington family</OWNER>
<YEAROWNED>1674</YEAROWNED>
</S>
<S>
<BUILTIN>1734</BUILTIN>
<BUILTBY>Washington's father</BUILTBY>
</S>
</TXT>
</DOC>
MergeRecord Configs
Record Reader: XMLReader
Record Writer: JsonRecordSetWriter
Merge Strategy: Bin-Packing Algorithm
Correlation Attribute Name: FatherID
Attribute Strategy: Keep All Unique Attributes
Minimum Number of Records: 2
Maximum Number of Records: 2
Minimum Bin Size: 0 B
Maximum Bin Size: No value set
Max Bin Age: No value set
Maximum Number of Bins: 10
Schema
{
"namespace": "ff",
"name": "founders",
"type": "record",
"fields": [
{"name":"FatherID", "type": ["string", "null"], "default": null},
{"name":"FatherName", "type": ["string", "null"], "default": null},
{"name":"ResidentialInformation", "type": ["null", {
"name": "ResidentialInformation", "type": "array", "items": {
"name": "ResidentialInformation", "type": "record", "fields": [
{"name": "Name", "type": ["string","null"], "default":null},
{"name": "StreetAddress", "type": ["string","null"], "default":null},
{"name": "City", "type": ["string","null"], "default":null},
{"name": "State", "type": ["string","null"], "default":null},
{"name": "ZipCode", "type": ["string","null"], "default":null}
]
}
}]},
{"name":"BirthDate", "type": ["string", "null"], "default": null},
{"name": "DOCID", "type": ["string", "null"], "default": null},
{"name": "FINAME", "type": ["string", "null"], "default": null},
{"name": "CUSTNAME", "type": {"type": "array", "items": "string"}},
{"name": "RAW_TXT", "type": {"type": "array", "items": "string"}},
{"name": "TXT", "type": {
"name": "TXT", "type": "record", "namespace": "txt.sar", "fields": [
{"name": "S", "type": {
"type": "array", "items": {
"name": "RecordInArray", "type": "record", "fields": [
{"name": "FATHERNAME", "type": {"type": "array", "items": ["string","null"]}},
{"name": "ESTATENAME", "type": {"type": "array", "items": ["string","null"]}},
{"name": "ESTATEPLACE", "type": {"type": "array", "items": ["string","null"]}},
{"name": "OWNER", "type": {"type": "array", "items": ["string","null"]}},
{"name": "YEAROWNED", "type": {"type": "array", "items": ["string","null"]}},
{"name": "BUILTIN", "type": {"type": "array", "items": ["string","null"]}},
{"name": "BUILTBY", "type": {"type": "array", "items": ["string","null"]}}
]
}
}}
]
}}
]}
Jolt spec (shift operation)
{
"*": {
"*": "&"
}
}
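To make the problem concrete, here is a rough Python sketch of what I think the shift spec above does to the merged array. This is only my guess at the behavior, inferred from the output below, not Jolt's actual implementation. It assumes each merged record carries null for the fields that come from the other file, and it only models plain (scalar) fields. The name shift_merge is made up for illustration.

```python
def shift_merge(records):
    """Fold a list of flat records into one dict, roughly like "*": {"*": "&"}."""
    merged = {}
    for record in records:
        for key, value in record.items():
            existing = merged.get(key)
            if existing is None:
                # Nothing (or only a null) stored so far: just set the value.
                merged[key] = value
            elif isinstance(existing, list):
                # Several values already collected for this key: append.
                existing.append(value)
            else:
                # Second value for the same key: promote to a list.
                merged[key] = [existing, value]
    return merged


# Assumed shape of the MergeRecord output: one record per source file,
# with null for fields that only exist in the other file.
records = [
    {"FatherID": "1234", "BirthDate": "1732-02-22", "DOCID": None},
    {"FatherID": None, "BirthDate": None, "DOCID": "1234"},
]

merged = shift_merge(records)
print(merged)
```

If this model is right, it would explain why fields from the first file pick up a trailing null while DOCID stays a plain string: a null that arrives first gets overwritten, but a null that arrives second is appended.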
Actual Output
[ {
"FatherID" : ["1234", null],
"FatherName" : ["George Washington", null],
"ResidentialInformation" : [ {
"Name" : "Mount Vernon",
"StreetAddress" : "3200 Mount Vernon Hwy",
"City" : "Mt Vernon",
"State" : "VA",
"ZipCode" : "22121"
} ],
"BirthDate" : ["1732-02-22", null],
"DOCID" : "1234",
"FATHERNAME" : "George Washington",
"RAW_TXT" : [ "\nGeorge Washington lived in Mount Vernon in Mt Vernon, VA. The Washington family had owned land in the area since 1674. The original house was built in 1734 by Washington's father.\n" ],
"TXT" : {
"S" : [ {
"FATHERNAME" : [ "George Washington" ],
"ESTATENAME" : [ "Mount Vernon" ],
"ESTATEPLACE" : [ "VA" ]
}, {
"OWNER" : [ "Washington family" ],
"YEAROWNED" : [ "1674" ]
}, {
"BUILTIN" : [ "1734" ],
"BUILTBY" : [ "Washington's father" ]
} ]
}
} ]
Expected Output
[ {
"FatherID" : "1234",
"FatherName" : "George Washington",
"ResidentialInformation" : [ {
"Name" : "Mount Vernon",
"StreetAddress" : "3200 Mount Vernon Hwy",
"City" : "Mt Vernon",
"State" : "VA",
"ZipCode" : "22121"
} ],
"BirthDate" : "1732-02-22",
"DOCID" : "1234",
"FATHERNAME" : "George Washington",
"RAW_TXT" : [ "\nGeorge Washington lived in Mount Vernon in Mt Vernon, VA. The Washington family had owned land in the area since 1674. The original house was built in 1734 by Washington's father.\n" ],
"TXT" : {
"S" : [ {
"FATHERNAME" : [ "George Washington" ],
"ESTATENAME" : [ "Mount Vernon" ],
"ESTATEPLACE" : [ "VA" ]
}, {
"OWNER" : [ "Washington family" ],
"YEAROWNED" : [ "1674" ]
}, {
"BUILTIN" : [ "1734" ],
"BUILTBY" : [ "Washington's father" ]
} ]
}
} ]