(First post!)
I’ve been playing with a sample resume dataset. The resume object is fairly complex, with multiple sub-objects. For the current phase of my plan, I’m trying to flatten the dataset by storing each sub-object as a JSON string, and I’m running into a schema issue with the ToJson UDF (https://github.com/rjurney/pig-to-json).
If I run the following statement in my Pig script, the right data ends up in each field, but every ToJson() call reuses the Positions field names:
```
stringifiedJSON =
    FOREACH fullJSON
    GENERATE
        id .. TotalYears,
        com.hortonworks.pig.udf.ToJson(Awards) AS Awards:chararray,
        com.hortonworks.pig.udf.ToJson(Certifications) AS Certifications:chararray,
        CASE WHEN Degrees IS NULL THEN '[]' ELSE com.hortonworks.pig.udf.ToJson(Degrees) END AS Degrees:chararray,
        com.hortonworks.pig.udf.ToJson(Links) AS Links:chararray,
        com.hortonworks.pig.udf.ToJson(Groups) AS Groups:chararray,
        com.hortonworks.pig.udf.ToJson(MilitaryService) AS MilitaryService:chararray,
        com.hortonworks.pig.udf.ToJson(Positions) AS Positions:chararray;
```
If I DESCRIBE the fullJSON relation, here’s what I get (the "…" lines stand for other fields that aren’t relevant here):
```
fullJSON:
{
    id: chararray,
    …
    Awards: {award: (AwardDate: chararray,AwardDescription: chararray,AwardTitle: chararray)},
    Certifications: {certification: (CertDescription: chararray,CertEndDate: chararray,CertStartDate: chararray,CertTitle: chararray)},
    …
    Degrees: {(DegreeTitle: chararray,DegreeEndDate: chararray,DegreeStartDate: chararray,School: chararray,SchoolCity: chararray,SchoolState: chararray,DegreeEducationLevel: chararray)},
    …
    Links: {link: (LinkTitle: chararray,LinkURL: chararray)},
    Groups: {group: (GroupDescription: chararray,GroupEndDate: chararray,GroupStartDate: chararray,GroupTitle: chararray)},
    …
    MilitaryService: {military_service: (MilitaryBranch: chararray,MilitaryCommendations: chararray,MilitaryCountry: chararray,MilitaryDescripton: chararray,MilitaryStartDate: chararray,MilitaryEndDate: chararray,MilitaryRank: chararray)},
    …
    Positions: {(Company: chararray,CompanyCity: chararray,CompanyState: chararray,JobStartDate: chararray,JobEndDate: chararray,JobTitle: chararray,IsCurrentTitle: int)},
    …
}
```
Does anyone have any ideas? I also tried moving each ToJson() call into its own statement, but got the same result.
Later I dug into the source of ToJSON.java, and I think I’ve narrowed the problem down to the code below. I added a log statement printing strSchema immediately after it, and it always showed the same schema (the one for Positions).
```java
if (myProperties == null) {
    // Retrieve our class specific properties from UDFContext
    myProperties = UDFContext.getUDFContext().getUDFProperties(this.getClass());
}
String strSchema = myProperties.getProperty("horton.json.udf.schema");
```
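To check whether that lookup explains the symptom, here is a minimal, self-contained Java sketch (no Pig dependency) of what I think is happening. It assumes that each ToJson(...) call site gets its own UDF instance, that each instance writes its input schema into the UDFContext properties before any data is processed, and that the properties object is keyed only by class, as getUDFProperties(this.getClass()) suggests. SharedPropertiesDemo, ToJsonInstance, run() and the "ToJson:" + signature key are hypothetical names for illustration; only the horton.json.udf.schema property name comes from the UDF source.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SharedPropertiesDemo {
    // Hypothetical stand-in for UDFContext: one Properties object per key.
    static final Map<String, Properties> CONTEXT = new HashMap<>();

    static Properties getProperties(String key) {
        return CONTEXT.computeIfAbsent(key, k -> new Properties());
    }

    // Hypothetical UDF instance; one exists per ToJson(...) call site.
    static class ToJsonInstance {
        final String signature;      // assumed unique per call site
        final boolean useSignature;  // false = key by class only

        ToJsonInstance(String signature, boolean useSignature) {
            this.signature = signature;
            this.useSignature = useSignature;
        }

        private Properties props() {
            // With a class-only key, every instance gets the SAME Properties object.
            String key = useSignature ? "ToJson:" + signature : "ToJson";
            return getProperties(key);
        }

        // Front-end phase: store this call site's input schema.
        void setSchema(String schema) {
            props().setProperty("horton.json.udf.schema", schema);
        }

        // Back-end phase: read the schema back when serializing tuples.
        String getSchema() {
            return props().getProperty("horton.json.udf.schema");
        }
    }

    // Returns {schema seen by the Degrees call, schema seen by the Positions call}.
    static String[] run(boolean useSignature) {
        CONTEXT.clear();
        ToJsonInstance degrees = new ToJsonInstance("degrees", useSignature);
        ToJsonInstance positions = new ToJsonInstance("positions", useSignature);
        // All schemas are stored before any tuple is processed; Positions is last.
        degrees.setSchema("Degrees schema");
        positions.setSchema("Positions schema");
        return new String[] {degrees.getSchema(), positions.getSchema()};
    }

    public static void main(String[] args) {
        String[] shared = run(false);
        String[] keyed = run(true);
        System.out.println("class-only key:    " + shared[0] + " / " + shared[1]);
        System.out.println("per-signature key: " + keyed[0] + " / " + keyed[1]);
    }
}
```

With the class-only key, the last schema written (Positions) overwrites the others, so both call sites read it back; with a per-call-site key, each keeps its own. If that diagnosis is right, the remedy would be to key the properties per call site. As far as I know, Pig hands each EvalFunc a signature through setUDFContextSignature(String), and UDFContext has a getUDFProperties(Class, String[]) overload that can take it, but I haven’t tried that change against this UDF.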
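Here's a sample of the stringifiedJSON output. Note that the Degrees entry is labeled with the Positions field names (for example, "Company":"BS in Marketing"):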
```
{
"id":"http://something.com/some_guy",
...
"Awards":"[]",
"Certifications":"[]",
"Degrees":"[{\"CompanyState\":null,\"CompanyCity\":null,\"JobEndDate\":\"\",\"IsCurrentTitle\":\"Bachelor's Degree\",\"JobTitle\":\"\",\"Company\":\"BS in Marketing\",\"JobStartDate\":\"State University\"}]",
"Links":"[]",
"Groups":"[]",
"MilitaryService":"[]",
"Positions":"[{\"CompanyState\":\"AZ\",\"CompanyCity\":\"Scottsdale\",\"JobEndDate\":\"2010-03-01T00:00:00.000Z\",\"IsCurrentTitle\":0,\"JobTitle\":\"Job runner\",\"Company\":\"somecompany\",\"JobStartDate\":\"2005-06-01T00:00:00.000Z\"},{\"CompanyState\":\"AZ\",\"CompanyCity\":\"Scottsdale\",\"JobEndDate\":\"2010-03-01T00:00:00.000Z\",\"IsCurrentTitle\":0,\"JobTitle\":\"Sales Rep\",\"Company\":\"Company2\",\"JobStartDate\":\"2005-06-01T00:00:00.000Z\"},{\"CompanyState\":\"AZ\",\"CompanyCity\":\"Phoenix\",\"JobEndDate\":\"2004-12-01T00:00:00.000Z\",\"IsCurrentTitle\":0,\"JobTitle\":\"Job 3\",\"Company\":\"Company3\",\"JobStartDate\":\"1991-05-01T00:00:00.000Z\"},{\"CompanyState\":\"AZ\",\"CompanyCity\":\"Phoenix\",\"JobEndDate\":\"2004-12-01T00:00:00.000Z\",\"IsCurrentTitle\":0,\"JobTitle\":\"CompanyRep\",\"Company\":\"Company4\",\"JobStartDate\":\"1991-05-01T00:00:00.000Z\"},{\"CompanyState\":\"AZ\",\"CompanyCity\":\"Phoenix\",\"JobEndDate\":null,\"IsCurrentTitle\":null,\"JobTitle\":\"Job5\",\"Company\":\"Company5\",\"JobStartDate\":\"2014-09-01T00:00:00.000Z\"}]"
}
```
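The Degrees value above is consistent with a purely positional mix-up: the i-th Positions field name is attached to the i-th Degrees field. Here is a small Java sketch of that pairing, assuming the serializer labels tuple fields by position from whatever schema it reads. PositionalLabelDemo and label() are hypothetical names, and the Degrees tuple is read back from the sample output in Degrees schema order (DegreeTitle, DegreeEndDate, DegreeStartDate, School, SchoolCity, SchoolState, DegreeEducationLevel).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PositionalLabelDemo {
    // Field order of the Positions schema, as reported by DESCRIBE.
    static final String[] POSITIONS_FIELDS = {
        "Company", "CompanyCity", "CompanyState", "JobStartDate",
        "JobEndDate", "JobTitle", "IsCurrentTitle"
    };

    // Pairs the i-th field name with the i-th tuple value (assumed behavior
    // of a schema-driven serializer). With the wrong schema, labels shift.
    static Map<String, Object> label(String[] fieldNames, Object[] tuple) {
        if (fieldNames.length != tuple.length) {
            throw new IllegalArgumentException("schema/tuple arity mismatch");
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < tuple.length; i++) {
            row.put(fieldNames[i], tuple[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        // One Degrees tuple, in Degrees schema order.
        Object[] degree = {
            "BS in Marketing", null, null, "State University", "", "", "Bachelor's Degree"
        };
        System.out.println(label(POSITIONS_FIELDS, degree));
    }
}
```

Running it gives the same labels as the Degrees entry in the sample: DegreeTitle lands under Company, School under JobStartDate, and DegreeEducationLevel under IsCurrentTitle.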