I am trying to process JSON events received from a mobile app (clicks etc.) using Spark 1.5.2. There are multiple app versions, and the structure of the events varies across versions.
Say version 1 has the following structure:
{
  "timestamp": "",
  "ev": {
    "app": {
      "appName": "XYZ",
      "appVersion": "1.2.0"
    },
    "device": {
      "deviceId": "ABC",
      ...
    }
    ...
  }
}
And another version has the following structure:
{
  "timestamp": "",
  "ev": {
    "_a": {
      "name": "XYZ",
      "version": "1.3.0"
    },
    "_d": {
      "androidId": "ABC",
      ...
    }
    ...
  }
}
I want to create a single DataFrame covering both structures and run queries against it.
I am currently creating a separate DataFrame for each structure using the filter function. Now I need to rename the nested columns so that I can perform a union of the two DataFrames.
I am using:
df.withColumnRenamed("ev.app", "ev._a").withColumnRenamed("ev.device", "ev._d");
But this does not work. How do I achieve this?
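
A note on the attempt above: withColumnRenamed matches top-level column names only, so a dotted path such as "ev.app" does not match any column and the call leaves the DataFrame unchanged. One possible workaround, sketched here in plain Python, is to normalize each JSON record to a single key layout before Spark infers the schema. The mapping tables and the normalize_event helper are hypothetical names introduced for illustration, and the sketch assumes that "_a" corresponds to "app", "_d" to "device", and "androidId" to "deviceId", which the examples suggest but do not confirm.

```python
import json

# Hypothetical mapping from the second structure's keys to the first's.
# Assumption: "_a" corresponds to "app", "_d" to "device", and
# "androidId" plays the role of "deviceId".
SECTION_RENAMES = {"_a": "app", "_d": "device"}
FIELD_RENAMES = {
    "app": {"name": "appName", "version": "appVersion"},
    "device": {"androidId": "deviceId"},
}


def normalize_event(line):
    """Parse one JSON event and rewrite it using the first structure's key names."""
    event = json.loads(line)
    normalized_ev = {}
    for section, value in event.get("ev", {}).items():
        # Rename the section itself ("_a" -> "app"); other sections pass through.
        target = SECTION_RENAMES.get(section, section)
        # Rename inner fields only for sections that came from the second structure.
        if section in SECTION_RENAMES and isinstance(value, dict):
            renames = FIELD_RENAMES.get(target, {})
            value = {renames.get(key, key): val for key, val in value.items()}
        normalized_ev[target] = value
    event["ev"] = normalized_ev
    return json.dumps(event, sort_keys=True)


if __name__ == "__main__":
    v2 = '{"timestamp": "", "ev": {"_a": {"name": "XYZ", "version": "1.3.0"}, "_d": {"androidId": "ABC"}}}'
    print(normalize_event(v2))
```

If every input line is passed through a function like this (for example in a map over the RDD of raw JSON strings) before the DataFrame is created, both versions should end up with one schema and no rename or union is needed. The same per-record idea should carry over to the Java or Scala API, although the details there are untested.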