How do I extract particular fields from an RDD[String] into a list of maps that contain only those fields? I have an RDD[String]:
org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[19]
Each entry is JSON in this format:
    {
      count: 1,
      itemId: "1122334",
      country: {
        code: {
          preferred: "USA"
        },
        name: {
          preferred: "America"
        }
      },
      states: "50",
      self: {
        otherInfo: [],
        preferred: "National Parks"
      },
      Rating: 4
    }
How do I get a list of maps that have only itemId as the key and self.preferred as the value ({itemId, self.preferred}), like this:
itemId : 1122334 self.preferred : "National Parks"
itemId : 3444444 self.preferred : "State Parks"
...
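A minimal sketch, assuming Spark 2.2 or later (for spark.read.json on a Dataset[String]) and that each element of the RDD is one complete JSON object, as described above. It lets Spark SQL infer the schema, selects the nested self.preferred column next to itemId, and collects the pairs into a single Map[String, String] keyed by itemId, which is assumed to be what the "list of maps" is meant to hold. The names PreferredByItem and extract are hypothetical, chosen for illustration. The allowUnquotedFieldNames option is only there because the sample record has unquoted keys; drop it if the real data is strict JSON.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper: turns an RDD of JSON strings into itemId -> self.preferred.
object PreferredByItem {
  def extract(spark: SparkSession, records: RDD[String]): Map[String, String] = {
    import spark.implicits._ // provides toDS() on RDDs and the $"..." column syntax

    val df = spark.read
      // Assumption: keys may be unquoted, as in the sample record.
      .option("allowUnquotedFieldNames", "true")
      // Each String in the Dataset is parsed as one JSON document.
      .json(records.toDS())

    df.select($"itemId", $"self.preferred") // nested field via dot notation
      .as[(String, String)]                 // tuple columns are mapped by position
      .collect()                            // brings all pairs to the driver
      .toMap
  }
}
```

Because collect() pulls every pair back to the driver, this only makes sense when the number of items fits comfortably in driver memory.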
Is it efficient to broadcast the resulting map to all nodes? I need this map to be shared and referenced by later calculations.
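On the broadcast question, a sketch assuming the itemId -> self.preferred map fits comfortably in memory on the driver and on every executor. A broadcast variable is shipped to each executor once rather than being serialized into every task's closure, so for a lookup table reused across many tasks or stages it is usually the cheaper option. BroadcastLookup, lookup and the keys RDD are hypothetical names for illustration.

```scala
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper: looks up each itemId in a broadcast copy of the map.
object BroadcastLookup {
  def lookup(
      spark: SparkSession,
      preferredById: Map[String, String],
      keys: RDD[String]
  ): RDD[(String, Option[String])] = {
    // Sent to each executor once; tasks read it through bc.value.
    val bc: Broadcast[Map[String, String]] = spark.sparkContext.broadcast(preferredById)
    // Only the small Broadcast handle is captured by the closure, not the map itself.
    keys.map(id => (id, bc.value.get(id)))
  }
}
```

Once no further jobs need the map, calling unpersist() or destroy() on the broadcast frees executor memory. If the map is too large to broadcast, keeping it as a DataFrame and joining on itemId avoids collecting it to the driver at all.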