I have a list containing a couple of string values (field names), and I also have a Spark RDD. I'd like to iterate over the RDD and remove any field whose name appears in the list. For example:
field_list = ["name_1", "name_2"]
The RDD looks like this:
[Row(field_1=1, field_2=Row(field_3=[Row(field_4=[Row(name_1='apple', name_2='banana', name_3='F'), Row(name_1='tomato', name_2='eggplant', name_3='F')])]))]
I'm not very familiar with RDDs. I understand that I can use map() to iterate, but how can I add a condition so that, if a field name such as "name_1" or "name_2" exists in field_list, both the field and its value are removed? The expected result is a new RDD that looks like this:
[Row(field_1=1, field_2=Row(field_3=[Row(field_4=[Row(name_3='F'), Row(name_3='F')])]))]
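One possible approach, offered as a rough sketch rather than a definitive answer, is a recursive helper passed to map(). The name drop_fields and the variable rdd are hypothetical. The helper assumes each row is a pyspark.sql.Row built with keyword arguments, so that asDict() works and type(value)(**kept) rebuilds a Row; it only descends into Rows and Python lists.

```python
def drop_fields(value, names):
    """Return a copy of `value` with every field listed in `names` removed.

    Assumption: Row-like objects expose asDict() and can be rebuilt from
    keyword arguments, as pyspark.sql.Row does.
    """
    if hasattr(value, "asDict"):
        # Row-like: keep only fields not in `names`, cleaning each kept value.
        kept = {
            key: drop_fields(item, names)
            for key, item in value.asDict().items()
            if key not in names
        }
        return type(value)(**kept)
    if isinstance(value, list):
        # Arrays of structs arrive as Python lists of Rows.
        return [drop_fields(item, names) for item in value]
    # Scalars (int, str, None, ...) are returned unchanged.
    return value


field_list = ["name_1", "name_2"]

# With an existing RDD (hypothetical variable name `rdd`):
# new_rdd = rdd.map(lambda row: drop_fields(row, set(field_list)))
```

Converting field_list to a set keeps the membership test cheap. Note that on Spark versions before 3.0, Row(**kwargs) sorts field names alphabetically, so the field order of the rebuilt rows may differ from the original.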