Is it possible to pass extra arguments to the mapping function in PySpark? Specifically, I have the following code:
import json

raw_data_rdd = sc.textFile("data.json", use_unicode=True)
json_data_rdd = raw_data_rdd.map(lambda line: json.loads(line))
mapped_rdd = json_data_rdd.flatMap(processDataLine)
The function processDataLine takes extra arguments in addition to the JSON object, with the signature:

def processDataLine(dataline, arg1, arg2):
    ...
How can I pass the extra arguments arg1 and arg2 to the flatMap function?
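For context, I suspect I could wrap the call in a lambda so that arg1 and arg2 are captured in the closure, or bind them up front with functools.partial. A minimal sketch of what I have in mind, assuming arg1 and arg2 are already defined on the driver:

from functools import partial

# Option 1: capture the extra arguments in a closure
mapped_rdd = json_data_rdd.flatMap(lambda dataline: processDataLine(dataline, arg1, arg2))

# Option 2: bind the extra arguments ahead of time with functools.partial
mapped_rdd = json_data_rdd.flatMap(partial(processDataLine, arg1=arg1, arg2=arg2))

Is one of these the idiomatic approach, or is there a built-in way to pass extra arguments through flatMap?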