I have a simple word-pair counting problem in PySpark. This is the input, as an RDD:
[' the adventure of the blue carbuncle the adventure of the blue carbuncle the adventure of the blue carbuncle ',' the adventure of the blue carbuncle']
I've already written a function that maps all the word pairs and returns an RDD, but the output is a list of dictionaries, one per input string.
I just need to flatten the two dictionaries so that the output is (of, blue), 4, rather than 3 in the first dictionary and 1 in the second. I've tried all sorts of combinations of flatMap and reduceByKey, and none of them work. Thanks!
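
For concreteness, the merge being asked for can be sketched in plain Python, so it runs without a Spark session. The dictionaries below are hypothetical stand-ins for the function's per-string output: only the (of, blue) counts of 3 and 1 come from the description above, and the ('the', 'blue') entries and the names `per_string_counts` and `dict_rdd` are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for the per-string dictionaries. Only the
# ('of', 'blue') counts (3 and 1) come from the question; the
# ('the', 'blue') entries are invented to show a second key.
per_string_counts = [
    {("of", "blue"): 3, ("the", "blue"): 3},
    {("of", "blue"): 1, ("the", "blue"): 1},
]

# Step 1 (the flatMap part): emit every (pair, count) item from every dict.
flat_items = [item for d in per_string_counts for item in d.items()]

# Step 2 (the reduceByKey part): sum the counts that share the same pair.
totals = Counter()
for pair, count in flat_items:
    totals[pair] += count

merged = dict(totals)
print(merged)

# A possible PySpark shape for the same idea, assuming `dict_rdd` is the
# RDD of dictionaries (not run here):
# dict_rdd.flatMap(lambda d: list(d.items())).reduceByKey(lambda a, b: a + b)
```

The key point of the sketch is that each dictionary is first turned into (pair, count) tuples, so that the counts for the same pair from different strings can then be combined under one key.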