PySpark RDD: how to reduceByKey on a list of dictionaries?

Asked Oct 10 '21 at 23:15

Active Oct 11 '21 at 02:55

Viewed 90 times

I have a simple word-pair counting problem in PySpark. This is the input as an RDD:

[' the adventure of the blue carbuncle  the adventure of the blue carbuncle  the adventure of the blue carbuncle ',' the adventure of the blue carbuncle']

I've already written a function that maps all the word pairs and returns an RDD, but the output is a list of dictionaries, one for every string...

I just need to merge the two dictionaries so the output is (of, blue), 4, rather than 3 in the first dictionary and 1 in the second. I've tried all sorts of combinations of flatMap and reduceByKey, and none of them work. Thanks!
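For illustration, here is a minimal sketch of one way the merge could be done. It assumes each RDD element is a single dictionary mapping a word-pair tuple to its count. The helper names `dict_to_pairs`, `merge_pair_counts` and `demo` are hypothetical, and the `("the", "blue")` counts are made-up sample data; only the `("of", "blue")` counts of 3 and 1 come from the question.

```python
from operator import add


def dict_to_pairs(pair_counts):
    # One dictionary in, a list of ((word_a, word_b), count) tuples out.
    return list(pair_counts.items())


def merge_pair_counts(dict_rdd):
    # flatMap unpacks every dictionary into individual (key, count) records,
    # so reduceByKey can then sum the counts that share a key across strings.
    return dict_rdd.flatMap(dict_to_pairs).reduceByKey(add)


def demo(sc):
    # sc is assumed to be an existing SparkContext.
    # The dictionaries below are illustrative sample data only.
    dict_rdd = sc.parallelize([
        {("of", "blue"): 3, ("the", "blue"): 3},
        {("of", "blue"): 1, ("the", "blue"): 1},
    ])
    return dict(merge_pair_counts(dict_rdd).collect())
```

Calling `demo(sc)` with a live SparkContext should give a dictionary in which `("of", "blue")` maps to 4. The key point is that `reduceByKey` only works on an RDD of `(key, value)` tuples, so the dictionaries have to be flattened into tuples first.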

edited Oct 11 '21 at 02:55

pltc


asked Oct 10 '21 at 23:15

Teddy

Any solution to this? I'm stuck on exactly the same problem. – pramesh shakya Apr 10 '22 at 18:53


0 Answers