My code:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName='test')
ssc = StreamingContext(sc, 1)

ks1 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G1', topics={'test': 2})
ks2 = KafkaUtils.createStream(ssc, zkQuorum='localhost:2181', groupId='G2', topics={'test': 2})

d1 = ks1.map(lambda x: x[1]).flatMap(lambda x: list(x)).countByValue()
d2 = ks2.map(lambda x: x[1]).flatMap(lambda x: list(x)).countByValue()

d3 = d1.transformWith(lambda t, x, y: x.cartesian(y), d2)

Then I get this error:

java.lang.ClassCastException: org.apache.spark.api.java.JavaPairRDD cannot be cast to org.apache.spark.api.java.JavaRDD

P.S. Python 2.7.11 + Spark 2.0.2

Thank you

Zhang Tong

1 Answer

Yes, this is a known bug. Here is the JIRA:

https://issues.apache.org/jira/browse/SPARK-17756
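Until you run a Spark version with the fix, one possible workaround (my suggestion, not from the JIRA) is to avoid `cartesian` inside `transformWith` altogether: key every element of both streams with the same constant and `join` them, which produces the same pairs. On the DStreams above that would look roughly like:

    d3 = (d1.map(lambda v: (None, v))
            .join(d2.map(lambda v: (None, v)))
            .map(lambda kv: kv[1]))

The plain-Python model below (no Spark required) shows that joining on a constant key does reproduce a cartesian product; `join_on_key` and `cartesian_via_join` are hypothetical helper names for illustration only:

```python
from itertools import product

def join_on_key(left, right):
    # Join two lists of (key, value) pairs the way RDD.join does:
    # emit (key, (v1, v2)) for every pair of entries with equal keys.
    out = []
    for k1, v1 in left:
        for k2, v2 in right:
            if k1 == k2:
                out.append((k1, (v1, v2)))
    return out

def cartesian_via_join(xs, ys):
    # Emulate xs.cartesian(ys): key both sides with the same constant,
    # join, then drop the dummy key.
    keyed_x = [(None, v) for v in xs]
    keyed_y = [(None, v) for v in ys]
    return [pair for _, pair in join_on_key(keyed_x, keyed_y)]

xs = [('a', 2), ('b', 1)]
ys = [('c', 3)]
assert cartesian_via_join(xs, ys) == list(product(xs, ys))
```

Note that a cartesian product (by join or otherwise) multiplies the sizes of the two batches, so this is only sensible when both `countByValue` outputs stay small per batch.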