I get this error with flatMap but not with map. Are Boolean operations not supported by the flatMap transformation?

scala> val array = Array("age","astro")
array: Array[String] = Array(age, astro)

scala> val baseRdd = sc.parallelize(array)
baseRdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:29

scala> val flat2 = baseRdd.flatMap(x => x.contains("a"))
<console>:31: error: type mismatch;
 found   : Boolean
 required: TraversableOnce[?]
         val flat2 = baseRdd.flatMap(x => x.contains("a"))
Chris Martin
Balaji Reddy
  • What is your expected result? An RDD with two booleans, both true? – stholzm May 29 '16 at 05:52
  • @stholzm Yes, but I can do that with map. My question is why flatMap does not return the Boolean result as a TraversableOnce. – Balaji Reddy May 29 '16 at 05:55
  • Well, both `map` and `flatMap` return RDDs. The difference is that you have to pass a function that returns `TraversableOnce` to `flatMap`. It will then "flatten" the data structure, hence the name. `flatMap` is just defined that way. You *could* pass `x => Array(x.contains("a"))` to `flatMap`, but it would be simpler to just use `map` in that case. – stholzm May 29 '16 at 06:02
  • @stholzm That makes sense. Please update your answer so that I can accept it. :) – Balaji Reddy May 29 '16 at 06:04

2 Answers

flatMap expects a function that returns a TraversableOnce, i.e. a collection such as a List or Seq. x.contains("a") returns a plain Boolean, which is why the compiler reports a type mismatch. Maybe you meant to use map instead of flatMap?

Both map and flatMap return RDDs. The difference is that you have to pass a function that returns TraversableOnce to flatMap. It will then "flatten" the resulting collections into a single RDD, hence the name. flatMap is just defined that way. You could pass x => Array(x.contains("a")) to flatMap, but it would be simpler to just use map in that case.
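
To make the difference concrete, here is a minimal sketch. It uses plain Scala collections rather than an RDD, which is an assumption made only so that it runs without a SparkContext; the same typing rules apply to RDD.map and RDD.flatMap. The object and value names are hypothetical.

```scala
// Sketch only: plain Scala collections stand in for the RDD from the question.
object FlatMapVsMap {
  val words: Seq[String] = Seq("age", "astro")

  // map: the function returns one value per element, so a Boolean is fine.
  val viaMap: Seq[Boolean] = words.map(x => x.contains("a"))

  // flatMap: the function must return a collection, so the Boolean is
  // wrapped in a single-element Seq, which flatMap then flattens away.
  val viaFlatMap: Seq[Boolean] = words.flatMap(x => Seq(x.contains("a")))

  // Where flatMap is actually useful: each element yields several results.
  val chars: Seq[Char] = words.flatMap(x => x.toList)

  def main(args: Array[String]): Unit = {
    println(viaMap)
    println(viaFlatMap)
    println(chars)
  }
}
```

Both viaMap and viaFlatMap hold the same two Booleans, which is why map is the more natural choice for the question's use case.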

stholzm

map applies a function to each element of the collection and returns a collection with the same number of elements, whereas flatMap applies a function f that itself returns a collection to each element and concatenates the results into one new collection, which may contain more or fewer elements than the input. flatMap is simply a combination of map with flatten.

flatMap[B](f: A => Container[B]): Container[B]
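
The "map with flatten" description can be checked with a short sketch on a plain Scala List. The object name and the function f below are hypothetical and chosen only for illustration.

```scala
// Sketch only: shows that flatMap(f) gives the same result as map(f).flatten
// on an ordinary Scala List.
object FlatMapIsMapThenFlatten {
  val words: List[String] = List("age", "astro")

  // Hypothetical function that returns a collection for each element.
  val f: String => List[Char] = s => s.toList

  // Step 1: map produces a nested collection, one inner list per word.
  val mapped: List[List[Char]] = words.map(f)

  // Step 2: flatten concatenates the inner lists.
  val flattened: List[Char] = mapped.flatten

  // flatMap performs both steps in one call.
  val direct: List[Char] = words.flatMap(f)
}
```

Here mapped has two elements (one per input word), while flattened and direct each have eight (one per character).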
Avi Chalbani