
I have a method that converts an ArrayBuffer of Strings into an RDD.

def makeRddFromArray(): RDD[String] = {
  val rdd = Conf.sc.parallelize(listOfStrings)
  // rdd.count
  rdd
}

With rdd.count commented out, the method returns an RDD of size 0. When I uncomment it, the RDD has the proper size. Could someone explain why? Thanks

Tomasz

1 Answer


rdd.count is an action: it triggers execution of the DAG and returns the number of elements.

Evaluating rdd by itself only shows the RDD's type; nothing is computed:

scala> rdd
res0: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:24

It's not performing any computation here.
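One possible explanation for the size-0 result, which the question does not confirm: the Spark API notes for parallelize say it acts lazily, so if the source is a mutable collection that is altered before the first action, the RDD reflects the altered collection. If the ArrayBuffer is cleared or reused after makeRddFromArray returns, a later count would see the emptied buffer, while an early rdd.count reads it while it is still full. Below is a minimal sketch of the copy-first workaround, assuming Spark is on the classpath and a local master; the names LazyParallelizeSketch and makeRddFromSnapshot are hypothetical and not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ArrayBuffer

object LazyParallelizeSketch {
  // Hypothetical helper: copy the buffer into an immutable List before
  // handing it to Spark, so later changes to the buffer cannot reach the RDD.
  // By contrast, sc.parallelize(strings), as in the question, keeps a
  // reference to the live buffer until the first action runs.
  def makeRddFromSnapshot(sc: SparkContext, strings: ArrayBuffer[String]): RDD[String] =
    sc.parallelize(strings.toList)

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local context; the question uses Conf.sc instead.
    val conf = new SparkConf().setAppName("lazy-parallelize-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val buffer = ArrayBuffer("a", "b", "c")
      val snapshotRdd = makeRddFromSnapshot(sc, buffer)
      buffer.clear() // mutate the source before any action has run
      // The RDD was built from a copy, so it still holds the three elements.
      println("Snapshot RDD size: " + snapshotRdd.count())
    } finally {
      sc.stop()
    }
  }
}
```

If the buffer really is being modified between parallelize and the first action, copying it this way should make the early rdd.count unnecessary. If the count is still 0, the cause lies elsewhere.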

Ishan Kumar
  • Thanks for the answer. Later in the code I have `val finalRDD = sc.union(listOfStringRDD)`. If I comment out the `.count()` call in the `makeRDDFromArray` method, `println("Final RDD size: " + rdd.count())` prints 0. With it uncommented, it prints 100, which is correct. – Tomasz Aug 31 '17 at 14:16