
I have a function that takes a string, extracts values from it using substring operations, and queries a Cassandra table with those values.

def formatInputString(line: String): Unit = {
    // extract values from `line` using substring operations and query the Cassandra table
}

If I pass in the values by reading a text file with Source.fromFile, it works (it prints the results from Cassandra):

// using scala.io.Source.getLines()
import scala.io.Source

for (line <- Source.fromFile("file.txt").getLines()) {
  formatInputString(line)
}

But it just hangs if I use a Spark RDD like this:

// using Spark RDD
val line = sc.textFile("file.txt")
val lst = line.map(formatInputString)

Can somebody explain this behaviour and how to get around it? (I need to use the RDD version.)

xeonie
  • Is "file.txt" a local file? `textFile` expects an HDFS file. Since you're not getting an error, maybe you just need to `collect` the results? See here regarding local files in spark: http://stackoverflow.com/questions/27299923/how-to-load-local-file-in-sc-textfile-instead-of-hdfs – Alfredo Gimenez Jun 22 '16 at 16:17

1 Answer


Spark executes transformations lazily by default. If you call `rdd.map(x => someTransformation(x))`, the transformation does not take place until you execute an action on the RDD.

http://spark.apache.org/docs/latest/programming-guide.html#actions
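
For illustration, here is a minimal, self-contained sketch of that laziness (the app name and `local[*]` master are just placeholders for the example):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("lazy-demo").setMaster("local[*]"))
val rdd = sc.parallelize(Seq("a", "b", "c"))

// Nothing prints here: `map` only records the transformation in the
// RDD's lineage; no job has been submitted yet.
val mapped = rdd.map { s => println(s"mapping $s"); s.toUpperCase }

// The printlns fire only now, when the action submits a job
// (in local mode the output appears in the driver console).
mapped.count()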

You can try calling `rdd.foreach` instead of `map`, as described in the documentation linked above.
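
Applied to the code in the question, the fix could look like this (a sketch, assuming `formatInputString` is only needed for its side effect of querying Cassandra):

val line = sc.textFile("file.txt")

// `foreach` is an action: it submits a job and applies the function to
// every line on the executors, so the Cassandra queries actually run.
line.foreach(formatInputString)

Note that with `foreach` the function runs on the executors, so anything it prints ends up in the executor logs rather than the driver console; if you need the results back on the driver, keep the `map` and call an action such as `collect` on the resulting RDD.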

Bryan