I have a text file sherlock.txt containing multiple lines of text. I load it in spark-shell using:
val textFile = sc.textFile("sherlock.txt")
My goal is to count the number of words in the file. I came across two alternative ways to do this.
First, using flatMap:
textFile.flatMap(line => line.split(" ")).count()
Second, using map followed by reduce:
textFile.map(line => line.split(" ").size).reduce((a, b) => a + b)
Both correctly yield the same result. What are the differences, if any, in the time and space complexity of these two implementations?
Does the Scala interpreter convert both into the most efficient form?
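In case it helps answer the question, here is what I tried in spark-shell to compare the two: a minimal sketch, assuming RDD.toDebugString is the right way to inspect the lineage of each version before the action runs.

// Build each RDD without triggering the action, then print its lineage
val flatMapped = textFile.flatMap(line => line.split(" "))
println(flatMapped.toDebugString)   // lineage of the flatMap + count version

val mapped = textFile.map(line => line.split(" ").size)
println(mapped.toDebugString)       // lineage of the map + reduce version

Both printouts look like a single narrow stage over the text file to me, but I don't know how to translate that into a statement about time or space complexity.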