I am just beginning with Scala development and am trying to filter out unnecessary lines from an iterator using filter and collect. But the operation seems to be too slow.
val src = Source.fromFile("/home/Documents/1987.csv") // 1.2 Million
val iter = src.getLines().map(_.split(":"))
val iter250 = iter.take(250000) // Only interested in the first 250,000
val intrestedIndices = range(1, 100000, 3).toSeq // This could be any order
val slicedData = iter250.zipWithIndex
// Takes 3 minutes
val firstCase = slicedData.collect { case (x, i) if intrestedIndices.contains(i) => x }.size
// Takes 3 minutes
val secondCase = slicedData.filter(x => intrestedIndices.contains(x._2)).size
// Takes 1 second
val thirdCase = slicedData.collect { case (x,i ) if i % 3 == 0 => x}.size
It appears the intrestedIndices.contains(_) part is slowing down the program in the first and second case. Is there an alternative way to speed this process up.