
How do I convert a flatMap over the lines of a text file into a flatMap over characters? I have to count the occurrences of each character in a text file. What approach should I take after the following code?

val words = readme.flatMap(line => line.split(" ")).collect()
Govind Yadav
I'll bet you a pint that running this serially outside of Spark will run quicker for almost any size of input. Is this an assignment? And if you're just counting characters, why are you splitting on space first? – The Archetypal Paul, Feb 01 '17 at 12:43
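To illustrate the comment's point, here is a minimal serial sketch that counts every character in a single pass, without Spark. The names `SerialCharCount` and `countChars` are hypothetical, and the sketch assumes the input arrives as an iterator of lines (for example from `scala.io.Source.fromFile(path).getLines()`), so line terminators are not counted.

```scala
import scala.collection.mutable

object SerialCharCount {
  // Hypothetical helper: counts every character (spaces included) in one pass.
  // Newlines are never seen, because getLines() strips them.
  def countChars(lines: Iterator[String]): Map[Char, Int] = {
    val counts = mutable.HashMap.empty[Char, Int]
    for (line <- lines; c <- line) {
      counts(c) = counts.getOrElse(c, 0) + 1
    }
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    // With a real file: scala.io.Source.fromFile("README.md").getLines()
    val counts = countChars(Iterator("hello world", "yay more lines"))
    counts.toSeq.sortBy(_._1).foreach { case (c, n) => println(s"'$c' -> $n") }
  }
}
```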

3 Answers


To convert each String into its individual characters, you need an additional flatMap:

val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

scala> val lines = Array("hello world", "yay more lines")
lines: Array[String] = Array(hello world, yay more lines)

scala> lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
res3: Array[Char] = Array(h, e, l, l, o, w, o, r, l, d, y, a, y, m, o, r, e, l, i, n, e, s)

Although this is shown in the Scala console on an Array, the same calls work on an RDD.
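As the comment on the question points out, the `split(" ")` step is not needed to reach individual characters: splitting on spaces and then flattening yields the same characters as flattening each line and dropping `' '`. The sketch below checks this on plain arrays (the same `flatMap` and `filter` calls exist on an RDD); `SplitVsFilter`, `viaSplit` and `viaFilter` are illustrative names, and the double space is there to show that the empty strings produced by `split` contribute nothing.

```scala
object SplitVsFilter {
  val lines = Array("hello world", "yay  more lines") // note the double space

  // Answer's approach: split on single spaces, then explode each word into chars
  val viaSplit: Array[Char] = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

  // One step: explode each line directly, then drop the space characters
  val viaFilter: Array[Char] = lines.flatMap(_.toCharArray).filter(_ != ' ')

  def main(args: Array[String]): Unit =
    println(viaSplit.sameElements(viaFilter)) // prints true
}
```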

Yuval Itzchakov

If you are interested only in characters, you probably want to count spaces (' ') too, so there is no need to split on spaces first:

val chars = readme.flatMap(line => line.toCharArray)

// If you don't want to count spaces, filter them out instead:
// val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))

val charsCount = chars
  .map(c => (c, 1))
  .reduceByKey((i1: Int, i2: Int) => i1 + i2)
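To see what `map(c => (c, 1))` followed by `reduceByKey` computes, here is a local sketch on a Scala collection that folds the `(char, 1)` pairs into a map, combining the values of each key with the supplied function just as `reduceByKey((i1, i2) => i1 + i2)` sums them. It is an illustration only: `ReduceByKeyLocal` and `reduceByKeyLocal` are hypothetical names, and unlike Spark the fold runs serially on one machine with no partitioning or shuffle.

```scala
object ReduceByKeyLocal {
  // Hypothetical local stand-in for RDD.reduceByKey: combines all values sharing a key
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
    }

  def main(args: Array[String]): Unit = {
    val readme = Seq("hello world", "yay more lines")
    // Same pipeline as the answer, with spaces filtered out
    val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))
    val charsCount = reduceByKeyLocal(chars.map(c => (c, 1)))(_ + _)
    charsCount.toSeq.sortBy(-_._2).foreach(println)
  }
}
```

Spark's `reduceByKey` also combines values within each partition before the shuffle, which is why the function passed to it should be associative and commutative, as addition is.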
sarveshseri
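import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// Copy resource /a.txt to a temp file so sc.textFile (spark-shell's SparkContext) can read it
val txt = getClass.getResourceAsStream("/a.txt")
val txtFile = File.createTempFile("a", ".txt")
txtFile.deleteOnExit()
Files.copy(txt, txtFile.toPath, StandardCopyOption.REPLACE_EXISTING)
txt.close()
val tokenized = sc.textFile(txtFile.toString).flatMap(_.split(' '))
val chars = tokenized.flatMap(_.toCharArray)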
Hinrich
muyexm329