Suppose I have a text file like this:
Apple#mango&banana@grapes
The data needs to be split on multiple delimiters before performing the word count.
How can I do that?
Use the split method with a regex character class that matches any of the delimiters:
scala> "Apple#mango&banana@grapes".split("[#&@]")
res0: Array[String] = Array(Apple, mango, banana, grapes)
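To go from the split to an actual word count, the tokens can be grouped and counted. Below is a minimal sketch, assuming the only delimiters are #, & and @ and that the file has already been read into lines. The object and method names (DelimiterWordCount, tokens, wordCounts) are hypothetical. Unlike the one-liner above, it splits on [#&@]+ so that runs such as "##" act as a single separator, and it drops the empty string a leading delimiter would produce.

```scala
object DelimiterWordCount {
  // Split one line on any run of '#', '&' or '@'.
  // A leading delimiter yields an empty first element, so filter it out;
  // trailing empty strings are already dropped by String.split.
  def tokens(line: String): List[String] =
    line.split("[#&@]+").filter(_.nonEmpty).toList

  // Count how often each token occurs across all lines.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => tokens(line))
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = wordCounts(Seq("Apple#mango&banana@grapes", "mango@Apple"))
    println(counts)
  }
}
```

For the two sample lines this should give Apple and mango a count of 2, and banana and grapes a count of 1.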
If you only need the total number of words, you don't need to split at all. Something like this will do:
val numWords = """\b\w""".r.findAllIn(string).length
This is a regex that matches the start of a word (\b is a zero-length word boundary, and \w is any "word" character: a letter, digit or underscore). findAllIn returns every such match in the string, and length simply counts them.
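A self-contained version of the regex approach is sketched below; the object name RegexWordCount is hypothetical. It uses size on the match iterator, which behaves the same as length here. One caveat worth knowing: because ' is not a word character, a contraction such as "don't" is counted as two words.

```scala
object RegexWordCount {
  // \b is a zero-length word boundary and \w is one word character
  // ([a-zA-Z0-9_] by default), so each match is the first character
  // of a run of word characters.
  private val wordStart = """\b\w""".r

  // Count the matches; the iterator is consumed exactly once.
  def countWords(text: String): Int = wordStart.findAllIn(text).size

  def main(args: Array[String]): Unit = {
    println(countWords("Apple#mango&banana@grapes"))
  }
}
```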
If you want a separate count for each word, across multiple lines, then split is probably the better option. Splitting on \W+ (any run of non-word characters) also covers #, & and @:
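// `source` is assumed to be a scala.io.Source (e.g. from Source.fromFile)
source
  .getLines()
  .flatMap(_.split("\\W+"))
  .filterNot(_.isEmpty)
  .toList // Iterator has no groupBy, so collect into a List first
  .groupBy(identity)
  .map { case (word, occurrences) => word -> occurrences.size }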
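Wrapped into a function and run against an in-memory Source, the pipeline can be checked without a file on disk. This is a sketch under the assumption that the input fits in memory once collected; PerWordCount and countPerWord are hypothetical names, and Source.fromString stands in for whatever Source you actually read from. Note that matching is case-sensitive, so "Apple" and "apple" are counted separately.

```scala
import scala.io.Source

object PerWordCount {
  // Count each word across every line of the Source. \W+ splits on any run
  // of non-word characters, so '#', '&', '@', spaces and commas all separate words.
  def countPerWord(source: Source): Map[String, Int] =
    source
      .getLines()
      .flatMap(_.split("\\W+"))
      .filterNot(_.isEmpty) // a leading delimiter produces an empty first token
      .toList
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val src = Source.fromString("Apple#mango&banana@grapes\nmango apple, Apple")
    try println(countPerWord(src))
    finally src.close() // always release the underlying reader
  }
}
```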