
I have a function that should take a long string and split it into a list of strings, where each element is one sentence of the article. My plan is to split on spaces and then group the resulting tokens, ending a group at every token that ends with a dot:

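  // endsWithDot isn't shown in the question; assumed to be a plain endsWith(".") check
  def endsWithDot(token: String): Boolean = token.endsWith(".")

  def getSentences(article: String): List[String] = {
    val separatedBySpace = article
      .map((c: Char) => if (c == '\n') ' ' else c)
      .split(" ")

    // indices of the tokens that end with a dot (check token i, not token 0)
    val splitAt: List[Int] = Range(0, separatedBySpace.size)
      .filter(i => endsWithDot(separatedBySpace(i))).toList

    // TODO
    ???
  }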

I have separated the string on space, and I've found each index that I want to group the list on. But how do I now turn separatedBySpace into a list of sentences based on splitAt?

Example of how it should work:

article = "I like donuts. I like cats."
result = List("I like donuts.", "I like cats.")
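
A minimal sketch of the missing step, keeping the idea that splitAt holds the indices of the tokens that end with a dot: pair every end index with the previous one (starting from a virtual -1) and slice the tokens in between. The endsWithDot body, the filter(_.nonEmpty) step and the SentenceSplit wrapper are assumptions for illustration, not part of the original code.

```scala
object SentenceSplit {
  // Assumed helper: the question does not show endsWithDot, so this is a guess
  def endsWithDot(token: String): Boolean = token.endsWith(".")

  def getSentences(article: String): List[String] = {
    val separatedBySpace: Array[String] = article
      .replace('\n', ' ')
      .split(" ")
      .filter(_.nonEmpty) // drop empty tokens produced by repeated spaces

    // Indices of the tokens that END a sentence
    val splitAt: List[Int] = separatedBySpace.indices
      .filter(i => endsWithDot(separatedBySpace(i)))
      .toList

    // Pair each end index with the previous one: a sentence is tokens (prevEnd, end]
    val complete: List[String] = (-1 :: splitAt).zip(splitAt).map { case (prevEnd, end) =>
      separatedBySpace.slice(prevEnd + 1, end + 1).mkString(" ")
    }

    // Tokens after the last dot (if any) form a final sentence without a dot
    val lastEnd = splitAt.lastOption.getOrElse(-1)
    val rest = separatedBySpace.drop(lastEnd + 1)
    if (rest.isEmpty) complete else complete :+ rest.mkString(" ")
  }
}
```

Because the trailing tokens are kept as their own element, "I like donuts. I like cats" (no final dot) still yields two sentences instead of silently losing the second one.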

PS: Yes, I know that my algorithm for splitting the article into sentences has flaws; I just want a quick, naive method to get the job done.

Sahand
  • Possible duplicate of [How to group a variable-length, repeating sequence in Scala](https://stackoverflow.com/questions/6800737/how-to-group-a-variable-length-repeating-sequence-in-scala) – Joe Jul 05 '19 at 10:30

2 Answers


I ended up solving this by using recursion:

  import scala.annotation.tailrec

  def getSentenceTokens(article: String): List[List[String]] = {
    // Turn newlines into spaces and collapse runs of spaces before tokenizing
    val separatedBySpace: List[String] = article
      .replace('\n', ' ')
      .replaceAll(" +", " ") // regex
      .split(" ")
      .toList

    // Start index of each sentence: 0, plus every index right after a token ending with a dot
    val splitAt: List[Int] = separatedBySpace.indices
      .filter(i => i == 0 || endsWithDot(separatedBySpace(i - 1)))
      .toList

    groupBySentenceTokens(separatedBySpace, splitAt, List())
  }

  // Tail-recursive: slice from the current start index up to the next one
  @tailrec
  def groupBySentenceTokens(tokens: List[String], splitAt: List[Int], sentences: List[List[String]]): List[List[String]] = {
    if (splitAt.size <= 1) {
      if (splitAt.size == 1) {
        // the last sentence runs to the end of the token list
        sentences :+ tokens.slice(splitAt.head, tokens.size)
      } else {
        sentences
      }
    }
    else groupBySentenceTokens(tokens, splitAt.tail, sentences :+ tokens.slice(splitAt.head, splitAt.tail.head))
  }
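
The recursive version returns each sentence as a list of tokens. As a hedged alternative, the same start indices can be paired with their successors using zip, which avoids the explicit recursion; joining each group with mkString(" ") then gives the List[String] the question asks for. SentenceGrouping and getSentences are hypothetical names, and endsWithDot is assumed to be a plain endsWith(".") check.

```scala
object SentenceGrouping {
  // Assumed helper: a token ends a sentence if it ends with '.'
  def endsWithDot(token: String): Boolean = token.endsWith(".")

  // Same grouping as groupBySentenceTokens, but without explicit recursion
  def getSentenceTokens(article: String): List[List[String]] = {
    val tokens: List[String] = article
      .replace('\n', ' ')
      .replaceAll(" +", " ")
      .split(" ")
      .toList

    // Start index of every sentence: 0, plus every index right after a token ending with a dot
    val splitAt: List[Int] = tokens.indices
      .filter(i => i == 0 || endsWithDot(tokens(i - 1)))
      .toList

    // Pair each start with the next start (or the end of the list) and slice between them
    splitAt.zip(splitAt.drop(1) :+ tokens.size).map { case (from, until) =>
      tokens.slice(from, until)
    }
  }

  // Join the tokens back together to get one string per sentence
  def getSentences(article: String): List[String] =
    getSentenceTokens(article).map(_.mkString(" "))
}
```
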
Sahand
val s: String = """I like donuts. I like cats
                   This is amazing"""

s.split("\\.|\n").map(_.trim).toList
//result: List[String] = List("I like donuts", "I like cats", "This is amazing")

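To keep the dots in the sentences, fold over the words instead. This version only breaks after a word ending with a dot, so with s as above, where "cats" has no dot, the last two lines end up in one sentence:

val (a, b) = s.replace("\n", " ").split(" ")
                 .filter(_.nonEmpty) // runs of spaces produce empty tokens; drop them
                 .foldLeft((List.empty[String], List.empty[String])) {

    // temp: words of the sentence being built; result: finished sentences
    case ((temp, result), word) =>
        if (word.endsWith(".")) {
            (List.empty[String], result :+ (temp :+ word).mkString(" "))
        } else {
            (temp :+ word, result)
        }
}

// Words after the last dot form a final sentence; skip it if nothing is left
val result = if (a.isEmpty) b else b :+ a.mkString(" ")
//result = List("I like donuts.", "I like cats This is amazing")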
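
If regular expressions are acceptable, a shorter sketch that also keeps the dots is to split with a lookbehind: (?<=\.)\s+ matches the whitespace after a dot without consuming the dot itself. This assumes, like the first snippet above, that a line break also ends a sentence; RegexSentences and splitSentences are hypothetical names. It is just as naive as the other approaches (abbreviations such as "e.g." will still cause a split).

```scala
object RegexSentences {
  // Split on whitespace that follows a dot, or at a line break (with surrounding spaces).
  // (?<=\.) is a zero-width lookbehind, so the dot stays attached to its sentence.
  def splitSentences(text: String): List[String] =
    text
      .split("(?<=\\.)\\s+|\\s*\\n\\s*")
      .map(_.trim)
      .filter(_.nonEmpty)
      .toList
}
```
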
lprakashv