
I'm working on a problem set that requires counting sentences. I decided to use a regular expression to split the string at the characters ".", "!" and "?". When I pass my text to re.split, the resulting list includes an empty string at the end.

Source code:

from cs50 import get_string
import re


def main():
    text = get_string("Text: ")
    cole_liau(text)


# Implement 0.0588 * L - 0.296 * S - 15.8; L = average number of letters per 100 words, S = average number of sentences per 100 words
def cole_liau(intext):

    words = []
    letters = []

    sentences = re.split(r"[.!?]+", intext)
    print(sentences)
    print(len(sentences))

main()

Output:

Text: Congratulations! Today is your day. You're off to Great Places! You're off and away!

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away", '']

5

I tried adding the + quantifier so that the pattern matches one or more of [.!?], but the empty string still appears at the end of the list.

Jan

2 Answers


You can filter out the empty strings with a list comprehension:

def cole_liau(intext):

    words = []
    letters = []

    sentences = [sent for sent in re.split(r"[.!?]+", intext) if sent]
    print(sentences)
    print(len(sentences))

Which yields

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
4

As to why re.split() returns an empty string, see this answer.
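As a rough illustration of the filtering approach, here is a minimal sketch. The helper name split_sentences is hypothetical, and stripping whitespace from each piece is an added assumption that is not part of the code above; it also discards pieces that contain only spaces.

```python
import re


def split_sentences(text):
    """Split text at runs of . ! ? and drop empty or whitespace-only pieces.

    Hypothetical helper for illustration; the strip() calls are an
    assumption added here, not part of the original answer.
    """
    parts = re.split(r"[.!?]+", text)
    return [part.strip() for part in parts if part.strip()]


text = ("Congratulations! Today is your day. "
        "You're off to Great Places! You're off and away!")

# A delimiter at the very end leaves an empty string after it.
print(re.split(r"[.!?]+", "a!"))  # ['a', '']

sentences = split_sentences(text)
print(sentences)
print(len(sentences))  # 4
```

Because the filter checks every piece, it also covers an empty string at the start of the list, for example when the text begins with a terminator.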

Jan
  • Using a comprehension here is expensive: it iterates over the whole list just to remove the last element. – totok Jun 24 '20 at 14:38
  • @totok: Not really, it could be the first element, the last or anything in between. – Jan Jun 24 '20 at 14:45

re.split is working as intended here. Your text ends with a !, so the split produces the text before that final delimiter (the last sentence) and the text after it (an empty string).

You can add [:-1] at the end of your line to remove the last element of the list. This works as long as the text ends with one of the terminators:

sentences = re.split(r"[.!?]+", intext)[:-1]

Output:

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
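A sketch of a guarded version of the slice: it drops the last element only when that element is empty, so text that does not end in ., ! or ? keeps its final sentence. The name count_sentences is hypothetical, and treating a whitespace-only tail as empty is an assumption made here.

```python
import re


def count_sentences(text):
    """Count sentences by splitting at runs of . ! ?

    Hypothetical helper for illustration. Only the trailing piece is
    checked; an empty piece at the start (text that begins with a
    terminator) would still be counted.
    """
    parts = re.split(r"[.!?]+", text)
    # Drop the last piece only if nothing follows the final delimiter.
    if parts[-1].strip() == "":
        parts = parts[:-1]
    return len(parts)


print(count_sentences("Today is your day. You're off and away!"))  # 2
print(count_sentences("Today is your day. You're off and away"))   # 2
```

An unconditional [:-1] would return 1 for the second call, because the last piece there is a real sentence rather than an empty string.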
totok