I have written a simple function that does what you want:
def sentences_counter(text: str):
end_of_sentence = ".?!…"
# complete with whatever end of a sentence punctuation mark I might have forgotten
# you might for instance want to add '\n'.
sentences_count = 0
sentences = []
inside_a_quote = False
start_of_sentence = 0
last_end_of_sentence = -2
for i, char in enumerate(text):
# quote management, to solve your issue
if char == '"':
inside_a_quote = not inside_a_quote
if not inside_a_quote and text[i-1] in end_of_sentence: #
last_end_of_sentence = i #
elif inside_a_quote:
continue
# basic management of sentences with the punctuation marks in `end_of_sentence`
if char in end_of_sentence:
last_end_of_sentence = i
elif last_end_of_sentence == i-1:
sentences.append(text[start_of_sentence:i].strip())
sentences_count += 1
start_of_sentence = i
# same as the last block in case there is no end punctuation mark in the text
last_sentence = text[start_of_sentence:]
if last_sentence:
sentences.append(last_sentence.strip())
sentences_count += 1
return sentences_count, sentences
Consider the following:
text = '''Annie said, "Are you sure? How is it possible? you are joking, right?" No, I'm not... I thought you were'''
To generalize your problem a bit, I added 2 more sentences, one with ellipsis and the last one without even any end punctuation mark. Now, if I execute this:
sentences_count, sentences = sentences_counter(text)
print(f'{sentences_count} sentences detected.')
print(f'The detected sentences are: {sentences}')
I obtain this:
3 sentences detected.
The detected sentences are: ['Annie said, "Are you sure? How is it possible? you are joking, right?"', "No, I'm not...", 'I thought you were']
I think it works fine.
Note: Please consider the quote management of my solution works for American style quotes, where the end punctuation mark of the sentence can be inside of the quote. Remove the lines where I have put flag emojis to disable this.