
I want to create a function that takes a string (a text) as input and capitalizes every letter that comes after a punctuation mark. The thing is, strings don't work like lists, so I don't really know how to do it. I tried this, but it doesn't seem to be working:

def capitalize(strin):
    listrin = list(strin)
    listrin[0] = listrin[0].upper()
    ponctuation = ['.', '!', '?']
    for x in range(len(listrin)):
        if listrin[x] in ponctuation:
            # the last valid index is len(listrin) - 1, so the guard
            # x != len(listrin) never triggers and x + 1 can go out of range
            if x < len(listrin) - 1:
                if listrin[x + 1] != " ":
                    listrin[x + 1] = listrin[x + 1].upper()
                # the letter after the space is at x + 2, not x + 1
                elif x < len(listrin) - 2 and listrin[x + 2] != " ":
                    listrin[x + 2] = listrin[x + 2].upper()
    return ''.join(listrin)

For now, I am trying to solve it with this string: 'hello! how are you? please remember capitalization. EVERY time.'
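To illustrate the behavior I am after, here is a small sketch that walks the string as a list and skips any spaces between the punctuation and the next letter (the helper name is made up):

```python
def capitalize_after_punct(text):
    chars = list(text)
    chars[0] = chars[0].upper()
    for i, c in enumerate(chars):
        if c in '.!?':
            j = i + 1
            # skip any spaces between the punctuation and the next letter
            while j < len(chars) and chars[j] == ' ':
                j += 1
            if j < len(chars):
                chars[j] = chars[j].upper()
    return ''.join(chars)

print(capitalize_after_punct('hello! how are you? please remember capitalization. EVERY time.'))
# Hello! How are you? Please remember capitalization. EVERY time.
```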

Gino Mempin

2 Answers


I use a regexp to do this:

>>> import re
>>> line = 'hi. hello!   how are you?  fine!  me too, haha. haha.'
>>> re.sub(r"(?:^|(?:[.!?]\s+))(.)", lambda m: m.group(0).upper(), line)
'Hi. Hello!   How are you?  Fine!  Me too, haha. Haha.'
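The same substitution also handles the string from the question; the pattern uppercases the first character of the string and the first character after each sentence-ending mark:

```python
import re

line = 'hello! how are you? please remember capitalization. EVERY time.'
# match either the start of the string, or '.', '!' or '?' followed by
# whitespace, then capture the next character and uppercase the whole match
result = re.sub(r"(?:^|(?:[.!?]\s+))(.)", lambda m: m.group(0).upper(), line)
print(result)  # Hello! How are you? Please remember capitalization. EVERY time.
```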
Kinght 金

The most basic approach is to split the text into sentences based on the punctuation, which gives you a list. Then loop over the items of the list, strip() them, and capitalize() them. Something like the following might solve your problem:

import re

input_sen = 'hello! how are you? please remember capitalization. EVERY time.'
# pass your punctuation list as a character class, e.g. r'[.!?]'
sentences = re.split(r'[.!?]', input_sen)
for i in sentences:
    print(i.strip().capitalize(), end='')

However, it is better to use the nltk library:

from nltk.tokenize import sent_tokenize  # run nltk.download('punkt') once first
input_sen = 'hello! how are you? please remember capitalization. EVERY time.'
sentences = sent_tokenize(input_sen)
sentences = [sent.capitalize() for sent in sentences]
print(sentences)

It is better to use the NLTK library (or some other NLP library) than to manually write rules and regexes, because it takes care of many cases we don't account for. It solves the problem of sentence boundary disambiguation.

Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Often natural language processing tools require their input to be divided into sentences for a number of reasons. However sentence boundary identification is challenging because punctuation marks are often ambiguous. For example, a period may denote an abbreviation, decimal point, an ellipsis, or an email address – not the end of a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations. As well, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang. Languages like Japanese and Chinese have unambiguous sentence-ending markers.

Hope it helps.

sid8491
  • Thank you very much! What is tokenize? – Sammy Steffensen Nov 29 '17 at 12:17
  • I have updated the code and used a different method to break the input into sentences. Tokenizers are used to divide strings into lists of substrings, e.g. sent_tokenize is used to find the list of sentences. @SammySteffensen – sid8491 Nov 30 '17 at 06:14