
I want to add a full stop after every sentence line I get from my cleaned text, so that I can run summarisation with heapq or Gensim afterwards. Without the full stops, heapq or Gensim cannot tell the sentences apart and treats them all as one sentence. I am using the following code:

import en_core_web_sm
nlp = en_core_web_sm.load()
text = nlp(str1_clean_summary)

# print each detected sentence on its own line
for sent in text.sents:
    print(sent.string.strip())

str1_clean_summary looks like this:

many price increase options 
still believe us need prove consistently 
aim please delay end displeasingich
responds wuickly

This gives me the sentences on different lines, but I need to add a full stop after each sentence so they are treated separately.

  • `for sent in text.sents: print(sent.string.strip())` just prints the sentences without leading/trailing whitespace, do you mean you want to also add a dot to that output if missing? – Wiktor Stribiżew Jan 20 '20 at 10:23
  • It would help if you gave us examples of your input text (str1_clean_summary) – Tiago Duque Jan 20 '20 at 10:29
  • @WiktorStribiżew: Yes, I want to add a full stop after every line of output I am getting from this code. I have shared the set of output above. – Django0602 Jan 20 '20 at 10:49
  • Why not do that then? `print(sent.string.strip() + '.')`. Or `print('{}.'.format(sent.string.strip()))`, `print(f'{sent.string.strip()}.')` – Wiktor Stribiżew Jan 20 '20 at 10:51
  • A duplicate of [How can strings be concatenated?](https://stackoverflow.com/questions/2711579/how-can-strings-be-concatenated) – Wiktor Stribiżew Jan 20 '20 at 10:54

1 Answer


If you don't want to fiddle with span indexing, I'd recommend adding the final dot to each sentence before running spaCy over the text.

Ex.:

import en_core_web_sm

sents = "many price increase options\nstill believe us need prove consistently\naim please delay end displeasingich\nresponds wuickly\n"
# turn every line break into ".\n" so every line ends with a full stop
sents = sents.replace('\n', '.\n')

nlp = en_core_web_sm.load()
text = nlp(sents)

for sent in text.sents:
    print(sent)

Output:

many price increase options.

still believe us need prove consistently.

aim please delay end displeasingich.

responds wuickly.

Otherwise you'll have to work with token positioning: spans are token lists, and because of the way spaCy organizes its vocabulary and other resources internally, the tokens in a span are "pointers" into a token store. To insert a new token you'd have to shift the tail of every following span forward, which is far more work than a simple replace. Read more here and here.
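For the heapq-based summarisation mentioned in the question, a minimal sketch of frequency-based sentence scoring could then run on the same doc. This is illustrative only: it assumes the `text` doc built in the snippet above, and the choice of two summary sentences is arbitrary.

import heapq
from collections import Counter

# word frequencies over the whole doc, ignoring stop words, punctuation and whitespace
word_freq = Counter(
    token.text.lower()
    for token in text
    if not token.is_stop and not token.is_punct and not token.is_space
)

# score each (now properly separated) sentence by the frequencies of its words
sent_scores = {}
for sent in text.sents:
    sent_scores[sent.text.strip()] = sum(
        word_freq.get(token.text.lower(), 0) for token in sent
    )

# keep the two highest-scoring sentences as the summary
summary = ' '.join(heapq.nlargest(2, sent_scores, key=sent_scores.get))
print(summary)

The scores are keyed by the sentence text, so the sketch only needs the standard library's heapq and collections on top of the spaCy attributes already used above.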
