
Hi, I have some code for NLP preprocessing, shown below:

text = '''Gelaran perdana MotoGP Mandalika GP Indonesia, berhasil dimenangkan oleh pebalap Red Bull KTM, Miguel Oliveira, Minggu (20/3/2022).
Posisi kedua ditempati oleh pebalap Yamaha Fabio Quartararo dan podium ketiga dimenangkan oleh pebalap Pramac Ducati, Johann Zarco.'''

import re

# Join the lines, then split on a period (with or without a trailing space)
text = text.replace('\n', '')
sentence = re.split(r'\. |\.', text)
sentence

The result looks like this:

['Gelaran perdana MotoGP Mandalika GP Indonesia, berhasil dimenangkan oleh pebalap Red Bull KTM, Miguel Oliveira, Minggu (20/3/2022)',
 'Posisi kedua ditempati oleh pebalap Yamaha Fabio Quartararo dan podium ketiga dimenangkan oleh pebalap Pramac Ducati, Johann Zarco',
 '']

It seems to create an extra value, '', at the end of the list. How do I fix that? Does the split naturally produce this extra value, and how can I avoid it?

Thanks

Ken Arya
  • not really. kinda confused tho – Ken Arya Jun 27 '22 at 04:43
  • You have a `.` as your final character in your string, so splitting on `.` means it is also 'split' and you get an empty string because there was nothing after it. That's why you end up with a list with three elements (because you split on 2 separate `.`) where one element just happens to be an empty string. So the accepted answer in that question I linked tells you how to remove such things from your `split` results. – wkl Jun 27 '22 at 04:45
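
The behaviour described in the comment above can be reproduced on a shorter string. This is a minimal sketch: the `sample` string and the pattern `\.\s*` are illustrative assumptions, not the asker's exact input or regex.

```python
import re

# Hypothetical short input that, like the question's text, ends with a period.
sample = "First sentence. Second sentence."

# Splitting on the period leaves an empty string after the final one.
parts = re.split(r"\.\s*", sample)
print(parts)  # ['First sentence', 'Second sentence', '']

# Filtering out falsy (empty) pieces removes it.
cleaned = [p for p in parts if p]
print(cleaned)  # ['First sentence', 'Second sentence']
```

Filtering after the split also handles empty pieces that appear anywhere else in the list, for example from two consecutive periods.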

1 Answer


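The fix is to strip the trailing period with `strip` before calling `split`, so nothing is left after the final `.` to produce an empty string.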

sentence = text.strip(".").split(".")
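
As a possible variation on the line above, here is a sketch applied to the text from the question. It assumes the original multi-line `text` (newline not yet removed) and uses `rstrip` so that only a trailing period is dropped, since `strip(".")` would also remove a leading one. The name `sentences` is illustrative.

```python
text = '''Gelaran perdana MotoGP Mandalika GP Indonesia, berhasil dimenangkan oleh pebalap Red Bull KTM, Miguel Oliveira, Minggu (20/3/2022).
Posisi kedua ditempati oleh pebalap Yamaha Fabio Quartararo dan podium ketiga dimenangkan oleh pebalap Pramac Ducati, Johann Zarco.'''

# Trim surrounding whitespace, drop only the trailing period, then split.
# Each piece is stripped so the newline after the first sentence does not
# end up at the start of the second one.
sentences = [s.strip() for s in text.strip().rstrip(".").split(".")]

print(len(sentences))  # 2
for s in sentences:
    print(s)
```

Note that a plain split on `.` would also break on abbreviations or decimal numbers, so this only suits text where every period ends a sentence.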
Ken Arya