I would like to split a text into sentences based on a delimiter in Python. However, I do not want to split on decimal points or commas that appear between digits. How can I ignore them?

For example, I have a text like the one below.

I am xyz.I have 44.44$. I would like, to give 44,44 cents to my friend. 

The sentences should be:

I am xyz
I have 44.44$
I would like
to give 44,44 cents to my friend

Could you please help me with the regular expression? I am sorry if this question has already been asked before; I could not find it.

Thank you

hulk

1 Answer

This works for your example, although there's a trailing full stop (period) on the last part if that matters.

import re

s = 'I am xyz. I have 44.44$. I would like, to give 44,44 cents to my friend.'

# Raw string so that \s reaches the regex engine unchanged.
for part in re.split(r'[.,]\s+', s):
    print(part)

Output

I am xyz
I have 44.44$
I would like
to give 44,44 cents to my friend.
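
If the trailing full stop does matter, one option is to strip it from each part after splitting. This is a minimal sketch rather than part of the original answer; the name `parts` is only for illustration.

```python
import re

s = 'I am xyz. I have 44.44$. I would like, to give 44,44 cents to my friend.'

# Split on a full stop or comma followed by whitespace, then drop any
# full stop left on the end of a part (only the last part has one here).
parts = [part.rstrip('.') for part in re.split(r'[.,]\s+', s)]

for part in parts:
    print(part)
```

Note that `rstrip('.')` removes every trailing full stop, so a part ending in an ellipsis would lose all of its dots.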

Wiktor's expression, written with optional trailing whitespace as \s*[.,](?!\d)\s*, will work for your new example:

I am xyz.I have 44.44$. I would like, to give 44,44 cents to my friend.

Breaking this down:

  • \s* matches zero or more whitespace characters.
  • [.,] matches either a , or a . character.
  • (?!\d) is a negative lookahead: the match fails if the next character is a digit. This is what avoids splitting within numbers.
  • \s* matches zero or more whitespace characters after the delimiter, so xyz.I is still split even though no space follows the full stop.
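
The breakdown above can be tried out with a short sketch. The pattern here ends in `\s*` (zero or more whitespace characters), which is what lets it split `xyz.I` where no space follows the full stop. The names `DELIMITER` and `split_sentences` are hypothetical helpers, and filtering out empty pieces is an added assumption to discard the empty string left after the final full stop.

```python
import re

# Delimiter: optional whitespace, a full stop or comma that is not
# followed by a digit, then optional whitespace.
DELIMITER = re.compile(r'\s*[.,](?!\d)\s*')


def split_sentences(text):
    """Split text on DELIMITER and drop empty pieces."""
    return [piece for piece in DELIMITER.split(text) if piece]


s = 'I am xyz.I have 44.44$. I would like, to give 44,44 cents to my friend.'
parts = split_sentences(s)

for part in parts:
    print(part)
```

Without the filter, `re.split` would return a trailing empty string because the text ends with a delimiter.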

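Note that it will still fail for text like "I am 22.10 years ago I was 12." (meaning "I am 22." followed by "10 years ago I was 12."), because the full stop sits between two digits and cannot be told apart from a decimal point. I don't think there's any way to get around that using regular expressions alone.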
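
To illustrate that limitation, here is a small sketch using the same style of pattern (with optional trailing whitespace, which is an assumption of this example). The names `pattern`, `ambiguous` and `pieces` are only for illustration.

```python
import re

pattern = re.compile(r'\s*[.,](?!\d)\s*')

# The full stop in "22.10" is meant to end a sentence, but it is followed
# by a digit, so the negative lookahead treats it as part of a number.
ambiguous = 'I am 22.10 years ago I was 12.'
pieces = [p for p in pattern.split(ambiguous) if p]

print(pieces)
```

The text comes back as a single piece, since the only full stop the pattern accepts as a delimiter is the final one.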

Tagc
  • I am sorry, I did not give a proper example then. The regex which you have given will not work for I am xyz.I have 44.44$. I would like, to give 44,44 cents to my friend. – hulk Jan 27 '17 at 09:33
  • Thank you very much for the solution!! – hulk Jan 27 '17 at 10:00