7

Suppose I have this piece of text:

Saturday and Sunday and Monday and Tuesday and Wednesday and Thursday and Friday are days of the week.  

I want every "and" except the last one to be replaced with a comma:

Saturday, Sunday, Monday, Tuesday, Wednesday, Thursday and Friday are days of the week. 

Is there an easy way to do that with a regex? As far as I know, a regex replace substitutes every match in the string.

Mangu Singh Rajpurohit
Clement Attlee
  • Not using the Oxford comma, I see. – Peter Wood Nov 13 '15 at 04:23
  • Strictly speaking, regular expressions only do matching, and substitution is a feature of the hosting language, usually its string processing facilities. – tripleee Nov 13 '15 at 04:34
  • This is a bit unreadable. Maybe you could amuse yourself with it: `"".join(reduce(lambda x , y : x+["and"+y] if len(x)==0 else x+[","+y] ,re.split("and","Saturday and Sunday and Monday and Tuesday and Wednesday and Thursday and Friday are days of the week. ")[::-1],[])[::-1])[1:]` – Akshay Hazari Nov 13 '15 at 05:04
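
For comparison with the one-liner in the comment above, the same split-and-rejoin idea can be written more readably. This is only a sketch: the helper name `join_with_final_and` and its `sep` parameter are made up for illustration, and it splits on `" and "` with the surrounding spaces (rather than the bare `"and"` used in the comment) so that the spacing comes out clean.

```python
def join_with_final_and(text, sep=" and "):
    """Join the pieces around `sep` with commas, keeping only the last `sep`."""
    parts = text.split(sep)
    if len(parts) == 1:
        # `sep` does not occur at all, so there is nothing to rewrite.
        return text
    # Everything before the final piece is comma-separated;
    # the final piece is attached with the original separator.
    return ", ".join(parts[:-1]) + sep + parts[-1]


days = ("Saturday and Sunday and Monday and Tuesday and Wednesday "
        "and Thursday and Friday are days of the week.")
print(join_with_final_and(days))
```

Unlike the `reduce` version, this needs no import; in Python 3, `reduce` has to be imported from `functools`.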

2 Answers

18

The str.replace() method has a count argument:

str.replace(old, new[, count])

Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

Then use str.count() to find how many times "and" occurs in the string, and subtract 1 (because you want to keep the last "and"):

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

Demo:

>>> string = 'Saturday and Sunday and Monday and Tuesday and Wednesday and Thursday and Friday are days of the week.'
>>> string.replace(' and ', ', ', string.count(' and ') - 1)
'Saturday, Sunday, Monday, Tuesday, Wednesday, Thursday and Friday are days of the week.'
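
The two calls in the demo can be wrapped in a small helper. The following is a minimal sketch, not a standard-library function: the name `replace_all_but_last` and its parameters are assumptions made for illustration. It clamps the count at zero, because a negative count tells `str.replace()` to replace every occurrence.

```python
def replace_all_but_last(text, old, new):
    """Replace every occurrence of `old` in `text` except the last one."""
    # str.count() counts non-overlapping occurrences; leave the final one alone.
    n = max(text.count(old) - 1, 0)
    # With a count of 0, str.replace() returns the string unchanged.
    return text.replace(old, new, n)


days = ("Saturday and Sunday and Monday and Tuesday and Wednesday "
        "and Thursday and Friday are days of the week.")
print(replace_all_but_last(days, " and ", ", "))
```

Matching on `" and "` with the surrounding spaces, as the demo does, avoids touching words that merely contain the letters "and".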
Remi Guan
4

If you want a regex solution, you could match every " and " that is followed by another one later in the string.

>>> import re
>>> text = 'Monday and Tuesday and Wednesday and Thursday and Friday and Saturday and Sunday are the days of the week.'
>>> re.sub(r' and (?=.* and )', ', ', text)
'Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday are the days of the week.'

(?=...) is a lookahead which makes sure there is a match later in the string without including it in the actual match (so also not in the substitution). It's sort of like a conditional on the match.
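
To see the lookahead at work on a shorter string, here is a small sketch. The constant name `NOT_LAST_AND` and the sample sentence are made up for illustration; the pattern is the one from this answer.

```python
import re

# " and " that is followed, somewhere later in the string, by another " and ".
# The lookahead only inspects the rest of the string; it consumes nothing,
# so the substitution replaces just the " and " itself.
NOT_LAST_AND = re.compile(r" and (?=.* and )")

text = "Monday and Tuesday and Wednesday are days."
result = NOT_LAST_AND.sub(", ", text)
print(result)  # Monday, Tuesday and Wednesday are days.
```

The final " and " is left alone because nothing after it satisfies the lookahead. Note that `.` does not match a newline by default, so the check stops at the end of the current line unless `re.DOTALL` is passed.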

tripleee
  • What will happen to this string: `'Monday and Tuesday and Wednesday and Thursday and Friday and Saturday and Sunday are the days of the week and it is Monday.'`? – kylieCatt Nov 13 '15 at 04:49
  • That's easy to find out, isn't it? Maybe change the `.*` in the lookahead to `[^.?!]*` to never allow it to match past sentence punctuation. But then how do you deal with intra-sentence abbreviations with a period which isn't a sentence terminator? You are quickly ending up with [Zawinski's problem](http://programmers.stackexchange.com/questions/223634/what-is-meant-by-now-you-have-two-problems). For anything beyond simple tokens, regex is probably not a suitable tool. – tripleee Nov 13 '15 at 04:53
  • But then for this simple problem you could probably restrict it even further, and hope it never matches past a verb, either. "John and Mary and I went to Buckingham Palace and had a beer." – tripleee Nov 13 '15 at 04:54
  • As a workaround (that may also fail in some situations), you could limit the number of words between each occurrence of `and`, i.e. `' and (?=(?:[^.,?! ]+ ){1,4}and )'`. – Mariano Nov 13 '15 at 08:26
  • @Mariano John and John's second cousin's husband's dog and I ...? Actually that's a pretty good idea for a limited scope, but you can't solve the general problem with regex. – tripleee Nov 13 '15 at 08:29
  • @tripleee Indeed, you can't solve the general case with regex. I simply suggested an alternative, which would also fail in some scenarios. – Mariano Nov 13 '15 at 08:32
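
The three lookahead variants discussed in these comments can be compared side by side. This is only a sketch: the dictionary labels, the `apply_all` helper and the first sample sentence are invented for illustration, while the patterns and the second sample are taken from the comments above.

```python
import re

PATTERNS = {
    # Original answer: another " and " anywhere later in the string.
    "anywhere later": r" and (?=.* and )",
    # tripleee's variant: do not look past sentence punctuation.
    "same sentence": r" and (?=[^.?!]* and )",
    # Mariano's variant: next "and" must come within one to four words.
    "within 4 words": r" and (?=(?:[^.,?! ]+ ){1,4}and )",
}

SAMPLES = [
    "Tea and coffee are drinks. Cats and dogs are pets.",
    "John and Mary and I went to Buckingham Palace and had a beer.",
]


def apply_all(text):
    """Return the result of each pattern's substitution on `text`."""
    return {name: re.sub(pattern, ", ", text)
            for name, pattern in PATTERNS.items()}


for sample in SAMPLES:
    for name, result in apply_all(sample).items():
        print(f"{name:>15}: {result}")
```

On the first sample only the unrestricted pattern reaches across the full stop and rewrites "Tea and coffee". On the second, only the word-limited pattern keeps "Mary and I". Neither restriction handles every sentence, which is the point made in the thread.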