
Suppose I have this:

import re
s = '   Hello,    world    '
re.sub(r'\S+', 'a', s)        

it gets me

'   a    a    '

as expected (it greedily takes the longest substring of at least one \S, i.e. any Unicode non-whitespace character), but

re.sub(r'\S*', 'a', s)       

gets me

'a a a a a a a a a a a a'

which seems weird to me. Where did all those whitespace substrings go? What is really going on here?

Edit: I understand that it inserts the string 'a' every time it matches an empty string, but I can't follow why it doesn't leave the whitespace substrings alone, since \S* shouldn't match them, as opposed to \s*.
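
To see where the matches actually fall, here is a minimal sketch using `re.finditer`. The helper name `describe_matches` is made up for illustration, and `s` is assumed to have three leading spaces and four spaces in each of the other two gaps, which is the spacing the outputs quoted above imply. As far as I can tell, `\S*` never consumes a whitespace character: it matches the empty string at the position in front of each one, `re.sub` inserts the replacement there, and the unmatched whitespace is copied through unchanged. The exact number of empty matches may differ between Python versions, since 3.7 changed how an empty match directly after a non-empty one is handled.

```python
import re

# Assumed spacing: 3 leading, 4 middle and 4 trailing spaces.
s = '   Hello,    world    '


def describe_matches(pattern, text):
    """Return (start, end, matched_text) for every match of pattern in text."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


plus_matches = describe_matches(r'\S+', s)  # only the two words
star_matches = describe_matches(r'\S*', s)  # the words plus many empty matches

# List each \S* match; empty ones have start == end and sit between characters.
for start, end, text in star_matches:
    kind = 'empty' if start == end else 'non-empty'
    print(f'{start:2d}-{end:2d} {kind:9s} {text!r}')

result = re.sub(r'\S*', 'a', s)

# Removing the inserted 'a's leaves the original whitespace, so none of it
# was matched or replaced; the 'a's were only inserted in between.
print(repr(result.replace('a', '')))
```

So the whitespace does not go anywhere: every space of `s` is still in the result, just with an `a` inserted in front of it.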

nalzok
Hugh Mongous
  • `re.sub(r'\S+', ' ', s)` gives me an empty string without any `a` characters – RomanPerekhrest Dec 11 '16 at 20:40
  • The explanation is the same as in the linked dupe. `\S*` matches a location before a non-matching symbol, i.e. before any whitespace symbol. See [this demo](https://regex101.com/r/NLUDpa/1) to see where the matches are, hence, the result. – Wiktor Stribiżew Dec 11 '16 at 20:42
  • ^ same here. Which makes sense, because \S+ matches everything that is not whitespace, and replaces it with whitespaces. – Ulf Aslak Dec 11 '16 at 20:42
  • corrected to `re.sub(r'\S+', 'a', s)` as it should be – Hugh Mongous Dec 11 '16 at 20:42

0 Answers