Regular expressions and whitespace in the re library

Asked Dec 11 '16 at 20:34

Active Dec 12 '16 at 12:29


Suppose I have this:

import re
s = '   Hello,    world    '
re.sub(r'\S+', 'a', s)

I get

'   a    a    '

as expected (it greedily takes the longest substring with at least one \S (any Unicode non-whitespace) character), but

re.sub(r'\S*', 'a', s)

gets me

'a a a a a a a a a a a a'

which seems weird to me. Where did all those whitespace substrings go? What is really going on here?

Edit: I understand that it inserts the string 'a' every time it matches an empty string, but I can't see why it doesn't leave the whitespace substrings alone, since \S*, unlike \s*, shouldn't match them.

edited Dec 12 '16 at 12:29

nalzok


asked Dec 11 '16 at 20:34

Hugh Mongous

`re.sub(r'\S+', ' ', s)` gives me an empty string without any `a` characters – RomanPerekhrest Dec 11 '16 at 20:40

The explanation is the same as in the linked dupe. `\S*` matches the empty string at a location before a non-matching symbol, i.e. before any whitespace symbol. See [this demo](https://regex101.com/r/NLUDpa/1) to see where the matches are, and hence the result. – Wiktor Stribiżew Dec 11 '16 at 20:42
^ same here. Which makes sense, because \S+ matches everything that is not whitespace, and replaces it with whitespace. – Ulf Aslak Dec 11 '16 at 20:42
corrected to `re.sub(r'\S+', 'a', s)` as it should be – Hugh Mongous Dec 11 '16 at 20:42
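
The point made in the comments above can be checked directly. The following is only a sketch and not part of the original thread: `match_spans` is a hypothetical helper name, and `s` is assumed to be a string like the one in the question, with runs of 3, 4 and 4 spaces. It lists where `\S*` matches and shows that the substitution never consumes a space; it only inserts `a` at the zero-width matches sitting in front of each one.

```python
import re


def match_spans(pattern, text):
    """Return (start, end, matched_text) for every match, empty ones included."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]


# Assumed example input: 3, 4 and 4 spaces around the two words.
s = '   Hello,    world    '

spans = match_spans(r'\S*', s)
for start, end, text in spans:
    # A zero-width match has start == end and matches the empty string.
    kind = 'empty' if start == end else 'non-empty'
    print((start, end), repr(text), kind)

result = re.sub(r'\S*', 'a', s)

# No match ever contains a space, so every space survives the substitution.
print(repr(result))
print(result.count(' ') == s.count(' '))
```

So the whitespace substrings are not gone: each space is still in the result, just with an `a` inserted in front of it. The exact number of `a`s depends on the Python version, since Python 3.7 changed how `re.sub` handles an empty match directly after a non-empty one.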
