How to match any non-whitespace character except a particular one?

Question

In Perl, \S matches any non-whitespace character.

How can I match any non-whitespace character except a backslash \?

score 194 · Accepted Answer · edited Dec 01 '19 at 12:13

/[^\s\\]/

matches any character that is neither a whitespace character nor a backslash (\). Negated character classes work like this in general:

[abc] means "match a, b or c"; [^abc] means "match any character except a, b or c".
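A minimal sketch of the negated class in action, written in Python because its re module accepts the same [^\s\\] syntax for this pattern. The sample string and variable names are made up for illustration.

```python
import re

# Negated class: any character that is neither whitespace nor a backslash.
pattern = re.compile(r"[^\s\\]")

# The characters are: a, space, b, backslash, c, tab, d
sample = "a b\\c\td"

# findall returns every match, much like the /g modifier in Perl.
matches = pattern.findall(sample)
print(matches)  # ['a', 'b', 'c', 'd']
```

The space, the tab and the backslash are all skipped; every other character is matched on its own.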

edited Dec 01 '19 at 12:13

Ben Carp


answered May 25 '11 at 13:23

Tim Pietzcker


When is `^` interpreted as negation and when as the beginning of a line? In that respect, why won't this match a line starting with a number of whitespace characters: `$0~/\s*^\s/`? – Alexander Cska Mar 26 '19 at 21:43
Outside of a character class, it's "beginning of the string" (or line, depending on the current matching mode). Inside a character class, and only if it's the first character after the opening bracket, it negates the contents of the character class. – Tim Pietzcker Mar 26 '19 at 21:45
Will the following match a line that begins with a number of whitespace characters followed by any non-whitespace character: `$0~/\s*^\s/`? – Alexander Cska Mar 26 '19 at 21:47
That should probably be `/^\s+/` - start of line, followed by one or more whitespace characters. – Tim Pietzcker Mar 26 '19 at 21:47
Unfortunately it does not work. I am trying to match a line if it begins with an indent – Alexander Cska Mar 26 '19 at 21:49
@AlexanderCska, have you figured it out? The above answer will only return the first match in a string. If you want all matches returned, add the `g` modifier: `/[^\s\\]/g` – Ben Carp Dec 01 '19 at 12:13
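To illustrate the two roles of ^ discussed in these comments, here is a small Python sketch. Python's re treats ^ the same way for these patterns. The sample lines and variable names are invented for the example.

```python
import re

lines = ["    indented", "not indented", "\ttabbed"]

# Outside a character class, ^ anchors the match at the start of the string.
indent = re.compile(r"^\s+")
indented = [line for line in lines if indent.search(line)]
print(indented)  # ['    indented', '\ttabbed']

# Inside a character class, a leading ^ negates the class.
non_space = re.compile(r"[^\s\\]")
first = non_space.search("  x y").group()  # first match only
every = non_space.findall("  x y")         # all matches, like /g
print(first, every)  # x ['x', 'y']
```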

score 17 · Answer 2 · answered May 25 '11 at 13:22

You can use a lookahead:

/(?=\S)[^\\]/
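As a rough check that the lookahead form and the negated class agree, this Python sketch compares them over the ASCII range. It is an illustration with made-up variable names, not part of the original answer.

```python
import re

lookahead = re.compile(r"(?=\S)[^\\]")  # non-whitespace ahead, then "not a backslash"
negated = re.compile(r"[^\s\\]")        # neither whitespace nor a backslash

# Compare the two patterns on every single ASCII character.
ascii_chars = [chr(i) for i in range(128)]
same = all(
    bool(lookahead.fullmatch(c)) == bool(negated.fullmatch(c))
    for c in ascii_chars
)
print(same)  # True
```

The lookahead form becomes more useful when the second part is itself a shorthand such as \W, which cannot simply be merged into a single negated class.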

answered May 25 '11 at 13:22

Denis de Bernardy


The lookahead first checks that the next character is not whitespace. The negated class then accepts that character, provided it is not one of the characters in your class. – Denis de Bernardy May 25 '11 at 14:30
I like this solution. It's good for things like "give me all the non-word characters except whitespace": `/(?=\S)\W/` – jocull Feb 24 '17 at 19:55
I had a situation where I needed to match any non whitespace character as well as non quotes. It also had to allow for SPACES. Ex: `THIS IS A TEST, AND AGAIN`. The following worked well for me `(?=\S)[^"]*`. – Arvo Bowen Jun 27 '19 at 21:46
The accepted answer didn't work for me, but this did. I was using it in Sublime Text's regex search. – Christian Noel Jun 11 '20 at 07:17
I was searching for how to select any non-word character except `-`, and here it is: `/(?=\W)[^-]/g` – Taufik Nurhidayat Jan 01 '21 at 11:51

storm_m2138 · Answer 3 · 2016-02-10T22:23:36.377

12

This worked for me using sed [Edit: a comment below points out that sed doesn't support \s; inside a bracket expression it is read as a literal backslash and a literal s]:

[^ ]

while

[^\s]

didn't

# Delete everything except space and 'g'
echo "ghai ghai" | sed "s/[^\sg]//g"
gg

echo "ghai ghai" | sed "s/[^ g]//g"
g g

edited Feb 10 '16 at 22:23

answered Feb 10 '16 at 20:15

storm_m2138


`\s` matches more than just the space character. It includes TAB, linefeed, carriage return, and others (how *many* others depends on the regex flavor). It's a Perl invention, originally a shorthand for the POSIX character class `[:space:]`, and not supported in `sed`. Your first regex above should be `s/[^[:space:]g]//g`. – Alan Moore Feb 10 '16 at 20:43
Yup @AlanMoore, that works: `echo "ghai ghai" | sed "s/[^[:space:]g]//g"` yields `g g` – storm_m2138 Feb 10 '16 at 22:20

score 2 · Answer 4 · answered Nov 15 '17 at 15:43

On my system: CentOS 5

I can use \s outside of collections (bracket expressions) but have to use [:space:] inside them. In fact, [:space:] works only inside a collection, so to match a single whitespace character with it I have to write [[:space:]], which is really strange.

echo a b cX | sed -r "s/(a\sb[[:space:]]c[^[:space:]])/Result: \1/"

Result: a b cX

first space I match with \s
second space I match alternatively with [[:space:]]
the X I match with "all but no space" [^[:space:]]

These two will not work:

a[:space:]b  instead use a\sb or a[[:space:]]b

a[^\s]b      instead use a[^[:space:]]b

As of sed 4.4, it is apparently still true that you have to use `([^[:space:]])` instead of `([^\s])`. I'm on openSUSE Tumbleweed 2018 04 03. — user2394284, Apr 06 '18 at 11:01

Gabriel Staples · Answer 5 · 2022-03-07T16:49:52.743

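If you're using regular expressions in bash, grep, or some other tool rather than in Perl, \S is not guaranteed to match all non-whitespace characters, because support for it varies by tool and regex flavor. Where the flavor interprets escapes such as \t inside a character class, the ASCII equivalent of \S is [^\r\n\t\f\v ]. In strict POSIX bracket expressions a backslash is a literal character, so use [^[:space:]] there instead.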

So, instead of this:

[^\s\\]

...you'll have to do this instead, to exclude both the whitespace characters (\r, \n, \t, \f, \v and the space) and the backslash (\, written \\ in the regex):

[^\r\n\t\f\v \\]
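As a sanity check on the equivalence claimed above, this Python sketch compares the two classes over all ASCII characters. It assumes a flavor that interprets \t, \n and similar escapes inside brackets, which Python's re does. The re.ASCII flag restricts \s to the six ASCII whitespace characters; the variable names are illustrative.

```python
import re

shorthand = re.compile(r"[^\s\\]", re.ASCII)  # \s limited to ASCII whitespace
explicit = re.compile(r"[^\r\n\t\f\v \\]")    # whitespace spelled out by hand

# Collect every ASCII character on which the two classes disagree.
differing = [
    chr(i) for i in range(128)
    if bool(shorthand.fullmatch(chr(i))) != bool(explicit.fullmatch(chr(i)))
]
print(differing)  # []
```

Without re.ASCII, Python's \s also matches additional whitespace such as the no-break space U+00A0, so the explicit list is narrower than \s in Unicode-aware flavors.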

References:

[my answer] Unix & Linux: Any non-whitespace regular expression

score 0 · Answer 6 · answered Mar 07 '22 at 22:05

In this case, it's easier to restate the problem of "non-whitespace without the backslash" as "not whitespace or backslash", as the accepted answer shows:

/[^\s\\]/

However, for trickier problems, the regex set feature might be handy. You can perform set operations on character classes to get what you want. This one subtracts the set that is just the backslash from the set that is the non-whitespace characters:

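use v5.18;
use experimental qw(regex_sets);

# (?[ ... ]) enables set operations: non-whitespace minus the backslash
my $regex = qr/abc(?[ [\S] - [\\] ])/;

# Test each sample line from the DATA section against the pattern
while( <DATA> ) {
    chomp;
    say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
    }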

__DATA__
abcd
abc d
abc\d
abcxyz
abc\\xyz

The output shows that neither whitespace nor the backslash matches after c:

[abcd] Matched
[abc d] Missed
[abc\d] Missed
[abcxyz] Matched
[abc\\xyz] Missed

This gets more interesting when the larger set would be difficult to express gracefully and set operations can refine it. I'd rather see the set operation in this example:

[b-df-hj-np-tv-z]
(?[ [a-z] - [aeiou] ])
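Not every flavor has set operations. As a hedged illustration in Python, whose re module lacks them, a negative lookahead can stand in for the subtraction. This is a substitute technique, not the Perl feature itself, and the variable names are made up.

```python
import re
import string

spelled_out = re.compile(r"[b-df-hj-np-tv-z]")
# Emulates (?[ [a-z] - [aeiou] ]): reject vowels first, then accept a-z.
subtracted = re.compile(r"(?![aeiou])[a-z]")

a = [c for c in string.ascii_lowercase if spelled_out.fullmatch(c)]
b = [c for c in string.ascii_lowercase if subtracted.fullmatch(c)]
print(a == b, "".join(b))  # True bcdfghjklmnpqrstvwxyz
```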
