
In Perl \S matches any non-whitespace character.

How can I match any non-whitespace character except a backslash \?

Hannele
Lazer

6 Answers


You can use a character class:

/[^\s\\]/

matches any character that is neither whitespace nor a backslash (\). Here's another example:

[abc] means "match a, b or c"; [^abc] means "match any character except a, b or c".
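A quick way to check the class is to run it. This is a minimal sketch in Python's `re` module rather than Perl, on the assumption that these simple classes behave the same way in both. `kept_chars` is a hypothetical helper name used only for illustration.

```python
import re

# [^\s\\]: one character that is neither whitespace nor a backslash.
NON_WS_NON_BACKSLASH = re.compile(r"[^\s\\]")

def kept_chars(text: str) -> str:
    """Return the characters matched by the class above, joined together."""
    return "".join(NON_WS_NON_BACKSLASH.findall(text))

# Input contains a space, a backslash and a tab; only a, b, c, d survive.
print(kept_chars("a b\\c\td"))          # abcd

# [abc] matches a, b or c; [^abc] matches any other character.
print(re.findall(r"[abc]", "abcdef"))   # ['a', 'b', 'c']
print(re.findall(r"[^abc]", "abcdef"))  # ['d', 'e', 'f']
```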

Ben Carp
Tim Pietzcker
  • When is `^` interpreted as negation, and when as the beginning of a line? In that respect, why won't this match a line starting with a number of whitespace characters: `$0~/\s*^\s/`? – Alexander Cska Mar 26 '19 at 21:43
  • Outside of a character class, it's "beginning of the string" (or line, depending on the current matching mode). Inside a character class, and only if it's the first character after the opening bracket, it negates the contents of the character class. – Tim Pietzcker Mar 26 '19 at 21:45
  • Will the following match a line that begins with a number of whitespace characters followed by any character that is not whitespace: `$0~/\s*^\s/`? – Alexander Cska Mar 26 '19 at 21:47
  • That should probably be `/^\s+/` - start of line, followed by one or more whitespace characters. – Tim Pietzcker Mar 26 '19 at 21:47
  • Unfortunately it does not work. I am trying to match a line if it begins with an indent. – Alexander Cska Mar 26 '19 at 21:49
  • @AlexanderCska, have you figured it out? The above answer will only return the first match in a string. If you want all matches to be returned, add the `g` modifier: `/[^\s\\]/g` – Ben Carp Dec 01 '19 at 12:13
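The two roles of `^` discussed in these comments, and the difference between getting the first match and getting all matches, can be seen in a small sketch. It uses Python's `re` instead of Perl, assuming the behavior is the same for these constructs; `re.findall` plays roughly the role of Perl's `/g` in list context.

```python
import re

lines = ["    indented line", "not indented", "\tTab-indented"]

# Outside a class, ^ is an anchor: ^\s+ means "starts with whitespace".
indented = [line for line in lines if re.search(r"^\s+", line)]
print(indented)  # ['    indented line', '\tTab-indented']

# Inside a class, a leading ^ negates: [^\s\\] is any character that is
# neither whitespace nor a backslash.
text = "a\\b c"
print(re.search(r"[^\s\\]", text).group())  # a  (first match only)
print(re.findall(r"[^\s\\]", text))         # ['a', 'b', 'c']  (all matches)
```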

You can use a lookahead:

/(?=\S)[^\\]/
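To see the lookahead at work, here is a sketch in Python's `re` (assumed to behave like Perl here). `(?=\S)` checks the next character without consuming it, then `[^\\]` consumes it only if it is not a backslash, so the pattern picks out the same characters as the single character class `[^\s\\]`.

```python
import re

LOOKAHEAD = re.compile(r"(?=\S)[^\\]")  # assert non-whitespace, then consume a non-backslash
CLASS = re.compile(r"[^\s\\]")          # the single-class version

# Sample holds a space, a backslash, a newline and a tab among the letters.
sample = "x y\\z\n\t!"
print(LOOKAHEAD.findall(sample))  # ['x', 'y', 'z', '!']
print(LOOKAHEAD.findall(sample) == CLASS.findall(sample))  # True
```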
Denis de Bernardy
  • It looks ahead to check that the next character is not whitespace. Then the negated class accepts any character (which, because of the lookahead, is not whitespace) except the characters in your class. – Denis de Bernardy May 25 '11 at 14:30
  • I like this solution. It's good for things like "give me all the non-word characters except whitespace": `/(?=\S)\W/` – jocull Feb 24 '17 at 19:55
  • I had a situation where I needed to match any non-whitespace character as well as non-quote characters. It also had to allow for SPACES. Ex: `THIS IS A TEST, AND AGAIN`. The following worked well for me: `(?=\S)[^"]*`. – Arvo Bowen Jun 27 '19 at 21:46
  • The accepted answer didn't work for me, but this did. I was using this in Sublime Text regex search. – Christian Noel Jun 11 '20 at 07:17
  • I was searching for how to select any non-word character except `-`, and here it is: `/(?=\W)[^-]/g` – Taufik Nurhidayat Jan 01 '21 at 11:51
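The variations in these comments rely on one detail: a lookahead only guards the position where it appears. A Python `re` sketch (assumed to match Perl's behavior for these patterns) shows why `(?=\S)[^"]*` still allows spaces: only the first character is checked, and `[^"]*` then runs freely up to a quote.

```python
import re

# (?=\S) checks only the first character; [^"]* may then consume spaces.
m = re.search(r'(?=\S)[^"]*', 'THIS IS A TEST, AND AGAIN"tail')
print(m.group())  # THIS IS A TEST, AND AGAIN

# Non-word characters except whitespace: (?=\S)\W
print(re.findall(r"(?=\S)\W", "a, b!\tc"))  # [',', '!']

# Non-word characters except '-': (?=\W)[^-]
print(re.findall(r"(?=\W)[^-]", "a-b c!"))  # [' ', '!']
```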

This worked for me using sed [Edit: a comment below points out that sed doesn't support \s]:

[^ ]

while

[^\s] 

didn't

# Delete everything except space and 'g'
echo "ghai ghai" | sed "s/[^\sg]//g"
gg

echo "ghai ghai" | sed "s/[^ g]//g"
g g
storm_m2138
  • `\s` matches more than just the space character. It includes TAB, linefeed, carriage return, and others (how *many* others depends on the regex flavor). It's a Perl invention, originally a shorthand for the POSIX character class `[:space:]`, and not supported in `sed`. Your first regex above should be `s/[^[:space:]g]//g`. – Alan Moore Feb 10 '16 at 20:43
  • Yup, @AlanMoore, that works: ```echo "ghai ghai" | sed "s/[^[:space:]g]//g"``` Yields: ```g g``` – storm_m2138 Feb 10 '16 at 22:20
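Why the first sed command printed `gg`: the output is consistent with sed reading `\s` inside the bracket expression as a literal backslash followed by `s`, so `[^\sg]` keeps only backslash, `s` and `g`. Python's `re` cannot run sed's bracket expressions, so this sketch only emulates the two readings (the literal one is written explicitly as `[^\\sg]`).

```python
import re

text = "ghai ghai"

# Literal reading of [^\sg]: "not backslash, not s, not g" (what sed appears to do).
posix_reading = re.sub(r"[^\\sg]", "", text)
print(posix_reading)  # gg

# Perl-style reading, where \s means any whitespace inside the class.
perl_reading = re.sub(r"[^\sg]", "", text)
print(perl_reading)   # g g
```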

On my system: CentOS 5

I can use \s outside of collections but have to use [:space:] inside of collections. In fact, I can use [:space:] only inside collections, so to match a single whitespace character this way I have to use [[:space:]], which is really strange.

echo a b cX | sed -r "s/(a\sb[[:space:]]c[^[:space:]])/Result: \1/"

Result: a b cX
  • first space I match with \s
  • second space I match alternatively with [[:space:]]
  • the X I match with "all but no space" [^[:space:]]

These two will not work:

a[:space:]b  instead use a\sb or a[[:space:]]b

a[^\s]b      instead use a[^[:space:]]b
Torge
  • As of sed 4.4, it is apparently still true that you have to use `([^[:space:]])` instead of `([^\s])`. I'm on openSUSE Tumbleweed 2018 04 03. – user2394284 Apr 06 '18 at 11:01

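If you're using regular expressions in bash, grep, or another tool instead of Perl, \S may not be available for matching all non-whitespace characters. An ASCII equivalent of \S is [^\r\n\t\f\v ], provided the engine recognizes escapes such as \r and \t inside a character class (strict POSIX bracket expressions treat the backslash as a literal character).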

So, instead of this:

[^\s\\]

...you'll have to do this instead, to match neither the whitespace characters (\r, \n, \t, \f, \v, and the space) nor the backslash (written \\):

[^\r\n\t\f\v \\]
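A sketch in Python's `re` can check the equivalence claim. It assumes "whitespace" means Python's ASCII set `[ \t\n\r\f\v]` (the `re.ASCII` flag). On that definition the explicit list and `[^\s\\]` agree on every ASCII character. Without the flag, `\s` also covers Unicode spaces such as the no-break space U+00A0, so the two classes differ there.

```python
import re

ASCII_EQUIV = re.compile(r"[^\r\n\t\f\v \\]")          # explicit whitespace list plus backslash
SHORTHAND_ASCII = re.compile(r"[^\s\\]", re.ASCII)     # \s limited to [ \t\n\r\f\v]
SHORTHAND_UNICODE = re.compile(r"[^\s\\]")             # default: Unicode-aware \s

# The explicit list and the ASCII-only shorthand agree on all 128 ASCII characters.
for code in range(128):
    ch = chr(code)
    assert bool(ASCII_EQUIV.match(ch)) == bool(SHORTHAND_ASCII.match(ch)), repr(ch)

# Beyond ASCII they can differ: Unicode \s treats U+00A0 as whitespace.
nbsp = "\u00a0"
print(bool(ASCII_EQUIV.match(nbsp)), bool(SHORTHAND_UNICODE.match(nbsp)))  # True False
```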

References:

  1. [my answer] Unix & Linux: Any non-whitespace regular expression
Gabriel Staples

In this case, it's easier to restate the problem "non-whitespace without the backslash" as "anything that is not whitespace or a backslash", which is what the accepted answer does:

/[^\s\\]/

However, for trickier problems, the regex set feature might be handy. You can perform set operations on character classes to get what you want. This one subtracts the set containing just the backslash from the set of non-whitespace characters:

use v5.18;
use experimental qw(regex_sets);

my $regex = qr/abc(?[ [\S] - [\\] ])/;


while( <DATA> ) {
    chomp;
    say "[$_] ", /$regex/ ? 'Matched' : 'Missed';
    }

__DATA__
abcd
abc d
abc\d
abcxyz
abc\\xyz

The output shows that neither whitespace nor the backslash matches after c:

[abcd] Matched
[abc d] Missed
[abc\d] Missed
[abcxyz] Matched
[abc\\xyz] Missed
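Python's built-in `re` has no `(?[ ... ])` syntax, but one common substitute (a swapped-in technique, not Perl's feature) is a negative lookahead: `(?!\\)\S` refuses a backslash and then consumes one non-whitespace character, which is "\S minus backslash". This sketch reproduces the output table above; `report` is a hypothetical helper name.

```python
import re

# Emulate (?[ [\S] - [\\] ]) with a negative lookahead followed by \S.
regex = re.compile(r"abc(?!\\)\S")

data = ["abcd", "abc d", "abc\\d", "abcxyz", "abc\\\\xyz"]

def report(lines):
    """Label each line the way the Perl loop above does."""
    return [f"[{line}] " + ("Matched" if regex.search(line) else "Missed") for line in lines]

for row in report(data):
    print(row)
```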

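This gets more interesting when the larger set would be difficult to express gracefully and set operations can refine it. Both lines below match a single lowercase consonant, and I'd rather read the set operation on the second line than the hand-built class on the first: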

[b-df-hj-np-tv-z]
(?[ [a-z] - [aeiou] ])
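As a sanity check that the two lines describe the same set, here is a Python sketch. Standard `re` has no set operations, so the subtraction is emulated with the negative lookahead `(?![aeiou])[a-z]` (not Perl's syntax). Both patterns are then compared against a set difference computed with Python sets.

```python
import re
import string

HAND_WRITTEN = re.compile(r"[b-df-hj-np-tv-z]")
SUBTRACTED = re.compile(r"(?![aeiou])[a-z]")  # emulates [a-z] - [aeiou]

consonants = set(string.ascii_lowercase) - set("aeiou")

for ch in string.ascii_lowercase:
    # Each pattern accepts a letter exactly when it is in the set difference.
    assert bool(HAND_WRITTEN.fullmatch(ch)) == (ch in consonants)
    assert bool(SUBTRACTED.fullmatch(ch)) == (ch in consonants)

print("".join(sorted(consonants)))  # bcdfghjklmnpqrstvwxyz
```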
brian d foy