150

How to search for occurrences of more than one space between words in a line

1. this is a line containing  2 spaces
2. this is a line containing   3 spaces
3. this is a line containing multiple spaces first  second   three   four

All the above are valid matches for this regex. What regex should I use?

Stevoisiak
Sam
  • Are you trying to check consecutive blank spaces or all spaces in that line? – Sachin Shanbhag Sep 21 '10 at 09:14
  • consecutive blank spaces not all spaces – Sam Sep 21 '10 at 09:22
  • 1
    What exactly do you mean by "between words"? In two of your examples, there are multiple spaces between a word and a digit. What about punctuation (for example, do you want to match multiple spaces after a dot and before the next word)? What about spaces before/after the last character in a line? Do you want to match tabs, too? What about lines that consist of nothing but whitespace? – Tim Pietzcker Sep 21 '10 at 11:35
  • spaces between "containing and 2", "containing and 3", "first and second", "second and three" ... Yes, I want to match spaces after a dot and before the next word. – Sam Sep 21 '10 at 12:11

5 Answers

253
[ ]{2,}

SPACE (2 or more)

You could also require a word character immediately before and after those spaces (so not other whitespace such as tabs or newlines):

\w[ ]{2,}\w

The same, but with a capturing group around the spaces only, which is useful for tasks like replacement:

\w([ ]{2,})\w

Or require any non-whitespace character, not only a word character, before and after the spaces:

[^\s]([ ]{2,})[^\s]
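As a rough illustration of how these patterns differ, here is a minimal Python sketch run against the sample lines from the question. The variable names are made up for this example, and Python's `re` module is only one of many engines where these patterns apply.

```python
import re

# Sample lines from the question (the runs of spaces are intentional).
lines = [
    "this is a line containing  2 spaces",
    "this is a line containing   3 spaces",
    "this is a line containing multiple spaces first  second   three   four",
]

any_run = re.compile(r"[ ]{2,}")                # two or more spaces anywhere
between_words = re.compile(r"\w([ ]{2,})\w")    # spaces captured in group 1
between_non_ws = re.compile(r"[^\s]([ ]{2,})[^\s]")

# How many multi-space runs each line contains.
run_counts = [len(any_run.findall(line)) for line in lines]

# Length of each captured run in the third line.
captured_lengths = [len(m.group(1)) for m in between_words.finditer(lines[2])]

# Punctuation is not a word character, so only the [^\s] variant matches here.
after_dot_word = between_words.search("word.  more")
after_dot_non_ws = between_non_ws.search("word.  more")

print(run_counts, captured_lengths)
```

The last two searches show why the `\w` version can miss spaces that follow a dot, which matters if punctuation should count as a boundary.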
Alex
  • 2
    `\w` means 'word characters', that is, alphanumeric and underscore, but not other non-space characters. To check for non-whitespace, use `\S` (capital S). Also, the first one will only match lines that contain two or more spaces and nothing else. – tdammers Sep 21 '10 at 09:19
  • I tried to evolve the question. I understood that I missed what you said with `\S`, I just prefer not to rely on character case for such functionality, it's easier to read. – Alex Sep 21 '10 at 09:22
  • 1
    Why are you using anchors at all? He's looking for spaces embedded somewhere in the lines. – Tim Pietzcker Sep 21 '10 at 09:43
  • no particular reason. At first I thought I needed them, so I dragged them all along the process. In fact, you are right that I am wrong for using them in this case. I'll edit my answer right away. – Alex Sep 21 '10 at 09:45
  • 2
    `\w[ ]{2,}\w` will fail to match `word.<2 spaces>more words` or a string that consists entirely of spaces. `[^\s]([ ]{2,})[^\s]\w` will fail on lines that start with spaces or strings like `bla<2 spaces>.`... – Tim Pietzcker Sep 21 '10 at 09:48
  • at the moment it wasn't clear what the question was about. Finding spaces between words only (second regex), free spaces (first regex), or spaces between non-whitespace characters (last regex). Only the first regex checks for spaces regardless of whether they are preceded or followed by non-whitespace chars (including newlines). I specified it clearly in my answer. The question was too vague to give one single, definitive answer. – Alex Sep 21 '10 at 09:52
  • You are right, he *is* asking for "between words". Sorry, and thanks for making me aware of this. +1 for your answer then, but you still might want to point out that your second and third regex won't catch leading and trailing whitespace. Which might not be a problem for the OP. – Tim Pietzcker Sep 21 '10 at 10:07
  • 1
    Explanations: **1)** The `{min,max}` operator is the **_general repetition quantifier_** and **2)** Omitting `max` but _leaving_ the comma means unlimited repetitions. – Dem Pilafian Jan 31 '21 at 22:15
22

Simple solution:

/\s{2,}/

This matches all occurrences of two or more consecutive whitespace characters. If you need to match the entire line, but only if it contains two or more consecutive whitespace characters:

/^.*\s{2,}.*$/

If the whitespaces don't need to be consecutive:

/^(.*\s.*){2,}$/
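A small Python sketch of the three patterns above, assuming the surrounding `/.../` delimiters are dropped (Python does not use them). The test strings are invented for illustration.

```python
import re

consecutive = re.compile(r"\s{2,}")          # a run of 2+ whitespace characters
whole_line = re.compile(r"^.*\s{2,}.*$")     # the entire line, if it has such a run
scattered = re.compile(r"^(.*\s.*){2,}$")    # 2+ whitespace characters, adjacent or not

double = "one two  three"   # contains a double space
single = "one two three"    # only single spaces

run = consecutive.search(double).group(0)          # just the run itself
full = whole_line.match(double).group(0)           # the whole line
no_double = whole_line.match(single)               # no consecutive whitespace
two_separate = scattered.match(single)             # two separate spaces still count
one_only = scattered.match("one two")              # a single space is not enough

print(repr(run), repr(full))
```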
tdammers
  • the `.*` is usually greedy, meaning that it will reach the end of the tested string, and all which follows, if there are mandatory characters, won't match. Usually in this case it's a good practice to add `?` , like this `.*?`. It happened to me using PHP's PCRE – Alex Sep 21 '10 at 09:35
  • It does match. "Greedy" means that it matches as much as possible while still matching the pattern as a whole. `/^.*b.*$/` does in fact match `"foobar"`, even though you'd expect the first greedy `.*` to match the entire string already. – tdammers Sep 21 '10 at 10:10
  • To search for one or more spaces, this worked in gvim: \s\{1,\} > I had to add escape char for { and }. Thanks :) – vineeshvs Nov 04 '22 at 14:11
14

This regex matches every run of one or more whitespace characters (including single spaces); you can use it to replace each run with a single space:

\s+

Example in Python:

import re
result = re.sub(r'\s+', ' ', data)
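The difference between `\s+` and a pattern that requires two or more spaces only shows up when the replacement is something other than a single space. A minimal sketch, using a made-up input string and a tab as the replacement:

```python
import re

data = "a b  c   d"  # one single space, one double, one triple

# \s+ also matches the single space, so every gap becomes a tab.
every_gap = re.sub(r"\s+", "\t", data)

# [ ]{2,} leaves the single space alone and only replaces the longer runs.
multi_only = re.sub(r"[ ]{2,}", "\t", data)

print(repr(every_gap), repr(multi_only))
```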
Owen Yuwono
  • Why is this an answer to the question? This wouldn't work if, for instance, one wants to only replace the multiple spaces (with a tab) or any other operation that matches _multiple spaces only_ (as per original question). – gented Feb 16 '22 at 09:58
  • This will select everything, not spaces only – Faliorn Jun 22 '22 at 09:59
4

Search for [ ]{2,}. This will find two or more adjacent spaces anywhere within the line. It will also match leading and trailing spaces as well as lines that consist entirely of spaces. If you don't want that, check out Alexander's answer.

Actually, you can leave out the brackets, they are just for clarity (otherwise the space character that is being repeated isn't that well visible :)).

The problem with \s{2,} is that it will also match line endings in Windows files, where a newline is denoted by CRLF (\r\n), a sequence of two whitespace characters that \s{2} matches.

If you also want to find multiple tabs and spaces, use [ \t]{2,}.
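To make the CRLF point concrete, here is a short Python sketch. The sample text is invented, and the behaviour shown is that of Python's `re` module, where `\s` includes both `\r` and `\n`.

```python
import re

# Two lines with Windows line endings; only the second has a double space.
text = "first line\r\nsecond  line\r\n"

# \s{2,} also picks up each \r\n pair.
with_backslash_s = re.findall(r"\s{2,}", text)

# [ ]{2,} only finds the literal double space.
spaces_only = re.findall(r"[ ]{2,}", text)

# [ \t]{2,} additionally accepts tabs mixed with spaces.
spaces_and_tabs = re.findall(r"[ \t]{2,}", "a \tb  c")

print(with_backslash_s, spaces_only, spaces_and_tabs)
```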

Tim Pietzcker
  • `more than one space between words in a line`. How is `[ ]{2,}` between words? Have you even read the question? – Alex Sep 21 '10 at 10:02
  • Which is why I have referred to your answer in case the OP really wants to be as strict as he is writing. Maybe we should ask him. – Tim Pietzcker Sep 21 '10 at 11:32
2

Here is my solution

[^0-9A-Z,\n]

This matches any single character that is not a digit, an uppercase letter, a comma, or a newline, so it skips all of those and selects only the space in the middle in a data set such as:

  • 20171106,16632 ESCG0000018SB
  • 20171107,280 ESCG0000018SB
  • 20171106,70476 ESCG0000018SB
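A quick Python check of this character class against the rows above. Note that this is a sketch for this particular data shape: the class matches any excluded character, so lowercase letters or punctuation other than a comma would be selected as well.

```python
import re

rows = [
    "20171106,16632 ESCG0000018SB",
    "20171107,280 ESCG0000018SB",
    "20171106,70476 ESCG0000018SB",
]

not_data = re.compile(r"[^0-9A-Z,\n]")

# For these rows the only character outside the excluded set is the space.
selected = [not_data.findall(row) for row in rows]

# Lowercase letters are not excluded, so they are matched too.
with_lowercase = not_data.findall("abc 1")

print(selected, with_lowercase)
```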
Ojitha