Regex for ONE-or-more letters/digits And ZERO-or-more spaces

Question

I want to allow 0 or more whitespace characters in my string and one or more A-Z, a-z or 0-9 characters.

The question "Regex allowing a space character in Java" suggests [0-9A-Za-z ]+.

I doubt that this regex matches patterns having zero or more whitespace characters.

What should I do to allow 0 or more whitespace characters anywhere in the string and one or more alphanumeric characters anywhere in the string?

Will this work? ([0-9A-Za-z]+)([ ]*)
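For comparison, here is a minimal sketch (the class and method names are illustrative, not from the thread) that runs both candidates through `Matcher.matches()`, which requires the whole string to match. On these inputs, `[0-9A-Za-z ]+` also accepts a string made only of spaces, while `([0-9A-Za-z]+)([ ]*)` rejects leading spaces and spaces between words.

```java
import java.util.regex.Pattern;

// Runs the two candidate patterns from the question on a few sample inputs.
// Matcher.matches() requires the WHOLE string to match, so no ^/$ is needed.
class QuestionCandidates {
    // Suggested in the linked question: any mix of letters, digits and spaces.
    static final Pattern SUGGESTED = Pattern.compile("[0-9A-Za-z ]+");
    // The asker's proposal: letters/digits first, then optional trailing spaces.
    static final Pattern PROPOSED = Pattern.compile("([0-9A-Za-z]+)([ ]*)");

    static boolean suggested(String s) { return SUGGESTED.matcher(s).matches(); }
    static boolean proposed(String s) { return PROPOSED.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"hello", "hello ", " hello", "hello there", "   ", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\": suggested=" + suggested(s)
                    + " proposed=" + proposed(s));
        }
    }
}
```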

By any convenient chance, do you know the first letter is not a whitespace? – Duncan Jones Apr 15 '14 at 18:24
Why do you have doubts? It will match "blanks"; if you want to accept all whitespace, use `[0-9A-Za-z\s]+` – dognose Apr 15 '14 at 18:24
`([0-9A-Za-z]+)([ ])*` will require that it cannot start with a space. – Cruncher Apr 15 '14 at 18:26
@vvid: Zero or more spaces... Well, the regex covers EVERYTHING except the case of `zero whitespaces`; for that, use an empty-string check and you've got all you need. – dognose Apr 15 '14 at 20:32

Cruncher · Accepted Answer · score 20 · 2014-04-15T18:50:16.190

I believe you can do something like this:

([ ]*+[0-9A-Za-z]++[ ]*+)+

This is 0 or more spaces, followed by at least 1 alphanumeric character, followed by 0 or more spaces, with that whole group repeated at least once.

It uses Pshemo's idea of possessive quantifiers to speed up the regex.
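A small check of the pattern above, assuming Java's `java.util.regex` engine (class and method names are illustrative). It runs a greedy form and the possessive form side by side; on these sample inputs both give the same answers, consistent with the idea that the possessive quantifiers only skip useless backtracking here. A handful of samples is a sanity check, not a proof.

```java
import java.util.regex.Pattern;

// Checks the accepted answer's pattern in both greedy and possessive form.
class AcceptedAnswerCheck {
    // Greedy form: (spaces* alnum+ spaces*) repeated one or more times.
    static final Pattern GREEDY = Pattern.compile("([ ]*[0-9A-Za-z]+[ ]*)+");
    // Possessive form: *+ and ++ never give back what they have consumed.
    static final Pattern POSSESSIVE = Pattern.compile("([ ]*+[0-9A-Za-z]++[ ]*+)+");

    static boolean greedy(String s) { return GREEDY.matcher(s).matches(); }
    static boolean possessive(String s) { return POSSESSIVE.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"", "   ", "a", " hello ", "Good morning J", "hello_there"};
        for (String s : samples) {
            System.out.println("\"" + s + "\": greedy=" + greedy(s)
                    + " possessive=" + possessive(s));
        }
    }
}
```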

edited Apr 15 '14 at 18:50

answered Apr 15 '14 at 18:29

Cruncher

+1 While not as fancy as the look-ahead options, this is certainly easier to understand! – Duncan Jones Apr 15 '14 at 18:32
Does this check for spaces between characters? E.g. "Good morning J"? – vplusplus Apr 15 '14 at 18:36
@wid absolutely. `Good` will be matched by the `[0-9A-Za-z]+`; then, since the following space doesn't match that class, the engine goes back to the beginning of the group, matches the space with `[ ]*`, and repeats. – Justin Apr 15 '14 at 18:38
@Quincunx Hmm, you're right. I don't know much about regex interpreters. Could that potentially be slower since it has to recheck for spaces after every char? – Cruncher Apr 15 '14 at 18:40
@Quincunx Not exactly. What if spaces are placed at the end of the String? – Pshemo Apr 15 '14 at 18:40
@Pshemo Then Cruncher's regex won't work. `([ ]*[0-9A-Za-z])+` is equivalent to `([ ]*[0-9A-Za-z]+)+` (that is what I meant in my comment) – Justin Apr 15 '14 at 18:41
@Pshemo Ah, you're right, in that case mine doesn't work as is. I have to put the space back into the end – Cruncher Apr 15 '14 at 18:41
@Cruncher Or just do `([ ]*[0-9A-Za-z]*)+` or even `([ ]?[0-9A-Za-z]?)+` – Justin Apr 15 '14 at 18:41
@Cruncher Yep. Also `([foo]+)+` smells like catastrophic backtracking. – Pshemo Apr 15 '14 at 18:42
@Quincunx `([ ]*[0-9A-Za-z]*)+` will match empty – Cruncher Apr 15 '14 at 18:42
Fine, then `[0-9A-Za-z ]([ ][0-9A-Za-z]|[0-9A-Za-z]|[ ])+` – Justin Apr 15 '14 at 18:43
@Quincunx at that `([0-9A-Za-z ]+[ ]*)+` works as well. There's a tonne of variations of this actually. – Cruncher Apr 15 '14 at 18:44
My thought is that avoiding nested `*` and `+` might speed up the regex. – Justin Apr 15 '14 at 18:45
@Quincunx yeah, I actually like donut's the best, although you have to think about it for a few seconds first – Cruncher Apr 15 '14 at 18:47
If you want to speed up this regex you can use [possessive quantifiers](http://www.regular-expressions.info/possessive.html) `([ ]*+[0-9A-Za-z]++[ ]*+)+`. – Pshemo Apr 15 '14 at 18:47
@Pshemo In this case, the possessive quantifiers shouldn't change the language right? – Cruncher Apr 15 '14 at 18:52
I am not sure what you mean by "change the language", but they will not change set of matched/accepted Strings. – Pshemo Apr 15 '14 at 18:54
@Pshemo Well, any regex string represents a formal language (specifically a regular one) which is defined by which strings are and are not in it. Put more specifically, do the possessive quantifiers reject or accept any string that the regex wouldn't have before? – Cruncher Apr 15 '14 at 18:56
I pressed the Enter key too fast. See my updated earlier comment :) – Pshemo Apr 15 '14 at 18:57
@Pshemo, Quincunx: This works. If I have another constraint, such as the string must have at least one digit and at least one letter, I tried `([ ]*+[0-9]++[A-Za-z]++[ ]*+)+`. That doesn't work. What is wrong here? – vplusplus Apr 15 '14 at 22:11
@wid In that case you need to use [look-ahead](http://www.regular-expressions.info/lookaround.html#lookahead) mechanisms like in [M42 answer](http://stackoverflow.com/a/23091541/1393766). – Pshemo Apr 15 '14 at 22:20
` ^(?=.*\\s*)(?=.*[0-9]+)(?=.*[a-zA-Z]+)[a-zA-Z0-9 ]+$ ` - I used this for the above constraint. Thanks a lot! – vplusplus Apr 15 '14 at 22:53
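A cleaned-up sketch of that look-ahead approach (class and method names are illustrative). It assumes ASCII ranges written as `A-Z` and `a-z`; note that a range like `A-z` would also admit `[`, `\`, `]`, `^`, `_` and the backtick. It also leaves out a `(?=.*\s*)` look-ahead, since that one matches any string and adds no constraint.

```java
import java.util.regex.Pattern;

// Letters, digits and spaces only, with at least one digit AND one letter.
class DigitAndLetterCheck {
    static final Pattern P = Pattern.compile(
            "^(?=.*[0-9])"        // look ahead: at least one digit somewhere
          + "(?=.*[A-Za-z])"      // look ahead: at least one ASCII letter somewhere
          + "[A-Za-z0-9 ]+$");    // the whole string: only letters, digits, spaces

    static boolean ok(String s) { return P.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"abc 123", "abc", "123", "  a1  ", "a_1"}) {
            System.out.println("\"" + s + "\": " + ok(s));
        }
    }
}
```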

score 12 · Answer 2 · answered Nov 28 '16 at 08:44

The simplest answer:

`*` means zero or more, equivalent to `{0,}`.

`+` means one or more, equivalent to `{1,}`.

So, for example:

`[A-Z]+` means at least one capital letter; it can be written as `[A-Z]{1,}`.

`[!@#$%&]*` means you can have these special characters zero or more times; it can be written as `[!@#$%&]{0,}`.
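A quick sketch (illustrative class name) using `Pattern.matches(regex, input)`, which compiles the regex and tests the whole input, to compare each quantifier with its brace form on a few strings:

```java
import java.util.regex.Pattern;

// Compares each quantifier with its brace form on a few sample strings.
class QuantifierEquivalence {
    // Pattern.matches compiles the regex and tests the WHOLE input.
    static boolean fullMatch(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "A", "ABC", "!@#", "abc"}) {
            System.out.println("\"" + s + "\": "
                    + "[A-Z]+ " + fullMatch("[A-Z]+", s)
                    + ", [A-Z]{1,} " + fullMatch("[A-Z]{1,}", s)
                    + ", [!@#$%&]* " + fullMatch("[!@#$%&]*", s)
                    + ", [!@#$%&]{0,} " + fullMatch("[!@#$%&]{0,}", s));
        }
    }
}
```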

Sorry, but the purpose of this answer is to be as simple as possible.

score 7 · Answer 3 · answered Apr 15 '14 at 18:37

You can also try this:

  ^[0-9A-Za-z ]*[0-9A-Za-z]+[ ]*$

answered Apr 15 '14 at 18:37

donut

Smart! I seriously doubted this before trying it. – aliteralmind Apr 15 '14 at 18:44
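Why this works: in any successful match only spaces may follow `[0-9A-Za-z]+`, so it has to end on the last letter or digit in the string, and the leading `[0-9A-Za-z ]*` absorbs everything before it. A minimal sketch to try it (class and method names are illustrative):

```java
import java.util.regex.Pattern;

// donut's pattern. In any successful match, [0-9A-Za-z]+ ends on the LAST
// letter/digit (only spaces may follow it), so leading and inner spaces are
// absorbed by the first [0-9A-Za-z ]*.
class DonutPattern {
    static final Pattern P = Pattern.compile("^[0-9A-Za-z ]*[0-9A-Za-z]+[ ]*$");

    static boolean ok(String s) { return P.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"", "   ", "a", " hello ", "hello there", "hi!"}) {
            System.out.println("\"" + s + "\": " + ok(s));
        }
    }
}
```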

score 4 · Answer 4 · answered Apr 15 '14 at 18:25

Use lookahead:

^(?=.*\s*)(?=.*[a-zA-Z0-9]+)[a-zA-Z0-9 ]+$

answered Apr 15 '14 at 18:25

Toto

This answer might benefit from some additional explanation if you have time. – Duncan Jones Apr 15 '14 at 18:33
`(?=.*\s*)` seems redundant. – Pshemo Apr 15 '14 at 18:37
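To see that the first look-ahead adds nothing, this sketch (illustrative names) compares the answer's pattern with a variant that drops `(?=.*\s*)`, which always succeeds because `.*` and `\s*` can both match nothing, and also drops the unnecessary `+` inside the second look-ahead. On these inputs the two agree.

```java
import java.util.regex.Pattern;

// Toto's pattern versus a simplified variant without the always-true look-ahead.
class LookaheadPattern {
    static final Pattern ORIGINAL =
            Pattern.compile("^(?=.*\\s*)(?=.*[a-zA-Z0-9]+)[a-zA-Z0-9 ]+$");
    // (?=.*[a-zA-Z0-9]) alone already demands one letter/digit somewhere.
    static final Pattern SIMPLIFIED =
            Pattern.compile("^(?=.*[a-zA-Z0-9])[a-zA-Z0-9 ]+$");

    static boolean original(String s) { return ORIGINAL.matcher(s).matches(); }
    static boolean simplified(String s) { return SIMPLIFIED.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"", "   ", " a ", "hello there", "a-b"}) {
            System.out.println("\"" + s + "\": " + original(s) + " / " + simplified(s));
        }
    }
}
```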

aliteralmind · Answer 5 · 2014-04-15T19:00:48.473

Before looking at the other answers, I came up with a solution that uses two regexes:

boolean ok = (myString.matches("^[A-Za-z0-9 ]+$")  &&  !myString.matches("^ *$"));

This matches one-or-more letters/digits and zero-or-more spaces, but not only spaces (or nothing).

It could be made more efficient by pre-creating a single Matcher object for each regex and reusing it:

import  java.util.regex.Matcher;
import  java.util.regex.Pattern;

public class OnePlusLetterDigitZeroPlusSpace  {
   //"": Unused search string, to reuse the matcher object.
   //Matcher objects are not thread-safe; sharing them is fine in this single-threaded demo.
   private static final Matcher mtchr1PlusLetterDigitSpc = Pattern.compile("^[a-zA-Z0-9 ]+$").matcher("");
   private static final Matcher mtchr0PlusSpc = Pattern.compile("^ *$").matcher("");
   public static final void main(String[] ignored)  {
      test("");
      test(" ");
      test("a");
      test("hello ");
      test(" hello ");
      test("hello there");
   }
   private static final void test(String to_search)  {
      System.out.print("\"" + to_search + "\": ");
      //Only letters/digits/spaces, AND not made up entirely of spaces (or empty).
      if(mtchr1PlusLetterDigitSpc.reset(to_search).matches()  &&  !mtchr0PlusSpc.reset(to_search).matches())  {
         System.out.println("good");
      }  else  {
         System.out.println("BAD");
      }
   }
}

Output:

[C:\java_code\]java OnePlusLetterDigitZeroPlusSpace
"": BAD
" ": BAD
"a": good
"hello ": good
" hello ": good
"hello there": good

Interesting regex question of the day.

Justin · Answer 6 · 2014-04-15T19:33:32.510

You are asking that the string s satisfy this condition (note: let c∈s mean c∈{x | x is a character in s}; also, [...] denotes a regex character class):

(∀c∈s (c∈[0-9A-Za-z ])) ∧ (∃c∈s ∋ c∈[0-9A-Za-z])
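The condition can also be checked directly, without a regex. A minimal sketch with hypothetical names, restricted to ASCII letters and digits to match the character classes: `allMatch` plays the role of the ∀ part and `anyMatch` the ∃ part. The empty string passes the ∀ part vacuously but fails the ∃ part.

```java
// Direct translation of the condition:
// every char is in [0-9A-Za-z ] (forall) AND some char is in [0-9A-Za-z] (exists).
class ConditionPredicate {
    static boolean isAsciiAlnum(int c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static boolean satisfies(String s) {
        boolean all = s.chars().allMatch(c -> isAsciiAlnum(c) || c == ' ');
        boolean any = s.chars().anyMatch(ConditionPredicate::isAsciiAlnum);
        return all && any;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "  ", "a b", "a_b"}) {
            System.out.println("\"" + s + "\": " + satisfies(s));
        }
    }
}
```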

Consider the negation:

¬((∀c∈s c∈[0-9A-Za-z ]) ∧ (∃c∈s ∋ c∈[0-9A-Za-z]))
⇔
(∃c∈s ∋ c∉[0-9A-Za-z ]) ∨ (∀c∈s c∉[0-9A-Za-z])
⇔
(∃c∈s ∋ c∈[^0-9A-Za-z ]) ∨ (∀c∈s c∈[^0-9A-Za-z])

So now we want to construct a regex that matches a string which either contains a character that is neither alphanumeric nor a space, or consists only of non-alphanumeric characters.

The first is easy: [^0-9A-Za-z ].
The second is like unto it: ^[^0-9A-Za-z]*$

Combine them together to get: [^0-9A-Za-z ]|^[^0-9A-Za-z]*$
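Since this combined pattern describes the strings to reject, it is meant to be used with `find()` rather than `matches()`: a string is acceptable exactly when nothing is found. A sketch with illustrative names (without the `MULTILINE` flag, `^` only matches at the start of the input):

```java
import java.util.regex.Pattern;

// The combined pattern describes the BAD strings, so it is used with find():
// either some char is outside [0-9A-Za-z ], or the whole string has no letter/digit.
class NegatedCondition {
    static final Pattern BAD = Pattern.compile("[^0-9A-Za-z ]|^[^0-9A-Za-z]*$");

    static boolean accepted(String s) { return !BAD.matcher(s).find(); }

    public static void main(String[] args) {
        for (String s : new String[] {"", "   ", "a b", "a_b", " 1 "}) {
            System.out.println("\"" + s + "\": " + accepted(s));
        }
    }
}
```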

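Now we need to negate this regex. We could just wrap it in a negative lookahead, but a lookahead only tests at the position where it starts, so the first alternative needs a leading .* and something has to consume the string afterwards: ^(?!.*[^0-9A-Za-z ]|[^0-9A-Za-z]*$).*$. Or we could manually negate the regex: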

[^0-9A-Za-z ] becomes ^[0-9A-Za-z ]*$
^[^0-9A-Za-z]*$ becomes [0-9A-Za-z]. (note: we could easily have arrived here from the beginning)

But now we need to combine them with AND, not OR:

Since [0-9A-Za-z] is a subset of [0-9A-Za-z ], we can simply do this:

^[0-9A-Za-z ]*[0-9A-Za-z][0-9A-Za-z ]*$

Note that we can simplify it down to:

^[0-9A-Za-z ]*[0-9A-Za-z][ ]*$

This just requires that the character that matches [0-9A-Za-z] is the last character that could do so. We could also do

^[ ]*[0-9A-Za-z][0-9A-Za-z ]*$

Which would require that the character that matches [0-9A-Za-z] is the first character that could do so.

So now we're done. We can use either one of those or ^(?!.*[^0-9A-Za-z ]|[^0-9A-Za-z]*$).*$.

Note: String#matches acts as if the regex were ^ + regex + $ (where + is concatenation). This can throw a few things off.
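As a sanity check, this sketch (illustrative names) compares each derived pattern with a direct implementation of the condition on every string of length 0 to 5 over the alphabet `a`, `7`, space and `_`. The last pattern in the list is one working way to write the negative-lookahead version for `matches()`: the `.*` inside the look-ahead lets the first alternative look past the first character, and the trailing `.*$` consumes the string. Agreement on this finite set is evidence, not a proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Compares the derived patterns with a direct check of the condition on every
// string of length 0..5 over a small alphabet.
class DerivedPatternsAgree {
    static final String[] PATTERNS = {
        "^[0-9A-Za-z ]*[0-9A-Za-z][0-9A-Za-z ]*$",
        "^[0-9A-Za-z ]*[0-9A-Za-z][ ]*$",
        "^[ ]*[0-9A-Za-z][0-9A-Za-z ]*$",
        "^(?!.*[^0-9A-Za-z ]|[^0-9A-Za-z]*$).*$", // negative-lookahead form
    };

    static boolean isAsciiAlnum(int c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // The condition itself: every char in [0-9A-Za-z ], at least one in [0-9A-Za-z].
    static boolean condition(String s) {
        return s.chars().allMatch(c -> isAsciiAlnum(c) || c == ' ')
                && s.chars().anyMatch(DerivedPatternsAgree::isAsciiAlnum);
    }

    // All strings over `alphabet` with length 0..maxLen, shortest first.
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> out = new ArrayList<>();
        out.add("");
        int start = 0;
        for (int len = 1; len <= maxLen; len++) {
            int end = out.size();
            for (int i = start; i < end; i++) {
                for (char c : alphabet.toCharArray()) {
                    out.add(out.get(i) + c);
                }
            }
            start = end;
        }
        return out;
    }

    // Number of (pattern, string) pairs where the regex and the condition disagree.
    static int mismatches() {
        List<String> inputs = allStrings("a7 _", 5);
        int count = 0;
        for (String regex : PATTERNS) {
            Pattern p = Pattern.compile(regex);
            for (String s : inputs) {
                if (p.matcher(s).matches() != condition(s)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches());
    }
}
```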

Pedro Lobito · Answer 7 · score 0 · 2014-04-15T19:50:02.563

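// Requires: import java.util.regex.PatternSyntaxException;
try {
    // (?i): case-insensitive; (?!.*_): no underscore (\w would otherwise allow '_');
    // (?=.*[\w]+): at least one letter or digit; (?=.*\s*) always succeeds.
    if (subjectString.matches("(?i)^(?=.*\\s*)(?!.*_)(?=.*[\\w]+)[\\w ]+$")) {
        // String matched entirely
    } else {
        // Match attempt failed
    } 
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}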

Or simply:

^(.*\p{Blank}?\p{Alnum}+.*\p{Blank}?)$
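The two patterns in this answer are not equivalent: the `.*` parts of the second one accept any character, so it only requires one letter or digit somewhere. A quick comparison sketch (illustrative names; in Java's default mode `\p{Alnum}` is ASCII letters and digits and `\p{Blank}` is a space or a tab):

```java
import java.util.regex.Pattern;

// Compares the two patterns from this answer on a few inputs.
class PedroPatterns {
    // Long form: (?!.*_) rejects '_', which \w would otherwise allow.
    static final Pattern FIRST =
            Pattern.compile("(?i)^(?=.*\\s*)(?!.*_)(?=.*[\\w]+)[\\w ]+$");
    // Short form: the .* parts accept any character at all.
    static final Pattern SIMPLE =
            Pattern.compile("^(.*\\p{Blank}?\\p{Alnum}+.*\\p{Blank}?)$");

    static boolean first(String s) { return FIRST.matcher(s).matches(); }
    static boolean simple(String s) { return SIMPLE.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"hello there", "   ", "a_b", "hi!"}) {
            System.out.println("\"" + s + "\": first=" + first(s) + " simple=" + simple(s));
        }
    }
}
```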

edited Apr 15 '14 at 19:50

answered Apr 15 '14 at 19:26

Pedro Lobito

I've never seen a try/catch for a regular expression syntax failure. Should that be necessary..? The only case I can think of is a memory-leak/timeout caused by catastrophic backtracking... but I don't think that would throw a `PatternSyntaxException`. – Sam Apr 15 '14 at 19:35
Shouldn't you be testing the format before using in production code? – Sam Apr 15 '14 at 19:46
@Sam I normally do, but this way I can catch it faster if I don't. – Pedro Lobito Apr 15 '14 at 19:52
Fair enough, continue on. – Sam Apr 15 '14 at 20:00
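For reference, `PatternSyntaxException` is an unchecked exception thrown when a malformed pattern is compiled (`String.matches` compiles its argument on each call); slow matching caused by catastrophic backtracking does not throw it. A minimal sketch with illustrative names:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Shows when PatternSyntaxException is thrown: at compile time, for a malformed pattern.
class SyntaxErrorDemo {
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException ex) {
            System.out.println("Syntax error: " + ex.getDescription()
                    + " at index " + ex.getIndex());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[0-9A-Za-z ]+"));   // well-formed
        System.out.println(compiles("[0-9A-Za-z ]+("));  // unclosed group
    }
}
```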
