
I want to have a regex for NAME;NAME;NAME and also for NAME;NAME;NAME;NAME where the fourth occurrence of NAME is optional.

I have one regex, (.+);(.+);(.+), which matches the first pattern but not the second. I tried playing with ?, but (.+);(.+);(.+)(;(.+))? is not working out. Basically, I want the fourth (.+) to occur zero or one time.

Sonali Gupta
  • Dot `.` can match any character, including `;`, which can complicate things. Instead of `.+` you could use the negation of `;`, such as `[^;]+`. Anyway, the proper solution depends on what your real goal is (which we don't know). A simpler option might be splitting on `;` or using a CSV parser with `;` as the delimiter. – Pshemo May 02 '21 at 10:07
  • You can also use `(([^;]+);){2,3}([^;]+)`. `(([^;]+);){2,3}` means that `(([^;]+);)` will appear at least 2 times and at most 3 times, giving 3 or 4 names in total. – avasuilia May 02 '21 at 10:15
  • I came up with ([^;]+;)?(.+);(.+);(.+) --> the issue with it is for NAME;NAME;NAME the groups will be 2,3 and 4. Any cleaner way of doing this might help. Another one as per comments is ([^;]+);([^;]+);([^;]+);?([^;]+)? – Sonali Gupta May 02 '21 at 10:29
  • Do you need the regex just to get a `boolean` result telling whether a `String` matches, or to extract `groups` from the `String`? – Andrei Yusupau May 02 '21 at 10:52
  • Group extraction from the String. Also, I would prefer sticking with the dot. – Sonali Gupta May 02 '21 at 11:03

4 Answers


Using .+ matches any character one or more times, including ;, so a group can run across the separators.
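To make that concrete, here is a minimal sketch that runs the pattern from the question against a four-part input. The class name GreedyDotDemo, the helper groups and the sample input A;B;C;D are made up for illustration. The input does match, but the first group absorbs a separator and the optional group stays empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDotDemo {
    // The pattern from the question: every field is a greedy (.+).
    static final Pattern DOTS = Pattern.compile("(.+);(.+);(.+)(;(.+))?");

    // Returns groups 1, 2, 3 and 5 (the optional fourth field),
    // or null when the whole input does not match.
    static List<String> groups(String input) {
        Matcher m = DOTS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        List<String> out = new ArrayList<>();
        out.add(m.group(1));
        out.add(m.group(2));
        out.add(m.group(3));
        out.add(m.group(5)); // null when the optional part did not take part in the match
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("A;B;C;D")); // prints [A;B, C, D, null]
    }
}
```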

If you want to match 3 or 4 groups separated by a ; and not including it, you could use a negated character class [^;]+ with an optional group at the end of the pattern.

^([^;]+);([^;]+);([^;]+)(?:;([^;]+))?$
  • ^ Start of string
  • ([^;]+);([^;]+);([^;]+) Capture group 1, 2 and 3 matching any char except ;
  • (?: Non capture group
    • ;([^;]+) Match ; and capture any char except ; in group 4
  • )? Close group and make it optional
  • $ End of string
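Since the goal is group extraction, the pattern above could be used from Java roughly like this. This is only a sketch: the class name OptionalFourthGroup and the helper extract are hypothetical, and returning an empty list for a non-matching input is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalFourthGroup {
    static final Pattern FIELDS =
            Pattern.compile("^([^;]+);([^;]+);([^;]+)(?:;([^;]+))?$");

    // Returns the 3 or 4 captured fields, or an empty list if the input does not match.
    static List<String> extract(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELDS.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                // Group 4 is null when the optional part is absent.
                if (m.group(i) != null) {
                    fields.add(m.group(i));
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(extract("A;B;C"));   // prints [A, B, C]
        System.out.println(extract("A;B;C;D")); // prints [A, B, C, D]
        System.out.println(extract("A;B"));     // prints []
    }
}
```

Because the fields are always groups 1 to 4, the numbering does not shift when the fourth field is missing.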


If the parts in between cannot contain ;, you could also use split and count the number of parts.

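String[] arr = { "NAME;NAME;", "NAME;NAME;NAME", "NAME;NAME;NAME;NAME", "NAME;NAME;NAME;NAME;NAME" };

for (String s : arr) {
    // split(";") drops trailing empty strings, so "NAME;NAME;" yields only 2 parts
    String[] parts = s.split(";");
    if (parts.length == 3 || parts.length == 4) {
        System.out.println(s); // keep only inputs with 3 or 4 parts
    }
}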

Output

NAME;NAME;NAME
NAME;NAME;NAME;NAME
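
One caveat with the split approach: String.split with a single argument discards trailing empty strings, so an input such as NAME;NAME;NAME; would produce three parts and pass the length check. If that matters, a stricter variant might look like the sketch below. The class name SplitCheck and the method hasThreeOrFourFields are hypothetical, and rejecting empty fields is an assumption about what you want.

```java
import java.util.Arrays;

class SplitCheck {
    // True only when the input has 3 or 4 non-empty fields separated by ';'.
    static boolean hasThreeOrFourFields(String s) {
        // A negative limit keeps trailing empty strings in the result.
        String[] parts = s.split(";", -1);
        if (parts.length != 3 && parts.length != 4) {
            return false;
        }
        return Arrays.stream(parts).noneMatch(String::isEmpty);
    }

    public static void main(String[] args) {
        System.out.println(hasThreeOrFourFields("NAME;NAME;NAME"));  // prints true
        System.out.println(hasThreeOrFourFields("NAME;NAME;NAME;")); // prints false
    }
}
```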
The fourth bird

If every NAME is the same text, you can use the regex (.+);\1;\1(?:;\1)? since \1 only matches what the first group captured.

Demo:

import java.util.stream.Stream;

public class Main {
    public static void main(String[] args) {
        // Test
        Stream.of(
                    "NAME;NAME;NAME", 
                    "NAME;NAME;NAME;NAME",
                    "NAME;NAME;NAME;",
                    "NAME;NAME;NAMES",
                    "NAME;NAME;NAME;NAME;NAME"
        ).forEach(s -> System.out.println(s + " => " + s.matches("(.+);\\1;\\1(?:;\\1)?")));
    }
}

Output:

NAME;NAME;NAME => true
NAME;NAME;NAME;NAME => true
NAME;NAME;NAME; => false
NAME;NAME;NAMES => false
NAME;NAME;NAME;NAME;NAME => false

Explanation of the regex:

  • \1 matches the same text as most recently matched by the 1st capturing group.
  • ?: makes (?:;\1) a non-capturing group.
  • The trailing ? makes that non-capturing group optional.
Arvind Kumar Avinash

With the samples you have shown, please try the following.

1st solution:

^(?:([^;]*);){2,3}\1$

Explanation of the above:

^(?:        ## Start of the value; open a non-capturing group.
  ([^;]*);  ## Capturing group 1: everything up to the next ;, followed by the ; itself.
){2,3}      ## Repeat the non-capturing group 2 to 3 times.
\1$         ## Match the text of capturing group 1 again, then the end of the value.


2nd solution:

^([^;]*)(;)(?:\1\2){1,2}\1$

Explanation of the above:

^([^;]*)  ## From the start of the value, capturing group 1: everything up to the first ;.
(;)       ## Capturing group 2: the ; itself.
(?:       ## Open a non-capturing group.
\1\2      ## Match the text of capturing groups 1 and 2 again.
){1,2}    ## Close the non-capturing group and repeat it 1 to 2 times.
\1$       ## Match the text of capturing group 1 again, then the end of the value.
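
For a quick check from Java, both patterns can be tried against the same samples. This is a sketch only: the class name BackrefPatterns and the helpers first and second are hypothetical, and the input A;NAME;NAME is added here purely for illustration.

```java
import java.util.regex.Pattern;

class BackrefPatterns {
    static final Pattern FIRST = Pattern.compile("^(?:([^;]*);){2,3}\\1$");
    static final Pattern SECOND = Pattern.compile("^([^;]*)(;)(?:\\1\\2){1,2}\\1$");

    static boolean first(String s) {
        return FIRST.matcher(s).matches();
    }

    static boolean second(String s) {
        return SECOND.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "NAME;NAME;NAME",
            "NAME;NAME;NAME;NAME",
            "NAME;NAME",
            "NAME;NAME;NAME;NAME;NAME",
            "A;NAME;NAME"
        };
        for (String s : samples) {
            // Prints the result of the 1st and the 2nd solution side by side.
            System.out.println(s + " => " + first(s) + " / " + second(s));
        }
    }
}
```

In the 1st solution, group 1 is overwritten on every repetition, so \1 only compares the last field with the one before it. That is why A;NAME;NAME passes the 1st pattern but not the 2nd, which pins every field to the first one.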
RavinderSingh13

You could use the lazy quantifier +?. Example:

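    import java.util.regex.Pattern;

    public class Main {
        // Lazy +? stops after one repetition, so each find() consumes one word and its optional ';'.
        private static final Pattern pattern = Pattern.compile("((\\w+);?)+?");

        public void extractGroups(String input) {
            var matcher = pattern.matcher(input);
            while (matcher.find()) {
                System.out.println(matcher.group(2)); // group 2 is the word without the ';'
            }
        }
    }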

Input "FIRST;SECOND;THIRD;FOURTH" gives

FIRST
SECOND
THIRD
FOURTH

Input "FIRST;SECOND;THIRD" gives

FIRST
SECOND
THIRD

A lazy quantifier matches the shortest possible String, and if you call find() repeatedly in a while loop, you get all the matches. You are also better off using \\w for matching words, because . also matches the ; symbol.
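
Note that the find() loop returns however many words the input contains, so on its own it does not enforce the three-or-four rule from the question. One way to add that check is to collect the matches first and count them, as in this sketch. The class name CountedFind and both helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountedFind {
    static final Pattern WORD = Pattern.compile("((\\w+);?)+?");

    // Collects group 2 (the word without its ';') from every successive match.
    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    // True when the input contains exactly 3 or 4 words.
    static boolean hasThreeOrFour(String input) {
        int n = fields(input).size();
        return n == 3 || n == 4;
    }

    public static void main(String[] args) {
        System.out.println(fields("FIRST;SECOND;THIRD"));  // prints [FIRST, SECOND, THIRD]
        System.out.println(hasThreeOrFour("A;B;C;D;E"));   // prints false
    }
}
```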

Andrei Yusupau