2

I am currently attempting to interpret some code I wrote for something. The information I would like to split looks something like this:

{hey=yes}TEST

What I am trying to accomplish is splitting the above string between '}' and 'T' (where T could be any letter). The result I am after is (in pseudocode):

["{hey=yes}", "TEST"]

How would one go about doing so? I know basic regex, but have never gotten into using it to split strings in between letters before.

Update:

In order to split the string I am using the String.split method. Do tell if there is a better way to go about doing this.

alexbt
D. Ataro

3 Answers

2

You can use String's split method with a zero-width lookbehind, which splits right after each '}' without consuming it:

String str = "{hey=foo}TEST";
String[] split = str.split("(?<=})");       
System.out.println(split[0] + ", " + split[1]);

It splits the string and prints this:

{hey=foo}, TEST
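Note that the lookbehind alone splits after every '}', so an input with several brace groups yields more than two parts. If the split should happen only where a '}' is directly followed by a letter, as the question describes, a lookahead can be added. This is a minimal sketch; the class and method names are made up for illustration, and `[A-Za-z]` assumes "any letter" means ASCII letters.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Hypothetical helper: splits at the zero-width position that is
    // preceded by '}' and followed by an ASCII letter (assumption).
    static String[] splitAfterBrace(String input) {
        return input.split("(?<=\\})(?=[A-Za-z])");
    }

    public static void main(String[] args) {
        // prints [{hey=yes}, TEST]
        System.out.println(Arrays.toString(splitAfterBrace("{hey=yes}TEST")));
    }
}
```

Because both lookarounds are zero-width, no characters are removed from either part.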

Community
alexbt
2

Using a regexp for such a small task can be really slow if it is repeated thousands of times (e.g. when analysing Alfresco metadata for a lot of documents).

Look at this snippet:

    String s = "{key=value}SOMETEXT";
    String[] e = null;
    long now = 0L;

    // Time 3,000,000 regexp-based splits (lookbehind on '}').
    now = System.currentTimeMillis();
    for (int i = 0; i < 3000000; i++) {
        e = s.split("(?<=})");
    }
    System.out.println("Regexp: " + (System.currentTimeMillis() - now));

    // Time the same number of indexOf/substring splits.
    now = System.currentTimeMillis();
    for (int i = 0; i < 3000000; i++) {
        int idx = s.indexOf('}') + 1;
        e = new String[] { s.substring(0, idx), s.substring(idx) };
    }
    System.out.println("IndexOf:" + (System.currentTimeMillis() - now));

The result (in milliseconds) is:

Regexp: 2544
IndexOf:113

This means that the regexp is roughly 22 times slower than a (simpler) substring in this test. Keep it in mind: it can make the difference between efficient code and elegant (!) code.
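One caveat with the indexOf variant: if the input contains no '}', `indexOf` returns -1, so `idx` becomes 0 and the loop body silently produces an empty first part. A small sketch of the same technique with that case checked follows; the class and method names are hypothetical, and throwing an exception is only one possible way to handle bad input.

```java
class IndexOfSplit {
    // Hypothetical helper: splits right after the first '}' using
    // indexOf/substring, rejecting input that has no '}' at all.
    static String[] splitAtFirstBrace(String input) {
        int idx = input.indexOf('}');
        if (idx < 0) {
            throw new IllegalArgumentException("No '}' in input: " + input);
        }
        return new String[] { input.substring(0, idx + 1), input.substring(idx + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstBrace("{key=value}SOMETEXT");
        // prints {key=value}, SOMETEXT
        System.out.println(parts[0] + ", " + parts[1]);
    }
}
```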

Sampisa
  • 1
    If the use case is processing several million records, there _might_ be a point in optimizing the code. Otherwise, focus on reliable and readable code. Remember Michael A. Jackson's quote: “Rules of Optimization: Rule 1: Don't do it. Rule 2 (for experts only): Don't do it yet.” – Per Huss Jun 12 '16 at 12:09
-2

If you're looking for a regex approach and also want some validation that the input follows the expected syntax, you probably want something like this:

public List<String> splitWithRegexp(String string)
{
    Matcher matcher = Pattern.compile("(\\{.*\\})(.*)").matcher(string);
    if (matcher.find()) 
        return Arrays.asList(matcher.group(1), matcher.group(2));
    else
        throw new IllegalArgumentException("Input didn't match!");
}

The parentheses in the regexp define capture groups, which you can access with matcher.group(n) calls. Group 0 is the whole match.

Per Huss
  • And if you worry about performance, declare the `Pattern.compile("(\\{.*\\})(.*)")` as a `static final` field and use that field in your method. – Per Huss Jun 12 '16 at 12:05
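The suggestion above could look roughly like this. It is only a sketch: the class name and field name are invented for illustration, and it swaps `find()` for `matches()` so that the whole input has to fit the pattern, which is my own choice rather than part of the original answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecompiledSplitter {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern BRACED_PREFIX = Pattern.compile("(\\{.*\\})(.*)");

    static List<String> splitWithRegexp(String input) {
        Matcher matcher = BRACED_PREFIX.matcher(input);
        // matches() requires the entire input to fit the pattern (assumption:
        // nothing is allowed before the opening brace).
        if (matcher.matches()) {
            return Arrays.asList(matcher.group(1), matcher.group(2));
        }
        throw new IllegalArgumentException("Input didn't match!");
    }

    public static void main(String[] args) {
        // prints [{hey=yes}, TEST]
        System.out.println(splitWithRegexp("{hey=yes}TEST"));
    }
}
```

Since `.*` is greedy, group 1 extends to the last '}' in the input, so "{a=1}{b=2}X" yields "{a=1}{b=2}" and "X".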