I'd like to match parts in a string that are enclosed by braces ({}), and refer back to the content within the braces. The content within the braces can also contain "nested" braces, which makes it important to match the correct closing braces. So, if another brace is opened, the following closing brace should be ignored.

As a starting point I used the following code, which should transform the String `Stuff @upper{foo {bar} baz} {end}` to `Stuff FOO {BAR} BAZ {end}`:

package com.stackoverflow;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegExBraces {

    public static void main(String[] args) {
        String string  = "Stuff @upper{foo {bar} baz} {end}";

        Pattern pattern = Pattern.compile("@upper\\{(.*?)\\}");

        Matcher matcher = pattern.matcher(string);
        StringBuffer result = new StringBuffer();

        while (matcher.find()) {
            String key = matcher.group(1);
            if (key != null) {
                matcher.appendReplacement(result, key.toUpperCase());
            }
        }
        matcher.appendTail(result);

        System.out.println(result.toString());

    } // END: main()

} // END: class

For now I'd like to ignore escaped braces (\{, \}), so there is always the same number of opening and closing braces, in the correct order.

Is there a concise regular expression that can solve this problem?

Edward
  • 2
    Is there a reason you are using regex? This would be easier to solve in just iterating through the String with a for loop. – Compass Oct 06 '14 at 13:32
  • 3
    Your expression is equivalent to markup; you should not use regular expressions to interpret it. Use your own parser for this. – Mena Oct 06 '14 at 13:41
  • 2
    Quick answer: No. Matching nested braces makes your "language" a recursive syntax, which isn't a regular language, which means it can't be matched by a regular expression. Advanced regexes may be able to do it, but it won't be concise or easy. – Ian McLaird Oct 06 '14 at 13:44
  • @Compass: I'm using regular expressions, because there are different parts of the String I'd like to match. e.g. `@upper{...}`, `@lower{...}`, `@spaces{...}`, `@time{...}`, etc. Besides that, a regular expression directly returns the Strings matching my pattern. I don't have to evaluate if something matches the pattern or not. – Edward Oct 06 '14 at 13:44
  • 1
    You may have a look [HERE](http://stackoverflow.com/questions/16874176/parenthesis-brackets-matching-using-stack-algorithm) for the parsing, adding a step to capture the content should not be too hard – Tensibai Oct 06 '14 at 13:53
  • Counting with Java regex is possible (which is required for arbitrary depth nesting), but it's a pain, nothing like PCRE's simple `(?R)` and friends or .NET's balancing groups. So you'd be better off using something more fitting than regex here. (You probably don't want to use regex anyway for other reasons.) – Qtax Oct 06 '14 at 13:57
  • 1
    If you know max nested depth you can [build a regex like this one for max 3 levels](http://regex101.com/r/yX7sX3/1). If nesting is [deeper, add levels](http://regex101.com/r/cO4jV8/1). Note, that you have to escape the `{}` outside character class for Java ([see example / click on "Java"](http://fiddle.re/urwq1)). – Jonny 5 Oct 06 '14 at 13:59
  • I now used Clojure and instaparse to define a context-free grammar (see also: http://stackoverflow.com/questions/18187249/how-do-we-define-a-grammar-for-clojure-code-using-instaparse). If anyone writes an answer that summarizes all comments (and maybe also writes how to distinguish between regular and context-free languages), I'll accept it. – Edward Oct 13 '14 at 08:45
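
The comments above point at two workable routes in plain Java: scan the string and count brace depth (Compass), or use a regex that only supports a fixed maximum nesting depth (Jonny 5). The following is a minimal sketch of both, not a definitive implementation. The class name `BraceMatching` and the method names are made up for illustration. It assumes braces are balanced and unescaped, as stated in the question, and the regex variant assumes at most one level of nested braces inside `@upper{...}`; it is written in the spirit of the linked patterns, not copied from them.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; all names here are illustrative.
class BraceMatching {

    /** Returns the index of the '}' closing the '{' at openIndex, or -1 if there is none. */
    static int findClosingBrace(String text, int openIndex) {
        int depth = 0;
        for (int i = openIndex; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '{') {
                depth++;                 // one level deeper
            } else if (c == '}') {
                depth--;                 // one level back up
                if (depth == 0) {
                    return i;            // this brace closes the one at openIndex
                }
            }
        }
        return -1;                       // unbalanced input
    }

    /** Depth-counting approach: handles arbitrarily deep nesting. */
    static String upperByScanning(String text) {
        final String marker = "@upper{";
        StringBuilder result = new StringBuilder();
        int pos = 0;
        int start;
        while ((start = text.indexOf(marker, pos)) >= 0) {
            int open = start + marker.length() - 1;      // index of the '{'
            int close = findClosingBrace(text, open);
            if (close < 0) {
                break;                                   // unbalanced: keep the rest unchanged
            }
            result.append(text, pos, start);             // text before the marker
            result.append(text.substring(open + 1, close).toUpperCase(Locale.ROOT));
            pos = close + 1;
        }
        result.append(text.substring(pos));              // remaining tail
        return result.toString();
    }

    // Assumption: at most ONE level of nested braces inside @upper{...}.
    // Possessive quantifiers (++, *+) keep the matcher from backtracking needlessly.
    private static final Pattern UPPER_ONE_LEVEL =
            Pattern.compile("@upper\\{((?:[^{}]++|\\{[^{}]*+\\})*)\\}");

    /** Bounded-depth regex approach: only correct up to the depth built into the pattern. */
    static String upperByBoundedRegex(String text) {
        Matcher matcher = UPPER_ONE_LEVEL.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String content = matcher.group(1).toUpperCase(Locale.ROOT);
            // quoteReplacement keeps '$' and '\' in the content from being read as group references
            matcher.appendReplacement(result, Matcher.quoteReplacement(content));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String input = "Stuff @upper{foo {bar} baz} {end}";
        System.out.println(upperByScanning(input));      // Stuff FOO {BAR} BAZ {end}
        System.out.println(upperByBoundedRegex(input));  // Stuff FOO {BAR} BAZ {end}
    }
}
```

The scanning version works for any nesting depth because the counter plays the role of the stack that a regular language cannot express. The regex version has to be extended by hand for every additional level. Neither sketch processes an `@upper{...}` nested inside another one recursively; for that, a small recursive-descent parser or a grammar tool (as Edward ended up using) is probably the better fit.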

0 Answers