3

There are 3 rules in the string:

  1. It contains either word or group (enclosed by parentheses), and group can be nested;

  2. If there is a space between word or group, those words or groups should append with "+".

For example:

"a b" needs to be "+a +b"
"a (b c)" needs to be "+a +(+b +c)"
  1. If there is a | between word or group, those words or groups should be surround with parentheses. For example:
"a|b" needs to be "(a b)"
"a|b|c" needs to be "(a b c)"

Consider all the rules, here is another example:

"aa|bb|(cc|(ff gg)) hh" needs to be "+(aa bb (cc (+ff +gg))) +hh"

I have tried to use regex, stack and recursive descent parser logic, but still cannot fully solve the problem.

Could anyone please share the logic or pseudo code on this problem?

New edited: One more important rule: vertical bar has higher precedence.

For example:

aa|bb hh cc|dd (a|b) needs to be +(aa bb) +hh +(cc dd) +((a b))

(aa dd)|bb|cc (ee ff)|(gg hh) needs to be +((+aa +dd) bb cc) +((+ee +ff) (+gg +hh))

New edited: To solve the precedence problem, I find a way to add the parentheses before calling Sunil Dabburi's methods.

For example:

aa|bb hh cc|dd (a|b) will be (aa|bb) hh (cc|dd) (a|b)

(aa dd)|bb|cc (ee ff)|(gg hh) will be ((aa dd)|bb|cc) ((ee ff)|(gg hh))

Since the performance is not a big concern to my application, this way at least make it work for me. I guess the JavaCC tool may solve this problem beautifully. Hope someone else can continue to discuss and contribute this problem.

Ray Chen
  • 99
  • 6
  • What was your problem with the stack? – Steyrix Dec 18 '19 at 16:29
  • I used the stack to check if the parentheses are balanced. But I do not know how to push all the segments properly in to the stack. So I used a tree-like structure to store all the segments of the string. But then I do not know how to construct the output. – Ray Chen Dec 19 '19 at 03:26

2 Answers2

2

I tried to solve the problem. Not sure if it works in all cases. Verified with the inputs given in the question and it worked fine.

  1. We need to format the pipes first. That will help add necessary parentheses and spacing.
  2. The spaces generated as part of pipe processing can interfere with actual spaces that are available in our expression. So used $ symbol to mask them.
  3. To process spaces, its tricky as parantheses need to be processed individually. So the approach I am following is to find a set of parantheses starting from outside and going inside.

So typically we have <left_part><parantheses_code><right_part>. Now left_part can be empty, similary right_part can be empty. we need to handle such cases. Also, if the right_part starts with a space, we need to add '+' to left_part as per space requirement.

NOTE: I am not sure what's expected of (a|b). If the result should be ((a b)) or (a b). I am going with ((a b)) purely by the definition of it.

Now here is the working code:

public class Test {

    public static void main(String[] args) {
        String input = "aa|bb hh cc|dd (a|b)";
        String result = formatSpaces(formatPipes(input)).replaceAll("\\$", " ");
        System.out.println(result);
    }

    private static String formatPipes(String input) {

        while (true) {
            char[] chars = input.toCharArray();
            int pIndex = input.indexOf("|");

            if (pIndex == -1) {
                return input;
            }

            input = input.substring(0, pIndex) + '$' + input.substring(pIndex + 1);

            int first = pIndex - 1;
            int closeParenthesesCount = 0;
            while (first >= 0) {

                if (chars[first] == ')') {
                    closeParenthesesCount++;
                }

                if (chars[first] == '(') {
                    if (closeParenthesesCount > 0) {
                        closeParenthesesCount--;
                    }
                }

                if (chars[first] == ' ') {
                    if (closeParenthesesCount == 0) {
                        break;
                    }
                }
                first--;
            }

            String result;

            if (first > 0) {
                result = input.substring(0, first + 1) + "(";
            } else {
                result = "(";
            }

            int last = pIndex + 1;
            int openParenthesesCount = 0;
            while (last <= input.length() - 1) {

                if (chars[last] == '(') {
                    openParenthesesCount++;
                }

                if (chars[last] == ')') {
                    if (openParenthesesCount > 0) {
                        openParenthesesCount--;
                    }
                }

                if (chars[last] == ' ') {
                    if (openParenthesesCount == 0) {
                        break;
                    }
                }
                last++;
            }

            if (last >= input.length() - 1) {
                result = result + input.substring(first + 1) + ")";
            } else {
                result = result + input.substring(first + 1, last) + ")" + input.substring(last);
            }

            input = result;
        }
    }

    private static String formatSpaces(String input) {

        if (input.isEmpty()) {
            return "";
        }

        int startIndex = input.indexOf("(");
        if (startIndex == -1) {
            if (input.contains(" ")) {
                String result = input.replaceAll(" ", " +");
                if (!result.trim().startsWith("+")) {
                    result = '+' + result;
                }
                return result;
            } else {
                return input;
            }
        }

        int endIndex = startIndex + matchingCloseParenthesesIndex(input.substring(startIndex));

        if (endIndex == -1) {
            System.out.println("Invalid input!!!");
            return "";
        }

        String first = "";
        String last = "";
        if (startIndex > 0) {
            first = input.substring(0, startIndex);
        }
        if (endIndex < input.length() - 1) {
            last = input.substring(endIndex + 1);
        }

        String result = formatSpaces(first);
        String parenthesesStr = input.substring(startIndex + 1, endIndex);
        if (last.startsWith(" ") && first.isEmpty()) {
            result = result + "+";
        }

        result = result + "("
                + formatSpaces(parenthesesStr)
                + ")"
                + formatSpaces(last);

        return result;
    }

    private static int matchingCloseParenthesesIndex(String input) {
        int counter = 1;
        char[] chars = input.toCharArray();
        for (int i = 1; i < chars.length; i++) {
            char ch = chars[i];
            if (ch == '(') {
                counter++;
            } else if (ch == ')') {
                counter--;
            }

            if (counter == 0) {
                return i;
            }
        }
        return -1;
    }

}
Sunil Dabburi
  • 1,442
  • 12
  • 18
  • Thank you so much! It is very closed to solve the problem! The expectation of (a|b) can be ((a b)) or (a b), it does not matter. But I forget to mention that the vertical line has higher precedence than the space. For example: `aa|bb hh cc|dd (a|b)` needs to be `+(aa bb) +hh +(cc dd) +((a b))` – Ray Chen Dec 19 '19 at 05:52
  • 1
    Base on your solution, I tried to add the parentheses for the | groups before calling your methods. The result looks good now. :) Thanks again! – Ray Chen Dec 19 '19 at 08:36
  • Understood. I have updated my code to address the requirement. Kindly test and let me know. You don't need to add the parentheses around | anymore. The code should do it. Thanks! – Sunil Dabburi Dec 19 '19 at 20:29
  • Thanks for the update! I did some testing on it and found it will create a lot of parentheses if there many vertical lines in between. For example: `a|b|c|d|e` => `((((a b c d e))))`. It would be great if it could be `a b c d e` – Ray Chen Dec 20 '19 at 06:15
2

Here is my attempt. Based on your examples and a few that I came up with I believe it is correct under the rules. I solved this by breaking the problem up into 2 parts.

  1. Solving the case where I assume the string only contains words or is a group with only words.
  2. Solving words and groups by substituting child groups out, use the 1) part and recursively repeating 2) with the child groups.
    private String transformString(String input) {
        Stack<Pair<Integer, String>> childParams = new Stack<>();
        String parsedInput = input;
        int nextInt = Integer.MAX_VALUE;
        Pattern pattern = Pattern.compile("\\((\\w|\\|| )+\\)");
        Matcher matcher = pattern.matcher(parsedInput);
        while (matcher.find()) {
            nextInt--;
            parsedInput = matcher.replaceFirst(String.valueOf(nextInt));
            String childParam = matcher.group();
            childParams.add(Pair.of(nextInt, childParam));
            matcher = pattern.matcher(parsedInput);
        }

        parsedInput = transformBasic(parsedInput);
        while (!childParams.empty()) {
            Pair<Integer, String> childGroup = childParams.pop();
            parsedInput = parsedInput.replace(childGroup.fst.toString(), transformBasic(childGroup.snd));
        }
        return parsedInput;
    }

    // Transform basic only handles strings that contain words. This allows us to simplify the problem
    // and not have to worry about child groups or nested groups.
    private String transformBasic(String input) {
        String transformedBasic = input;
        if (input.startsWith("(")) {
            transformedBasic = input.substring(1, input.length() - 1);
        }

        // Append + in front of each word if there are multiple words.
        if (transformedBasic.contains(" ")) {
            transformedBasic = transformedBasic.replaceAll("( )|^", "$1+");
        }
        // Surround all words containing | with parenthesis.
        transformedBasic = transformedBasic.replaceAll("([\\w]+\\|[\\w|]*[\\w]+)", "($1)");
        // Replace pipes with spaces.
        transformedBasic = transformedBasic.replace("|", " ");
        if (input.startsWith("(") && !transformedBasic.startsWith("(")) {
            transformedBasic = "(" + transformedBasic + ")";
        }
        return transformedBasic;
    }

Verified with the following test cases:

@ParameterizedTest
    @CsvSource({
            "a b,+a +b",
            "a (b c),+a +(+b +c)",
            "a|b,(a b)",
            "a|b|c,(a b c)",
            "aa|bb|(cc|(ff gg)) hh,+(aa bb (cc (+ff +gg))) +hh",
            "(aa(bb(cc|ee)|ff) gg),(+aa(bb(cc ee) ff) +gg)",
            "(a b),(+a +b)",
            "(a(c|d) b),(+a(c d) +b)",
            "bb(cc|ee),bb(cc ee)",
            "((a|b) (a b)|b (c|d)|e),(+(a b) +((+a +b) b) +((c d) e))"
    })
    void testTransformString(String input, String output) {
        Assertions.assertEquals(output, transformString(input));
    }

    @ParameterizedTest
    @CsvSource({
            "a b,+a +b",
            "a b c,+a +b +c",
            "a|b,(a b)",
            "(a b),(+a +b)",
            "(a|b),(a b)",
            "a|b|c,(a b c)",
            "(aa|bb cc|dd),(+(aa bb) +(cc dd))",
            "(aa|bb|ee cc|dd),(+(aa bb ee) +(cc dd))",
            "aa|bb|cc|ff gg hh,+(aa bb cc ff) +gg +hh"
    })
    void testTransformBasic(String input, String output) {
        Assertions.assertEquals(output, transformBasic(input));
    }
s.nguyen
  • 150
  • 7
  • The way I'm using placeholders is not the ideal solution as there is a possibility of ambiguity. You can always solve that by appending and prepending additional characters to the placeholders to make them unique. – s.nguyen Dec 19 '19 at 22:04
  • you are amazing to solve my problem! I have tested at my end and it worked! The only thing I have amended is to make it support Unicode characters and some special symbols, which is to change the regular expression from `\\w` to `a-zA-Z0-9*?$\u0080-\u9fff-`. Thank you very much, I will mark this as my accepted answer. – Ray Chen Dec 20 '19 at 05:31