
I want to remove "un-partnered" parentheses from a string.

I.e., all ('s should be removed unless they're followed by a ) somewhere in the string. Likewise, all )'s not preceded by a ( somewhere in the string should be removed.

Ideally the algorithm would take into account nesting as well.

E.g.:

"(a)".remove_unmatched_parens # => "(a)"
"a(".remove_unmatched_parens # => "a"
")a(".remove_unmatched_parens # => "a"
sawa
Tom Lehman

5 Answers


Instead of a regex, consider a push-down automaton. (I'm not sure whether Ruby regular expressions can handle this; I believe Perl's can.)

A (very trivialized) process may be:

For each character in the input string:

  1. If it is not a '(' or ')', just append it to the output.
  2. If it is a '(', increment a seen_parens counter and append it.
  3. If it is a ')' and seen_parens is > 0, append it and decrement seen_parens. Otherwise, skip it.

At the end of the process, if seen_parens is > 0, then that many '(' are unmatched. Remove them by scanning back from the end and dropping each '(' that no later ')' closes. (This step can be merged into the above process with use of a stack or recursion.)

The entire process is O(n), albeit with relatively high overhead.
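A minimal Ruby sketch of the counter-based process above, written as a standalone method rather than a String patch (the method name is borrowed from the question; the second pass is one reading of "remove that many parens, starting from the end"). The second pass drops each '(' that no later ')' closes, rather than simply the last seen_parens '(' characters, so "(a(b)" becomes "a(b)":

```ruby
# Counter-based cleanup in two linear passes (standalone sketch).
def remove_unmatched_parens(str)
  # Pass 1, left to right: drop every ')' that has no open '(' before it.
  seen_parens = 0
  kept = []
  str.each_char do |c|
    if c == "("
      seen_parens += 1
    elsif c == ")"
      next if seen_parens.zero?   # unmatched ')': skip it
      seen_parens -= 1
    end
    kept << c
  end
  return kept.join if seen_parens.zero?

  # Pass 2, right to left: seen_parens '(' are still open; drop each '('
  # that has no remaining ')' after it.
  closers = 0
  result = []
  kept.reverse_each do |c|
    if c == ")"
      closers += 1
    elsif c == "("
      next if closers.zero?       # unmatched '(': skip it
      closers -= 1
    end
    result << c
  end
  result.reverse.join
end

p remove_unmatched_parens(")a(")    # => "a"
p remove_unmatched_parens("(a(b)")  # => "a(b)"
```

Both passes touch each character once, which keeps the whole thing O(n).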

Happy coding.

  • Indeed. This is *literally* the example that is used in teaching all over the world for a language that cannot be parsed using regular expressions. Now, Ruby's `Regexp`s are significantly more powerful than regular expressions, and they actually *can* parse this language, but it's not exactly maintainable. You can write a simple recursive-descent parser or push-down automaton in less time than it takes you to even *read* a `Regexp` someone else hands you, let alone write your own. And if you break up your `Regexp` into multiple lines to put comments in, maybe the automaton even ends up shorter. – Jörg W Mittag Mar 24 '11 at 21:06
  • Thanks for the algorithm. Does my answer (http://stackoverflow.com/questions/5424959/remove-unmatched-parentheses-from-a-string/5439883#5439883) look right to you? – Tom Lehman Mar 26 '11 at 02:00
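Jörg's comment mentions a recursive-descent parser as an alternative. A minimal sketch, assuming the small grammar in the code comments (the class name ParenCleaner and its methods are hypothetical): an unclosed '(' is dropped but its contents are kept, and a ')' with nothing to close is dropped. Each nesting level uses one Ruby call frame, so extremely deep nesting could exhaust the stack.

```ruby
# Hypothetical recursive-descent cleaner for the grammar
#   text  := (group | ch)*    stray ')' at top level are dropped
#   group := '(' text ')'     an unclosed '(' is dropped, its contents kept
class ParenCleaner
  def initialize(str)
    @chars = str.chars
    @pos = 0
  end

  def clean
    out = +""
    while @pos < @chars.size
      if @chars[@pos] == ")"
        @pos += 1              # stray ')' with no open '(': drop it
      else
        out << parse_item
      end
    end
    out
  end

  private

  # item := group | any non-paren character
  def parse_item
    c = @chars[@pos]
    @pos += 1
    return c unless c == "("

    inner = parse_seq
    if @chars[@pos] == ")"
      @pos += 1
      "(#{inner})"
    else
      inner                    # hit end of input: this '(' is unmatched
    end
  end

  # seq := item*, stopping at a ')' or at end of input
  def parse_seq
    out = +""
    out << parse_item while @pos < @chars.size && @chars[@pos] != ")"
    out
  end
end

p ParenCleaner.new(")a(").clean    # => "a"
p ParenCleaner.new("(a(b)").clean  # => "a(b)"
```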

The following uses Oniguruma, the regex engine built into Ruby 1.9. If you are using Ruby 1.8, see oniguruma.

Update

I had lazily copied and pasted someone else's regex, and it turned out to have problems.

So I wrote my own. I believe it should work now.

class String
    NonParenChar = /[^\(\)]/
    def remove_unmatched_parens
        self[/
            (?:
                (?<balanced>
                    \(
                        (?:\g<balanced>|#{NonParenChar})*
                    \)
                )
                |#{NonParenChar}
            )+
        /x]
    end
end
  • (?<name>regex1) names the sub-regex regex1 as name and makes it callable.
  • \g<name> is a call to that sub-regex: it stands for regex1 itself, not for the particular string regex1 matched. In fact, \g<name> may be embedded within (?<name>...), which is what makes recursive patterns possible.
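The difference between a subexpression call (\g<name>) and a backreference (\k<name>) shows up in a couple of one-line patterns. A small sketch; the variable names are illustrative and `match?` needs Ruby 2.4 or later:

```ruby
# \g<d> re-runs the subexpression named d; \k<d> must repeat the exact text d matched.
with_call    = /\A(?<d>\d)\g<d>\z/
with_backref = /\A(?<d>\d)\k<d>\z/

p with_call.match?("12")     # => true  (the second digit may differ)
p with_backref.match?("12")  # => false (the second digit must equal the first)
p with_backref.match?("11")  # => true

# A call may appear inside the group it names, which gives recursion:
balanced = /\A(?<b>\((?:\g<b>|[^()])*\))\z/
p balanced.match?("(a(b)c)") # => true
p balanced.match?("(a(b c)") # => false
```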

Update 2

This is simpler.

class String
    def remove_unmatched_parens
        self[/
            (?<valid>
                \(\g<valid>*\)
                |[^()]
            )+
        /x]
    end
end
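Note that `self[regex]` returns only the first match, so for input such as "(a)b)c" the text after the stray ")" is lost, and a string with no valid run returns nil. A sketch that reuses the same pattern but concatenates every valid run (the constant and method names are hypothetical):

```ruby
# Update 2's pattern, reused to collect every valid run.
VALID_RUN = /
  (?<valid>
    \( \g<valid>* \)    # a balanced group, possibly nested
    | [^()]             # or any single non-parenthesis character
  )+
/x

def remove_unmatched_parens_all(str)
  result = +""
  # scan with a block sets Regexp.last_match for each match; append the
  # whole match, since the block argument holds only the capture group
  str.scan(VALID_RUN) { result << Regexp.last_match[0] }
  result
end

p remove_unmatched_parens_all("(a)b)c")           # => "(a)bc"
p remove_unmatched_parens_all(")(a))(()(asdf)(")  # => "(a)()(asdf)"
```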
sawa
  • @pst The previous one turned out to have problems, so I wrote a different regex myself. This time it should be okay; hopefully no explosion. – sawa Mar 25 '11 at 08:42

Build a simple LR parser:

tokenize, token, stack = false, "", []

")(a))(()(asdf)(".each_char do |c|
  case c
  when '('
    # a new '(' restarts the token, discarding any unfinished one,
    # so nested groups are flattened
    tokenize = true
    token = c
  when ')'
    if tokenize
      token << c 
      stack << token
    end
    tokenize = false
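  when /\w/
    # only word characters inside a closed (...) pair are kept; text outside
    # parentheses and non-word characters are dropped
    token << c if tokenize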
  end
end

result = stack.join

puts result

Running it yields:

wesbailey@feynman:~/code_katas> ruby test.rb
(a)()(asdf)

I don't agree with the folks modifying the String class, because you should never open a standard class. Regexes are pretty brittle as parsers and hard to maintain. I can't imagine coming back to the previous solutions six months from now and trying to remember what they were doing!

Wes

Here's my solution, based on @pst's algorithm:

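require 'strscan'

class String
  def remove_unmatched_parens
    scanner = StringScanner.new(self)
    output = +''
    paren_depth = 0

    # Left to right: keep every '(' but only the ')' that close an open '('.
    while (char = scanner.getch)   # getch is multibyte-safe; get_byte is not
      if char == "("
        paren_depth += 1
        output << char
      elsif char == ")"
        if paren_depth > 0
          output << char
          paren_depth -= 1
        end
      else
        output << char
      end
    end
    return output if paren_depth.zero?

    # paren_depth '(' are still unmatched. Walk back from the end and drop
    # each '(' that no later ')' closes (not simply the last '(' characters).
    closers = 0
    kept = []
    output.chars.reverse_each do |c|
      if c == ")"
        closers += 1
      elsif c == "("
        next if closers.zero?   # unmatched '(': skip it
        closers -= 1
      end
      kept << c
    end
    kept.reverse.join
  end
end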
Tom Lehman

Algorithm:

  1. Traverse the given string.
  2. While doing so, keep track of "(" positions in a stack.
  3. When a ")" is found, pop the top element from the stack.
    • If the stack is empty, remove the ")" from the string instead.
  4. At the end, any positions left on the stack belong to unmatched "(" characters; remove those as well.

Java code: http://a2ajp.blogspot.in/2014/10/remove-unmatched-parenthesis-from-given.html
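The same stack-of-positions steps can be sketched in Ruby; this is an illustration, not a port of the linked Java code, and the method name is hypothetical. Positions are collected first and blanked out at the end, so the indices stay valid while the string is being walked:

```ruby
# Stack-of-positions cleanup (sketch of the steps above).
def remove_unmatched_parens_stack(str)
  chars = str.chars
  open_positions = []  # indices of '(' still waiting for a ')'
  stray_closers = []   # indices of ')' that found the stack empty

  chars.each_with_index do |c, i|
    if c == "("
      open_positions.push(i)
    elsif c == ")"
      if open_positions.empty?
        stray_closers << i
      else
        open_positions.pop   # this ')' closes the most recent '('
      end
    end
  end

  # Whatever is left on the stack is an unmatched '('; blank out both kinds.
  (stray_closers + open_positions).each { |i| chars[i] = "" }
  chars.join
end

p remove_unmatched_parens_stack(")a(")    # => "a"
p remove_unmatched_parens_stack("(a(b)")  # => "a(b)"
```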