8

Am I correct in thinking that a regex can't be used to detect missing parentheses (because there is no way of counting pairs)? I'm using JavaScript and have about a thousand strings which have been truncated and need to be edited by hand. I was hoping to use code to narrow this list down to the ones that need attention. The strings can be thought of as taking these forms:

  • (this is fine and does not need attention)
  • This is also [fine]
  • This is bad( and needs to be edited
  • This [is (also) bad
  • as is this} bad
  • this string has no brackets of any kind but must also be considered

If this is not possible then I'll just have to write a function to look for bracket pairs. Thank you

Alan Moore
Ghoul Fool
  • I don't think it's possible with regexes: http://bytes.com/topic/python/answers/802042-regex-matching-brackets – Paolo Stefan Jan 15 '13 at 09:40
  • With all respect to regular expressions, they are not a good tool for checking the syntax of a programming language. You can count parentheses in a much cleaner and less error-prone way with a trivial for loop. Or use javacc-like tools to check the syntax properly. – Audrius Meškauskas Jan 15 '13 at 09:47
  • Sorry - I forgot to mention the strings aren't from code; they are from file names. – Ghoul Fool Jan 15 '13 at 10:33

4 Answers

10
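function isFine(str) {
  // Requires at least one bracket, then compares the number of each opening
  // bracket with the number of its closing partner (order is not checked).
  return /[(){}\[\]]/.test( str ) &&
    ( str.match( /\(/g ) || '' ).length === ( str.match( /\)/g ) || '' ).length &&
    ( str.match( /\[/g ) || '' ).length === ( str.match( /]/g ) || '' ).length &&
    ( str.match( /{/g ) || '' ).length === ( str.match( /}/g ) || '' ).length;
}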

Test

isFine('(this is fine and does not need attention)');                 // true
isFine('This is also [fine]');                                        // true
isFine('This is bad( and needs to be edited');                        // false
isFine('This [is (also) bad');                                        // false
isFine('as is this} bad');                                            // false
isFine('this string has no brackets but must also be considered');    // false

Note, though, that this doesn't check bracket order, so for example a)b(c would be deemed fine.

For the record, here is a function that checks for missing brackets and checks that each type is correctly balanced. It doesn't allow a)b(c, but it does allow (a[bc)d] as each type is checked individually.

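function checkBrackets( str ) {
    var lb, rb, li, ri,
        i = 0,
        brkts = [ '(', ')', '{', '}', '[', ']' ];
    // Take the bracket types one pair at a time: lb = opener, rb = closer.
    while ( lb = brkts[ i++ ], rb = brkts[ i++ ] ) {
        li = ri = 0;
        // For each opener in turn, the next unused closer must come after it.
        while ( li = str.indexOf( lb, li ) + 1 ) {
            if ( ( ri = str.indexOf( rb, ri ) + 1 ) < li ) {
                return false;
            }
        }
        // Any closer left over after the last opener is unmatched.
        if ( str.indexOf( rb, ri ) + 1 ) {
            return false;
        }
    }
    return true;
}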

Finally, further to Christophe's post, here is what seems the best solution to checking for missing brackets and checking that all are correctly balanced and nested:

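function checkBrackets( str ) {
    var s;
    // Keep only the bracket characters.
    str = str.replace( /[^{}[\]()]/g, '' );
    // Repeatedly delete adjacent pairs until nothing changes.
    while ( s !== str ) {
        s = str;
        str = str.replace( /{}|\[]|\(\)/g, '' );
    }
    // An empty remainder means everything was balanced and nested.
    return !str;
}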

checkBrackets( 'ab)cd(efg' );        // false   
checkBrackets( '((a)[{{b}}]c)' );    // true   
checkBrackets( 'ab[cd]efg' );        // true   
checkBrackets( 'a(b[c)d]e' );        // false   
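
To tie this back to the original problem (narrowing a long list down to the strings that need attention), the check can be used as a filter. This is only a minimal sketch: `hasBalancedBrackets`, `names` and `needsAttention` are hypothetical names chosen for illustration, the sample array is copied from the question, and a string with no brackets at all is treated as fine here.

```javascript
// Same strip-and-collapse idea as above, repeated so the sketch is self-contained.
function hasBalancedBrackets(str) {
    var previous;
    var rest = str.replace(/[^{}[\]()]/g, '');      // keep only bracket characters
    while (previous !== rest) {
        previous = rest;
        rest = rest.replace(/\{\}|\[\]|\(\)/g, ''); // delete adjacent pairs
    }
    return rest === '';
}

// Hypothetical sample data, taken from the question.
var names = [
    '(this is fine and does not need attention)',
    'This is also [fine]',
    'This is bad( and needs to be edited',
    'This [is (also) bad',
    'as is this} bad',
    'this string has no brackets of any kind but must also be considered'
];

// Keep only the strings whose brackets do not balance.
var needsAttention = names.filter(function (name) {
    return !hasBalancedBrackets(name);
});

console.log(needsAttention);
```

With the sample data this leaves the three "bad" strings. If strings without any brackets should also be flagged, an extra test such as the one at the start of isFine would be needed.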
MikeM
  • The latter function, since it checks the various types separately, yields `true` for input like `'a(b[c)d]e'`. – Nikita Kouevda Jan 16 '13 at 14:13
  • @Nikita Kouevda. Good point. Thank you. I should have said 'checks that each type is correctly balanced'. I have replaced the function with a new version which also checks that all types are correctly nested. – MikeM Jan 16 '13 at 18:22
  • @NikitaKouevda the purpose here is to detect truncated strings, not typing mistakes, so this technique looks just fine. – Christophe Jan 16 '13 at 18:52
  • @Christophe Apologies, I did not mean to be overly critical; I assumed that the question expects a more thorough solution because of the `as is this} bad` case, which cannot be the result of a truncation alone. – Nikita Kouevda Jan 16 '13 at 20:42
  • @NikitaKouevda I get your point, I have posted an alternative for more generic cases. – Christophe Jan 16 '13 at 22:48
  • btw +1 for the checkBrackets function, it might help on an earlier question I had http://stackoverflow.com/questions/11907275/regular-expression-to-match-brackets – Christophe Jan 16 '13 at 23:06
3

You can't do the recursion in the regex itself, but you can always do it in JavaScript.

Here is an example:

// First remove non-brackets:
string = string.replace(/[^{}[\]()]/g, "");
// Then remove bracket pairs repeatedly until nothing changes
var oldstring;
while (string !== oldstring) {
  oldstring = string;
  string = string.replace(/({}|\[\]|\(\))/g, "");
}

The remainder are the non-matching brackets.

Live demo: http://jsfiddle.net/3Njzv/
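
The snippet above could be wrapped in a function that returns the leftover brackets, which makes it easier to see what is wrong with a given string. This is a rough sketch only; `unmatchedBrackets` is a hypothetical name, not something from the live demo.

```javascript
// Returns the bracket characters that could not be paired up.
function unmatchedBrackets(str) {
    var previous;
    var rest = str.replace(/[^{}[\]()]/g, '');      // keep only bracket characters
    while (rest !== previous) {
        previous = rest;
        rest = rest.replace(/\{\}|\[\]|\(\)/g, ''); // delete adjacent pairs
    }
    return rest;
}

console.log(unmatchedBrackets('This is also [fine]'));  // ""
console.log(unmatchedBrackets('This [is (also) bad'));  // "["
console.log(unmatchedBrackets('as is this} bad'));      // "}"
```

An empty result means the string is balanced; anything else is the set of brackets left without a partner.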

If you need to count the pairs, you can do the replacements one at a time and add a counter:

// First remove non-brackets:
string = string.replace(/[^{}[\]()]/g, "");

// Then remove bracket pairs one at a time until nothing changes.
// The counter starts at -1 because the final pass removes nothing.
var oldstring, counter = -1;
while (string !== oldstring) {
  counter++;
  oldstring = string;
  string = string.replace(/({}|\[\]|\(\))/, "");
}
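
As a variation on the counting loop, here is a sketch that packages it as a function and returns both the number of pairs and the leftover brackets. The name `countBracketPairs` and the shape of the returned object are assumptions made for illustration; the counter starts at 0 here because the loop only runs when a pair was actually removed.

```javascript
// Counts matched bracket pairs by removing one pair per pass.
function countBracketPairs(str) {
    var pairPattern = /\{\}|\[\]|\(\)/;             // no g flag: one pair at a time
    var rest = str.replace(/[^{}[\]()]/g, '');      // keep only bracket characters
    var pairs = 0;
    var next = rest.replace(pairPattern, '');
    while (next !== rest) {                         // a pair was removed
        pairs += 1;
        rest = next;
        next = rest.replace(pairPattern, '');
    }
    return { pairs: pairs, unmatched: rest };
}

console.log(countBracketPairs('((a)[{{b}}]c)'));       // pairs: 5, unmatched: ""
console.log(countBracketPairs('This [is (also) bad')); // pairs: 1, unmatched: "["
```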
Christophe
2

It's possible to use a recursive regex to verify matching parentheses. For example, in Perl, the following expression matches strings with proper () {} [] nesting:

$r = qr/(?:(?>[^(){}\[\]]+)|\((??{$r})\)|\{(??{$r})\}|\[(??{$r})\])*/;

Here is the same expression expanded for clarity:

$r = qr/
    (?:
        (?>
            [^(){}\[\]]+
        )
    |
        \(
            (??{$r})
        \)
    |
        \{
            (??{$r})
        \}
    |
        \[
            (??{$r})
        \]
    )*
/x;

The outer group is quantified with * instead of + so as to match empty strings, so in order to make $r useful, the actual matching must be done with an expression that utilizes lookaheads/lookbehinds or otherwise establishes context, e.g. /^$r$/. For example, the following prints only the lines in a file that do not have proper nesting:

perl -ne '$r = qr/(?:(?>[^(){}\[\]]+)|\((??{$r})\)|\{(??{$r})\}|\[(??{$r})\])*/; print if !m/^$r$/' file

To address your clarification: If these are filenames and not file contents, you could pipe the output of ls or find or whatever into the above command, sans file:

ls | perl -ne '$r = qr/(?:(?>[^(){}\[\]]+)|\((??{$r})\)|\{(??{$r})\}|\[(??{$r})\])*/; print if !m/^$r$/'

However, as others have said, a non-regex solution is probably better in general.

N.B. From the Perl doc: "WARNING: This extended regular expression feature is considered experimental, and may be changed without notice. Code executed that has side effects may not perform identically from version to version due to the effect of future optimisations in the regex engine."

Nikita Kouevda
1

Some regex flavors are able to match recursive structures like nested parentheses, but the syntax is so complicated, it's usually easier just to write a function. JavaScript regexes don't support recursion at all.
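
For illustration, such a function might look like the sketch below. It makes a single pass over the string with a stack and uses no regex at all; `isBalanced` is a hypothetical name, and a string with no brackets counts as balanced here.

```javascript
// Single-pass check: push the expected closer for each opener,
// and require every closer to match the top of the stack.
function isBalanced(str) {
    var closerFor = { '(': ')', '[': ']', '{': '}' };
    var stack = [];
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (Object.prototype.hasOwnProperty.call(closerFor, ch)) {
            stack.push(closerFor[ch]);              // opener: remember its closer
        } else if (ch === ')' || ch === ']' || ch === '}') {
            if (stack.pop() !== ch) {               // closer: must match latest opener
                return false;
            }
        }
    }
    return stack.length === 0;                      // no opener left unclosed
}

console.log(isBalanced('(this is fine and does not need attention)')); // true
console.log(isBalanced('This [is (also) bad'));                        // false
console.log(isBalanced('a(b[c)d]e'));                                  // false
```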

Alan Moore