
In short: how do I best check whether two collections of regex Patterns are equal (ignoring order)?

A bit more about what I am actually trying to do: I have an object that I use to filter log messages. This filter object contains a collection for each thing it can filter on, and one of those things is regex Patterns. I have written an equals method for my filter object so I can see whether two filters are basically the same. For the other collections I can use the collections' .equals method, but I have a problem with the Pattern objects.

I could keep the string input for the patterns in a separate list and compare the string lists with .equals, or I could loop through both lists and check, for each pattern, whether its .toString() matches one in the other list, like this:

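    import java.util.List;
    import java.util.regex.Pattern;

    class PatternListCompare {

        // Returns true if every pattern in each list has a pattern with the same
        // toString() in the other list. Order is ignored, and so is how many
        // times a pattern occurs.
        static boolean patternListsEqual(List<Pattern> patternList1, List<Pattern> patternList2) {
            boolean equals = true;
            // Every pattern in list 1 must appear in list 2...
            for (Pattern p1 : patternList1) {
                boolean found = false;
                for (Pattern p2 : patternList2) {
                    if (p1.toString().equals(p2.toString())) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    equals = false;
                    break;
                }
            }
            // ...and every pattern in list 2 must appear in list 1.
            if (equals) {
                for (Pattern p1 : patternList2) {
                    boolean found = false;
                    for (Pattern p2 : patternList1) {
                        if (p1.toString().equals(p2.toString())) {
                            found = true;
                            break;
                        }
                    }
                    if (!found) {
                        equals = false;
                        break;
                    }
                }
            }
            return equals;
        }
    }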

This does not look very efficient, but it would work. However, it would not catch two regex patterns that are written differently but match the same strings. How can the code above be optimized or done differently? And is there any "simple" way of detecting regexes that are written differently but match the same input?

Blem

2 Answers


The issue here is that the Pattern class does not override equals() (it inherits Object's identity comparison), so you can't compare the collections easily. To get around this, create your own Pattern wrapper class, which holds a Pattern and implements equals() and hashCode(). For example:

import java.util.Objects;
import java.util.regex.Pattern;

public class PatternWrapper {
    private final Pattern pattern;

    public PatternWrapper(Pattern p) {
        this.pattern = p;
    }

    /**
     * @return the wrapped pattern
     */
    public Pattern getPattern() {
        return pattern;
    }

    // hashCode must be consistent with equals(): hash the regex source and flags,
    // not Pattern.hashCode(), which is identity-based.
    @Override
    public int hashCode() {
        return (pattern == null) ? 0 : Objects.hash(pattern.pattern(), pattern.flags());
    }

    // Two wrappers are equal when their patterns have the same source string
    // and the same flags (e.g. CASE_INSENSITIVE), or when both are null.
    @Override
    public boolean equals(Object obj) {
        if (obj == this)
            return true;
        if (!(obj instanceof PatternWrapper))
            return false;
        PatternWrapper other = (PatternWrapper) obj;
        if (pattern == null || other.pattern == null)
            return pattern == other.pattern;
        return pattern.pattern().equals(other.pattern.pattern())
                && pattern.flags() == other.pattern.flags();
    }
}

Now you can store PatternWrapper objects in your collections instead.

Since you want to ignore ordering, you can't simply call equals() on two Lists, because List.equals() compares elements position by position. However, you can use a library method such as Apache Commons Collections' CollectionUtils#isEqualCollection to compare them independent of order. Take a look at this SO question for more info: Is there a way to check if two Collections contain the same elements, independent of order?
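If you would rather not add a dependency, a similar order-independent check can be sketched with the standard library alone. This is a stand-in for `CollectionUtils#isEqualCollection`, not its actual implementation: it keys each `Pattern` by its source string (`pattern()`) and its `flags()`, counts how often each key occurs in each collection, and compares the counts, so duplicates matter. The names `PatternCollections` and `isEqualIgnoringOrder` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class PatternCollections {

    // Key that stands in for a missing Pattern.equals(): same source and same flags.
    private static List<Object> key(Pattern p) {
        return Arrays.<Object>asList(p.pattern(), p.flags());
    }

    // Counts how many times each key occurs in the collection.
    private static Map<List<Object>, Integer> counts(Collection<Pattern> patterns) {
        Map<List<Object>, Integer> result = new HashMap<>();
        for (Pattern p : patterns) {
            result.merge(key(p), 1, Integer::sum);
        }
        return result;
    }

    // True if both collections hold the same patterns with the same
    // multiplicities, regardless of order.
    public static boolean isEqualIgnoringOrder(Collection<Pattern> a, Collection<Pattern> b) {
        return a.size() == b.size() && counts(a).equals(counts(b));
    }

    public static void main(String[] args) {
        List<Pattern> first = Arrays.asList(Pattern.compile("ERROR.*"), Pattern.compile("\\d+"));
        List<Pattern> second = Arrays.asList(Pattern.compile("\\d+"), Pattern.compile("ERROR.*"));
        System.out.println(first.equals(second));                // false: Pattern uses identity equality
        System.out.println(isEqualIgnoringOrder(first, second)); // true
    }
}
```

Counting occurrences rather than using a Set means `[a, a, b]` and `[a, b, b]` are reported as different, which matches what a multiset comparison is expected to do.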

I don't think there is a simple way of detecting that two regexes written differently match the same text. One way might be to keep a list of test strings and run them against each regex to see whether the results agree.
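The test-string idea can be sketched as follows. It is only a heuristic: if two patterns disagree on any sample they are definitely different, but agreeing on every sample does not prove they are equivalent, because the samples cannot cover every possible input. The sample strings, the use of `matches()` (whole-string match) rather than `find()`, and the names `RegexSampler` and `agreeOnSamples` are assumptions for illustration; a log filter might well use `find()` instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexSampler {

    // Returns true if both patterns give the same whole-string match result
    // for every sample. A false result proves the patterns differ; a true
    // result only means no difference was found on these samples.
    public static boolean agreeOnSamples(Pattern a, Pattern b, List<String> samples) {
        for (String s : samples) {
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("", "7", "42", "abc", "4a2", "000");
        Pattern digits1 = Pattern.compile("\\d+");
        Pattern digits2 = Pattern.compile("[0-9][0-9]*");
        Pattern digitsOptional = Pattern.compile("\\d*");

        System.out.println(agreeOnSamples(digits1, digits2, samples));        // true
        System.out.println(agreeOnSamples(digits1, digitsOptional, samples)); // false: "" differs
    }
}
```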

dogbane
  • Unless you have duplicate patterns and don't compare them all the time, you could put the patterns into sets before comparing. – henko Sep 13 '11 at 13:42
  • Recognizing that two regexes match exactly the same set of strings is non-trivial. You would have to reduce each pattern to a so-called minimal deterministic finite state machine and compare those instead. – henko Sep 13 '11 at 13:44
  • Thank you very much. I had a feeling my problem had something to do with .equals on Patterns but was not sure. Also, the reason I did not have problems with ordering for my other filtering options is that Patterns are the only ones I keep in a list; the rest are in sets, so thank you for pointing that out. – Blem Sep 13 '11 at 14:22
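henko's suggestion of comparing sets can be sketched like this, assuming duplicate patterns do not matter for the filter. Each `Pattern` is reduced to a key made of its source string and its flags (including the flags is an addition here, since two patterns with the same text but different flags such as `CASE_INSENSITIVE` match different input), and the resulting `HashSet`s are compared with `equals()`, which ignores order. The names `PatternSetComparison` and `sameIgnoringOrderAndDuplicates` are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class PatternSetComparison {

    // Reduces each pattern to (source string, flags) so it can live in a HashSet.
    private static Set<List<Object>> keys(Collection<Pattern> patterns) {
        Set<List<Object>> result = new HashSet<>();
        for (Pattern p : patterns) {
            result.add(Arrays.<Object>asList(p.pattern(), p.flags()));
        }
        return result;
    }

    // Order and duplicates are ignored: [a, a, b] counts as equal to [b, a].
    public static boolean sameIgnoringOrderAndDuplicates(Collection<Pattern> a, Collection<Pattern> b) {
        return keys(a).equals(keys(b));
    }

    public static void main(String[] args) {
        List<Pattern> first = Arrays.asList(
                Pattern.compile("timeout"), Pattern.compile("timeout"), Pattern.compile("^WARN"));
        List<Pattern> second = Arrays.asList(Pattern.compile("^WARN"), Pattern.compile("timeout"));
        System.out.println(sameIgnoringOrderAndDuplicates(first, second)); // true
    }
}
```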

I just answered a rather similar question about comparing single Pattern objects, which is problematic because their flags are ignored when you compare them by their string form. Since a list or an array of Patterns raises essentially the same issue, anyone running into this question might want to look there, too.
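The flag problem is easy to demonstrate: `toString()` and `pattern()` return only the regex source, so two patterns compiled from the same string with different flags look identical when compared by string, even though they match different input. Comparing `flags()` as well closes that gap. A small sketch (the class and helper names are arbitrary):

```java
import java.util.regex.Pattern;

public class PatternFlagsDemo {

    // Treats two patterns as equal only if both the source and the flags match.
    static boolean sameSourceAndFlags(Pattern a, Pattern b) {
        return a.pattern().equals(b.pattern()) && a.flags() == b.flags();
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("error");
        Pattern ignoreCase = Pattern.compile("error", Pattern.CASE_INSENSITIVE);

        System.out.println(plain.toString().equals(ignoreCase.toString())); // true
        System.out.println(plain.matcher("ERROR").matches());               // false
        System.out.println(ignoreCase.matcher("ERROR").matches());          // true
        System.out.println(sameSourceAndFlags(plain, ignoreCase));          // false
    }
}
```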

Regarding order-independent comparison, see dogbane's answer.

sjngm