10

Is there an easy way to compare two Pattern objects?

I have a Pattern that was compiled from the regex "//" to check for comments in code.

Since there are several regexes that describe comments, I want to find a way to tell them apart.

How can it be done? The Pattern class does not implement the equals method.

La bla bla

9 Answers

8

You can compare Pattern objects by comparing the results of calling pattern() or toString(), but this doesn't do what you want (if I understand your question correctly). Specifically, this compares the strings that were passed to the Pattern.compile(...) factory method. It takes no account of flags passed separately from the pattern string.
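To make the flags caveat concrete, here is a minimal sketch. The class name PatternStringVsFlags and the helper sameSource are made up for illustration; only Pattern.compile, pattern(), flags() and matcher() come from the real API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows that comparing pattern() strings ignores flags.
final class PatternStringVsFlags {

    /** Compares only the source strings, which is all pattern()/toString() expose. */
    static boolean sameSource(Pattern a, Pattern b) {
        return a.pattern().equals(b.pattern());
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("todo");
        Pattern ignoreCase = Pattern.compile("todo", Pattern.CASE_INSENSITIVE);

        System.out.println(sameSource(plain, ignoreCase));        // true: same source string
        System.out.println(plain.flags() == ignoreCase.flags());  // false: flags differ
        System.out.println(plain.matcher("TODO").matches());      // false: case-sensitive
        System.out.println(ignoreCase.matcher("TODO").matches()); // true: case-insensitive
    }
}
```

The two patterns compare as "equal" by source string, yet they disagree on the input "TODO".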

There is no simple way to test if two non-identical regexes are equivalent. For example ".+" and "..*" represent equivalent regexes, but there is no straightforward way to determine this using the Pattern API.
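As a rough illustration of that example (not a proof of equivalence), the sketch below checks that ".+" and "..*" differ as strings but agree on a handful of sample inputs. The class EquivalentButDifferent and the method agreeOn are hypothetical names; agreeing on samples only shows that no counterexample was found among them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: different source strings, same behaviour on the samples.
final class EquivalentButDifferent {

    /** True if both patterns give the same matches() result on every sample. */
    static boolean agreeOn(Pattern a, Pattern b, List<String> samples) {
        for (String s : samples) {
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile(".+");
        Pattern p2 = Pattern.compile("..*");
        List<String> samples = Arrays.asList("", "a", "ab", "a\nb");

        System.out.println(p1.pattern().equals(p2.pattern())); // the strings differ
        System.out.println(agreeOn(p1, p2, samples));          // but they agree on these inputs
    }
}
```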


I don't know if the problem is theoretically solvable ... in the general case. @Akim comments:

There is no finite axiomatization to regex equivalence, so the short answer is "this is not doable by tree transformations of the regexes themselves". However one can compare the languages of two automata (test their equality), so one can compute whether two regexes are equivalent. Note that I'm referring to the "genuine" regexes, with no extensions such as back-references to capture groups, which escape the realm of rational languages, i.e., that of automata.


I also want to comment on the accepted answer. The author provides some code that he claims shows that Pattern's equals method is inherited from Object. In fact, the output he is seeing is consistent with that ... but it doesn't show it.

The correct way to know if this is the case is to look at the javadoc ... where the equals method is listed in the list of inherited methods. That is definitive.

So why doesn't the example show what the author says it shows?

  1. It is possible for two methods to behave the same way, but be implemented differently. If we treat the Pattern class as a black box, then we cannot show that this is not happening. (Or at least ... not without using reflection.)

  2. The author has only run this on one platform. Other platforms could behave differently.
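For what it's worth, the reflection check hinted at in point 1 could look like the sketch below. The class WhoDeclaresEquals and its helper are made-up names; Class.getMethod and Method.getDeclaringClass are standard. Note that this still only tells you about the JDK you run it on, which is exactly the limitation described in point 2.

```java
import java.lang.reflect.Method;
import java.util.regex.Pattern;

// Hypothetical demo class: asks reflection which class declares equals(Object).
final class WhoDeclaresEquals {

    /** Returns the class that declares the public equals(Object) visible on the given type. */
    static Class<?> declaringClassOfEquals(Class<?> type) {
        try {
            Method equals = type.getMethod("equals", Object.class);
            return equals.getDeclaringClass();
        } catch (NoSuchMethodException e) {
            // Cannot happen for real classes: every class has a public equals(Object).
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // If Pattern inherits equals, the declaring class is java.lang.Object.
        System.out.println(declaringClassOfEquals(Pattern.class).getName());
        // String overrides equals, so it is reported as the declaring class.
        System.out.println(declaringClassOfEquals(String.class).getName());
    }
}
```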

On the second point, my recollection is that in the earlier implementation of Pattern (in Java 1.4) the Pattern.compile(...) methods kept a cache of recently compiled pattern objects [1]. If you compiled a particular pattern string twice, the second time you might get the same object as was returned the first time. That would cause the test code to output:

  true
  true
  true
  true

But what does that show? Does it show that Pattern overrides Object.equals? No!

The lesson here is that you should figure out how a Java library method behaves primarily by looking at the javadocs:

  • If you write a "black box" test, you are liable to draw incorrect conclusions ... or at least, conclusions that may not be true for all platforms.

  • If you base your conclusions on "reading the code", you run the risk of drawing conclusions that are invalid for other platforms.


[1] Even if my recollection is incorrect, such an implementation would be consistent with the javadocs for the Pattern.compile(...) methods. They do not say that each compile call returns a new Pattern object.

Stephen C
  • Pattern objects have never been automatically cached. As evidence, the API docs warn that `Pattern.matches()` and `String#matches()` don't allow the Pattern object to be reused, and so shouldn't be used for repeated calls, like in a loop. (The Scanner class *does* cache all the Patterns it uses, but that's handled internally.) – Alan Moore Jun 06 '16 at 15:03
  • "I don't even know if the problem is theoretically solvable ... in the general case." There is no finite axiomatization to regex equivalence, so the short answer is "this is not doable by tree transformations of the regexes themselves". However one can compare the languages of two automata (test their equality), so one can compute whether two regexes are equivalent. Note that I'm referring to the "genuine" regexes, with no extensions such as back-references to capture groups, which escape the realm of rational languages, i.e., that of automata. – akim Jun 03 '21 at 19:33
5

Maybe I do not fully understand the question. But as you can see in the following example, there is a default java.lang.Object.equals(Object) method for every Java object. This method compares the references to the objects, i.e. it uses the == operator.


package test;

import java.util.regex.Pattern;

public class Main {

  private static final Pattern P1 = Pattern.compile("//.*");
  private static final Pattern P2 = Pattern.compile("//.*");

  public static void main(String[] args) {
    System.out.println(P1.equals(P1));
    System.out.println(P1.equals(P2));
    System.out.println(P1.pattern().equals(P1.pattern()));
    System.out.println(P1.pattern().equals(P2.pattern()));
  }
}

Outputs:


true
false
true
true

Jiri Patera
3

For mysterious reasons, the Pattern class doesn't implement equals(). For example, this simple unit test will fail:

    @Test
    public void testPatternEquals() {
        Pattern p1 = Pattern.compile("test");
        Pattern p2 = Pattern.compile("test");
        assertEquals(p1, p2); // fails!
    }

The most common workaround for this seems to be to compare the string representations of the Pattern objects (which returns the String used to create the Pattern):

    @Test
    public void testPatternEquals() {
        Pattern p1 = Pattern.compile("test");
        Pattern p2 = Pattern.compile("test");
        assertEquals(p1.toString(), p2.toString()); // succeeds!
    }
njudge
  • While this might work in some scenarios, it won’t work in the general case. This approach will at least omit comparing the flags used to compile a `Pattern`. – Chriki Nov 25 '19 at 18:37
3

I know automata may solve your problem, but that may be complicated. Roughly, you should compare at least pattern.pattern() and pattern.flags(), though that is not enough to decide whether two regexes are equivalent.

Heng
2

Pattern doesn't implement equals, but String does. Why not just compare the regex strings from which the Patterns were compiled?

darrengorman
  • While this might work in some scenarios, it won’t work in the general case. This approach will at least omit comparing the flags used to compile a `Pattern`. – Chriki Nov 25 '19 at 18:37
0

You can compare the string representations from which the patterns were compiled:

Pattern p1 = getPattern1();
Pattern p2 = getPattern2();
if (p1.pattern().equals(p2.pattern())){
    // your code here
}
Askar Kalykov
0

I think I get the idea of the question, and since I was searching for ways to compare Patterns I ended up here (two years too late probably, well, sorry...).

I'm writing tests and I need to know if a method of mine returns the expected pattern. While the text via toString() or pattern() might be the same, the flags can be different and the result when using the pattern would be unexpected.

A while ago I wrote my own general implementation of toString(). It collects all fields, including the private ones, and constructs a string that can be used for logging and, apparently, for testing. It showed that the fields root and matchRoot were different when compiling two equal patterns. Assuming that those two aren't that relevant for equality, and since there is a field for the flags, my solution is quite good if not perfect.

/**
 * Don't call this method from a <code>toString()</code> method with
 * <code>useExistingToString</code> set to <code>true</code>!!!
 */
public static String toString(Object object, boolean useExistingToString, String... ignoreFieldNames) {
  if (object == null) {
    return null;
  }

  Class<? extends Object> clazz = object.getClass();
  if (useExistingToString) {
    try {
      // avoid the default implementation Object.toString()
      Method methodToString = clazz.getMethod("toString");
      if (!methodToString.getDeclaringClass().isAssignableFrom(Object.class)) {
        return object.toString();
      }
    } catch (Exception e) {
      // no usable toString() found; fall through to the reflective dump below
    }
  }

  List<String> ignoreFieldNameList = Arrays.asList(ignoreFieldNames);
  Map<String, Object> fields = new HashMap<String, Object>();
  while (clazz != null) {
    for (Field field : clazz.getDeclaredFields()) {
      String fieldName = field.getName();
      if (ignoreFieldNameList.contains(fieldName) || fields.containsKey(fieldName)) {
        continue;
      }

      boolean accessible = field.isAccessible();
      if (!accessible) {
        field.setAccessible(true);
      }
      try {
        Object fieldValue = field.get(object);
        if (fieldValue instanceof String) {
          fieldValue = stringifyValue(fieldValue);
        }
        fields.put(fieldName, fieldValue);
      } catch (Exception e) {
        fields.put(fieldName, "-inaccessible- " + e.getMessage());
      }
      if (!accessible) {
        field.setAccessible(false);
      }
    }
    // travel upwards in the class hierarchy
    clazz = clazz.getSuperclass();
  }

  return object.getClass().getName() + ": " + fields;
}

public static String stringifyValue(Object value) {
  if (value == null) {
    return "null";
  }
  return "'" + value.toString() + "'";
}

And the test is green:

String toString1 = Utility.toString(Pattern.compile("test", Pattern.CASE_INSENSITIVE), false, "root", "matchRoot");
String toString2 = Utility.toString(Pattern.compile("test", Pattern.CASE_INSENSITIVE), false, "root", "matchRoot");
assertEquals(toString1, toString2);
sjngm
0

To determine whether two Pattern objects are equivalent, the simplest thing to do is to compare the actual string pattern and the flags used to create that pattern:

boolean isPatternEqualToPattern(final Pattern p1, final Pattern p2) {
    return p1.flags() == p2.flags() &&
        p1.pattern().equals(p2.pattern());
}
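If you need this comparison in a Set, as a Map key, or in assertEquals, one option is to wrap the Pattern in a small key type. The sketch below is only an illustration: PatternKey is a hypothetical class, not part of the JDK, and it inherits the limitation that textually different but equivalent regexes are treated as unequal.

```java
import java.util.Objects;
import java.util.regex.Pattern;

// Hypothetical wrapper: equality is defined by the source string plus the flags.
final class PatternKey {
    private final Pattern pattern;

    PatternKey(Pattern pattern) {
        this.pattern = Objects.requireNonNull(pattern, "pattern");
    }

    Pattern pattern() {
        return pattern;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PatternKey)) {
            return false;
        }
        Pattern other = ((PatternKey) o).pattern;
        return pattern.flags() == other.flags()
            && pattern.pattern().equals(other.pattern());
    }

    @Override
    public int hashCode() {
        // Must use the same two components as equals().
        return Objects.hash(pattern.pattern(), pattern.flags());
    }
}
```

With this, new PatternKey(Pattern.compile("test")) equals another key built from the same string and flags, and the keys collapse to one entry in a HashSet.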
Brigham
0

Although the other answers might solve the problem, I do not think they are the real answer to the problem.

If you really want to compare two patterns you essentially want to compare two regular languages.

To do this, a solution has already been posted on CS Stack Exchange: https://cs.stackexchange.com/questions/12876/equivalence-of-regular-expressions

A fast method to check the equivalence of regular languages is the Hopcroft and Karp algorithm (HK).
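To give an idea of what such a check involves, here is a minimal sketch of the union-find style equivalence test for two DFAs. Everything in it is an assumption made for illustration: the DfaEquivalence and Dfa names, the representation (state 0 is the start state, next[state][symbol] is the successor), and the requirement that both DFAs are complete over the same alphabet. It does not convert a java.util.regex.Pattern into a DFA; that step is the hard part and is not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a Hopcroft-Karp style equivalence check for two complete DFAs.
final class DfaEquivalence {

    /** A complete DFA: state 0 is the start state, next[state][symbol] is the successor. */
    static final class Dfa {
        final int[][] next;
        final boolean[] accepting;

        Dfa(int[][] next, boolean[] accepting) {
            this.next = next;
            this.accepting = accepting;
        }
    }

    /** Union-find lookup with path halving. */
    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    // States of b are renumbered to follow the states of a in one combined state space.
    private static int step(Dfa a, Dfa b, int offset, int s, int c) {
        return s < offset ? a.next[s][c] : b.next[s - offset][c] + offset;
    }

    private static boolean accepts(Dfa a, Dfa b, int offset, int s) {
        return s < offset ? a.accepting[s] : b.accepting[s - offset];
    }

    static boolean equivalent(Dfa a, Dfa b, int alphabetSize) {
        int offset = a.next.length;
        int total = offset + b.next.length;
        int[] parent = new int[total];
        for (int s = 0; s < total; s++) {
            parent[s] = s;
        }

        // Assume the two start states are equivalent and propagate the consequences.
        Deque<int[]> pending = new ArrayDeque<>();
        parent[0] = offset;
        pending.push(new int[] {0, offset});

        while (!pending.isEmpty()) {
            int[] pair = pending.pop();
            for (int c = 0; c < alphabetSize; c++) {
                int r1 = find(parent, step(a, b, offset, pair[0], c));
                int r2 = find(parent, step(a, b, offset, pair[1], c));
                if (r1 != r2) {
                    parent[r1] = r2; // successors must be equivalent too
                    pending.push(new int[] {r1, r2});
                }
            }
        }

        // The languages are equal iff no merged class mixes accepting and rejecting states.
        for (int s = 0; s < total; s++) {
            if (accepts(a, b, offset, s) != accepts(a, b, offset, find(parent, s))) {
                return false;
            }
        }
        return true;
    }
}
```

For example, over a one-symbol alphabet standing for "any character", a two-state DFA for ".+" and a three-state DFA for "..*" come out as equivalent, while ".+" and ".*" do not.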

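A Java implementation named HopcroftKarp is available here: http://algs4.cs.princeton.edu/65reductions/HopcroftKarp.java.html (Hopcroft and Karp also gave their names to a well-known bipartite matching algorithm, so check that the code you use implements the automata-equivalence algorithm and not the matching one.)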

Felipe Sulser