My Aim: I want to check whether a Java string contains characters from GSM Extended. The existing code successfully checks for GSM characters, but I am struggling to grasp how to check for GSM Extended characters such as '[' and ']'.

My Code:

private static final String GSM_EXTENDED = "\u000c^{}\\[~]|\u20ac";
public static boolean isUnicode(String input) {        
    return !((input.matches('[' + GSM + "]*")) || (input.matches('[' + GSM_EXTENDED + "]*")));
}

Result: My unit tests don't recognise ']' as GSM Extended. Furthermore, when a GSM Extended character is typed in from the GUI, it is not recognised as GSM.

dda
Andy A
  • Can't you use the unicode id like the others? \u005B and \u005D – Djon Jun 05 '13 at 13:13
  • Hi Djon. If I try making the GSM_EXTENDED String as "\u005B\u005D", then when my isUnicode() method runs I get ... Exception occurred in target VM: Unclosed character class near index 4 [[]]* – Andy A Jun 05 '13 at 14:31
  • Ah! Maybe the String param of input.matches() needs the \ character to escape the bracket characters? – Andy A Jun 05 '13 at 14:35
  • \\\u005B\\\u005D can be used for this method. – Andy A Jun 05 '13 at 15:21

1 Answer

Inside square brackets in a regular expression, the hyphen is a special character (it denotes a range such as a-z), so you need to escape it as "\\-" in your GSM string.

The closing bracket ("]") in your GSM_EXTENDED string is terminating the bracketed character class, so you need to escape it as "\\]".
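
As a minimal sketch of that escaping, here is one way the extended set could be written so that every character is treated as a literal inside the character class. The class name GsmExtendedCheck and the method isGsmExtendedOnly are hypothetical names chosen for illustration, and the GSM string is left out because the question does not show it. Note that a literal backslash needs "\\\\" in Java source: in the original string, "\\[" produces the regex \[, which matches only the bracket and leaves the backslash itself out of the class.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
final class GsmExtendedCheck {

    // Characters placed inside the class: form feed, caret, braces, backslash,
    // open bracket, tilde, close bracket, pipe and the euro sign.
    // Caret, backslash and both brackets are escaped so the regex engine
    // reads them as literals rather than as class syntax.
    private static final String GSM_EXTENDED = "\\f\\^{}\\\\\\[~\\]|\u20ac";

    // Compiled once; matches strings made up only of the characters above.
    private static final Pattern EXTENDED_ONLY =
            Pattern.compile("[" + GSM_EXTENDED + "]*");

    // Hypothetical helper: true when every character of input is in the set.
    // An empty string also returns true because of the * quantifier.
    static boolean isGsmExtendedOnly(String input) {
        return EXTENDED_ONLY.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGsmExtendedOnly("[]"));         // true
        System.out.println(isGsmExtendedOnly("\\|\u20ac"));  // true: backslash, pipe, euro
        System.out.println(isGsmExtendedOnly("a]"));         // false: 'a' is not in the set
    }
}
```

The same approach should carry over to the GSM string: escape the hyphen there, then build the pattern the same way.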

VGR
  • Hi VGR. Ah yes, I think that's why ']' wasn't recognised in my unit test. However, GSM_EXTENDED characters are still not detected when typed into my GUI. – Andy A Jun 05 '13 at 14:33
  • Ah, it was a combination of your answer, and that I had the logic of my isUnicode() method wrong. :) – Andy A Jun 05 '13 at 14:53