5

I need to split a string in Java using "-" as the delimiter. Example: "Single Room - Enjoy your stay"

The same data arrives in English and German depending on the locale, so I cannot use the usual string.split("-"). The Unicode code point for the "-" character is 8212 (decimal) or 0x2014 (hex). How do I split the string on a Unicode code point?

tchrist
Bhavya

4 Answers

7

You may be mistaken in which Unicode dash character you’re getting. As of Unicode v6.1, there are 27 code points that have the \p{Dash} property:

U+002D ‭ -  HYPHEN-MINUS
U+058A ‭ ֊  ARMENIAN HYPHEN
U+05BE ‭ ־  HEBREW PUNCTUATION MAQAF
U+1400 ‭ ᐀  CANADIAN SYLLABICS HYPHEN
U+1806 ‭ ᠆  MONGOLIAN TODO SOFT HYPHEN
U+2010 ‭ ‐  HYPHEN
U+2011 ‭ ‑  NON-BREAKING HYPHEN
U+2012 ‭ ‒  FIGURE DASH
U+2013 ‭ –  EN DASH
U+2014 ‭ —  EM DASH
U+2015 ‭ ―  HORIZONTAL BAR
U+2053 ‭ ⁓  SWUNG DASH
U+207B ‭ ⁻  SUPERSCRIPT MINUS
U+208B ‭ ₋  SUBSCRIPT MINUS
U+2212 ‭ −  MINUS SIGN
U+2E17 ‭ ⸗  DOUBLE OBLIQUE HYPHEN
U+2E1A ‭ ⸚  HYPHEN WITH DIAERESIS
U+2E3A ‭ ⸺  TWO-EM DASH
U+2E3B ‭ ⸻  THREE-EM DASH
U+301C ‭ 〜 WAVE DASH
U+3030 ‭ 〰 WAVY DASH
U+30A0 ‭ ゠ KATAKANA-HIRAGANA DOUBLE HYPHEN
U+FE31 ‭ ︱ PRESENTATION FORM FOR VERTICAL EM DASH
U+FE32 ‭ ︲ PRESENTATION FORM FOR VERTICAL EN DASH
U+FE58 ‭ ﹘ SMALL EM DASH
U+FE63 ‭ ﹣ SMALL HYPHEN-MINUS
U+FF0D ‭ － FULLWIDTH HYPHEN-MINUS
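To check which of these characters actually arrives in the input, it can help to print every code point of the string together with its Unicode name. The following is only a minimal sketch: the class name `CodePointDump` and the method `describe` are made up for illustration, and it assumes Java 8 or later (for `String.codePoints()`).

```java
class CodePointDump {
    // Returns one line per code point, in the form "U+XXXX NAME".
    static String describe(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp ->
            sb.append(String.format("U+%04X %s", cp, Character.getName(cp)))
              .append('\n'));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input; replace it with the string you really receive.
        System.out.print(describe("Room \u2014 Stay"));
    }
}
```

If the line for the separator reads `U+002D HYPHEN-MINUS` rather than `U+2014 EM DASH`, the data contains the plain ASCII hyphen after all.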

In Perl or ICU, you could just split directly on \p{dash}, but since the Sun Pattern class doesn’t support full Unicode properties like that, you have to synthesize it with an enumerated square-bracketed character class. So splitting on the pattern:

string.split("[\u002D\u058A\u05BE\u1400\u1806\u2010-\u2015\u2053\u207B\u208B\u2212\u2E17\u2E1A\u2E3A\u2E3B\u301C\u3030\u30A0\uFE31\uFE32\uFE58\uFE63\uFF0D]")

should do the trick for you. You can also double-backslash those escapes (for example \\u2014) if you are worried about the Java compiler translating them before the regex engine sees the string; the regex parser understands the \uXXXX notation on its own.
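As a rough sketch of how this could look in a complete program, the class below compiles the enumerated character class once and reuses it. The names `DashSplitter` and `splitOnDash` are invented for this example, and trimming the surrounding spaces is an assumption about what the caller wants, not part of the original pattern. The escapes are double-backslashed so that the regex engine, not the compiler, decodes them.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DashSplitter {
    // The 27 Dash code points listed above, as an enumerated character class.
    private static final Pattern DASH = Pattern.compile(
        "[\\u002D\\u058A\\u05BE\\u1400\\u1806\\u2010-\\u2015\\u2053\\u207B\\u208B"
        + "\\u2212\\u2E17\\u2E1A\\u2E3A\\u2E3B\\u301C\\u3030\\u30A0"
        + "\\uFE31\\uFE32\\uFE58\\uFE63\\uFF0D]");

    // Splits on every dash character and trims whitespace around each piece.
    static String[] splitOnDash(String text) {
        String[] parts = DASH.split(text);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        // Em dash (U+2014) and ASCII hyphen-minus (U+002D) are both handled.
        System.out.println(Arrays.toString(splitOnDash("Single Room \u2014 Enjoy your stay")));
        System.out.println(Arrays.toString(splitOnDash("Single Room - Enjoy your stay")));
    }
}
```

Both calls print `[Single Room, Enjoy your stay]`. Note that this splits on every dash, so a hyphenated word inside the text would also be cut; if that matters, passing a limit of 2 to `split` keeps everything after the first dash together.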

tchrist
3
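import java.util.regex.Pattern;
// LITERAL matches the delimiter as plain text; U+2014 is the em dash from the question.
Pattern p = Pattern.compile("\u2014", Pattern.LITERAL);
String[] items = p.split(message);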
Talal
1
String s = "Single Room - Enjoy your stay";
String[] splits = s.split("\u002D"); // U+002D is the ASCII hyphen-minus
for (String part : splits) {
    System.out.println(part);
}
Chandra Sekhar
0

The code for "-" is 2D in hex, 45 in decimal, or 55 in octal, so split using \u002d. The following program prints the integer values of the characters from 32 to 131, which you can use to look up other symbols.

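public class CharTable {
    public static void main(String[] args) {
        int column = 0;
        // Print each code from 32 to 131 next to its character.
        for (int i = 32; i <= 131; i++) {
            System.out.print(i + ":\t" + (char) i + "   ");
            column++;
            // Start a new line after every 11 entries.
            if (column > 10) {
                System.out.println();
                column = 0;
            }
        }
    }
}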
Praveen