
I'm working on translating Strings from English to German, but German words that are already translated are being translated again.

Say I have the phrase "Beim Hinzuf\u00E4gen", which has already been translated. I want to compare it to the same phrase written with an umlaut, "Beim Hinzufügen". Both files are read as ISO-8859-1, but when I compare the phrases they are seen as different and the phrase is translated again, which I don't want. Even when I replace the umlaut with the Unicode escape and compare the two, they are still seen as different. I suspect this is because an extra backslash is added when I replace the umlaut with "\u00E4".

Does anyone have an idea of the preferred method for what I'm trying to do?
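To check the extra-backslash suspicion, it can help to decode any literal escape sequences before comparing. The following is only a minimal sketch, assuming the file stores the escape as six literal characters (a backslash, `u`, and four hex digits); `EscapeDecoder` and `unescape` are hypothetical names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDecoder {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every literal escape sequence in the text with the character it names.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed file content: the escape is stored as six separate characters.
        String fromFile = "Beim Hinzuf\\u00FCgen";
        // The same phrase with the real character (code point 0xFC).
        String expected = "Beim Hinzuf" + (char) 0x00FC + "gen";

        System.out.println(fromFile.length());                    // 20
        System.out.println(unescape(fromFile).length());          // 15
        System.out.println(unescape(fromFile).equals(expected));  // true
    }
}
```

If the translated strings live in a `.properties` file, `java.util.Properties.load` already decodes such escapes, so a helper like this would only be needed when the file is read as plain text.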

Mark Rotteveel
sean le roy
  • It's not clear whether the Unicode escape you've included here is in the file or not, or how the file is being read. Please provide a [mcve] so we can help you. – Jon Skeet Sep 05 '17 at 08:34
  • Will add code, cheers! – sean le roy Sep 05 '17 at 08:36
  • Aren't you supposed to compare `Beim Hinzuf\u00E4gen` with `Beim Hinzufägen`? Notice the `ä` in the second string. – Eugene Sep 05 '17 at 08:44
  • Sorry for the late reply. Eugene, you're absolutely right. Not only did I have the wrong Unicode escapes mapped out, but it seems I was also writing the file in UTF-8, which produced the wrong character, as you pointed out. That has now fixed my problem, thank you! – sean le roy Sep 05 '17 at 11:00

2 Answers


It seems that you need to compare these with a Collator:

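import java.text.Collator;

String left = "Beim Hinzuf\u00E4gen";
String right = "Beim Hinzufägen";
// Collator for the default locale; PRIMARY strength compares base letters only
Collator c = Collator.getInstance();
c.setStrength(Collator.PRIMARY);

int result = c.compare(left, right); // 0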
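To see what `Collator.PRIMARY` does and does not treat as equal, here is a small sketch. It assumes `Locale.GERMAN` rather than the default locale, and `CollatorStrengthDemo` and `comparePrimary` are hypothetical names introduced for illustration.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {
    // Compares two strings at PRIMARY strength, i.e. by base letters only.
    static int comparePrimary(String a, String b) {
        Collator c = Collator.getInstance(Locale.GERMAN);
        c.setStrength(Collator.PRIMARY);
        return c.compare(a, b);
    }

    public static void main(String[] args) {
        String withA = "Hinzuf\u00E4gen"; // a-umlaut, as in the question's escape
        String withU = "Hinzuf\u00FCgen"; // u-umlaut, the intended spelling
        String plain = "Hinzufugen";      // no umlaut at all

        // Identical strings always compare as equal.
        System.out.println(comparePrimary(withU, withU) == 0); // true
        // Different base letters (a vs. u) are a primary difference.
        System.out.println(comparePrimary(withA, withU) == 0); // false
        // An accent on the same base letter is normally ignored at PRIMARY strength.
        System.out.println(comparePrimary(withU, plain) == 0);
    }
}
```

Because "ä" and "ü" have different base letters, lowering the strength would not make the two phrases from the question match; it mainly helps when the strings differ only in accents or case.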
koppor
Eugene
  • Turns out I had mapped the Unicode escapes wrong. I had never heard of the Collator before, so I will read up on it nevertheless. Thanks. – sean le roy Sep 05 '17 at 11:03

As @Eugene points out, your result is correct: you are comparing "Hinzufügen" with "Hinzufägen", which are different.

Unicode 00E4 is "ä",
Unicode 00FC is "ü".
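A quick way to confirm which character an escape produces is to print its code point. This is a minimal sketch; `UmlautCodePoints` and `describe` are hypothetical names chosen for illustration.

```java
class UmlautCodePoints {
    // Formats a character as a code point label such as U+00E4.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        String translated = "Beim Hinzuf\u00E4gen"; // escape used in the question
        String intended = "Beim Hinzuf\u00FCgen";   // escape for the u-umlaut

        // Index 11 is the character right after "Beim Hinzuf".
        System.out.println(describe(translated.charAt(11))); // U+00E4
        System.out.println(describe(intended.charAt(11)));   // U+00FC
        System.out.println(translated.equals(intended));     // false
    }
}
```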

IQV