Our XML feed gives us encoded UTF-8 characters inside an ISO-8859-1 file. This is fed into the database, so the text is ISO-8859-1 encoded and contains the following:

&#37329;&#34701;&#24066;&#22330;

Is there a way to convert that into a normal Java string? Something like:

String str = fromHtmlUtf8("&#37329;&#34701;&#24066;&#22330;");

where the resulting str contains the actual decoded characters: Chinese in this case, but the content can be quite mixed.

Thanks.

Daniil

2 Answers

You can use StringEscapeUtils from Apache Commons Lang; its unescapeHtml method turns HTML entity escapes back into the corresponding Unicode characters: http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/StringEscapeUtils.html
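If pulling in a library is not an option, the same idea can be sketched with the JDK alone. The following is a minimal sketch, not the Commons Lang implementation: it assumes the feed contains numeric character references such as &#37329; or &#x91D1;, and it does not handle named entities like &amp;. The class name NumericEntityDecoder is hypothetical; fromHtmlUtf8 is the method name used in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; only numeric character references are decoded.
public class NumericEntityDecoder {

    // Group 1: hexadecimal digits (&#x91D1;), group 2: decimal digits (&#37329;).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    public static String fromHtmlUtf8(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                // Character.toChars produces a surrogate pair for code points above U+FFFF.
                replacement = Character.isValidCodePoint(codePoint)
                        ? new String(Character.toChars(codePoint))
                        : m.group(); // not a valid code point: keep the text unchanged
            } catch (NumberFormatException e) {
                replacement = m.group(); // number too large for an int: keep the text unchanged
            }
            // quoteReplacement guards against '$' and '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Console output depends on the terminal's encoding.
        System.out.println(fromHtmlUtf8("&#37329;&#34701;&#24066;&#22330;"));
    }
}
```

Text that does not match the pattern passes through untouched, so mixed content (plain ASCII plus references) is fine. With Commons Lang 2.6 on the classpath, the equivalent call would be StringEscapeUtils.unescapeHtml(input), which also covers named entities.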

Next time, please search before asking; see: How to convert from HTML to UTF-8 in Java

timaschew

If you need a small library for this, you can use HTMLEntities:

http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities

Borislav Gizdov