2

I am parsing some external XML into an object and displaying this inside a textview.

Apostrophes/single quotes are being converted to these silly question mark symbols.

Nothing I've found is working. I've tried using replaceAll and escaping it with \', but neither gives me the desired result.

I've tried setting the textview using:

tv.setText(Html.fromHtml(news_item.getTitle()));

It doesn't seem to work, and I can't find any other solutions to this one. Your ideas are appreciated.

makapaka
  • 1
    Have you tried escaping it with `&apos;`? – initramfs Dec 14 '14 at 15:36
  • ok i tried tv.setText(Html.fromHtml(news_item.getTitle().replaceAll("'", "&apos"))); and did not work ? – makapaka Dec 14 '14 at 15:41
  • With the `;` at the end? Apologies for forgetting to add that in my initial comment. – initramfs Dec 14 '14 at 15:43
  • ok added that ; and still not working :/ – makapaka Dec 14 '14 at 15:48
  • 2
    You have to determine what the characters really are, it's likely not actual apostrophes but some similar looking character. XML escaping characters only works before you parse the XML, and if you can parse it then it doesn't need escaping. – Guffa Dec 14 '14 at 15:48
  • I think you need to encode the response as UTF-8. See [this SO answer][1] for tips. [1]: http://stackoverflow.com/questions/18951741/encoding-in-android-textview – JASON G PETERSON Dec 14 '14 at 15:49
  • First use debug and see what is inside of your parsed string ... Texview will show you what you have in your string. – Mark Dec 14 '14 at 15:50
  • how can i do that - have a look at the feed: http://rss.forexfactory.net/news/all.xml you can see in the tag , you may be right, it certainly looks like apostrophe but any idea what that is ? – makapaka Dec 14 '14 at 15:50
  • indeed, debug shows up as a question mark in the variable itself! What is that character ? – makapaka Dec 14 '14 at 15:52
  • @makapaka: Are you talking about the "...Abe's ruling..." RSS news post? If it is, then that is not an apostrophe. The character is more like the acute accent character with a Unicode value of `U+00B4`. See [here](http://unicode-table.com/en/#00B4) for more info. Whenever a Unicode character cannot be represented in a current format, you'll see either the question mark or an empty square box. In your case, you're seeing the weird question mark symbol in your `TextView`. – ChuongPham Dec 14 '14 at 16:09
  • thx @ChuongPham but how do I replace or convert ? – makapaka Dec 14 '14 at 16:16

3 Answers

2

Found it!

The mark you are looking for is called RIGHT SINGLE QUOTATION MARK, with the Unicode code point U+2019. This particular mark should be replaced via:

String.replace("’", "&rsquo;");

for proper display.

If that doesn't work, you should do a substitution from that mark to an apostrophe via:

String.replace("’", "&apos;");

or directly:

String.replace("’", "'");

to make sure the display actually displays it.


Close up of the difference between right single quotation mark vs apostrophe:

’ vs '
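
Since the two marks look almost identical on screen, it can help to print the code points of the string before deciding what to replace. The following is only a minimal sketch in plain Java; the class name `CodePointDump` and the method `describe` are made up for illustration and are not part of any library.

```java
final class CodePointDump {
    // Returns every code point of the input as "U+XXXX", separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Right single quotation mark: the fourth entry is U+2019
        System.out.println(describe("Abe\u2019s"));
        // Plain apostrophe: the fourth entry is U+0027
        System.out.println(describe("Abe's"));
    }
}
```

If the dump shows U+003F or U+FFFD instead of U+2019, the character was probably already lost while the document was being decoded, and no replace call on the string will find it.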

initramfs
  • grrr you are clearly right, but it is STILL not replacing that character at all ?? I actually copied that character from the XML and tried ALL of your versions above and none are picking it up – makapaka Dec 14 '14 at 16:12
  • @makapaka Have you tried a simple string containing that character and attempting to replace it with something else. Take it completely out of context of your program. See if it works... Because I tried in java (not android) and it works just fine... – initramfs Dec 14 '14 at 16:44
  • @makapaka Also, I noticed that the character encoding on the XML document is **not** `UTF-8` but rather `ISO-8859-1`. You need to make sure that is set up correctly before you attempt to even read the document. – initramfs Dec 14 '14 at 16:48
2

Try this:

tv.setText(news_item.getTitle().replaceAll("\u2019", "'"));

For other Unicode characters, please see this link.
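
If several typographic characters need the same treatment, the replacements can be chained in one helper. This is a rough sketch, not a complete solution: the class name `PunctuationNormalizer` is hypothetical, and the list of characters is only a guess at what a news feed might contain.

```java
final class PunctuationNormalizer {
    // Maps a handful of common typographic marks to ASCII look-alikes.
    // The selection is illustrative and far from exhaustive.
    static String toAscii(String s) {
        return s
            .replace('\u2018', '\'')   // left single quotation mark
            .replace('\u2019', '\'')   // right single quotation mark
            .replace('\u201C', '"')    // left double quotation mark
            .replace('\u201D', '"')    // right double quotation mark
            .replace('\u2013', '-')    // en dash
            .replace('\u2014', '-')    // em dash
            .replace("\u2026", "..."); // horizontal ellipsis
    }

    public static void main(String[] args) {
        System.out.println(toAscii("Abe\u2019s \u201Cplan\u201D \u2013 now\u2026"));
    }
}
```

`String.replace` is used here rather than `replaceAll` because none of the targets need regular-expression matching.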

ChuongPham
  • tried tv.setText(news_item.getTitle().replaceAll("\u00b4", "'")); still not working :( – makapaka Dec 14 '14 at 16:22
  • @makapaka: I have updated my answer. As it turned out, it's not the acute accent after all. Sorry for the delay, I had to code an Android RSS reader to test the RSS link you provided. – ChuongPham Dec 14 '14 at 16:41
  • thx so much! i wish i could award u double points ! I have awarded the response as the answer, but the only problem is now, I'm finding there are other characters that are being converted to "?" in the textview - so can you advise me on a catchall function that I could possibly use to catch everything that is going to be converted to question mark ? If you look at the RSS feed now, it contains "-" and peculiar dots that are not being translated well :S – makapaka Dec 15 '14 at 01:37
  • @makapaka: Unfortunately, there's no function that can transform all the punctuation characters so they will show up properly since the Unicode list is quite long. You may need to contact the webmaster of the RSS website and ask them what punctuation characters are allowed for posters to post with RSS feeds. Once you have a list, you can try multiple `replaceAll` methods, one after the other, to replace the Unicode punctuation characters with its corresponding hex values. For Unicode reference, see this [link](http://www.unicode.org/charts/). – ChuongPham Dec 16 '14 at 07:52
0

The documented solution will work, but it is not the right way of fixing this, as the root cause of the problem is encoding. In your case, the source's (XML document) encoding is most likely UTF-8 or some other multi-byte encoding. Your parser or consumer of the data is most likely ISO-8859-1 or ASCII. These characters (right/left apostrophes) are not part of that character set. Therefore, the correct solution is to change the encoding of your parser/processor/consumer to UTF-8.
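
To illustrate what a mismatch between the source encoding and the consumer's encoding does, here is a small sketch in plain Java. It assumes the source bytes are UTF-8, which may or may not be true for your feed; the class name `DecodeDemo` and its methods are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes the raw bytes with the charset the document was written in.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with the wrong charset, for comparison.
    static String decodeAsLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // In UTF-8, U+2019 is stored as the three bytes E2 80 99.
        byte[] raw = "Abe\u2019s".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeAsUtf8(raw).length());   // 5 characters, quote intact
        System.out.println(decodeAsLatin1(raw).length()); // 7 characters, quote split into three
    }
}
```

With the wrong charset, each of the three bytes becomes its own character, so the text no longer contains U+2019 at all and a later replace has nothing to match.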

If this is not the case, then it is probably the opposite. You have a process that writes down characters in UTF-8, but the XML's declared encoding is not compatible (e.g. ISO-8859-1).

Remember this: ALL characters in ISO-8859-1 are mapped in UTF-8, but not the other way around. So going from ISO-8859-1 to UTF-8 is not a problem. The problem arises when UTF-8 text has to pass through ISO-8859-1. Characters that are NOT in the ISO character set will show up funny on your display, either as question marks or "’
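
The one-way nature of that mapping can be shown with a short sketch in plain Java. The class name `LossyRoundTrip` and the method `throughLatin1` are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

final class LossyRoundTrip {
    // Encodes the text as ISO-8859-1 and decodes it again.
    // Characters outside ISO-8859-1 are replaced by '?' during encoding.
    static String throughLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(throughLatin1("caf\u00E9"));   // é is in ISO-8859-1 and survives
        System.out.println(throughLatin1("Abe\u2019s"));  // U+2019 is not, prints Abe?s
    }
}
```

Once the character has been turned into a question mark this way, the original cannot be recovered from the string, which is why fixing the encoding at the point of reading is preferable to patching the text afterwards.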

Nimantha
hfontanez