I access the Yummly recipe database from my Android app using this HTTP request:

http://api.yummly.com/v1/api/recipes?_app_id=MY-APP-ID&_app_key=MY-APP-KEY&q=KEYWORD

Even though their documentation states that responses to GET requests are UTF-8 encoded, I find some strange characters in the response, like: Pots de Creme a l’Orange.

The problem is not limited to my Android application; the same output appears in the Chrome browser. Funnily enough, when I tried opening it in Internet Explorer, it appeared to be OK: Pots de Creme a l’Orange, but there were other things like crème fraĂ®che, which in Chrome appears sometimes as Crème Fraîche and sometimes correctly as Crème Fraîche.

What is the difference between the browsers that makes them interpret the response in different ways? And, more importantly, what can be done in Android/Java to eliminate this issue? Do you have any ideas?

In Android I use HttpGet to fetch the data from the server and then pass it to a JSONObject.

Szymon Przedwojski
  • Did you make sure the String variable storing the response is also UTF-8 encoded? – Raghav Sood Mar 26 '13 at 18:06
  • I didn't specify that explicitly, but I followed a piece of advice on StackOverflow and used `BufferedReader reader = new BufferedReader(new InputStreamReader(content, "UTF-8"));`. On the other hand, the problem also exists in web browsers, so I don't think the Java code can do anything about it, because the issue probably arises earlier on. – Szymon Przedwojski Mar 26 '13 at 18:16
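As a side note on the comments above, here is a minimal sketch of why the reader's charset matters. It is only an illustration, not the asker's code: the class name `CharsetDemo` and its method names are hypothetical, and the sample text is an assumption. It decodes the same UTF-8 bytes once correctly and once as ISO-8859-1, which gives the kind of garbling quoted in the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the Yummly API or the asker's app.
final class CharsetDemo {

    // Reads the whole stream as text with an explicit UTF-8 decoder,
    // instead of relying on the platform default charset.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Decodes UTF-8 bytes with the wrong charset to reproduce the garbling.
    static String misreadAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample body, written with \u escapes so the source file's
        // own encoding cannot affect the result.
        byte[] body = "cr\u00e8me fra\u00eeche".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(body)));
        System.out.println(misreadAsLatin1(body));
    }
}
```

If the text is still wrong after decoding with an explicit UTF-8 reader, the bytes sent by the server are the more likely cause, which matches the observation that browsers show the same thing.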

1 Answer

I work for Yummly. There was an inconsistency in the way we handle these things, but it should be fixed now.

By way of explanation, the `&...;` syntax denotes SGML/XML/HTML entities, which are used to escape certain characters. See here for example. For users of most browsers, whether the document contains an entity or the literal character it stands for makes no difference, so we weren't thorough enough in normalizing them. But for an app such as yours, it obviously does make a difference, and we've added a more thorough normalization. Everything you get from the API should now be UTF-8 without any HTML entities.

Just for reference, Apache Commons Lang has a handy Java utility for this type of thing.
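For illustration only, here is a minimal sketch of decoding such entities with the standard library alone. It is not Yummly's normalization code. The class name `EntityDecoder`, the tiny named-entity table, and the sample strings are all assumptions. In a real app, a library routine such as `StringEscapeUtils.unescapeHtml4` from Commons Lang 3 (possibly the utility meant above) covers the full entity set.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; handles numeric entities and a few named ones.
final class EntityDecoder {

    // Matches decimal (&#8217;), hexadecimal (&#x2019;) and named (&amp;) forms.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    // Deliberately small table; the real HTML entity list is much longer.
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
        NAMED.put("quot", "\"");
        NAMED.put("rsquo", "\u2019");
        NAMED.put("egrave", "\u00e8");
        NAMED.put("icirc", "\u00ee");
    }

    // Replaces every recognised entity in a single pass; unknown ones stay as-is.
    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = resolve(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // keep the original text untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns the text between '&' and ';' into a string, or null if unknown.
    private static String resolve(String body) {
        if (body.charAt(0) != '#') {
            return NAMED.get(body);
        }
        try {
            boolean hex = body.charAt(1) == 'x' || body.charAt(1) == 'X';
            int codePoint = hex
                    ? Integer.parseInt(body.substring(2), 16)
                    : Integer.parseInt(body.substring(1), 10);
            return Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }
}
```

A call such as `EntityDecoder.decode(title)` on each string taken from the parsed JSON would then yield plain Unicode text. Because decoding happens in a single pass, an already escaped ampersand is unescaped only once.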

Vadim Geshel