I'm making an API request with Faraday in Ruby and parsing the response with JSON.parse. The problem is that the JSON response contains strings such as Longitud de la estaci\u00F3n meteorol\u00F3gica (grados), when it should be Longitud de la estación meteorológica (grados).

Is there a way to parse this properly? I have connection = Faraday.new(my_site), and connection.get.body.encoding returns #<Encoding:ASCII-8BIT>. But when I try connection.get.body.force_encoding('ASCII-8BIT').force_encoding('UTF-8') or connection.get.body.force_encoding('ASCII-8BIT').encode('UTF-8'), I get 'encode': "\xF3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError).

Thanks a lot in advance!

noloman
  • Can you post the String#bytes of the relevant substring, such as the "estación" part, of the JSON string? I can't tell whether "\u00F3" is a 6-byte ASCII string or a single 8-bit byte... – Masa Sakano Sep 17 '18 at 01:18

1 Answer

Try this:

connection.get.body.force_encoding('ISO-8859-1').encode('UTF-8')

I don't know Faraday, but judging from the Encoding::UndefinedConversionError on "\xF3", the body is probably encoded in ISO-8859-1. I am assuming connection.get.body returns a normal String instance or its equivalent.
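To make the suggestion above a little safer, here is a minimal sketch that only falls back to ISO-8859-1 when the bytes are not already valid UTF-8. The helper name to_utf8 is hypothetical, the literal body is a stand-in for connection.get.body, and the whole thing assumes the server sends either UTF-8 or ISO-8859-1, which I cannot verify for your API.

```ruby
require 'json'

# Hypothetical helper (not part of Faraday or Ruby core).
# Assumption: the raw bytes are either valid UTF-8 or ISO-8859-1.
def to_utf8(raw)
  # force_encoding only relabels the bytes; it never changes them.
  as_utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  # Otherwise treat the bytes as ISO-8859-1 and transcode to UTF-8.
  raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Stand-in for connection.get.body: Latin-1 bytes tagged as ASCII-8BIT.
body = "{\"name\":\"estaci\xF3n\"}".b

parsed = JSON.parse(to_utf8(body))
puts parsed['name']
```

If the response carries a Content-Type header with a charset, using that value instead of guessing would be more reliable.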

Background

As the official documentation (Ruby 2.5.1) states, you should not try to convert ASCII-8BIT to any other encoding:

Encoding::ASCII_8BIT is a special encoding that is usually used for a byte string, not a character string.

The so-called extended ASCII, which adds accented letters and other symbols to the basic alphabet, is usually ISO-8859-1, though other encodings exist. Indeed, the codepoint of ó (o with an acute accent) is \xF3 in ISO-8859-1. Here is a code snippet to demonstrate it:

"\xf3".force_encoding('ISO-8859-1').encode('UTF-8')
  # => "ó"
"\xf3".force_encoding('ASCII-8BIT').encode('UTF-8')
  # => Encoding::UndefinedConversionError 
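Looking at the bytes may make the difference clearer. In this small sketch (the variable names are only for illustration), force_encoding leaves the single byte untouched, while encode rewrites it as the two-byte UTF-8 sequence for ó:

```ruby
raw    = "\xF3".b                              # one byte, tagged ASCII-8BIT
latin1 = raw.dup.force_encoding('ISO-8859-1')  # same byte, new label
utf8   = latin1.encode('UTF-8')                # transcoded to UTF-8

p raw.bytes     # => [243]
p latin1.bytes  # => [243]
p utf8.bytes    # => [195, 179]
```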

This past answer explains it in a bit more detail.

Masa Sakano