I'd guess that "\xBF" already thinks it is encoded in UTF-8, so when you call encode, it thinks you're trying to encode a UTF-8 string in UTF-8 and does nothing:
>> s = "\xBF"
=> "\xBF"
>> s.encoding
=> #<Encoding:UTF-8>
A lone \xBF isn't valid UTF-8, so this is, of course, nonsense. But consider the three-argument form of encode:
encode(dst_encoding, src_encoding [, options] ) → str
[...] The second form returns a copy of str transcoded from src_encoding to dst_encoding.
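If you want to confirm the mismatch before forcing anything, a quick check along these lines may help. It is only a sketch: it builds the string with force_encoding so the result does not depend on the source file's encoding, which is an assumption about how your data arrives.

```ruby
# Build the single byte 0xBF and label it UTF-8, mirroring the irb session.
s = "\xBF".dup.force_encoding('UTF-8')

tagged = s.encoding         # the label the string carries
valid  = s.valid_encoding?  # whether the bytes actually fit that label
bytes  = s.bytes            # the raw bytes, independent of the label

puts "encoding: #{tagged}, valid: #{valid}, bytes: #{bytes.inspect}"
```

The label says UTF-8 while valid_encoding? says the bytes do not fit it, which is the situation described above.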
You can force the issue by telling encode to ignore what the string thinks its encoding is and treat it as binary data:
>> foo = s.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace)
=> "�"
Where s is the "\xBF" that thinks it is UTF-8 from above.
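The three-argument call can be wrapped in a small helper if you need it in more than one place. The name binary_to_utf8 is made up for this sketch; the encode call itself is the one shown above.

```ruby
# Hypothetical helper (the name is illustrative): treat the input as raw
# bytes and transcode to UTF-8, replacing anything that cannot be converted.
def binary_to_utf8(str)
  str.encode('UTF-8', 'binary', invalid: :replace, undef: :replace)
end

lone  = binary_to_utf8("\xBF")         # the stray byte from the question
mixed = binary_to_utf8("caf\xC3\xA9")  # "café" written as UTF-8 bytes

puts lone.valid_encoding?  # the result is well-formed UTF-8
puts mixed.inspect
```

One thing to watch for: because the source is declared as binary, every byte above 0x7F counts as undefined, so the two bytes of the valid é in the second call are replaced as well. This approach suits data you are happy to reduce to its ASCII portion.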
You could also use force_encoding on s to force it to be binary and then use the two-argument encode:
>> s.encoding
=> #<Encoding:UTF-8>
>> s.force_encoding('binary')
=> "\xBF"
>> s.encoding
=> #<Encoding:ASCII-8BIT>
>> foo = s.encode('utf-8', :invalid => :replace, :undef => :replace)
=> "�"
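Note that force_encoding relabels the receiver in place, as the second s.encoding in the session shows. If other code still holds a reference to s, a variant that works on a copy may be safer. This is a sketch, and the variable names are mine:

```ruby
original = "\xBF".dup.force_encoding('UTF-8')

# dup first, so force_encoding relabels the copy rather than the original.
cleaned = original.dup
                  .force_encoding('binary')
                  .encode('UTF-8', invalid: :replace, undef: :replace)

puts original.encoding  # the original keeps the label it started with
puts cleaned.encoding
```

The original string is left untouched, bytes and label both, while cleaned holds the transcoded result.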