I see this a lot and haven't figured out a graceful solution. If user input contains invalid byte sequences, I need to handle it without an exception being raised. For example:
# @raw_response comes from user and contains invalid UTF-8
# for example: @raw_response = "\xBF"
regex.match(@raw_response)
ArgumentError: invalid byte sequence in UTF-8
Numerous similar questions have been asked, and the consensus appears to be to encode or force-encode the string. However, neither of these works for me:
regex.match(@raw_response.force_encoding("UTF-8"))
ArgumentError: invalid byte sequence in UTF-8
or
regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
ArgumentError: invalid byte sequence in UTF-8
Is this a bug with Ruby 2.0.0 or am I missing something?
What is strange is that it appears to be encoding correctly, but match continues to raise an exception:
@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
=> #<Encoding:UTF-8>
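For reference, here is a minimal sketch of what may be going on, under a few assumptions. The regex and the input string below are hypothetical stand-ins for the ones above, and sanitize_utf8 is a made-up helper name, not part of Ruby. force_encoding only relabels the string without touching its bytes, and on Ruby 2.0.0 encoding from UTF-8 to UTF-8 appears to skip the conversion entirely, so the :invalid option never gets a chance to apply. Checking valid_encoding? rather than encoding shows the difference, and transcoding through a different encoding is one way to force the replacement:

```ruby
# Hypothetical stand-ins for the regex and @raw_response in the question.
regex = /foo/
raw_response = "foo\xBF".dup.force_encoding("UTF-8")

# The encoding label is UTF-8, but the bytes themselves are not valid UTF-8.
label_is_utf8 = raw_response.encoding == Encoding::UTF_8
bytes_are_valid = raw_response.valid_encoding?

# Hypothetical helper: transcode through UTF-16LE so that Ruby actually
# inspects the bytes and replaces the invalid ones, then come back to UTF-8.
def sanitize_utf8(str)
  str.encode("UTF-16LE", "UTF-8", invalid: :replace, replace: "?")
     .encode("UTF-8")
end

cleaned = sanitize_utf8(raw_response)
match = regex.match(cleaned)
```

With this input, cleaned should contain only valid UTF-8, so the match no longer hits the invalid byte. On Ruby 2.1 and later, String#scrub should make the round trip unnecessary.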