I'm pulling some RSS feeds in from YouTube which have invalid UTF8. I can create a similar ruby string using
bad_utf8 = "\u{61B36}"
bad_utf8.encoding # => #<Encoding:UTF-8>
bad_utf8.valid_encoding? # => true
Ruby thinks this is a valid UTF-8 encoding and I'm pretty sure it isn't.
When talking to Mysql I get an error like so
require 'mysql2'
client = Mysql2::Client.new(:host => "localhost", :username => "root")
client.query("use test");
bad_utf8 = "\u{61B36}"
client.query("INSERT INTO utf8 VALUES ('#{moo}')")
# Incorrect string value: '\xF1\xA1\xAC\xB6' for column 'string' at row 1 (Mysql2::Error)
How can I detect or fix up these invalid types of encodings before I send them off to MySQL?