0

I am having a problem with users copying and pasting input which leads to non-printable characters preventing MySQL from saving records. Here's the error:

Mysql2::Error: Incorrect string value: '\xE2\x80\x8B\xE2\x80\x8B...' for column 'address' at row 1

I don't find the existing answers to similar questions satisfactory, as they seem to all require me to change the database to accept these unicode characters. But I don't want to be saving these characters! I'd much rather strip them out and keep valid ASCII characters.

What's the best/preferred way to do this? So far I have come across the following methods:

"offending string".encode(Encoding.find('ASCII'), {invalid: :replace, undef: :replace, replace: "", universal_newline: true})

This seems to work most of the time, except to compensate for nil fields I have to use a ternary expression on each field. Serious code smell!

Then there's this:

.gsub(/(\W|\d)/, "")

Which does part of what I want, but it also removes characters like quotes.

I'd prefer a setting that I can change rather than tack on method calls to my params.

Ten Bitcomb
  • 2,316
  • 1
  • 25
  • 39

1 Answers1

0

I think you're on the right track with the gsub command.

You never want to put user input directly into your database without cleaning it up. This can lead to serious security risks (just Google "SQL Injections").

Take a look at the following links: http://apidock.com/ruby/String/gsub http://www.oreillynet.com/pub/a/ruby/excerpts/ruby-learning-rails/ruby-guide-regular-expressions.html

These will let you build a regular expression to be the first argument to the gsub command, and you can leave the second argument as the empty string to strip them. You said that you did not want to use the gsub command you posted because it also strips quotes and other characters you want to keep...

Try something like:

 "offending string".gsub(/(\W|\d|^(\"|\.))/, "")

This should allow you to grab all characters that are

  1. not word characters
  2. digits
  3. not quote characters and
  4. not dot characters

And replace them with the empty string (""), which effectively strips them.

(Please check the regular expression and if it does not do what you want, consults the links I posted and build and iterate until you get one that works.)

From there you can do something like inform the user that they should re-type their input, or inform them that you will not accept numbers or special characters or whatever...

Hope this helps.

Jack
  • 100
  • 1
  • 4