I'd use a two-stage approach:
- Rule out strings containing non-Latin characters by attempting to encode the string as Latin-1 (ISO-8859-1).
- Test for accented characters with a regular expression.
Example:
def is_accented_latin?(test_string)
test_string.encode("ISO-8859-1") # just to see if it raises an exception
test_string.match(/[ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöùúûüýþÿ]/)
rescue Encoding::UndefinedConversionError
false
end
I strongly suggest you select for yourself the accented characters you're attempting to screen for, rather than just copying what I've written; I certainly may have missed some. Also note that this will always return false
for strings containing non-Latin characters, even if the string also contains a Latin character with an accent.