
I just found that two kinds of cedilla are coming from my PostgreSQL database into my Ruby code.


Both are displayed the same way in my website and web app. My only problem is that when I compare the strings, they are not equal.

What's the best Ruby way to replace the second one with the first one? My whole website and database are UTF-8. I already use a custom method to replace "non-Latin" characters, and I did things like this for the space character, for example:

    # various kinds of space characters
    "\xc2\xa0"     => " ",
    "\xe2\x80\x80" => " ",
    "\xe2\x80\x81" => " ",
    "\xe2\x80\x82" => " ",

Is there a byte sequence or ASCII-style code like this for the second cedilla?

EDIT: In fact, I found the cedilla difference while comparing strings in a Ruby loop. All I wanted was to list some strings and break out of the loop when the next string is different. My problem is that Ruby considers the strings different even if they look the same when displayed. Is there any workaround?

In this example, the 2 strings starting with "uniq_" look the same when displayed, but when I compare them with the "!=" operator, Ruby treats them as different because of the encoding issue. Is there a way to get around that?
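
A minimal sketch of the kind of comparison described above, assuming the two cedillas are the precomposed "ç" (U+00E7) and a plain "c" followed by the combining cedilla (U+0327). The screenshots are not available here, so that is a guess, and the sample strings and the helper names `same_text?` and `leading_run` are made up for illustration. It relies on `String#unicode_normalize` from Ruby's standard library (Ruby 2.2+).

```ruby
# Assumption: the two "cedillas" are the precomposed and decomposed
# Unicode forms of the same letter. The sample data is hypothetical.
composed   = "uniq_fran\u00E7ais"   # "ç" as a single code point
decomposed = "uniq_franc\u0327ais"  # "c" + combining cedilla

# Hypothetical helper: compare two strings after normalizing both to NFC.
def same_text?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

# Hypothetical helper: collect strings from the start of the list and
# stop at the first one that differs from the first string.
def leading_run(strings)
  return [] if strings.empty?

  first = strings.first
  strings.take_while { |s| same_text?(first, s) }
end

puts composed == decomposed            # false: the code points differ
puts same_text?(composed, decomposed)  # true: equal once normalized

rows = [composed, decomposed, "uniq_other"]
puts leading_run(rows).length          # 2
```

Normalizing once when the data is written to or read from the database, rather than at every comparison, may be simpler if the same strings are compared in many places.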

alex.bour
  • This is a lost cause, you really can't de-UTF-8 strings without having a very exhaustive mapping table. Do you realize how many variants of "-" there are? There's also [a whole whack of space characters](https://stackoverflow.com/questions/2227921/simplest-way-to-get-a-complete-list-of-all-the-utf-8-whitespace-characters-in-ph) and more could be introduced or discovered later. – tadman Apr 09 '20 at 14:35
  • What's the objective here? If the string is showing up incorrectly that's something you should fix by identifying the root cause, not stripping accents. In some languages removing an accent can change the meaning of the word dramatically. – tadman Apr 09 '20 at 14:36
  • First thing to do here is to dump out the bytes for the first and second forms to see what the byte-level difference is. – tadman Apr 09 '20 at 14:37
  • Maybe it's worth trying some form of [unicode_normalize](https://ruby-doc.org/core-2.6/String.html#method-i-unicode_normalize) – steenslag Apr 09 '20 at 17:51
  • Hello tadman and steenslag. I added more detail about my goal. As I wrote, I have no display issues. It's just a comparison issue. – alex.bour Apr 09 '20 at 18:47
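
The byte dump suggested in the comments could look roughly like the sketch below. The two sample strings are assumptions (a precomposed "ç" and a "c" followed by the combining cedilla U+0327), since the real values from the database are not shown, and `hex_bytes` and `codepoint_list` are hypothetical helper names.

```ruby
# Assumed sample values; replace them with the two strings from the database.
composed   = "\u00E7"   # precomposed "ç"
decomposed = "c\u0327"  # "c" + combining cedilla

# Hypothetical helper: the bytes of a string as space-separated hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# Hypothetical helper: the code points of a string in U+XXXX notation.
def codepoint_list(str)
  str.codepoints.map { |c| format("U+%04X", c) }.join(" ")
end

[composed, decomposed].each do |s|
  puts "#{s}: bytes=#{hex_bytes(s)} codepoints=#{codepoint_list(s)}"
end
# The first line shows bytes c3 a7 (U+00E7),
# the second shows bytes 63 cc a7 (U+0063 U+0327).

# Normalizing to NFC turns the decomposed form into the precomposed one.
puts decomposed.unicode_normalize(:nfc) == composed  # true
```

If the dump does show this pair, the entry matching the table in the question would map "c\xcc\xa7" to "\xc3\xa7", although `unicode_normalize` handles every accented letter at once instead of one table entry per character.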
