
I have a model that uses FriendlyId to generate its slug:

extend FriendlyId
friendly_id :slug_candidates, :use => :scoped, :scope => :account

def slug_candidates
  :title_and_sequence
end


def title_and_sequence
  slug = normalize_friendly_id(title)
      :
  # some logic to add a sequence in case of collision
      :
end

My problem is that when the title contains non-Latin characters (Arabic, Hebrew, ...) I get an empty slug. Is there a nice and easy solution?


UPDATE

Just to make my question clear, I would like to have the same behaviour as WordPress, which means:

+--------------------+----------------------------------------------------+
| Title              | url                                                |
+--------------------+----------------------------------------------------+
| Hello World!!      | /hello-world                                       |
+--------------------+----------------------------------------------------+
| Helló Világ        | /hello-vilag                                       |
+--------------------+----------------------------------------------------+
| שלום עולם          | /%D7%A9%D7%9C%D7%95%D7%9D-%D7%A2%D7%95%D7%9C%D7%9D |
+--------------------+----------------------------------------------------+
| مرحبا              | /%D9%85%D8%B1%D8%AD%D8%A8%D8%A7                    |
+--------------------+----------------------------------------------------+

(Modern browsers display the percent-encoded Arabic and Hebrew URLs as the original, readable characters.)

Wai Ha Lee
guyaloni
  • You need to use something to transliterate the non-latin alphabets to latin. Did you take a look at `String#parameterize` (it's added by Rails) – Mike Szyndel Nov 09 '15 at 14:45

2 Answers


There's a Rails API method for that: `transliterate`.

Example use:

transliterate('Ærøskøbing')
# => "AEroskobing"

By default it only supports Latin-based languages and Russian, but you should be able to find rules for other alphabets as well (as explained in the linked doc).

EDIT
To achieve the same behaviour as WordPress, you can simply use URL encoding, as in the example below:

URI::encode('שלום') # => "%D7%A9%D7%9C%D7%95%D7%9D"
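
Note that `URI::encode` was removed in Ruby 3.0. A minimal sketch of the same call on a current Ruby, assuming only the standard library: `URI.encode_www_form_component` and `ERB::Util.url_encode` both percent-encode the UTF-8 bytes of non-ASCII characters.

```ruby
require 'uri'
require 'erb'

title = 'שלום'

# Both calls percent-encode the UTF-8 bytes of each non-ASCII character.
form_encoded = URI.encode_www_form_component(title)
erb_encoded  = ERB::Util.url_encode(title)

puts form_encoded # => %D7%A9%D7%9C%D7%95%D7%9D
puts erb_encoded  # => %D7%A9%D7%9C%D7%95%D7%9D
```

The two differ on spaces: `URI.encode_www_form_component` turns a space into `+`, while `ERB::Util.url_encode` produces `%20`, so the latter is the safer choice if the string may still contain spaces.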
Mike Szyndel
  • Please check my update. If I understand your idea, I need to translate each character to its unicode using `transliterate`? – guyaloni Nov 09 '15 at 20:30
  • @guyaloni Well, to do what you wrote in your update you can just use URL encoding like this: `URI::encode('שלום') => "%D7%A9%D7%9C%D7%95%D7%9D"`, no transliterate required. Transliterate would allow you to replace characters from other alphabets with Latin characters. – Mike Szyndel Nov 09 '15 at 20:37
  • That's right... so again, in simple words: how can I achieve the same behavior as WordPress? – guyaloni Nov 09 '15 at 20:49
  • Not completely... WordPress DOES convert accented Latin characters into unaccented ones (`Helló Világ -> /hello-vilag`), which won't happen in this case. – guyaloni Nov 09 '15 at 20:56
  • Well, I think you should look into the WordPress code then, but I think it's going to be hacky... It looks like they are transliterating extended ASCII but encoding UTF-8. – Mike Szyndel Nov 10 '15 at 10:39
  • So I rolled up my sleeves and dug into this fine piece of code. The title-to-slug conversion seems to happen here. I looked at the code and well... have fun! https://github.com/WordPress/WordPress/blob/master/wp-includes/formatting.php#L1515 – Mike Szyndel Nov 10 '15 at 10:50
  • Here you go: http://norman.github.io/friendly_id/FriendlyId/Slugged.html#normalize_friendly_id-instance_method If you redefine this method in your model and don't call super inside it, it will return any slug you want :) – Mike Szyndel Nov 10 '15 at 15:56

Thanks to @michalszyndel's notes and ideas, I arrived at the following solution. I hope it will be helpful to others.

First, how to keep non-Latin characters in the slug by percent-encoding them:

extend FriendlyId
friendly_id :slug_candidates, :use => :scoped, :scope => :account

def slug_candidates
  :title_and_sequence
end

def title_and_sequence
  # Replace every Hebrew letter with its percent-encoded UTF-8 bytes
  title_unicode = heb_to_unicode(title)

  slug = normalize_friendly_id(title_unicode)
      :
  # some logic to add a sequence in case of collision
  # and whatever you need from your slug
      :
end

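def heb_to_unicode(str)
  heb_chars = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת'
  # Map each Hebrew letter to its percent-encoded UTF-8 bytes.
  # URI.encode_www_form_component stands in for URI::encode,
  # which was removed in Ruby 3.0.
  heb_map = heb_chars.each_char.to_h { |c| [c, URI.encode_www_form_component(c)] }
  # Regex that matches any single Hebrew letter from the list above
  heb_re = Regexp.union(heb_map.keys)

  # Replace each matched letter with its encoded form from the map
  str.gsub(heb_re, heb_map)
end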
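
The hard-coded alphabet above covers Hebrew only. A sketch of a more general variant, assuming Ruby's Unicode script properties (`\p{Hebrew}`, `\p{Arabic}`) are an acceptable way to pick the characters to encode. `encode_non_latin` is a hypothetical name, and `URI.encode_www_form_component` stands in for `URI::encode`:

```ruby
require 'uri'

# Hypothetical helper: percent-encode every Hebrew or Arabic character,
# leaving Latin text (accented or not) for the transliteration step.
NON_LATIN_RE = /[\p{Hebrew}\p{Arabic}]/

def encode_non_latin(str)
  str.gsub(NON_LATIN_RE) { |char| URI.encode_www_form_component(char) }
end

puts encode_non_latin('שלום')   # => %D7%A9%D7%9C%D7%95%D7%9D
puts encode_non_latin('مرحبا')  # => %D9%85%D8%B1%D8%AD%D8%A8%D8%A7
puts encode_non_latin('Helló')  # => Helló
```

Other scripts can be added to the character class in the same way, for example `\p{Cyrillic}`.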

I also needed to override normalize_friendly_id so that it does not strip the % characters.
I simply took the code of the parameterize method and added % to the regex:

def normalize_friendly_id(string)
  # replace accented chars with their ascii equivalents
  parameterized_string = I18n.transliterate(string)

  sep = '-'

  # Turn unwanted chars into the separator
  # We permit % so that percent-encoded characters survive in the slug
  parameterized_string.gsub!(/[^a-zA-Z0-9\-_\%]+/, sep)
  unless sep.nil? || sep.empty?
    re_sep = Regexp.escape(sep)
    # No more than one of the separator in a row.
    parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
    # Remove leading/trailing separator.
    parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/, '')
  end
  parameterized_string.downcase
end
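
For comparison, the whole pipeline can be sketched without Rails or FriendlyId. This is an illustration under stated assumptions, not WordPress's actual algorithm: `wordpress_like_slug` is a hypothetical name, accents are stripped via Unicode NFD decomposition instead of `I18n.transliterate`, and every remaining non-ASCII character is percent-encoded.

```ruby
require 'uri'

# Hypothetical helper, not part of FriendlyId, Rails, or WordPress.
def wordpress_like_slug(title)
  # Decompose accented letters and drop the combining marks (ó -> o).
  # Caveat: this also drops Hebrew vowel points and Arabic diacritics.
  folded = title.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')

  # Percent-encode each remaining non-ASCII character.
  encoded = folded.gsub(/[^[:ascii:]]/) { |c| URI.encode_www_form_component(c) }

  encoded.gsub(/[^a-zA-Z0-9\-_%]+/, '-') # unwanted chars become the separator
         .gsub(/-{2,}/, '-')             # collapse repeated separators
         .gsub(/\A-|-\z/, '')            # trim leading/trailing separator
         .downcase
end

puts wordpress_like_slug('Hello World!!') # => hello-world
puts wordpress_like_slug('Helló Világ')   # => hello-vilag
puts wordpress_like_slug('שלום עולם')     # => %d7%a9%d7%9c%d7%95%d7%9d-%d7%a2%d7%95%d7%9c%d7%9d
```

The hex digits come out lowercase because of the final `downcase`; in percent-encoding, uppercase and lowercase hex digits are equivalent.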

Now if I save a model with the title שלום, its slug is saved as %d7%a9%d7%9c%d7%95%d7%9d (lowercase, because normalize_friendly_id ends with downcase).
In order to find the instance using the friendly method, I need to encode and downcase the id first:

# URI.encode_www_form_component stands in for URI::encode (removed in Ruby 3.0)
id = URI.encode_www_form_component(params[:id]).downcase
Page.friendly.find(id)
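
The round trip behind this lookup can be sketched in plain Ruby, assuming the framework hands the controller an already-decoded `params[:id]` (which is what the re-encoding step implies). `lookup_key` is a hypothetical helper name:

```ruby
require 'uri'

# Hypothetical helper: rebuild the stored slug from a decoded URL parameter.
def lookup_key(decoded_param)
  URI.encode_www_form_component(decoded_param).downcase
end

# What the browser sends in the path...
requested = '%D7%A9%D7%9C%D7%95%D7%9D'
# ...and what the controller would see after decoding (assumption).
decoded = URI.decode_www_form_component(requested)

puts decoded                   # => שלום
puts lookup_key(decoded)       # => %d7%a9%d7%9c%d7%95%d7%9d
puts lookup_key('hello-world') # => hello-world
```

Plain ASCII slugs pass through unchanged, so the same lookup works for Latin and non-Latin titles.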
guyaloni