4

I have the following header:

From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao@example.com.br>

I can easily split out the stuff before the <, which leaves me with

"=?iso-8859-1?Q?Marta_Falc=E3o?="

What can I use to turn this into "Marta Falcão"?

Stefan
James A. Rosen
  • The header is encoded using the scheme from RFC 2047. Maybe that helps as a search term. – Roland Illig Sep 20 '11 at 17:18
  • It did indeed! Not only is there a gem for that, https://github.com/ConradIrwin/rfc2047-ruby/, searching the TMail source for 2047 revealed a method I could use without adding a new dependency. Well done, @RolandIllig :) – James A. Rosen Sep 20 '11 at 17:27

3 Answers

9

Using the newer Mail gem:

Mail::Encodings.value_decode(str) or Mail::Encodings.unquote_and_convert_to(str, to_encoding)

Omri Sivan
  • FYI, there is an encoding bug in Mail::Encodings#value_decode: https://github.com/mikel/mail/issues/1397. My answer can help if you have encountered this bug. – Weihang Jian Oct 18 '20 at 17:09
3

Thanks to Roland Illig for his comment, which led me to two options:

  1. install rfc2047-ruby and call Rfc2047.decode(header)
  2. install TMail and call TMail::Unquoter.unquote_and_convert_to(header, 'utf-8'), or better yet TMail::Address.parse(header).friendly, which also strips out the <email address> part
James A. Rosen
  • First one worked fine for me! Added gem 'rfc2047' to my gem list, restarted, and called the method as described. Worked like a charm. Many thanks! – Tim Oct 09 '15 at 13:23
2

Implementing RFC 2047 in Ruby isn't hard:

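module Rfc2047
  # Characters allowed in the charset token (RFC 2047, section 2).
  TOKEN = /[\041\043-\047\052\053\055\060-\071\101-\132\134\136\137\141-\176]+/.freeze
  # Printable ASCII other than "?" and space.
  ENCODED_TEXT = /[\041-\076\100-\176]+/.freeze
  ENCODED_WORD = /=\?(?<charset>#{TOKEN})\?(?<encoding>[QB])\?(?<encoded_text>#{ENCODED_TEXT})\?=/i.freeze

  class << self
    # Wraps the input in a single Base64 ("B") encoded-word.
    def encode(input)
      "=?#{input.encoding}?B?#{[input].pack('m0')}?="
    end

    # Decodes the first encoded-word found in input and returns it as UTF-8.
    def decode(input)
      match_data = ENCODED_WORD.match(input)
      raise ArgumentError, 'no encoded-word found' if match_data.nil?

      charset, encoding, encoded_text = match_data.captures
      decoded =
        case encoding
        # In the Q encoding "_" stands for a space (RFC 2047, section 4.2).
        when 'Q', 'q' then encoded_text.tr('_', ' ').unpack1('M')
        when 'B', 'b' then encoded_text.unpack1('m')
        end
      # Tag the raw bytes with the declared charset, then transcode.
      decoded.force_encoding(charset).encode('UTF-8')
    end
  end
end
Rfc2047.decode '=?iso-8859-1?Q?Marta_Falc=E3o?=' # => "Marta Falcão"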
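The module above only handles a single encoded-word. If you want to decode a whole header line in place, something along these lines might work. This is a rough sketch using only the standard library, not what any of the gems mentioned here actually do: the module name HeaderDecoder and its simplified regex are my own assumptions, and it does not validate the charset (an unknown one will raise ArgumentError).

```ruby
# Hypothetical helper, not part of any gem mentioned in this thread.
module HeaderDecoder
  # Simplified encoded-word pattern: =?charset?Q-or-B?text?=
  ENCODED_WORD = /=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=/

  # Whitespace between two adjacent encoded-words is dropped
  # when displaying them (RFC 2047, section 6.2).
  BETWEEN_WORDS = /(\?=)\s+(?==\?)/

  # Replaces every encoded-word in header with its decoded text.
  def self.decode(header, target = 'UTF-8')
    header.gsub(BETWEEN_WORDS, '\1').gsub(ENCODED_WORD) do
      charset, encoding, text = $1, $2, $3
      bytes =
        if encoding.casecmp?('Q')
          # "_" stands for a space in the Q encoding.
          text.tr('_', ' ').unpack1('M')
        else
          text.unpack1('m')
        end
      # Tag the raw bytes with the declared charset, then transcode.
      bytes.force_encoding(charset)
           .encode(target, invalid: :replace, undef: :replace)
    end
  end
end

puts HeaderDecoder.decode(
  'From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao@example.com.br>'
)
```

Text outside the encoded-words, such as the <email address> part, is passed through untouched, so you don't have to split the header first.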

Update

mikel/mail currently has an encoding issue, so it might not decode the string correctly.

If that really bothers you, you can try new_rfc_2047:

$ gem install new_rfc_2047
$ ruby -rrfc_2047 -e 'puts Rfc2047.decode "From: =?iso-8859-1?Q?Marta_Falc=E3o?= <marta.falcao@example.com.br>"'
From: Marta Falcão <marta.falcao@example.com.br>

The mikel/mail source code was a little too complicated for me to patch, so I made my own gem for this instead.

Gem source is here: https://github.com/tonytonyjan/rfc_2047/

Weihang Jian