M2Crypto RSA Unicode Strings Encoded Differently Than Byte Strings

Question

I was encrypting email addresses for an external website's API using Python M2Crypto's RSA with PKCS1 padding. When I passed a unicode object, the API returned no results for the encrypted email, but when I passed str(unicode_email), I received the correct information.

I was under the impression that both unicode and byte representations of a string should have worked in this case. Does anyone know why the unicode fails?

Code for reference:

from M2Crypto import RSA
email = u'email@example.com'  # passing this unicode object fails
email = str(email)  # converting to a byte string first succeeds
rsa = RSA.load_pub_key('rsa_pubkey.pem')
result = rsa.public_encrypt(email, RSA.pkcs1_padding).encode('base64')

characters are typically 1 byte wide .... unicode is typically 2 bytes wide... (at least afaik...(OS specific?)) and Im pretty sure that it is expecting 1 byte wide characters — Joran Beasley, Sep 04 '12 at 18:39
@JoranBeasley Please read [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html). Unicode is not a character encoding and is not concerned with bytes, though as a matter of fact you cannot fit all Unicode code points into 16 bits because there are more than 2^16 of them. — delnan, Sep 04 '12 at 19:20
@delnan thanks ... bookmarked to read later.. but yeah ok Point conceded :) — Joran Beasley, Sep 04 '12 at 20:11

score 1 · Accepted Answer · answered Sep 04 '12 at 18:52

The M2Crypto module deals exclusively with opaque bytes, which are values between 0 and 255, represented in Python 2 by the str type.

The Python 2.x str type consists of such bytes, but the unicode type is a different beast altogether: it holds text as code points, not bytes. You can easily convert between the two by using the .encode() method of unicode (text to bytes) and its mirror, the .decode() method of str (bytes to text).
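As a small illustration of the two methods (a sketch that is not part of the original answer; the sample addresses and the choice of UTF-8 are assumptions), encoding turns text into bytes and decoding reverses it:

```python
# Sketch: converting between text and bytes with an explicit codec.
# The u'' prefix is accepted by Python 2 and by Python 3.3+.

text = u'email@example.com'        # text (the unicode type in Python 2)
raw = text.encode('utf-8')         # bytes (the str type in Python 2)
round_trip = raw.decode('utf-8')   # back to text

# Pure ASCII text encodes to one byte per character in UTF-8 ...
ascii_len = len(raw)

# ... but a non-ASCII character takes more than one byte.
accented = u'jos\u00e9@example.com'
accented_raw = accented.encode('utf-8')
```

The byte string `raw` is the kind of value a bytes-only API can accept; `text` is not.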

When you call str() on a unicode object, Python makes the conversion by applying the default encoding; in essence it calls email.encode(sys.getdefaultencoding()), and that default is ASCII in Python 2 unless it has been changed. That's fine for your all-ASCII email address, but you're bound to run into UnicodeEncodeError exceptions with anything else. It is better to stick to the explicit methods only.
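The failure mode can be sketched as follows. This is an illustration only: `to_bytes` is a hypothetical helper, not part of M2Crypto, and the implicit str() conversion is emulated with an explicit .encode('ascii') so that the snippet behaves the same on Python 2 and Python 3. Whether the remote API expects UTF-8 is an assumption you would need to check.

```python
# Sketch: the conversion str(unicode_obj) performs in Python 2, made explicit.
# `to_bytes` is a hypothetical helper, not an M2Crypto function.

def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding text with an explicit codec."""
    if isinstance(value, bytes):
        return value  # already bytes (a Python 2 str lands here)
    return value.encode(encoding)

ascii_email = u'email@example.com'
accented_email = u'jos\u00e9@example.com'

# With the ASCII default, str(ascii_email) in Python 2 amounts to this:
implicit = ascii_email.encode('ascii')

# The same conversion fails as soon as a non-ASCII character appears.
try:
    accented_email.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# Choosing the codec explicitly works for both addresses.
payload = to_bytes(accented_email)
```

A byte string such as `payload` is what would then be passed to rsa.public_encrypt(payload, RSA.pkcs1_padding) in the question's code.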

Note that you probably have to set the encoding you used on the MIME headers of the email you send.

I strongly recommend you read up on all this in the Python Unicode HOWTO.
