5

I'm trying to figure out if my dev environment is somehow screwed up, since "it works on [my colleague's] computer" but not mine. Instead of tackling the 'meat' of the problem, I'm working on the first funny thing I've spotted.

I have a bit of code where I can't make sense of why one call works and the other doesn't:

import sys
import zmq

if __name__ == "__main__":
    print sys.getdefaultencoding()  # Displays 'ascii'

    zContext = zmq.Context()
    zSocket = zContext.socket(zmq.SUB)

    # This works.
    zSocket.setsockopt_string( zmq.SUBSCRIBE, "Hello".decode('ascii'))

    # This fails with error... why?
    # raise TypeError("unicode strings only")
    #
    # Shouldn't the default encoding for "Hello" be ascii?
    # zSocket.setsockopt_string( zmq.SUBSCRIBE, "Hello")

    zSocket.connect( "tcp://localhost:5000")

I'm assuming that in the working call to setsockopt_string I am passing an array of ASCII characters. In the broken call, I must be sending something that is neither ASCII nor unicode. How can I find out what is actually being passed to setsockopt_string?

Maybe these aren't even the right questions to ask. I'm just rather confused.

Any help would be great.

Here's my environment:

python --version
Python 2.7.3
#1 SMP Debian 3.2.57-3+deb7u2 x86_64 GNU/Linux

thanks.

Bitdiot
  • `'hello'` is a string of bytes. When you decode it, you turn it into a unicode string (`u'hello'`), which is text. `setsockopt_string` will then encode your unicode `u'hello'` *back* into bytes (`'hello'`) and pass it into `setsockopt`. – Blender Aug 07 '14 at 00:20
  • Hello. Thanks for the answer. Can you post it as an answer so I can give you the checkmark? Also, I need some clarification: decode was passed 'ascii', so why would decode turn 'hello' into u'hello'? Aren't u'hello' and 'hello' the same-looking bytes? I guess not. Also, are you saying that in the failed call I am passing the already-converted input to setsockopt_string (the result of what setsockopt_string would have produced)? – Bitdiot Aug 07 '14 at 00:33
  • `u'hello'` is text. `'hello'` is bytes. They look the same because ASCII characters' unicode code points were made to be the same. But if you were to encode `u'привет'` into bytes using UTF-8, you'd get `'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'`. – Blender Aug 07 '14 at 00:49
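
To see what is actually being passed, it can help to print the type and `repr` of each value. The following is a minimal sketch, assuming only the standard library (no ZeroMQ needed); the variable names are illustrative and not from the original code. It uses `b"..."` and `u"..."` literals so that it behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

raw = b"Hello"              # bytes: what a plain "Hello" literal is on Python 2
text = raw.decode("ascii")  # unicode text, i.e. u"Hello"

# The two values print alike but have different types.
print(type(raw).__name__, repr(raw))
print(type(text).__name__, repr(text))

# Encoding the text with the same codec gives back the original bytes.
round_trip = text.encode("ascii")

# Non-ASCII text makes the difference visible: six characters of text
# become twelve bytes once encoded as UTF-8.
greeting = u"\u043f\u0440\u0438\u0432\u0435\u0442"  # u'привет'
encoded = greeting.encode("utf-8")
print(len(greeting), len(encoded))
```

The default encoding reported by `sys.getdefaultencoding()` does not change the type of the literal: on Python 2, `"Hello"` is still a byte string until it is explicitly decoded.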

2 Answers

5

The error occurs because setsockopt_string accepts only unicode text, while a plain "Hello" in Python 2 is a byte string (str). A quick fix is to pass the bytes directly to setsockopt instead:

zSocket.setsockopt(zmq.SUBSCRIBE, b"Hello")

This passes bytes instead of text and works on both Python 2 and 3. If you need to build the bytes from a string on Python 3, you can write it this way:

zSocket.setsockopt(
    zmq.SUBSCRIBE,
    bytes("Hello", encoding="latin-1")
)

Note that the bytes(..., encoding=...) form is Python 3 only.
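
As a hedged illustration of the type check behind the `TypeError`, the sketch below uses a stand-in function (`fake_setsockopt_string`, hypothetical, not pyzmq's code) that mirrors the behavior described in the comments on the question: reject anything that is not unicode text, then encode it to bytes. `to_topic_text` is likewise a hypothetical helper that decodes byte strings before the call. It needs no ZeroMQ install and is written to run on both Python 2.7 and Python 3.

```python
TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def to_topic_text(topic, encoding="ascii"):
    """Hypothetical helper: return the topic as unicode text."""
    if isinstance(topic, TEXT_TYPE):
        return topic
    return topic.decode(encoding)  # byte string -> text


def fake_setsockopt_string(optval, encoding="utf-8"):
    """Stand-in for the described behavior; NOT the real pyzmq implementation."""
    if not isinstance(optval, TEXT_TYPE):
        raise TypeError("unicode strings only")
    # These are the bytes that would be handed on to setsockopt.
    return optval.encode(encoding)


topic_bytes = fake_setsockopt_string(to_topic_text(b"Hello"))

# With a real socket, the equivalent call would be, for example:
# zSocket.setsockopt_string(zmq.SUBSCRIBE, to_topic_text(b"Hello"))
```

Passing `b"Hello"` straight to the stand-in raises the same `TypeError` as in the question, which is why the value has to be decoded first or passed as bytes to `setsockopt`.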

HXH
  • bytes is an alias for str in Python 2 so you can't pass in an encoding. unicode("Hello", encoding="latin-1") does seem to work. – Charles Plager Mar 24 '16 at 16:33
0

Since the subscription is created by the other side, its name may be Unicode. Converting the names to ASCII is one option, but you can also adapt your receiver to accept a Unicode subscription name:

socket.setsockopt_string(zmq.SUBSCRIBE, topicfilter, encoding='utf-8')
Boris Ivanov