I got fed up last night and started porting PyVISA to Python 3 (progress here: https://github.com/thevorpalblade/pyvisa).

I've gotten it to the point where everything works, as long as I pass device addresses (well, any string really) as an ASCII byte string rather than the default unicode string. For example,
HP = visa.instrument(b"GPIB::16") works, whereas HP = visa.instrument("GPIB::16") does not, raising a ValueError.

Ideally, the end user should not have to care about string encoding. Any suggestions as to how I should approach this? Something in the ctypes type definitions perhaps?

As it stands, the relevant ctypes type definition is:

ViString = _ctypes.c_char_p
Peter Mortensen

1 Answer

ctypes, like most things in Python 3, intentionally doesn't automatically convert between unicode and bytes. That's because in most use cases, that would just be asking for the same kind of mojibake or UnicodeEncodeError disasters that people switched to Python 3 to avoid.

However, when you know you're only dealing with pure ASCII, that's another story. You have to be explicit—but you can factor out that explicitness into a wrapper.


As explained in "Specifying the required argument types (function prototypes)" in the ctypes documentation, in addition to a standard ctypes type, you can pass any class that has a from_param classmethod. That method normally returns an instance of some type (usually the same type) with an _as_parameter_ attribute, but it can also just return a native ctypes-type value instead.

class Asciifier(object):
    @classmethod
    def from_param(cls, value):
        if isinstance(value, bytes):
            return value
        else:
            return value.encode('ascii')

This may not be the exact rule you want. For example, it'll fail on bytearray (just as c_char_p will) even though that could be converted quietly to bytes, whereas you wouldn't want to implicitly convert an int to bytes. Anyway, whatever rule you decide on should be easy to code.


Here's an example (on OS X; you'll obviously have to change how libc is loaded for Linux, Windows, etc., but you presumably know how to do that):

>>> from ctypes import CDLL, c_int
>>> libc = CDLL('libSystem.dylib')
>>> libc.atoi.argtypes = [Asciifier]
>>> libc.atoi.restype = c_int
>>> libc.atoi(b'123')
123
>>> libc.atoi('123')
123
>>> libc.atoi('\uff10123') # leading Unicode fullwidth digit zero, written as an escape
ArgumentError: argument 1: <class 'UnicodeEncodeError'>: 'ascii' codec can't encode character '\uff10' in position 0: ordinal not in range(128)
>>> libc.atoi(123)
ArgumentError: argument 1: <class 'AttributeError'>: 'int' object has no attribute 'encode'

Obviously you can catch the exception and raise a different one if those aren't clear enough for your use case.

You can similarly write a Utf8ifier, or an Encodifier(encoding, errors=None) class factory, or whatever else you need for some particular library and stick it in the argtypes the same way.
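
As a rough sketch of what such a class factory could look like: the name make_encodifier and the default errors='strict' are my own choices for illustration, not anything ctypes provides. It just generalizes the Asciifier above to an arbitrary codec and error handler.

```python
def make_encodifier(encoding, errors='strict'):
    """Hypothetical factory: build an argtypes-compatible class for one codec."""
    class Encodifier:
        @classmethod
        def from_param(cls, value):
            # bytes pass through untouched; anything else must support .encode()
            if isinstance(value, bytes):
                return value
            return value.encode(encoding, errors)
    Encodifier.__name__ = 'Encodifier_' + encoding.replace('-', '_')
    return Encodifier

# Two illustrative instances of the factory
Utf8ifier = make_encodifier('utf-8')
Utf8EscapeIfier = make_encodifier('utf-8', 'surrogateescape')

encoded = Utf8ifier.from_param('caf\u00e9')         # str is encoded as UTF-8
untouched = Utf8ifier.from_param(b'raw')            # bytes are returned as-is
escaped = Utf8EscapeIfier.from_param('\udcff')      # lone surrogate maps back to byte 0xff
```

Any of the resulting classes can then be placed in a function's argtypes exactly as Asciifier is in the atoi example.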


If you also want to auto-decode return types, see Return types and errcheck.
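
A minimal sketch of the errcheck side, assuming CPython so that ctypes.pythonapi is available as a library that works on every platform. Py_GetVersion is used only because it is a convenient C function returning a char *; the helper name decode_ascii is made up for this example.

```python
import ctypes
import sys

def decode_ascii(result, func, arguments):
    # ctypes calls errcheck with the already-converted restype value;
    # for c_char_p that is bytes, or None for a NULL pointer.
    if result is None:
        return None
    return result.decode('ascii')

get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []
get_version.restype = ctypes.c_char_p
get_version.errcheck = decode_ascii   # whatever errcheck returns is what the caller sees

version = get_version()               # now a str rather than bytes
```

The same decode_ascii function could be attached to every function in a library whose restype is a C string.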


One last thing: When you're sure the data are supposed to be UTF-8, but you want to deal with the case where they aren't in the same way Python 2.x would (by preserving them as-is), you can even do that in 3.x. Use the aforementioned Utf8ifier as your argtype, and a decoder errcheck, and use errors='surrogateescape' in both. See here for a complete example.

abarnert
  • @eryksun: I generally show the `LoadLibrary` call in examples because that's what the first non-Windows example in the docs does and I don't want to explain irrelevant stuff. But now that I think about it, that's kind of silly, especially since the very next line shows the simpler example. Thanks! – abarnert Jan 22 '14 at 23:53
  • I do like this approach, but I was hoping for a solution that didn't require per-function code. This is more elegant than inserting `value = value.encode('ascii')` in every function definition, but I still wonder if I could do better by altering the ctypes definition itself? Rather than `ViString = _ctypes.c_char_p` something like `ViString = _ctypes.my_type` where my_type inherits from c_char_p but encodes as ascii first? – Matthew Lawson Jan 23 '14 at 23:51
  • @MatthewLawson: I'm not sure I understand what you're asking. What is `ViString`? It appears to be just another name for the `c_char_p` type, so… how are you using that? More importantly: You _should_ be setting the `argtypes` for every C function that you use via ctypes (otherwise, things will _often_ happen to work, when there aren't too many args, and they're all exactly the same size as int, and you're lucky… which is usually not good enough). So, how is it any harder to put `Asciifier` or whatever there than `ctypes.c_char_p`? – abarnert Jan 24 '14 at 00:04
  • @MatthewLawson: Actually, looking at [your code on github](https://github.com/thevorpalblade/pyvisa/blob/master/pyvisa/vpp43.py), what you use `ViString` for _is_ to set `argtypes` (via a `__set_argument_types` wrapper method) on your ctypes functions, and in the `get_attribute`/`set_attribute` functions (which don't actually use the value, just check whether it's the value you stored), so… can't you just `ViString = Asciifier` and change nothing else? (You might also be using it in other places; I didn't download and ack your code or anything…) – abarnert Jan 24 '14 at 00:12
  • This library (the VISA library) requires all of its own custom types (bleh), so we need to define these custom Vi types to pass to the library. In this case, yeah, viString is just defined as c_char_p. __set_argument_types() is currently used on every function to set the expected argument types for each function, for example: `self.__set_argument_types("viFindRsrc", [ViSession, ViString, ViPFindList, ViPUInt32, ViAChar])` Where, for example, ViSession is (in other code) just defined as a ViString... Your solution may be the best way to go, I'm just still looking for a lazier way. – Matthew Lawson Jan 24 '14 at 00:22
  • But if you just set `ViString = Asciifier` instead of `ViString = c_char_p`, does that break anything? If so, you _can_ actually inherit from `c_char_p` and just override the `from_param` (you'll usually also have to do the "`return self` and make `_as_parameter_`" trick rather than my shortcut, in that case). – abarnert Jan 24 '14 at 00:30
  • If it's also used as a return type, and you want it to return `str` rather than `bytes`, that's a bit trickier; you _can_ fake the `restype` as well, but that only works if the actual restype happens to be bit-compatible with `int` (which will be true on many 32-bit platforms, but no 64-bit platforms). Instead, what you'd probably want to do is leave the `restype` machinery alone, letting `ViString` act just like a `c_char_p`, but then loop over all the functions and `if f.restype == ViString: f.errcheck = MyStringErrCheck`. – abarnert Jan 24 '14 at 00:32
  • If I set `ViString = Asciifier` I get: TypeError: _type_ must have storage info Which I guess I would expect, because Asciifier does not currently seem to store data? Am I missing something there? And, sadly, as this is my first real foray into ctypes I don't quite follow on the "return self and make _as_parameter_ trick" bit, could you elucidate? Thanks for this, I feel like we are getting close :-) – Matthew Lawson Jan 24 '14 at 00:35
  • @abarnert: For modifying return values without setting `errcheck`, there's the `_check_retval_` hook, which takes the C function's return value as an argument and can return whatever you want. It's undocumented, unlike `errcheck`, but used by `ctypes.OleDLL`, and also by `numpy.ctypeslib.ndpointer` for returning a NumPy array. The `restype` descriptor looks for `_check_retval_` as an attribute of the assigned type, so the function pointer's `checker` gets updated whenever `restype` is modified. – Eryk Sun Mar 08 '14 at 17:00
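
The subclassing idea raised in the comments (inherit from `c_char_p` and override `from_param`) might look roughly like the sketch below. The class name AsciiCharP is hypothetical, and instead of the `_as_parameter_` approach it simply delegates to `c_char_p.from_param` after encoding. It is tried here against `PyBytes_FromString` from `ctypes.pythonapi` (so it assumes CPython), not against the VISA library.

```python
import ctypes

class AsciiCharP(ctypes.c_char_p):
    """Hypothetical c_char_p subclass that also accepts ASCII-only str."""
    @classmethod
    def from_param(cls, value):
        if isinstance(value, str):
            value = value.encode('ascii')
        # Let c_char_p perform its usual conversion of bytes, None, etc.
        return ctypes.c_char_p.from_param(value)

# Stand-in C function that takes a const char * argument
make_bytes = ctypes.pythonapi.PyBytes_FromString
make_bytes.argtypes = [AsciiCharP]
make_bytes.restype = ctypes.py_object

from_str = make_bytes('GPIB::16')      # str is encoded before the call
from_bytes = make_bytes(b'GPIB::16')   # bytes work as they did before
```

Because the class is still a real ctypes type, something like `ViString = AsciiCharP` may avoid the "must have storage info" error mentioned above, though that has not been checked against the PyVISA code.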