I have to read a file in Python that uses Microsoft's VARIANT types (I think; I really don't know much about Microsoft code :S). Basically, I want to know whether there are Python packages that can do this for me.

To explain - the file I'm trying to read is just a whole bunch of { 2-byte integer, <data> } repeated over and over, where the 2-byte integer specifies what the <data> is.

The 2-byte integer corresponds to one of the Microsoft VARIANT data types (VT_I2, VT_I4, etc.), and based on that type I can write code to read in <data> and coerce it to an appropriate Python object.

My current attempt is along the following lines:

while dtype = file.read(2):
    value = None

    # translate dtype (I've put in VT_XX myself to match up with Microsoft)
    if dtype == VT_I2:
        value = file.read(2)
    elif dtype == VT_I4:
        value = file.read(4)
    # ... and so on for other types

    # append value to the list of values

# return the values we read
return values

The thing is, I'm having trouble working out how to convert some of the bytes to the appropriate Python object (for example VT_BSTR, VT_DECIMAL, VT_DATE). However, before I try further, I'd like to know whether there are any existing Python packages that do this for me (i.e. take in a file object or bytes and parse it into a set of Python objects, be they floats, ints, dates, strings, ...).

It just seems like this is a fairly common thing to do. However, I've had difficulty finding packages for it: since I know nothing about Microsoft code, I don't have the terminology to do the appropriate googling. (In case it is relevant, I am running Linux.)

pnuts
mathematical.coffee
  • I'm assuming this is meant to be pseudocode, since `while dtype = file.read(2)` is not legal python. – Joel Cornett Jul 30 '12 at 02:40
  • Have you tried hachoir (https://bitbucket.org/haypo/hachoir/wiki/Home)? Works on linux and has support for some MS file types. Might be worth a look in case your particular format is covered. – azhrei Jul 30 '12 at 03:50
  • [OleFileIO_PL](http://www.decalage.info/python/olefileio) – ephemient Jul 30 '12 at 04:04
  • @ephemient - I am currently using OleFileIO_PL, but it doesn't cover parsing the streams as I have described here (the file type I am trying to parse is a little bit non-standard). The file type is ZVI, a microscope file format. It is encoded as an OLE object (hence OleFileIO_PL), but the streams are just endless `{2-byte-integer-which-is-thetype, data}` one after the other, so the usual `getproperties` from `OleFileIO_PL` won't do; hence me trying to write my own parser. – mathematical.coffee Jul 30 '12 at 04:09
  • @azhrei - thanks, will have a look at Hachoir. Fingers crossed! – mathematical.coffee Jul 30 '12 at 04:10

1 Answer

The win32com package in pywin32 will do just that for you. The documentation is quite underwhelming, but the included variant.html explains the basic use, and there are a lot of tutorials and references online.
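If pywin32 is not an option (it is Windows-only, and the question mentions Linux), the stream can also be decoded by hand with the standard `struct` module. The following is a minimal sketch rather than a tested ZVI parser. It assumes little-endian fields, that VT_DATE is stored as an 8-byte OLE Automation date (fractional days since 1899-12-30), and that VT_BSTR is stored as a 4-byte byte count followed by UTF-16LE text. The names `read_variants`, `read_exact` and `FIXED` are made up for illustration. VT_DECIMAL is left out because its layout on disk depends on whatever wrote the file.

```python
import io
import struct
from datetime import datetime, timedelta

# Type codes from Microsoft's VARENUM enumeration.
VT_I2, VT_I4, VT_R4, VT_R8 = 2, 3, 4, 5
VT_DATE, VT_BSTR, VT_BOOL = 7, 8, 11

# Fixed-width numeric types mapped to struct formats.
# ASSUMPTION: the file is little-endian ("<").
FIXED = {VT_I2: "<h", VT_I4: "<i", VT_R4: "<f", VT_R8: "<d"}

# Day zero of an OLE Automation date.
OLE_EPOCH = datetime(1899, 12, 30)


def read_exact(f, n):
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    data = f.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_variants(f):
    """Parse repeated {2-byte type tag, data} records into Python objects."""
    values = []
    while True:
        tag = f.read(2)
        if not tag:  # clean end of stream
            break
        if len(tag) < 2:
            raise EOFError("truncated type tag")
        (vt,) = struct.unpack("<H", tag)

        if vt in FIXED:
            fmt = FIXED[vt]
            (value,) = struct.unpack(fmt, read_exact(f, struct.calcsize(fmt)))
        elif vt == VT_BOOL:
            # VARIANT_BOOL is a 2-byte integer: 0 is False, -1 is True.
            (raw,) = struct.unpack("<h", read_exact(f, 2))
            value = raw != 0
        elif vt == VT_DATE:
            # Simple conversion; only valid for non-negative day counts.
            (days,) = struct.unpack("<d", read_exact(f, 8))
            value = OLE_EPOCH + timedelta(days=days)
        elif vt == VT_BSTR:
            # ASSUMPTION: 4-byte length in bytes, then UTF-16LE text.
            (nbytes,) = struct.unpack("<I", read_exact(f, 4))
            value = read_exact(f, nbytes).decode("utf-16-le")
        else:
            raise ValueError(f"unsupported VARIANT type code {vt}")

        values.append(value)
    return values
```

To try it without a real file, pack a few records into an `io.BytesIO` and pass that to `read_variants`; with a real file, open it in binary mode (`"rb"`). Raising on unknown type codes is deliberate: once a record of unknown size is skipped, every later tag would be read from the wrong offset.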

SilverbackNet