We use struct.unpack to read a binary file created from a dump of C structure fields and their values (integers and strings). The unpacked tuples are then used to build an intermediate dictionary of the fields and their values, which is later written to a text file.

The text file output displays the strings as below:

ID = b'000194901137\x00\x00\x00\x00' 
timestampGMT = 1489215906
timezoneDiff = -5
timestampPackage = 1489215902
version = 293
type = b'FULL\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

The program was originally written for Python 2.6, where it worked fine. We used the following filter expression to remove the unwanted non-printable characters while writing to the text file:

filtered_string = filter(lambda x: x in string.printable, line)

After moving to Python 3.5, this expression no longer works: filter() now returns a filter object (a lazy iterator) instead of a string, so the result can't be written out directly.
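To illustrate the change in behaviour, here is a minimal sketch, assuming `line` is already a `str` (the sample value is made up). Joining the iterator gives back a string, as in Python 2:

```python
import string

# Made-up sample line containing trailing NUL characters.
line = "type = FULL\x00\x00\x00\n"

# In Python 3, filter() returns a lazy iterator rather than a str.
filtered = filter(lambda ch: ch in string.printable, line)
assert not isinstance(filtered, str)

# Joining the surviving characters rebuilds a normal string.
filtered_string = "".join(filtered)
print(repr(filtered_string))
```

Note that this only helps when the NULs are real characters in a `str`. If a bytes value was formatted with `%s`, the text contains the escaped repr instead, and there is nothing non-printable left to filter.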

What is the Pythonic way to convert these bytes values to the equivalent ASCII text (without the trailing NUL '\x00' padding), so that they are written as normal string values?

Also, since there are several thousand entries to process in each file (and there are multiple files), we are looking for an efficient solution.

Rohit
  • Is it just nulls you're dealing with - eg: would `type.rstrip(b'\x00')` work or does it have to be a printable check... Can't quite tell if you want a `str` rather than a `bytes` afterwards either, so you've also got an option of `''.join(ch for ch in text.decode('ascii') if ch.isprintable())` I guess... – Jon Clements May 12 '17 at 08:29
  • It's only NULs. Since it prints the whole string, the trailing NULs are printed too. I tried rstrip(), but it strips off the carriage return character too. – Rohit May 12 '17 at 08:33
  • It should if you pass what you want to strip to rstrip as I posted above: `type.rstrip(b'\x00')` – Jon Clements May 12 '17 at 08:41
  • Don't think rstrip likes that: `line = line.rstrip(b'\x00')` gives `TypeError: rstrip arg must be None or str`. `line = line.rstrip('\x00')` works but doesn't do anything. `line = str(line, 'utf-8')` gives `TypeError: decoding str is not supported`. – Rohit May 12 '17 at 08:56
  • So you're trying to strip the source byte data itself and not the unpacked elements? That doesn't seem to make much sense as I'd have thought you'd want to unpack, then strip each unpacked element? – Jon Clements May 12 '17 at 09:08
  • Something like: `dictionary_name.update((k, v.rstrip(b'\x00')) for k, v in dictionary_name.items() if isinstance(v, bytes))` – Jon Clements May 12 '17 at 09:10
  • It's unpacked data which is part of the dictionary; I'm trying to strip it while writing to a text file: `for k in output: v = struct_values[k]; line = struct_name + ": " + "%s = %s\n" % (k, v); self.text_data.write(line)`. Of course we can do that while inserting into the dictionary. We might just have to remove the NULs from the strings. – Rohit May 12 '17 at 09:17
  • Ahhh... So now we know where line is coming from and what you're doing with it (writing to a file) - you should [edit] your question to include that... Also what is "output"? Is it the dictionary itself? – Jon Clements May 12 '17 at 09:21
  • Right, we are unpacking from binary file, mapping it to the field names and then printing each of them to a text file. `output` is a tuple of output strings of field names. The set of fields we will be writing to the text file. `struct_values` is the dictionary, created from the unpacked values from the binary file. – Rohit May 12 '17 at 09:34
  • Okay - update your question - with all your comments etc... If you do that - it sounds like you could end up with an answerable question. – Jon Clements May 12 '17 at 09:37

1 Answer

In Python 2 you could use the str type for both text and binary data interchangeably, and it worked fine. In Python 3, binary data is read as type bytes, which is distinct from str; the two types no longer share a common base class (basestring) as they did in Python 2.

$ python3
Python 3.5.0 (default, Sep 15 2015, 13:42:03) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b'aaa')
<class 'bytes'>
>>> type(b'aaa').__mro__
(<class 'bytes'>, <class 'object'>)
>>> type('aaa')
<class 'str'>
>>> type('aaa').__mro__
(<class 'str'>, <class 'object'>)

$ python
Python 2.6.6 (r266:84292, Nov 21 2013, 10:50:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> type(b'aaa').__mro__
(<type 'str'>, <type 'basestring'>, <type 'object'>)
>>> type('aaa').__mro__
(<type 'str'>, <type 'basestring'>, <type 'object'>)

Strings stored in the binary file are read in as bytes objects, which need to be converted to the str (Unicode) type before they can be displayed or written to a file as normal strings.
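A small sketch of why the text output shows `b'...'`: formatting a bytes object with `%s` falls back to its repr, so the prefix and the escaped NULs become literal characters in the line. The sample value below mirrors the `type` field from the question, shortened.

```python
raw = b'FULL\x00\x00\x00\x00'

# Formatting bytes with %s uses the repr, so the b'...' prefix and the
# escaped NULs end up in the text as ordinary characters.
as_repr = "type = %s" % raw

# Stripping the padding and decoding first yields a plain str.
as_text = "type = %s" % raw.rstrip(b'\x00').decode('ascii')

print(as_repr)
print(as_text)
```

This is also why calling `rstrip('\x00')` on the already formatted line has no effect: by then the line contains a backslash followed by `x00`, not an actual NUL character.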

After I retrieve the tuple from struct.unpack(), I do the following:

  valTuple = struct.unpack(fmt, self.data[off : goff + struct_size])

  # Strip the trailing NUL padding from bytes fields and decode them to str;
  # integers and other values are kept unchanged.
  valList = [v.rstrip(b'\x00').decode() if isinstance(v, bytes) else v
             for v in valTuple]
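The same idea as a self-contained sketch. The format string `FMT`, the `FIELDS` tuple, and the `clean` helper are assumptions made for illustration (the real layout is not shown in the question); the field names and sample values are taken from the output in the question.

```python
import struct

# Hypothetical layout; the actual format string is not given in the post.
FMT = "<16sIiII16s"
FIELDS = ("ID", "timestampGMT", "timezoneDiff",
          "timestampPackage", "version", "type")


def clean(value):
    """Strip trailing NUL padding from bytes and decode; pass others through."""
    if isinstance(value, bytes):
        return value.rstrip(b"\x00").decode("ascii")
    return value


# Build a sample record; the 's' format pads short values with NULs.
record = struct.pack(FMT, b"000194901137", 1489215906, -5,
                     1489215902, 293, b"FULL")

# Map field names to cleaned values, as the intermediate dictionary would.
struct_values = {name: clean(v)
                 for name, v in zip(FIELDS, struct.unpack(FMT, record))}

for name in FIELDS:
    print("%s = %s" % (name, struct_values[name]))
```

Cleaning the values once, when the dictionary is built, keeps the writing loop unchanged and does the work only once per field. `rstrip` removes only trailing NULs; if a field could contain data after the first NUL, `value.split(b"\x00", 1)[0]` would be one alternative.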

See also the porting guide's section on text versus binary data: https://docs.python.org/3/howto/pyporting.html#text-versus-binary-data

Rohit