
I have several strings like:

BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\...

After googling for a while, I found that these seem to be Python bz2-encoded strings, since "BZh91AY" appears to be a standard bz2 header.

Now I need to decode such strings. I've tried some components (SharpCompress and SharpZipLib) to attempt the decoding, but failed miserably.

Can someone point me towards a viable solution that doesn't involve Python coding (I don't want to wrap Python's bz2 module with IronPython)?

Thanks.

falsetru
HiperiX
  • See http://stackoverflow.com/a/6708486/576932 – ford Oct 21 '13 at 17:35
  • There is nothing special about Python's Bz2 implementation; it uses the same C libraries as everyone else. Are you referring to the `\xHH` byte escapes perhaps? Those are just representations of specific byte values; in Python that'd be because you are printing a representation of the string that can be round-tripped to Python literals again. – Martijn Pieters Oct 21 '13 at 17:38
  • Are you able to decompress them with Python? It's possible that they're bad strings. – ford Oct 21 '13 at 17:39
  • Unless you find those escape codes as *literal* characters (so 4 characters each, a `\`, an `x` and two hex digits), you have *normal* bz2 data. – Martijn Pieters Oct 21 '13 at 17:39
  • @ford Yes, I've found that Python decompresses them. – HiperiX Oct 21 '13 at 17:41
  • @MartijnPieters: I even tried converting them to streams to use with those libraries (with both UTF8 and ASCII encoding): `var byteArray = Encoding.ASCII.GetBytes(encodedString); var stream = new MemoryStream(byteArray);` – HiperiX Oct 21 '13 at 17:41
  • How did you *create* the stream or file, in Python? – Martijn Pieters Oct 21 '13 at 17:50
  • And a proper Bz2 stream is *not* ASCII! It is full 8-bit data. – Martijn Pieters Oct 21 '13 at 17:51
  • @MartijnPieters can you please give me a sample? – HiperiX Oct 21 '13 at 18:18
  • How did you obtain the `BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02` information? To me that looks like a series of normal bytes, directly decompressable as a Bz2 compression stream. I don't know how you'd handle byte streams in C#, but it is **not** a text stream. `Encoding.ASCII.GetBytes` sounds as if you are encoding text to bytes here, are you not able to get bytes directly? – Martijn Pieters Oct 21 '13 at 19:00
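
The point made in the comments above, that a bz2 stream is 8-bit data and not ASCII text, can be illustrated with a small sketch. It is written in Python only because that is the language of the code on this page. The helper name `try_decompress` is made up for illustration, the payload is copied from the answer, and the `errors='replace'` step is an assumption standing in for what a lossy ASCII text encoder does to bytes above 0x7f.

```python
import bz2

# Sample payload copied from the answer on this page.
UN = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'


def try_decompress(data):
    """Hypothetical helper: return the decompressed bytes, or None if bz2 rejects the data."""
    try:
        return bz2.decompress(data)
    except (OSError, ValueError):
        return None


# View the payload as text with one character per byte
# (Latin-1 maps byte values 0-255 to code points 0-255 one-to-one).
as_text = UN.decode('latin-1')

# ASCII cannot represent anything above 0x7f, so those characters become '?'.
# Assumption: this stands in for a lossy ASCII text encoder.
lossy = as_text.encode('ascii', errors='replace')

# Latin-1 round-trips every byte unchanged.
lossless = as_text.encode('latin-1')

print('ASCII bytes intact:  ', lossy == UN, '| decompresses:', try_decompress(lossy) is not None)
print('Latin-1 bytes intact:', lossless == UN, '| decompresses:', try_decompress(lossless) is not None)
```

If the data really does have to pass through a string at some point, a byte-preserving encoding is the thing to look for; reading the bytes directly, without any text encoding step, avoids the problem altogether.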

1 Answer

import bz2

un = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

pw = b'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'

print(bz2.decompress(un).decode('utf-8'))
print(bz2.decompress(pw).decode('utf-8'))

# un decompresses to 'huge'
# pw decompresses to 'file'
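
The byte literals above only work when the `\xHH` sequences are interpreted as escapes. If the strings instead arrive as plain text containing literal backslashes (the case raised in the comments on the question), they need to be turned back into raw bytes first. The following is a minimal sketch under that assumption; `escaped_text` and `unescape_to_bytes` are hypothetical names, and the sample is the first payload from this answer written as a raw string.

```python
import bz2

# Hypothetical input: the payload as text with literal backslash escapes
# (a raw string, so each "\xaf" here is four separate characters).
escaped_text = r'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'


def unescape_to_bytes(text):
    """Hypothetical helper: interpret Python-style escapes in text and return raw bytes."""
    # 1. Latin-1 encodes each character to the byte with the same value.
    # 2. 'unicode_escape' interprets the backslash escapes.
    # 3. Latin-1 again maps the resulting code points 0-255 back to single bytes.
    return text.encode('latin-1').decode('unicode_escape').encode('latin-1')


raw = unescape_to_bytes(escaped_text)
print(bz2.decompress(raw).decode('utf-8'))
```

This only covers escapes that resolve to byte values 0-255; text containing other characters would need different handling.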
Zoe