
I am new to Python, and we have been trying to use the LZW code from GitHub in our program: https://github.com/joeatwork/python-lzw/blob/master/lzw/init.py

This works well for smaller blobs, but as the blob size increases, the blob is no longer fully decompressed. I have been reading the documentation, but I am unable to understand the part quoted below, which might be the reason why the full blob is not getting decompressed.

I have also attached a snippet of the Python code I am using.

Our control codes are
    - CLEAR_CODE (codepoint 256). When this code is encountered, we flush
      the codebook and start over.
    - END_OF_INFO_CODE (codepoint 257). This code is reserved for
      encoder/decoders over the integer codepoint stream (like the
      mechanical bit that unpacks bits into codepoints)
When dealing with bytes, codes are emitted as variable
length bit strings packed into the stream of bytes.
codepoints are written with varying length
    - initially 9 bits
    - at 512 entries 10 bits
    - at 1025 entries at 11 bits
    - at 2048 entries 12 bits
    - with max of 4095 entries in a table (including Clear and EOI)
code points are stored with their MSB in the most significant bit
available in the output character.
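
To make the last sentence of the quoted documentation concrete, here is a minimal sketch of what "MSB first" packing means. It is not the library's code: the function names `pack_codes_msb_first` and `unpack_codes_msb_first` are hypothetical, and the width is fixed at 9 bits for simplicity, whereas the real stream widens its codes as the codebook grows.

```python
def pack_codes_msb_first(codes, width=9):
    """Pack integer codes into bytes, most significant bit first.

    Illustrative only: uses one fixed width for every code.
    """
    bits = 0    # accumulator holding bits not yet written out
    nbits = 0   # number of valid bits in the accumulator
    out = bytearray()
    for code in codes:
        bits = (bits << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1  # keep only the unwritten bits
    if nbits:
        # Pad the final partial byte with zero bits on the right
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_codes_msb_first(data, width=9):
    """Inverse of pack_codes_msb_first; trailing padding bits are dropped."""
    bits = 0
    nbits = 0
    codes = []
    for byte in data:
        bits = (bits << 8) | byte
        nbits += 8
        while nbits >= width:
            nbits -= width
            codes.append((bits >> nbits) & ((1 << width) - 1))
        bits &= (1 << nbits) - 1
    return codes


# Codes 256 (CLEAR_CODE) and 257 (END_OF_INFO_CODE) need 9 bits each,
# so three codes occupy 27 bits, i.e. 4 bytes with 5 padding bits.
packed = pack_codes_msb_first([65, 256, 257])
codes = unpack_codes_msb_first(packed)
```

Because codes do not line up with byte boundaries, cutting bytes off the end of a compressed blob can also cut into the final codes.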

My code snippet:

import html2text
import lzw
import pandas as pd

def decompress_without_eoi(buf):
    # Decompress LZW into bytes, ignoring the End of Information code
    def gen():
        try:
            for byte in lzw.decompress(buf):
                yield byte
        except ValueError as exc:
            # Swallow only the EOI error; re-raise anything else
            if 'End of information code' not in repr(exc):
                raise

    try:
        # First attempt: join everything in one go
        return b''.join(gen())
    except Exception:
        # Fall back to collecting byte by byte, keeping whatever
        # was decoded before the failure
        deblob = []
        try:
            for byte in gen():
                deblob.append(byte)
        except Exception:
            pass
        # Always return bytes, never the raw list
        return b''.join(deblob)

# Current function to deblob
def deblob3(row):
    if pd.notnull(row[0]):
        blob = row[0]

        h = html2text.HTML2Text()
        h.ignore_links = True
        h.ignore_images = True

        # Read file-like blobs into bytes; drop the trailing 10 bytes either way
        if not isinstance(blob, bytes):
            blobbytes = blob.read()[:-10]
        else:
            blobbytes = blob[:-10]

        if row[1] == 361:
            # If compressed, return up to EOI-257 code, which is last non-null code before tag
            return h.handle(striprtf(decompress_without_eoi(blobbytes)))
        elif row[1] == 360:
            # If uncompressed, return up to tag
            return h.handle(striprtf(blobbytes))

The function is called as follows:

nf['IS_BLOB'] = nf[['IS_BLOB','COMPRESSION']].apply(deblob3,axis=1)
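
Since the code above silently swallows most exceptions, it is hard to tell where decompression of a large blob stops. A possible way to investigate is sketched below; it keeps the exception alongside the partial output. The helper name `collect_until_error` is hypothetical, and the sketch assumes the decoder yields `bytes` chunks.

```python
def collect_until_error(chunks):
    """Join byte chunks from any iterable and return (data, error).

    `error` is None if the iterable was exhausted normally, otherwise
    the exception that interrupted it. Assumes each chunk is `bytes`.
    """
    parts = []
    error = None
    try:
        for chunk in chunks:
            parts.append(chunk)
    except Exception as exc:
        # Keep the exception for inspection instead of discarding it
        error = exc
    return b"".join(parts), error


# Stand-in for a decoder that fails part-way through a stream
def failing_decoder():
    yield b"abc"
    yield b"def"
    raise ValueError("End of information code")


data, err = collect_until_error(failing_decoder())
print(len(data), repr(err))
```

In the real code, the argument would be `lzw.decompress(blobbytes)`. Comparing `len(data)` and `err` for a small blob against a large one may show whether the failure is an early End of Information code or something else.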