So far I've tried the following:
Decompressing with the macOS command-line tool bzip2 -dc: it throws the error 'huff+mtf data integrity (CRC) error in data'. I also tried recovering the file with the bzip2recover command. It split my 4 MB file into 6000 small bz2 files and reported success, but decompressing each of those files failed with the same error.
Using the Python bz2 package: this throws the error 'IO error Invalid data stream'.
Using Apache NiFi: it reports a java.io exception, 'unexpected end of stream'.
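To narrow down where decoding breaks, a check along these lines might help. It is only a minimal sketch using the standard-library bz2 module: the function name probe_bz2, the chunk size, and the file name in the usage comment are all hypothetical. It looks at the 'BZh' header, then feeds the data to the decompressor chunk by chunk and records how many bytes come out before an error or the end of the input.

```python
import bz2


def probe_bz2(data: bytes, chunk_size: int = 4096) -> dict:
    """Check the bzip2 header and count how many bytes decode before failure.

    probe_bz2 is a hypothetical helper name, not part of any library.
    """
    report = {
        # A bzip2 stream starts with the bytes "BZh" ...
        "has_magic": data[:3] == b"BZh",
        # ... followed by a block-size digit from '1' to '9'.
        "level_ok": len(data) > 3 and data[3:4] in b"123456789",
        "decoded_bytes": 0,
        "error": None,
        "complete": False,
    }
    decomp = bz2.BZ2Decompressor()
    try:
        for start in range(0, len(data), chunk_size):
            out = decomp.decompress(data[start:start + chunk_size])
            report["decoded_bytes"] += len(out)
            if decomp.eof:
                # End-of-stream marker reached; stop feeding data.
                break
    except (OSError, ValueError) as exc:
        # Invalid or corrupt data surfaces as an exception here.
        report["error"] = str(exc)
    # eof stays False if the input ended before the stream did (truncation).
    report["complete"] = decomp.eof
    return report


# Usage (sample.bz2 is a placeholder path):
# with open("sample.bz2", "rb") as fh:
#     print(probe_bz2(fh.read()))
```

If has_magic is False, the object in the bucket is probably not raw bzip2 data at all (for example, it may have been re-encoded somewhere in the pipeline). If the header is fine but complete is False with no error, the file looks truncated rather than corrupted.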
The data was fed to the GCS bucket through this chain: 'Palo Alto block' -> 'Pub/Sub' -> 'GCS bucket'.
All of this indicates that the data might be corrupt, but I am not sure (I can't simply blame Pub/Sub). Has anyone faced a similar situation? Any kind of help will be appreciated.
You can find a sample bz2 file here.