I have 30,000 folders, each containing 5 bz2 files of JSON data.

I'm trying to use os.walk() to walk the directory tree, decompress each compressed file, and save the result in its original directory.

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)

I get the following error when I run the code:

File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/_compression.py", line 103, in read
data = self._decompressor.decompress(rawblock, size)
OSError: Invalid data stream

I've read that the decompressor can have issues on a Mac. Is that the cause, or am I missing something else?

tomoc4

1 Answer

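It looks like you might be trying to decompress your already decompressed results. If the script has been run before, os.walk() also returns the .decompressed files written by the earlier run, and those are not valid bz2 streams. You should filter them out.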

import os
import bz2

path = "/Users/mac/PycharmProjects/OSwalk/Data"
for (dirpath, dirnames, files) in os.walk(path):
    for filename in files:
        # filter out decompressed files
        if filename.endswith('.decompressed'):
            continue

        filepath = os.path.join(dirpath, filename)
        newfilepath = os.path.join(dirpath, filename + '.decompressed')

        # the backslash continues the with statement onto the next line
        with open(newfilepath, 'wb') as new_file, \
                bz2.BZ2File(filepath, 'rb') as file:
            for data in iter(lambda: file.read(100 * 1024), b''):
                new_file.write(data)
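
A variation on the loop above, as a sketch rather than a definitive fix: instead of excluding the .decompressed outputs, it only accepts names ending in .bz2 (an assumption; the question does not say what the files are called), strips that suffix for the output name, and collects the files that fail instead of stopping at the first OSError. The function name decompress_tree and its suffix parameter are hypothetical.

```python
import bz2
import os
import shutil


def decompress_tree(root, suffix=".bz2"):
    """Decompress every file under root whose name ends with suffix.

    Returns (written, failed): a list of output paths and a list of
    (input path, error message) pairs. The ".bz2" suffix is an assumption.
    """
    written, failed = [], []
    for dirpath, _dirnames, files in os.walk(root):
        for filename in files:
            # Only touch names that look like bzip2 archives; this skips
            # earlier outputs and stray files such as .DS_Store.
            if not filename.endswith(suffix):
                continue
            src = os.path.join(dirpath, filename)
            dst = src[:-len(suffix)]  # "data.json.bz2" -> "data.json"
            try:
                with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
                    shutil.copyfileobj(fin, fout, 100 * 1024)
                written.append(dst)
            except (OSError, EOFError) as exc:
                # Record the offending file and drop the partial output
                # instead of aborting the whole walk.
                failed.append((src, str(exc)))
                if os.path.exists(dst):
                    os.remove(dst)
    return written, failed
```

Printing the failed list after a run would show exactly which paths raise "Invalid data stream", which narrows down whether the problem is a few stray files or every archive.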
kichik
  • I added your if statement, but the code still doesn't run. I'm wondering whether the problem is with the bz2 module itself. – tomoc4 Dec 07 '17 at 23:23
  • That depends. How do you compress these files? – kichik Dec 07 '17 at 23:25
  • I downloaded the files from the web; they came in .tar format, and I extracted the tar into a normal folder structure. I presume the compression was done on the server. – tomoc4 Dec 07 '17 at 23:31
  • I don't think bunzip2 works on Mac. It seems to be a Linux package. I tried installing it with pip and got this message: "Could not find a version that satisfies the requirement bunzip2 (from versions: ) No matching distribution found for bunzip2" – tomoc4 Dec 11 '17 at 20:26
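
Following up on the comments, one way to find out which file triggers the error is to look at the first bytes of each file. This is a sketch under an assumption: that some files in the tree are not bzip2 at all (for example a .DS_Store created by Finder, or a leftover from the tar extraction). Every bzip2 stream starts with the three bytes BZh, so a file that starts differently would make BZ2File raise "Invalid data stream". The names looks_like_bz2 and find_non_bz2 are hypothetical. As for the pip message: bunzip2 is a command-line program from the bzip2 tools, not a Python package, so pip has nothing to install; Python's bz2 module is part of the standard library.

```python
import os

BZ2_MAGIC = b"BZh"  # signature at the start of every bzip2 stream


def looks_like_bz2(path):
    """Return True if the file at path starts with the bzip2 signature."""
    with open(path, "rb") as f:
        return f.read(len(BZ2_MAGIC)) == BZ2_MAGIC


def find_non_bz2(root):
    """Return a sorted list of files under root that are not bzip2 streams."""
    return sorted(
        os.path.join(dirpath, name)
        for dirpath, _dirnames, files in os.walk(root)
        for name in files
        if not looks_like_bz2(os.path.join(dirpath, name))
    )
```

Calling find_non_bz2 on the Data directory and printing the result would list the files to skip. If it lists every file, the downloads are probably not bzip2 despite their names.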