
As the title says, I have several folders containing several .ppm.bz2 files, and I want to extract them exactly where they are using Python.

[Image: directory structure]

I am traversing the folders like this:

 import tarfile
 import os
 path = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'
 folders = os.listdir(path)
 for folder in folders:  # the folders have names starting like 00001
     if not folder.startswith("0"):
         continue  # skip entries that are not image folders
     path2 = path + folder
     zips = os.listdir(path2)
     for name in zips:
         if not name.startswith("0"):
             continue
         path3 = path2+"/"+name

         fh = tarfile.open(path3, 'r:bz2')
         outpath = path2+"/"
         fh.extractall(outpath)
         fh.close()

Then I get this error:

Traceback (most recent call last):
  File "ZIP.py", line 16, in <module>
    fh = tarfile.open(path3, 'r:bz2')
  File "/anaconda2/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/anaconda2/lib/python2.7/tarfile.py", line 2370, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header


1 Answer


The tarfile module is for tar archives, including .tar.bz2. If your file is not a tar archive (a .ppm.bz2 is just a single bzip2-compressed file), use the bz2 module directly.
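
One way to tell the two cases apart before choosing a module is `tarfile.is_tarfile`, which returns False for a bzip2 file that does not wrap a tar archive. A minimal sketch; the helper name `pick_module` and the sample file names are made up for illustration:

```python
import bz2
import os
import tarfile
import tempfile


def pick_module(filename):
    """Return 'tarfile' for a (possibly compressed) tar archive, else 'bz2'."""
    return 'tarfile' if tarfile.is_tarfile(filename) else 'bz2'


# Build two throwaway sample files to show the difference.
workdir = tempfile.mkdtemp()

# A plain bzip2 file with no tar archive inside (hypothetical name).
plain = os.path.join(workdir, 'sample.ppm.bz2')
with open(plain, 'wb') as f:
    f.write(bz2.compress(b'P6\n1 1\n255\n\x00\x00\x00'))

# A real tar.bz2 archive containing one small member.
member = os.path.join(workdir, 'member.txt')
with open(member, 'wb') as f:
    f.write(b'hello')
archive = os.path.join(workdir, 'sample.tar.bz2')
with tarfile.open(archive, 'w:bz2') as tar:
    tar.add(member, arcname='member.txt')

plain_choice = pick_module(plain)
archive_choice = pick_module(archive)
print(plain_choice, archive_choice)
```

This only inspects the file; it does not extract anything.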

Also, try using os.walk instead of nested listdir calls, as it traverses the whole directory tree for you:

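import os
import bz2
import shutil

# Top-level folder from the question; adjust as needed.
root = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'

for path, dirs, files in os.walk(root):
    for filename in files:
        basename, ext = os.path.splitext(filename)
        if ext.lower() != '.bz2':
            continue  # skip anything that is not bzip2-compressed
        fullname = os.path.join(path, filename)
        newname = os.path.join(path, basename)  # foo.ppm.bz2 -> foo.ppm
        # bz2.open needs Python 3.3+; on Python 2 use bz2.BZ2File(fullname)
        with bz2.open(fullname) as fh, open(newname, 'wb') as fw:
            shutil.copyfileobj(fh, fw)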

This will decompress every .bz2 file in every subfolder, writing each output next to its original. The .bz2 files themselves and all other files stay as they are. If the uncompressed file already exists, it will be overwritten.

Please back up your data before running destructive code.
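
`bz2.open` was only added in Python 3.3. For older interpreters such as 2.7, a variant built on `bz2.BZ2File` might look like the sketch below. The function name `decompress_tree` is an assumption for illustration. Note that only the input goes through `BZ2File`; the output uses the plain built-in `open`, since writing through `BZ2File` would compress the data again.

```python
import bz2
import os
import shutil


def decompress_tree(root):
    """Decompress every .bz2 file under root, next to the original.

    Returns the list of files written. Existing outputs are overwritten.
    """
    written = []
    for path, dirs, files in os.walk(root):
        for filename in files:
            basename, ext = os.path.splitext(filename)
            if ext.lower() != '.bz2':
                continue  # leave all other files alone
            fullname = os.path.join(path, filename)
            newname = os.path.join(path, basename)
            # Read via BZ2File (decompresses); write via the built-in open().
            with bz2.BZ2File(fullname) as fh, open(newname, 'wb') as fw:
                shutil.copyfileobj(fh, fw)
            written.append(newname)
    return written
```

With the folder from the question, it would be called as `decompress_tree('/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/')`.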

nosklo
  • But the problem is not finding the bz2 file; I want the exact path of each file so that I can extract the ppm in that same place. Moreover, after doing this, I got the error: `/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/00658/00658_941121_hr.ppm.bz2 Traceback (most recent call last): File "File.py", line 12, in <module> fh = tarfile.open(fullname) File "/anaconda2/lib/python2.7/tarfile.py", line 1680, in open raise ReadError("file could not be opened successfully") tarfile.ReadError: file could not be opened successfully` –  Jul 24 '18 at 17:52
  • @ankiiiiiii That's what the code above will do - did you try it? The exact path is in the `path` variable. – nosklo Jul 24 '18 at 17:53
  • @ankiiiiiii You shouldn't use the tarfile module. I edited the answer. – nosklo Jul 24 '18 at 17:54
  • Am I missing something? `with bz2.open(fullname) as fh, bz2.open(newname, 'wb') as fw:` gives `AttributeError: 'module' object has no attribute 'open'` –  Jul 24 '18 at 18:00
  • Are you using Python 3.3 or above? https://docs.python.org/3/library/bz2.html#bz2.open @ankiiiiiii On Python 2, use `bz2.BZ2File` instead of `bz2.open`. – nosklo Jul 24 '18 at 18:02
  • It is 2.7. Does the output file have to exist beforehand? After changing `open`, it shows: `IOError: [Errno 13] Permission denied: '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/00658/00658_941121_hr.ppm'` –  Jul 24 '18 at 18:05
  • No, the file doesn't have to exist. That error means the user running the script has no permission to write to that folder. You need to give that user permission (or run the code as another user that can write there). – nosklo Jul 24 '18 at 18:06
  • Okay, I ran it with sudo and it's running. Even copying took 3 minutes, so it's taking a while. –  Jul 24 '18 at 18:09
  • @ankiiiiiii Yup, `bz2` is a complex compression format and decompressing it takes time. – nosklo Jul 24 '18 at 18:11
  • The question is still open: the files are corrupted. It is as if just the extension has been removed without actually decompressing anything. The file size also remains the same. –  Jul 24 '18 at 18:43
  • If you're using `bz2.BZ2File` then the files have been uncompressed. I actually tried the code in a folder here and it works fine. @ankiiiiiii – nosklo Jul 24 '18 at 18:45
  • Yes, I used `bz2.BZ2File`. Is there something like closing the file that I need to do too? Or maybe it has to do with the ppm file type? –  Jul 24 '18 at 18:47
  • The `with` statement already closes the file for you, it is definitely working fine here. – nosklo Jul 24 '18 at 19:10
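
A quick way to check the outcome discussed in these comments is to look at the first bytes of an output file: a bzip2 stream starts with `BZh`, while a PPM image starts with `P6` (binary) or `P3` (ASCII). A sketch under those assumptions; `looks_decompressed` is a hypothetical helper name and the demo files are throwaway samples:

```python
import bz2
import os
import tempfile


def looks_decompressed(filename):
    """Return True if filename starts like a PPM image, not bzip2 data."""
    with open(filename, 'rb') as f:
        magic = f.read(3)
    if magic == b'BZh':
        return False  # still bzip2-compressed despite its name
    return magic[:2] in (b'P6', b'P3')


# Demonstration on throwaway files.
workdir = tempfile.mkdtemp()
raw = b'P6\n1 1\n255\n\x00\x00\x00'  # a tiny 1x1 binary PPM

good = os.path.join(workdir, 'good.ppm')
with open(good, 'wb') as f:
    f.write(raw)

bad = os.path.join(workdir, 'bad.ppm')  # compressed data under a .ppm name
with open(bad, 'wb') as f:
    f.write(bz2.compress(raw))

good_ok = looks_decompressed(good)
bad_ok = looks_decompressed(bad)
print(good_ok, bad_ok)
```

If an output file still starts with `BZh` and has the same size as its .bz2 source, the data was most likely copied or re-compressed rather than decompressed.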