10

I have a tar archive containing a directory that I need to extract into a given target directory. For example, the archive contains the directory

TarPrefix/x/y/z

and I want to extract it into a target directory such as extracted/a/, so that extracted/a/ contains all the files and directories found under TarPrefix/x/y/z. I use

subdir_and_files = [  tarinfo for tarinfo in tar.getmembers()
                      if tarinfo.name.startswith("subfolder/")
                   ]

to get the list of all the members under the directory path "subfolder/", and then I extract them using tar.extractall("extracted/a", subdir_and_files). However, this extracts the members with their full directory paths; for example, it produces extracted/a/x/y/z. Could you please help me extract these files directly into the given folder?

gaurav
  • I don't know, but this question seems to be sort of the opposite of yours: http://stackoverflow.com/questions/2239655/python-tarfile-adding-files-without-directory-hiearchy Maybe you can use extract() rather than extractall() and see what you can make happen, possibly by modifying the TarInfo objects you got in subdir_and_files? – John Zwinck Nov 24 '11 at 17:21
  • Sorry to ask a beginner's question. I am a beginner in Python and did not find any answer on Google, which is why I asked. To help others, I want to answer this question: you just need to change the tarinfo.name attribute to the correct value, i.e. in my example `tarinfo.name = tarinfo.name[len("TarPrefix/x/y/z/"):]`, and then the same code works. – gaurav Nov 24 '11 at 17:38
  • I tried to answer my own question, but I am not allowed to for eight hours, so I was waiting until then. – gaurav Nov 24 '11 at 17:38
  • I think the answer is not easy; the way it [tarlib] is done is not good (tarinfo describes the object in the archive, not the extracted object). I think there should be a better API for extracting stuff (the tar command, for example, has a --strip-components parameter). – spinus Jan 15 '13 at 15:25
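
As a rough illustration of the --strip-components idea from the last comment, here is a minimal sketch in Python. The names strip_components and build_demo_archive are hypothetical helpers, not part of tarfile, and the demo archive is made up for the example. Unlike a prefix filter, this drops the first N path components of every member and skips members with too few components; link targets are not adjusted.

```python
import copy
import io
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def strip_components(tar, count):
    """Yield member copies with `count` leading path components removed."""
    for member in tar.getmembers():
        parts = PurePosixPath(member.name).parts
        if len(parts) <= count:
            continue  # the whole name would be stripped away, so skip it
        stripped = copy.copy(member)  # do not rename the archive's own TarInfo
        stripped.name = "/".join(parts[count:])
        yield stripped


def build_demo_archive():
    """Create a small in-memory archive (illustrative file names only)."""
    buffer = io.BytesIO()
    names = (
        "TarPrefix/x/y/z/a.txt",
        "TarPrefix/x/y/z/sub/b.txt",
        "TarPrefix/other.txt",
    )
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as target:
    with tarfile.open(fileobj=build_demo_archive()) as tar:
        # Strip the four components "TarPrefix/x/y/z" from every member.
        tar.extractall(target, members=strip_components(tar, 4))
    stripped_files = sorted(
        p.relative_to(target).as_posix()
        for p in Path(target).rglob("*")
        if p.is_file()
    )
print(stripped_files)
```

Copying each TarInfo before renaming it keeps the names reported by the open archive unchanged, which matters if the same TarFile object is used again afterwards.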

2 Answers

17

Looks like you may have already found an answer, but here's my version anyway:

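import sys
import tarfile

def get_members(tar, prefix):
    # Yield only the members below `prefix`, renamed relative to it.
    if not prefix.endswith('/'):
        prefix += '/'
    offset = len(prefix)
    for tarinfo in tar.getmembers():
        if tarinfo.name.startswith(prefix):
            # Strip the prefix so extractall() does not recreate it.
            tarinfo.name = tarinfo.name[offset:]
            yield tarinfo

# Arguments: ARCHIVE PREFIX [TARGET_DIR]
args = sys.argv[1:]

if len(args) > 1:
    path = args[2] if len(args) > 2 else '.'
    # The context manager closes the archive when extraction is done.
    with tarfile.open(args[0]) as tar:
        tar.extractall(path, get_members(tar, args[1]))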
ekhumoro
  • Thanks for helping. Ya I found the answer by experimenting with stuff :). Anyways thanks a lot. – gaurav Nov 25 '11 at 03:44
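
One detail worth knowing about the approach above: getmembers() returns the TarInfo objects held by the open archive, so assigning to tarinfo.name also changes the names that the same TarFile object reports later. The following is a minimal sketch of a variant that renames copies instead, checked against a small made-up archive. members_under and build_demo_archive are hypothetical names chosen for this example.

```python
import copy
import io
import tarfile
import tempfile
from pathlib import Path


def members_under(tar, prefix):
    """Yield copies of the members below `prefix`, renamed relative to it."""
    if not prefix.endswith("/"):
        prefix += "/"
    for member in tar.getmembers():
        if member.name.startswith(prefix):
            relocated = copy.copy(member)  # leave the archive's TarInfo untouched
            relocated.name = member.name[len(prefix):]
            yield relocated


def build_demo_archive():
    """Create a small in-memory archive (illustrative file names only)."""
    buffer = io.BytesIO()
    names = (
        "TarPrefix/x/y/z/a.txt",
        "TarPrefix/x/y/z/sub/b.txt",
        "TarPrefix/other.txt",
    )
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as target:
    with tarfile.open(fileobj=build_demo_archive()) as tar:
        tar.extractall(target, members=members_under(tar, "TarPrefix/x/y/z"))
        names_after = tar.getnames()  # archive names are still the originals
    extracted = sorted(
        p.relative_to(target).as_posix()
        for p in Path(target).rglob("*")
        if p.is_file()
    )
print(extracted)
```

The file outside the prefix (TarPrefix/other.txt) is not extracted, and the files below it land directly in the target directory with their sub-directory structure preserved.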
4
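import os
import tarfile

with tarfile.open('sourcefile.tgz', 'r:gz') as _tar:
    for member in _tar:
        # Skip directory entries; only files are written out.
        if member.isdir():
            continue
        # Drop every leading directory; [-1] also handles names without '/'.
        fname = member.name.rsplit('/', 1)[-1]
        # The destination directory must already exist.
        _tar.makefile(member, os.path.join('destination_dir', fname))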
Gerrit