
I'm using the code below to extract .tgz files. The log archives (.tgz) that I need to extract contain sub-directories with other .tgz and .tar files inside them, and I want to extract those too.

Ultimately, I'm trying to search for certain strings in all .log files and .txt files that may appear in a .tgz file.

Below is the code that I'm using to extract the .tgz file. I've been trying to work out how to extract the nested archives (.tgz and .tar) as well, but so far I've been unsuccessful.

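import os, sys, tarfile

try:
    # Only the top-level archive is unpacked; nested archives are left as-is.
    tar = tarfile.open(sys.argv[1] + '.tgz', 'r:gz')
    for item in tar:
        tar.extract(item)
    tar.close()
    print('Done.')
except (IndexError, IOError, tarfile.TarError):
    # Missing argument or unreadable archive: show the usage line.
    name = os.path.basename(sys.argv[0])
    print(name[:name.rfind('.')], '<filename>')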
TRiG
suffa
  • This seems to be a great use case for recursion. You pass the first tar file to the function, and if it encounters another tar file, the function calls itself with the new tar file. If you find a log file, you can invoke another function that handles log files. – Jacob May 19 '11 at 12:48

1 Answer


This should give you the desired result:

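import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print(tar_url)
    # Mode 'r' lets tarfile detect the compression, so .tar and .tgz both open.
    with tarfile.open(tar_url, 'r') as tar:
        for item in tar:
            tar.extract(item, extract_path)
            if item.isfile() and item.name.endswith(('.tgz', '.tar')):
                # The nested archive now lives under extract_path, so build
                # its real path and unpack it into its own directory.
                nested = os.path.join(extract_path, item.name)
                extract(nested, os.path.dirname(nested))

try:
    extract(sys.argv[1] + '.tgz')
    print('Done.')
except (IndexError, IOError, tarfile.TarError):
    # Missing argument or unreadable archive: show the usage line.
    name = os.path.basename(sys.argv[0])
    print(os.path.splitext(name)[0], '<filename>')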

As @cularis said, this is called recursion: extract calls itself for each nested archive it finds.
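
Since the end goal in the question is to search the .log and .txt files, it may not be necessary to unpack anything to disk at all. Below is a minimal Python 3 sketch of that idea, using the same recursion but reading nested archives in memory. The names search_archive, TEXT_SUFFIXES and ARCHIVE_SUFFIXES, and the '!/' separator used to show nesting in the reported path, are made up for illustration. It also assumes the files are roughly UTF-8 text and that each nested archive is small enough to buffer in memory.

```python
import io
import tarfile

# Hypothetical names; the suffixes are the ones mentioned in the question.
TEXT_SUFFIXES = ('.log', '.txt')
ARCHIVE_SUFFIXES = ('.tgz', '.tar')


def search_archive(fileobj, needles, prefix=''):
    """Yield (path, line_number, line) for each line containing a needle.

    fileobj is a seekable binary file object holding a tar archive.
    Nested archives are buffered in memory and searched recursively.
    """
    # 'r:*' lets tarfile detect the compression (plain tar, gzip, ...).
    with tarfile.open(fileobj=fileobj, mode='r:*') as tar:
        for member in tar:
            if not member.isfile():
                continue
            path = prefix + member.name
            data = tar.extractfile(member)
            if member.name.endswith(ARCHIVE_SUFFIXES):
                # Recursion: treat the nested archive like the outer one.
                nested = io.BytesIO(data.read())
                yield from search_archive(nested, needles, path + '!/')
            elif member.name.endswith(TEXT_SUFFIXES):
                for number, raw in enumerate(data, start=1):
                    line = raw.decode('utf-8', errors='replace').rstrip('\r\n')
                    if any(needle in line for needle in needles):
                        yield path, number, line
```

To use it, open the top-level archive in binary mode, pass the file object and a list of strings to search_archive, and loop over the (path, line number, line) tuples it yields. Whether this beats extracting to disk probably depends on how large the nested archives are.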

berni
  • The code unzips the .tgz file and displays a folder, 'storage', and in that folder there are two other folders, 'Folder1' and 'Folder2', both of which contain .tgz and .tar files that have not been extracted. The above code only unzips the main .tgz file, not the files in the subfolders. – suffa May 19 '11 at 14:16
  • Sorry, I forgot about the tar files. Code updated. It was already unzipping the .tgz files in subfolders, though. Now it works for both .tar and .tgz files nested in the archive. – berni May 19 '11 at 16:01
  • How would I execute this same code as a script instead of from the command line? Thanks! – suffa May 20 '11 at 19:04
  • What do you mean by saying to execute it as a script? How do you want to start it and how would you like the script to behave? – berni May 20 '11 at 23:52