
I am new to scripting and am trying to read a .gz file and copy the lines that contain "Alas!". The files are at myfiles/all*/input.gz: the script should look in every directory whose name starts with "all" for an input.gz file, search that file for the string "Alas!", and write the matching lines to a text file. I know how to do this on Linux with zgrep: zgrep 'Alas!' myfiles/all*/input.gz > file1.txt. I got lost while trying to write a script for this.

  • Does the `.gz` contain multiple files or a single file, `input.txt`? – bigbounty Jul 21 '20 at 03:05
  • You will need to uncompress the file before searching through the file. You can do this by opening the file with `gzip.open` (see https://docs.python.org/3/library/gzip.html#gzip.open for more information) – jkr Jul 21 '20 at 03:06
  • @bigbounty there are multiple directories that start with "all", such as "all_phpfiles" and "all_csvfiles". For each directory that starts with "all", it should go into the directory and look for the .gz file, then search that .gz file for the string "Alas!". I am not sure what is present in the .gz file – perkins royal Jul 21 '20 at 03:10
  • @bigbounty `.gz` is a pure compression format, not an archive format; a `.gz` file cannot contain multiple files (other than by containing an archive file with such structure, like `.tar.gz`). – tripleee Jul 21 '20 at 05:10

2 Answers


The .gz file is compressed, so you cannot search for contents by opening it directly. You will need to uncompress it before searching. Python provides gzip.open to open and decompress gzip-compressed files.

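import glob
import gzip

# Open the output once, so matches from earlier files are not overwritten.
with open('file1.txt', 'w') as o:
    for file in glob.glob('myfiles/all*/input.gz'):
        # 'rt' decompresses the file and decodes it to text.
        with gzip.open(file, 'rt') as f:
            for line in f:
                if 'Alas!' in line: # Changed this
                    # line already ends with a newline, so write it as-is.
                    o.write(line)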

You also need to change `if 'Alas!'` to `if 'Alas!' in line`. The former is always true, because a non-empty string is truthy, so every line would be written to the other file. The latter writes a line to the other file only if Alas! is found in the line.

For what it's worth, zgrep works in a similar way. It uncompresses the file and then pipes that to grep (see https://stackoverflow.com/a/45175234/5666087).
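
To illustrate why the file has to go through `gzip.open`, here is a minimal self-contained sketch. It writes a throwaway gzip file (the temporary path and the sample text are made up for the demo), then compares what a plain `open` returns with what `gzip.open` returns.

```python
import gzip
import os
import tempfile

# Throwaway sample file for the demo; the path and the text are made up.
path = os.path.join(tempfile.mkdtemp(), 'input.gz')
with gzip.open(path, 'wt') as f:
    f.write('Alas! poor Yorick\nnothing to see here\n')

# A plain open() returns the compressed bytes, which start with
# the two-byte gzip magic number rather than the original text.
with open(path, 'rb') as raw:
    header = raw.read(2)

# gzip.open() in text mode ('rt') decompresses and decodes on the fly,
# so the file can be iterated line by line like any text file.
with gzip.open(path, 'rt') as f:
    matches = [line for line in f if 'Alas!' in line]

print(header)
print(matches)
```

The first value printed is the gzip signature, not readable text; the second is the list of matching lines recovered after decompression.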

jkr

The statement

    if 'Alas!':

merely checks whether the string value 'Alas!' is "truthy" (it is, because any non-empty string is); you want to check whether the variable line contains this substring:

    if 'Alas!' in line:

Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).

A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)

A third problem is that the input line already contains a newline, but print() will add another. Either strip the newline before printing, or tell print not to supply another (or switch to write, which doesn't add one).
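
The newline options can be compared side by side with a small sketch. The sample line and the `capture` helper are made up for illustration; an in-memory `io.StringIO` stands in for the output file.

```python
import io

# A line as it comes out of a text-mode file: the trailing newline is included.
line = 'Alas! poor Yorick\n'


def capture(emit):
    """Run emit(out) against an in-memory file and return what was written."""
    out = io.StringIO()
    emit(out)
    return out.getvalue()


# Plain print() appends its own newline, so the output gets a blank line.
doubled = capture(lambda o: print(line, file=o))

# Three ways to avoid the extra newline.
stripped = capture(lambda o: print(line.rstrip('\n'), file=o))
no_end = capture(lambda o: print(line, file=o, end=''))
written = capture(lambda o: o.write(line))

print(repr(doubled))
print(repr(stripped), repr(no_end), repr(written))
```

All three alternatives write the line exactly once with a single trailing newline; only the plain `print` doubles it.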

import gzip
import glob

with open('file1.txt', 'w') as o:
    for file in glob.glob('myfiles/all*/input.gz'):
        with gzip.open(file, 'rt') as f:
            for line in f:
                if 'Alas!' in line:
                    print(line, file=o, end='')

Demo: https://ideone.com/rTXBSS
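
The alternative design mentioned above (print to standard output, take the input files as command-line arguments) might look roughly like the following sketch. The function name `grep_gz` and the script name in the usage comment are hypothetical, and the search string is still hardcoded.

```python
import gzip
import sys


def grep_gz(paths, needle, out):
    """Write every line containing needle from the gzip files in paths to out."""
    for path in paths:
        with gzip.open(path, 'rt') as f:
            for line in f:
                if needle in line:
                    # line keeps its own newline, so write() adds nothing extra.
                    out.write(line)


if __name__ == '__main__':
    # Hypothetical usage, letting the shell expand the wildcard:
    #   python grep_gz.py myfiles/all*/input.gz > file1.txt
    grep_gz(sys.argv[1:], 'Alas!', sys.stdout)
```

With this layout the shell does the globbing and the redirection, so the script itself needs neither `glob` nor a hardcoded output file name.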

tripleee