I am new to scripting and am trying to read a .gz file and copy the lines that contain "Alas!". The files are at `myfiles/all*/input.gz`: the script should look in every directory whose name starts with `all` for an `input.gz` file, search that file for the string "Alas!", and write the matching lines to a text file. I know how to do this on Linux using `zgrep`:

    zgrep 'Alas!' myfiles/all*/input.gz > file1.txt

I got lost somewhere while trying to write a script for this.

-
`.gz` has multiple files in it or a single file - `input.txt`? – bigbounty Jul 21 '20 at 03:05
-
You will need to uncompress the file before searching through the file. You can do this by opening the file with `gzip.open` (see https://docs.python.org/3/library/gzip.html#gzip.open for more information) – jkr Jul 21 '20 at 03:06
-
@bigbounty there are multiple directories that start with "all", such as "all_phpfiles" and "all_csvfiles". The script should go into each directory that starts with "all", look for the .gz file, and search it for the string "Alas!". I am not sure what is in the .gz file – perkins royal Jul 21 '20 at 03:10
-
@bigbounty `.gz` is a pure compression format, not an archive format; a `.gz` file cannot contain multiple files (other than by containing an archive file with such structure, like `.tar.gz`). – tripleee Jul 21 '20 at 05:10
2 Answers
The `.gz` file is compressed, so you cannot search its contents by opening it directly. You will need to uncompress it before searching. Python provides `gzip.open` to open and decompress gzip-compressed files.
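    import glob
    import gzip

    # Open the output once so matches from earlier files are not overwritten
    with open('file1.txt', 'w') as o:
        for file in glob.glob('myfiles/all*/input.gz'):
            # 'rt' decompresses and decodes to text
            with gzip.open(file, 'rt') as f:
                for line in f:
                    if 'Alas!' in line:  # Changed this
                        print(line, file=o, end='')  # line already ends with a newline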
You also need to change `if 'Alas!'` to `if 'Alas!' in line`. The former always evaluates to `True`, so every line will be added to the other file. The latter will add a line to the other file only if `Alas!` is found in the line.
For what it's worth, `zgrep` works in a similar way. It uncompresses the file and then pipes that to `grep` (see https://stackoverflow.com/a/45175234/5666087).

-
It throws an error when I try to open the file: "TypeError: filename must be str or byte object, or a file" – perkins royal Jul 21 '20 at 03:14
-
See my edits. You would have gotten the same error in the code in your question. You need to iterate over the `glob.glob` result, because it returns a list. – jkr Jul 21 '20 at 03:19
-
How do we uncompress the .gz file to read it? I used the `gzip.open` option but I couldn't read through the .gz file – perkins royal Jul 21 '20 at 03:20
-
It might have to do with opening the file in read text mode. See my edited answer. – jkr Jul 21 '20 at 03:26
The statement

    if 'Alas!':

merely checks if the string value `'Alas!'` is "truthy" (it is, by definition); you want to check if the variable `line` contains this substring:

    if 'Alas!' in line:

Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).
A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)
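As a rough sketch of that design, assuming a hypothetical helper named `grep_gz` (not part of the question or of any library), the script could take its input files from the command line and write matches to standard output:

```python
import gzip
import sys


def grep_gz(paths, needle, out=sys.stdout):
    """Write every line containing `needle` from each gzip file in `paths` to `out`."""
    for path in paths:
        # 'rt' decompresses and decodes to text
        with gzip.open(path, 'rt') as f:
            for line in f:
                if needle in line:
                    # write() adds no newline of its own; the line already has one
                    out.write(line)


if __name__ == '__main__':
    # The search string is still hardcoded; only the input files come from argv
    grep_gz(sys.argv[1:], 'Alas!')
```

Saved under a name of your choosing (say `find_alas.py`, a made-up name), it could be run as `python find_alas.py myfiles/all*/input.gz > file1.txt`; the shell expands the wildcard, so the script needs no `glob` call.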
A third problem is that the input line already contains a newline, but `print()` will add another. Either strip the newline before printing, or tell `print` not to supply another (or switch to `write` which doesn't add one).
    import gzip
    import glob

    with open('file1.txt', 'w') as o:
        for file in glob.glob('myfiles/all*/input.gz'):
            with gzip.open(file, 'rt') as f:
                for line in f:
                    if 'Alas!' in line:
                        print(line, file=o, end='')
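To make the newline point concrete, here is a small illustration using in-memory buffers in place of the output file (the sample line and the variable names are made up for the demonstration):

```python
import io

line = 'Alas! poor Yorick\n'  # a line as read from a file, newline included

doubled = io.StringIO()
print(line, file=doubled)  # print appends its own newline

suppressed = io.StringIO()
print(line, file=suppressed, end='')  # tell print not to add one

written = io.StringIO()
written.write(line)  # write never adds one

stripped = io.StringIO()
print(line.rstrip('\n'), file=stripped)  # strip first, let print add it back
```

Only the first variant ends up with a blank line after each match; the other three store the line exactly once.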

-
Your "improved" code shows `if 'Alas!' in f:` which is still wrong. Are you sure you tried exactly this? – tripleee Jul 21 '20 at 06:46
-
How do we print the name of the directory in which 'Alas!' was found, i.e. which of the all* directories it was? – perkins royal Jul 21 '20 at 17:33
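One possible approach, sketched here with a hypothetical generator named `find_matches` and an output format chosen only for illustration, is to take the directory part of each matched path with `os.path.dirname` and write it in front of the line:

```python
import glob
import gzip
import os


def find_matches(pattern, needle):
    """Yield (directory, line) for each line containing `needle`."""
    for path in sorted(glob.glob(pattern)):
        directory = os.path.dirname(path)  # e.g. 'myfiles/all_csvfiles'
        with gzip.open(path, 'rt') as f:
            for line in f:
                if needle in line:
                    yield directory, line


if __name__ == '__main__':
    with open('file1.txt', 'w') as o:
        for directory, line in find_matches('myfiles/all*/input.gz', 'Alas!'):
            # Prefix each hit with the directory it came from
            o.write(f'{directory}: {line}')
```

Use `os.path.basename(directory)` instead if only the last path component (such as `all_csvfiles`) is wanted.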