I have basically the same question as this, but rather than using awk I'd like to use Python, assuming it's not substantially slower than other methods. I was thinking of reading line by line and compressing on the fly, but then I came across this post, and it sounds like that would be a bad idea (the compression would not be very efficient). I also came across Python's nice-looking built-in gzip module, so I'm hoping there is a clean, fast, and efficient Pythonic way to do this.
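The experiment below checks whether writing line by line actually hurts compression. It assumes the inefficiency described in the linked post comes from compressing each line as an independent stream. When every line is written into one `gzip.GzipFile`, all writes go through a single compressor, so the result should be about the same size as compressing everything at once. The sample data and helper names are made up for illustration.

```python
import gzip
import io


def gzip_streamed(lines, level=6):
    """Write lines one at a time into a single gzip stream (one compressor)."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=level, mtime=0) as gz:
        for line in lines:
            gz.write(line)
    return buf.getvalue()


def gzip_per_line(lines, level=6):
    """The wasteful pattern: a separate gzip member for every line."""
    return b"".join(gzip.compress(line, compresslevel=level, mtime=0) for line in lines)


# Hypothetical CSV-like sample data.
lines = [f"{i},{i * i},some repeated text\n".encode() for i in range(10_000)]

whole = gzip.compress(b"".join(lines), compresslevel=6, mtime=0)
streamed = gzip_streamed(lines)
per_line = gzip_per_line(lines)

print("all at once:     ", len(whole))
print("streamed by line:", len(streamed))
print("member per line: ", len(per_line))
```

Only the last variant pays for a fresh gzip header, trailer, and empty dictionary on every line; streaming line by line into one open `gzip` file should not.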
I want to go from this:
gzcat file1.gz
# header
1
2
to a single file like this, with the header kept only once:
# header
1
2
1
2
1
2
1
2
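A minimal streaming sketch of that transformation, assuming each input file has exactly one header line and that the combined output should be gzipped as well (the question doesn't say). `concat_gz_csv` and its parameters are hypothetical names. Working in binary mode avoids decoding the CSV text, and reading in 1 MiB chunks keeps memory flat regardless of file size.

```python
import gzip


def concat_gz_csv(inputs, output, compresslevel=6, chunk_size=1 << 20):
    """Concatenate gzipped CSV files into one gzipped file, keeping only the
    first file's header line. Assumes exactly one header line per file."""
    with gzip.open(output, "wb", compresslevel=compresslevel) as out:
        for i, path in enumerate(inputs):
            with gzip.open(path, "rb") as src:
                header = src.readline()  # first line of this file
                last = b""
                if i == 0:
                    out.write(header)  # keep the header from the first file only
                    last = header
                # Stream the remaining rows through in large chunks.
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last = chunk
                # If a file's last row has no trailing newline, add one so it
                # isn't glued onto the next file's first row.
                if last and not last.endswith(b"\n"):
                    out.write(b"\n")
```

Usage would be something like `concat_gz_csv(sorted(glob.glob("data/*.gz")), "combined.csv.gz")`. The walrus operator requires Python 3.8 or newer.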
I have a few hundred files, totaling about 60 GB uncompressed. They are all gzipped CSV files.
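With a few hundred files and about 60 GB of data, recompression dominates the run time, so one option is to recompress the files in parallel. This sketch relies on the gzip format (RFC 1952) allowing a file to be several complete gzip members back to back; Python's `gzip` module and `gzcat` read all members, but it is an assumption that every downstream tool you use does too. It also assumes CPython's zlib releases the GIL during (de)compression so that threads overlap the heavy work; if profiling shows otherwise, `ProcessPoolExecutor` should work as a drop-in replacement because the worker is a top-level function. `parallel_concat_gz_csv`, `_body_to_member`, and `tmp_dir` are hypothetical names.

```python
import gzip
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB reads and writes


def _body_to_member(src_path, dst_path, compresslevel):
    """Write everything after src_path's first line into dst_path as one gzip member."""
    with gzip.open(src_path, "rb") as src, \
         gzip.open(dst_path, "wb", compresslevel=compresslevel) as dst:
        src.readline()  # drop this file's header line
        last = b""
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
            last = chunk
        if last and not last.endswith(b"\n"):
            dst.write(b"\n")  # keep rows from different files on separate lines


def parallel_concat_gz_csv(inputs, output, compresslevel=6, max_workers=None, tmp_dir=None):
    """Recompress each file's body in parallel, then byte-concatenate the members."""
    with tempfile.TemporaryDirectory(dir=tmp_dir) as tmp:
        # Member 0: just the header line from the first file.
        with gzip.open(inputs[0], "rb") as src:
            header = src.readline()
        if header and not header.endswith(b"\n"):
            header += b"\n"
        header_path = os.path.join(tmp, "header.gz")
        with gzip.open(header_path, "wb", compresslevel=compresslevel) as h:
            h.write(header)

        # One member per input file, compressed concurrently.
        parts = [os.path.join(tmp, f"part{i:06d}.gz") for i in range(len(inputs))]
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() forces evaluation so any worker exception is raised here
            list(pool.map(_body_to_member, inputs, parts, [compresslevel] * len(inputs)))

        # Concatenating complete gzip members yields a valid multi-member .gz file.
        with open(output, "wb") as out:
            for path in [header_path, *parts]:
                with open(path, "rb") as f:
                    shutil.copyfileobj(f, out, CHUNK_SIZE)
```

The temporary members need roughly as much disk space as the compressed output, which is why the temp directory location is configurable. Splitting the stream into members costs a little compression at each file boundary, which should be negligible at a few hundred files.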