How to extract a portion (not beginning) of .gz file?

Question

I have a large gz file (11 GB) that I can't fully decompress on my computer, even with 100 GB free. I've extracted the first 50 GB with the command:

gzip -cd file.gz | dd ibs=1024 count=50000000 > first_50_GB_file.txt

I was able to parse my data from this portion of the file. Now I want to extract the rest of the file and parse it. I tried to extract the last n lines of the compressed file and then decompress them, as follows:

tail -50 file.gz > last_part_of_file.gz

I hoped that afterwards, I could use:

gzip -cd last_part_of_file.gz | dd ibs=1024 count=50000000 > last_50_GB_file.txt

but the tail command is taking >10 minutes for a test of only 50 lines.

If anyone has any solutions on how to extract (potentially arbitrary) portions of a .gz file that do not include the beginning I would be very grateful.

Nahuel Fouilleul · Accepted Answer · 2017-05-10T14:27:10.043

3

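tail can't work on a binary file: tail -50 returns the last 50 lines by looking for the '\n' (char 10) delimiter, which has no meaning in compressed data. Use dd on the decompressed stream instead. The first command below extracts the first 50 GB and the second extracts everything after it: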

gzip -cd file.gz | dd ibs=1024 count=50000000 > first_50_GB_file.txt

gzip -cd file.gz | dd ibs=1024 skip=50000000 > after_50_GB_file.txt

At first I thought the extracted file was 100 GB in total. To limit the output to 50 GB:

gzip -cd file.gz | dd ibs=1024 skip=50000000 count=50000000 > next_50-100_GB_file.txt

and for the next 50 GB:

gzip -cd file.gz | dd ibs=1024 skip=100000000 count=50000000 > next_100-150_GB_file.txt

However, each time the gzip process must inflate from the beginning of the archive file, because of how the compression algorithm works.
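
As an alternative sketch that avoids dd block arithmetic, the same idea (decompress from the start, discard a prefix, keep a fixed amount) can be written with tail -c and head -c. The function name extract_range and its arguments are hypothetical, and head -c is assumed to be available (it is in GNU and BSD userlands, though POSIX does not require it). gzip still has to inflate everything before the requested offset.

```shell
# Hypothetical helper: print LENGTH bytes of the decompressed stream,
# starting at byte OFFSET (0-based). Assumes head -c is available.
extract_range() {
    file=$1
    offset=$2
    length=$3
    # tail -c +N starts output at byte N (1-based), so add 1 to the offset.
    gzip -cd "$file" | tail -c "+$((offset + 1))" | head -c "$length"
}

# Example: the same range as skip=50000000 count=50000000 with ibs=1024,
# i.e. 51200000000 bytes skipped and 51200000000 bytes kept.
# extract_range file.gz 51200000000 51200000000 > next_50-100_GB_file.txt
```

Working in bytes rather than blocks means the offset and length do not depend on how the pipe happens to deliver data to the reader.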

edited May 10 '17 at 14:27

answered May 10 '17 at 12:11

Nahuel Fouilleul


Thanks, now I understand why tail wasn't working. I tried this and didn't have much success. Using 'gzip -cd file.gz | dd ibs=1024 skip=50000000 > after_50_GB_file.txt' took up all of the space on my disk, so I assumed that I would have to tell the command to stop after a certain number of blocks. I then tried 'gzip -cd file.gz | dd ibs=1024 skip=49000000 count=50000000 > after_49_GB_next_50GB.txt' and this produced a file of 90 GB. Do you know what might be going on? – Will Gibson May 10 '17 at 12:33
What did you get? – Nahuel Fouilleul May 10 '17 at 12:37
I was able to get it to work with: gzip -cd file.gz | dd ibs=1024 skip=49000000 count=50000000 of=after_49GB_next_50_GB_file.txt Thank you for your help! – Will Gibson May 10 '17 at 14:19
In fact, since the first 50 GB have already been extracted, the exact dd parameters are skip=50000000 count=50000000. – Nahuel Fouilleul May 10 '17 at 14:23
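
One general caveat when dd reads from a pipe: a read can return fewer than ibs bytes, and dd then counts that short read as a block. GNU dd documents iflag=fullblock for this situation. The sketch below assumes GNU coreutils dd (other implementations may not have the flag), and the function name extract_blocks is hypothetical. It is not offered as a diagnosis of the 90 GB file mentioned above.

```shell
# Hypothetical helper: skip SKIP blocks of 1024 bytes, then emit COUNT blocks.
# Assumes GNU dd. iflag=fullblock makes dd keep reading until each input
# block is full, so block counts refer to full 1024-byte blocks.
extract_blocks() {
    file=$1
    skip=$2
    count=$3
    # 2>/dev/null hides dd's transfer statistics.
    gzip -cd "$file" |
        dd ibs=1024 iflag=fullblock skip="$skip" count="$count" 2>/dev/null
}

# Example, using the parameters from the last comment:
# extract_blocks file.gz 50000000 50000000 > next_50-100_GB_file.txt
```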
