
We have to extract gzip/bzip2 files downloaded over the internet, and sometimes they are well over multiple gigabytes (e.g., a 15 GB wiki dump).

Is there a way those files can be extracted by multiple computers instead of just one? For example, each node in the cluster could read the header plus the bytes between X and Y, and write its output into a shared folder.

Or is there any other way to accelerate the process?

Devrim

1 Answer


Have you considered using a parallelized alternative to gzip/bzip?

If you are using bzip2, pbzip2 is a parallelized alternative that uses pthreads to speed up compression and decompression. Similarly, a parallel alternative to gzip is pgzip.
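The reason these tools can parallelize is that they emit many small, independently compressed streams rather than one monolithic one, and concatenated bz2 streams are still a valid bz2 file. A minimal Python sketch of that idea (not pbzip2 itself; the function names `split_compress` and `parallel_decompress` are hypothetical) using only the standard library:

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_compress(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress each chunk as an independent bz2 stream,
    which is essentially the layout pbzip2 produces."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [bz2.compress(c) for c in chunks]


def parallel_decompress(streams: list[bytes]) -> bytes:
    """Decompress independent streams concurrently. CPython's bz2
    releases the GIL during (de)compression, so threads genuinely
    overlap; in a cluster, each node would instead take its own
    subset of the streams and write results to shared storage."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, streams))
```

The same split-into-independent-streams approach is what would make your cluster idea workable: each node only needs the byte offsets of the stream boundaries, not the whole file.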

alephnerd
  • Thanks, but this only accelerates things within one computer; what we want is to enable a cluster of nodes to digest one file. – Devrim Aug 07 '18 at 04:34