
We have to extract gzip/bzip2 files downloaded over the internet, and sometimes they are well over multiple gigabytes (e.g., a 15 GB wiki dump).

Is there a way those files can be extracted by multiple computers instead of just one? For example, each node in the cluster could read the header plus the bytes between X and Y, and write its output into a shared folder.

Or is there any other way to accelerate the process?

Devrim

1 Answer


Have you considered using a parallelized alternative to gzip/bzip?

If you are using bzip2, pbzip2 is a parallelized alternative that uses pthreads to speed up compression and decompression. Similarly, a parallel alternative to gzip is pgzip.
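The reason these tools can parallelize is that they emit many small, independently compressed streams rather than one monolithic one, and concatenated bz2 streams are still a valid bz2 file. A minimal Python sketch of that idea (not pbzip2 itself; the function names `split_compress` and `parallel_decompress` are hypothetical) using only the standard library:

```python
import bz2
from concurrent.futures import ThreadPoolExecutor


def split_compress(data: bytes, chunk_size: int) -> list[bytes]:
    """Compress each chunk as an independent bz2 stream,
    which is essentially the layout pbzip2 produces."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return [bz2.compress(c) for c in chunks]


def parallel_decompress(streams: list[bytes]) -> bytes:
    """Decompress independent streams concurrently. CPython's bz2
    releases the GIL during (de)compression, so threads genuinely
    overlap; in a cluster, each node would instead take its own
    subset of the streams and write results to shared storage."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(bz2.decompress, streams))
```

The same split-into-independent-streams approach is what would make your cluster idea workable: each node only needs the byte offsets of the stream boundaries, not the whole file.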

alephnerd
  • Thanks, but this only accelerates things within one computer; what we want is to enable a cluster of nodes to digest one file. – Devrim Aug 07 '18 at 04:34