
So, I have the following situation:

A program which produces a large set of outputs (which must be zipped), as follows:

line00
line01
...
line0N
.
line10
line11
...
line1M
.
...

I generate this content and zip it with:

./my_cmd | gzip -9 > output.gz

What I would like to do is, in pseudo code:

./my_cmd \
| csplit --prefix=foo - '/^\.$/+1' '{*}' \  # <-- this will just create files
| tar -zf ??? \                 # <-- don't know how to link files to tar
| gzip -9 > output.tar.gz

Ideally, nothing unzipped ever gets on the hard drive.

In summary: My objective is a set of files split at the delimiter on the hard drive in a zipped state, without intermediate read-write steps.

If I can't do this with tar/gzip/csplit, then maybe something else?

Chris

1 Answer


Tar can handle the compression itself.

./my_cmd | csplit --prefix=foo - '/^\.$/+1' '{*}'  # writes foo?? files

printf "%s\n" foo[0-9][0-9] | tar czf output.tar.gz -T -
rm -f foo[0-9][0-9]  # clean up the temps     
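As a concrete illustration of those two steps, here is a self-contained run using `printf` as a stand-in for `./my_cmd` (an assumption; substitute your real command), with GNU csplit's `-z`/`--elide-empty-files` added to drop the empty trailing piece csplit can leave when the input ends on the delimiter:

```shell
# Sketch of the approach above; printf stands in for ./my_cmd.
# Run in a scratch directory so the temporaries don't clutter anything.
cd "$(mktemp -d)"

printf 'line00\nline01\n.\nline10\nline11\nline12\n.\n' |
    csplit --quiet --elide-empty-files --prefix=foo - '/^\.$/+1' '{*}'

# Archive (and compress) the pieces, then remove the uncompressed temps.
printf '%s\n' foo[0-9][0-9] | tar czf output.tar.gz -T -
rm -f foo[0-9][0-9]

tar tzf output.tar.gz   # lists foo00 and foo01
```

Note `'{*}'` is quoted so the shell doesn't try to expand it before csplit sees it.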

If that's just not good enough, and you REALLY need that -9 compression,

printf "%s\n" foo[0-9][0-9] | 
    tar cf - -T -           |
    gzip -9 > output.tar.gz

Then you should be able to extract individual files from the archive and process each one on its own.

tar xzOf output.tar.gz foo00 | wc -l

That lets you keep the file compressed, but pull out chunks to work on without writing them to disk.
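For instance, you can walk every chunk in the tarball this way, streaming each member straight into a pipeline. This is a sketch, not tested against your data; it fabricates a tiny archive first so the snippet stands alone, and rereads the archive once per member, which is fine for modestly sized archives:

```shell
# Fabricate a small archive like the one built above.
cd "$(mktemp -d)"
printf 'a\nb\n.\n' > foo00
printf 'c\n.\n'    > foo01
tar czf output.tar.gz foo00 foo01 && rm -f foo00 foo01

# Stream each member out of the compressed tarball in turn;
# no uncompressed copy is ever written to disk.
tar tzf output.tar.gz | while read -r member; do
    echo "$member: $(tar xzOf output.tar.gz "$member" | wc -l) lines"
done
```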

Paul Hodges
  • I probably don't REALLY need it. Ha, but csplit does not spit out the file names. Guess awk it is. – Chris Jan 04 '19 at 21:47
  • Easy enough to do in consecutive steps. Why break a file up before putting it back into a zipped tarball, though? Wouldn't it be more efficient to just zip the whole file? Or did you want to be able to pull out a piece at a time to process? – Paul Hodges Jan 04 '19 at 21:50
  • Right, exactly: a piece at a time. – Chris Jan 04 '19 at 21:52
  • 1
    mostly, I was hoping to split it apart without smacking the hard drive. – Chris Jan 04 '19 at 21:53
  • I didn't test the `csplit` part, but if you break it into files then they have to be on the drive so `tar` can read the names. Take the `csplit` and the `tar` out and `gzip` can compress its input. – Paul Hodges Jan 04 '19 at 22:02
  • yeah, I gotcha. Sounds like it has to write, unzipped, to the drive first, no matter what (unless I write something custom) – Chris Jan 04 '19 at 22:11
  • 1
    But I edited; as soon as it writes them to the archive you can remove all the subfiles. Alternately you could manually pull blocks of the file out and compress them...that would at least keep the data compressed on disk. YMMV. – Paul Hodges Jan 04 '19 at 22:13
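The "something custom" the comments gesture at can be sketched with awk: route each delimited block straight into its own `gzip`, so only compressed chunks ever reach the disk. The `chunk%02d.gz` names are illustrative, `printf` stands in for `./my_cmd`, and note that unlike csplit this drops the `.` delimiter lines:

```shell
cd "$(mktemp -d)"
printf 'line00\nline01\n.\nline10\nline11\nline12\n.\n' |
awk -v n=0 '
    /^\.$/ { if (cmd != "") close(cmd); cmd = ""; next }   # end of chunk
    {
        if (cmd == "") cmd = sprintf("gzip -9 > chunk%02d.gz", n++)
        print | cmd                                        # stream into gzip
    }'
ls chunk*.gz   # chunk00.gz chunk01.gz
```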