Pipe files included in tar.gz file to c++ program in bash

Question

I have a C++ program that can be run in the following format (two cases):

./main 10000 file1.txt

OR

./main 10000 file1.txt file2.txt

where file1.txt and file2.txt are huge text files. I have a file.tar.gz that may contain either:

Just one file (file1.txt)
Two files file1.txt and file2.txt

Is there a way in bash to use a pipe to read the files from the gz file directly, for both cases, i.e., whether the gz file contains one or two files? I have looked at "Pipe multiple files (gz) into C program", but I am not very familiar with bash and have trouble understanding the answers there.

A gz file cannot contain more than one file (unless that file is an archive containing other files, which is a slightly different problem). — davmac, Apr 16 '16 at 09:13
Also, not enough information. What does your program "main" actually do? Will it accept input via stdin? Are separate files treated individually or are their contents effectively concatenated? etc. — davmac, Apr 16 '16 at 09:14
"Will it accept input via stdin"? No, it needs to take the (one) two filenames from the args and read the files. No, the files cannot be concatenated but have to be read differently — Alexandros, Apr 16 '16 at 09:17
A pipe is redirecting `stdout` from one process to `stdin` in another process, so if you cannot read from `stdin` you cannot use `bash` pipes. You can do something similar though with fifos which give you pipe-like functionality on named (fifo) files. — totoro, Apr 16 '16 at 10:45
@ThomasChristensen technically a fifo _is_ a pipe (a _named_ pipe). — davmac, Apr 16 '16 at 15:45
@davmac Sounds reasonable, so I redact the "-like" in my previous comment :-) — totoro, Apr 16 '16 at 15:48

davmac · Accepted Answer · 2016-04-16T12:11:17.103

This isn't going to be particularly simple. Your question is really too broad, as it stands. One approach would be:

1. Determine whether the archive contains one or two files
2. Set up named pipes (fifos) for each of the files ("mkfifo" command)
3. Run commands to output the content of the files in the archive to the appropriate fifo, as a background process
4. Run the primary command, specifying the fifos as the filename arguments

Giving a full rundown of all of this is, I think, beyond what should be the scope of a Stack Overflow question. For (1), you could probably do something like:

FILECOUNT=$(tar -tzf filename.tar.gz | wc -l)

This lists the files within the archive (tar -tzf; the -t option is what requests a listing) and counts the number of lines of output from that command (wc -l). It's not foolproof, since directory entries and filenames containing newlines would each add to the count, but it should work if the filenames are simple names like the ones you suggested (file1.txt, file2.txt).
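If the archive might also contain directory entries, one possible refinement is to drop names ending in a slash before counting. The following is only a sketch: the throwaway `demo` directory and its contents are assumptions made for illustration, not part of the original setup.

```bash
# Build a small example archive that contains a directory entry.
demo=$(mktemp -d)
mkdir "$demo/data"
printf 'a\n' > "$demo/data/file1.txt"
printf 'b\n' > "$demo/data/file2.txt"
tar -czf "$demo/file.tar.gz" -C "$demo" data

# tar -t lists member names; directories are printed with a trailing slash,
# so grep -vc '/$' counts only the lines that do not end in '/'.
FILECOUNT=$(tar -tzf "$demo/file.tar.gz" | grep -vc '/$')
echo "$FILECOUNT"

rm -rf "$demo"
```

Here the listing has three lines (data/, data/file1.txt, data/file2.txt), and only the two file entries are counted.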

For (2), make either one or two fifos as appropriate:

mkfifo file1-fifo.txt
if [ $FILECOUNT = 2 ]; then
    mkfifo file2-fifo.txt
fi

For (3), use tar with -O to extract file contents from the archive, and redirect it to the fifo(s), as a background process:

tar -O -xzf filename.tar.gz file1.txt > file1-fifo.txt &
if [ $FILECOUNT = 2 ]; then
    tar -O -xzf filename.tar.gz file2.txt > file2-fifo.txt &
fi

And then (4) is just:

SECONDFILE=""
if [ $FILECOUNT = 2 ]; then
    SECONDFILE=file2-fifo.txt
fi
./main 10000 file1-fifo.txt $SECONDFILE

Finally, you should delete the fifo nodes:

rm file1-fifo.txt
rm -f file2-fifo.txt

Note that this will involve extracting the archive contents twice (in parallel), once for each file. There's no way (that I can think of) of getting around this.
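Putting the four steps together, a minimal sketch might look like the following. The helper name `run_from_archive` is hypothetical, and the demo uses `cat` as a stand-in for `./main 10000`. It assumes the archive holds only plain files (no directory entries) and that the program reads each file sequentially, since a fifo cannot be rewound or seeked.

```bash
#!/usr/bin/env bash
# run_from_archive ARCHIVE COMMAND [ARGS...]
# Runs COMMAND with one fifo per archive member appended to its arguments.
run_from_archive() {
    local archive=$1; shift
    local tmpdir member fifo status i=0
    local -a fifos=() pids=()

    tmpdir=$(mktemp -d) || return 1

    # Steps 1-3: one fifo and one background tar per listed member.
    while IFS= read -r member; do
        i=$((i + 1))
        fifo="$tmpdir/file$i-fifo.txt"
        mkfifo "$fifo" || return 1
        # The writer blocks until the program opens the fifo for reading.
        tar -O -xzf "$archive" "$member" > "$fifo" &
        pids+=("$!")
        fifos+=("$fifo")
    done < <(tar -tzf "$archive")

    # Step 4: run the program with the fifos as its filename arguments.
    "$@" "${fifos[@]}"
    status=$?

    # Clean up: stop any writer whose fifo was never opened, then remove fifos.
    kill "${pids[@]}" 2>/dev/null
    wait
    rm -rf "$tmpdir"
    return "$status"
}

# Demo with a throwaway archive.
demo=$(mktemp -d)
printf 'one\n' > "$demo/file1.txt"
printf 'two\n' > "$demo/file2.txt"
tar -czf "$demo/file.tar.gz" -C "$demo" file1.txt file2.txt
result=$(run_from_archive "$demo/file.tar.gz" cat)
printf '%s\n' "$result"
rm -rf "$demo"
```

With the real program, the call would be `run_from_archive file.tar.gz ./main 10000`. The same call covers both the one-file and the two-file case, because the number of fifos follows the archive listing.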

Thanks for your time and effort. I will know beforehand if the tar.gz will contain one or two files and therefore your solution could be further simplified. — Alexandros, Apr 16 '16 at 11:33
This is a brilliant solution. And I don't think you have to run `tar` twice to extract the files separately. If you know the names of the files in the tar file (perhaps from the `tar t` pass you used to count them), you ought to be able to *construct the fifos with those names* and have a regular `tar x` invocation (without the `-O`) extract on top of them. — Steve Summit, Apr 22 '16 at 18:07
Addendum to my previous comment: `tar` is sometimes reluctant to extract "on top of" an existing file (in this case, a named pipe). So you might need a special option to make it work. See here: https://en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Computing/2016_March_23#tar_x_breaking_symlinks.3F — Steve Summit, Apr 22 '16 at 18:15
