
Instead of decompressing via a temporary file, I can use named pipes to read .csv.gz and .dta.gz files directly in Stata, as explained here. I have two questions about using named pipes in Stata, in case someone is knowledgeable about them.

  1. The help advises doing the following (edit: this does work for me):

     #!/bin/sh
     # myprog: shell script from the help
     fname=$1
     rm -f mypipe.pip
     mknod mypipe.pip p
     zcat $fname > mypipe.pip &

     * Stata commands that call myprog and then read the pipe
     !myprog testfile.Z >& /dev/null < /dev/null
     infile a b c using mypipe.pip
    

    I'd like to understand why the following code does not work.

     !rm -f mypipe.pip && mknod mypipe.pip p && zcat filename.gz > mypipe.pip &
     infile a b c using mypipe.pip
    
  2. Is there a similar way to use named pipes when saving and gzipping .dta files? I have tried to replicate the code above, without success.
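For #2, the shell side of the writing direction can be sketched as follows. This is a minimal sketch, assuming a POSIX shell with `mkfifo` and `gzip`; `printf` stands in for the program writing the data, and the names `outpipe.pip` and `data.csv.gz` are hypothetical. It only shows that a pipe read by a background `gzip` compresses whatever is written into it; it does not establish that Stata's `save` or `outsheet` will write into an existing FIFO.

```shell
#!/bin/sh
# Sketch: compress data on the fly by writing it into a named pipe
# that a background gzip process reads from. printf stands in for the
# program producing the data (hypothetically Stata); whether Stata will
# write into an existing FIFO is NOT established here.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

rm -f outpipe.pip
mkfifo outpipe.pip                 # POSIX equivalent of: mknod outpipe.pip p

# Reader side: opening the pipe blocks until a writer opens it; gzip then
# compresses everything written until the writer closes the pipe.
gzip -c < outpipe.pip > data.csv.gz &
gzip_pid=$!

# Writer side: stand-in for the program that would "save" into the pipe.
printf 'a,b,c\n1,2,3\n4,5,6\n' > outpipe.pip

wait "$gzip_pid"                   # make sure the .gz file is complete
roundtrip=$(gzip -dc data.csv.gz)  # decompress again to check the result
printf '%s\n' "$roundtrip"

cd / && rm -rf "$workdir"          # clean up the temporary directory
```

The writer never touches a regular file, which is the point of #2; the open question is only whether the writing program accepts a FIFO as its target.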

Matthew
  • On #2: Stata can zip files by itself, so I don't see that you need a pipe at all (see e.g. http://blog.stata.com/tag/zip/). On #1: you don't say what "does not work" means, but my wild guess is that Stata does not wait for the shell to finish before trying to `infile`. You might try wrapping the OS call inside another Stata program. Then Stata would (should?) be obliged to wait for that to finish before trying to `infile`. – Nick Cox Aug 27 '14 at 07:31
  • My second thoughts on #1 are that suggestion of mine can't help, as it would just create another version of the same problem. But I've done no testing. – Nick Cox Aug 27 '14 at 07:59
  • Hello Nick. Thanks. About #2: from what I understand, the Stata command `zipfile` takes a .dta file and compresses it. However, I want to compress the dataset I'm working with directly, without writing it to the drive first. This reduces the I/O burden. About #1: the purpose of the ampersand in `mypipe.pip &` is exactly to execute the next command without waiting for the previous one to finish, i.e. I think we want Stata to start reading before everything has been unzipped into the pipe. – Matthew Aug 27 '14 at 11:51
  • OK; but your "does not work" still seems to be unexplained. I doubt that Stata will let you zip anything without saving it; there isn't a file that you can name. – Nick Cox Aug 27 '14 at 14:00
  • Hello Nick. I would like to do exactly what people can do in R: http://stackoverflow.com/questions/17492409/how-to-directly-perform-write-csv-in-r-into-tar-gz-format Note that a named pipe might make this possible even though Stata does not have this particular command (it is enough that Stata treats the named pipe as the file to use or save). After all, it seems to work for `use`. – Matthew Aug 27 '14 at 15:00
  • Having tried #1 several times without success, I would take it to [Stata tech support](http://www.stata.com/support/tech-support/). If you do, post the solution as an answer (you can answer your own questions). – Roberto Ferrer Aug 27 '14 at 21:22
  • Now I see that #1 works for you. That's good. I wasn't able to get that working. I must be doing something wrong. – Roberto Ferrer Aug 27 '14 at 21:33
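Nick's guess and the ampersand comment can be checked outside Stata. In `a && b && c &`, the trailing `&` sends the whole and-list to the background, not just the last command, so in the one-liner from #1 the `rm` and `mknod` steps may not have run yet when `infile` starts. A minimal sketch, assuming a POSIX shell; `sleep 1` stands in for the setup steps, `touch` for creating the pipe, and `created.flag` is a hypothetical file name:

```shell
#!/bin/sh
# Sketch: in `a && b && c &` the trailing & backgrounds the WHOLE
# and-list, so the next command starts before any of a, b, c finish.
# `sleep 1` plays the role of rm/mknod; `touch` plays the role of the
# step that creates the pipe.
workdir=$(mktemp -d)
cd "$workdir"

sleep 1 && touch created.flag &

# Runs immediately, like Stata's `infile` right after the `!` line.
if [ -e created.flag ]; then
    seen_immediately=yes
else
    seen_immediately=no
fi

wait                               # now let the background list finish
if [ -e created.flag ]; then
    seen_after_wait=yes
else
    seen_after_wait=no
fi

echo "immediately: $seen_immediately, after wait: $seen_after_wait"
# prints: immediately: no, after wait: yes

cd / && rm -rf "$workdir"
```

In the help's version the pipe is created by `myprog` in the foreground and only `zcat` is backgrounded, so the pipe already exists when `infile` runs.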

1 Answer


Edit: It's because you haven't recreated the code as it is split between the bash file and the Stata do-file; you've only reproduced the bash file.
Your code should read:

!rm -f mypipe.pip && mknod mypipe.pip p && (zcat filename.gz > mypipe.pip &) >& /dev/null < /dev/null
infile a b c using mypipe.pip
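The same read path can be exercised without Stata. This sketch assumes a POSIX shell; it uses `mkfifo` (the POSIX equivalent of `mknod name p`) and `gzip -dc` in place of `zcat`, since on some systems `zcat` only handles `.Z` files. `cat` stands in for `infile`, and the test data are made up. The pipe is created in the foreground and only the decompression runs in the background, with its standard streams redirected.

```shell
#!/bin/sh
# Sketch of the read path outside Stata: create the pipe first (in the
# foreground), background only the decompression, then read the pipe.
# `cat` stands in for Stata's `infile a b c using mypipe.pip`.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

printf '1 2 3\n4 5 6\n' | gzip -c > filename.gz   # hypothetical test input

rm -f mypipe.pip
mkfifo mypipe.pip                  # same effect as: mknod mypipe.pip p

# Only the decompressor runs in the background: stdout goes into the
# pipe, stderr to /dev/null, and stdin comes from /dev/null.
gzip -dc filename.gz > mypipe.pip 2> /dev/null < /dev/null &

contents=$(cat mypipe.pip)         # the "reader" opens the existing pipe
wait
printf '%s\n' "$contents"

cd / && rm -rf "$workdir"
```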

If you could post the errors you're getting (following Nick's suggestion to clarify what "does not work" means), that would be helpful.

In any case, there are a few things you should try first:

(1) Create a bash script, as per your link to the Stata website, instead of trying to do it all on one line.
(2) Make sure your file name has no spaces, or put double quotes around $fname.
(3) If you run *nix, make sure to `chmod 775 /path/to/myprog` to make it executable.
(4) Make a do-file, again as per your link.
(5) Put a pound sign after testfile.Z, like this: `!myprog testfile.Z #>& /dev/null < /dev/null` followed by `infile a b c using mypipe.pip`. The shell then treats the redirection as a comment, so output goes to standard output and you can see what's going on. You can remove the pound sign once the problem is diagnosed.
(6) Change `!myprog` to `!/path/to/myprog`.
(7) Execute `do mytest.do`.
(8) Tell us what error Stata reports, if any remain.
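Tip (2) matters because an unquoted `$fname` is split on whitespace before `zcat` ever sees it, so the script would receive two arguments instead of one file name. A small illustration, assuming a POSIX shell with the default `IFS`; the file name is hypothetical:

```shell
#!/bin/sh
# Sketch for tip (2): unquoted $fname undergoes word splitting,
# quoted "$fname" stays a single argument.
fname="my data.csv.gz"             # hypothetical file name with a space

set -- $fname                      # unquoted: split into "my" and "data.csv.gz"
unquoted_count=$#

set -- "$fname"                    # quoted: one argument
quoted_count=$#

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
# prints: unquoted: 2 argument(s), quoted: 1 argument(s)
```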

It works on my machine with .csv files as long as you specify all the variable names after `infile`; I haven't got it to work with .dta files. Here is the procedure:

First, make a bash script called myprog, as recommended:

#!/bin/sh
cd /path/to/dir
fname=$1
rm -f mypipe.pip
mknod mypipe.pip p
zcat "$fname" > /path/to/dir/mypipe.pip &

Make the script executable by typing this in a terminal: `chmod 775 /path/to/dir/myprog`

Then make a do-file. I have a dataset called complete, which I used to test the principle:

cd /path/to/dir

insheet using complete.csv
ds *
global vars "`r(varlist)'"

!7z a test.csv.gz complete.csv

!/path/to/dir/myprog test.csv.gz >& /dev/null < /dev/null
infile $vars using mypipe.pip, clear

Success. I'm running Debian Linux Wheezy (actually CrunchBang, #!, but same deal) with Stata version 12.

  • When you say it works on your machine with .csv files, do you mean that you put data in a .csv file, then zip it and work with that as your `testfile.Z`? Can you show your **exact procedure** (data input, code, file distribution, etc.) and details of your setup (Stata version, OS, terminal, etc.)? The original poster should do this as well. I've given this whole _named pipes_ issue a few tries, with no success until now. Stata seems to freeze after the `zcat` call. A bash process opens in the background, but nothing else happens. – Roberto Ferrer Aug 27 '14 at 21:18
  • I have to hit the Stata Break button **and** manually end the bash process (within the system manager) to unfreeze it. I tried with Mint Debian and Stata 12.1. – Roberto Ferrer Aug 27 '14 at 21:20
  • I mean exactly as you say in the first comment. I'll revise my above post to include my exact test. – Brian Albert Monroe Aug 27 '14 at 21:22
  • Hey, sorry I wasn't clear. I have no problem opening .csv.gz or .dta.gz files as indicated in the Stata help. I just have a hard time understanding why the simplest syntax (in #1) does not work. #2 is about the fact that I cannot save .dta and .csv files into a pipe, even with code similar to the one given in the help. – Matthew Aug 27 '14 at 21:22
  • Actually, your one-line command does not work for me. It's OK though. Would you know why `>& /dev/null < /dev/null` is required? As for #2, I think it is hopeless, because Stata does not know how to write to an existing file. – Matthew Aug 27 '14 at 22:09
  • Strange: if I replace the `!/path/to/dir/myprog` line of my example with the line in my edit, it works just the same for me. `>& /dev/null` redirects both standard output and standard error to /dev/null, and `< /dev/null` makes standard input read from /dev/null. Without it, the shell would return the process number of the forked `zcat` command on standard output, which must be causing an issue. Check out this [thread](http://stackoverflow.com/questions/8208033/what-does-dev-null-dev-null-at-the-end-of-a-command-do) – Brian Albert Monroe Aug 27 '14 at 22:18
  • OK, thanks. Actually I can get away with only `>/dev/null 2>&1`, which I understand better. – Matthew Aug 27 '14 at 23:01
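One plausible reading of why the redirection matters (an assumption; I have not checked how Stata's `!` collects shell output): a caller that reads a command's output waits until every process holding that output open has exited, including backgrounded ones. A backgrounded `zcat` that blocks until a reader opens the pipe would then keep Stata waiting, while `infile` never gets to run, which would match the freeze Roberto describes. The sketch below shows the effect with command substitution in a POSIX shell, using `sleep 2` in place of `zcat`. It uses `>/dev/null 2>&1`, the POSIX spelling of the bash/csh shorthand `>& /dev/null`.

```shell
#!/bin/sh
# Sketch: $(...) reads the command's stdout until EOF, so it waits for
# every process still holding that stdout open, even a backgrounded one.
# Detaching the background job's stdio lets the caller return at once.

start=$(date +%s)
held=$( sleep 2 & )                # background job inherits the captured stdout
held_secs=$(( $(date +%s) - start ))

start=$(date +%s)
released=$( sleep 2 > /dev/null 2>&1 < /dev/null & )   # stdio detached
released_secs=$(( $(date +%s) - start ))

echo "held: ${held_secs}s, released: ${released_secs}s"
```

On a typical system the first substitution takes about two seconds and the second returns almost immediately.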