I'm having trouble understanding how to use subprocess for a problem of mine.

Let's say I have a tab-delimited text file tabdelimited1.txt in my subdirectory which I would like to read into a pandas dataframe.

Naturally, you could simply import the data as follows:

import pandas as pd
df = pd.read_csv("tabdelimited1.txt", header=None, sep=r"\s+")

However, let's say we wanted to use subprocess. In the command line, $cat tabdelimited1.txt will output all of the lines.

Now, I want to use subprocess to read the output of cat tabdelimited1.txt. How does one do this?

We could use

import subprocess
task = subprocess.Popen("cat file.txt", shell=True,  stdout=subprocess.PIPE)
data = task.stdout.read()

but (1) I get an error for shell=True and (2) I would like to read in the data line-by-line.

How can I use subprocess to read tabdelimited1.txt line-by-line? The script should look something like this:

import subprocess
import pandas as pd

df = pd.DataFrame()
task = subprocess.Popen("cat file.txt", shell=True,  stdout=subprocess.PIPE)
# while lines exist:
    # line = subprocess std
    df=pd.concat([df, line])

ShanZhengYang
  • What error do you get with `shell=True`? – tdelaney Oct 05 '16 at 02:03
  • The subprocess module has nothing to do with reading from piped input; you only need to [read from sys.stdin](http://stackoverflow.com/questions/17658512/how-to-pipe-input-to-python-line-by-line-from-linux-program) for `cat x.txt | python script1.py`. – TessellatingHeckler Oct 05 '16 at 02:03
  • @TessellatingHeckler But OP wants to use `subprocess`. – tdelaney Oct 05 '16 at 02:06
  • @tdelaney OP wants to use subprocess to read from a piped input, which doesn't make sense. One possible way forward is "*how to use subprocess to call cat to read from a file*", but another way forward is "*why you don't need to use subprocess to do the thing you describe at the end of your question, which you think subprocess will enable (but it won't)*". – TessellatingHeckler Oct 05 '16 at 02:17
  • What's the purpose of the last `cat...| ...script1.py` line? `script1.py` is already doing a `cat file.txt`. It's not reading from `stdin` (which is where the outer `cat` is piping its output to). – hpaulj Oct 05 '16 at 02:28
  • @hpaulj Agreed, wasn't thinking. – ShanZhengYang Oct 05 '16 at 02:32
  • @TessellatingHeckler The user made it very clear that he wanted to use subprocess even though there are other, better ways to do it. Besides, who's to say that this particular code fragment is going to be in code where stdin is available? Suppose it's part of a web server. – tdelaney Oct 05 '16 at 03:12
  • But why do you want to use subprocess to read a file? Is it a simplified version of something completely different that you are trying to accomplish, or are you just masochistic? You may get a better answer if you say what you actually want to do. – zvone Oct 05 '16 at 06:08

2 Answers

You can skip the shell completely by breaking the command into a list. Then it's just a matter of iterating over the process's stdout:

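import subprocess
import pandas as pd

rows = []
task = subprocess.Popen(["cat", "file.txt"], stdout=subprocess.PIPE)
for line in task.stdout:
    # each line arrives as bytes; decode it and split on whitespace into fields
    rows.append(line.decode().split())
task.wait()  # reap the child process once its output is exhausted
df = pd.DataFrame(rows)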
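
To make the iteration itself concrete, here is a minimal stdlib-only sketch (no pandas) of reading a child's output one line at a time. It assumes a POSIX system where `cat` is available. The `read_rows` helper and the temporary sample file are hypothetical names used only for illustration, not part of the original answer.

```python
import os
import subprocess
import tempfile


def read_rows(path):
    """Run `cat path` without a shell and return (rows, exit_code)."""
    rows = []
    # Passing a list avoids the shell; stdout=PIPE exposes the child's
    # output as a binary file object on task.stdout.
    with subprocess.Popen(["cat", path], stdout=subprocess.PIPE) as task:
        for line in task.stdout:       # one bytes line at a time
            rows.append(line.split())  # split on whitespace, tabs included
    # Leaving the with-block closes the pipe and waits for the child.
    return rows, task.returncode


# Hypothetical sample data standing in for file.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\t3\ttest1\n2\t2\ttest2\n")

rows, code = read_rows(tmp.name)
os.unlink(tmp.name)
print(rows, code)
```

Using `Popen` as a context manager is one way to make sure the pipe is closed and the child is waited on, which is the job `task.wait()` does in the snippet above.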
tdelaney
  • Can you explain how `for line in task.stdout` and `task.wait()` work? `task.stdout` is a variable which holds the entire contents of `cat file.txt`? – ShanZhengYang Oct 05 '16 at 02:10
  • Both this and the OP `shell=True` versions work for me. – hpaulj Oct 05 '16 at 02:18
  • `task.stdout` is like `sys.stdout` and any open file; iterating on it reads it line by line. – hpaulj Oct 05 '16 at 02:21
  • When you say `stdout=subprocess.PIPE`, `Popen` will create a pipe for the process's stdout and store it in the attribute `stdout`. So, it's a file object that reads the child's output. – tdelaney Oct 05 '16 at 03:14
import sys
for line in sys.stdin:
    print(line.split())

can be used with a shell command like:

0025:~/mypy$ cat x.txt | python3 stack39864304.py
['1', '3', 'test1;']
['2', '2', 'test2;']
['3', '2', 'test3;']

Otherwise in an interactive session I can do:

In [269]: task = subprocess.Popen("cat x.txt", shell=True,  stdout=subprocess.PIPE)
In [270]: for line in task.stdout: print(line.split())
[b'1', b'3', b'test1;']
[b'2', b'2', b'test2;']
[b'3', b'2', b'test3;']

(py3 bytestrings)
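
The `b'...'` prefixes appear because the pipe is opened in binary mode. Below is a minimal sketch of two ways to get `str` values instead, assuming Python 3.7+ for the `text=True` parameter. The child command is a hypothetical stand-in for `cat x.txt` that runs the Python interpreter itself, so the snippet does not depend on a file existing.

```python
import subprocess
import sys

# Hypothetical stand-in for `cat x.txt`: a child that prints two lines.
cmd = [sys.executable, "-c", "print('1 3 test1;'); print('2 2 test2;')"]

# Option 1: keep the binary pipe and decode each line yourself.
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as task:
    decoded = [line.decode("utf-8").split() for line in task.stdout]

# Option 2: text=True makes Popen wrap the pipe in a text-mode reader.
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as task:
    as_text = [line.split() for line in task.stdout]

print(decoded)
print(as_text)
```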

python3 stack39864304.py < x.txt is another way of sending this file to the script.

cat afile | ... is perhaps too simple, and raises all the objections about why not read the file directly. But cat can be replaced by head or tail, or even by ls -l | python3 stack39864304.py to get a directory listing split the same way.
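
Since `sys.stdin` and a `Popen` pipe can both be iterated line by line, the splitting logic can live in one place and be fed from either source. A minimal sketch of that idea follows. `split_lines` is a hypothetical helper name, and `io.StringIO` stands in for piped input so the snippet runs without a shell.

```python
import io


def split_lines(lines):
    """Yield the whitespace-separated fields of each non-empty line."""
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            yield fields


# Stand-in for piped input such as `cat x.txt | python3 script.py`.
sample = io.StringIO("1 3 test1;\n2 2 test2;\n\n3 2 test3;\n")
parsed = list(split_lines(sample))
print(parsed)

# In a real script, read from standard input instead:
#     import sys
#     for fields in split_lines(sys.stdin):
#         print(fields)
```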

I use ipython for most of my interactive python coding; many of its %magic commands use subprocesses, and I use cat x.txt and ls all the time from within a session.

hpaulj
  • So, `sys.stdin` needs a command `cat x.txt`. Is there any other difference (e.g. performance, etc.) in contrast to using `task = subprocess.Popen()`? – ShanZhengYang Oct 05 '16 at 02:45
  • `stdin` reads from a pipe or `< x.txt` file redirection (in linux). You don't have to choose one approach over the others. Learn to use them all. – hpaulj Oct 05 '16 at 07:05
  • Thanks---by the way, I'm not sure why you were down voted above---I certainly appreciate the response and contribution – ShanZhengYang Oct 05 '16 at 07:23