How to read output from the Linux process status (ps) command in R?

Question

Here is data.txt:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND  
root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]    
root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]  
root       202  0.0  0.0      0     0 ?        S<   20:44   0:00 [ext4-dio-unwri  
root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon

How can I read this data into a data.frame?
1. I cannot choose a separator.
The last field is the problem: a space cannot be the separator,
because `init [2]` and `udevd --daemon` are each a single field and must not be split on spaces.
2. There is no fixed width.
Every line has a different width.

So how can I read data.txt into a data.frame?

Are you sure that there are no fixed widths here? This seems to be fixed width except for the last column. — A5C1D2H2I1M1N2O1R2T1, Jan 21 '13 at 03:29
I'm able to read this sample you've provided with `read.fwf("data.txt", widths = c(4, 10, 5, 5, 7, 6, 4, 10, 6, 7, 20), skip = 1)`. Can't get the headers to work, that's why I have `skip = 1`, but those are easy to add in. The last width can be just set much larger than you expect and you should be fine... I think.... — A5C1D2H2I1M1N2O1R2T1, Jan 21 '13 at 04:06
@AnandaMahto - I think using `read.fwf` will be problematic as the `ps` linux command which is used to generate this text will use different sized columns each time it is run dependent on the values needed to be printed. I have commented below that the output format can however be manually specified when using the program. — thelatemail, Jan 21 '13 at 04:16
@thelatemail, or Thela, or whatever it may be ;), I didn't catch that part, but now that you've edited the title, I see the problem.... — A5C1D2H2I1M1N2O1R2T1, Jan 21 '13 at 04:34

score 4 · Answer 1 · answered Jan 21 '13 at 12:47

I would do it like this:

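library(stringr) # str_split_fixed() splits each string into a fixed number of pieces

# One string per line of `ps aux` output; the first line is the header.
raw          <- system("ps aux", intern = TRUE)
# Column names come from the header line.
fields       <- strsplit(raw[1], " +")[[1]]
# Split each data line into as many pieces as there are column names;
# the last piece keeps the rest of the line, so COMMAND stays in one field.
ps           <- str_split_fixed(raw[-1], " +", n = length(fields))
colnames(ps) <- fields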
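
The result above is a character matrix, while the question asks for a data.frame. Below is a minimal sketch of the same idea followed by that conversion, run on the sample lines from the question rather than on live `ps aux` output. It is not the answer's own code: the regular expression is a base-R stand-in for `str_split_fixed()`, and the names `raw_lines`, `ps_mat`, `ps_df` and `num_cols` are made up for illustration.

```r
# Sample lines copied from the question (a stand-in for
# system("ps aux", intern = TRUE)).
raw_lines <- c(
  "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
  "root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]",
  "root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]",
  "root       202  0.0  0.0      0     0 ?        S<   20:44   0:00 [ext4-dio-unwri",
  "root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon"
)

fields <- strsplit(raw_lines[1], " +")[[1]]
n      <- length(fields)

# Base-R stand-in for str_split_fixed(): capture n - 1 whitespace-delimited
# fields, then everything that remains (spaces included) as the last field.
pattern <- paste0("^", strrep("(\\S+)\\s+", n - 1), "(.*\\S)\\s*$")
matches <- regmatches(raw_lines[-1], regexec(pattern, raw_lines[-1], perl = TRUE))
ps_mat  <- do.call(rbind, lapply(matches, function(m) m[-1]))  # drop the full match
colnames(ps_mat) <- fields

# Character matrix -> data.frame, then convert the numeric columns by name.
ps_df    <- as.data.frame(ps_mat, stringsAsFactors = FALSE)
num_cols <- c("PID", "%CPU", "%MEM", "VSZ", "RSS")
ps_df[num_cols] <- lapply(ps_df[num_cols], as.numeric)

str(ps_df)
```

Column names such as `%CPU` are not syntactic R names, so they need `ps_df[["%CPU"]]` or backticks when accessed.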

answered Jan 21 '13 at 12:47

flodel


Awesome, avoids any external file-I/O! `raw` is an R builtin, best to not shadow it, call the variable `out` instead. – smci May 16 '14 at 22:57

score 2 · Answer 2 · answered Apr 07 '13 at 10:15

Here is a one-liner that should do the trick:

do.call(rbind, lapply(strsplit(readLines("data.txt"), "\\s+"), function(fields) c(fields[1:10], paste(fields[-(1:10)], collapse = " "))))

This is what it does in detail:

1. Read all lines of the file with readLines, which returns a character vector with one element per line of the file.
2. Use strsplit to split each line into strings separated by whitespace (\\s+).
3. For each line (lapply), merge all fields after the 10th into one via paste(..., collapse = " "). This gives a list in which each element represents one line of the file as a character vector of length 11 (one entry per field).
4. Finally, call rbind to bind the list into a character matrix, which as.data.frame can then convert to a data frame. The header line ends up as the first row of that matrix.
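
The steps above can be sketched in expanded form as follows. This is an illustration under stated assumptions, not the answer's exact code: the sample is written to a temporary file so the block is self-contained, `n_fixed` is a made-up name for the assumed count of ten space-free columns before COMMAND, and the last two lines (promoting the header row to column names) go beyond the one-liner.

```r
# Write the sample from the question to a temporary file so the sketch is
# self-contained; in practice `path` would point at the real data.txt.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND  ",
  "root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]    ",
  "root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]  ",
  "root       202  0.0  0.0      0     0 ?        S<   20:44   0:00 [ext4-dio-unwri  ",
  "root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon"
), path)

n_fixed <- 10  # assumption: ten space-free columns precede COMMAND

lines  <- readLines(path)                   # one string per line
pieces <- strsplit(trimws(lines), "\\s+")   # split each line on whitespace
rows   <- lapply(pieces, function(fields) {
  # Keep the first ten fields; glue everything after them back together.
  c(fields[seq_len(n_fixed)], paste(fields[-seq_len(n_fixed)], collapse = " "))
})
mat <- do.call(rbind, rows)                 # character matrix, header in row 1

# Promote the header row to column names and drop it from the data.
ps_df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(ps_df) <- mat[1, ]

print(ps_df)
```

Because the command is rebuilt with single spaces, runs of several spaces inside a command line are collapsed to one. All columns are still character at this point.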

score 0 · Answer 3 · edited Dec 17 '13 at 00:15

What format is your data in? If you can open it in Excel, saving it as a tab-delimited file is most likely the best way forward.

Saving data as a tab-delimited file is one of the more common ways to prepare it for import into R. In Excel, use 'Save As' and choose '.txt (tab delimited)'. Once this is done:

my_data <- read.table("path/to/file/", header = TRUE, sep = "\t")

sep = "\t" tells R that your file is tab-delimited.
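
As a small illustration of that call, the sketch below writes a tab-delimited file and reads it back. The three rows are a hypothetical tab-separated rewrite of part of the question's sample (plain `ps aux` output is space-padded, not tab-delimited), and `quote = ""` and `comment.char = ""` are additions on the assumption that command lines may contain quote or `#` characters.

```r
# Hypothetical tab-delimited version of part of the sample data.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "USER\tPID\tCOMMAND",
  "root\t1\tinit [2]",
  "root\t334\tudevd --daemon"
), path)

my_data <- read.table(
  path,
  header = TRUE,
  sep = "\t",                # fields are separated by tabs only
  quote = "",                # treat quote characters as ordinary text
  comment.char = "",         # do not treat '#' as the start of a comment
  stringsAsFactors = FALSE
)

print(my_data)
```

With tabs as the only separator, the spaces inside `init [2]` and `udevd --daemon` stay within one field.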

edited Dec 17 '13 at 00:15

Alpha


answered Jan 21 '13 at 03:29

Sam


Please run `ps aux` on Linux to get a file in this format; `\t` cannot be used as the separator here. – showkey Jan 21 '13 at 03:42
@KillKill - it might be worth looking at this question for how to redirect `ps aux` to a csv: http://stackoverflow.com/questions/3114741/generating-a-csv-list-from-linux-ps Looking at `man ps` you can specify output formats. – thelatemail Jan 21 '13 at 04:04
