5

Here is data.txt:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND  
root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]    
root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]  
root       202  0.0  0.0      0     0 ?        S<   20:44   0:00 [ext4-dio-unwri  
root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon  

How can I read this data into a data.frame?
1. There is no usable separator.
The last field is the problem: a space cannot be the separator, because `init [2]` and `udevd --daemon` are each a single field and must not be split on spaces.
2. There is no fixed width.
Every line has a different width.

So how can I read data.txt into a data.frame?

smci
showkey
  • 2
    Are you sure that there are no fixed widths here? This seems to be fixed width except for the last column. – A5C1D2H2I1M1N2O1R2T1 Jan 21 '13 at 03:29
  • I'm able to read this sample you've provided with `read.fwf("data.txt", widths = c(4, 10, 5, 5, 7, 6, 4, 10, 6, 7, 20), skip = 1)`. Can't get the headers to work, that's why I have `skip = 1`, but those are easy to add in. The last width can be just set much larger than you expect and you should be fine... I think.... – A5C1D2H2I1M1N2O1R2T1 Jan 21 '13 at 04:06
  • @AnandaMahto - I think using `read.fwf` will be problematic as the `ps` linux command which is used to generate this text will use different sized columns each time it is run dependent on the values needed to be printed. I have commented below that the output format can however be manually specified when using the program. – thelatemail Jan 21 '13 at 04:16
  • @thelatemail, or Thela, or whatever it may be ;), I didn't catch that part, but now that you've edited the title, I see the problem.... – A5C1D2H2I1M1N2O1R2T1 Jan 21 '13 at 04:34

3 Answers

4

I would do it like this:

library(stringr) # has a convenient function for splitting to a fixed length 

raw          <- system("ps aux", intern = TRUE)
fields       <- strsplit(raw[1], " +")[[1]]
ps           <- str_split_fixed(raw[-1], " +", n = length(fields))
colnames(ps) <- fields
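
The result above is a character matrix, while the question asks for a data.frame. A minimal sketch of the remaining step follows, assuming the lines look like the sample in the question (they are hard-coded here so the snippet runs without `ps`). The names `lines`, `ps_mat`, and `ps_df` are illustrative and not part of the original answer.

```r
library(stringr)

# Sample lines copied from the question; in practice these would come from
# system("ps aux", intern = TRUE) or readLines("data.txt").
lines <- c(
  "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
  "root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]",
  "root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]",
  "root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon"
)

# The header determines how many fields each line is split into.
fields <- strsplit(lines[1], " +")[[1]]

# Split each data line into exactly length(fields) pieces; everything after
# the last split point stays together in the final piece (the COMMAND field).
ps_mat <- str_split_fixed(lines[-1], " +", n = length(fields))
colnames(ps_mat) <- fields

# Convert the character matrix to a data.frame, then let R guess column
# types (integer, numeric, or character) one column at a time.
ps_df <- as.data.frame(ps_mat, stringsAsFactors = FALSE)
ps_df[] <- lapply(ps_df, type.convert, as.is = TRUE)
```

With this sample, `ps_df[["COMMAND"]]` keeps `init [2]` and `udevd --daemon` as single values, and `ps_df[["PID"]]` becomes an integer column.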
flodel
  • Awesome, avoids any external file-I/O! `raw` is an R builtin, best to not shadow it, call the variable `out` instead. – smci May 16 '14 at 22:57
2

Here is a one-liner that should do the trick:

do.call(rbind, lapply(strsplit(readLines("data.txt"), "\\s+"), function(fields) c(fields[1:10], paste(fields[-(1:10)], collapse = " "))))

This is what it does in detail:

  1. read all lines of the file via readLines (this returns a character vector in which each element is one line of the file)

  2. use strsplit to split each line into strings separated by white space (\\s+)

  3. for each line (lapply), merge all fields that come after the 10th field into one (via paste(..., collapse = " ")); this creates a list in which each element represents one line of the file and is a character vector of length 11 (one entry per field)

  4. finally, call rbind to merge the list into a character matrix, which as.data.frame can then turn into a data frame; note that the header line ends up as the first row of that matrix
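
The steps above can be sketched in a more readable multi-line form. This is only an illustration, assuming the header has exactly 11 fields as in the sample. `split_line`, `n_fields`, and the hard-coded `lines` vector (standing in for `readLines("data.txt")`) are names introduced here, not part of the original one-liner.

```r
# Stand-in for readLines("data.txt"), using lines from the question.
lines <- c(
  "USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND",
  "root         1  0.0  0.0   2280   728 ?        Ss   20:44   0:00 init [2]",
  "root         2  0.0  0.0      0     0 ?        S    20:44   0:00 [kthreadd]",
  "root       334  0.0  0.1   2916  1452 ?        Ss   20:44   0:00 udevd --daemon"
)

n_fields <- 11  # assumption: number of columns in the header

# Split one line on white space and glue everything from field n onward
# back together with single spaces.
split_line <- function(line, n) {
  parts <- strsplit(trimws(line), "\\s+")[[1]]
  head_idx <- seq_len(n - 1)
  c(parts[head_idx], paste(parts[-head_idx], collapse = " "))
}

rows <- lapply(lines, split_line, n = n_fields)
mat  <- do.call(rbind, rows)  # character matrix, header in row 1

# Drop the header row from the data and use it as the column names instead.
df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(df) <- mat[1, ]
```

One consequence of this approach is that runs of several spaces inside a command are collapsed to a single space when the trailing fields are pasted back together.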

QkuCeHBH
0

What format is your data in? If you can open it in Excel, saving it as a tab-delimited file is most likely the best way forward.

Saving data as a tab-delimited file is one of the more common ways to prepare it for import into R. In Excel, use 'Save As' and choose 'Text (Tab delimited) (*.txt)'. Once this is done:

my_data <- read.table("path/to/file/", header = TRUE, sep = "\t")

sep = "\t" tells R that your file is tab-delimited.
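
As a small self-contained illustration of that call, the sketch below writes a hypothetical three-column tab-delimited file to a temporary location and reads it back. The file contents and the `path` variable are made up for the example; the `quote` and `comment.char` arguments are an added precaution, not part of the original answer.

```r
# Hypothetical tab-delimited content written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c("USER\tPID\tCOMMAND",
             "root\t1\tinit [2]",
             "root\t334\tudevd --daemon"), path)

# quote = "" and comment.char = "" keep quotes and '#' characters inside
# a field from being interpreted specially.
my_data <- read.table(path, header = TRUE, sep = "\t",
                      quote = "", comment.char = "",
                      stringsAsFactors = FALSE)
```

Because the only separator is the tab, values such as `init [2]` that contain spaces stay in one column.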

Alpha
Sam
  • 1
    Please run `ps aux` on Linux to get a file in this format; `\t` cannot be used as the separator here. – showkey Jan 21 '13 at 03:42
  • 1
    @KillKill - it might be worth looking at this question for how to redirect `ps aux` to a csv: http://stackoverflow.com/questions/3114741/generating-a-csv-list-from-linux-ps Looking at `man ps` you can specify output formats. – thelatemail Jan 21 '13 at 04:04