0

I have a data frame with 10k rows and 500 columns. For each column, I want to create a count for each unique value in the row. E.g.

      Fruit    Vegetable  Meat 
1     Apple    Carrot     Steak
2     Apple    Potato     Chicken
3     Pear     Peas       Duck

Would produce:

Fruit;Apple;2;Pear;1
Vegetable;Carrot;1;Potato;1;Peas;1
Meat;Steak;1;Chicken;1;Duck;1

The Hmisc describe function produces this kind of analysis, but the output is so badly formatted as to be useless.

Thanks.

MA2
  • 43
  • 4

2 Answers2

0

You could run the table function through apply

For instance

fruit <- c("Apple", "Apple", "Pear")
veg <- c("Carrot", "Potato", "Peas")
meat <- c("Steak", "Chicken", "Duck")
df <- data.frame(fruit, veg, meat)

apply(df, 2, table)

$fruit

Apple  Pear 
    2     1 

$veg

Carrot   Peas Potato 
     1      1      1 

$meat

Chicken    Duck   Steak 
      1       1       1
nico
  • 50,859
  • 17
  • 87
  • 112
  • Thanks, this is better although the output is still unusable, e.g. for countries I get a 2D matrix. Is there a way to force one result per row? Afghanistan Albania Algeria 5 6 4 Argentina Armenia Australia 1 2 3 Austria Azerbaijan Belgium 3 1 20 – MA2 Aug 18 '13 at 15:39
  • 1
    If your example is insufficiently complex, you should modify it so it illustrates the problem. It is completely unclear why this output is "unusable". Those are not "2d matrices. They are one-D tables with names, just what one would expect from the table function given a vector as input. – IRTFM Aug 18 '13 at 15:41
  • DWin - so how do you suggest I convert those 'one-D tables with names' to a CSV/JSON format as described in the initial question? Also I'm not going to give an example with 10k rows and 250 cols on SO when a simpler example will suffice. – MA2 Aug 18 '13 at 15:46
  • DWin, not really sure I understand... what do you mean with forcing one result per row? – nico Aug 18 '13 at 16:07
  • I'm an R newbie here, and it seems that R is set up to nicely format results so they are visually appealing by inserting multiple spaces to vertically align things. However, it is then impossible to export these results or do anything with them in another language. All I want (as I've reiterated many times now) is a machine-readable output from the describe function. – MA2 Aug 18 '13 at 16:16
  • Ah, OK, it wasn't really clear to me from the question... anyway I see the other answer covered that so you should be set. – nico Aug 18 '13 at 17:08
0
lapply(names(df),function(x){ tb <- table(df[[x]]);
      write.table(file="test.csv", append=TRUE, quote=FALSE, 
                  row.names=FALSE, col.names=FALSE, sep=";", 
                  x= paste(x, paste( names(tb), tb, collapse=";", sep=";") , 
                           sep=";") 
                  )})
#--------
fruit;Apple;2;Pear;1
veg;Carrot;1;Peas;1;Potato;1
meat;Chicken;1;Duck;1;Steak;1

You will also see a list of three NULLs which would not be sent to a text file. Writing tables and matrices to files is not a strong point of R. There is a write.matrix function in package::MASS. My initial effort with writeLines failed because it has no 'append' option and I wasn't able to cobble together a connection call that would do the append.

(The other gotcha' in R is that processing a list (and by inheritance a dataframe) with 'apply/lapply/sapply' does not pass the names of the list-element (and colnames for dataframes) to the function, so "write" functions would not have the names internally for writing to a file. That is why I worked with names(df) rather than just df.

As a further note, there are probably JSON-writing functions out there and they might be more reliable. I'll take a look and report back.

There is the RJSONIO package:

> require(RJSONIO)
Loading required package: RJSONIO
> toJSON(df)
[1] "{\n \"fruit\": [ \"Apple\", \"Apple\", \"Pear\" ],\n\"veg\": [ \"Carrot\", \"Potato\", \"Peas\" ],\n\"meat\": [ \"Steak\", \"Chicken\", \"Duck\" ] \n}"
IRTFM
  • 258,963
  • 21
  • 364
  • 487