I have some words in my dataframe df each belonging to category A or B. Within each category the words may be of type 1, 2 or 3. I used the table() function to show how the words are distributed across the categories and types. The output looks like:

         category
type     A    B
1        30  79
2        12  94
3        29  6 

As you can see, the table counts frequencies, but I want it to show percentages instead. I have tried prop.table, but I get the following error:

Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables

I couldn't find a solution anywhere else; please help. Thank you.

Here's my sample data:

head(items)

       item   type category
[1]    PA100   1    A
[2]    PB101   2    A
[3]    UR360   2    A
[4]    PX977   3    B
[5]    GA008   3    B
[6]    GR446   3    A
Tavi
  • See `?prop.table`. Also have a look on SO - [this for example](http://stackoverflow.com/questions/15866488/is-it-possible-to-add-percentages-to-a-contingency-table) [and this](http://stackoverflow.com/questions/12578741/r-compute-percentage-values-in-data-frame) – user20650 Aug 27 '14 at 00:12
  • @RichardScriven percentages of each type in each category instead of frequency, e.g in the top left cell instead of 30 it should be 42.25 since 30 is 42.25 percent of the total of column A – Tavi Aug 27 '14 at 00:19
  • @thelatemail @user20650 i have tried `prop.table` but i get the following error `Error in FUN(X[[1L]], ...) : only defined on a data frame with all numeric variables` – Tavi Aug 27 '14 at 00:25
  • Then it would be best to provide a small sample of the data. – Rich Scriven Aug 27 '14 at 00:26
  • @maryam - are you applying `prop.table` to your table, or your original `df` dataframe? - you want to do the former - try this: `mytab <- table(df$type,df$category); prop.table(mytab,2);` – thelatemail Aug 27 '14 at 00:27
  • @maryam Read the error message. It says your data frame must have all numeric variables. Does it have all numeric variables? If not, make them numeric or drop the non-numeric ones, and try again. – shadowtalker Aug 27 '14 at 00:33
  • @RichardScriven there – Tavi Aug 27 '14 at 00:37
  • @thelatemail okay thanks may I please ask what the `2` in `prop.table(mytab,2)` does? – Tavi Aug 27 '14 at 00:44
  • @maryam - it's the margin. 1 means across the rows, 2 means over the columns, NULL means the whole table. That's actually in my answer. – Rich Scriven Aug 27 '14 at 00:46
  • @RichardScriven ohhh all clear now thank yooooooouuuu!!! – Tavi Aug 27 '14 at 00:50

2 Answers

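As mentioned in the comments, you can call prop.table on a table object. With margin = 1, the proportions are calculated across the rows of the table, so each type sums to 1. If you want the proportions within each category instead (each column sums to 1, so that 30 becomes 42.25 as in the question's example), use margin = 2.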

> tab <- with(items, table(type, category))
> prop.table(tab, margin = 1)
#     category
# type         A         B
#    1 1.0000000 0.0000000
#    2 1.0000000 0.0000000
#    3 0.3333333 0.6666667

For actual percentages, you can multiply the table by 100

> prop.table(tab, 1)*100
#     category
# type         A         B
#    1 100.00000   0.00000
#    2 100.00000   0.00000
#    3  33.33333  66.66667

where

items <- 
structure(list(item = structure(c(3L, 4L, 6L, 5L, 1L, 2L), .Label = c("GA008", 
"GR446", "PA100", "PB101", "PX977", "UR360"), class = "factor"), 
    type = c(1L, 2L, 2L, 3L, 3L, 3L), category = structure(c(1L, 
    1L, 1L, 2L, 2L, 1L), .Label = c("A", "B"), class = "factor")), .Names = c("item", 
"type", "category"), class = "data.frame", row.names = c(NA, 
-6L))
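
To check this against the numbers in the question, here is a minimal sketch that rebuilds the posted counts by hand and takes column percentages. The object names `tab_q` and `col_pct` are made up for illustration, and the counts are typed in from the question's output rather than computed from the real data.

```r
# Counts copied by hand from the table in the question
# (rows = type, columns = category); names here are illustrative only.
tab_q <- as.table(matrix(
  c(30, 12, 29,   # category A
    79, 94,  6),  # category B
  nrow = 3,
  dimnames = list(type = c("1", "2", "3"), category = c("A", "B"))
))

# margin = 2 divides each cell by its column total,
# so every category sums to 100 percent (up to rounding).
col_pct <- round(100 * prop.table(tab_q, margin = 2), 2)
col_pct
```

The top-left cell comes out as 42.25, which matches the value asked for in the comments (30 out of the 71 words in category A).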
Rich Scriven
  • thanks do you know how I can multiply by 100 so I get percentages instead of fractions? sorry! – Tavi Aug 27 '14 at 00:49

This might be quite late but sharing it in case someone else faces a similar problem. You can still achieve your required output with table() and prop.table(). You just have to do it in two steps for factor variables.

df = table(items$type, items$category)
prop.table(df)

Read below for further explanation.

For the following dataframe items:

 item type category
PA100  1        A
PB101  2        A
UR360  2        A
PX977  3        B
GA008  3        B
GR446  3        A

First, run the table() command and store it into df

df = table(items$type, items$category)

df

    A B
  1 1 0
  2 2 0
  3 1 2

Then run prop.table() on df as below. With no margin argument, each cell is divided by the grand total of the table, so all cells together sum to 1:

prop.table(df)

            A         B
  1 0.1666667 0.0000000
  2 0.3333333 0.0000000
  3 0.1666667 0.3333333

With the round() command you can also specify the number of decimal places you want to keep:

round(prop.table(df),digits = 2)

       A    B
  1 0.17 0.00
  2 0.33 0.00
  3 0.17 0.33

And if you wanted to keep the percentages only, you could do the following:

round(100*prop.table(df),digits = 0)

     A  B
  1 17  0
  2 33  0
  3 17 33
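
For comparison, here is a rough sketch of how the three margin choices differ on the same six sample rows. The data frame is retyped from the sample above, and the names `whole`, `by_row` and `by_col` are only illustrative.

```r
# The six sample rows, retyped as a plain data frame
items <- data.frame(
  item     = c("PA100", "PB101", "UR360", "PX977", "GA008", "GR446"),
  type     = c(1L, 2L, 2L, 3L, 3L, 3L),
  category = c("A", "A", "A", "B", "B", "A")
)

# Build the contingency table first; prop.table() wants this, not the data frame
tab <- table(type = items$type, category = items$category)

whole  <- prop.table(tab)              # each cell / grand total
by_row <- prop.table(tab, margin = 1)  # each cell / its row (type) total
by_col <- prop.table(tab, margin = 2)  # each cell / its column (category) total

round(100 * by_col, 1)                 # percentages within each category
```

Only `by_col` answers "what percent of category A is type 1", so that is probably the one the question is after; the version without a margin answers "what percent of all words fall in this cell".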
Sandy