How can I summarize a table in R by converting certain row names to column names and counting occurrences?

Question

I have some data which looks like this:

 `Category` `Count`
   <chr>        <chr>   
 1 X0101       <NA>    
 2 17           1       
 3 22           1       
 4 23           1       
 5 27           1       
 6 34           1       
 7 35           2       
 8 40           1       
 9 51           1       
10 66           1       
11 X0102     <NA>    
12 51           1       
13 53           1       
14 59           1       
15 61           1       
16 X0103     <NA>    
17 10           1       
18 22           1       
19 17           1

This is the code used to produce my dataframe:

 structure(list(`Row Labels` = c("X0101", "17", "22", "23", 
    "27", "34", "35", "40", "51", "66", "X0102", "51", "53", 
    "59", "61", "X0103", "10", "22", "17"), `Count` = c(NA, 
    "1", "1", "1", "1", "1", "2", "1", "1", "1", NA, "1", "1", "1", 
    "1", NA, "1", "1", "1")), .Names = c("Category", "Count"), row.names = c(NA, 
    -19L), class = c("tbl_df", "tbl", "data.frame"))

I would like to alter my table so that I would just have rows named "X0101", "X0102","X0103" and columns which would consist of the count for each of the subcategories. I am new to R and unsure what code would achieve this.

This is what my desired output would look like:

Category   10  17  22  23  27  34  35  40  51  53  59  61  66  
X0101          1   1   1   1   1   2   1   1                1
X0102                                     1        1   1
X0103     1  1   1

Hi Mich992. The keyword for changing rows and columns is `transposing`. This might help you a bit: https://stackoverflow.com/questions/7342306/ — jnns, Jul 09 '19 at 10:05

tmfmnk · Accepted Answer · 2019-07-09T10:10:15.360

One dplyr and tidyr possibility could be:

df %>%
 group_by(grp = cumsum(is.na(Count))) %>%
 mutate(Category2 = first(Category)) %>%
 ungroup() %>%
 na.omit() %>%
 select(-grp) %>%
 spread(Category, Count)

  Category2 `10`  `17`  `22`  `23`  `27`  `34`  `35`  `40`  `51`  `53`  `59`  `61` 
  <chr>     <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Higher    <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  1     1     1     1    
2 Lower     <NA>  1     1     1     1     1     2     1     1     <NA>  <NA>  <NA> 
3 Medium    1     1     1     <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>

Ronak Shah · Answer 2 · 2019-07-09T10:55:40.753

0

Here is one way to do it using dplyr and tidyr. Get all the non-numeric values in Category column (vals), create a factor variable using cumsum and labels as vals, remove NA rows and spread it to wide format.

library(dplyr)
library(tidyr)

vals <- grep("^\\d+$", df$Category, invert = TRUE, value = TRUE)

df %>%
  mutate(temp = factor(cumsum(Category %in% vals), labels = vals)) %>%
  na.omit %>%
  spread(Category, Count)

# A tibble: 3 x 14
#  temp   `10`  `17`  `22`  `23`  `27`  `34`  `35`  `40`  `51`  `53`  `59`  `61`  `66` 
#  <fct>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 Lower  NA    1     1     1     1     1     2     1     1     NA    NA    NA    1    
#2 Higher NA    NA    NA    NA    NA    NA    NA    NA    1     1     1     1     NA   
#3 Medium 1     1     1     NA    NA    NA    NA    NA    NA    NA    NA    NA    NA

edited Jul 09 '19 at 10:55

answered Jul 09 '19 at 10:03

Ronak Shah

377,200
20
156
213

Thank you for your answer! This works well. My actual dataframe is a lot larger, is there a way for me to create the val object without typing put each of the words? – Mich992 Jul 09 '19 at 10:19
@Mich992 Updated the answer for that. – Ronak Shah Jul 09 '19 at 10:22
I'm so sorry i'm new to R and stack exchange so I didn't realize I had made a mistake with my original question. I have edited my question to change the name in the categories column. When I try your code i receive the error "Error in mutate_impl(.data, dots) : Evaluation error: invalid 'labels'; length 0 should be 1 or 1." Do you know anyway around this?. Thank you so much for any help you can provide! – Mich992 Jul 09 '19 at 10:53
@Mich992 No problem, small change in regex was required. I have updated it again. Also tmfmnk's answer would work in your case because it checks `NA` value in `Count` irrespective of the values in `Category`. – Ronak Shah Jul 09 '19 at 10:56

How can I summarize a table in R by converting certain row names to column names and counting occurrences?

2 Answers2