I've got a dataset with groups and a bunch of yes/no values (0=no, 1=yes). I'm trying to create a report with the totals per group and a nice layout, using just R.

Here's what I've got so far.

# test data (real data from API service is larger)
df_visits <- read.table(text = "
PATIENT_ID,RAND,UNIT_GT3,HEMOR_YN
1000,ARM1,1,0
1001,ARM2,0,0
1003,ARM1,1,0
1005,ARM1,1,0
1006,ARM2,1,1
1008,ARM2,0,0
1009,ARM1,1,1
1010,ARM2,1,0
1011,ARM1,1,0
1014,ARM2,0,0
1015,ARM2,1,0
1018,ARM1,1,1
", header = TRUE, sep = ",")

# results per group, using 'ifelse' feels clunky but it works
df_per_group  <- data.frame(
  cbind(
    RAND     = tapply(df_visits$RAND,     list(df_visits$RAND), max, na.rm=TRUE),
    UNIT_GT3_Y = tapply(ifelse(df_visits$UNIT_GT3 == 1, 1, 0), list(df_visits$RAND), sum, na.rm=TRUE),
    UNIT_GT3_N = tapply(ifelse(df_visits$UNIT_GT3 == 0, 1, 0), list(df_visits$RAND), sum, na.rm=TRUE),
    HEMOR_YN_Y = tapply(ifelse(df_visits$HEMOR_YN == 1, 1, 0), list(df_visits$RAND), sum, na.rm=TRUE),
    HEMOR_YN_N = tapply(ifelse(df_visits$HEMOR_YN == 0, 1, 0), list(df_visits$RAND), sum, na.rm=TRUE)
  )
)

# t{base} = Matrix Transpose
df_pivot <- as.data.frame(t(df_per_group))

The resulting df_pivot dataframe looks like this and I can export it using write.csv():

# RND          ARM1  ARM2
# ------------+-----+-----+
# UNIT_GT3_Y  |   6 |   3 |
# UNIT_GT3_N  |   0 |   3 |
# HEMOR_YN_Y  |   2 |   1 |
# HEMOR_YN_N  |   4 |   5 |
# ------------+-----+-----+

However, I would like the goal dataset to also have total percentages per question, and have nicer question labels, so something like this:

# RND            ARM1         ARM2
# --------------+-------------+------------+
# Units>3 : Yes |   6  (100%) |    3 (50%) |
#           No  |   0    (0%) |    3 (50%) |
# --------------+-------------+------------+
# Hemmorage Yes |   2 (33.3%) |  1 (16.7%) |
#           No  |   4 (66.7%) |  5 (83.3%) |
# --------------+-------------+------------+

Is there a way to change the labels and add the percentages per question like that?
Also, I'm new to R, so is there maybe a more efficient way to get the desired result?

Waldi
BdR

2 Answers

You could use the tables package, which lets you generate the summary table directly from the initial data:

library(tables)

tables::tabular(((`Units>3` = factor(ifelse(UNIT_GT3==0,'No','Yes'),levels=c('Yes','No'))) +
                 (Hemmorage = factor(ifelse(HEMOR_YN==0,'No','Yes'),levels=c('Yes','No'))))
                ~(RND = factor(RAND))* ((n=1) + Percent(Equal(factor(RAND))))
                ,data = df_visits)

              RND                      
               ARM1         ARM2        
               n    Percent n    Percent
 Units>3   Yes 6    100.00  3    50.00  
           No  0      0.00  3    50.00  
 Hemmorage Yes 2     33.33  1    16.67  
           No  4     66.67  5    83.33  
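
If the formula syntax is hard to follow at first, the numbers can be cross-checked in base R. The following is only a sketch, not part of the tables package: `count_pct` is a hypothetical helper name, and it assumes the columns are coded strictly as 0/1. It counts Yes/No per arm with `table()` and turns the counts into column percentages with `prop.table()`.

```r
# Same test data as in the question
df_visits <- read.table(text = "
PATIENT_ID,RAND,UNIT_GT3,HEMOR_YN
1000,ARM1,1,0
1001,ARM2,0,0
1003,ARM1,1,0
1005,ARM1,1,0
1006,ARM2,1,1
1008,ARM2,0,0
1009,ARM1,1,1
1010,ARM2,1,0
1011,ARM1,1,0
1014,ARM2,0,0
1015,ARM2,1,0
1018,ARM1,1,1
", header = TRUE, sep = ",")

# Hypothetical helper: "count (percent%)" per arm for one 0/1 column
count_pct <- function(x, arm) {
  # Explicit levels keep the Yes/No order and keep zero counts visible
  answer <- factor(x, levels = c(1, 0), labels = c("Yes", "No"))
  counts <- table(answer, arm)
  # margin = 2: percentages are computed within each arm (column)
  pct <- 100 * prop.table(counts, margin = 2)
  cells <- sprintf("%d (%.1f%%)", as.vector(counts), as.vector(pct))
  matrix(cells, nrow = nrow(counts), dimnames = dimnames(counts))
}

units_tbl <- count_pct(df_visits$UNIT_GT3, df_visits$RAND)
hemor_tbl <- count_pct(df_visits$HEMOR_YN, df_visits$RAND)
print(units_tbl, quote = FALSE)
print(hemor_tbl, quote = FALSE)
```

The percentages should agree with the `Percent` columns above, up to rounding.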

EDIT: following your question in the comments, a Total column for both arms can be added to the right-hand side of the formula:

tables::tabular(((`Units>3` = factor(ifelse(UNIT_GT3==0,'No','Yes'),levels=c('Yes','No'))) +
                   (Hemmorage = factor(ifelse(HEMOR_YN==0,'No','Yes'),levels=c('Yes','No'))))
                ~(Total = 1)+(RND = factor(RAND))* ((Nb=1) + Percent(Equal(factor(RAND))))
                ,data = df_visits)

                     RND                      
                     ARM1         ARM2        
               Total Nb   Percent Nb   Percent
 Units>3   Yes 9     6    100.00  3    50.00  
           No  3     0      0.00  3    50.00  
 Hemmorage Yes 3     2     33.33  1    16.67  
           No  9     4     66.67  5    83.33  
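
As a quick sanity check on the `Total` column, the same row totals can be reproduced in base R. This is a sketch under the assumption that the data match the test set from the question (rebuilt here in compact form); `with_total` is just an illustrative variable name.

```r
# Test data from the question, rebuilt compactly (PATIENT_ID omitted)
df_visits <- data.frame(
  RAND     = c("ARM1", "ARM2", "ARM1", "ARM1", "ARM2", "ARM2",
               "ARM1", "ARM2", "ARM1", "ARM2", "ARM2", "ARM1"),
  UNIT_GT3 = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  HEMOR_YN = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1)
)

# Yes/No counts per arm for one question
answer <- factor(df_visits$UNIT_GT3, levels = c(1, 0), labels = c("Yes", "No"))
counts <- table(answer, arm = df_visits$RAND)

# Row totals across both arms, placed in front like the Total column above
with_total <- cbind(Total = rowSums(counts), unclass(counts))
print(with_total)
```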
Waldi
  • I like this solution because it is concise and to the point. Can't say I understand everything that is going on, but I can definitely use it so I'll take a closer look to figure it out. Thanks :) – BdR Dec 08 '20 at 09:27
  • @BdR, glad I could help! I found `tables` syntax quite straightforward & powerful after a closer look at the documentation. Don't hesitate to send a comment if you have a question you'd like me to elaborate on. – Waldi Dec 08 '20 at 15:10
  • Is it also possible to have a column with the totals for both arms? So in the example, the first row would also have `Units>3 Yes 9, No 3`. Is that possible by adding something to the `~` part? – BdR Jan 25 '21 at 22:30

You can use the tidyverse and the kableExtra package to create beautiful tables:

library(tidyverse)
library(kableExtra)

raw <- tibble::tibble(RND = c("UNIT_GT3_Y",
                              "UNIT_GT3_N",
                              "HEMOR_YN_Y",
                              "HEMOR_YN_N"),
                      ARM1 = c(6,0,2,4),
                      ARM2 = c(3,3,1,5))

tbl <- raw %>% 
  mutate(State = if_else(str_detect(str_sub(RND,-1), "Y"), "Yes", "No"),
         RND = word(RND, 1,2, sep="_")) %>% 
  group_by(RND) %>% 
  # within each question, format every numeric column as "count [percent%]"
  mutate(across(where(is.numeric),
                ~ paste0(.x, " [",
                         round((.x / sum(.x)) * 100, 1),
                         "%]"))) %>%
  ungroup() %>%
  select(RND, State, everything()) %>%
  kbl() %>%
  kable_paper(full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%
  collapse_rows(columns = 1, valign = "top")

Created on 2020-12-07 by the reprex package (v0.3.0)

[Image: rendered output table]

kableExtra has a lot of styling options.
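
If you would rather not derive the labels from the technical variable names, one possible variation is to start from the raw `df_visits` data and map column names to display labels with an explicit lookup. This is a sketch only: `question_labels` is a hypothetical name, and it assumes every question column is coded 0/1.

```r
library(dplyr)
library(tidyr)

# Test data from the question, rebuilt compactly (PATIENT_ID omitted)
df_visits <- data.frame(
  RAND     = c("ARM1", "ARM2", "ARM1", "ARM1", "ARM2", "ARM2",
               "ARM1", "ARM2", "ARM1", "ARM2", "ARM2", "ARM1"),
  UNIT_GT3 = c(1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  HEMOR_YN = c(0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1)
)

# Hypothetical lookup: technical column name -> display label
question_labels <- c(UNIT_GT3 = "Units>3", HEMOR_YN = "Hemmorage")

tbl <- df_visits %>%
  # one row per patient and question
  pivot_longer(all_of(names(question_labels)),
               names_to = "question", values_to = "value") %>%
  mutate(Question = unname(question_labels[question]),
         State    = if_else(value == 1, "Yes", "No")) %>%
  count(Question, State, RAND) %>%
  # count() drops empty combinations, so add them back as zeros
  complete(Question, State, RAND, fill = list(n = 0L)) %>%
  # percentages within each question and arm
  group_by(Question, RAND) %>%
  mutate(cell = paste0(n, " [", round(100 * n / sum(n), 1), "%]")) %>%
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = RAND, values_from = cell)

print(tbl)
```

The resulting `tbl` has one row per question and answer, so it can be passed to `kbl()` and styled in the same way as above.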

Florian
  • Thanks for the suggested solution, this is close to what I was looking for. However using technical variable names to determine the factor labels seems a bit iffy, plus that doesn't work for all the variables I'm working with. I do like chaining the transformations with `%>%` like that, so I appreciated the example code. – BdR Dec 08 '20 at 09:34