1

I am quite new to R, coming from Stata. Below is the r markdown chunk with reproducible data example. The data is representative to the data i am working with. But only with more binary (logical) and factor variables in number.

The libraries and data:

# Setup and load package:
library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)

# Load data:
data("mtcars")
raw_df <- select(mtcars,c(wt,cyl,gear,vs,am))

# Data prep and labelling:
df <- raw_df %>%
  apply_labels(wt = "Facility ID",
               cyl = "Geographical Area",
               cyl = c("Area A" = 4,"Area B" = 6, "Area C" = 8),
               gear = "Tier",
               gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
               vs = "E.coli",
               am = "V.choleri") %>%
  convert(chr(wt),
          fct(cyl,gear),
          lgl(vs,am))

Please note that in my actual data there are more categorical and logical variables. I have managed to make the following table in r markdown (html output):


df %>%
  tab_cells(cyl, gear) %>%
  tab_total_row_position("below") %>%
  tab_total_statistic("u_rpct")%>%
  tab_total_label("Total hosts (Row proportions)") %>% 
  tab_cols(vs, am) %>% 
  tab_stat_rpct() %>% 
  tab_cols(total(label = "Number of hosts")) %>%  
  tab_stat_cases() %>%
  tab_pivot(stat_position = "outside_columns") %>%
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>% 
  split_table_to_df() %>% 
  kable(align = "c", digits = 1) %>% 
  kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
                full_width = F, position = "center") %>% 
  row_spec(1:2, bold = TRUE)

Problems: 1. I wish i could include only "TRUE" columns, dropping "FALSE" columns from the table. But keeping the 1st Row label intact ("E. coli", "V.choleri"). In fact i would not be needing the 2nd row ("TRUE","FALSE) 2. I have labelled the "Total Row proportion" (#Total hosts), But can not remove the leading "#" sign. In the right most column cell of the row with the "Total row proportion", it shows "100". I tried it to be the summation of column cells instead, but failed. "100" is totally misleading. 3. I have also tried to get my desired table through "ctable" function of "summarytools" package. As it has an excellent structure, with number of observations also induced within proportion cells. :

print(ctable(df$cyl,df$am), method = 'render')

But the problem is it seems to permit only one pair of categorical variables. And also, the "FALSE" can not be omited. But the last column is perfect with rowtotals (observations)

Details: R : 4.0.0 R studio: 1.2.5042 The packages are all up-to-date.

Mahm00d27
  • 17
  • 7

1 Answers1

1

Tables from expss are usual data.frames. Column labels is just column names with rows separated with "|" symbol. So, you can manipulate them as usual column names. Row labels are located in the column row_labels and we can remove '#' sign with search and replace operations. "Total row proportion" shows "100" because at the beginning you specify total statistic as row percent and row percent for single column is 100. Taking into account all the above:

library(dplyr)
library(expss)
library(hablar)
library(kableExtra)
library(summarytools)

# Load data:
data("mtcars")
raw_df <- select(mtcars,c(wt,cyl,gear,vs,am))

# Data prep and labelling:
df <- raw_df %>%
    apply_labels(wt = "Facility ID",
                 cyl = "Geographical Area",
                 cyl = c("Area A" = 4,"Area B" = 6, "Area C" = 8),
                 gear = "Tier",
                 gear = c("Tier 1" = 3, "Tier 2" = 4, "Tier 3" = 5),
                 vs = "E.coli",
                 am = "V.choleri") %>%
    convert(chr(wt),
            fct(cyl,gear),
            lgl(vs,am))


tbl = df %>%
    tab_cells(cyl, gear) %>%
    tab_total_row_position("below") %>%
    tab_total_statistic("u_rpct")%>%
    tab_total_label("Total hosts (Row proportions)") %>% 
    tab_cols(vs, am) %>% 
    tab_stat_rpct() %>% 
    tab_cols(total(label = "Number of hosts")) %>%  
    # specify total statistic for last column
    tab_stat_cases(total_statistic = "u_cases") %>%
    tab_pivot(stat_position = "outside_columns") %>%
    recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>% 
    # remove columns with FALSE
    except(contains("FALSE")) %>% 
    compute(
        # remove '#' sign from row labels
        row_labels = gsub("#", "", row_labels)
    )

# remove '#' sign from column labels
colnames(tbl) = gsub("\\|TRUE", "", colnames(tbl))

tbl %>% 
    split_table_to_df() %>% 
    kable(align = "c", digits = 1) %>% 
    kable_styling(bootstrap_options = c("striped", "condensed", "responsive"),
                  full_width = F, position = "center") %>% 
    row_spec(1:2, bold = TRUE)
Gregory Demin
  • 4,596
  • 2
  • 20
  • 20
  • Thanks a million @Gregory ! Thanks a lot for your prompt super-fast reply. I have already marked answered. Can i just add that i have been also trying to get this on a docx file with pandoc. But the table would not just print. Do you think you can comment on adding some lines to get it in r markdown docx out put ? – Mahm00d27 May 03 '20 at 21:24
  • One more thing is that: And i can also edit the group header name of the row categorical variables, right ? I mean the "cyl" and "gear" in the output table ? @Gregory Demin – Mahm00d27 May 03 '20 at 21:31
  • Thanks again. Really helpful. – Mahm00d27 May 03 '20 at 21:47
  • @Mahm00d27 I tried to knit to word and get the message that we use html features which is not supported for Word. I don't know how to make `kable_styling` to do something useful with Word. One possible workaround is to use package huxtable instead of kable - https://cran.r-project.org/package=huxtable `tbl %>% as_huxtable` created word doc for me. But you will need to add some formatting as in manual: https://cran.r-project.org/web/packages/huxtable/vignettes/huxtable.html. – Gregory Demin May 03 '20 at 21:48