
I am digging deeper into the expss package and am stuck on one of the examples shown here: https://gdemin.github.io/expss/#example_of_data_processing_with_multiple-response_variables (more precisely, the last table of that section).

Consider the following data frames:

```r
library(expss)

vecA <- factor(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)),levels=c(1,2,3,4,5))
vecB <- factor(c(rep(1,20),rep(2,20),rep(NA,10)),levels=c(1,2,3,4,5))
df_fact <- data.frame(vecA, vecB)

vecA_num <- as.numeric(c(rep(1,10),rep(2,10),rep(3,10),rep(4,10),rep(5,10)))
vecB_num <- as.numeric(c(rep(1,20),rep(2,20),rep(NA,10)))
# use the numeric vectors here, so that tab_cells(vecA_num, vecB_num) finds them in df_num
df_num <- data.frame(vecA_num, vecB_num)
```

Closely following the suggested code (URL above), here is what my table looks like:

```r
df_fact %>%
  tab_cols(total(label = "#Total| |")) %>%
  tab_cells(list(vecA)) %>%
  tab_stat_cpct(label="vecA", total_row_position="above", total_statistic="u_cases") %>%
  tab_cells(list(vecB)) %>%
  tab_stat_cpct(label="vecB", total_row_position="above", total_statistic="u_cases") %>%
  tab_pivot(stat_position = "inside_columns") %>%
  # replace empty (NA) numeric cells with 0 and keep everything else unchanged
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy)
```

Here is a slightly different procedure with the numeric variables:

```r
df_num %>%
  tab_cols(total(label = "#Total| |")) %>%
  tab_cells(vecA_num, vecB_num) %>%
  tab_stat_valid_n(label = "Valid N") %>%
  tab_stat_mean(label="Mean") %>%
  tab_pivot(stat_position = "inside_columns") %>%
  recode(as.criterion(is.numeric) & is.na ~ 0, TRUE ~ copy) %>%
  tab_transpose()
```

Issues start here, since these complex constructs are... complex!

1) I would like to include the tab_last_sig_* family of functions, but I cannot figure out how to do it (and, if possible, subtotals/nets when the variables are factors).

2) Combining multiple statistics (cases, percentages, means, ...) in one table is a challenge.

3) Finally, it is not clear to me where I should write the statistic names and the variable names.

I have not found detailed documentation for these constructs, hence this message in a bottle :)

Maxence Dum.

1 Answer

  1. Unfortunately, significance testing is currently supported only for independent samples. In your examples you want to compare statistics on dependent samples (vecA and vecB come from the same rows of the data frame). You can run significance tests for independent proportions, but the results will be inaccurate.
  2. Including multiple statistics is not difficult: you just need to write the tab_stat_* calls one after another. A complex table layout, however, really is a challenge :(
  3. The variables for which statistics are computed should always be specified in tab_cells. After that, you can add statistic functions such as tab_stat_mean, tab_stat_cpct, etc. You can find the documentation by typing ?tab_pivot in the R console; this is the standard way of getting the manual for an R function.
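A minimal sketch of the independent-samples case from point 1. It assumes a hypothetical grouping variable `grp` (not part of the question's data) that splits the 50 rows into two independent groups placed in the banner with `tab_cols()`. Each `tab_last_sig_*` call goes directly after the statistic it tests: `tab_last_sig_cpct()` after `tab_stat_cpct()`, and `tab_last_sig_means()` after `tab_stat_mean_sd_n()`, which supplies the means, standard deviations and counts the test needs. This does not make the vecA-versus-vecB comparison valid; it only shows where the calls belong.

```r
library(expss)
library(magrittr)  # %>% pipe

# Hypothetical data: `grp` is an invented grouping variable that splits
# the 50 rows into two independent groups of 25 (not in the question).
df_sig <- data.frame(
  vecA     = factor(rep(1:5, each = 10), levels = 1:5),
  vecA_num = rep(1:5, each = 10),
  grp      = factor(rep(1:2, each = 25), labels = c("Group 1", "Group 2"))
)

sig_tbl <- df_sig %>%
  tab_cols(total(), grp) %>%   # independent groups go into the banner
  tab_cells(vecA) %>%          # factor variable: column percentages
  tab_stat_cpct() %>%
  tab_last_sig_cpct() %>%      # tests the cpct statistic directly above
  tab_cells(vecA_num) %>%      # numeric variable: means
  tab_stat_mean_sd_n() %>%
  tab_last_sig_means() %>%     # tests the means directly above
  tab_pivot()

sig_tbl
```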
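Points 2 and 3 can be illustrated with the question's own data. In this sketch the variables go into `tab_cells()`, each statistic is added by its own `tab_stat_*` call, and the statistic's name is set with that call's `label` argument. `tab_pivot()` is called with its default `stat_position = "outside_rows"`, which stacks the statistic blocks one below the other, so statistics with different row sets (counts per category versus a single mean) do not have to be aligned side by side.

```r
library(expss)
library(magrittr)  # %>% pipe

# Data from the question, rebuilt compactly
df <- data.frame(
  vecA     = factor(rep(1:5, each = 10), levels = 1:5),
  vecA_num = rep(1:5, each = 10),
  vecB_num = c(rep(1, 20), rep(2, 20), rep(NA, 10))
)

tbl <- df %>%
  tab_cols(total(label = "#Total")) %>%
  tab_cells(vecA) %>%                  # variable names: tab_cells()
  tab_stat_cases(label = "Cases") %>%  # statistic names: label =
  tab_stat_cpct(label = "Col %") %>%
  tab_cells(vecA_num, vecB_num) %>%    # switch to the numeric variables
  tab_stat_valid_n(label = "Valid N") %>%
  tab_stat_mean(label = "Mean") %>%
  tab_pivot()                          # default: statistic blocks stacked in rows

tbl
```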
Gregory Demin
  • Thanks for your answer; at least I now know I'm heading in the wrong direction on the first point. I will try a few things later and will update my question if I find something relevant. – Maxence Dum. Apr 06 '20 at 14:24