
I want to calculate the effect size of my variables. I am getting the error "missing value where TRUE/FALSE needed" even though I purged my data.frame of NAs beforehand. Any idea why this is happening?

I am using the cohens_d() function from rstatix. R version 4.2.2 (2022-10-31 ucrt), Platform: x86_64-w64-mingw32/x64 (64-bit), running under: Windows 10 x64 (build 19044).

My data.frame looks like this:

structure(list(y = c(7.18497519069826, 7.3003780648707, 7.17955179116519, 
8.36921585741014, 8.15836249209525, 7.09061070782841, 7.49108141342319, 
7.1846914308176, 6.67089495352021, 6.69143515214406, 6.42357351973274, 
7.52608069180203, 7.24501887073775, 6.85901814388889, 7.57170883180869, 
7.33425264233423, 8.04921802267018, 7.03181227133037, 7.59494473669508, 
7.19479175772192, 7.50365451924296, 7.98766626492627, 7.69670578093392, 
7.60357736815147, 6.96018527660461, 6.87390159786446, 7.06818586174616, 
7.73303668293358, 7.00902574208691, 7.43980621139333, 7.21563756343506, 
7.28869626059026, 7.16435285578444, 8.40397796366936, 8.11092624226642, 
6.87139778148748, 7.28510702956681, 7.28533222764388, 7.09131515969722, 
6.75541746281094, 7.48515334990365, 7.04727486738418, 7.05153839051533, 
6.94610823043691, 6.88677264305444, 7.17522180034305, 8.01535975540921, 
6.97657921864011, 7.44994098877334, 7.24328614608345, 6.94987770403687, 
7.0265332645233, 7.03662889536216, 6.7070589406276, 7.44075170047919, 
6.58972625625424, 6.75913881628117, 7.41597441137657, 7.57460994134019
), x = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), levels = c("untreated", "VRZ", "AMB", "untreated_107"), class = "factor")), row.names = c(NA, 
-59L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`58` = 58L), class = "omit"))

r_test %>%
  cohens_d(y ~ x) %>%
  as.data.frame()

Any idea what the problem is?

Similarly, when I tried to use the function wilcox_effsize() instead, R returned the following error: "can't deal with factors containing only one level"

When I used this very similar data frame, the analysis worked even though it contained NAs:

structure(list(y = c(9.91e+08, 8.17e+08, 461200000, 15330000, 
175100000, 50320000, 13590000, 22970000, 2778000, 3453000, 12890000, 
375900000, 44590000, 1.611e+09, 1e+09, 889900000, 373200000, 
NA, NA, NA, NA, NA, 5010000, 6549000, 23160000, 32520000, 7707000, 
556900000, 634600000, 820900000, 391400000, 498300000, 147900000, 
646900000, 22060000, 1e+07, 306800000, 319400000, 41290000, 94100000, 
127200000, 117200000, 618300000, 570700000, 617100000, 284900000, 
449600000, 3866000, 6918000, 4177000, 14870000, 29380000, 2815000, 
1619000, 3126000, 1710000, 2191000), x = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("untreated", "VRZ", "AMB"
), class = "factor")), row.names = c(NA, -57L), class = c("tbl_df", 
"tbl", "data.frame"))
mr.raccoon
  • Thanks Maël for editing! Just for my learning and for improving the quality of my questions in the future: when I post a sample data.frame, should it also be formatted as code? That is what you did, right? – mr.raccoon Mar 15 '23 at 08:20

1 Answer


EDIT:

The problem is that there is one unused factor level, namely untreated_107. There are several ways to deal with this situation:

Use droplevels from base R:

library(rstatix)
library(tidyverse)
r_test %>%
  mutate(x = droplevels(x)) %>%
  cohens_d(y ~ x) %>%  
  as.data.frame()
  .y.    group1    group2    effsize n1 n2 magnitude
1   y       AMB untreated -1.1805582 19 20     large
2   y       AMB       VRZ -0.4735816 19 20     small
3   y untreated       VRZ  0.6551090 20 20  moderate

With fct_drop from forcats:

library(forcats)
library(rstatix)
library(tidyverse)
r_test %>%
  mutate(x = fct_drop(x)) %>%
  cohens_d(y ~ x) %>%  
  as.data.frame()

Or, to sidestep the unused factor level altogether, convert x to character (conceptually questionable, though, since x is presumably a factor for a reason):

library(rstatix)
library(tidyverse)
r_test %>%
  mutate(x = as.character(x)) %>%
  cohens_d(y ~ x) %>%  
  as.data.frame()
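Before calling cohens_d() or wilcox_effsize(), it can help to check whether the grouping factor declares levels that no row actually uses. A minimal base-R sketch; `find_unused_levels()` is a hypothetical helper written for illustration, not part of rstatix:

```r
# Hypothetical helper: returns the levels of a factor that have zero rows
find_unused_levels <- function(f) {
  counts <- table(f)            # table() on a factor also lists empty levels
  names(counts)[counts == 0]
}

# Toy factor: "untreated_107" is declared but never used
x <- factor(c("untreated", "VRZ", "AMB", "VRZ"),
            levels = c("untreated", "VRZ", "AMB", "untreated_107"))

find_unused_levels(x)              # "untreated_107"
find_unused_levels(droplevels(x))  # character(0)
```

Applied to the question's data, find_unused_levels(r_test$x) should likewise return "untreated_107", since only the first three levels occur in x.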
Chris Ruehlemann
  • Interesting, do you know the reason for it? I have a similar dataset where I encountered the same problem. When I used a further refined version of that dataset (with outliers removed), the code worked; the same dataset with outliers did not (same error message as above). – mr.raccoon Mar 15 '23 at 09:38
  • Did you have NA in `x` at an earlier stage? If a factor vector contained NA at some earlier stage, that information gets carried forward. – Chris Ruehlemann Mar 15 '23 at 10:19
  • Yes, I did have NAs at an earlier stage in the data set shown here. The weird thing is that when I performed the same calculation with the same lines of code on the similar data set, it worked, even though that one still contained NAs. – mr.raccoon Mar 16 '23 at 07:47
  • I updated my question. It now also includes the data.frame that contains NAs and with which my analysis worked. – mr.raccoon Mar 16 '23 at 11:46
  • The new data frame is only superficially similar. Check its structure more closely: the original data frame carries the attribute "na.action = structure(c(`58` = 58L), class = "omit")", which is missing from the second data frame. As mentioned, this suggests an issue with NA values. Also, the second data frame contains NA values in `y`, which is not an issue since this variable is numeric. `x`, by contrast, is a factor, and there NAs can wreak havoc. – Chris Ruehlemann Mar 16 '23 at 14:47
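One plausible way to end up with both the `na.action` attribute and an unused level is sketched below. It assumes the data were cleaned with na.omit(); the question does not say which function was used. Removing rows with missing values does not remove factor levels, so a group whose only rows contained NA survives as an empty level. Toy data, made up for illustration:

```r
# Toy data: the only "untreated_107" row has a missing y
df <- data.frame(
  y = c(7.2, 7.3, 6.9, NA),
  x = factor(c("untreated", "untreated", "VRZ", "untreated_107"),
             levels = c("untreated", "VRZ", "untreated_107"))
)

cleaned <- na.omit(df)

attr(cleaned, "na.action")  # records the dropped row 4, class "omit"
levels(cleaned$x)           # "untreated" "VRZ" "untreated_107" (level kept)
table(cleaned$x)            # untreated_107 now has 0 rows

# Dropping the empty level restores a clean grouping factor
cleaned$x <- droplevels(cleaned$x)
levels(cleaned$x)           # "untreated" "VRZ"
```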
  • Thanks, @Chris Ruehlemann for the in-depth explanation. Do you have a solution on how to fix the dataframe by any chance? – mr.raccoon Mar 20 '23 at 07:14
  • Have edited the answer. The issue is that there is an **unused factor level**. – Chris Ruehlemann Mar 20 '23 at 08:57
  • Thanks a lot for examining the problem and explaining in detail! I could clean my other data frames now. `droplevels()` did the job! – mr.raccoon Mar 22 '23 at 10:32