I have calculated the ANOVA F-test p-value for differences in means for several variables. Now I would like to add "stars" that indicate the significance level of each p-value: * for significance at the 10% level, ** at the 5% level, and *** at the 1% level.

My data looks like this:

structure(list(Variables = c("A", "B", "C", "D", "E"), 
               `Anova F-Test p-Value` = c(0.05, 5e-04, 0.5, 0.05, 0.01)), 
          class = "data.frame", row.names = c(NA, -5L))

Could someone help me with the code here?

miken32
remo
  • Actually it is common to report `p < 0.001 ***, p < 0.01 **, p < 0.05 *.` Using different choices might be misleading. I **strongly** suggest using a symbol different from an asterisk, e.g. "+", for significance levels above 0.05, if you want to highlight such results. – jay.sf Apr 21 '22 at 11:48
  • Thank you jay.sf for your effort and comments. Maybe it depends on the research field, but significance levels like mine are not uncommon; you also see them in academic journals. – remo Apr 21 '22 at 11:52
  • It's not really the significance levels that I'm criticizing, since those are presumably based on reasonable choices described in detail in the text. It is the different use of asterisks that can very quickly deceive your readers. – jay.sf Apr 21 '22 at 12:02

4 Answers

You can build your own function. Note, however, that this is not the conventional star system (which is fine as long as you state the scale somewhere). See e.g. here.

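stars.pval <- function(x){
  stars <- c("***", "**", "*", "n.s.")
  var <- c(0, 0.01, 0.05, 0.10, 1)
  # left.open = TRUE makes the intervals (a, b]; with it, rightmost.closed = TRUE
  # closes the leftmost interval so that p = 0 also maps to "***"
  i <- findInterval(x, var, left.open = TRUE, rightmost.closed = TRUE)
  stars[i]
}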

transform(dat, stars = stars.pval(dat$`Anova F-Test p-Value`))

  Variables Anova.F.Test.p.Value stars
1         A                5e-02    **
2         B                5e-04   ***
3         C                5e-01  n.s.
4         D                5e-02    **
5         E                1e-02   ***
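
The boundary cases raised in the comments below (p exactly 0, or exactly equal to a threshold) can be checked directly. This is a minimal sketch that repeats the helper from this answer; the names `p` and `boundary_stars` are only illustrative.

```r
# Same helper as in the answer above
stars.pval <- function(x) {
  stars <- c("***", "**", "*", "n.s.")
  breaks <- c(0, 0.01, 0.05, 0.10, 1)
  # left.open = TRUE: intervals are (a, b], so p = 0.05 still earns "**".
  # rightmost.closed = TRUE: combined with left.open, this closes the
  # leftmost interval, so p = 0 gets index 1 instead of index 0.
  stars[findInterval(x, breaks, left.open = TRUE, rightmost.closed = TRUE)]
}

# One p-value on every break point
p <- c(0, 0.01, 0.05, 0.10, 1)
boundary_stars <- stars.pval(p)
print(data.frame(p = p, stars = boundary_stars))
```

If any of these return NA or character(0) on your data, the p-values probably lie outside [0, 1] or are missing.
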
Maël
  • I didn't see it until now but in the last row, there should be *** instead of ** since it is 1% – remo Apr 21 '22 at 10:14
  • In that case, use left.open = T in the findInterval function. – Maël Apr 21 '22 at 10:16
  • @jay.sf look at OP's demand, this is what is being asked. – Maël Apr 21 '22 at 10:16
  • Thank you for your quick response and the edit @Maël!! – remo Apr 21 '22 at 10:35
  • When I run the code on my original, very large dataset, I get a lot of NAs for the stars even though they should be *** since the p-value is 0. Does this code only work for the data with 5 rows? – remo Apr 21 '22 at 10:46
  • No it should work with whatever vector length; have you tried stars.pval(0), it should yield `***` – Maël Apr 21 '22 at 10:53
  • When I do stars.pval(0), I get character(0). But when I e.g., do stars.pval(0.1) I get "*" – remo Apr 21 '22 at 10:56
  • True; add rightmost.closed = T. See edit. – Maël Apr 21 '22 at 11:07
There is an R built-in for this, symnum():

df$stars <- symnum(df$`Anova F-Test p-Value`, 
                     symbols   = c("***","**","*",".","n.s."),
                     cutpoints = c(0,  .001,.01,.05, .1, 1),
                     corr      = FALSE
                   )
df
  Variables Anova F-Test p-Value stars
1         A                5e-02     *
2         B                5e-04   ***
3         C                5e-01  n.s.
4         D                5e-02     *
5         E                1e-02    **
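
The call above uses the conventional cutpoints (0.001, 0.01, 0.05, 0.1). As a sketch, assuming you want the scale from the question instead (*** at 1%, ** at 5%, * at 10%), the same function can be given other cutpoints and symbols. The names `p` and `stars_requested` are illustrative only.

```r
# p-values from the question's example data
p <- c(0.05, 5e-04, 0.5, 0.05, 0.01)

# Cutpoints for the scale requested in the question;
# symnum() needs exactly one symbol per interval.
stars_requested <- symnum(p,
                          corr      = FALSE,
                          na        = FALSE,
                          cutpoints = c(0, 0.01, 0.05, 0.10, 1),
                          symbols   = c("***", "**", "*", "n.s."))

# Convert the "noquote" result to a plain character vector
stars_requested <- as.character(stars_requested)
print(data.frame(p = p, stars = stars_requested))
```

With these cutpoints, a p-value exactly equal to a threshold (for example 0.05) gets the stronger label, because the intervals are closed on the right.
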
dash2
I would suggest using cut() for this.

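Edit: notes. Use right = FALSE to treat only p < alpha as significant (the intervals are then [a, b), so a p-value exactly equal to alpha falls into the weaker category); use right = TRUE to treat p <= alpha as significant. I also replaced 0 and 1 with -Inf and Inf, which often handles boundaries better in cut.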

dt$stars <- cut(dt[[2]], breaks = c(-Inf, 0.01, 0.05, 0.10, Inf), 
                labels = c("***", "**", "*", "n.s."), right = FALSE)

dt

#   Variables Anova F-Test p-Value stars
# 1         A               0.0500     *
# 2         B               0.0005   ***
# 3         C               0.5000  n.s.
# 4         D               0.0500     *
# 5         E               0.0100    **
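
Which label a p-value sitting exactly on a threshold receives depends only on `right`. The sketch below compares both settings on the three thresholds themselves; the variable names (`p`, `closed_left`, `closed_right`) are illustrative.

```r
p <- c(0.01, 0.05, 0.10)
breaks <- c(-Inf, 0.01, 0.05, 0.10, Inf)
labels <- c("***", "**", "*", "n.s.")

# right = FALSE: intervals are [a, b), so p = alpha drops into the weaker bin
closed_left <- as.character(cut(p, breaks = breaks, labels = labels, right = FALSE))

# right = TRUE (the default): intervals are (a, b], so p = alpha keeps the stronger label
closed_right <- as.character(cut(p, breaks = breaks, labels = labels, right = TRUE))

print(data.frame(p = p, right_FALSE = closed_left, right_TRUE = closed_right))
```

For p-values that never land exactly on a threshold, both settings give the same stars.
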
Merijn van Tilborg
The gtools package has a stars.pval() function that takes a numeric vector of p-values and returns stars using R's standard definition.
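
A minimal usage sketch, assuming gtools is installed from CRAN (it is not part of base R). Because the function follows R's standard definition, it should not be expected to reproduce the 10%/5%/1% scale from the question. The names `p` and `gtools_stars` are illustrative.

```r
# p-values from the question's example data
p <- c(0.05, 5e-04, 0.5, 0.05, 0.01)

# Only run if gtools is available; install.packages("gtools") otherwise
if (requireNamespace("gtools", quietly = TRUE)) {
  gtools_stars <- gtools::stars.pval(p)
  print(data.frame(p = p, stars = as.character(gtools_stars)))
}
```
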

GSA