I want to write a function that returns the type of an n-value (the n-values are in the 6th column of a data frame), using the following rules:

```
# n-value types
missing : NA
n > 0.05 : 'n.s.'
0.05 >= n > 0.01 : '*'
0.01 >= n > 0.001 : '**'
0.001 >= n > 0.0001 : '***'
0.0001 >= n : '****'
```

The first row of the data looks like:

```
    n.name    bMean     log2FoldChange lfcSE     stat       pn       padj
    <fct>     <dbl>     <dbl>          <dbl>     <dbl>      <dbl>    <dbl>
469 TNFRSF1B  542.82545 -3.406411      0.2267235 -15.024517 5.07e-51 3.25e-48
```

I tried the following:

```r
c.1 <- function(n.1) {
    p<- if (n.1>0.05)
  return(p, paste0("n.s."))}
else{if (0.05 >= p > 0.01) return(p, paste0"'*'")
    }
else{if (0.01 >= p > 0.001) return(p, paste0"'**'")
    
}
else{if (0.001 >= p > 0.0001) return(p, paste0"'***'")
    
}
else{if (0.0001 >= p) return(p, paste0"'****'")
    
}
else{cat(paste0("NA"))}
}
pType<-lapply(df.1$pn, c.1)
pType
```
user432797
  • Are you searching for [this](https://stackoverflow.com/questions/41262992/is-there-a-r-function-that-convert-p-value-to-significance-code) or do you want to code your own function? – Rui Barradas Sep 29 '20 at 20:10
  • @RuiBarradas i'm getting an error! – user432797 Sep 29 '20 at 20:14
  • `return` can return *one* value only, not two. – Rui Barradas Sep 29 '20 at 20:16
  • Don't use `cat()` inside a function - use `message()` if you want to tell the user something. Use `warning()` if you want to tell the user something might be wrong. Use `stop()` if something is wrong and you want an error message. In this case, I don't think you need any of those. – Gregor Thomas Sep 29 '20 at 20:24
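
The points raised in the comments above can be combined into a corrected `if`/`else` chain. This is only a minimal sketch, not the asker's or the answerer's code: the name `star_one` is hypothetical, and the input values other than `5.07e-51` are made up. It returns a single value per call, uses `NA_character_` instead of `cat()` for missing input, and avoids chained comparisons such as `0.05 >= p > 0.01`, which are not valid R syntax. Each `else if` branch is reached only when the previous tests failed, so a single comparison per branch is enough.

```r
# Hypothetical helper: classify ONE value according to the rules in the question
star_one <- function(p) {
  if (is.na(p)) {
    NA_character_          # missing input: return NA, print nothing
  } else if (p > 0.05) {
    "n.s."
  } else if (p > 0.01) {   # reached only when p <= 0.05
    "*"
  } else if (p > 0.001) {  # reached only when p <= 0.01
    "**"
  } else if (p > 0.0001) { # reached only when p <= 0.001
    "***"
  } else {
    "****"
  }
}

# Illustrative input; only the first value comes from the question's data
p_values <- c(5.07e-51, 0.03, 0.2, NA)

# sapply() simplifies to a character vector, lapply() would return a list
res <- sapply(p_values, star_one)
res
```

Because `if` only accepts a single condition, this version has to be applied element by element with `sapply` or `lapply`.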

1 Answer

`cut` can be used to get the bins, avoiding a sequence of `if` conditions.

The function works as follows:

  1. Define a partition, `breaks`, of the interval [0, 1];
  2. Define a vector of corresponding strings, `stars`;
  3. Use `cut` to determine which interval each `x` falls in, using the vector `stars` as the interval labels.

If any `x` is outside [0, 1], or is itself `NA`, the return value of `cut` for it is `NA`.
The return value is a list with members `p` and `stars`, which can be accessed in the usual way for named list members.

```r
c.1 <- function(x){
  breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
  stars <- c("****", "***", "**", "*", "n.s.")
  bins <- cut(x, breaks = breaks, labels = stars, include.lowest = TRUE)
  bins <- as.character(bins)
  list(p = x, stars = bins)
}
```
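
Since `cut` is vectorised, the same call can take a whole data frame column at once, with no need for `lapply`. The sketch below assumes a toy data frame standing in for the question's `df.1`: only the first `pn` value is taken from the question, and the other rows and gene names are made up for illustration. It repeats the `cut` call from the function above so that it runs on its own.

```r
# Toy stand-in for df.1; rows 2 to 4 are invented for illustration
df.1 <- data.frame(
  n.name = c("TNFRSF1B", "geneB", "geneC", "geneD"),
  pn     = c(5.07e-51, 0.004, 0.3, NA)
)

breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
stars  <- c("****", "***", "**", "*", "n.s.")

# One call classifies every value in the column; NA stays NA
df.1$stars <- as.character(
  cut(df.1$pn, breaks = breaks, labels = stars, include.lowest = TRUE)
)
df.1
```

Storing the result as a new column keeps each label next to the value it was computed from.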

Now some examples.
The first example uses a random vector `p1`. The base function `table` counts how many stars of each type were returned.

```r
set.seed(2020)    # make the result of the next instruction reproducible
p1 <- rexp(10, rate = 10)
out1 <- c.1(p1)
table(out1$stars, useNA = "ifany")
#
#   * n.s. 
#   4    6 
```

The second example uses a vector with several elements outside the unit interval. Those values should be `NA`.

```r
p2 <- seq(-1, 2, by = 0.1)
out2 <- c.1(p2)
table(out2$stars, useNA = "ifany")
#
#****   n.s. <NA> 
#   1     10   20
```

The third example uses small values, all in [0, 0.05], in increments of 0.00001. No `NA`s should be returned.

```r
p3 <- seq(0, 0.05, by = 0.00001)
out3 <- c.1(p3)
table(out3$stars, useNA = "ifany")
#
#****  ***   **    *
#  11   90  900 4000
```

And another way of seeing the first example's return value.

```r
as.data.frame(out1)
#            p stars
#1  0.02938057     *
#2  0.12700502  n.s.
#3  0.02370036     *
#4  0.07398545  n.s.
#5  0.14195153  n.s.
#6  0.12656189  n.s.
#7  0.58675191  n.s.
#8  0.02404119     *
#9  0.05288280  n.s.
#10 0.04876715     *
```
Rui Barradas
  • Thank you for the answer, I appreciate the support! I'm a novice at coding and your code is above my level. I was hoping for a simpler solution where I write a function and then `lapply` it to the pn column to return the different counts of stars, or NA if data is missing. – user432797 Sep 30 '20 at 01:16
  • I also tried the link that you suggested: `p.values <- c(9.5e-20, 0.05) Signif <- symnum(p.values, corr = FALSE, na = FALSE, cutpoints = c(0, 0.001, 0.01, 0.05, 0.1, 1), symbols = c("***", "**", "*", ".", " ")) ghnhn<-lapply(dfx.1[,6], Signif))` but no luck. I got this error: `Error in get(as.character(FUN), mode = "function", envir = envir): object 'Signif' of mode 'function' was not found Traceback: 1. lapply(nav.d14[, 6], Signif) 2. match.fun(FUN) 3. get(as.character(FUN), mode = "function", envir = envir) ` – user432797 Sep 30 '20 at 01:42
  • I tried the following and it worked: `p <- function(x){ breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, Inf) stars <- c("****", "***", "**", "*", "n.s.") list(p = x, stars = cut(x, breaks = breaks, labels = stars) ) } n44<-lapply(44$pn, p) head(n44)` and I got results like this: `$p 0.001772662 $stars ** Levels: '****''***''**''*''n.s.' $p 0.917041262 $stars n.s. Levels: '****''***''**''*''n.s.'` How can I remove the levels, and how can I add NA if data is missing? – user432797 Sep 30 '20 at 01:55
  • To add NA for missing values, I tried `breaks <- c(0, , 0.0001, 0.001, 0.01, 0.05, Inf) stars <- c("****", "NA" ,"***", "**", "*", "n.s.")` and I got this error: `Error in c(0, , 1e-04, 0.001, 0.01, 0.05, Inf): argument 2 is empty` – user432797 Sep 30 '20 at 02:29
  • @user432797 I have tried to explain the function better. Vectors cannot have two commas with nothing between them. To return `NA`, I have changed the code so that `breaks` ends in 1, not `Inf` as before. – Rui Barradas Sep 30 '20 at 12:26
  • I don't know why, but when I use @ with your username it doesn't show up. How do I get the NA count? I have some missing data labelled NA. I was able to do the following: `c.1 <- function(x){ breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1) stars <- c("****", "***", "**", "*", "n.s.") bins <- cut(x, breaks = breaks, labels = stars, include.lowest = TRUE) bins <- as.character(bins) list(p = x, stars = bins) } tab.1<-table(c.1(nav$pvalue)) apply(tab.1, 2, sum)` and I got: `*: 24 **: 14 ***: 30 ****: 93 n.s.: 306`, which is good, but I'm still missing the NA count. – user432797 Oct 01 '20 at 04:20
  • @user432797 Look at my examples: `table` needs `useNA = "ifany"`, which you are not using. The default is to discard `NA`s; if you run my 2nd example without that argument, the `NA`s won't show up. – Rui Barradas Oct 01 '20 at 08:30
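
The effect of `useNA = "ifany"` described in the last comment can be seen in a small sketch. The input vector here is made up for illustration and contains two missing values. Tabulating the `stars` vector directly also avoids passing the whole list to `table`, which would cross-tabulate `p` against `stars`.

```r
# Toy p-values with two missing entries (made up for illustration)
p <- c(0.2, 0.03, NA, 0.00005, NA)

breaks <- c(0, 0.0001, 0.001, 0.01, 0.05, 1)
labels <- c("****", "***", "**", "*", "n.s.")
stars  <- as.character(cut(p, breaks = breaks, labels = labels,
                           include.lowest = TRUE))

counts_default <- table(stars)                   # NA entries are dropped
counts_with_na <- table(stars, useNA = "ifany")  # adds an <NA> cell

sum(counts_default)  # 3
sum(counts_with_na)  # 5
```

The two totals differ by exactly the number of missing values in `p`.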