
I have a very large dataframe (around 100 rows, 200 columns). A subset of my data looks like this:

example <- data.frame("Station" = c("012", "013", "014"), "Value1" = c(145.23453, 1.022342, 0.4432), 
"Value2" = c(2.1221213, 4445.2231412, 0.3333421), "Name" = c("ABC", "SDS", "EFG"))

I would like to round all numeric variables in my table with these conditions.

if x<1, then 1 sig fig

if 1<= x < 99, then 2 sig figs

if x>= 100, then 3 sig figs

I know to do something like this for a specific column:

example$Value1 <- ifelse(example$Value1 < 1, signif(example$Value1, 1), example$Value1)

but I'm not sure what to do for a large dataframe with a mix of numeric and character values.

GKi
Sarah

5 Answers


Just put the ifelse into an lapply. To identify the numeric columns, use is.numeric in an sapply. You could also Vectorize a small replacement FUNction with all your desired conditions and use it in the lapply, which might be convenient. However, note @GKi's comment that your conditions are not complete.

nums <- sapply(example, is.numeric)

FUN <- Vectorize(function(x) {
  if (x < 1) x <- signif(x, 1)
  if (1 <= x & x < 99) x <- signif(x, 2)
  if (x >= 100) x <- signif(x, 3)
  x
})

example[nums] <- lapply(example[nums], FUN)
#   Station Value1 Value2 Name
# 1     012  145.0    2.1  ABC
# 2     013    1.0 4450.0  SDS
# 3     014    0.4    0.3  EFG
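The gap in the conditions is easy to see once they are written as a single vectorized ifelse chain. This is only a sketch: `round_as_asked` is a hypothetical name, and the final fall-through simply returns values that no condition covers.

```r
# Same three conditions as in the question, as one vectorized ifelse chain.
# The last `x` is the fall-through for values that match no condition.
round_as_asked <- function(x) {
  ifelse(x < 1, signif(x, 1),
  ifelse(x < 99, signif(x, 2),
  ifelse(x >= 100, signif(x, 3), x)))
}

vals <- c(0.4432, 98.765, 99.5678, 145.23453)
out <- round_as_asked(vals)

# The third value is >= 99 but < 100, so it comes back unrounded.
data.frame(input = vals, output = out)
```

Whether values between 99 and 100 should get 2 or 3 significant figures is something only the asker can decide.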
jay.sf

Use apply and nested ifelse:

If you do not know in advance which columns are numeric and you want to keep the original dataframe:

example[sapply(example, is.numeric)] <- apply(example[sapply(example, is.numeric)], 2, 
                                              function(x) ifelse(x < 1, signif(x, 1), 
                                                                 ifelse(x >= 1 & x < 99 , signif(x, 2), signif(x, 3))))
example
  Station Value1 Value2 Name
1     012  145.0    2.1  ABC
2     013    1.0 4450.0  SDS
3     014    0.4    0.3  EFG
Chris Ruehlemann

I'll give the answer using data.table instead of data.frame because it's better and I don't remember data.frame syntax that well anymore.

library(data.table)

example = data.table(
  Station = c("012", "013", "014"),
  Value1 = c(145.23453, 1.022342, 0.4432),
  Value2 = c(2.1221213, 4445.2231412, 0.3333421),
  Name = c("ABC", "SDS", "EFG"))

numeric_colnames = names(example)[sapply(example,is.numeric)]

for(x in numeric_colnames){
  example[,(x):=ifelse(
    get(x)<1,
    signif(get(x),1),
    ifelse(
      get(x)<99,
      signif(get(x),2),
      signif(get(x),3)
  ))]
}

Result:

   Station Value1 Value2 Name
1:     012  145.0    2.1  ABC
2:     013    1.0 4450.0  SDS
3:     014    0.4    0.3  EFG

PS: Don't worry about the 145.0 and 4450.0; that's a display issue, not a data issue:

> example[,as.character(Value1)]
[1] "145" "1"   "0.4"
> example[,as.character(Value2)]
[1] "2.1"  "4450" "0.3"

PPS: the 99 cutoff produces some strange results, e.g.,

> signif(98.9,2)
[1] 99
> signif(99.1,3)
[1] 99.1

Why not use a cutoff of 100 instead?

> signif(99.4,2)
[1] 99
> signif(99.5,2)
[1] 100
> signif(100.1,3)
[1] 100
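A minimal sketch of that suggestion, assuming the intended bands are x < 1, 1 <= x < 100 and x >= 100 (the question does not confirm this). `sig_digits` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: significant figures per value with a cutoff of 100.
sig_digits <- function(x) ifelse(x < 1, 1, ifelse(x < 100, 2, 3))

x <- c(0.4432, 98.9, 99.1, 99.5, 100.1)

# signif() recycles its digits argument, so each value gets its own precision.
rounded <- signif(x, sig_digits(x))
data.frame(x, digits = sig_digits(x), rounded)
```

With this cutoff, every value from 1 up to (but not including) 100 is treated the same way, so there is no jump in precision at 99.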
webb

CODE

library(dplyr)
library(tidyr)
library(purrr)

example %>%
  pivot_longer(contains("Value")) %>%
  mutate(
    signf = case_when(
      value < 1 ~ 1,
      value >= 1 & value < 99 ~ 2,
      TRUE ~ 3
    ),
    value = map2_dbl(value, signf, ~signif(.x, .y))
  ) %>%
  select(-signf) %>%
  pivot_wider(names_from = "name", values_from = "value")

OUTPUT

# A tibble: 3 x 4
  Station Name  Value1 Value2
  <fct>   <fct>  <dbl>  <dbl>
1 012     ABC    145      2.1
2 013     SDS      1   4450  
3 014     EFG      0.4    0.3
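If the value columns cannot be picked out by name, the same rule can probably be applied in place without reshaping. A minimal sketch, assuming dplyr 1.0 or later for across() and where(); the 99 cutoff is kept as in the code above, and `rounded` is just an illustrative name.

```r
library(dplyr)

example <- data.frame(
  Station = c("012", "013", "014"),
  Value1 = c(145.23453, 1.022342, 0.4432),
  Value2 = c(2.1221213, 4445.2231412, 0.3333421),
  Name = c("ABC", "SDS", "EFG"))

# Round every numeric column; character columns are left untouched.
rounded <- example %>%
  mutate(across(
    where(is.numeric),
    ~ signif(.x, case_when(.x < 1 ~ 1, .x < 99 ~ 2, TRUE ~ 3))
  ))

rounded
```

This keeps the original column order, which the pivot_longer/pivot_wider round trip does not.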
det

You can use findInterval to choose the digits argument of signif:

i <- sapply(example, is.numeric)
x <- unlist(example[,i])
example[,i] <- signif(x, findInterval(x, c(1, 99))+1)
example
#  Station Value1 Value2 Name
#1     012  145.0    2.1  ABC
#2     013    1.0 4450.0  SDS
#3     014    0.4    0.3  EFG

The findInterval result for the example given by @webb in the comment (thanks!):

findInterval(c(145.23453, 1.022342, 0.4432, 2.1221213, 4445.2231412
 , 0.3333421), c(1, 99))
#[1] 2 1 0 1 2 0
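To spell out how that result is used: findInterval(x, c(1, 99)) counts how many of the break points are less than or equal to each value, so it gives 0 below 1, 1 from 1 up to 99, and 2 from 99 upward. Adding 1 turns that count into the number of significant figures. A small sketch with illustrative variable names:

```r
x <- c(0.4432, 1.022342, 98.9, 99.1, 145.23453)

bucket <- findInterval(x, c(1, 99))  # 0: x < 1, 1: 1 <= x < 99, 2: x >= 99
digits <- bucket + 1                 # 1, 2 or 3 significant figures

# signif() recycles digits, so each value is rounded with its own precision.
data.frame(x, bucket, digits, rounded = signif(x, digits))
```

Note that this puts values between 99 and 100 in the 3-significant-figure band, which the conditions in the question leave undefined.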
GKi
  • That's a cool trick :) I think a bit more explanation of what findInterval returns and how you're using the result would help many readers, i.e., `findInterval(c(145.23453, 1.022342, 0.4432, 2.1221213, 4445.2231412, 0.3333421), c(1, 99))` yields `2 1 0 1 2 0`, etc... – webb Jun 29 '20 at 11:18
  • is there something like `x <- unlist(example[,i]); example[,i] <- ...` that works with `data.table`? – webb Jun 29 '20 at 12:30
  • Maybe `x <- unlist(as.data.frame(example)[,i]); as.data.frame(example)[,i] <- ...` – GKi Jun 29 '20 at 12:33
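For the data.table question in the comments, one possible approach is to update the numeric columns by reference with `:=` and `.SD` rather than unlisting them. This is a sketch under the same 1 and 99 break points, not something confirmed in the thread; `dt` and `num_cols` are illustrative names.

```r
library(data.table)

dt <- data.table(
  Station = c("012", "013", "014"),
  Value1 = c(145.23453, 1.022342, 0.4432),
  Value2 = c(2.1221213, 4445.2231412, 0.3333421),
  Name = c("ABC", "SDS", "EFG"))

num_cols <- names(dt)[sapply(dt, is.numeric)]

# Apply the findInterval-based rounding to each numeric column in place.
dt[, (num_cols) := lapply(.SD, function(v) signif(v, findInterval(v, c(1, 99)) + 1)),
   .SDcols = num_cols]

dt
```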