I wrote the following function to add "-1" as a level to the factor columns of my data frame and then set their NAs to "-1":

fun <- function(df) {

  add_na_level <- function(x){
    if(is.factor(x) & !"-1" %in% levels(x)) return(factor(x, levels=c(levels(x), "-1")))
    x[is.na(x)]<-"-1"
    return(x)
  }
  df<-sapply(df,add_na_level)

  return(df)

}

However, when I run it on my data frame, it is really slow. Is the sapply line the problem?

df<-sapply(df,add_na_level)
HeyJane
  • Please provide a sample of your data. A few rows would be enough. I suspect you don't need a function for this. It's much easier. – YOLO May 15 '18 at 12:57
  • This question deals with efficiently adding levels to a factor. Some of the answers might be helpful: https://stackoverflow.com/questions/23316815/add-extra-level-to-factors-in-dataframe – divibisan May 15 '18 at 13:30
  • Thanks, but why is it much easier? – HeyJane May 15 '18 at 13:31

1 Answer

You can try this:

# The function
foo <- function(x){
  x <- as.numeric(as.character(x))
  x[is.na(x)] <- -1
  as.factor(x)
}
# Run on numeric input vector
foo(c(1:4, NA))
[1] 1  2  3  4  -1
Levels: -1 1 2 3 4
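
A note on the as.character() step in foo: it matters when the input is already a factor, because as.numeric() applied directly to a factor returns the internal integer codes rather than the labels. A small sketch with made-up values (f, codes and values are illustrative names, not from the question):

```r
# Illustrative factor whose labels look like numbers
f <- factor(c("10", "20", NA, "10"))

# Direct conversion gives the internal level codes
codes <- as.numeric(f)

# Going through character first recovers the labels as numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here codes holds the positions of the levels, while values holds the numbers the labels stand for.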

And to transform a data.frame:

set.seed(2134)
df <- data.frame(matrix(sample(c(NA, 1:9), 25, replace = TRUE), nrow = 5))
str(df)
'data.frame':   5 obs. of  5 variables:
 $ X1: int  7 5 4 5 2
 $ X2: int  4 3 7 2 2
 $ X3: int  9 8 4 4 4
 $ X4: int  8 7 5 6 9
 $ X5: int  8 7 7 4 NA

df[] <- lapply(df, foo)
str(df)
'data.frame':   5 obs. of  5 variables:
 $ X1: Factor w/ 4 levels "2","4","5","7": 4 3 2 3 1
 $ X2: Factor w/ 4 levels "2","3","4","7": 3 2 4 1 1
 $ X3: Factor w/ 3 levels "4","8","9": 3 2 1 1 1
 $ X4: Factor w/ 5 levels "5","6","7","8",..: 4 3 1 2 5
 $ X5: Factor w/ 4 levels "-1","4","7","8": 4 3 3 2 1
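
If only the factor columns should be changed and the other columns left alone, a variant along these lines might work. This is a minimal sketch, not a tested replacement for the function in the question: the helper name add_na_level is borrowed from the question, and the example data is made up. It uses df[] <- lapply(...) because sapply simplifies its result to a matrix when it can, and a matrix cannot hold factor columns.

```r
# Sketch: add "-1" as a level and recode NAs, for factor columns only
add_na_level <- function(x) {
  if (!is.factor(x)) return(x)   # leave non-factor columns unchanged
  if (!"-1" %in% levels(x)) {
    levels(x) <- c(levels(x), "-1")   # append the new level
  }
  x[is.na(x)] <- "-1"            # valid now that "-1" is a level
  x
}

# Made-up example data: one factor column, one numeric column
df <- data.frame(
  a = factor(c("x", NA, "y")),
  b = c(1.5, NA, 3)
)

# df[] keeps the data.frame structure instead of simplifying to a matrix
df[] <- lapply(df, add_na_level)
str(df)
```

In this sketch column a gains the level "-1" and its NA is recoded, while the numeric column b keeps its NA. Whether this is noticeably faster on a large data frame would need to be measured.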
Roman
  • @HeyJane Because it is IMO not needed. Please provide some sample data. Then we will see what we have to add. – Roman May 16 '18 at 08:39
  • But what if some columns are not factors? Your code just turns them all into factors. – HeyJane May 22 '18 at 09:29