
I am looking to apply a user-defined function (UDF) to an entire data frame via `lapply`. However, doing so also coerces the data, almost as if R wanted to dummy code the columns.

Dummy data:

```r
df = data.frame(customer_id = c("c000000067", "c000678746")
                ,email = c("hello@gmail.com", "NULL")
                )
```

Apply function:

```r
df[] = lapply(df, function(x) ifelse(x=='NULL', NA, x))
View(df)
```

As you can see, the values in both `customer_id` and `email` have been changed. I have used `lapply` on selected columns successfully in the past:

```r
df[ , c('date_1', 'date_2')] = data.frame(lapply(df[ , c('date_1', 'date_2')] , FUN = function(x) as.Date(x, "%d/%m/%Y")))
```

However, applying it over a whole data frame seems to fail. Thank you for your help.
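A minimal sketch of what is most likely happening, assuming the columns are factors (the `data.frame()` default before R 4.0.0, reproduced here with an explicit `stringsAsFactors = TRUE`). `ifelse()` builds its result by filling a logical vector, so values taken from a factor arrive as the factor's integer codes and the level labels are lost. The name `out` is illustrative only.

```r
# Reproduce the pre-R-4.0.0 default explicitly
df <- data.frame(customer_id = c("c000000067", "c000678746"),
                 email = c("hello@gmail.com", "NULL"),
                 stringsAsFactors = TRUE)

str(df)  # both columns are reported as Factor w/ 2 levels

# ifelse() fills a logical vector; assigning factor values into it
# copies the underlying integer codes, not the level labels
out <- df
out[] <- lapply(out, function(x) ifelse(x == "NULL", NA, x))

str(out)  # both columns are now int; the second email is NA
```

The exact code shown for the first `email` value depends on how the locale sorts the levels, but `customer_id` always becomes `1 2`.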

Sweepy Dodo
  • Those are `factor` columns. Check `str(df)`. You may need to convert to `character` class: `lapply(df, function(x) ifelse(x == 'NULL', NA, as.character(x)))`, or use `stringsAsFactors = FALSE` in the `data.frame` call. – akrun Mar 07 '19 at 14:42
  • That worked. As simple as that. I will bear in mind the importance of class when using apply in the future. Thank you @akrun – Sweepy Dodo Mar 07 '19 at 14:49
  • @akrun Is there a way around `as.character(x)`? After using it, every column in the data frame becomes character. – Sweepy Dodo Mar 11 '19 at 13:54
  • Create the dataset with `stringsAsFactors = FALSE` while reading the data with `read.csv`/`read.table` etc. Regarding the edited comment, the dataset you showed has all columns as `factor`s, so I am not sure what problem you have with the actual dataset. – akrun Mar 11 '19 at 13:55
  • It does work, but I subsequently found that because my CSV contains 'NULL' values, R's import (even with `stringsAsFactors` set) reads every column containing 'NULL' as character. Nonetheless, good to know your method, @akrun – Sweepy Dodo Mar 11 '19 at 14:06
  • @akrun My latest solution is to use `na.strings = c("", "NULL")` instead of the apply method. Still, apply is indispensable. Thank you. – Sweepy Dodo Mar 12 '19 at 08:55
  • Yes, that is the best approach with `na.strings` – akrun Mar 12 '19 at 10:49
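A minimal sketch of the in-memory fixes discussed in the comments, under the same factor assumption. It shows akrun's `as.character()` version and, as one possible answer to the follow-up about every column becoming character, a variant that is not from the thread: it assigns `NA` by position so each column keeps its own class. The `amount` column and the names `as_chr` and `keep_cls` are hypothetical, added only to show what happens to a numeric column.

```r
df <- data.frame(customer_id = c("c000000067", "c000678746"),
                 email = c("hello@gmail.com", "NULL"),
                 amount = c(10.5, 20),   # hypothetical numeric column
                 stringsAsFactors = TRUE)

# Fix from the comments: compare as character, so ifelse() returns strings.
# Every column comes back as character, including the numeric one.
as_chr <- df
as_chr[] <- lapply(as_chr, function(x) ifelse(x == "NULL", NA, as.character(x)))

# Variant that keeps each column's class: replace matching elements in place.
# which() drops NA comparisons; droplevels() removes the now-unused "NULL" level.
keep_cls <- df
keep_cls[] <- lapply(keep_cls, function(x) {
  x[which(x == "NULL")] <- NA
  if (is.factor(x)) x <- droplevels(x)
  x
})

str(as_chr)
str(keep_cls)
```

The in-place variant leaves factors as factors and numbers as numbers, which may matter if later steps depend on the original column classes.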
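A minimal sketch of the `na.strings` approach the thread settled on. It uses `read.csv(text = ...)` with made-up CSV content in place of a real file; the column `amount` is hypothetical. Because the literal `NULL` strings are turned into `NA` while the file is read, a column that is otherwise numeric is read as numeric instead of character.

```r
# Hypothetical CSV content standing in for the real file
csv <- "customer_id,email,amount
c000000067,hello@gmail.com,10.5
c000678746,NULL,NULL"

df <- read.csv(text = csv,
               na.strings = c("", "NULL"),   # treat empty and "NULL" fields as missing
               stringsAsFactors = FALSE)     # keep text columns as character

str(df)
```

Without `na.strings`, `amount` would be read as text because of the literal `NULL`, which matches the behaviour described in the comments.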

0 Answers