6

I have a dataframe datwe with 37 columns. I am interested in converting the integer values(1,2,99) in columns 23 to 35 to character values('Yes','No','NA').

datwe$COL23 <- sqldf("SELECT CASE COL23 WHEN 1 THEN 'Yes'
                                        WHEN 2 THEN 'No'
                                        WHEN 99 THEN 'NA'
                                   ELSE 'Name ittt' 
                              END as newCol
                              FROM datwe")$newCol

I have been using the above sqldf statements to convert each column separately. I was wondering if there is any other smart way to do this, perhaps apply functions ?

If you require any reproducible data for building dataframe datwe, I will add it here. Thanks.

Edit: Example datwe

set.seed(12)
data.frame(replicate(37,sample(c(1,2,99),10,rep=TRUE)))
zx8754
  • 52,746
  • 12
  • 114
  • 209
Prradep
  • 5,506
  • 5
  • 43
  • 84
  • In SQL statement you have 2 whens - `...WHEN WHEN 1...`. Also, why not just extract `col23 - col35`, then convert it to character within R? – zx8754 Jun 09 '15 at 07:25
  • @zx8754 Thanks for that, I have edited the post. I assume that the `as.character()` will convert the integer values to character values. Where should I pass the new values as arguments to convert ? – Prradep Jun 09 '15 at 07:37

1 Answers1

5

Not sure why you used sqldf, see this example:

#dummy data
set.seed(12)
datwe <- data.frame(replicate(37,sample(c(1,2,99),10,rep=TRUE)))

#convert to Yes/No
res <- as.data.frame(
  sapply(datwe[,23:37], function(i)
    ifelse(i==1, "Yes",
           ifelse(i==2, "No",
                  ifelse(i==99,NA,"Name itttt")))))

#update dataframe
datwe <- cbind(datwe[, 1:22],res)

#output, just showing first 2 columns
datwe[,23:24]
#     X23  X24
# 1    No  Yes
# 2   Yes  Yes
# 3   Yes   No
# 4    No   No
# 5   Yes   No
# 6   Yes  Yes
# 7  <NA>   No
# 8    No   No
# 9   Yes <NA>
#10    No <NA>

EDIT: Using sqldf within a for loop with an external variable:

library(sqldf)

#dummy data
set.seed(12)
datwe <- data.frame(replicate(37,sample(c(1,2,99),10,rep=TRUE)))

#sqldf within a loop
for(myCol in paste0("X",23:37))
  datwe[,myCol] <- 
   fn$sqldf("SELECT CASE $myCol
                    WHEN 1 THEN 'Yes'
                    WHEN 2 THEN 'No' 
                    WHEN 99 THEN 'NA' 
                    ELSE 'Name ittt' 
                    END as newCol
             FROM datwe")$newCol

#check output, showing only 2 columns
datwe[,23:24]
#    X23 X24
# 1   No Yes
# 2  Yes Yes
# 3  Yes  No
# 4   No  No
# 5  Yes  No
# 6  Yes Yes
# 7   NA  No
# 8   No  No
# 9  Yes  NA
# 10  No  NA
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • Thanks for the solution and teaching new way of doing it. I will mark it as accepted after hearing some other ways to do(with `sqldf`). I guess I would need to incorporate small change in your solution to make `99` as `NA`. Thanks ! – Prradep Jun 09 '15 at 07:45
  • Thanks for the solution. – Prradep Jun 09 '15 at 08:09
  • Now, I have realized my mistake that all `NA` are stored as `character` values when used `sqldf` statement. Is there any way to convert `NA` character values to `Not Available` ? I guess, I can't use `ifelse()` function on the vector/column as there is only one condition('NA' to NA). – Prradep Jun 09 '15 at 08:31
  • @Prradep `datwe[,23:37][datwe[,23:37]=="NA"] <- "Not Available"`. – zx8754 Jun 09 '15 at 08:53
  • sorry for my bad communication. I meant `NA`(Not Available) instead of `NA`(character). – Prradep Jun 09 '15 at 08:56
  • @Prradep `datwe[,23:37][datwe[,23:37]=="NA"] <- NA`, please invest some time and learn [basics of R](http://stackoverflow.com/tags/r/info). – zx8754 Jun 09 '15 at 08:56
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/80026/discussion-between-prradep-and-zx8754). – Prradep Jun 09 '15 at 09:00