I have a data.frame in which all columns are numeric. I want to convert one integer column to a factor, but after doing so all the other columns appear to be of class character. Is there any way to convert just one column to a factor?

The example is from Converting variables to factors in R:

myData <- data.frame(A=rep(1:2, 3), B=rep(1:3, 2), Pulse=20:25)
myData$A <- as.factor(myData$A)

The result:

apply(myData,2,class)
#           A           B       Pulse 
# "character" "character" "character" 

sessionInfo()

R version 3.1.2 (2014-10-31) 
Platform: x86_64-apple-darwin13.4.0 (64-bit) 

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines stats graphics grDevices utils datasets methods base ...

str(myData$A)
# Factor w/ 2 levels "1","2": 1 2 1 2 1 2
Echo
  • Works for me. What is your `sessionInfo()`? –  Jul 15 '15 at 04:55
  • That seems unlikely. Does the code you provided actually reproduce the problem for you? – MrFlick Jul 15 '15 at 04:58
  • The only time I know of this happening is if `myData` is a matrix, array, or vector, not a data frame. Are you sure `class(myData)` is `data.frame` (for whichever data is causing you the problem)? – mathematical.coffee Jul 15 '15 at 05:03
  • `class(myData)` is `data.frame`. `apply(myData,2,class)` produces A B Pulse "character" "character" "character" – Echo Jul 15 '15 at 05:06
  • @Pascal `sessionInfo()` gives `R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods base ... ` – Echo Jul 15 '15 at 05:09
  • @MattO'Brien `str(myData$A)` gives `Factor w/ 2 levels "1","2": 1 2 1 2 1 2` – Echo Jul 15 '15 at 05:11
  • Use `sapply(myData, class)`, not `apply(myData,2,class)`. You are testing the class of the column names. –  Jul 15 '15 at 05:27
  • Thank you @Pascal! Using `sapply` works. – Echo Jul 15 '15 at 05:31
  • @Pascal OP isn't testing the class of the column names; just try `apply(myData, 2, print)` compared to `sapply(myData, print)` and you'll see. See Thela's generous edit to the very partial answer provided. – David Arenburg Jul 15 '15 at 06:14
  • @DavidArenburg Yes, I know. The OP wasn't doing what he thought he was doing. After, it was too long for a comment. –  Jul 15 '15 at 06:15

1 Answer

Your code actually works when I test it.

This is my output from str(myData):

'data.frame':   6 obs. of  3 variables:
 $ A    : Factor w/ 2 levels "1","2": 1 2 1 2 1 2
 $ B    : int  1 2 3 1 2 3
 $ Pulse: int  20 21 22 23 24 25

The conversion itself is fine; the misleading result comes from the check. As ?apply states:

‘apply’ attempts to coerce to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data frame)

This coercion happens before the function is executed on each column. A matrix can hold only one type of data, so as.matrix(myData) forces everything to a single type. Because of the factor column A, that type is character:

is.character(as.matrix(myData))
#[1] TRUE
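
To make the difference concrete, here is a minimal sketch that reuses the question's myData and compares the two ways of checking column classes. The variable names col_classes, m, and apply_classes are only illustrative and do not come from the original post.

```r
# Rebuild the example from the question.
myData <- data.frame(A = rep(1:2, 3), B = rep(1:3, 2), Pulse = 20:25)
myData$A <- as.factor(myData$A)

# A data frame is a list of columns, so sapply() visits each column
# with its own class intact.
col_classes <- sapply(myData, class)   # illustrative name
print(col_classes)

# apply() first calls as.matrix(), and a matrix holds a single type.
# With a factor column present, every cell becomes a character string.
m <- as.matrix(myData)                 # illustrative name
print(typeof(m))

# Each column that apply() hands to the function is therefore a
# character vector, whatever the original column was.
apply_classes <- apply(myData, 2, class)   # illustrative name
print(apply_classes)
```

With sapply(), only A should be reported as a factor, while B and Pulse stay integer. str(myData) shows the same information, so either is a safer check than apply() on a data frame with mixed column types.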
thelatemail
MichaelVE