
I'm looking for an elegant way to change multiple vectors' datatypes in R.

I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course.

As it stands, my data is sitting pretty in data.df, like this:

    str(data.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: int  1 1 1 1 1 1 0 0 0 1 ...
    $ ques02: int  0 0 1 1 1 1 1 1 1 1 ...
    $ ques03: int  0 0 1 1 0 0 1 1 0 1 ...
    $ ques04: int  1 0 1 1 1 1 1 1 1 1 ...
    $ ques05: int  0 0 0 0 1 0 0 0 0 0 ...
    $ ques06: int  1 0 1 1 0 1 1 1 1 1 ...
    $ ques07: int  0 0 1 1 0 1 1 0 0 1 ...
    $ ques08: int  0 0 1 1 1 0 1 1 0 1 ...
    $ inst  : num  1 1 1 1 1 1 1 1 1 1 ...

But those ques0x values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. Same goes for the "inst" values.

I'd love to turn all those ints and nums into factors.

Ideally, an elegant solution should produce a data frame (I call it factorData.df) that looks like this:

    str(factorData.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
    $ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
    $ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
    $ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
    $ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
    $ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
    $ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
    $ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
    $ inst  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

I'm fairly certain that whatever solution you folks come up with ought to generalize easily to any number of variables that need to be reclassified, and should work across most common conversions (int -> factor and num -> int, for example).

No matter what solution you folks generate, it's bound to be more elegant than mine, because my current clunky code is just nine separate factor() statements, one for each variable, like this:

    factorData.df$ques01 

I'm brand-new to R, programming, and stackoverflow. Please be gentle, and thanks in advance for your help!

briandk
  • @briandk: Since the questions can only be correct or incorrect, you might be better converting columns 1-8 to logical vectors rather than factors. (Factors would be appropriate for, say, the answers to multiple choice questions, where there are more than 2 possibilities.) – Richie Cotton Sep 29 '09 at 07:23
  • @Richie: thanks for the suggestion! I'm not familiar with logical datatypes in vectors. If they are a datatype just like nums, ints, and factors, then would your suggestion be to just use lapply to turn columns 1-8 into logical factors? – briandk Sep 29 '09 at 15:35
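
The logical-vector idea from the comments above can be sketched with the same lapply pattern. This is only a minimal illustration: the three-row data frame is invented stand-in data, and `quesCols` is a hypothetical helper name, not something from the original post.

```r
# Invented stand-in for the asker's data (values are illustrative only).
data.df <- data.frame(
  ques01 = c(1L, 0L, 1L),
  ques02 = c(0L, 0L, 1L),
  inst   = c(1, 2, 3)
)

# Pick out the question columns by name, then convert 0/1 to FALSE/TRUE.
quesCols <- grep("^ques", names(data.df))
data.df[quesCols] <- lapply(data.df[quesCols], as.logical)

# The instructor column has three categories, so a factor still fits it.
data.df$inst <- factor(data.df$inst)

# With logical columns, the mean is the proportion of correct answers.
propCorrect <- sapply(data.df[quesCols], mean)
str(data.df)
```

Logical columns keep the 0/1 arithmetic meaningful (sums and means still work), whereas factors are the more natural choice when a model should treat the values as categories.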

2 Answers


This was also answered in R-Help.

I imagine that there's a better way to do it, but here are two options:

    # use a sample data set
    > str(cars)
    'data.frame':   50 obs. of  2 variables:
     $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
     $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
    > data.df <- cars

You can use lapply:

    > data.df <- data.frame(lapply(data.df, factor))

Or a for statement:

    > for(i in 1:ncol(data.df)) data.df[,i] <- as.factor(data.df[,i])

In either case, you end up with what you want:

    > str(data.df)
    'data.frame':   50 obs. of  2 variables:
     $ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
     $ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
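
If only some columns should change type, or the conversion is something other than to factor, the lapply approach can be narrowed to a subset of columns. The following is a rough sketch, not part of the original answer: the four-row data frame is invented stand-in data, and `cols` and `intData.df` are hypothetical names.

```r
# Invented stand-in for the asker's data (values are illustrative only).
data.df <- data.frame(
  ques01 = c(1L, 0L, 1L, 1L),
  ques02 = c(0L, 0L, 1L, 1L),
  inst   = c(1, 2, 3, 1)
)

# Assigning into factorData.df[cols] replaces just those columns and keeps
# the object a data frame, so no data.frame() wrapper is needed.
cols <- c("ques01", "ques02", "inst")
factorData.df <- data.df
factorData.df[cols] <- lapply(factorData.df[cols], factor)

str(factorData.df)

# The same pattern covers other conversions, for example num -> int.
intData.df <- data.df
intData.df["inst"] <- lapply(intData.df["inst"], as.integer)
```

One caveat when going the other way: as.numeric() on a factor returns the internal level codes, so recovering the original numbers needs as.numeric(as.character(f)).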
Shane
  • Shane, This is exactly the basic function I needed. Sorry I don't have enough reputation points to upvote it :-( – briandk Sep 29 '09 at 00:56
  • @briandk: Good to hear! At some point, just accept it so the community knows it answers your question. :) – Shane Sep 29 '09 at 01:05

I found an alternative solution in the plyr package:

    # load the package and data
    > library(plyr)
    > data.df <- cars

Use the colwise function:

    > data.df <- colwise(factor)(data.df)
    > str(data.df)
    'data.frame':   50 obs. of  2 variables:
     $ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
     $ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...

Incidentally, if you look inside the colwise function, it just uses lapply:

    df <- as.data.frame(lapply(filtered, .fun, ...))
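
To make that line concrete, here is a base-R sketch of the same idea: a function that takes a column-level function and returns a data-frame-level one. `colwise_sketch` is a hypothetical name used only for illustration. It is not plyr's implementation and omits the column filtering that the `filtered` variable implies.

```r
# Hypothetical helper: wrap a per-column function so it applies to every
# column of a data frame. A simplified imitation, not plyr's colwise().
colwise_sketch <- function(.fun) {
  function(df, ...) {
    as.data.frame(lapply(df, .fun, ...))
  }
}

# Same usage shape as colwise(factor)(data.df), on the built-in cars data.
data.df <- colwise_sketch(factor)(cars)
str(data.df)
```

Returning a function lets the wrapped version be stored and reused on several data frames, which is the main convenience over writing the lapply call out each time.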
Shane
  • @Shane: I wish I could "accept" this one too, since it combines your lapply suggestion with some powerful features from the plyr package. – briandk Sep 29 '09 at 15:37