I'm looking for an elegant way to change multiple vectors' datatypes in R.
I'm working with an educational dataset: 426 students' answers to eight multiple-choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course.
As it stands, my data is sitting pretty in data.df, like this:
str(data.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
$ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
$ ques03: int 0 0 1 1 0 0 1 1 0 1 ...
$ ques04: int 1 0 1 1 1 1 1 1 1 1 ...
$ ques05: int 0 0 0 0 1 0 0 0 0 0 ...
$ ques06: int 1 0 1 1 0 1 1 1 1 1 ...
$ ques07: int 0 0 1 1 0 1 1 0 0 1 ...
$ ques08: int 0 0 1 1 1 0 1 1 0 1 ...
$ inst : num 1 1 1 1 1 1 1 1 1 1 ...
But those ques0x values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. Same goes for the "inst" values.
I'd love to turn all those ints and nums into factors.
Ideally, an elegant solution should produce a dataframe (I call it factorData.df) that looks like this:
str(factorData.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
$ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
$ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
$ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
$ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
$ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
$ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
$ inst : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
I'm fairly certain that whatever solution you folks come up with, it ought to be easy to generalize to any number n of variables that need to be reclassified, and it should work across most common conversions (int -> factor and num -> int, for example).
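To make that requirement concrete, here is a minimal sketch of one way such a generalized conversion might look in base R, using lapply() to apply a conversion function to each column. The toy data frame stands in for the real 426-row data.df, and the names cols and intData.df are made up for illustration; this is an assumption about one possible shape of a solution, not a definitive answer.

```r
# Toy stand-in for data.df (hypothetical values, not the real dataset)
data.df <- data.frame(
  ques01 = c(1L, 1L, 0L, 1L),
  ques02 = c(0L, 1L, 1L, 1L),
  inst   = c(1, 2, 3, 1)
)

# Convert every column to a factor in one step.
# Assigning into factorData.df[] keeps the result a data.frame
# instead of replacing it with the plain list that lapply() returns.
factorData.df <- data.df
factorData.df[] <- lapply(factorData.df, factor)

# The same pattern with a subset of columns and a different
# conversion function (num -> int here).
intData.df <- data.df
cols <- "inst"  # hypothetical choice of columns to convert
intData.df[cols] <- lapply(intData.df[cols], as.integer)

str(factorData.df)
str(intData.df)
```

Swapping factor or as.integer for another conversion function, or changing cols, would cover other combinations without writing one statement per variable.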
No matter what solution you folks generate, it's bound to be more elegant than mine, because my current clunky code is just 9 separate factor() statements, one for each variable, like this:
factorData.df$ques01
I'm brand-new to R, programming, and Stack Overflow. Please be gentle, and thanks in advance for your help!