
Possible Duplicate:
identifying or coding unique factors using R

I'm having some trouble with R.

I have a data set similar to the following, but much longer.

A B Pulse
1 2 23
2 2 24
2 2 12
2 3 25
1 1 65
1 3 45

Basically, the first 2 columns are coded. A has 1, 2 which represent 2 different weights. B has 1, 2, 3 which represent 3 different times.

As they are coded numerical values, R will treat them as numerical variables. I need to use the factor function to convert these variables into factors.

Help?

Will Ness
math11

2 Answers


Here's an example:

#Create a data frame
> d<- data.frame(a=1:3, b=2:4)
> d
  a b
1 1 2
2 2 3
3 3 4

#currently, there are no levels in the `a` column, since it's numeric as you point out.
> levels(d$a)
NULL

#Convert that column to a factor
> d$a <- factor(d$a)
> d
  a b
1 1 2
2 2 3
3 3 4

#Now it has levels.
> levels(d$a)
[1] "1" "2" "3"

You can also handle this when reading in your data. See the colClasses and stringsAsFactors parameters of e.g. read.csv().
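As a minimal sketch of the import-time approach, the snippet below reads a small inline CSV and asks for factors up front. The CSV text and its comma-separated layout are assumptions made for illustration; only the column names come from the question.

```r
# Hypothetical CSV contents using the question's column names
csv_text <- "A,B,Pulse
1,2,23
2,2,24
2,3,25"

# colClasses sets the type of each column as it is read
d <- read.csv(text = csv_text,
              colClasses = c("factor", "factor", "numeric"))

str(d)  # A and B are factors, Pulse is numeric
```

With a real file you would pass the file path as the first argument instead of `text =`.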

Note that, computationally, factoring such columns won't help you much, and may actually slow down your program (albeit negligibly). Using a factor will require that all values are mapped to IDs behind the scenes, so any print of your data.frame requires a lookup on those levels -- an extra step which takes time.
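To make the "mapped to IDs" point concrete, here is a small sketch with made-up values showing the integer codes a factor stores and how the labels are looked up from them:

```r
# Illustrative values only
f <- factor(c("long", "short", "long"))

codes <- as.integer(f)  # the integer IDs stored internally
lv <- levels(f)         # the lookup table of labels

codes      # 1 2 1
lv         # "long" "short"
lv[codes]  # the lookup that recovers the original labels
```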

Factors are great when storing strings which you don't want to store repeatedly, but would rather reference by their ID. Consider storing a more friendly name in such columns to fully benefit from factors.

Jeff Allen
  • But each of the numbers represents something. For A, 1 represents long and 2 represents short. For B, 1, 2, 3 represent 1kg, 2kg, 3kg. So I need to convert all those 1's, 2's, etc. into 1kg, 2kg, long, short, etc. I need to add labels to them. – math11 Nov 28 '12 at 20:51
  • Try running the code above followed by assigning the `levels` value to something more useful. For instance, `levels(d$a) <- c("Long", "Short")`. Now you (or a new user looking at your code) needn't worry about memorizing the mappings between your IDs and your labels. R will handle the mapping for you and just present the labels to you. – Jeff Allen Nov 28 '12 at 21:00
  • Jeff is a more complete solution because it adds the levels in the same command. – Juano Mar 07 '21 at 19:41

Given the following sample

myData <- data.frame(A=rep(1:2, 3), B=rep(1:3, 2), Pulse=20:25)  

then

myData$A <- as.factor(myData$A)
myData$B <- as.factor(myData$B)

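or you could select your columns all together and wrap it up nicely:

# select columns
cols <- c("A", "B")
# lapply() converts each column separately; apply() would coerce them to a
# character matrix, and data.frame() no longer turns strings into factors by default
myData[cols] <- lapply(myData[cols], as.factor)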

levels(myData$A) <- c("long", "short")
levels(myData$B) <- c("1kg", "2kg", "3kg")

To obtain

> myData
      A   B Pulse
1  long 1kg    20
2 short 2kg    21
3  long 3kg    22
4 short 1kg    23
5  long 2kg    24
6 short 3kg    25
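
As an alternative sketch, factor() can do the conversion and the labelling in one call through its levels and labels arguments. This spells out which code maps to which label rather than relying on the order of the existing levels. The label values are the ones used above.

```r
myData <- data.frame(A = rep(1:2, 3), B = rep(1:3, 2), Pulse = 20:25)

# levels lists the codes in the data; labels gives the name for each code, in order
myData$A <- factor(myData$A, levels = 1:2, labels = c("long", "short"))
myData$B <- factor(myData$B, levels = 1:3, labels = c("1kg", "2kg", "3kg"))

str(myData)
```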
nbro
Ricardo Saporta