I have a data frame like this:

> str(dynamics)
'data.frame':   3517 obs. of  3 variables:
 $ id   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ y2015: int  245 129 301 162 123 125 115 47 46 135 ...
 $ y2016: int  NA 385 420 205 215 295 130 NA NA 380 ...

I take out the three columns as separate vectors and give them new names.
Column 1:

> plantid <- dynamics$id
> head(plantid)
[1] 1 2 3 4 5 6

Column 2:
(I cut it into four classes and label them 2, 3, 4 and 5)

> y15 <- dynamics$y2015
> year15 <- cut(y15, breaks = c(-Inf, 50, 100, 150, Inf), labels = c("2", "3", "4", "5"))
> str(year15)
 Factor w/ 4 levels "2","3","4","5": 4 3 4 4 3 3 3 1 1 3 ...
> head(year15)
[1] 5 4 5 5 4 4
Levels: 2 3 4 5

Column 3:
(Same here)

> y16 <- dynamics$y2016
> year16 <- cut(y16, breaks = c(-Inf, 50, 100, 150, Inf), labels = c("2", "3", "4", "5"))
> str(year16)
 Factor w/ 4 levels "2","3","4","5": NA 4 4 4 4 4 3 NA NA 4 ...
> head(year16)
[1] <NA> 5    5    5    5    5   
Levels: 2 3 4 5

So far so good!

The problem arises when I combine the above 3 vectors with cbind() to form a new data frame: the factor levels I just created are gone.

Look at my code:

SD1 = data.frame(cbind(plantid, year15, year16))
head(SD1)

and I get a data frame like this:

> head(SD1)
  plantid year15 year16
1       1      4     NA
2       2      3      4
3       3      4      4
4       4      4      4
5       5      3      4
6       6      3      4

As you can see, the values in the 2nd and 3rd columns have changed from the labels 2, 3, 4, 5 back to 1, 2, 3, 4.
How do I fix that?

jdobres
Muneer

1 Answer

cbind is most commonly used to combine objects into matrices. When all of its arguments are plain vectors (none is already a data frame), it returns a matrix and strips most attributes from the inputs so that they can be combined into a single object. Objects that depend on those attributes, such as factors (levels and class) and Dates (class), are therefore reduced to their underlying numeric representation. This is why cbind turns your factors into their integer codes 1 to 4 instead of the labels 2 to 5.
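To see the attribute stripping in isolation, here is a minimal sketch. The factor `f` is a made-up toy example, not taken from your data:

```r
# Hypothetical toy factor with the same labels as in the question
f <- factor(c("5", "4", "5"), levels = c("2", "3", "4", "5"))

attributes(f)   # has $levels and $class

# cbind() on a plain factor builds a matrix of the integer codes
m <- cbind(f)

is.matrix(m)    # TRUE
attributes(m)   # only dim and dimnames remain; levels and class are gone
m[, "f"]        # the codes (positions in the level set), not the labels
```

The codes are simply the positions of each value in the level set, which is why "5" (the fourth level) shows up as 4.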

Conversely, data.frame() by itself will preserve the individual object attributes. In this case, your use of cbind is unnecessary. To preserve your factor levels, simply use:

SD1 <- data.frame(plantid, year15, year16)
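As a quick check, the sketch below rebuilds the first six rows shown in your str() output and applies the line above. The helper names `brks` and `labs` are only introduced here for brevity:

```r
# First six rows, copied from the str() output in the question
dynamics <- data.frame(
  id    = 1:6,
  y2015 = c(245L, 129L, 301L, 162L, 123L, 125L),
  y2016 = c(NA, 385L, 420L, 205L, 215L, 295L)
)

brks <- c(-Inf, 50, 100, 150, Inf)   # same breaks as in the question
labs <- c("2", "3", "4", "5")        # same labels as in the question

plantid <- dynamics$id
year15  <- cut(dynamics$y2015, breaks = brks, labels = labs)
year16  <- cut(dynamics$y2016, breaks = brks, labels = labs)

# No cbind(): each column keeps its own class and attributes
SD1 <- data.frame(plantid, year15, year16)

str(SD1)             # year15 and year16 are still factors
levels(SD1$year15)   # "2" "3" "4" "5"
head(SD1)            # shows the labels 5, 4, 5, ... rather than the codes

# If you later need the labels as numbers, convert via character first;
# as.integer() directly on a factor would give the codes 1 to 4 again.
as.integer(as.character(SD1$year15))
```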
jdobres