
Consider the following R code.

> x = cbind(c(10, 20), c("[]", "[]"), c("[[1,2]]","[[1,3]]"))
> x
     [,1] [,2] [,3]     
[1,] "10" "[]" "[[1,2]]"
[2,] "20" "[]" "[[1,3]]"

Similarly

> x = rbind(c(10, "[]", "[[1,2]]"), c(20, "[]", "[[1,3]]"))
> x
     [,1] [,2] [,3]     
[1,] "10" "[]" "[[1,2]]"
[2,] "20" "[]" "[[1,3]]"

Now, I don't want the integers 10 and 20 to be converted to strings. How can I perform this operation without any such conversion? I would of course also like to know why this conversion happens. I looked at the cbind help and also tried Googling, but had no luck finding a solution. I also believe that in some cases R converts strings to factors, and I don't want that to happen either, though it doesn't seem to be happening here.

Faheem Mitha
  • The problem is not with `cbind`, but with `c`. That is the function you need to understand better. – IRTFM Oct 08 '12 at 18:52

2 Answers


Vectors and matrices can hold only a single type, and cbind and rbind on vectors produce matrices. In these cases, the numeric values are promoted to character, since character is the one type that can represent all the values.

Note that in your rbind example, the promotion happens within the c call:

> c(10, "[]", "[[1,2]]")
[1] "10"      "[]"      "[[1,2]]"
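
A minimal sketch of the same promotion, using only base R. The variable names `nums`, `mixed`, and `m` are illustrative and do not come from the question. It checks the storage type before and after combining:

```r
# Illustrative names; base R only.
nums  <- c(10, 20)               # a numeric (double) vector
mixed <- c(10, "[]", "[[1,2]]")  # mixing in strings promotes everything to character

typeof(nums)    # "double"
typeof(mixed)   # "character"

# cbind on atomic vectors builds a matrix, which also has a single type
m <- cbind(nums, c("[]", "[]"))
typeof(m)       # "character"
is.matrix(m)    # TRUE

# The numbers can be recovered, but only by converting back explicitly
as.numeric(m[, 1])   # 10 20
```

Converting back with `as.numeric` works here, but it is a workaround; keeping the columns in a structure that allows mixed types avoids the round trip.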

If you want a rectangular structure where the columns can be different types, you want a data.frame. Any of the following should get you what you want:

> x = data.frame(v1=c(10, 20), v2=c("[]", "[]"), v3=c("[[1,2]]","[[1,3]]"))
> x
  v1 v2      v3
1 10 [] [[1,2]]
2 20 [] [[1,3]]
> str(x)
'data.frame':   2 obs. of  3 variables:
 $ v1: num  10 20
 $ v2: Factor w/ 1 level "[]": 1 1
 $ v3: Factor w/ 2 levels "[[1,2]]","[[1,3]]": 1 2

or (using specifically the data.frame version of cbind)

> x = cbind.data.frame(c(10, 20), c("[]", "[]"), c("[[1,2]]","[[1,3]]"))
> x
  c(10, 20) c("[]", "[]") c("[[1,2]]", "[[1,3]]")
1        10            []                 [[1,2]]
2        20            []                 [[1,3]]
> str(x)
'data.frame':   2 obs. of  3 variables:
 $ c(10, 20)              : num  10 20
 $ c("[]", "[]")          : Factor w/ 1 level "[]": 1 1
 $ c("[[1,2]]", "[[1,3]]"): Factor w/ 2 levels "[[1,2]]","[[1,3]]": 1 2

or (using cbind, but making the first a data.frame so that it combines as data.frames do):

> x = cbind(data.frame(c(10, 20)), c("[]", "[]"), c("[[1,2]]","[[1,3]]"))
> x
  c.10..20. c("[]", "[]") c("[[1,2]]", "[[1,3]]")
1        10            []                 [[1,2]]
2        20            []                 [[1,3]]
> str(x)
'data.frame':   2 obs. of  3 variables:
 $ c.10..20.              : num  10 20
 $ c("[]", "[]")          : Factor w/ 1 level "[]": 1 1
 $ c("[[1,2]]", "[[1,3]]"): Factor w/ 2 levels "[[1,2]]","[[1,3]]": 1 2
Brian Diggs
  • Thanks for the detailed answer. I don't think I need the properties of Factors here, and my recollection is that they can cause problems. Is there some way of creating a data frame with string values instead? – Faheem Mitha Oct 08 '12 at 18:58
  • Add `stringsAsFactors=FALSE` to the `data.frame` calls. If the calls are implicit (as in the last example), then you have to make them explicit: `data.frame(c("[]", "[]"), stringsAsFactors=FALSE)`. – Brian Diggs Oct 08 '12 at 19:14
  • There is a global option `stringsAsFactors` as well which controls this. I leave it as the shipped default and change it on an as-needed basis for reproducibility. – Brian Diggs Oct 08 '12 at 19:15
  • I get `$ c("[]", "[]") : chr "[]" "[]"` instead of `$ c("[]", "[]") : Factor w/ 1 level "[]": 1 1`. I'm using R 2.15.1. Any idea why the difference? – Faheem Mitha Oct 08 '12 at 19:57
  • What do you get for `getOption("stringsAsFactors")`? – Brian Diggs Oct 08 '12 at 21:32
  • Ah yes, I forgot I set `options(stringsAsFactors=FALSE)` in .Rprofile. Maybe that wasn't such a good idea. – Faheem Mitha Oct 08 '12 at 23:33
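
The `stringsAsFactors=FALSE` suggestion from the comments can be sketched as follows, reusing the column names from the answer. Passing the argument explicitly makes the result independent of the global option; note that R 4.0.0 and later already default to `FALSE`, so on those versions the argument is redundant but harmless.

```r
# Same data as in the answer, with string columns kept as character
x <- data.frame(v1 = c(10, 20),
                v2 = c("[]", "[]"),
                v3 = c("[[1,2]]", "[[1,3]]"),
                stringsAsFactors = FALSE)

# Class of each column: v1 is numeric, v2 and v3 are character
sapply(x, class)
```

Setting the argument per call, rather than through `options()` in `.Rprofile`, keeps a script's behaviour the same on other machines.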

Using data.frame instead of cbind should help:

x <- data.frame(col1=c(10, 20), col2=c("[]", "[]"), col3=c("[[1,2]]","[[1,3]]"))
x
  col1 col2    col3
1   10   [] [[1,2]]
2   20   [] [[1,3]]

sapply(x, class) # looking into x to see the class of each element
     col1      col2      col3 
"numeric"  "factor"  "factor" 

As you can see, the elements of col1 are numeric, as you wanted.

A data.frame can have variables of different classes (numeric, factor and character), but a matrix cannot: once you put a character element into a matrix, all the other elements are converted to character, no matter what class they were before.
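
To illustrate that last point, here is a small sketch (the names `m` and `df` are made up for the example) of an integer matrix turning into a character matrix after a single string is assigned, while a data frame column keeps its own type:

```r
m <- matrix(1:4, nrow = 2)   # integer matrix
typeof(m)                    # "integer"

m[1, 1] <- "a"               # one string forces the whole matrix to character
typeof(m)                    # "character"

# stringsAsFactors = FALSE is passed explicitly so the result does not
# depend on the R version or on global options
df <- data.frame(num = c(10, 20), txt = c("a", "b"), stringsAsFactors = FALSE)
df$txt[1] <- "z"             # changing the text column leaves num untouched
class(df$num)                # "numeric"
```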

Jilber Urbina