
I've come across a strange behavior when playing with some data frames: when I create two identical data frames a and b, then assign the row names of a to b, they no longer come out as identical:

rm(list=ls())

a <- data.frame(a=c(1,2,3),b=c(2,3,4))
b <- a
identical(a,b)
#TRUE

identical(rownames(a),rownames(b))
#TRUE

rownames(b) <- rownames(a)

identical(a,b)
#FALSE

Can anyone reproduce/explain why?

starball
NicE
  • Looks like the row names are initially numeric, but `rownames()` returns a character vector, so the row names end up as attributes of different types. – joran Feb 28 '17 at 18:05
  • You can check the structure i.e. `c(NA, -3L)` is the first case changed to `c("1", "2", "3")` – akrun Feb 28 '17 at 18:06
  • Don't use `identical` for comparing dataframes. Use `all.equal(a,b)` which tells you the names are different. – smci Mar 02 '19 at 10:40

1 Answer


This is admittedly a bit confusing. Starting with ?data.frame we see that:

If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).

So initially a and b each have an attribute called row.names that is an integer vector:

> str(attributes(a))
List of 3
 $ names    : chr [1:2] "a" "b"
 $ row.names: int [1:3] 1 2 3
 $ class    : chr "data.frame"

But rownames() returns a character vector (under the hood it calls dimnames(), which returns a list of character vectors). So after reassigning the row names, b ends up with:

> str(attributes(b))
List of 3
 $ names    : chr [1:2] "a" "b"
 $ row.names: chr [1:3] "1" "2" "3"
 $ class    : chr "data.frame"
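
The difference can also be checked directly, and one way to undo it is to reset b to automatic row names. The following is a minimal sketch, assuming base R only; the variable names type_a, type_b, same_before and same_after are introduced here purely for illustration. Assigning NULL to the row names is the documented way to get back the automatic integer row names, after which the two data frames should compare as identical again.

```r
# Recreate the situation from the question.
a <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
b <- a
rownames(b) <- rownames(a)   # stores the character vector "1", "2", "3"

# The row.names attribute now has a different type in each data frame.
type_a <- typeof(attr(a, "row.names"))   # automatic row names: integer
type_b <- typeof(attr(b, "row.names"))   # reassigned row names: character
same_before <- identical(a, b)

# Assigning NULL resets b to automatic (integer) row names.
rownames(b) <- NULL
same_after <- identical(a, b)

print(c(type_a = type_a, type_b = type_b))
print(c(before = same_before, after = same_after))
```

If the row names carry no information, resetting them this way on both data frames before calling identical() avoids the mismatch altogether.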
joran