Why do identical dataframes become different when changing rownames to the same?

Question

I've come across some strange behavior when playing with dataframes: when I create two identical dataframes a and b, then assign the rownames of a to b, they no longer come out as identical:

rm(list=ls())

a <- data.frame(a=c(1,2,3),b=c(2,3,4))
b <- a
identical(a,b)
#TRUE

identical(rownames(a),rownames(b))
#TRUE

rownames(b) <- rownames(a)

identical(a,b)
#FALSE

Can anyone reproduce/explain why?

Looks like the row names are initially numeric, but `rownames()` returns a character vector, so the row names end up as attributes of different types. — joran, Feb 28 '17 at 18:05
You can check the structure: the internal row names, `c(NA, -3L)` in the first case, are changed to `c("1", "2", "3")` — akrun, Feb 28 '17 at 18:06
Don't use `identical` for comparing dataframes. Use `all.equal(a,b)` which tells you the names are different. — smci, Mar 02 '19 at 10:40

score 6 · Accepted Answer · answered Feb 28 '17 at 18:10

This is admittedly a bit confusing. Starting with ?data.frame we see that:

If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).

So initially a and b each have an attribute called row.names that is an integer vector:

> str(attributes(a))
List of 3
 $ names    : chr [1:2] "a" "b"
 $ row.names: int [1:3] 1 2 3
 $ class    : chr "data.frame"

But rownames() returns a character vector (as does dimnames(), which is called under the hood and actually returns a list of character vectors). So after reassigning the row names, the row.names attribute of b is a character vector:

> str(attributes(b))
List of 3
 $ names    : chr [1:2] "a" "b"
 $ row.names: chr [1:3] "1" "2" "3"
 $ class    : chr "data.frame"
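If the goal is to get b back to a state where it is identical to a, one option is to reset its row names to the automatic ones. The following is a minimal sketch, not part of the original answer: it rebuilds the dataframes from the question, uses `.row_names_info()` to show which of the two has automatic row names, and then assigns NULL to the row names of b.

```r
# Rebuild the situation from the question.
a <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4))
b <- a
rownames(b) <- rownames(a)   # row names of b are now stored as character

# .row_names_info() (default type = 1L) returns the number of rows,
# with a negative sign when the row names are 'automatic'.
.row_names_info(a)   # negative: automatic row names
.row_names_info(b)   # positive: row names were set explicitly

# Assigning NULL resets b to automatic (integer) row names.
rownames(b) <- NULL
identical(a, b)
#TRUE
```

This only helps when the row names carry no information of their own. If b has meaningful row names, resetting them discards that information.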
