Duplicate row names in R using as.data.frame()

Asked Feb 28 '18 at 14:10

Active Mar 23 '23 at 07:49


In R data frames, row names must be unique.

```r
df <- mtcars
rownames(df) <- rep("duplicate!", nrow(df))
> Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
>   duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique value when setting 'row.names': ‘duplicate!’
```

```r
df <- data.frame(mtcars, row.names=rep("duplicate!", nrow(mtcars)))
> Error in data.frame(mtcars, row.names = rep("duplicate!", nrow(mtcars))) :
>   duplicate row.names: duplicate!
```
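For comparison, here is a minimal sketch of an assignment that does succeed, assuming the goal is just to keep the repeated label in some form: base R's `make.unique()` leaves the first occurrence alone and appends `.1`, `.2`, ... to later repeats, so the vector satisfies the uniqueness rule. The variable name `labels` is only illustrative.

```r
# Repeated labels that would trigger the errors above
df <- mtcars
labels <- rep("duplicate!", nrow(df))

# make.unique() keeps the first value and suffixes repeats with ".1", ".2", ...
rownames(df) <- make.unique(labels)

head(rownames(df), 3)        # "duplicate!" "duplicate!.1" "duplicate!.2"
anyDuplicated(rownames(df))  # 0, i.e. no duplicates remain
```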

What, then, is the motivation for the following behavior with as.data.frame()? Is this intentional or a bug?

```r
m <- as.matrix(mtcars)
rownames(m) <- rep("duplicate!", nrow(m))
df <- as.data.frame(m)
```

Resulting in the following:

```r
any(duplicated(rownames(df)))  # == TRUE
nrow(df)  # == 32
length(unique(rownames(df)))  # == 1
df["duplicate!", ]  # returns a single row...
>            mpg cyl disp  hp drat   wt  qsec vs am gear carb
> duplicate!  21   6  160 110  3.9 2.62 16.46  0  1    4    4
```

(Run with R version 3.4.3 (2017-11-30))
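Whatever the answer, code that has to run on R 3.4.3 can sidestep the question by making the matrix row names unique before converting. This is a sketch assuming `make.unique()`'s default `.` separator is acceptable; `has_dup_rownames()` is a hypothetical helper, not a base R function.

```r
# Hypothetical helper (not in base R): TRUE if any row name occurs more than once
has_dup_rownames <- function(x) anyDuplicated(rownames(x)) > 0L

m <- as.matrix(mtcars)
rownames(m) <- rep("duplicate!", nrow(m))

# Repair the names on the matrix first, so the result does not depend on
# how a particular R version's as.data.frame() treats duplicated row names
rownames(m) <- make.unique(rownames(m))
df <- as.data.frame(m)

has_dup_rownames(df)  # FALSE
df["duplicate!.1", ]  # selects only the second row (the Mazda RX4 Wag data)
```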

edited Mar 23 '23 at 07:49 by starball

asked Feb 28 '18 at 14:10 by Megatron

This seems to be a bug. Considering that just calling `df` results in an error (`Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : duplicate row.names: duplicate!`), I'd say this error should pop up while calling `as.data.frame(m)`. – LAP Feb 28 '18 at 14:17
Right, but try `View(df)` (in RStudio v1.1.414) – Megatron Feb 28 '18 at 14:17
This does not answer your question, but look at the row names when you type `head(df)` – G5W Feb 28 '18 at 14:18
@G5W `head()` appends "\.\d+", but calling `rownames(df)` returns our duplicated rownames. – Megatron Feb 28 '18 at 14:19

From a quick perusal of the 24 (by my count) `as.data.frame` method sources, every one except `as.data.frame.matrix` either explicitly ensures there are no duplicated row names, or at some point lets `data.frame` do the heavy lifting, which also ensures there are no duplicated row names. So I'm inclined to guess this is not intended behaviour and could be called a bug. Or perhaps my imagination is too limited to see why this should be allowed. – ngm Feb 28 '18 at 14:48
I would add that there is periodic discussion of `as.data.frame` methods and their inconsistent handling of row names on the r-devel mailing list. – ngm Feb 28 '18 at 14:54
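The guard ngm describes, which most methods get by delegating to `data.frame()`, can be imitated in user code. A minimal sketch: `as_data_frame_strict()` is a hypothetical name, and its error text only loosely mirrors the message `data.frame()` gives.

```r
# Hypothetical wrapper (not part of base R): convert a matrix to a data frame,
# but stop on duplicated row names instead of passing them through silently.
as_data_frame_strict <- function(m) {
  rn <- rownames(m)
  if (!is.null(rn) && anyDuplicated(rn) > 0L) {
    dups <- unique(rn[duplicated(rn)])
    stop("duplicate row.names: ", paste(dups, collapse = ", "))
  }
  as.data.frame(m)
}

m <- as.matrix(mtcars)
rownames(m) <- rep("duplicate!", nrow(m))
res <- tryCatch(as_data_frame_strict(m), error = conditionMessage)
res  # "duplicate row.names: duplicate!"
```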

1 Answer

Yes: as Martyn Plummer confirmed in his reply on the official R-devel mailing list (https://stat.ethz.ch/mailman/listinfo/r-devel/), this is a bug, and I will probably commit a change to the sources fixing it soon.

answered Mar 07 '18 at 20:47 by Martin Mächler

And I did commit it; it is visible, e.g., at the GitHub mirror of the R repository: https://github.com/wch/r-source/commit/51a4342f1b93d85bf6750cded97d8fa013984f46 – Martin Mächler Mar 09 '18 at 20:45