
I noticed that if the row names of a data frame are the sequence of numbers from 1 to the number of rows, they disappear after calling as.matrix. But the row names reappear if they are not such a sequence.

Here is a reproducible example:

test <- as.data.frame(list(x=c(0.1, 0.1, 1), y=c(0.1, 0.2, 0.3)))
rownames(test)
# [1] "1" "2" "3"

rownames(as.matrix(test))
# NULL

rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"

Why does this happen?

Tong Qiu

5 Answers


First and foremost, we always have a numerical index for sub-setting that won't disappear and that we should not confuse with row names.

as.matrix(test)[c(1, 3), ]
#        x   y
# [1,] 0.1 0.1
# [2,] 1.0 0.3

WHAT is going on when you call rownames() is visible in the source code of base::rownames(), which simply returns the first element of dimnames:

function (x, do.NULL = TRUE, prefix = "row") 
{
  dn <- dimnames(x)
  if (!is.null(dn[[1L]])) 
    dn[[1L]]
  else {
    nr <- NROW(x)
    if (do.NULL) 
      NULL
    else if (nr > 0L) 
      paste0(prefix, seq_len(nr))
    else character()
  }
}

which yields NULL for dimnames(as.matrix(test))[[1]] but yields "1" "3" in the case of dimnames(as.matrix(test[c(1, 3), ]))[[1]].

Note that for data frames, e.g. rownames(test), the method base:::row.names.data.frame is applied instead.
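To make the dimnames point concrete, here is a minimal sketch using the test data frame from the question. The names m_full and m_sub are only illustrative.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# The first dimnames element of the full matrix is NULL,
# so rownames() ends up in its do.NULL branch and returns NULL.
m_full <- as.matrix(test)
is.null(dimnames(m_full)[[1L]])
# [1] TRUE

# After subsetting, the row names are carried over as characters.
m_sub <- as.matrix(test[c(1, 3), ])
dimnames(m_sub)[[1L]]
# [1] "1" "3"

# Column names survive in both cases.
colnames(m_full)
# [1] "x" "y"
```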

That should explain the WHAT. Fortunately you did not ask for the WHY, which would be rather opinion-based.

jay.sf
  • I worked on an answer to this question for the last half hour and did not notice yours. Would you be so kind as to review my answer and tell me whether I am wrong or confusing? Many thanks, jay.sf. – TarJae Mar 18 '22 at 20:51
  • @TarJae I felt the same temptation as you to get philosophical in answering this question. – jay.sf Mar 18 '22 at 21:05
  • Thanks a lot, now I understand it better! – Tong Qiu Mar 19 '22 at 03:38

You can pass rownames.force = TRUE when you apply as.matrix (the shorter rownames = TRUE also works, but only through partial argument matching):

> as.matrix(test, rownames.force = TRUE)
    x   y
1 0.1 0.1
2 0.1 0.2
3 1.0 0.3
ThomasIsCoding

I don't know exactly why this happens, but one way to fix it is to pass the argument rownames.force = TRUE to as.matrix:

rownames(as.matrix(test, rownames.force = TRUE))
# [1] "1" "2" "3"
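For comparison, here is a small sketch of the three possible values of rownames.force, again on the test data frame from the question. The variable names rn_default, rn_forced and rn_dropped are just for illustration.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))

# NA (the default): automatic row names are dropped.
rn_default <- rownames(as.matrix(test))

# TRUE: row names are kept, even automatic ones.
rn_forced <- rownames(as.matrix(test, rownames.force = TRUE))

# FALSE: row names are dropped, even non-automatic ones.
rn_dropped <- rownames(as.matrix(test[c(1, 3), ], rownames.force = FALSE))

rn_default   # NULL
rn_forced    # "1" "2" "3"
rn_dropped   # NULL
```

Spelling out TRUE rather than T is a little safer, since T is an ordinary variable that can be reassigned.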
Alexandre Jacob
dvera

The difference dataframe vs. matrix:

?rownames

rownames(x, do.NULL = TRUE, prefix = "row")

The important part is do.NULL, whose default is TRUE. From the documentation:

If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case,

If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as

rownames(x)[3] <- "c"

may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
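The quoted caveat can be reproduced with a small sketch. The 4-row matrix m is a made-up example (not from the question), chosen so that the length-3 value does not match the number of rows.

```r
m <- matrix(1:8, nrow = 4)   # hypothetical matrix without dimnames
rownames(m)
# NULL

# NULL[3] <- "c" yields c(NA, NA, "c"): length 3, but m has 4 rows,
# so assigning it as row names fails.
res <- tryCatch({
  rownames(m)[3] <- "c"
  "ok"
}, error = function(e) "error")
res
# [1] "error"

# Setting a full-length vector first makes element-wise replacement safe.
rownames(m) <- paste0("r", seq_len(nrow(m)))
rownames(m)[3] <- "c"
rownames(m)
# [1] "r1" "r2" "c"  "r4"
```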

As I understand it (this may not be fully correct or rigorous), rownames() on a matrix only returns row names if they were set beforehand; otherwise you get NULL, because do.NULL = TRUE is the default in rownames().

In your example you see exactly this behaviour. Here you select rows 1 and 3, so the row names are set, and you get "1" and "3":

rownames(as.matrix(test[c(1, 3), ]))
[1] "1" "3"

Here you declare nothing and get NULL because NULL is the default.

rownames(as.matrix(test))
NULL

You can overcome this by declaring before:

rownames(test) <- 1:3

rownames(as.matrix(test))
[1] "1" "2" "3"

or you could do :

rownames(as.matrix(test), do.NULL = FALSE)
[1] "row1" "row2" "row3"
> rownames(as.matrix(test), do.NULL = FALSE, prefix="")
[1] "1" "2" "3"

The rownames.force argument of as.matrix has a similar effect. From the documentation:

rownames.force: logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.

TarJae

There is a difference between 'automatic' and non-'automatic' row names.

Here is a motivating example:

automatic

test <- as.data.frame(list(x = c(0.1,0.1,1), y = c(0.1,0.2,0.3)))
rownames(test)
# [1] "1" "2" "3"

rownames(as.matrix(test))
# NULL

non-'automatic'

test1 <- test
rownames(test1) <- as.character(1:3)
rownames(test1)
# [1] "1" "2" "3"

rownames(as.matrix(test1))
# [1] "1" "2" "3"

You can read about this in e.g. ?data.frame, which mentions the behavior you discovered at the end:

If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).

When you call test[c(1, 3), ] then you create non-'automatic' rownames implicitly, which is kinda documented in ?Extract.data.frame:

If `[` returns a data frame it will have unique (and non-missing) row names.

(type `[.data.frame` into your console if you want to go deeper here.)

Others showed what this means for your case already, see the argument rownames.force in ?matrix:

rownames.force: ... The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
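If you want to check whether a data frame's row names count as 'automatic', one option is base's .row_names_info(), which is documented as a helper mainly for internal use. The sketch below assumes its default type = 1L, which returns the number of rows with a negative sign for automatic row names; that this is the check as.matrix relies on is my reading of the source, not something the help page states.

```r
test <- as.data.frame(list(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3)))
test1 <- test
rownames(test1) <- as.character(1:3)

.row_names_info(test)              # -3: automatic
.row_names_info(test1)             #  3: not automatic
.row_names_info(test[c(1, 3), ])   #  2: not automatic
```

A negative value here lines up with the cases above where as.matrix returns NULL row names by default.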

markus