duplicate 'row.names' are not allowed - R

Question

I am new to R and trying to run a differential gene expression analysis. I want to store the gene names as row names so that I can create a DGEList object.

asthma <- read.csv("Asthma_3 groups-Our study gene expression.csv")
head(asthma, 10)
dim(asthma)

asthma <- na.omit(asthma)
distinct(asthma)

countdata <- asthma[,-1]

head(countdata)
rownames(countdata) <- asthma[,1]
I am getting this error:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed

As the error mentioned, `data.frame` won't allow duplicate row names. You may need to convert to `matrix` or add as a column — akrun, Jun 15 '21 at 18:59
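A rough sketch of what the comment suggests, using a small made-up data frame in place of the real `asthma` data. The names `toy`, `gene`, `s1` and `s2` are hypothetical. Unlike a data.frame, a matrix accepts repeated row names.

```r
# Toy stand-in for the real data; all names here are hypothetical.
toy <- data.frame(gene = c("A", "A", "B"),
                  s1 = c(1, 2, 3),
                  s2 = c(4, 5, 6))

# Suggestion 1: convert the count columns to a matrix,
# which permits duplicate row names.
counts <- as.matrix(toy[, -1])
rownames(counts) <- toy$gene
counts

# Suggestion 2: leave the gene names in an ordinary column
# and keep the default row names 1, 2, 3.
annotated <- toy
annotated
```

Note that indexing a matrix by a repeated name, such as `counts["A", ]`, returns only the first matching row, so duplicates are still worth resolving at some point.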

score 0 · Answer 1 · answered Jun 15 '21 at 19:25

The first column in asthma likely contains duplicate values. Two options I can think of:

Can the first column be combined with another column to produce a new column of unique values that can serve as the row names?
If not, you can probably use make.names().
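Before choosing between them, it may help to confirm which values are actually repeated. The sketch below uses base R's duplicated() on a made-up data frame, then illustrates the first option by pasting the first column together with a second identifier. The names `toy`, `gene`, `probe` and `count` are hypothetical, and whether the real data has a suitable second identifier column is an assumption.

```r
# Toy data; all column names are hypothetical.
toy <- data.frame(gene  = c("A", "A", "B"),
                  probe = c("p1", "p2", "p1"),
                  count = c(1, 2, 3))

# Values of the first column that occur more than once
dup_values <- unique(toy$gene[duplicated(toy$gene)])
dup_values

# First option: combine two columns into a key that is unique per row
key <- paste(toy$gene, toy$probe, sep = "_")
stopifnot(anyDuplicated(key) == 0)  # stop early if the key still repeats
rownames(toy) <- key
toy
```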

Here is a reproducible example.

df = data.frame(col1 = c('A', 'A', 'B'), col2 = c(1, 2, 3))
df

That defines a data.frame that looks like this

  col1 col2
1    A    1
2    A    2
3    B    3

By default the data.frame has row names 1, 2, 3. If you try this:

rownames(df) = df[,1]

you get an error because df[,1] contains 'A' twice, so it can't be used as row names without modification. You can use make.names() to create row names with unique values like this:

unique.col1 = make.names(df[,1], unique = TRUE)
unique.col1

This results in

"A"   "A.1" "B"

Note that the .1 was added to the second A to make it different from the first A. Then define the rownames as unique.col1:

rownames(df) = unique.col1
df

The data.frame df now looks like this

    col1 col2
A      A    1
A.1    A    2
B      B    3
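One caveat worth checking: make.names() also rewrites values that are not syntactically valid R names, which can alter some gene identifiers. Base R's make.unique() only appends a suffix to repeated values. A small comparison, using made-up identifiers (`ids` is hypothetical):

```r
# Made-up identifiers; "1-Mar" is not a syntactically valid R name.
ids <- c("1-Mar", "A", "A")

# make.names() makes names valid as well as unique
valid_ids <- make.names(ids, unique = TRUE)
valid_ids   # "X1.Mar" "A" "A.1"

# make.unique() leaves the text alone and only adds suffixes to repeats
unique_ids <- make.unique(ids)
unique_ids  # "1-Mar" "A" "A.1"
```

Either result can be assigned with rownames(); which is preferable depends on whether the original identifiers need to stay recognizable.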
