
I am new to R and trying to run a differential gene expression analysis. I want to store the gene names as row names so that I can create a DGEList object.

asthma <- read.csv("Asthma_3 groups-Our study gene expression.csv")
head(asthma, 10)
dim(asthma)

asthma <- na.omit(asthma)
distinct(asthma)

countdata <- asthma[,-1]

head(countdata)
rownames(countdata) <- asthma[,1]
I am getting this error:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed

aynber
arn
    As the error mentioned, `data.frame` won't allow duplicate row names. You may need to convert to `matrix` or add as a column – akrun Jun 15 '21 at 18:59
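As a rough illustration of the comment above: unlike a data.frame, a matrix accepts duplicate row names. This is a minimal sketch with made-up values; the column names `gene`, `s1`, and `s2` are hypothetical and not taken from the asker's file. The last step, summing the counts of duplicated genes with `rowsum()`, is only one possible way to handle duplicates and is an assumption here, not something the thread recommends.

```r
# Hypothetical toy data (not the asker's file)
df <- data.frame(gene = c("A", "A", "B"),
                 s1 = c(1, 2, 3),
                 s2 = c(4, 5, 6),
                 stringsAsFactors = FALSE)

# A matrix, unlike a data.frame, may carry duplicate row names
counts <- as.matrix(df[, -1])
rownames(counts) <- df$gene
counts

# One possible alternative: collapse duplicates by summing counts per gene
summed <- rowsum(counts, group = rownames(counts))
summed
```

Whether a downstream tool accepts duplicate row names, and whether summing is biologically appropriate, should be checked for the data at hand.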

1 Answer


The first column in asthma likely contains duplicate values. I can think of two options:

  1. Can the first column be combined with another column to generate a new column with unique values that can be used as the rownames?
  2. If not, you can use make.names() with unique = TRUE to generate unique names.
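Before choosing between the two, it may help to confirm which values are actually duplicated. The following is a minimal sketch on a small made-up data frame; the column names `gene`, `probe`, and `count` are hypothetical and only illustrate the idea behind option 1.

```r
# Hypothetical toy data; column names are illustrative only
df <- data.frame(gene  = c("A", "A", "B"),
                 probe = c("p1", "p2", "p3"),
                 count = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Which values in the first column occur more than once?
dup_values <- unique(df$gene[duplicated(df$gene)])
dup_values

# Option 1: combine two columns into a key and check that it is unique
key <- paste(df$gene, df$probe, sep = "_")
key_is_unique <- anyDuplicated(key) == 0
key_is_unique

# Only assign the key as row names if it really is unique
if (key_is_unique) rownames(df) <- key
```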

Here is a reproducible example.

df = data.frame(col1 = c('A', 'A', 'B'), col2 = c(1, 2, 3))
df

That defines a data.frame that looks like this

  col1 col2
1    A    1
2    A    2
3    B    3

The data.frame by default has rownames 1, 2, 3. If you try this

rownames(df) = df[,1] 

you get an error because df[,1] contains 'A' twice, so it cannot be used as row names without modification. You can use make.names() to create unique row names like this

unique.col1 = make.names(df[,1], unique = TRUE)
unique.col1 

This results in

[1] "A"   "A.1" "B"

Note that the .1 was added to the second A to make it different from the first A. Then define the rownames as unique.col1:

rownames(df) = unique.col1
df

The data.frame df now looks like this

    col1 col2
A      A    1
A.1    A    2
B      B    3
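One caveat worth noting: make.names() also rewrites characters that are not valid in syntactic R names, which can alter gene symbols. The base function make.unique() only appends suffixes to duplicates. A small sketch, using the gene symbol "HLA-A" purely as a hypothetical example:

```r
# Hypothetical gene symbols; "HLA-A" is chosen only because it contains "-"
genes <- c("HLA-A", "HLA-A", "B")

# make.names() replaces "-" with "." in addition to making the names unique
via_make_names <- make.names(genes, unique = TRUE)
via_make_names

# make.unique() leaves the original characters untouched
via_make_unique <- make.unique(genes)
via_make_unique
```

If the original symbols need to be matched against annotation later, make.unique() may be the safer choice.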
bmacGTPM