
I am trying to add the gene names onto a heat map.

To do this I have had to use biomaRt to match the gene names to the Ensembl IDs. Then I wanted to make the gene names the row names of the file (replacing the Ensembl IDs) so that when I plotted the heatmap they would be the row labels.

First I did the biomaRt matching, then I merged the output of that with my data file, then I removed the original Ensembl ID column during the gene ID merge.

So now my first column is a list of genes, and the subsequent columns are counts.

I would like to make this first column the row names for the data set.

Judging from the errors I am getting, I think that i) my data is not in the correct format (both because there are characters and numbers in the table, and because I can't simply move column 1 into the row names), and ii) the incorrect format is preventing the heatmap from being plotted.

I have seen and tried the solutions presented elsewhere to get column 1 into row names.

Where am I going wrong?

THIS SETS UP THE BIOMART GENE ID / ENSEMBL ID SEARCH

library(biomaRt)
marts <- listMarts()
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)
ensembl=useDataset("sscrofa_gene_ensembl",mart=ensembl)
attributes <- listAttributes(ensembl)
my_ids <- as.data.frame(rownames(mat.z))

THIS CREATES A FILE WITH THE GENES AND ENSEMBL IDS TOGETHER


results_end_1 <- getBM(attributes = c("ensembl_gene_id","external_gene_name"), values = my_ids, mart = ensembl )
View(results_end_1)
merged_with_my_ids <- merge(my_ids, results_end_1, by.x = "rownames(mat.z)", by.y = "ensembl_gene_id")

View(merged_with_my_ids)
merged_with_my_ids <- as.data.frame(merged_with_my_ids)
merged_with_my_ids

HERE I READ A FILE 'mat.z' THAT I PREVIOUSLY SAVED AS A CSV, SO THAT I COULD GIVE THE FIRST COLUMN A COLUMN TITLE AND UNDERTAKE THE MERGE BELOW (IT PREVIOUSLY HAD NO COLUMN NAME, SO I COULDN'T REFER TO THE COLUMN TO MERGE IT)

mat.z<-read.delim("matz.csv", header = TRUE, sep = ",")
mat.z

HERE IS THE MERGE

mat.z <- merge(mat.z, merged_with_my_ids, by.x = "ensembl_gene_id", by.y = "rownames(mat.z)")
mat.z

HERE I CHANGE THE DATA IN THE FIRST COLUMN WITH GENE NAMES FROM ENSEMBL IDS

mat.z$ensembl_gene_id <- ifelse(is.na(mat.z$external_gene_name), mat.z$ensembl_gene_id, mat.z$external_gene_name)
mat.z$external_gene_name <- NULL
mat.z

HERE I WROTE IT AS A CSV

write.csv(mat.z, "~/Documents/DPhil/In Vivo Data/Pig/matszo.csv", row.names = TRUE)

NOW I AM TRYING TO MAKE COLUMN 1 THE ROW TITLES

mat.z<-read.delim("matszo.csv", header = TRUE,  sep = ",")
mat.z <- mat.z[,-1,drop=F]

THIS IS THE ERROR I GET:

Warning: non-unique value when setting 'row.names': ''
Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed

I also tried

rownames(mat.z) <- mat.z$X

and

library(tidyverse)
mat.z %>%
  remove_rownames() %>%
  column_to_rownames(var = 'X')

Finally, when I tried to just use the mat.z file to plot the heatmap I get the following error from the code:

heatmap(mat.z,cluster_rows = T, cluster_columns = T, name = "z-score",column_labels = colnames(mat.z), col = pal, legend = TRUE, annotation_col = coldata, cutree_rows = 2, main = "Heatmap of DEGS Normalized Counts in Pig Samples") 
Error in heatmap(mat.z, cluster_rows = T, cluster_columns = T, name = "z-score",  : 
  'x' must be a numeric matrix

Help would be greatly appreciated!

1 Answer


The error comes from assigning the row names directly on the data.frame (read.delim/read.csv return a data.frame object) using the first column, which may contain duplicate elements; a data.frame needs unique row names. This is documented in ?row.names:

All data frames have row names, a character vector of length the number of rows with no duplicates nor missing values.
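To see the difference in isolation, here is a minimal sketch on toy data. The gene names, sample columns (s1, s2) and values are made up for illustration and are not from the question's file. It shows a data.frame rejecting duplicated row names while a matrix built from the same numeric columns accepts them.

```r
# Toy table with a repeated gene name (hypothetical values)
df <- data.frame(gene = c("ACTB", "GAPDH", "ACTB"),
                 s1   = c(1.2, -0.4, 0.3),
                 s2   = c(0.5, 0.1, -1.1),
                 stringsAsFactors = FALSE)

# Which names occur more than once?
dups <- unique(df$gene[duplicated(df$gene)])

# Assigning duplicated names as data.frame row names fails;
# capture the error message instead of stopping the script
res <- tryCatch({
  rownames(df) <- df$gene
  "ok"
}, error = function(e) conditionMessage(e))

# A matrix has no uniqueness requirement on its row names
m <- as.matrix(df[, -1])
rownames(m) <- df$gene
```

Here res holds the error message rather than "ok", while m is a numeric matrix whose row names contain "ACTB" twice.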

An option would be to convert to a matrix and assign the row names on it, as a matrix can have duplicate row names:

mat.z<-read.delim("matszo.csv", header = TRUE,  sep = ",")
mat1 <- as.matrix(mat.z[, -1, drop = FALSE])
row.names(mat1) <- mat.z[[1]]
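
If unique labels are preferred, a possible variation is sketched below. The warning in the question reports '' as the non-unique value, which suggests (an assumption, not verified against the actual file) that some genes came back with an empty name rather than NA, so the is.na() check would not catch them. The toy table, its IDs and the sample columns s1 and s2 are hypothetical stand-ins for the merged data.

```r
# Toy stand-in for the merged table (hypothetical IDs and values)
mat.z <- data.frame(
  ensembl_gene_id    = c("ENSSSCG01", "ENSSSCG02", "ENSSSCG03", "ENSSSCG04"),
  external_gene_name = c("ACTB", "", "", "ACTB"),
  s1 = c(1.2, -0.4, 0.3, 0.8),
  s2 = c(0.5, 0.1, -1.1, -0.2),
  stringsAsFactors = FALSE
)

# Fall back to the Ensembl ID when the gene name is NA or empty
nm <- mat.z$external_gene_name
no_name <- is.na(nm) | nm == ""
nm[no_name] <- mat.z$ensembl_gene_id[no_name]

# Disambiguate any remaining repeats by appending .1, .2, ...
nm <- make.unique(nm)

# Keep only the numeric columns so the result is a numeric matrix
mat1 <- as.matrix(mat.z[, c("s1", "s2")])
rownames(mat1) <- nm

# Check the input before handing it to a heatmap function
stopifnot(is.numeric(mat1))
```

With the row names set this way, mat1 contains no character column, which is what the "'x' must be a numeric matrix" error from heatmap() was complaining about.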
akrun