I am trying to add gene names to a heatmap.
To do this, I used biomaRt to match the gene names to the Ensembl IDs. I then wanted to make the gene names the row names of the table (replacing the Ensembl IDs), so that they would appear as the row labels when I plotted the heatmap.
First I did the biomaRt matching, then I merged its output with my data file, and then I removed the original Ensembl ID column during the gene ID merge.
So now my first column is a list of genes, and the subsequent columns are counts.
I would like to make this first column the row names for the data set.
Judging from the errors I am getting, I think that (i) my data is not in the correct format, both because the table contains characters as well as numbers and because I can't simply move column 1 into the row names, and (ii) this incorrect format is preventing the heatmap from being plotted.
I have seen and tried the solutions presented elsewhere for moving column 1 into the row names.
Where am I going wrong?
This sets up the biomaRt search for gene names / Ensembl IDs:

```r
library(biomaRt)
marts <- listMarts()
ensembl <- useMart("ensembl")
datasets <- listDatasets(ensembl)
ensembl <- useDataset("sscrofa_gene_ensembl", mart = ensembl)
attributes <- listAttributes(ensembl)
my_ids <- as.data.frame(rownames(mat.z))
```
This creates a table with the gene names and Ensembl IDs together:

```r
# filters tells getBM() which attribute the supplied values are
results_end_1 <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                       filters = "ensembl_gene_id", values = my_ids[[1]],
                       mart = ensembl)
View(results_end_1)
merged_with_my_ids <- merge(my_ids, results_end_1,
                            by.x = "rownames(mat.z)", by.y = "ensembl_gene_id")
View(merged_with_my_ids)
merged_with_my_ids <- as.data.frame(merged_with_my_ids)
merged_with_my_ids
```
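A note on the matching step: `merge()` performs an inner join by default (`all = FALSE`), so any Ensembl ID with no row in the biomaRt output is silently dropped, and the rows come back sorted by the key column. A minimal sketch of an order-preserving lookup with base R's `match()`, assuming a table shaped like the `getBM()` output; the IDs and names below are made-up placeholders, not real query results:

```r
# Hypothetical stand-in for getBM() output, including one empty name.
lookup <- data.frame(
  ensembl_gene_id    = c("id_A", "id_B", "id_C"),
  external_gene_name = c("geneA", "", "geneC"),
  stringsAsFactors   = FALSE
)

# Hypothetical row names of the data; "id_D" has no row in the lookup.
ids <- c("id_C", "id_A", "id_D", "id_B")

# match() gives each ID's row position in the lookup (NA if absent),
# so the result has the same length and order as ids.
gene_names <- lookup$external_gene_name[match(ids, lookup$ensembl_gene_id)]
gene_names
```

Unmatched IDs come back as `NA` and IDs without a name come back as `""`, so both cases still need a fallback before the labels are used as row names.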
Here I read in `mat.z`, which I had previously saved as a CSV so that I could give its first column a title. It previously had no column name, so I couldn't refer to that column in the merge below:

```r
mat.z <- read.delim("matz.csv", header = TRUE, sep = ",")
mat.z
```
Here is the merge:

```r
mat.z <- merge(mat.z, merged_with_my_ids,
               by.x = "ensembl_gene_id", by.y = "rownames(mat.z)")
mat.z
```
Here I replace the Ensembl IDs in the first column with gene names:

```r
# Keep the Ensembl ID wherever the gene name is NA
mat.z$ensembl_gene_id <- ifelse(is.na(mat.z$external_gene_name),
                                mat.z$ensembl_gene_id,
                                mat.z$external_gene_name)
mat.z$external_gene_name <- NULL
mat.z
```
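The `''` in the warning shown further down suggests that some genes came back from biomaRt with an empty string rather than `NA` as their name, which `is.na()` does not catch; several genes can also share one name. A minimal sketch of a fallback that handles both cases, using hypothetical vectors in place of the merged columns:

```r
# Hypothetical IDs and the names returned for them.
ids <- c("id_A", "id_B", "id_C", "id_D")
names_from_mart <- c("geneA", "", NA, "geneA")

# Treat both NA and "" as missing and keep the Ensembl ID in those cases.
missing_name <- is.na(names_from_mart) | names_from_mart == ""
labels <- ifelse(missing_name, ids, names_from_mart)

# make.unique() appends ".1", ".2", ... to repeated values so every
# label can serve as a row name.
labels <- make.unique(labels)
labels
```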
Here I wrote it to a CSV:

```r
write.csv(mat.z, "~/Documents/DPhil/In Vivo Data/Pig/matszo.csv", row.names = TRUE)
```
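On where the `X` column comes from: with `row.names = TRUE`, `write.csv()` writes the data frame's row names (after a `merge()` these are just `1`, `2`, ...) as an extra first column with an empty header, and `read.csv()` names that column `X` when the file is read back. A small sketch using a temporary file and a hypothetical two-gene table:

```r
# Hypothetical table with gene labels in the first column.
df <- data.frame(gene = c("geneA", "geneB"), s1 = c(0.5, -1.2), s2 = c(1.1, 0.3))
path <- tempfile(fileext = ".csv")

# row.names = TRUE writes the default row names 1, 2 as an unnamed column...
write.csv(df, path, row.names = TRUE)
back <- read.csv(path)
names(back)   # "X" "gene" "s1" "s2"

# ...whereas row.names = FALSE writes only the real columns.
write.csv(df, path, row.names = FALSE)
back2 <- read.csv(path)
names(back2)  # "gene" "s1" "s2"
```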
Now I am trying to make column 1 the row names:

```r
mat.z <- read.delim("matszo.csv", header = TRUE, sep = ",")
mat.z <- mat.z[, -1, drop = FALSE]
```
This is the error I get:

```
Warning: non-unique value when setting 'row.names': ''
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
```
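A minimal base-R sketch of moving the first column into the row names, assuming that column holds the gene labels and that they are already unique (the table is a hypothetical example). The duplicate check matters because a data frame cannot have repeated row names, which is what the error above reports. When the labels are the first column of a CSV file, `read.csv(path, row.names = 1)` does the same while reading.

```r
# Hypothetical data frame: gene labels in column 1, z-scores after it.
mat.z <- data.frame(gene = c("geneA", "geneB", "geneC"),
                    s1 = c(0.5, -1.2, 0.7),
                    s2 = c(1.1, 0.3, -1.4))

# Row names must be unique; stop early if they are not.
stopifnot(anyDuplicated(mat.z[[1]]) == 0)

# Copy column 1 into the row names, then drop it so only numbers remain.
rownames(mat.z) <- mat.z[[1]]
mat.z <- mat.z[, -1, drop = FALSE]
```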
I also tried

```r
rownames(mat.z) <- mat.z$X
```

and

```r
library(tidyverse)
mat.z %>%
  remove_rownames() %>%
  column_to_rownames(var = 'X')
```
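For the tidyverse attempt, a likely issue is that `column_to_rownames()` returns a new data frame rather than changing `mat.z` in place, so its result has to be assigned back; each `%>%` also has to end a line rather than start the next one, and the labels still have to be unique. A minimal sketch with the tibble and magrittr packages (both installed with tidyverse), using a hypothetical table whose label column is named `X`:

```r
library(magrittr)  # provides %>%
library(tibble)    # provides remove_rownames() and column_to_rownames()

# Hypothetical data frame whose label column is called X.
mat.z <- data.frame(X = c("geneA", "geneB"), s1 = c(0.5, -1.2), s2 = c(1.1, 0.3))

# Assign the result; without "mat.z <-" the pipeline's output is only printed.
mat.z <- mat.z %>%
  remove_rownames() %>%
  column_to_rownames(var = "X")
```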
Finally, when I tried to plot the heatmap directly from `mat.z`, this code:

```r
heatmap(mat.z, cluster_rows = T, cluster_columns = T, name = "z-score",
        column_labels = colnames(mat.z), col = pal, legend = TRUE,
        annotation_col = coldata, cutree_rows = 2,
        main = "Heatmap of DEGS Normalized Counts in Pig Samples")
```

gives the following error:

```
Error in heatmap(mat.z, cluster_rows = T, cluster_columns = T, name = "z-score",  :
  'x' must be a numeric matrix
```
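That error comes from `stats::heatmap()`, which only accepts a numeric matrix; a data frame fails the check even when all of its columns are numeric, and a character label column turns `as.matrix()` output into a character matrix. Several arguments in the call (`cluster_rows`, `cluster_columns`, `name`, `column_labels`, `annotation_col`, `cutree_rows`) are not arguments of `stats::heatmap()`; they look like a mix of `ComplexHeatmap::Heatmap()` and `pheatmap::pheatmap()` arguments. A minimal base-R sketch, assuming the row names already hold unique gene labels and every remaining column is numeric (the values are hypothetical):

```r
# Hypothetical z-score table with unique gene labels as row names.
mat.z <- data.frame(s1 = c(0.5, -1.2, 0.7, 0.1),
                    s2 = c(1.1, 0.3, -1.4, -0.2),
                    s3 = c(-0.4, 0.9, 0.6, 1.3),
                    row.names = c("geneA", "geneB", "geneC", "geneD"))

# heatmap() needs a numeric matrix; as.matrix() keeps the row names.
m <- as.matrix(mat.z)
stopifnot(is.numeric(m))

# scale = "none" because the values are already z-scores; rows are
# labelled with rownames(m) by default.
heatmap(m, scale = "none", main = "Heatmap of DEGS Normalized Counts in Pig Samples")
```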
Help would be greatly appreciated!