I currently have a data frame of frequencies and would like to make a heatmap. The heatmap that I have is not detailed enough, and I am unsure how to cluster the data by rows and by columns. What else should I add to my code?

Here is some sample data to give a concrete example: the first 6 rows of my .txt file:

df <- matrix(c("cdr3aa", "Ana-Wk0-CD4", "Ana-Wk27-CD4", "Ana-Wk0-CD8", "Ana-Wk27-CD8", "nonshared", "shared", 
 "CASSLSPSTGNYGYTF", 1.12E-05, 0, 0.067313861, 0.049220624, 0.188215557, 0, 
 "CASCSRTSTDTQYF", 0, 3.19E-06, 0.065083991, 0.034558507, 0.105607859, 0, 
 "CASGGQG_GNQPQHF", 4.61E-04, 2.98E-04, 0, 0, 0.079950136, 0, 
 "CASRLRTGYDTEAFF", 7.50E-06, 6.38E-06, 0.057943843, 0.047042366, 0.073757427, 0, 
 "CAISEWDRAGELFF", 3.75E-06, 1.12E-05, 0.069951496, 0.060104224, 0.060978181, 0, 
 "CASSFPIREESSYNEQFF", 0, 0, 0.018718651, 0.015058722, 0.023803322, 0), ncol = 7, byrow = TRUE)

This is what I currently have so far:

#Input of data
df <- read.table("TESTING.txt", header=TRUE, comment.char="", sep = "\t")
#Removing the first column (the CDR3 sequences)
result <-df[-1]    
#Setting the rownames as the CDR3 sequences    
row.names(result) <- df$cdr3aa  

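#Packages needed for %>%, rownames_to_column(), gather() and ggplot()
library(tibble)
library(tidyr)
library(ggplot2)

#Making the HeatMap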
dt2 <- result %>%
   rownames_to_column() %>%
   gather(colname, value, -rowname)
head(dt2)

ggplot(dt2, aes(x = colname, y = rowname, fill = value)) +
   geom_tile()

This gives me a heatmap like so: [image: the heat map that is given]

Now, I am trying to make my heatmap more like this: [image: a heat map with clustering]

As you can see, this heatmap has the clustering of rows that I am looking for and the color scheme that I would like.

I am also guessing that a proper heatmap is done on a logarithmic scale. The first picture is not like that, and I apologize. I know that I am close to my goal, but I am unsure how to proceed to get to the final result. I would like to keep the format of the first picture; all I would have to do is put "Sample ID" as the x title and "CDR3AA" as the y title. Any help with this would be greatly appreciated. I am still learning R, so I am willing to learn anything there is to learn. Thanks!

Lasarus9
  • It's very difficult to help cluster the fields without knowing what they are or what they represent. Also, while you're using `geom_tile()`, your desired result uses a hierarchical clustering technique. – OTStats Jan 10 '19 at 21:39
  • So the columns are looking at specific samples, and those samples could either be CD4 T-cells, or CD8 receptor cells. They have Wk0 or Wk27, saying the different timepoints. Shared and not shared are looking at another sample and looking at the sequences that are shared/not shared between them. CDR3 is looking at the sequences of Amino Acids. Now, each frequency is telling you how much of that sequence is in each sample. – Lasarus9 Jan 10 '19 at 21:48
  • I find https://github.com/talgalili/heatmaply to be very useful for creating heatmaps with a dendrogram. – Roman Luštrik Jan 10 '19 at 21:48
  • @RomanLuštrik Thank you! I am still stuck on how to proceed, and I just can't see the next step to take. R is new to me, so I am still trying to wrap my head around this – Lasarus9 Jan 10 '19 at 22:01
  • If anyone knows how to change this into a logarithmic scale and format it so that it looks the way I'd like, that would be great. I tried it again, and this time I got the colors the way I want, but I still cannot figure out how to cluster correctly and how to make the chart with the correct color ranges (with the mid being white, reddish/bluish at either end of the spectrum, and grey being NULL). Also, how would I go about making it into a log scale? – Lasarus9 Jan 10 '19 at 23:31
  • Of potential interest: https://github.com/raivokolde/pheatmap – Artem Sokolov Jan 11 '19 at 16:39
  • @ArtemSokolov Thank you. This is something close to what I need. I am still unsure of how to change everything into a log scale, and add grey for the NULL, for those below a certain cutoff. – Lasarus9 Jan 11 '19 at 18:17
  • Suggested duplicate: [Is there a built in way to do a logarithmic color scale in ggplot?](https://stackoverflow.com/q/8069837/903061). I think you just need to add `scale_fill_gradient(trans = "log")` to your plot. Maybe set the values of anything you want grey to `NA`. – Gregor Thomas Jan 16 '19 at 16:47

1 Answer

> I am also guessing that a proper heatmap is done on a logarithmic scale.

You can make a heatmap with log-transformed data, but you can also use other transformations, as long as the values are normalized.

You can do a log-transformation for a dataframe column like this:

df$log_col <- log(df$col)

or like this:

df$log10_col <- log10(df$col)
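
One caveat with the sample data in the question: several frequencies are exactly 0, and `log10(0)` is `-Inf` in R. Below is a minimal sketch, assuming you want zeros to become `NA` (so they can be drawn as grey tiles) rather than `-Inf`. The data frame `freqs` and its column `freq` are hypothetical names used only for illustration.

```r
# Toy frequencies, taken from the first column of the sample data in the question
freqs <- data.frame(freq = c(1.12e-05, 0, 4.61e-04, 7.50e-06, 3.75e-06, 0))

# log10(0) is -Inf, so turn zeros into NA before transforming
freqs$log10_freq <- log10(ifelse(freqs$freq > 0, freqs$freq, NA))

freqs
```

Another common choice is to add a small pseudocount before taking the log; which is more appropriate depends on what a zero means in your data.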

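NB: Your code reads the data as a data frame, while your sample data is built as a matrix. These are different structures, and because the sample matrix mixes text and numbers, all of its values are stored as character strings.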
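
For the row and column clustering asked about in the question, one option that keeps the `geom_tile()` format is to compute the ordering with base R's `dist()` and `hclust()` and then apply it through factor levels. The following is a sketch under assumptions, not a definitive recipe: it rebuilds the sample rows as a numeric matrix, uses the default Euclidean distance and complete linkage, turns zeros into `NA` so they are drawn in grey, and centres the colour scale on the median of the log10 values. The names `mat`, `long`, `row_order` and `col_order` are made up for this example.

```r
library(ggplot2)

# Numeric matrix built from the sample rows in the question
# (sequences as row names, samples as column names)
mat <- matrix(
  c(1.12e-05, 0,        0.067313861, 0.049220624, 0.188215557, 0,
    0,        3.19e-06, 0.065083991, 0.034558507, 0.105607859, 0,
    4.61e-04, 2.98e-04, 0,           0,           0.079950136, 0,
    7.50e-06, 6.38e-06, 0.057943843, 0.047042366, 0.073757427, 0,
    3.75e-06, 1.12e-05, 0.069951496, 0.060104224, 0.060978181, 0,
    0,        0,        0.018718651, 0.015058722, 0.023803322, 0),
  ncol = 6, byrow = TRUE,
  dimnames = list(
    c("CASSLSPSTGNYGYTF", "CASCSRTSTDTQYF", "CASGGQG_GNQPQHF",
      "CASRLRTGYDTEAFF", "CAISEWDRAGELFF", "CASSFPIREESSYNEQFF"),
    c("Ana-Wk0-CD4", "Ana-Wk27-CD4", "Ana-Wk0-CD8", "Ana-Wk27-CD8",
      "nonshared", "shared")
  )
)

# Hierarchical clustering of rows and of columns
# (Euclidean distance and complete linkage are the defaults)
row_order <- hclust(dist(mat))$order
col_order <- hclust(dist(t(mat)))$order

# Long format for ggplot; the factor levels carry the clustered order
long <- as.data.frame(as.table(mat), responseName = "value")
names(long)[1:2] <- c("cdr3aa", "sample")
long$cdr3aa <- factor(long$cdr3aa, levels = rownames(mat)[row_order])
long$sample <- factor(long$sample, levels = colnames(mat)[col_order])

# Zeros become NA (drawn in grey); everything else goes on a log10 scale
long$log10_value <- log10(ifelse(long$value > 0, long$value, NA))

p <- ggplot(long, aes(x = sample, y = cdr3aa, fill = log10_value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = median(long$log10_value, na.rm = TRUE),
                       na.value = "grey") +
  labs(x = "Sample ID", y = "CDR3AA")
p
```

This only reorders the tiles and does not draw the dendrograms. If you want the trees shown next to the heatmap, the pheatmap and heatmaply packages mentioned in the comments are designed for that.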

Oka