
I'm trying to produce a heat map of gene expression from samples of different conditions, faceted by the conditions:

require(reshape2)
set.seed(1)
expression.mat <- matrix(rnorm(100*1000),nrow=100)
df <- reshape2::melt(expression.mat)
colnames(df) <- c("gene","sample","expression")
df$condition <- factor(c(rep("C1",2500),rep("C2",3500),rep("C3",3800),rep("C4",200)),levels=c("C1","C2","C3","C4"))

I'd like to color by expression range:

df$range <- cut(df$expression,breaks=6)

The width parameter in ggplot's aes is supposed to control the width of the different facets. My question is how to find the optimal width value such that the figure is not distorted?

I played around a bit with this plot command:

require(ggplot2)
ggplot(df, aes(x = sample, y = gene, fill = range, width = 100)) +
    facet_grid(~condition, scales = "free") +
    geom_tile(color = NA) +
    labs(x = "condition", y = "gene") +
    theme_bw()

Setting width to be below 100 leaves gaps in the last facet (with the lowest number of samples), and already at this value of 100 you can see that the right column in the first facet from left is distorted (wider than the columns to its left):

heat map with width=100, faceted by condition

So my question is how to fix this/find a width that doesn't cause this.

Paul Rougieux
dan
  • Sorry for mentioning the `width` parameter; it only brought confusion. You didn't need to edit your question. The previous version, with a plot showing white space, was actually a better illustration of the problem. I have updated my answer to illustrate the issue. – Paul Rougieux Dec 15 '16 at 09:31

1 Answer


Edit showing the issue with the sample variable faceted by condition

There is no C1 sample between 25 and 100, because those sample numbers belong to C2, C3 and C4. Here is an illustration for samples below 200.

ggplot(df[df$sample < 200, ],
       aes(x=sample, y = gene, fill=range)) +
    geom_tile() +
    facet_grid(~condition)

plot showing issue for sample below 200

The number of samples is not the same in all facets, and faceting on condition creates holes between the sample numbers within each condition.
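
To make the holes concrete, here is a minimal base R sketch that rebuilds only the gene, sample and condition columns of the question's data and lists which sample numbers fall under each condition. The name `df_check` is made up for this illustration. It assumes the question's 10,000-element condition vector is recycled across the 100,000 rows, which is what the `$<-` assignment in the question appears to do.

```r
# Rebuild gene/sample in the same order as reshape2::melt on a 100 x 1000 matrix
# (gene varies fastest). `df_check` is a hypothetical name for this sketch.
df_check <- expand.grid(gene = 1:100, sample = 1:1000)

# The question's condition vector has 10,000 elements; rep_len() makes the
# recycling to 100,000 rows explicit (assumption: this mirrors `df$condition <- ...`).
cond <- c(rep("C1", 2500), rep("C2", 3500), rep("C3", 3800), rep("C4", 200))
df_check$condition <- factor(rep_len(cond, nrow(df_check)))

# Distinct sample numbers under each condition
samples_by_cond <- lapply(split(df_check$sample, df_check$condition), unique)

lengths(samples_by_cond)          # number of samples per condition
head(samples_by_cond$C1, 30)      # C1 jumps from 25 straight to 101
range(diff(samples_by_cond$C1))   # smallest and largest step between C1 samples
```

The jump in the C1 sample numbers is the empty space that shows up inside the C1 facet when `sample` is used as the x variable.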

One way to work around this problem is to create a sample2 number that counts the samples within each condition. I use the dplyr package for this.

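library(dplyr)
# Number the distinct samples 1..n within each condition
sample2 <- df %>%
    group_by(condition) %>%
    distinct(sample) %>%
    mutate(sample2 = 1:n()) %>%
    ungroup()

# Attach the per-condition sample number to every row of df
df <- df %>%
    left_join(sample2, by = c("condition", "sample"))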
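
If dplyr is not available, the same renumbering can be sketched in base R with `ave()`. This is an alternative to the code above, not the method used in this answer. The data frame `df_base` is a made-up stand-in containing only the gene, sample and condition columns, built under the assumption that the condition vector is recycled as in the question.

```r
# Hypothetical stand-in for the question's data (gene, sample, condition only)
df_base <- expand.grid(gene = 1:100, sample = 1:1000)
cond <- c(rep("C1", 2500), rep("C2", 3500), rep("C3", 3800), rep("C4", 200))
df_base$condition <- factor(rep_len(cond, nrow(df_base)))

# Within each condition, replace each sample id by its rank among the
# distinct sample ids of that condition (1, 2, ..., n)
df_base$sample2 <- ave(df_base$sample, df_base$condition,
                       FUN = function(s) match(s, unique(s)))

# Sanity checks: largest sample2 per condition, and whether 1..n has any gap
tapply(df_base$sample2, df_base$condition, max)
tapply(df_base$sample2, df_base$condition,
       function(s) all(diff(sort(unique(s))) == 1))
```

Because `sample2` runs from 1 to n without gaps in every condition, the tiles sit next to each other in each facet.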

Then plot using sample2 as the x variable

ggplot(df,aes(x = sample2, y = gene,
              fill = range))+
    facet_grid(~condition) + 
    geom_tile(color=NA) + theme_bw()

sample2 plot updated

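Setting the scales argument to "free" lets the x axis vary between facets, so every facet is filled, but the tiles then no longer have the same width in all facets: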

ggplot(df,aes(x = sample2, y = gene,
              fill = range))+
    facet_grid(~condition, scales = "free") + 
    geom_tile() + theme_bw()
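
If the goal is for every facet to be filled while all tiles keep the same width, one option that is not part of the original answer is the `space` argument of `facet_grid()`. Combined with free x scales, it makes each panel's width proportional to the length of its x scale. The sketch below uses a small made-up data frame `toy` (20 genes, unequal sample counts per condition) rather than the question's data, so the names and sizes are assumptions for illustration only.

```r
library(ggplot2)
set.seed(1)

# Hypothetical toy data: 20 genes, a different number of samples per condition,
# with samples already numbered 1..n within each condition (like sample2)
n_samples <- c(C1 = 25, C2 = 35, C3 = 38, C4 = 2)
toy <- do.call(rbind, lapply(names(n_samples), function(cond) {
  expand.grid(gene = 1:20,
              sample2 = seq_len(n_samples[[cond]]),
              condition = cond,
              stringsAsFactors = FALSE)
}))
toy$condition <- factor(toy$condition, levels = names(n_samples))
toy$range <- cut(rnorm(nrow(toy)), breaks = 6)

# Free x scales fill each facet; space = "free_x" sizes each panel in
# proportion to its x range, so tiles have the same width across facets
p <- ggplot(toy, aes(x = sample2, y = gene, fill = range)) +
  geom_tile(color = NA) +
  facet_grid(~condition, scales = "free_x", space = "free_x") +
  theme_bw()
p
```

The trade-off is that facets with few samples, such as C4 here, become very narrow.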

Old answer using width

See for example this answer.

Adding a width aesthetic produces wider columns:

ggplot(df,aes(x = sample, y = gene,
              fill = range, width = 50))+
    facet_grid(~condition) + 
    geom_tile(color=NA) + 
    labs(x="condition",y="gene")+theme_bw()
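
A possible explanation for why a large fixed width makes some columns look wider, worked out as plain arithmetic: each tile spans x - width/2 to x + width/2, so with width = 100 neighbouring tiles overlap heavily. The sketch assumes tiles are painted in increasing x order, with later tiles covering earlier ones; this is an assumption about drawing order, not something stated in the answer. `tile_extent` is a helper made up for this illustration.

```r
# Hypothetical helper: horizontal extent of a tile centred at x
tile_extent <- function(x, width) {
  data.frame(x = x, xmin = x - width / 2, xmax = x + width / 2)
}

# First two blocks of C1 sample numbers in the question's data, with width = 100
ext <- tile_extent(x = c(1:25, 101:125), width = 100)

# Assumption: the next tile is painted on top, so only the part of a tile
# to the left of the next tile's xmin stays visible
next_xmin <- c(ext$xmin[-1], Inf)
ext$visible <- pmin(ext$xmax, next_xmin) - ext$xmin

ext[c(1, 25, 50), ]
```

Under that assumption most tiles show only a 1-unit sliver, the tile just before a gap shows 76 units, and the last tile shows its full 100 units, which would match the wider rightmost column described in the question.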
Paul Rougieux
  • Thanks. I'm still experiencing problems - see my edited post – dan Dec 14 '16 at 02:39
  • The width parameter is not the correct way to do it. See what happens when we look only at the first 200 samples for condition C1: `ggplot(df[df$condition =="C1" & df$sample<200,],aes(x = sample, y = gene, fill = range, width = 10))+ facet_grid(~condition) + geom_tile(color=NA) + labs(x="condition",y="gene")+theme_bw()` – Paul Rougieux Dec 15 '16 at 08:32
  • Thanks for the clarification Paul Rougieux. But it seems that this solution is still leaving gaps in the facets, although they're aligned to the right now. Is there a way to fill the entire area of each facet, but where all tiles have the same dimensions? With the width=100 parameter, as I mention in my question, the rightmost column of tiles in the leftmost facet are much wider than all other tile columns to its left. – dan Dec 15 '16 at 16:34
  • All tiles do have the same dimensions. That's more a data problem than a representation problem, because you have a different number of samples under each condition. C3 only has 100 samples while C4 has 300 samples. This representation shows you an area proportional to the number of samples in each facet. It is possible to increase the size of tiles in C3 using the scale argument: `facet_grid(~condition, scale="free")`. But then tiles will not have the same size; they will be wider in C3 than in C4. – Paul Rougieux Dec 16 '16 at 12:34
  • Yes, scale="free" is what I've posted, but it seems to produce a distortion. If you look at my figure, in the first facet from the left you'll see that the rightmost column is much wider than all columns to its left. So it's not really stretching them uniformly. My question is whether that is fixable. – dan Dec 20 '16 at 17:26