I'm trying to create a bar graph of log-transformed gene expression data, but I can't change the point at which the x and y axes intersect. I can do this in base R, but this is part of a wider Shiny app and I was hoping for consistency between plots. Here's a minimal example:

df <- structure(list(Genes = 1:96, foldExpression = c(0.0004815809133408, 
0.000186554083405159, 0.000142020245166264, 9.91580733203794e-05, 
1.50586731037356e-06, 0.00481620829848941, 7.00667826311261e-06, 
0.000604779718674668, 0.163676740192247, 3.62788104054902, 2.44658474904461e-05, 
0.210999773018998, 1.32242024147151e-05, 4.45630431051514e-05, 
0.0125863492990385, 2.49268008840451e-06, 1.50586731037356e-06, 
3.02807684233181e-06, 3.50705461319705e-06, 0.000105632409818916, 
1.19134216162706e-05, 1.29449671876212e-05, 9.74212165299063e-06, 
3.63184214547378e-05, 3.26242458235749e-06, 0.000181524513078334, 
34.0598396752968, 0.000640482203907824, 0.146425597192933, 0.0211590493748083, 
0.00269592092938944, 8.12671770229108e-06, 0.191107177896105, 
0.0330151237231742, 0.000562693414872207, 0.0343156883059605, 
0.138685187592148, 1.50414231710658e-06, 0.000906002611814846, 
1.50586731037356e-06, 0.00070179892741303, 1.64817643465276e-05, 
0.0204972580462547, 0.99689996775548, 5.73398819576538e-06, 0.00172777432122405, 
0.000320413861481815, 1.55703647256429e-05, 4.93694330997287e-05, 
0.00048347227858637, 16.2495902202441, 0.000117459820797509, 
0.46855520182516, 0.187167487209099, 0.0126360264277908, 0.547826527450132, 
0.00010186283275126, 0.107099888658231, 0.928988877390118, 0.0255891506765197, 
14.5878478488994, 0.0351486534651654, 1.50586731037356e-06, 0.12898863274561, 
1.07807490617892e-06, 0.0866906132169558, 0.0846439755023068, 
1.50586731037356e-06, 0.0219408065875584, 0.000633712919015739, 
1.34286645334472, 0.000781651675786721, 0.00111374027766133, 
1.68108995564353, 0.000296064304023605, 0.00553646319822078, 
0.00406191327455512, 0.00597091227977721, 0.000671556330036686, 
0.000494591352109571, 5.85672221179354e-05, 0.0011496215564835, 
0.00382596985799774, 0.35229799361084, 1.04529619268133e-06, 
0.33618336876906, 0.369380625212315, 0.357075757608649, 1.38899057059804, 
1.43890812412937, 0.839895749200342, 0.0188847113666784, 18.005148013488, 
1.3403885084613, 2.74237272274652, 1.38899057059804)), .Names = c("Genes", 
"foldExpression"), row.names = c(NA, -96L), class = "data.frame")

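library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format() and math_format()

ggplot(df, aes(Genes, foldExpression)) +
  geom_bar(stat = "identity") +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))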

Many of the bars effectively 'look' negative because the x-axis crosses the y-axis at 1. Ideally I want to reproduce what a collaborator made in Excel, where the y-axis crosses the x-axis at 0.001, and values lower than this are not plotted (or are plotted as 0.001).

Is there an easy way to do this using ggplot2? I've been looking at scale_y_continuous() and expand_limits(), but while I can change the scale, I'm unable to change the intercept. The best I can come up with is forcing it with the geom_crossbar function to get mostly the plot I'm looking for:

# Clamp values below the desired baseline to 0.001
df$foldExpression[df$foldExpression < 0.001] <- 0.001

ggplot(df, aes(Genes, foldExpression)) +
  geom_crossbar(aes(ymin = 0.001, ymax = foldExpression)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))

But this looks a bit weird, as the 'middle' attribute is visible at the top of the bar, and it's hollow so I don't think I can 'fill' the boxes.

Any help would be greatly appreciated!

  • You can always draw the bars yourself with `geom_rect`. However, the reason this is hard to do is because bars are technically defined as reaching to 0 (which becomes 10 ^ 0 for technical reasons in your case), and therefore they aren't really defined on a log scale. I'd advise plotting points instead, optionally with an added `geom_hline` for a value that has some significance. (That being said, doesn't 1 make sense as a baseline for fold expression?) – Axeman Apr 03 '17 at 20:49
  • Thanks for the reply @Axeman (and for adding the plots)! I'll take a look at `geom_rect`. The values are actually relative expression levels which is why I wasn't wanting 1 as a baseline; this is part of a wider Shiny app, and while it mostly displays fold change, I was wanting the same plot function to show relative expression if no control samples were included. I was hoping for a quick "`+fix_everything()`" ggplot solution that I could toggle depending on user input, but it looks like I'll have to write a separate function for this scenario. – RareAquaticBadger Apr 03 '17 at 21:14
  • From `geom_col`: _"A bar chart uses height to represent a value, and so the base of the bar must always be shown to produce a valid visual comparison. Naomi Robbins has a nice [article on this topic](http://www.b-eye-network.com/view/index.php?cid=2468). This is why it doesn't make sense to use a log-scaled y axis with a bar chart."_ Your life will be easier with points, or lollipops (points with line segments). – Axeman Apr 04 '17 at 08:23
  • Multiply by 1000 and then relabel the y axis. – IRTFM Apr 04 '17 at 18:31
  • Awesome, thanks @42. I don't know why I didn't think of that. Works great! – RareAquaticBadger Apr 04 '17 at 21:16
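
The rescaling suggested in the last two comments might look roughly like the sketch below. It is not the asker's actual code: the small data frame is a hypothetical stand-in for `df`, and the names `baseline` and `scaled` are made up for illustration. The idea is that bars on a log scale start at 10^0, so dividing by the baseline (equivalent to multiplying by 1000 when the baseline is 0.001) moves the bar base to the desired value, and the labels are then shifted back to the original units.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for the question's `df` (a few values only)
df <- data.frame(
  Genes = 1:6,
  foldExpression = c(1.5e-06, 4.8e-04, 0.0126, 0.9969, 3.63, 34.06)
)

baseline <- 0.001  # assumed value at which the axes should cross

# Clamp anything below the baseline, then rescale so the baseline becomes 1 (10^0)
df$scaled <- pmax(df$foldExpression, baseline) / baseline

p <- ggplot(df, aes(Genes, scaled)) +
  geom_col() +
  scale_y_log10(
    breaks = 10^(0:5),
    # Relabel in original units: exponent of (break * baseline), shown as 10^x
    labels = trans_format(function(x) round(log10(x * baseline)),
                          math_format(10^.x))
  ) +
  ylab("foldExpression")

print(p)
```

Because only the plotted column and the labels change, the same plotting function could switch between this and a fold-change view by setting `baseline` to 1.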
