I'm fairly new to R, so please excuse the beginner question. I have a dataframe that looks like this:

gene    ctrl   treated
gene_1   100   37.5
gene_2   100   20.2
...      ...   ...

For each row (i.e. each gene) in the df, I want to plot the values so that ctrl and treated sit next to each other. The code below gives something close to what I want, but the output is not grouped as it should be: the bars for the controls are all plotted before the ones for the treated samples.

 barplot(height = c(df$df.ctrl1, df$df.avg_treated), names.arg = df$df.gene)

I know there are many similar questions, but I've gone through them with no success. Can anyone help me understand what I am doing wrong?

Second (optional) question: what if I want to color-code the bars according to the gene id?

Many thanks.

OliverBit

2 Answers

I would use ggplot for this. Let's start with a slightly expanded example:

df <- data.frame(genes   = c("gene_1", "gene_2", "gene_3", "gene_4"),
                 ctrl    = c(50, 60, 70, 80),
                 treated = c(55, 64, 75, 83))

df
#>    genes ctrl treated
#> 1 gene_1   50      55
#> 2 gene_2   60      64
#> 3 gene_3   70      75
#> 4 gene_4   80      83

The first thing we need to do is switch the dataframe to long format using tidyr::pivot_longer, which puts all your values in one column and the labels "ctrl" and "treated" in another. By default these columns are called value and name. Then we can use ggplot to build our output:

library(tidyr)
library(ggplot2)

df %>% 
  # Long format: one row per gene/condition, in columns "name" and "value"
  pivot_longer(cols = c("ctrl", "treated")) %>%
  # Fill colour follows the gene; transparency separates ctrl from treated
  ggplot(aes(name, value, fill = genes, alpha = name)) +
  geom_col(position = position_dodge(), color = "black") +
  scale_alpha_manual(values = c(0.5, 1), guide = guide_none()) +
  # One panel per gene, with the gene label moved below the bars
  facet_grid(~genes, scales = "free_x", switch = "x") +
  theme(strip.placement  = "outside",
        panel.spacing    = unit(0, "points"),
        strip.background = element_blank(),
        strip.text       = element_text(face = "bold", size = 12)) +
  labs(x = "Gene")

Created on 2020-08-22 by the reprex package (v0.3.0)
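
To see what the reshaping step produces before it reaches ggplot, it may help to look at the long-format data on its own. This is only a minimal sketch, assuming the same example df as above; the name long_df is just illustrative and not part of the original code.

```r
library(tidyr)

df <- data.frame(genes   = c("gene_1", "gene_2", "gene_3", "gene_4"),
                 ctrl    = c(50, 60, 70, 80),
                 treated = c(55, 64, 75, 83))

# One row per gene/condition pair; the default output columns
# are called "name" (ctrl/treated) and "value" (the measurement)
long_df <- pivot_longer(df, cols = c("ctrl", "treated"))

head(long_df, 4)
```

Each gene now takes up two rows, one for ctrl and one for treated, which is the shape that aes(name, value, ...) expects.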

Allan Cameron
  • Thank you very much! This is awesome and very clear. – OliverBit Aug 23 '20 at 10:20
  • @OliverBit you're welcome. If this has answered your question, please consider marking it as accepted. – Allan Cameron Aug 23 '20 at 10:21
  • I don't mean to exploit your patience, but what if I want to prevent ggplot from sorting my data alphabetically? I've found some similar questions and came up with this code; is this the correct way to establish a new order? `df$genes <- factor(df$genes, levels = df$genes[order(df$genes)])` Still, I'm not able to tell ggplot to use this new order. Any help? – OliverBit Aug 23 '20 at 10:51
  • @OliverBit the levels you supply should be unique. If you want the genes to appear in the order they appear in your data frame, do `df$genes <- factor(df$genes, levels = unique(df$genes))`. – Allan Cameron Aug 23 '20 at 12:59
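
The factor-ordering suggestion in the comments above can be sketched on a toy vector. The gene ids here are hypothetical and deliberately unsorted, just to make the difference visible:

```r
genes <- c("gene_b", "gene_a", "gene_c")   # hypothetical ids, not from the question

# Default: factor() sorts the levels, so plots follow alphabetical order
sorted_levels <- levels(factor(genes))

# Levels taken in order of first appearance, as suggested in the comment
data_order_levels <- levels(factor(genes, levels = unique(genes)))

sorted_levels
data_order_levels
```

ggplot arranges a discrete axis or facet by factor levels, so setting the levels on df$genes before piping into pivot_longer should carry the data-frame order through to the plot.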

Consider transposing your data and converting it into a matrix with dimnames, then run barplot with beside=TRUE and add a legend. The code below demonstrates this with random data. Note: ylim is adjusted so the axis ends on a pretty range limit.

set.seed(92220)

df <- data.frame(gene = paste("gene", 1:30),
                 ctrl = runif(30, 50, 100),
                 treated = runif(30, 50, 100))
head(df)
#     gene     ctrl  treated
# 1 gene 1 75.74607 76.15832
# 2 gene 2 61.73860 70.19874
# 3 gene 3 56.57906 63.67602
# 4 gene 4 60.23045 80.21108
# 5 gene 5 62.52773 60.86909
# 6 gene 6 85.71849 61.25974

# TRANSPOSE INTO MATRIX WITH DIMNAMES
dat <- `dimnames<-`(t(as.matrix(df[c("ctrl", "treated")])),
                    list(c("ctrl", "treated"), df$gene))

barplot(dat, beside=TRUE, col=c("blue", "red"), las=3,
        main="Control Vs. Treatment",
        ylim=range(pretty(c(0, dat*1.05))))

legend("top", legend=row.names(dat),
       fill=c("blue", "red"), ncol=2, cex=0.75)

[Image: BarPlot Output]
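
For reference, here is a rough sketch of the same idea applied to the two rows shown in the question. It assumes the column names gene, ctrl and treated from the printed table, not the df.ctrl1 / df.avg_treated names used in the asker's call. When height is a plain vector, barplot draws the bars in vector order, which is why all the controls came first; a matrix with beside=TRUE groups the bars by column instead.

```r
# The two rows shown in the question (column names assumed from the printed table)
df <- data.frame(gene    = c("gene_1", "gene_2"),
                 ctrl    = c(100, 100),
                 treated = c(37.5, 20.2))

# Rows become conditions, columns become genes
dat <- t(as.matrix(df[c("ctrl", "treated")]))
colnames(dat) <- df$gene

# beside = TRUE draws ctrl and treated next to each other within each gene
barplot(dat, beside = TRUE, col = c("blue", "red"),
        legend.text = rownames(dat))
```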

Parfait
  • These are entirely different functions from different packages; `dimnames` simply adds `row.names` and `colnames` to a matrix object. What you probably meant to compare is transpose, `t()`, and `pivot_longer` which reshapes data (long<->wide). – Parfait Aug 23 '20 at 16:47