
I have some RNA seq data for two different cell types (naive and Th1).

How would I find the average of the two replicates of each cell type for each gene (the grouping variable) and then plot a log-log scatter graph of these data?

My table looks like this:

df <- data.frame(Gene_name=c("Mrpl15", "Lypla1", "Tcea1"),
                 naive_A=c(5.21212, 6.62643, 5.74654),
                 naive_B=c(4.52376, 5.64459, 4.52153),
                 Th1_A=c(15.50650, 14.46030, 11.57770),
                 Th1_B=c(5.876490, 5.193010, 2.107200), stringsAsFactors=FALSE)
df

  Gene_name naive_A naive_B   Th1_A   Th1_B
1    Mrpl15 5.21212 4.52376 15.5065 5.87649
2    Lypla1 6.62643 5.64459 14.4603 5.19301
3     Tcea1 5.74654 4.52153 11.5777 2.10720

If anyone can help, I'd be much obliged!! Thanks

Adam Smith
  • It'd be helpful if you removed the domain-specific jargon and generalized this to a question about taking averages and plotting. Also, you'll get an answer more quickly if your question includes how to create the required sample data. – Adam Smith Jun 18 '18 at 02:39

1 Answer


You could do something like this:

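library(dplyr)    # %>%, mutate(), group_by(), summarise()
library(tidyr)    # gather(), spread()
library(ggplot2)  # ggplot(), geom_point(), geom_text()

df %>%
    gather(sample, expr, -Gene_name) %>%                # wide -> long
    mutate(condition = gsub("_\\w$", "", sample)) %>%   # strip the replicate suffix (_A, _B)
    group_by(Gene_name, condition) %>%
    summarise(expr.mean = mean(expr)) %>%               # mean per gene and condition
    spread(condition, expr.mean) %>%                    # long -> wide
    ggplot(aes(x = log10(naive), y = log10(Th1), label = Gene_name)) +
    geom_point() +
    geom_text()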


Explanation: Reshape data from wide to long, extract a condition column and average expression values for every gene for every condition. Then reshape data from long to wide, and plot the log10-transformed mean expression from naive vs. mean expression from Th1 for every gene.
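For comparison, here is a minimal base R sketch of the same idea, assuming exactly the column names from the question (naive_A, naive_B, Th1_A, Th1_B). It is not the pipeline above: it skips the reshaping and averages the replicate columns directly with rowMeans(). The object name `means` and the axis labels are made up for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  Gene_name = c("Mrpl15", "Lypla1", "Tcea1"),
  naive_A = c(5.21212, 6.62643, 5.74654),
  naive_B = c(4.52376, 5.64459, 4.52153),
  Th1_A = c(15.50650, 14.46030, 11.57770),
  Th1_B = c(5.876490, 5.193010, 2.107200),
  stringsAsFactors = FALSE
)

# Average the two replicate columns of each condition, one row per gene
# (`means` is a hypothetical name, not from the original answer)
means <- data.frame(
  Gene_name = df$Gene_name,
  naive = rowMeans(df[, c("naive_A", "naive_B")]),
  Th1 = rowMeans(df[, c("Th1_A", "Th1_B")])
)

# Scatter plot with both axes drawn on a log scale, plus gene labels
plot(means$naive, means$Th1, log = "xy",
     xlab = "mean naive", ylab = "mean Th1")
text(means$naive, means$Th1, labels = means$Gene_name, pos = 3)
```

Note that log = "xy" draws the axes on a log scale but keeps the tick labels in the original units, whereas the ggplot2 code above plots log10-transformed values. With more than two replicates or conditions, the reshape-based approach scales better than listing columns by hand.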

I've added gene labels just for illustration purposes. You can remove them by removing + geom_text().


Sample data

df <- read.table(text =
    "Gene_name   naive_A  naive_B    Th1_A    Th1_B
1    Mrpl15  5.21212 4.52376 15.50650 5.876490
2    Lypla1  6.62643 5.64459 14.46030 5.193010
3    Tcea1  5.74654 4.52153 11.57770 2.107200", header = TRUE)
Maurits Evers