How to keep other columns when using aggregate in R?

Question

I have a dataframe, p4p5, that contains the following columns:

colnames(p4p5)
# [1] "SampleID" "expr" "Gene" "Period" "Consequence" "isPTV"

I've used the aggregate function here to find the median expression per Gene:

p4p5_med <- aggregate(expr ~ Gene, p4p5, median)

However, this results in a dataframe with the columns "expr" and "Gene" only. How can I still retain all the original columns when applying the aggregate function?

UPDATE:

Input (p4p5):

SampleID   expr  Gene        Period  Consequence            isPTV
HSB430    -1.23  ENSG000098  4       upstream_gene_variant  0
HSB321    -0.02  ENSG000098  5       stop_gained            1
HSB296     3.12  ENSG000027  4       upstream_gene_variant  0
HSB201     1.22  ENSG000027  4       intron_variant         0
HSB220     0.13  ENSG000013  6       intron_variant         0

Expected output:

SampleID   expr  Gene        Period  Consequence            isPTV  Median
HSB430    -1.23  ENSG000098  4       upstream_gene_variant  0      -0.625
HSB321    -0.02  ENSG000098  5       stop_gained            1      -0.625
HSB296     3.12  ENSG000027  4       upstream_gene_variant  0       2.17
HSB201     1.22  ENSG000027  4       intron_variant         0       2.17
HSB220     0.13  ENSG000013  6       intron_variant         0       0.13

`aggregate()` doesn't return every column by design: the output of this function is the result of an aggregation and you can't combine it with the raw data (even conceptually). If you want the aggregation to be done on every column, you have to specify that explicitly — 12b345b6b78, Nov 27 '18 at 22:14
Please include some example data from `p4p5` in your question. The short answer is: you would need to join the aggregated data back to the original. Or you could use `dplyr` to `group_by`, then `mutate` the data. — neilfws, Nov 27 '18 at 22:15
So something like the following? ```p4p5_med <- p4p5 %>% select(Gene, expr, SampleID, Period, isPTV) %>% group_by(Gene) %>% mutate(Median = median(expr)) ``` I tried this but it gives me the same median value for everything. — claudiadast, Nov 27 '18 at 22:27
Yes - I wrote my answer before seeing your comment and the output is from your example input. If there are different values for `expr` and > 1 value for `Gene`, the medians by group should be different. — neilfws, Nov 27 '18 at 22:33
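
The "join the aggregated data back to the original" idea from the comments can be sketched in base R roughly as follows. This is only an illustration built from the example rows in the question; the names `p4p5_joined` and `Median` are chosen here for the example and are not from the original post.

```r
# Example data copied from the question's input table
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027",
                  "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained",
                  "upstream_gene_variant", "intron_variant", "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# One median per gene; rename the value column so it does not clash with expr
p4p5_med <- aggregate(expr ~ Gene, data = p4p5, FUN = median)
names(p4p5_med)[names(p4p5_med) == "expr"] <- "Median"

# Join the per-gene medians back onto every original row
p4p5_joined <- merge(p4p5, p4p5_med, by = "Gene")
p4p5_joined
```

Note that `merge()` puts the key column (`Gene`) first and sorts rows by it by default, so the row and column order will differ from the input even though every original column is kept.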

score 1 · Answer 1 · answered Nov 27 '18 at 22:27

I'd use dplyr for this:

library(dplyr)

p4p5 %>% 
  group_by(Gene) %>% 
  mutate(Median = median(expr, na.rm = TRUE)) %>%
  ungroup()

  SampleID  expr Gene       Period Consequence           isPTV Median
  <chr>    <dbl> <chr>       <int> <chr>                 <int>  <dbl>
1 HSB430   -1.23 ENSG000098      4 upstream_gene_variant     0 -0.625
2 HSB321   -0.02 ENSG000098      5 stop_gained               1 -0.625
3 HSB296    3.12 ENSG000027      4 upstream_gene_variant     0  2.17 
4 HSB201    1.22 ENSG000027      4 intron_variant            0  2.17 
5 HSB220    0.13 ENSG000013      6 intron_variant            0  0.13

neilfws

When I run that, it retains all the columns but posts the same median value across all rows – claudiadast Nov 27 '18 at 22:33
This output is from your example data, so you must be doing something different. – neilfws Nov 27 '18 at 22:34
Anybody have a base R approach? – theforestecologist Jan 27 '23 at 20:32
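
For a base R route, one possible sketch uses `ave()`, which applies a function within each group and returns a vector aligned with the original rows, so no join is needed. The data frame below is rebuilt from the question's example rows, and `na.rm = TRUE` is carried over from the dplyr answer; treat this as an illustration rather than a tested drop-in for the real `p4p5`.

```r
# Example data copied from the question's input table
p4p5 <- data.frame(
  SampleID    = c("HSB430", "HSB321", "HSB296", "HSB201", "HSB220"),
  expr        = c(-1.23, -0.02, 3.12, 1.22, 0.13),
  Gene        = c("ENSG000098", "ENSG000098", "ENSG000027",
                  "ENSG000027", "ENSG000013"),
  Period      = c(4L, 5L, 4L, 4L, 6L),
  Consequence = c("upstream_gene_variant", "stop_gained",
                  "upstream_gene_variant", "intron_variant", "intron_variant"),
  isPTV       = c(0L, 1L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# ave() computes the median of expr within each Gene and repeats it
# for every row of that gene; FUN must be passed by name
p4p5$Median <- ave(p4p5$expr, p4p5$Gene,
                   FUN = function(x) median(x, na.rm = TRUE))
p4p5
```

Unlike `merge()`, this keeps the original row and column order and simply appends `Median` as the last column.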
