
I have one GRanges object with thousands of score columns (tomap), and another with regions of interest and no metadata (roi). For each interval in roi, I am trying to get the maximum of each score column over the tomap ranges that overlap it.

I also want to retain the names of the score columns (in my real data these are meaningful names that do not follow a generalizable pattern like score1, score2, etc.). I can do it for specific columns but am struggling to generalize it to every column.

Here is what I've got so far:

library(GenomicRanges)
tomap <- GRanges(
    seqnames = Rle(c("chr1"), c(10)),
    ranges = IRanges(1:10*10, end = 1:10*10+5),
    score1 = runif(10),score2=runif(10),score3=runif(10),score4=runif(10),score5=runif(10))

roi <- GRanges(
    seqnames = Rle(c("chr1"), c(5)),
    ranges = IRanges(1:5*20 + floor(runif(5)*4), width = 10))

hits <- findOverlaps(roi, tomap, ignore.strand = TRUE)

ans<-roi
mcols(ans) <- aggregate(tomap, hits, score1=max(score1), score2= max(score2))

ans
#GRanges object with 5 ranges and 3 metadata columns:
#      seqnames    ranges strand |             grouping            score1            score2
#         <Rle> <IRanges>  <Rle> | <ManyToManyGrouping>         <numeric>         <numeric>
#  [1]     chr1     22-31      * |                  2,3 0.326366489753127 0.925836584065109
#  [2]     chr1     42-51      * |                  4,5  0.92806151532568 0.897841389290988
#  [3]     chr1     62-71      * |                  6,7 0.980487102875486 0.940743445185944
#  [4]     chr1     83-92      * |                  8,9 0.798293181695044 0.381754550151527
#  [5]     chr1   101-110      * |                   10 0.872806148370728 0.953412540955469






As you can see, this works when I specify each score column individually, but how do I do this for thousands of columns?
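
A minimal sketch of one possible generalization, written in base R on plain vectors so that it runs without Bioconductor. The helper name `max_by_hits` and the toy overlap pairs are hypothetical and only illustrate the idea: group the overlapping rows by region, then take the column-wise maximum, so no column ever has to be named explicitly and the column names carry over unchanged.

```r
# Hypothetical helper: column-wise maximum of 'scores' per region of interest.
#   scores      numeric matrix, one row per scored range, one named column per score
#   query_idx   integer vector, region index of each overlap pair
#   subject_idx integer vector, scored-range index of each overlap pair
#   n_query     total number of regions of interest
max_by_hits <- function(scores, query_idx, subject_idx, n_query) {
  # Regions without any overlap keep NA in every column.
  out <- matrix(NA_real_, nrow = n_query, ncol = ncol(scores),
                dimnames = list(NULL, colnames(scores)))
  # One element per region that has hits, holding the overlapping row indices.
  groups <- split(subject_idx, query_idx)
  for (q in names(groups)) {
    rows <- scores[groups[[q]], , drop = FALSE]
    out[as.integer(q), ] <- apply(rows, 2, max)
  }
  out
}

# Toy data standing in for the score columns and the overlap pairs.
set.seed(1)
scores <- matrix(runif(50), nrow = 10,
                 dimnames = list(NULL, paste0("score", 1:5)))
query_idx   <- c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L)
subject_idx <- c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L)

res <- max_by_hits(scores, query_idx, subject_idx, n_query = 5)
```

With the objects from the question, the inputs would presumably be `as.matrix(mcols(tomap))`, `queryHits(hits)`, `subjectHits(hits)` and `length(roi)`; that wiring is an assumption and has not been run against real GRanges data here.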

polg
  • the only library loaded is GenomicRanges; I edited the question to reflect that. The result works with stats::aggregate too. – polg Mar 31 '19 at 17:34
  • @Parfait Thanks for trying but that results in an error for me too: `Error in FUN(Vector_window(x, start = start[i], end = end[i]), ...) : invalid 'type' (S4) of argument`. I'll try bioconductor.org too. – polg Mar 31 '19 at 20:23
  • is there perhaps a clever way to create the string `columnName1=max(columnName1), columnName2= max(columnName2)...` from a vector of column names and then have it be evaluated within the aggregate function? – polg Mar 31 '19 at 20:55
  • same error unfortunately – polg Mar 31 '19 at 23:14
  • Heeding [@MartinMorgan's comment](https://stackoverflow.com/questions/29609275/convert-s4-dataframe-of-rle-objects-to-sparse-matrix-in-r#comment47367707_29609275), let's have the [bioconductor support](https://support.bioconductor.org/) handle these special objects. I don't want to recommend converting to an S3 data frame or using `eval(parse(...))` when another solution could be available. I see you [posted](https://support.bioconductor.org/p/119588/). Good luck! – Parfait Mar 31 '19 at 23:25
  • I had tried `eval(parse(...))` with no success, but it turns out I had made an error somewhere; your comment motivated me to try again, so thanks! Although there might be a better solution, the following does the trick: `scoreagg<-paste0("mcols(ans)<-aggregate(tomap,hits,",paste0(colnames(tomap@elementMetadata)[1:5],"=","max(",colnames(tomap@elementMetadata)[1:5],")",collapse=","),")"); eval(parse(text=scoreagg))` – polg Mar 31 '19 at 23:54
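
The string-building idea from the comments above can be laid out more readably as follows. This is only a sketch: `score_cols` is a hypothetical stand-in for the real column names, and the block stops at constructing the call, since evaluating it needs the `tomap` and `hits` objects from the question.

```r
# Hypothetical column names standing in for the metadata column names of tomap.
score_cols <- c("score1", "score2", "score3")

# One "name = max(name)" argument per column. The backticks guard names
# that are not syntactically valid R identifiers.
agg_args <- sprintf("`%s` = max(`%s`)", score_cols, score_cols)

# Assemble the full aggregate() call as text, then parse it into a call object.
call_text <- paste0("aggregate(tomap, hits, ",
                    paste(agg_args, collapse = ", "), ")")
call_expr <- parse(text = call_text)[[1]]

# With tomap, hits and ans defined as in the question, the call would be run as:
# mcols(ans) <- eval(call_expr)
```

Building the call once from a vector of names avoids the hard-coded `[1:5]` subset, so the same code should scale to any number of columns.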

0 Answers