I would like to collect string output into a list when certain conditions are met. I have a table that looks like this:

   grp   V1  V2  V3  V4  V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1:   1 go.1 142 144 132 134  0 31 11  F   D   T  hy  al  qe  34   6   3
2:   2 go.1 313 315 303 305  0 31 11  q   z   t  hr  ye  er  29  20  41
3:   3 go.1 316 318 306 308  0 31 11  f   w   y  hu  er  es  64  43  19
4:   4 go.1 319 321 309 311  0 31 11  r   a   y  ie  uu  qr  26  22  20
5:   5 go.1 322 324 312 314  0 31 11  g   w   y  hp  yu  re  44   7   0

I'm using this function to generate a desired output:

library(IRanges); library(data.table)
rangeFinder = function(x){
  # Merge overlapping or adjacent V2-V3 ranges
  x.ir = reduce(IRanges(x$V2, x$V3))
  # Pick the widest merged range
  max.idx = which.max(width(x.ir))
  ans = data.table(out = x[1,1],
                   start = start(x.ir)[max.idx],
                   end = end(x.ir)[max.idx])
  return(ans)
}

rangeFinder(x.out)
          out start end
1:          1   313 324

I would also like to generate a list with the letters (from columns V9-V11) of the rows that fall between the start and end returned by rangeFinder.

For example, the output should look like this.

out
[[go.1]]
[1]     "qztfwyraygwy"

rangeFinder looks at the values in columns V2 and V3 and returns the longest merged range. Notice that "FDT" is not included in the list output, because rangeFinder returned 313-324 (and not 142-324). How can I get the desired output?

user3141121

1 Answer

reduce() has an argument, with.revmap, that adds a "metadata" column (accessible with mcols()) to the result. For each reduced range, this column records the indexes of the original ranges that map to it, stored as an IntegerList, which is basically a list whose elements are all guaranteed to be integer vectors. So these are the rows you're interested in:

ir <- with(x, IRanges(V2, V3))
r <- reduce(ir, with.revmap=TRUE)
i <- unlist(mcols(r)[which.max(width(r)), "revmap"])

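and the character string can be built with something like

j <- paste0("V", 9:11)
# as.data.frame() so that [i, j] selects columns by name even if x is a data.table;
# t() so the letters are pasted row by row rather than column by column
paste0(t(as.matrix(as.data.frame(x)[i, j, drop=FALSE])), collapse="")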
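As a quick check of the pasting step, here is a minimal base-R sketch using only the letter columns from the question. The data frame is retyped by hand from the posted table, and `i` is assumed to be rows 2 to 5 (the rows behind the 313-324 range). It illustrates that pasting a matrix directly walks it column by column, while transposing first gives the row-by-row order shown in the desired output.

```r
# Letter columns retyped from the question's table (illustrative copy)
x <- data.frame(V9  = c("F", "q", "f", "r", "g"),
                V10 = c("D", "z", "w", "a", "w"),
                V11 = c("T", "t", "y", "y", "y"),
                stringsAsFactors = FALSE)

i <- 2:5                    # assumed: rows inside the widest merged range
j <- paste0("V", 9:11)      # "V9" "V10" "V11"

m <- as.matrix(x[i, j, drop = FALSE])

byCol <- paste0(m, collapse = "")     # column-major: "qfrgzwawtyyy"
byRow <- paste0(t(m), collapse = "")  # row-major:    "qztfwyraygwy"
```

`byRow` matches the string asked for in the question; `byCol` is what you get without the transpose.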

It's better to ask your questions about IRanges on the Bioconductor mailing list; no subscription required.

with.revmap is a convenience argument added relatively recently; I think

h = findOverlaps(ir, r)
i = queryHits(h)[subjectHits(h) == which.max(width(r))]

is a replacement.
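To make it clearer what the reduced ranges and their reverse map contain, the sketch below reproduces the idea in base R only. `mergeRanges` is a hypothetical helper written for illustration, not an IRanges function, and it assumes closed integer ranges where adjacent ranges (such as 313-315 and 316-318) should be merged, which is what the output in the question suggests.

```r
# Hypothetical helper (not part of IRanges): merge overlapping or adjacent
# integer ranges and record which input rows fall in each merged range.
mergeRanges <- function(start, end) {
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  # Furthest end reached so far among the sorted ranges
  reach <- cummax(e)
  # A new merged range begins when a start lies more than 1 past that reach
  isNew <- c(TRUE, s[-1] > head(reach, -1) + 1)
  grp <- cumsum(isNew)
  list(start  = as.vector(tapply(s, grp, min)),
       end    = as.vector(tapply(e, grp, max)),
       revmap = unname(split(ord, grp)))
}

# V2 and V3 from the question's table
m <- mergeRanges(c(142, 313, 316, 319, 322), c(144, 315, 318, 321, 324))
widest <- which.max(m$end - m$start + 1)
i <- m$revmap[[widest]]   # rows 2, 3, 4, 5
```

The resulting `i` plays the same role as the row indexes obtained from revmap or findOverlaps above.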

Martin Morgan
  • It seems like I don't have the `with.revmap` argument; I'm getting this error: **Error in .local(x, ...) : unused argument (with.revmap = TRUE)** – user3141121 Apr 15 '14 at 17:11
  • @user3141121 yes, this is with the current Bioconductor, which is available when using R-3.1 (since yesterday!) – Martin Morgan Apr 15 '14 at 18:48
  • I see. Any workaround for 3.0? I don't want to update, since all of my other libraries would need to be updated as well – user3141121 Apr 15 '14 at 19:45