
I have a dataset "dat" as follows:

      ChromKey CHROM      POS   ID   REF ALT length
11438        1  chr1 27023450 <NA>  AGCG   A      4
11755        1  chr1 27023767 <NA>    CA   C      3
12521        1  chr1 27057930 <NA>    GA   G      2
13174        1  chr1 27088681 <NA>    TC   T      3
14861        1  chr1 27100181 <NA>  CGCA   C      2
15593        1  chr1 27101426 <NA> TCTAA   T      5

This dataset is a subset of a much larger dataset, which contains every row in dat and more. Let's call the full dataset "dat.ori". (The numbers on the far left are the row numbers from dat.ori that were selected to create dat.)

From dat.ori, I would like to extract each row that is in dat together with the n rows above it and the n rows below it, where n is that row's value of the variable length in dat. For example, the rows I need to extract from dat.ori are:

11434, 11435, 11436, 11437, 11438, 11439, 11440, 11441, 11442, 11752, 11753, 
11754, 11755, 11756, 11757, 11758 and so on

That is, 4 rows above and below 11438, 3 rows above and below 11755, 2 rows above and below 12521, etc.

Is there a way to do this in R? Many thanks! :)

(Apologies, it's not the most reproducible example, but I will try to edit this so that respondents can reproduce it.)

UPDATE: Here's what I did (from: Returning above and below rows of specific rows in r dataframe)

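# Row names of dat are the original row numbers in dat.ori
myRows <- rownames(dat)
# For each matching position in dat.ori: one row above, the row itself, one row below
rowRanges <- lapply(which(rownames(dat.ori) %in% myRows), function(x) x + c(-1:1))
final <- lapply(rowRanges, function(x) dat.ori[x, ])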

This gives me exactly what I need, but only one row above and below (set by c(-1:1)). I need this tweaked so that I get n rows above and below, where n is determined by dat$length.
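
One way the snippet above might be adapted, as a minimal sketch: pair each matching row position with its own window size using `Map`, instead of adding a fixed `c(-1:1)`. The toy `dat.ori` (30 rows), the `POS` values and the `length` values below are made up for illustration, and the sketch assumes the row names of `dat` still match row names in `dat.ori`.

```r
# Toy stand-in for dat.ori: 30 rows with default row names "1".."30" (assumed data)
dat.ori <- data.frame(POS = 1001:1030)

# Toy subset: rows 10 and 20, each with a hypothetical window size
dat <- dat.ori[c(10, 20), , drop = FALSE]
dat$length <- c(2, 3)

# Positions in dat.ori of the rows that appear in dat
pos <- match(rownames(dat), rownames(dat.ori))

# One window per row: pos[i] - length[i] through pos[i] + length[i]
rowRanges <- Map(function(p, n) seq(p - n, p + n), pos, dat$length)

# One data frame per window, as in the original lapply version
final <- lapply(rowRanges, function(x) dat.ori[x, , drop = FALSE])
```

Here `final[[1]]` holds rows 8 to 12 and `final[[2]]` holds rows 17 to 23; `do.call(rbind, final)` would stack them into a single data frame if a list is not wanted.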

Jaap
EskCargo

1 Answer


A possible solution:

r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)
u <- unlist(Map(':', -dat$length, dat$length))
idx <- r + u
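
To see how `r` and `u` line up, here is a small sketch that uses only the row numbers and `length` values of the first two rows of `dat` from the question. The names `rows` and `len` are introduced just for this illustration.

```r
# Row numbers and window sizes from the first two rows of dat in the question
rows <- c(11438, 11755)
len  <- c(4, 3)

# Each row number repeated 2n + 1 times: once per row in its window
r <- rep(rows, 2 * len + 1)

# Offsets -n..n for each row, concatenated in the same order as r
u <- unlist(Map(`:`, -len, len))

# Row number plus offset gives every index in every window
idx <- r + u
idx
# 11434 11435 ... 11442 (9 values), then 11752 11753 ... 11758 (7 values)
```

`Map` calls `:` once per pair of elements, which is what lets the window size differ from row to row; a plain `-len:len` would only use the first element of `len`.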

Now you can extract these rows from dat.ori with:

dat.ori[idx, ]

Or:

dat.ori[r + u, ]
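
Two edge cases may be worth guarding against, depending on the data: a window that runs past the first or last row of `dat.ori`, and windows that overlap so that the same row is selected twice. The sketch below is one possible way to handle both. The toy data are invented for illustration, and dropping duplicates is an assumption about what is wanted.

```r
# Assumed toy data: dat.ori has 30 rows; dat is rows 2, 10 and 12
dat.ori <- data.frame(POS = 1001:1030)
dat <- dat.ori[c(2, 10, 12), , drop = FALSE]
dat$length <- c(3, 2, 2)   # hypothetical window sizes

r <- rep(as.numeric(row.names(dat)), 2 * dat$length + 1)
u <- unlist(Map(`:`, -dat$length, dat$length))
idx <- r + u

# Keep only indices that exist in dat.ori and drop repeats from overlapping windows
idx <- unique(idx[idx >= 1 & idx <= nrow(dat.ori)])

res <- dat.ori[idx, , drop = FALSE]
```

Without the range check, the window around row 2 would include the indices -1 and 0, which R treats specially as subscripts rather than as "no such row".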
Jaap
  • `n` is the `dat$length` column, not a constant 4. – zx8754 Feb 09 '18 at 13:37
  • Hi Jaap, Thanks for your response. This allows me to pull out from dat.ori, 4 rows above and below each row in dat. I need this number to be variable as per the value of dat$length. Please can you suggest that in your example? It looks really neat and easy to follow but I just need to figure out how to let "n" change according to length. Many thanks :) – EskCargo Feb 09 '18 at 13:42
  • If I try setting n<-dat$length or n<-c(dat$length), I get the following warning : Warning messages: 1: In rep(as.numeric(row.names(dat)), each = 2 * n + 1) : first element used of 'each' argument 2: In -n:n : numerical expression has 238 elements: only the first used 3: In -n:n : numerical expression has 238 elements: only the first used – EskCargo Feb 09 '18 at 13:51
  • @zx8754 fixed :-) – Jaap Feb 09 '18 at 14:13