I have a fairly big data.frame with several thousand rows and some dozens of columns. Some rows have NA values in the final columns. Example df:

          pos1    pos2    pos3    pos4    pos5    pos6    
case1     0.5     0.6     0.5     0.3     0.2      NA
case2     0.3     0.7     0.2     0.1     0.5      0.5
case3     0.1     0.2     0.6     0.8     NA       NA
case4     0.4     0.1     0.1     0.6     0.3      0.9
  . 
  .
  .

Moreover, I have two vectors of indexes, i1 and i2:

i1:

[1] 2 3 2 1

i2:

[1] 5 4 5 6

What I would like to do is subset each row of the data.frame according to the range defined by the corresponding indexes in i1 and i2. That is, I want either a list of vectors or a second data.frame, where each vector or row is a row of the initial data.frame restricted to columns i1:i2. If the output is a data.frame, the positions outside the range should be filled with NAs.

The desired output would be:

List of vectors:

[[1]] 
[1] 0.6 0.5 0.3 0.2
[[2]]
[1] 0.2 0.1
[[3]]
[1] 0.2 0.6 0.8 NA
[[4]]
[1] 0.4 0.1 0.1 0.6 0.3 0.9

Data.frame:

          pos1    pos2    pos3    pos4    pos5    pos6    
case1     NA      0.6     0.5     0.3     0.2      NA
case2     NA      NA      0.2     0.1     NA       NA
case3     NA      0.2     0.6     0.8     NA       NA
case4     0.4     0.1     0.1     0.6     0.3      0.9
  . 
  .
  .

If I had just one index and wanted to get only one value for each row, I know I could use seq_along to get a vector of values in the form of:

subset <- df[cbind(seq_along(i1),i1)]

But I cannot get the correct code for doing somewhat similar but using a range of values delimited by two indexes.

Any help would be appreciated. Many thanks.

1 Answer

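We can use Map to loop over the rows of df (split with asplit) in parallel with i1 and i2, extracting the i:j range from each row: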

Map(function(x, i, j) x[i:j], asplit(df, 1), i1, i2)

-output

#$case1
#pos2 pos3 pos4 pos5 
# 0.6  0.5  0.3  0.2 

#$case2
#pos3 pos4 
# 0.2  0.1 

#$case3
#pos2 pos3 pos4 pos5 
# 0.2  0.6  0.8   NA 

#$case4
#pos1 pos2 pos3 pos4 pos5 pos6 
# 0.4  0.1  0.1  0.6  0.3  0.9 
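
If asplit is not available (as far as I know it was added in R 3.6.0), a similar list can be built by looping over row numbers instead. This is only a sketch under that assumption; the names m and res_list are made up for illustration and are not part of the original code.

```r
# Example data, same as in the "data" section of this answer
df <- data.frame(
  pos1 = c(0.5, 0.3, 0.1, 0.4), pos2 = c(0.6, 0.7, 0.2, 0.1),
  pos3 = c(0.5, 0.2, 0.6, 0.1), pos4 = c(0.3, 0.1, 0.8, 0.6),
  pos5 = c(0.2, 0.5, NA, 0.3),  pos6 = c(NA, 0.5, NA, 0.9),
  row.names = c("case1", "case2", "case3", "case4")
)
i1 <- c(2, 3, 2, 1)
i2 <- c(5, 4, 5, 6)

# Work on a matrix so that m[k, cols] returns a plain numeric vector
m <- as.matrix(df)

# For row k, keep only columns i1[k] to i2[k]
res_list <- lapply(seq_len(nrow(m)), function(k) m[k, i1[k]:i2[k]])
names(res_list) <- rownames(m)
res_list
```

Converting to a matrix first assumes all columns are numeric, as in the example; with mixed column types the values would be coerced to a common type.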

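For the second case, replace the values outside each row's range with NA and rbind the rows back together. The result is a matrix; wrap it in as.data.frame() if a data.frame is needed.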

do.call(rbind, Map(function(x, i, j) replace(x, !seq_along(x) %in%
          i:j, NA), asplit(df, 1), i1, i2))

-output

#      pos1 pos2 pos3 pos4 pos5 pos6
#case1   NA  0.6  0.5  0.3  0.2   NA
#case2   NA   NA  0.2  0.1   NA   NA
#case3   NA  0.2  0.6  0.8   NA   NA
#case4  0.4  0.1  0.1  0.6  0.3  0.9
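
With several thousand rows, a loop-free alternative may be worth trying: build a logical mask from the row and column positions and blank out everything outside it. This is a sketch of a different technique than the Map call above, again assuming all columns are numeric; m, keep and res_df are illustrative names.

```r
# Example data, same as in the "data" section of this answer
df <- data.frame(
  pos1 = c(0.5, 0.3, 0.1, 0.4), pos2 = c(0.6, 0.7, 0.2, 0.1),
  pos3 = c(0.5, 0.2, 0.6, 0.1), pos4 = c(0.3, 0.1, 0.8, 0.6),
  pos5 = c(0.2, 0.5, NA, 0.3),  pos6 = c(NA, 0.5, NA, 0.9),
  row.names = c("case1", "case2", "case3", "case4")
)
i1 <- c(2, 3, 2, 1)
i2 <- c(5, 4, 5, 6)

m <- as.matrix(df)

# col(m) and row(m) give the column/row number of every cell;
# a cell is kept when its column lies within [i1, i2] for its row
keep <- col(m) >= i1[row(m)] & col(m) <= i2[row(m)]

# Everything outside the per-row range becomes NA
m[!keep] <- NA

res_df <- as.data.frame(m)
res_df
```

The row and column names carry over from df, so res_df should line up with the output shown above.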

data

df <- structure(list(pos1 = c(0.5, 0.3, 0.1, 0.4), pos2 = c(0.6, 0.7, 
0.2, 0.1), pos3 = c(0.5, 0.2, 0.6, 0.1), pos4 = c(0.3, 0.1, 0.8, 
0.6), pos5 = c(0.2, 0.5, NA, 0.3), pos6 = c(NA, 0.5, NA, 0.9)),
class = "data.frame", row.names = c("case1", 
"case2", "case3", "case4"))

i1 <- c(2, 3, 2, 1)

i2 <- c(5, 4, 5, 6)
akrun