
I am having a simple problem in R. I am working with a big dataset where I am trying to select any row that matches a certain criterion, along with the two rows above it and the two rows below it in a data frame. Here's what my data looks like:

df <- structure(c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", 
"11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", 
"22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", 
"33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", 
"44", "45", "46", "47", "48", "49", "50", "a", "b", "a", "a", 
"a", "b", "a", "a", "a", "b", "a", "a", "a", "a", "a", "a", "a", 
"a", "a", "b"), .Dim = c(10L, 7L), .Dimnames = list(NULL, c("1", 
"2", "3", "4", "5", "6", "7")))
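Strictly speaking, the structure() call above produces a character matrix rather than a data frame, which affects how indexing behaves (for example, how out-of-range row numbers are handled). A small sketch that rebuilds the same data with matrix() so it runs on its own, checks its class, and converts it in case data-frame behavior is wanted; the name dfr is hypothetical:

```r
# Same data as the structure() call above, built more compactly
df <- matrix(c(as.character(1:50),
               "a", "b", "a", "a", "a", "b", "a", "a", "a", "b",  # column "6"
               rep("a", 9), "b"),                                # column "7"
             nrow = 10, dimnames = list(NULL, as.character(1:7)))

class(df)   # a character matrix, not a data.frame
dim(df)     # 10 rows, 7 columns

# Convert explicitly if data-frame semantics are wanted;
# stringsAsFactors = FALSE keeps the columns as character
dfr <- as.data.frame(df, stringsAsFactors = FALSE)
str(dfr)
```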

I am looking for instances with "b" in column 6 and "a" in column 7. Selecting those instances can be done with this command:

rows <- df[which(df[,6] == "b"& df[,7] =="a"),]
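It can help to look at what which() returns on its own: the row numbers of the hits, which are the starting point for any "rows around each hit" approach. A short sketch (hits is a hypothetical name); it also shows drop = FALSE, which keeps a single-match result as a matrix instead of collapsing it to a vector:

```r
df <- matrix(c(as.character(1:50),
               "a", "b", "a", "a", "a", "b", "a", "a", "a", "b",  # column "6"
               rep("a", 9), "b"),                                # column "7"
             nrow = 10, dimnames = list(NULL, as.character(1:7)))

# which() turns the logical test into row numbers
hits <- which(df[, 6] == "b" & df[, 7] == "a")
hits   # 2 6

# drop = FALSE keeps a one-row result as a matrix rather than
# collapsing it to a plain character vector
rows <- df[hits, , drop = FALSE]
one_row <- df[hits[1], , drop = FALSE]
```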

But I am not sure how to also select the two rows above and the two rows below each hit (especially since the first hit that matches the criteria has only one row above it). This is supposed to be basic, but I couldn't figure out a good way to do it. Any ideas?

Thanks

A5C1D2H2I1M1N2O1R2T1
Error404
  • Here the index I got was rows `2 and 6`. According to your condition, which other rows should be selected? – akrun Sep 11 '14 at 15:01
  • That is correct, 2 and 6 match the criteria. I am looking to select the two rows above and the two rows below each hit (if they exist), so the rows selected should be 1, 2, 3, 4 (because 2 is a hit) and 4, 5, 6, 7, 8 (because 6 is a hit). The result will be the same dataframe with rows 1 to 8. Does that make sense? – Error404 Sep 11 '14 at 15:05
  • Very closely related Q & A: http://stackoverflow.com/q/13155609/1270695 – A5C1D2H2I1M1N2O1R2T1 Sep 12 '14 at 04:11

2 Answers


One way would be:

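indx <- which(df[,6] == "b" & df[,7] == "a")
# For each hit x, collect rows x-2 to x+2 (x itself appears three times;
# unique() removes the repeats and the overlap between neighboring hits).
# The 0 produced by the hit in row 2 is silently ignored when indexing.
indx1 <- unique(unlist(lapply(indx, function(x) c(seq(x-2,x), x, seq(x, x+2)))))
df[indx1,]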
#     1   2    3    4    5    6   7  
#[1,] "1" "11" "21" "31" "41" "a" "a"
#[2,] "2" "12" "22" "32" "42" "b" "a"
#[3,] "3" "13" "23" "33" "43" "a" "a"
#[4,] "4" "14" "24" "34" "44" "a" "a"
#[5,] "5" "15" "25" "35" "45" "a" "a"
#[6,] "6" "16" "26" "36" "46" "b" "a"
#[7,] "7" "17" "27" "37" "47" "a" "a"
#[8,] "8" "18" "28" "38" "48" "a" "a"
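The same idea can be vectorized by swapping lapply() for base R's outer(), which adds every offset to every hit in one call. This is my substitution, not part of the answer above; it also clamps the result to valid rows at both ends and sorts it (win and keep are hypothetical names):

```r
df <- matrix(c(as.character(1:50),
               "a", "b", "a", "a", "a", "b", "a", "a", "a", "b",  # column "6"
               rep("a", 9), "b"),                                # column "7"
             nrow = 10, dimnames = list(NULL, as.character(1:7)))
hits <- which(df[, 6] == "b" & df[, 7] == "a")

# One row per hit, one column per offset in -2:2
win <- outer(hits, -2:2, "+")

# Keep only row numbers that exist (1..nrow(df)), without repeats, in order
keep <- sort(unique(win[win >= 1 & win <= nrow(df)]))
keep   # 1 2 3 4 5 6 7 8
df[keep, , drop = FALSE]
```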

Update

Thanks to @Ananda Mahto for finding a bug in the code and providing shorter, more compact code.

# keep only row numbers that exist in df (1 to nrow(df))
indx1 <- Filter(function(x) x > 0 && x <= nrow(df), unique(unlist(lapply(indx, "+", -2:2))))
df[indx1,]
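To guard both ends (for instance, when a hit falls in the first or last row or two) and reuse the logic with other window sizes, the same lapply()/Filter() idea can be wrapped in a small helper. A sketch only; rows_around() and its arguments are hypothetical names, not part of any package:

```r
# Hypothetical helper: row numbers within `range` of each hit,
# restricted to 1..n so hits near either edge are safe, returned sorted
rows_around <- function(hits, n, range = -2:2) {
  idx <- unique(unlist(lapply(hits, "+", range)))
  sort(Filter(function(x) x >= 1 && x <= n, idx))
}

rows_around(c(2, 6), n = 10)    # 1 2 3 4 5 6 7 8
rows_around(c(1, 10), n = 10)   # 1 2 3 8 9 10
```

Usage with the question's data would be df[rows_around(indx, nrow(df)), ], and a different window such as range = -1:1 only needs the argument changed.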
akrun

I had written a function called getMyRows that is part of my GitHub-only "SOfun" package. In essence, it's a generalization of @akrun's answer, and its behavior is slightly different: it returns a list (since the behavior I envisioned was to keep the relevant rows together).

With your data, usage and the relevant results would be:

library(SOfun)
getMyRows(df, which(df[, 6] == "b" & df[, 7] == "a"), range = -2:2)
# [[1]]
#      1   2    3    4    5    6   7  
# [1,] "1" "11" "21" "31" "41" "a" "a"
# [2,] "2" "12" "22" "32" "42" "b" "a"
# [3,] "3" "13" "23" "33" "43" "a" "a"
# [4,] "4" "14" "24" "34" "44" "a" "a"
# 
# [[2]]
#      1   2    3    4    5    6   7  
# [1,] "4" "14" "24" "34" "44" "a" "a"
# [2,] "5" "15" "25" "35" "45" "a" "a"
# [3,] "6" "16" "26" "36" "46" "b" "a"
# [4,] "7" "17" "27" "37" "47" "a" "a"
# [5,] "8" "18" "28" "38" "48" "a" "a"

A matter of note: for what you want to do, the range argument should be written as a sequence using : (as in -2:2).
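For readers who would rather not install a GitHub package, a rough base-R sketch of the same list-returning behavior. It is inferred from the output shown above, not taken from the getMyRows() source; windows is a hypothetical name:

```r
df <- matrix(c(as.character(1:50),
               "a", "b", "a", "a", "a", "b", "a", "a", "a", "b",  # column "6"
               rep("a", 9), "b"),                                # column "7"
             nrow = 10, dimnames = list(NULL, as.character(1:7)))
hits <- which(df[, 6] == "b" & df[, 7] == "a")

# One list element per hit, each holding that hit's window of rows;
# row numbers outside 1..nrow(df) are dropped before indexing
windows <- lapply(hits, function(h) {
  idx <- h + (-2:2)
  df[idx[idx >= 1 & idx <= nrow(df)], , drop = FALSE]
})
windows
```

Unlike the approaches that take unique() row numbers, a row shared by two windows (row 4 here) appears in both list elements.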


Install the package with:

library(devtools)
install_github("mrdwab/SOfun")

(or your favorite method of installing packages from GitHub).

A5C1D2H2I1M1N2O1R2T1
  • @akrun, thanks. I wrote a previous version of this function [as an answer](http://stackoverflow.com/a/13155669/1270695) almost 2 years ago... flodel's answer at that question can also be adapted to this question. – A5C1D2H2I1M1N2O1R2T1 Sep 12 '14 at 04:11