I'm trying to select the 100 rows before and after each marker in a relatively large data frame. The markers are sparse, and I haven't been able to work out or find a solution. This doesn't seem like it should be hard, so I'm probably missing something obvious.

Here's a very small simple example of what the data looks like:

timestamp talking_yn transition_yn
0.01      n          n
0.02      n          n
0.03      n          n
0.04      n          n
0.05      n          n
0.06      n          n
0.07      n          n
0.08      n          n
0.09      n          n
0.10      n          n
0.11      y          y
0.12      y          n
0.13      y          n
0.14      y          n
0.15      y          n
0.16      y          n
0.17      y          n
0.18      y          n

I've tried methods from a variety of answers (e.g. lag from zoo or dplyr), but they all focus on selecting a single row or on subsetting only the rows that have the marker. For the dummy example data, how would I select the 5 rows before and after the row where transition_yn == 'y'?

Mik

1 Answer

I have a quick function for that:

#' Lead/Lag a logical
#'
#' @param lgl logical vector
#' @param bef integer, number of elements to lead by
#' @param aft integer, number of elements to lag by
#' @return logical, same length as 'lgl'
#' @export
leadlag <- function(lgl, bef = 1, aft = 1) {
  n <- length(lgl)
  bef <- min(n, max(0, bef))
  aft <- min(n, max(0, aft))
  befx <- if (bef > 0) sapply(seq_len(bef), function(b) c(tail(lgl, n = -b), rep(FALSE, b)))
  aftx <- if (aft > 0) sapply(seq_len(aft), function(a) c(rep(FALSE, a), head(lgl, n = -a)))
  rowSums(cbind(befx, lgl, aftx), na.rm = TRUE) > 0
}

dat[leadlag(dat$transition_yn == 'y', 2, 4),]
#    timestamp talking_yn transition_yn
# 9       0.09          n             n
# 10      0.10          n             n
# 11      0.11          y             y
# 12      0.12          y             n
# 13      0.13          y             n
# 14      0.14          y             n
# 15      0.15          y             n
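
For comparison, here is a minimal index-based sketch of the same idea: find the marker positions with which(), add every offset in the window, and keep the positions that fall inside the vector. The name window_idx is hypothetical (it is not part of leadlag above), the small data frame is made up to mirror the question's example, and the sketch assumes bef and aft are non-negative whole numbers.

```r
# Hypothetical helper (an assumption, not the answer's function): returns the
# row indices within `bef` rows before and `aft` rows after each TRUE in `lgl`.
# Assumes `bef` and `aft` are non-negative whole numbers.
window_idx <- function(lgl, bef = 1, aft = 1) {
  hits <- which(lgl)                           # marker positions (NAs are dropped)
  if (length(hits) == 0L) return(integer(0))   # no markers, nothing to select
  offsets <- seq(-bef, aft)                    # relative offsets around a marker
  idx <- as.vector(outer(hits, offsets, `+`))  # every marker plus every offset
  idx <- idx[idx >= 1L & idx <= length(lgl)]   # drop positions outside the data
  sort(unique(idx))                            # overlapping windows collapse
}

# Made-up data mirroring the question: 18 rows, marker in row 11
dat <- data.frame(
  timestamp = seq_len(18) / 100,
  transition_yn = ifelse(seq_len(18) == 11, "y", "n"),
  stringsAsFactors = FALSE
)

# 5 rows before and after the marker row, plus the marker row itself
res <- dat[window_idx(dat$transition_yn == "y", bef = 5, aft = 5), ]
```

Because this works on the marker positions rather than on shifted copies of the whole vector, the intermediate matrix has one row per marker instead of one row per data row, which may matter when the markers are sparse and the window is wide.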

Data

dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
timestamp talking_yn transition_yn
0.01      n          n
0.02      n          n
0.03      n          n
0.04      n          n
0.05      n          n
0.06      n          n
0.07      n          n
0.08      n          n
0.09      n          n
0.10      n          n
0.11      y          y
0.12      y          n
0.13      y          n
0.14      y          n
0.15      y          n
0.16      y          n
0.17      y          n
0.18      y          n")
r2evans
  • (This simple function fails when `length(lgl) < 2`. I might fix it at some point ...) – r2evans Nov 05 '19 at 18:29
  • Oh, good to know. I also noticed that it fails when looking only before or only after (so `dat[leadlag(dat$transition_yn == 'y', 2, 0),]`). That's not important for my purposes, since I can just take the bef or aft component out of the function to isolate them, but if you're interested in full functionality it might be worth exploring. Thank you for this - super handy! – Mik Nov 05 '19 at 18:36
  • Mik, fixed, see the edits (minor). – r2evans Nov 05 '19 at 18:49
  • Very cool, thanks! – Mik Nov 05 '19 at 18:52