
I am processing variable-length records from a large dataset using data.table[, somefunc(someseries), by=]. The length L of each record someseries can be anything from 1 to 50. I want to handle the following efficiently, without adding a needless if expression:

For each group, I want the simplest way to access its middle entries, someseries[3:(L-2)], i.e. everything except the first two and the last two elements.

Problem: beware that when L < 5, the expression someseries[3:(L-2)] misbehaves because the range counts backwards. For example, L = 4 gives 3:2, i.e. c(3, 2), and for L = 1 the range 3:-1 mixes positive and negative subscripts, which is an error. This comes from the colon operator itself: from:to steps by +1 when from <= to, but by -1 when from > to, so it silently infers a backwards direction.
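The backwards counting described above is easy to see by building the range 3:(L-2) for a few small L. This is a minimal base R sketch; the object names ranges and x are only for illustration.

```r
# Build the "middle" index range 3:(L - 2) for record lengths 1 to 6.
# The colon operator steps by -1 whenever from > to, so for L < 5 the
# range runs backwards (and for L <= 2 it even reaches 0 or -1).
ranges <- lapply(1:6, function(L) 3:(L - 2))
for (L in 1:6) cat("L =", L, ": 3:(L-2) =", ranges[[L]], "\n")

# Indexing a length-4 record with 3:2 returns elements 3 and 2, reversed,
# instead of an empty vector.
x <- 1:4
x[3:(length(x) - 2)]
```

Only from L = 5 onward does the range run forward (3, then 3 4 for L = 6).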

In that case I just want somefunc to be passed an empty vector, not a backwards slice such as someseries[4:2].

But I can't simply use seq(..., by = 1) instead, because that throws an error when from > to.
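One way to get an index range that is simply empty for short records, with no if, is to count forward with seq_len(), which returns integer(0) for a length of 0, and then shift it by 2. The helper name middle_idx is hypothetical; this is a sketch of one option, not the only one.

```r
# seq() with an explicit by = 1 refuses to count down and signals an error;
# capture the message here instead of letting it stop the script.
res <- tryCatch(seq(3, 2, by = 1), error = function(e) conditionMessage(e))
print(res)

# Hypothetical helper: positions 3..(L - 2), or integer(0) when L < 5.
# seq_len(n) is 1..n for n >= 1 and integer(0) for n == 0, so shifting it
# by 2 gives a forward-only range that can never run backwards.
middle_idx <- function(L) seq_len(max(0L, L - 4L)) + 2L

middle_idx(4)  # integer(0)
middle_idx(7)  # 3 4 5
```

The max(0L, ...) guard is what replaces the if: negative counts are clamped to zero before seq_len() sees them.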

Here's a test case:

set.seed(15)
ragged_arrays <- lapply(ceiling(runif(5, 1, 5)), function(n) 1:n)  # lengths 2 to 5
# indexing the middle entries 3:(L-2), which counts backwards whenever L < 5
lapply(ragged_arrays, function(someseries) someseries[3:(length(someseries) - 2)])

For the purposes of this test case, somefunc can be any function that behaves gracefully when passed an empty vector, e.g. median(), which returns NA.
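Applied to the test case, forward-only seq_len() indexing hands median() a zero-length vector for every record shorter than 5. A second if-free variant drops two elements from each end using a negative n in tail() and head(). Both helper names (middle_seq, middle_ht) are hypothetical, and the sketch uses base R lapply()/vapply() in place of data.table grouping; the assumption is that inside data.table the same helper would simply be called in j, e.g. dt[, median(middle_seq(someseries)), by = id] for a hypothetical grouping column id.

```r
set.seed(15)
ragged_arrays <- lapply(ceiling(runif(5, 1, 5)), function(n) 1:n)

# Hypothetical helper A: forward-only positions 3..(L - 2), empty if L < 5.
middle_seq <- function(x) x[seq_len(max(0L, length(x) - 4L)) + 2L]

# Hypothetical helper B: tail(x, -2) drops the first two elements and
# head(., -2) drops the last two; each returns a zero-length vector when
# nothing is left.
middle_ht <- function(x) head(tail(x, -2L), -2L)

middles <- lapply(ragged_arrays, middle_seq)
str(middles)

# median() of a zero-length vector returns NA rather than an error;
# vapply() promotes integer medians to double to match numeric(1).
meds <- vapply(middles, median, numeric(1))
print(meds)
```

Since these records have lengths 2 to 5, every median is either NA (L < 5) or 3 (L = 5).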

smci

1 Answer


I'm assuming you want to drop the first two and last two elements.

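ragged_arrays <- lapply(1:7, seq_len)  # test vectors of lengths 1 to 7
# keep elements more than 2 positions from the start AND from the end
lapply(ragged_arrays, function(x) x[seq_along(x) > 2 & rev(seq_along(x)) > 2])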
jennybryan
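The logical mask in the answer generalizes to trimming any number of elements from each end. A minimal sketch, with drop_ends and its parameter k as hypothetical names (k = 2 reproduces the answer's behavior):

```r
# Hypothetical helper: keep only elements more than k positions away from
# both ends. For length(x) <= 2 * k the mask is all FALSE, so the result
# is a zero-length vector rather than a backwards slice.
drop_ends <- function(x, k = 2L) {
  pos <- seq_along(x)
  x[pos > k & rev(pos) > k]
}

ragged_arrays <- lapply(1:7, seq_len)
trimmed <- lapply(ragged_arrays, drop_ends)
str(trimmed)
```

The mask costs two length-L comparisons per record, which should be negligible for L up to 50; the seq_len()-based indexing gives the same result without building the mask.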