How to randomly divide interval into non overlapping, spaced bins of equal length

Question

I have an interval, for example from 1 to 671. I would like to divide it into 5 random non-overlapping bins of length 50 that are also spaced at least 51 apart.

interval <- 1:671  # example; it does not need to be 671

Result (this is only an example; the bins should be random, but within the interval, of equal length, and spaced as defined):

bin1 <- 3:52
bin2 <- 103:152
bin3 <- 209:258
bin4 <- 425:474
bin5 <- 610:659

I would prefer the output to be a data frame (bin, startOfbin, endOfbin), but other types such as a list would also be OK.

I am currently writing a function in R that would use this sampling for a large number of intervals, and I cannot come up with a sensible solution. Thank you in advance.

You say `bins of length 50 but also spaced with min 51`, which is it? Why doesn't the first bin start at 1? — user2974951, Sep 11 '19 at 12:52
I mean each bin has length 50 (for example 1, 2, 3, 4, ..., 50), but the next bin should start no earlier than 101. It does not have to start exactly at 101; it can start anywhere at random after that. Yes, the first bin can start at 1, but again it does not have to. — fattel, Sep 11 '19 at 12:54

T. Ewen · Accepted Answer · 2019-09-12T12:13:47.870

If I understand your problem correctly, you want 5 parts of your interval, each of length 50, with a minimal distance of 51 between them.

So your randomness is in how much bigger each distance is than 51.

This means you first calculate how much space there really is to distribute:

intervalLength <- 671
nBins <- 5
binWidth <- 50
binMinDistance <- 51

spaceToDistribute <- intervalLength - (nBins * binWidth + (nBins - 1) * binMinDistance)

Then calculate a random splitting of this value:

distances <- diff(floor(c(0, sort(runif(nBins))) * spaceToDistribute))

Finally, construct your desired data.frame:

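# the leading 1 is the first position of the interval; without it the first
# bin could start at 0 whenever the first random distance is 0
startOfBin <- 1 + cumsum(distances) + (0:(nBins-1)) * (binWidth + binMinDistance)
result <- data.frame(bin = 1:nBins, startOfBin = startOfBin, endOfBin = startOfBin + binWidth - 1)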

It works well, thank you very much. There is only a tiny error in this line; it should be startOfBin, not startOfBins: result <- data.frame(bin = 1:nBins, startOfBin = startOfBin, endOfBin = startOfBins + 49) — fattel, Sep 11 '19 at 14:53
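
Since the question asks for something that can be applied to a large number of intervals, the steps of this answer could be wrapped in a function along the following lines. This is only a sketch: the name sample_bins and all of its arguments are hypothetical, it keeps this answer's reading of the distance (bin starts are at least binWidth + 51 apart), and it shifts the starts by the interval's first position so that no bin begins before the interval.

```r
# Hypothetical wrapper around the approach above; all names are assumptions.
sample_bins <- function(interval_start, interval_end,
                        n_bins = 5, bin_width = 50, min_distance = 51) {
  interval_length <- interval_end - interval_start + 1
  step <- bin_width + min_distance  # minimal start-to-start distance
  slack <- interval_length - (n_bins * bin_width + (n_bins - 1) * min_distance)
  if (slack < 0) stop("interval too short for the requested bins")

  # split the slack at sorted uniform cut points; each piece is an extra offset
  extra <- diff(floor(c(0, sort(runif(n_bins))) * slack))
  start <- interval_start + cumsum(extra) + (0:(n_bins - 1)) * step

  data.frame(bin = seq_len(n_bins),
             startOfBin = start,
             endOfBin = start + bin_width - 1)
}

# one data.frame per interval, here for three made-up intervals
intervals <- data.frame(from = c(1, 1, 1000), to = c(671, 900, 1600))
bins <- Map(sample_bins, intervals$from, intervals$to)
```

Map() returns a list with one data.frame per row of intervals; do.call(rbind, bins) would stack them into a single data.frame if that is more convenient.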

score 1 · Answer 2 · answered Sep 11 '19 at 13:50

I don't know if this has the desired kind of randomness:

interval <- 1:671 

set.seed(42)

repeat { #rejection sampling
  int <- list(interval)
  s <- integer(5) * NA

  for (i in 1:5) {
    #sample an interval from the list
    sel <- sample(length(int), 1)
    isel <- int[[sel]]

    #sample start value
    s[[i]] <- sample(head(isel,-49), 1)

    #remove sampled values from interval
    sp <-
      split(isel, findInterval(isel, c(0, s[[i]], s[[i]] + 50, Inf)))
    if (s[[i]] > isel[1] &&
        s[[i]] < length(isel) - 49)
      sp <- sp[-2]
    else
      if (s[[i]] == isel[1])
        sp <- sp[-1]
    else
      if (s[[i]] == length(isel) - 49)
        sp <- head(sp,-1)
    sp <- sp[lengths(sp) >= 50]
    int <- c(sp, int[-sel])

    #break out of for loop 
    #if not enough intervals of sufficient length left
    if (length(int) < 1) break
  }
  if (!anyNA(s)) break
}

s
#[1] 321  74 245 170 441

library(ggplot2)
ggplot(data.frame(s = s, e = s + 49), aes(x = s, xend = e, y = 0, yend = 0)) +
  geom_segment(size = 3) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  xlab("") + ylab("")

Yes, it is the kind of randomness I was looking for. It works well, thank you — fattel, Sep 11 '19 at 14:52
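
Whichever sampler is used, it may be worth checking the result against the constraints from the question. A minimal sketch of such a check follows; the helper name bins_are_valid and its arguments are hypothetical, and it assumes the question's reading of the spacing, namely that each bin starts at least 51 after the previous bin's end (bin 1:50, next bin no earlier than 101).

```r
# Hypothetical helper: TRUE if all bins lie inside the interval and each bin
# starts at least `min_gap` after the end of the previous bin.
bins_are_valid <- function(starts, bin_width = 50, min_gap = 51,
                           interval_start = 1, interval_end = 671) {
  starts <- sort(starts)
  ends <- starts + bin_width - 1
  inside <- starts[1] >= interval_start && ends[length(ends)] <= interval_end
  gaps <- starts[-1] - ends[-length(ends)]  # next start minus previous end
  inside && all(gaps >= min_gap)
}

bins_are_valid(c(3, 103, 209, 425, 610))   # example bins from the question: TRUE
bins_are_valid(c(321, 74, 245, 170, 441))  # start values printed above: FALSE
```

Under this reading the second call returns FALSE because, for instance, the bin starting at 170 begins only 47 after the end of the bin starting at 74 (170 - 123). A check like this could be added to the exit condition of a rejection loop if the spacing has to be enforced.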

score 0 · Answer 3 · answered Sep 11 '19 at 14:56

Something like this could work:

set.seed(111)

n_bins <- 5
bl <- 50
spacing <- 51

start <- 1
end <- 671


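# latest position at which the first bin can start so that all bins still fit
end_int <- end - n_bins*bl - (n_bins-1)*spacing + 1
first_bin_start <- sample(start:end_int, 1)
first_bin_end <- first_bin_start + bl - 1
# slack left after the first bin and the minimal layout of the remaining bins
avail_spacing <- end - first_bin_end - (n_bins-1)*bl - (n_bins-1)*spacing

sp <- c()
for (i in 1:(n_bins-1)){
  # extra gap before the next bin; 0:avail_spacing stays valid when no slack is left
  extra <- sample(0:avail_spacing, 1)
  sp <- c(sp, extra)
  avail_spacing <- avail_spacing - extra
}


# each bin starts bl + spacing + its extra gap after the previous one
bin_start <- c(first_bin_start, first_bin_start + cumsum(spacing + bl + sp))
bin_end <- bin_start + bl - 1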

df <- data.frame(bin = 1:n_bins,
                 bin_start = bin_start,
                 bin_end = bin_end)

df
