I have an interval, for example from 1 to 671. I would like to divide it into 5 random non-overlapping bins of length 50 that are also spaced at least 51 apart.

interval <- 1:671  # example; it does not need to be 671

Result (this is only an example; the bins should be random but within the interval, of equal length, and spaced as defined):

bin1 <- 3:52
bin2 <- 103:152
bin3 <- 209:258
bin4 <- 425:474
bin5 <- 610:659

I would prefer the output to be a data frame (bin, startOfbin, endOfbin), but other types such as a list would also be fine.

I am currently writing a function in R that would use this sampling for a large number of intervals, and I cannot come up with a sensible solution. Thank you in advance.

fattel
  • You say `bins of length 50 but also spaced with min 51`, which is it? Why doesn't the first bin start at 1? – user2974951 Sep 11 '19 at 12:52
  • I mean each bin is of length 50 (for example 1,2,3,4....50), but the next bin should start not earlier than 101 but not necessarily at 101 but somewhere randomly. Yes, the first bin can start at 1, but again it does not have to be at 1. – fattel Sep 11 '19 at 12:54
  • This won't be easy. Does it have to be completely random? – user2974951 Sep 11 '19 at 13:46

3 Answers

If I understand your problem correctly, you want 5 parts of your interval, each of length 50, with a minimal distance of 51 between them.

So your randomness is in how much bigger each distance is than 51.

This means you calculate how much space there really is to distribute.

intervalLength <- 671
nBins <- 5
binWidth <- 50
binMinDistance <- 51

spaceToDistribute <- intervalLength - (nBins * binWidth + (nBins - 1) * binMinDistance)

Then calculate a random split of this value:

distances <- diff(floor(c(0, sort(runif(nBins))) * spaceToDistribute))

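and construct your desired data.frame:

# Shift by 1 so the first bin cannot start at 0, which lies outside 1:intervalLength;
# bins then start every binWidth + binMinDistance (= 101) positions plus the random extra.
startOfBin <- 1 + cumsum(distances) + (0:(nBins-1)) * (binWidth + binMinDistance)
result <- data.frame(bin = 1:nBins, startOfBin = startOfBin, endOfBin = startOfBin + binWidth - 1)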
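Since the question mentions applying this to a large number of intervals, the steps above could be wrapped in a function. This is only a rough sketch: the name `sampleBins` and its arguments `intervalStart` and `intervalEnd` are made up for illustration, and it assumes the interval is a run of consecutive integers.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer).
sampleBins <- function(intervalStart, intervalEnd, nBins = 5,
                       binWidth = 50, binMinDistance = 51) {
  intervalLength <- intervalEnd - intervalStart + 1
  spaceToDistribute <- intervalLength -
    (nBins * binWidth + (nBins - 1) * binMinDistance)
  if (spaceToDistribute < 0) {
    stop("interval is too short for the requested bins")
  }

  # Random extra offsets: differences of sorted, floored uniform cut points
  cuts <- floor(c(0, sort(runif(nBins))) * spaceToDistribute)
  distances <- diff(cuts)

  # Minimal layout (bins every binWidth + binMinDistance) plus the random extras
  startOfBin <- intervalStart + cumsum(distances) +
    (0:(nBins - 1)) * (binWidth + binMinDistance)

  data.frame(bin = seq_len(nBins),
             startOfBin = startOfBin,
             endOfBin = startOfBin + binWidth - 1)
}

set.seed(1)
sampleBins(1, 671)
```

For many intervals, something like `lapply(intervalEnds, function(e) sampleBins(1, e))` would return one data frame per interval, where `intervalEnds` stands for whatever vector of interval end points you have.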
T. Ewen
  • It works well thank you very much. Only there is tiny error in this line, there should be startOfBin not startOfBins: result <- data.frame(bin = 1:nBins, startOfBin = startOfBin, endOfBin = startOfBins + 49) – fattel Sep 11 '19 at 14:53

I don't know if this has the desired kind of randomness:

interval <- 1:671 

set.seed(42)

repeat { #rejection sampling
  int <- list(interval)
  s <- integer(5) * NA

  for (i in 1:5) {
    #sample an interval from the list
    sel <- sample(length(int), 1)
    isel <- int[[sel]]

    #sample start value
    s[[i]] <- sample(head(isel,-49), 1)

    #remove sampled values from interval
    sp <- split(isel, findInterval(isel, c(0, s[[i]], s[[i]] + 50, Inf)))
    if (s[[i]] > isel[1] &&
        s[[i]] < length(isel) - 49) {
      sp <- sp[-2]
    } else if (s[[i]] == isel[1]) {
      sp <- sp[-1]
    } else if (s[[i]] == length(isel) - 49) {
      sp <- head(sp, -1)
    }
    sp <- sp[lengths(sp) >= 50]
    int <- c(sp, int[-sel])

    #break out of for loop 
    #if not enough intervals of sufficient length left
    if (length(int) < 1) break
  }
  if (!anyNA(s)) break
}

s
#[1] 321  74 245 170 441
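To see whether a sampled set of starts meets the constraints from the question, a small check can help. This is a sketch only: `checkBins` is a made-up name, and it assumes the first answer's reading of the minimum distance, i.e. at least `binMinDistance` positions lie strictly between neighbouring bins.

```r
# Hypothetical helper: checks a vector of bin starts against the stated constraints.
checkBins <- function(starts, lower = 1, upper = 671,
                      binWidth = 50, binMinDistance = 51) {
  starts <- sort(starts)
  ends <- starts + binWidth - 1
  # number of positions strictly between neighbouring bins
  gaps <- starts[-1] - ends[-length(ends)] - 1
  c(insideInterval = all(starts >= lower & ends <= upper),
    nonOverlapping = all(gaps >= 0),
    minDistanceMet = all(gaps >= binMinDistance))
}

checkBins(c(321, 74, 245, 170, 441))
```

For the start values printed above, the bins lie inside the interval and do not overlap, but the smallest gap between neighbouring bins is 25, so the minimum distance of 51 is not met.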

library(ggplot2)
ggplot(data.frame(s = s, e = s + 49), aes(x = s, xend = e, y = 0, yend = 0)) +
  geom_segment(size = 3) +
  theme_minimal() +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        panel.grid.major.y = element_blank()) +
  xlab("") + ylab("")

resulting plot

Roland

Something like this could work:

set.seed(111)

n_bins <- 5
bl <- 50
spacing <- 51

start <- 1
end <- 671


end_int <- end - n_bins*bl - (n_bins-1)*spacing
first_bin_start <- sample(start:end_int, 1)
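first_bin_end <- first_bin_start + bl - 1  # a bin of length bl ends at start + bl - 1
avail_spacing <- end - first_bin_end - (n_bins-1)*bl - (n_bins-1)*spacing

# draw the extra spacing before each remaining bin from 0:avail_spacing;
# sample.int() avoids the 1:0 pitfall when no spare room is left
sp <- integer(n_bins - 1)
for (i in seq_len(n_bins - 1)) {
  sp[i] <- sample.int(avail_spacing + 1, 1) - 1
  avail_spacing <- avail_spacing - sp[i]
}


bin_start <- c(first_bin_start, first_bin_start + cumsum(spacing + bl + sp))
bin_end <- bin_start + bl - 1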

df <- data.frame(bin = 1:n_bins,
                 bin_start = bin_start,
                 bin_end = bin_end)

df
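As a possible loop-free variant of the same idea, the spare positions could be split in one step with `rmultinom()` from base R. This swaps in a different technique from the loop above, and it is not claimed to be uniform over all valid layouts; `spare`, `extra` and `df2` are names introduced here for illustration.

```r
set.seed(111)

n_bins <- 5
bl <- 50
spacing <- 51
start <- 1
end <- 671

# positions left over once the bins and the minimum spacings are placed
spare <- (end - start + 1) - (n_bins * bl + (n_bins - 1) * spacing)
stopifnot(spare >= 0)

# split the spare positions over n_bins + 1 slots:
# before the first bin, between neighbouring bins, and after the last bin
extra <- as.vector(rmultinom(1, size = spare, prob = rep(1, n_bins + 1)))

bin_start <- start + cumsum(extra[seq_len(n_bins)]) +
  (seq_len(n_bins) - 1) * (bl + spacing)

df2 <- data.frame(bin = seq_len(n_bins),
                  bin_start = bin_start,
                  bin_end = bin_start + bl - 1)
df2
```

Because the last slot absorbs whatever is left, the final bin cannot run past `end`.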
slava-kohut