8

I have a vector with around 4000 values. I would just need to bin it into 60 equal intervals for which I would then have to calculate the median (for each of the bins).

v<-c(1:4000)

V is really just a vector. I read about cut but that needs me to specify the breakpoints. I just want 60 equal intervals

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user3419669
  • 293
  • 2
  • 4
  • 11

3 Answers3

18

Use cut and tapply:

> tapply(v, cut(v, 60), median)
          (-3,67.7]          (67.7,134]           (134,201]           (201,268] 
               34.0               101.0               167.5               234.0 
          (268,334]           (334,401]           (401,468]           (468,534] 
              301.0               367.5               434.0               501.0 
          (534,601]           (601,668]           (668,734]           (734,801] 
              567.5               634.0               701.0               767.5 
          (801,867]           (867,934]         (934,1e+03]    (1e+03,1.07e+03] 
              834.0               901.0               967.5              1034.0 
(1.07e+03,1.13e+03]  (1.13e+03,1.2e+03]  (1.2e+03,1.27e+03] (1.27e+03,1.33e+03] 
             1101.0              1167.5              1234.0              1301.0 
 (1.33e+03,1.4e+03]  (1.4e+03,1.47e+03] (1.47e+03,1.53e+03]  (1.53e+03,1.6e+03] 
             1367.5              1434.0              1500.5              1567.0 
 (1.6e+03,1.67e+03] (1.67e+03,1.73e+03]  (1.73e+03,1.8e+03]  (1.8e+03,1.87e+03] 
             1634.0              1700.5              1767.0              1834.0 
(1.87e+03,1.93e+03]    (1.93e+03,2e+03]    (2e+03,2.07e+03] (2.07e+03,2.13e+03] 
             1900.5              1967.0              2034.0              2100.5 
 (2.13e+03,2.2e+03]  (2.2e+03,2.27e+03] (2.27e+03,2.33e+03]  (2.33e+03,2.4e+03] 
             2167.0              2234.0              2300.5              2367.0 
 (2.4e+03,2.47e+03] (2.47e+03,2.53e+03]  (2.53e+03,2.6e+03]  (2.6e+03,2.67e+03] 
             2434.0              2500.5              2567.0              2634.0 
(2.67e+03,2.73e+03]  (2.73e+03,2.8e+03]  (2.8e+03,2.87e+03] (2.87e+03,2.93e+03] 
             2700.5              2767.0              2833.5              2900.0 
   (2.93e+03,3e+03]    (3e+03,3.07e+03] (3.07e+03,3.13e+03]  (3.13e+03,3.2e+03] 
             2967.0              3033.5              3100.0              3167.0 
 (3.2e+03,3.27e+03] (3.27e+03,3.33e+03]  (3.33e+03,3.4e+03]  (3.4e+03,3.47e+03] 
             3233.5              3300.0              3367.0              3433.5 
(3.47e+03,3.53e+03]  (3.53e+03,3.6e+03]  (3.6e+03,3.67e+03] (3.67e+03,3.73e+03] 
             3500.0              3567.0              3633.5              3700.0 
 (3.73e+03,3.8e+03]  (3.8e+03,3.87e+03] (3.87e+03,3.93e+03]    (3.93e+03,4e+03] 
             3767.0              3833.5              3900.0              3967.0
Thomas
  • 43,637
  • 12
  • 109
  • 140
4

In the past, i've used this function

evenbins <- function(x, bin.count=10, order=T) {
    bin.size <- rep(length(x) %/% bin.count, bin.count)
    bin.size <- bin.size + ifelse(1:bin.count <= length(x) %% bin.count, 1, 0)
    bin <- rep(1:bin.count, bin.size)
    if(order) {    
        bin <- bin[rank(x,ties.method="random")]
    }
    return(factor(bin, levels=1:bin.count, ordered=order))
}

and then i can run it with

v.bin <- evenbins(v, 60)

and check the sizes with

table(v.bin)

and see they all contain 66 or 67 elements. By default this will order the values just like cut will so each of the factor levels will have increasing values. If you want to bin them based on their original order,

v.bin <- evenbins(v, 60, order=F)

instead. This just split the data up in the order it appears

MrFlick
  • 195,160
  • 17
  • 277
  • 295
  • could you just briefly expand how I calculate the mean for each bin and then sore it in a seperate vector?? – user3419669 Jun 23 '14 at 06:53
  • If you have a factor of bins, you would use `v.bin.means <- tapply(v, v.bin, mean)` – MrFlick Jun 23 '14 at 06:54
  • i do this for two vectors and have the vectors with the means (after the tapply step). They both have equal length and I would like to use 1 vector as x axis . I tried plot(vec1,vec2, type="l") but then I get multiple lines... – user3419669 Jun 23 '14 at 08:02
  • @user3419669 That sounds like a different problem. If you have a new question, you should start a different thread so others can as as well. – MrFlick Jun 23 '14 at 12:58
2

This result shows the 59 median values of the break-points. The 60 bin values are probably as close to equal as possible (but probably not exactly equal).

> sq <- seq(1, 4000, length = 60)
> sapply(2:length(sq), function(i) median(c(sq[i-1], sq[i])))
# [1]   34.88983  102.66949  170.44915  238.22881  306.00847  373.78814
# [7]  441.56780  509.34746  577.12712  644.90678  712.68644  780.46610
#  ......

Actually, after checking, the bins are pretty darn close to being equal.

> unique(diff(sq))
# [1] 67.77966 67.77966 67.77966
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245