From the `cut2` helpfile:

> Function like cut but left endpoints are inclusive and labels are of the form [lower, upper), except that last interval is [lower,upper]. If cuts are given, will by default make sure that cuts include entire range of x.

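To see that interval convention without installing the Hmisc package that provides `cut2`, here is a minimal base-R sketch. `cut` with `right = FALSE` builds left-closed intervals, and `include.lowest = TRUE` then closes the last one on the right. This only mimics the labels the helpfile describes; it is not `cut2` itself, and the vector and breaks are made up for illustration.

```r
x <- 0:10
# right = FALSE: intervals have the form [lower, upper)
# include.lowest = TRUE together with right = FALSE: the *last* interval also includes its upper end
f <- cut(x, breaks = c(0, 5, 10), right = FALSE, include.lowest = TRUE)
levels(f)   # "[0,5)" "[5,10]"
table(f)    # 0-4 fall into the first bin (5 values), 5-10 into the second (6 values)
```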
So `cut2` is basically `cut` with a few different defaults. Let's look at `cut`, then.
From the `cut` helpfile:

> cut divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.

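As a quick check of the "level one, level two" part, a small made-up vector can be cut into two equal-width pieces; `as.integer` on the resulting factor returns the level number assigned to each value.

```r
x <- c(0, 1, 9, 10)
f <- cut(x, breaks = 2)   # two intervals of equal width spanning range(x)
as.integer(f)             # level number of each value: the leftmost interval is level 1
#> [1] 1 1 2 2
```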
From the `quantile` helpfile:

> The generic function quantile produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.

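A small made-up example of those endpoints: probability 0 returns the smallest observation, probability 1 the largest, and 0.5 the median. (R's default is `type = 7`, which interpolates between neighbouring order statistics.)

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
q <- quantile(x, probs = c(0, 0.5, 1))
q   # named vector: 0% = 1, 50% = 3.5, 100% = 9
# The outer values are the smallest and largest observations, the middle one is the median
all.equal(unname(q), c(min(x), median(x), max(x)))
#> [1] TRUE
```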
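In other words, `cut` cuts the range of `x` (into equal-width intervals when `breaks` is a single number), while `quantile` cuts the "frequency" of `x`: the p-quantile is the point below which a share p of the observations lies, so the default quartiles enclose roughly a quarter of the data each.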
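A minimal sketch of that distinction on a made-up, skewed vector. The four equal-width intervals `cut(x, 4)` would use are, up to the small widening of the outer limits that `cut` applies, `seq(min(x), max(x), length.out = 5)`; the quartiles instead follow where the data actually are.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 100)   # most values small, one large outlier

# Breakpoints cutting the *range* into 4 equal-width pieces
seq(min(x), max(x), length.out = 5)   # 1.00 25.75 50.50 75.25 100.00

# Breakpoints cutting the *frequency*: about 25% of the observations between each pair
quantile(x)                           # 1.00 2.75 4.50 6.25 100.00
```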
An illustration:
```r
out  <- 0:100
out2 <- c(seq(0, 50, 0.001), 51:100)
```

Both have the same range, from 0 to 100.
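A quick check of that claim: the two vectors share the same range but contain very different numbers of observations.

```r
out  <- 0:100
out2 <- c(seq(0, 50, 0.001), 51:100)
range(out)    # 0 100
range(out2)   # 0 100
length(out)   # 101
length(out2)  # 50051: 50001 points from 0 to 50 in steps of 0.001, plus the 50 values 51:100
```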
```r
levels(cut(out, 4, include.lowest = TRUE))
#> [1] "[-0.1,25]" "(25,50]"   "(50,75]"   "(75,100]"
levels(cut(out2, 4, include.lowest = TRUE))
#> [1] "[-0.1,25]" "(25,50]"   "(50,75]"   "(75,100]"
```
But `out2` holds far more data points (50,051 against 101), and almost all of them lie between 0 and 50. So the two vectors do not have the same frequencies along the range:
```r
quantile(out)
#>   0%  25%  50%  75% 100%
#>    0   25   50   75  100
quantile(out2)
#>       0%      25%      50%      75%     100%
#>   0.0000  12.5125  25.0250  37.5375 100.0000
```
This is the difference between `cut` and `quantile`.
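To see where a number like 12.5125 comes from: with R's default `type = 7`, the p-quantile of n sorted values sits at position h = (n - 1)p + 1, interpolating linearly between the neighbouring order statistics. A sketch reproducing the 25% value for `out2` by hand:

```r
out2 <- c(seq(0, 50, 0.001), 51:100)
x <- sort(out2)
n <- length(x)                 # 50051
p <- 0.25
h <- (n - 1) * p + 1           # 12513.5: between the 12513th and 12514th sorted values
j <- floor(h)
q <- x[j] + (h - j) * (x[j + 1] - x[j])  # halfway between 12.512 and 12.513
q                              # 12.5125
all.equal(q, unname(quantile(out2, 0.25)))  # TRUE
```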
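The `out` example also shows when the two agree, namely for uniformly distributed data. `0:100` is spread evenly over the range from 0 to 100, so its quartiles (0, 25, 50, 75, 100) coincide with the equal-width breakpoints of `cut`, apart from the 0.1% of the range by which `cut` moves the outer limits (hence the -0.1 in the first label).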
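A sketch of the uniform case with simulated data (the seed and sample size are arbitrary choices): for a large uniform sample the quartiles land close to the equal-width breakpoints, and the equal-width bins of `cut` each hold roughly a quarter of the observations.

```r
set.seed(1)
u <- runif(1e5)   # 100,000 draws, uniform on [0, 1]

quantile(u)       # close to 0, 0.25, 0.5, 0.75, 1
table(cut(u, 4))  # four equal-width bins, each with roughly 25,000 values
```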
To make the difference even more visible, count the observations in each `cut` bin:
```r
outdf  <- data.frame(out = out,  cut = cut(out,  4, include.lowest = TRUE))
out2df <- data.frame(out = out2, cut = cut(out2, 4, include.lowest = TRUE))
table(outdf$cut)
#> [-0.1,25]   (25,50]   (50,75]  (75,100]
#>        26        25        25        25
table(out2df$cut)
#> [-0.1,25]   (25,50]   (50,75]  (75,100]
#>     25001     25000        25        25
```
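Both vectors are cut into the same four equal-width bins, but `out2` puts almost all of its observations into the first two, whereas `out` fills all four about equally.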
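Conversely, if you want bins with (roughly) equal frequencies, one option is to hand the quantiles to `cut` as breakpoints. A sketch on the same `out2`; `include.lowest = TRUE` is needed so that the minimum, which equals the first break, is not dropped as `NA`:

```r
out2 <- c(seq(0, 50, 0.001), 51:100)
# Quartiles of out2 as breakpoints: 0, 12.5125, 25.025, 37.5375, 100
out2_q <- cut(out2, breaks = quantile(out2), include.lowest = TRUE)
table(out2_q)   # each bin now holds about 12,500 observations
```

Note that this stops with an error if heavy ties make two quantiles coincide, because `cut` requires unique breaks.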