0

I tried to classify my grouped data into quartiles, therefore adding a column diam_quart to the data frame z assigning each row one of the four classes 1, 2, 3, or 4:

quart = ddply(z, .(Code), transform,
    diam_quart =
    ifelse(Diameter <= quantile(Diameter , 0.25), 1, 
    ifelse(Diameter <= quantile(Diameter , 0.5), 2, 
    ifelse(Diameter <= quantile(Diameter , 0.75), 3, 4)))
)

However, if I check the results, I don't get equal frequencies in each quartile class. I used this to check:

x <- ddply(quart, .(Code, diam_quart), summarize, 
    Diam_quartile = mean(diam_quart),
    Frequency = length(Diameter))
x

Shouldn't each quartile class by definition contain the same number of rows? If this is true, could it be that the quartile function refers to the overall data set and not to each subset (defined by "code" in the ddply function)? Not sure where my logic is failing. Light anyone?

Here's the result that I get with my original data frame:

head(x, 16)
      Code Diam_quartile Frequency
1  T1iOgP1             1        26
2  T1iOgP1             2        22
3  T1iOgP1             3        21
4  T1iOgP1             4        23
5  T1iOgP2             1        11
6  T1iOgP2             2        12
7  T1iOgP2             3        10
8  T1iOgP2             4        11
9  T1iOgP3             1         5
10 T1iOgP3             2         5
11 T1iOgP3             3         4
12 T1iOgP3             4         5
13 T1iRgP1             1        15
14 T1iRgP1             2         9
15 T1iRgP1             3        10
16 T1iRgP1             4        12

EDIT: That's the first 200 rows of the relevant columns of my data frame:

dput(z)
structure(list(Code = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L), .Label = c("T1iOgP1", "T1iOgP2", "T1iOgP3", "T1iRgP1", "T1iRgP2", 
"T1iRgP3", "T1iRtP1", "T1iRtP2", "T1iRtP3", "T1rOgP1", "T1rOgP2", 
"T1rOgP3", "T1rRgP1", "T1rRgP2", "T1rRgP3", "T1rRtP1", "T1rRtP2", 
"T1rRtP3", "T1sOgP1", "T1sOgP2", "T1sOgP3", "T1sRgP1", "T1sRgP2", 
"T1sRgP3", "T1sRtP1", "T1sRtP2", "T1sRtP3", "T2iOgP1", "T2iOgP2", 
"T2iOgP3", "T2iRgP1", "T2iRgP2", "T2iRgP3", "T2iRtP1", "T2iRtP2", 
"T2iRtP3", "T2rOgP1", "T2rOgP2", "T2rOgP3", "T2rRgP1", "T2rRgP2", 
"T2rRgP3", "T2rRtP1", "T2rRtP2", "T2rRtP3", "T2sOgP1", "T2sOgP2", 
"T2sOgP3", "T2sRgP1", "T2sRgP2", "T2sRgP3", "T2sRtP1", "T2sRtP2", 
"T2sRtP3"), class = "factor"), Diameter = c(3.819718634, 2.705634033, 
2.705634033, 3.978873577, 5.092958179, 7.957747155, 2.228169203, 
1.114084602, 4.933803236, 7.480282325, 8.435211984, 7.639437268, 
2.228169203, 2.387324146, 2.06901426, 10.50422624, 8.435211984, 
4.456338407, 3.819718634, 2.228169203, 6.843662553, 4.13802852, 
2.546479089, 4.965634224, 14.48309982, 13.36901522, 2.06901426, 
2.06901426, 1.591549431, 2.705634033, 2.228169203, 2.546479089, 
2.228169203, 1.909859317, 2.387324146, 7.480282325, 1.909859317, 
3.183098862, 4.774648293, 9.390141642, 7.002817496, 7.480282325, 
4.456338407, 3.342253805, 10.50422624, 12.89155039, 3.819718634, 
8.435211984, 1.750704374, 11.30000096, 3.660563691, 3.501408748, 
1.750704374, 1.591549431, 10.66338119, 3.501408748, 1.273239545, 
2.228169203, 11.93662073, 3.183098862, 3.501408748, 1.750704374, 
1.591549431, 1.273239545, 1.750704374, 12.09577567, 3.978873577, 
2.705634033, 2.228169203, 3.501408748, 3.183098862, 1.432394488, 
10.66338119, 1.432394488, 1.750704374, 2.228169203, 1.591549431, 
1.432394488, 2.546479089, 2.387324146, 1.114084602, 2.546479089, 
3.342253805, 3.978873577, 1.273239545, 1.273239545, 4.61549335, 
4.13802852, 0.795774715, 7.798592212, 1.273239545, 2.06901426, 
4.297183463, 4.297183463, 24.98732607, 6.207042781, 7.957747155, 
3.023943919, 1.432394488, 5.252113122, 7.002817496, 3.819718634, 
5.729577951, 18.97126922, 20.21267777, 3.978873577, 2.864788976, 
1.750704374, 10.66338119, 6.366197724, 19.73521294, 5.729577951, 
3.023943919, 12.41408556, 3.501408748, 21.16760743, 10.50422624, 
2.228169203, 9.071831756, 11.77746579, 8.435211984, 6.207042781, 
30.39859413, 8.912676813, 6.525352667, 1.909859317, 2.705634033, 
20.37183272, 3.501408748, 5.888732894, 14.32394488, 7.321127382, 
7.321127382, 3.023943919, 2.546479089, 3.342253805, 5.888732894, 
2.06901426, 1.782535363, 4.965634224, 5.092958179, 14.32394488, 
10.66338119, 16.55211408, 5.570423008, 2.228169203, 10.3450713, 
2.864788976, 10.18591636, 4.456338407, 8.75352187, 6.68450761, 
8.594366927, 1.909859317, 19.89436789, 1.591549431, 1.432394488, 
1.750704374, 1.273239545, 1.273239545, 1.909859317, 2.546479089, 
0.954929659, 2.705634033, 2.06901426, 0.954929659, 1.114084602, 
1.273239545, 1.273239545, 1.273239545, 1.273239545, 1.909859317, 
1.432394488, 1.273239545, 1.273239545, 1.909859317, 1.750704374, 
5.252113122, 1.273239545, 3.501408748, 2.546479089, 7.161972439, 
2.228169203, 1.909859317, 2.387324146, 4.456338407, 1.591549431, 
3.501408748, 1.273239545, 1.750704374, 1.909859317, 2.705634033, 
3.342253805, 1.909859317, 1.750704374, 2.06901426, 2.228169203, 
2.546479089, 1.273239545, 1.750704374), Diam_quartile = c(3, 
2, 2, 3, 4, 4, 2, 1, 3, 4, 4, 4, 2, 2, 1, 4, 4, 3, 3, 2, 4, 3, 
2, 3, 4, 4, 1, 1, 1, 2, 2, 2, 2, 1, 2, 4, 1, 2, 3, 4, 4, 4, 3, 
3, 4, 4, 3, 4, 1, 4, 3, 3, 1, 1, 4, 3, 1, 2, 4, 2, 3, 1, 1, 1, 
1, 4, 3, 2, 2, 3, 2, 1, 4, 1, 1, 2, 1, 1, 2, 2, 1, 2, 3, 3, 1, 
1, 3, 3, 1, 4, 1, 1, 2, 2, 4, 2, 3, 1, 1, 2, 3, 2, 2, 4, 4, 2, 
1, 1, 4, 3, 4, 2, 1, 4, 2, 4, 3, 1, 3, 4, 3, 2, 4, 3, 3, 1, 1, 
4, 2, 2, 4, 3, 3, 1, 1, 1, 2, 1, 1, 2, 2, 4, 4, 4, 2, 1, 4, 1, 
3, 2, 3, 3, 3, 1, 4, 2, 2, 2, 1, 1, 3, 4, 1, 4, 3, 1, 1, 1, 1, 
1, 1, 3, 2, 1, 1, 3, 2, 4, 1, 4, 4, 4, 3, 3, 4, 4, 2, 4, 1, 2, 
3, 4, 4, 3, 2, 3, 3, 4, 1, 2)), .Names = c("Code", "Diameter", 
"Diam_quartile"), row.names = c(NA, 200L), class = "data.frame")

EDIT II: Full result of x:

dput(x)
structure(list(Code = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 
13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 20L, 
20L, 20L, 20L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 23L, 23L, 
23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 
26L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 
30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 32L, 32L, 32L, 32L, 33L, 
33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 36L, 36L, 
36L, 36L, 37L, 37L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 39L, 39L, 
39L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L, 
43L, 43L, 43L, 43L, 44L, 44L, 44L, 44L, 45L, 45L, 45L, 45L, 46L, 
46L, 46L, 46L, 47L, 47L, 47L, 47L, 48L, 48L, 48L, 48L, 49L, 49L, 
49L, 49L, 50L, 50L, 50L, 50L, 51L, 51L, 51L, 51L, 52L, 52L, 52L, 
52L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L), .Label = c("T1iOgP1", 
"T1iOgP2", "T1iOgP3", "T1iRgP1", "T1iRgP2", "T1iRgP3", "T1iRtP1", 
"T1iRtP2", "T1iRtP3", "T1rOgP1", "T1rOgP2", "T1rOgP3", "T1rRgP1", 
"T1rRgP2", "T1rRgP3", "T1rRtP1", "T1rRtP2", "T1rRtP3", "T1sOgP1", 
"T1sOgP2", "T1sOgP3", "T1sRgP1", "T1sRgP2", "T1sRgP3", "T1sRtP1", 
"T1sRtP2", "T1sRtP3", "T2iOgP1", "T2iOgP2", "T2iOgP3", "T2iRgP1", 
"T2iRgP2", "T2iRgP3", "T2iRtP1", "T2iRtP2", "T2iRtP3", "T2rOgP1", 
"T2rOgP2", "T2rOgP3", "T2rRgP1", "T2rRgP2", "T2rRgP3", "T2rRtP1", 
"T2rRtP2", "T2rRtP3", "T2sOgP1", "T2sOgP2", "T2sOgP3", "T2sRgP1", 
"T2sRgP2", "T2sRgP3", "T2sRtP1", "T2sRtP2", "T2sRtP3"), class = "factor"), 
    Diam_quartile = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Frequency = c(26, 
    22, 21, 23, 11, 12, 10, 11, 5, 5, 4, 5, 15, 9, 10, 12, 18, 
    16, 13, 14, **28, 22, 17, 22**, 21, 17, 17, 19, 14, 12, 12, 13, 
    22, 14, 19, 16, 9, 9, 8, 9, 6, 6, 5, 6, 9, 8, 8, 9, 13, 8, 
    12, 9, 11, 11, 11, 11, 13, 13, 12, 13, 30, 19, 20, 17, 19, 
    14, 15, 15, 25, 23, 19, 22, 14, 16, 12, 14, 12, 11, 11, 11, 
    10, 9, 9, 9, 17, 7, 12, 12, 36, 13, 23, 20, 24, 15, 15, 18, 
    33, 30, 31, 30, 28, 30, 26, 27, 37, 30, 31, 33, 24, 20, 21, 
    22, 12, 6, 9, 9, 6, 5, 5, 6, 13, 13, 9, 12, 18, 14, 16, 16, 
    22, 21, 15, 20, 13, 12, 8, 11, 11, 11, 7, 10, 8, 7, 7, 7, 
    9, 8, 8, 9, 9, 8, 8, 8, 15, 15, 14, 13, 11, 3, 7, 7, 11, 
    11, 11, 11, 14, 12, 12, 13, 31, 40, 19, 25, 25, 23, 21, 21, 
    30, 18, 25, 21, 22, 9, 15, 16, 10, 9, 10, 8, 12, 13, 11, 
    12, 15, 13, 13, 14, 19, 16, 9, 15, 14, 9, 8, 10, 20, 18, 
    17, 19, 48, 35, 39, 39, 23, 22, 22, 22)), .Names = c("Code", 
"Diam_quartile", "Frequency"), row.names = c(NA, 216L), class = "data.frame")

Comparison of the results of the first calculation of x with a second one with the <= operator replaced by <:

    quart = ddply(dataframe, .(Code), transform,
    diam_quart =
        ifelse(Diameter < quantile(Diameter , 0.25), 1, 
        ifelse(Diameter < quantile(Diameter , 0.5), 2, 
        ifelse(Diameter < quantile(Diameter , 0.75), 3, 4)))

Whereby Diam_quartile is the result of the calculation with <= and diam_quart with < only

dput(x)
structure(list(Code = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 
10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 
13L, 14L, 14L, 14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 
17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 19L, 19L, 19L, 19L, 20L, 
20L, 20L, 20L, 21L, 21L, 21L, 21L, 22L, 22L, 22L, 22L, 23L, 23L, 
23L, 23L, 24L, 24L, 24L, 24L, 25L, 25L, 25L, 25L, 26L, 26L, 26L, 
26L, 27L, 27L, 27L, 27L, 28L, 28L, 28L, 28L, 29L, 29L, 29L, 29L, 
30L, 30L, 30L, 30L, 31L, 31L, 31L, 31L, 32L, 32L, 32L, 32L, 33L, 
33L, 33L, 33L, 34L, 34L, 34L, 34L, 35L, 35L, 35L, 35L, 36L, 36L, 
36L, 36L, 37L, 37L, 37L, 37L, 38L, 38L, 38L, 38L, 39L, 39L, 39L, 
39L, 40L, 40L, 40L, 40L, 41L, 41L, 41L, 41L, 42L, 42L, 42L, 42L, 
43L, 43L, 43L, 43L, 44L, 44L, 44L, 44L, 45L, 45L, 45L, 45L, 46L, 
46L, 46L, 46L, 47L, 47L, 47L, 47L, 48L, 48L, 48L, 48L, 49L, 49L, 
49L, 49L, 50L, 50L, 50L, 50L, 51L, 51L, 51L, 51L, 52L, 52L, 52L, 
52L, 53L, 53L, 53L, 53L, 54L, 54L, 54L, 54L), .Label = c("T1iOgP1", 
"T1iOgP2", "T1iOgP3", "T1iRgP1", "T1iRgP2", "T1iRgP3", "T1iRtP1", 
"T1iRtP2", "T1iRtP3", "T1rOgP1", "T1rOgP2", "T1rOgP3", "T1rRgP1", 
"T1rRgP2", "T1rRgP3", "T1rRtP1", "T1rRtP2", "T1rRtP3", "T1sOgP1", 
"T1sOgP2", "T1sOgP3", "T1sRgP1", "T1sRgP2", "T1sRgP3", "T1sRtP1", 
"T1sRtP2", "T1sRtP3", "T2iOgP1", "T2iOgP2", "T2iOgP3", "T2iRgP1", 
"T2iRgP2", "T2iRgP3", "T2iRtP1", "T2iRtP2", "T2iRtP3", "T2rOgP1", 
"T2rOgP2", "T2rOgP3", "T2rRgP1", "T2rRgP2", "T2rRgP3", "T2rRtP1", 
"T2rRtP2", "T2rRtP3", "T2sOgP1", "T2sOgP2", "T2sOgP3", "T2sRgP1", 
"T2sRgP2", "T2sRgP3", "T2sRtP1", "T2sRtP2", "T2sRtP3"), class = "factor"), 
    Diam_quartile = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4), Frequency = c(26, 
    22, 21, 23, 11, 12, 10, 11, 5, 5, 4, 5, 15, 9, 10, 12, 18, 
    16, 13, 14, 28, 22, 17, 22, 21, 17, 17, 19, 14, 12, 12, 13, 
    22, 14, 19, 16, 9, 9, 8, 9, 6, 6, 5, 6, 9, 8, 8, 9, 13, 8, 
    12, 9, 11, 11, 11, 11, 13, 13, 12, 13, 30, 19, 20, 17, 19, 
    14, 15, 15, 25, 23, 19, 22, 14, 16, 12, 14, 12, 11, 11, 11, 
    10, 9, 9, 9, 17, 7, 12, 12, 36, 13, 23, 20, 24, 15, 15, 18, 
    33, 30, 31, 30, 28, 30, 26, 27, 37, 30, 31, 33, 24, 20, 21, 
    22, 12, 6, 9, 9, 6, 5, 5, 6, 13, 13, 9, 12, 18, 14, 16, 16, 
    22, 21, 15, 20, 13, 12, 8, 11, 11, 11, 7, 10, 8, 7, 7, 7, 
    9, 8, 8, 9, 9, 8, 8, 8, 15, 15, 14, 13, 11, 3, 7, 7, 11, 
    11, 11, 11, 14, 12, 12, 13, 31, 40, 19, 25, 25, 23, 21, 21, 
    30, 18, 25, 21, 22, 9, 15, 16, 10, 9, 10, 8, 12, 13, 11, 
    12, 15, 13, 13, 14, 19, 16, 9, 15, 14, 9, 8, 10, 20, 18, 
    17, 19, 48, 35, 39, 39, 23, 22, 22, 22), diam_quart = c(1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 
    2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 
    1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 
    4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 
    3, 4, 1, 2, 3, 4), Frequency.1 = c(22, 23, 24, 23, 11, 10, 
    12, 11, 5, 4, 5, 5, 4, 15, 15, 12, 13, 16, 14, 18, **22, 20, 
    17, 30,** 10, 22, 23, 19, 11, 12, 15, 13, 12, 19, 19, 21, 9, 
    8, 9, 9, 6, 5, 6, 6, 9, 8, 8, 9, 10, 11, 8, 13, 11, 11, 11, 
    11, 13, 11, 14, 13, 21, 19, 17, 29, 12, 19, 15, 17, 17, 26, 
    21, 25, 14, 11, 17, 14, 11, 11, 11, 12, 9, 9, 9, 10, 10, 
    14, 12, 12, 21, 15, 32, 24, 15, 20, 19, 18, 26, 28, 32, 38, 
    28, 24, 26, 33, 26, 32, 40, 33, 17, 25, 23, 22, 8, 10, 9, 
    9, 6, 5, 5, 6, 11, 9, 15, 12, 9, 23, 16, 16, 13, 15, 30, 
    20, 10, 11, 12, 11, 9, 10, 10, 10, 3, 11, 7, 8, 9, 8, 8, 
    9, 8, 8, 8, 9, 13, 12, 17, 15, 5, 9, 7, 7, 11, 11, 11, 11, 
    12, 13, 13, 13, 18, 38, 22, 37, 12, 30, 24, 24, 22, 19, 25, 
    28, 15, 16, 15, 16, 8, 10, 9, 10, 12, 11, 13, 12, 12, 15, 
    14, 14, 14, 9, 21, 15, 9, 10, 9, 13, 15, 21, 19, 19, 38, 
    35, 45, 43, 22, 19, 23, 25)), .Names = c("Code", "Diam_quartile", 
"Frequency", "diam_quart", "Frequency.1"), row.names = c(NA, 
216L), class = "data.frame")
989
  • 12,579
  • 5
  • 31
  • 53
Brazza
  • 47
  • 6
  • 1
    Would you be able to use `dput(yourdataname)`? If you type that in your R Console, you will see some texts in the Console. That is your data frame. You can copy and paste it to your question. – jazzurro Mar 09 '15 at 02:51
  • It has 3348 rows, I'll shorten it, but it'll be a lot of data.. – Brazza Mar 09 '15 at 04:11

2 Answers2

1

I don't see anything wrong with your computation. Each quartile contains same number of elements only when you have a very large data set. Below I extracted Diameter for the first group.

table(findInterval(diameter, quantile(diameter, c(.25, .5, .75))))    
# 0  1  2  3 
#22 23 24 23 
sum(diameter < quantile(diameter, .25))
#[1] 22
sum(diameter <=quantile(diameter, .25))
#[1] 26

You'll get closer figures if you use < instead of <= in ifelse statements.

diameter <- c(3.819718634, 2.705634033, 2.705634033, 3.978873577, 5.092958179, 
7.957747155, 2.228169203, 1.114084602, 4.933803236, 7.480282325, 
8.435211984, 7.639437268, 2.228169203, 2.387324146, 2.06901426, 
10.50422624, 8.435211984, 4.456338407, 3.819718634, 2.228169203, 
6.843662553, 4.13802852, 2.546479089, 4.965634224, 14.48309982, 
13.36901522, 2.06901426, 2.06901426, 1.591549431, 2.705634033, 
2.228169203, 2.546479089, 2.228169203, 1.909859317, 2.387324146, 
7.480282325, 1.909859317, 3.183098862, 4.774648293, 9.390141642, 
7.002817496, 7.480282325, 4.456338407, 3.342253805, 10.50422624, 
12.89155039, 3.819718634, 8.435211984, 1.750704374, 11.30000096, 
3.660563691, 3.501408748, 1.750704374, 1.591549431, 10.66338119, 
3.501408748, 1.273239545, 2.228169203, 11.93662073, 3.183098862, 
3.501408748, 1.750704374, 1.591549431, 1.273239545, 1.750704374, 
12.09577567, 3.978873577, 2.705634033, 2.228169203, 3.501408748, 
3.183098862, 1.432394488, 10.66338119, 1.432394488, 1.750704374, 
2.228169203, 1.591549431, 1.432394488, 2.546479089, 2.387324146, 
1.114084602, 2.546479089, 3.342253805, 3.978873577, 1.273239545, 
1.273239545, 4.61549335, 4.13802852, 0.795774715, 7.798592212, 
1.273239545, 2.06901426)
ExperimenteR
  • 4,453
  • 1
  • 15
  • 19
  • I just didn't expect a difference of more than 2, there are difference like 9-15, 17-28, that seems a lot. So I did the same calculation with "<" only as you suggested and it changes quite a bit! I'll add the results to the post for comparison. Would you mind explaining what is the computational difference between "<" and "<=" ? – Brazza Mar 09 '15 at 12:55
  • 1
    `which(diameter == quantile(diameter, .25))` shows that there are 4 observations equal to 25% quantile point. Large discrepancies shows that there are many values tied at those quantile points. – ExperimenteR Mar 09 '15 at 13:05
  • Comparing the results I also noticed, that it didn't get "closer" to the actual quartile, just changed the proportion. E.g. what was the set "28, 22, 17, 22" is now "22, 20, 17, 30", with actually an even greater difference. Given the small size of the data subsets is that really the closest I get to the actual quartiles? – Brazza Mar 09 '15 at 13:09
  • Thanks ExperiementeR, that makes a lot of sense! That kind of gives me a reply to my other question, I'll have to live with those discrepancies, right? – Brazza Mar 09 '15 at 13:11
  • 1
    Take for example 6 observations `x <- c(1,5,5,5,5,5,10)`, 50% quantile is 5. `sum(x<5)=1`, `sum(x<=5)=6`. This can be radically different from the numbers suggested by the quantile if there are many ties. – ExperimenteR Mar 09 '15 at 13:11
  • Totally cleared it up. So it's not wrong, just have to decide where to put the ties. Thanks a lot! – Brazza Mar 09 '15 at 13:35
0

Just wanted to summarize what ExperimenteR helped me to understand: The calculation was correct. The large differences in numbers per quartile group result from small data subsets and relatively large proportions of data directly at the quartile limits. Depending on whether they are included into the lower or the higher quartile proportion, numbers can change significantly. See his comments for more detail. Thanks!

Brazza
  • 47
  • 6