Inconsistent ddply multiple quantiles by group

Question

I am trying to use ddply to summarize the median and 25th/75th percentiles of multiple groups in a relatively small data set. I am grouping the measured data points AUCINF_obs and Cmax by DoseWt (using R 4.0.4 in RStudio 1.3.1093 on Windows 10). The AUCINF_obs results agree whether I calculate them line by line (for DoseWt == 0.3) or with ddply and summarize. That is not the case for Cmax:

median(NCAtrim$Cmax[NCAtrim$DoseWt==0.3])
quantile(NCAtrim$Cmax[NCAtrim$DoseWt==0.3], 0.25)
quantile(NCAtrim$Cmax[NCAtrim$DoseWt==0.3], 0.75)

library(plyr)

NCA.by.Dose.25_75tile <- ddply(NCAtrim, .(DoseWt), summarize,
   AUC_inf = round(median(AUCINF_obs), 2),
   AUCinf25 = round(quantile(AUCINF_obs, 0.25), 2),
   AUCinf75 = round(quantile(AUCINF_obs, 0.75), 2),
   Cmax = round(median(Cmax), 2),
   Cmax_25 = round(quantile(Cmax, 0.25), 2),
   Cmax_75 = round(quantile(Cmax, 0.75), 2))
NCA.by.Dose.25_75tile

Can anyone explain why ddply with summarize does not give me the correct 25th and 75th percentiles for Cmax here, even though the 25th, 50th, and 75th percentiles for AUCINF_obs come out correctly? (I also tried quantile(Cmax, probs = 0.25).)

NCAtrim <- structure(list(Subject = c(103L, 103L, 103L, 105L, 105L, 107L, 
107L, 107L, 109L, 111L, 111L, 111L, 113L, 113L, 113L, 114L, 114L, 
114L, 117L, 117L, 117L, 124L, 124L, 124L, 126L, 126L, 126L, 127L, 
127L, 127L, 130L, 130L, 130L), DoseWt = c(0.3, 0.45, 0.6, 0.3, 
0.45, 0.3, 0.45, 0.6, 0.3, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 
0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 
0.45, 0.6, 0.3, 0.45, 0.6), AUCINF_obs = c(75.57957417, 104.7376298, 
193.1863023, 150.8553768, 231.6657641, 97.55371159, 153.2804929, 
213.179011, 90.84944244, 54.65739998, 93.3108462, 78.07527241, 
61.31713576, 89.91275385, 126.6723822, 94.02414615, 166.3379068, 
227.4162735, 98.84793101, 172.1750658, 149.2339892, 79.45304645, 
142.0389319, 171.7761067, 44.36951602, 86.64275743, 107.4389943, 
56.42917332, 112.4691754, 144.4193233, 87.22135293, 137.3190569, 
151.0853702), Cmax = c(17.2, 22.7, 54.1, 16, 43.3, 19.8, 35.1, 
48, 30.6, 12.4, 18.2, 16.4, 16, 27.8, 31.3, 14.5, 24.6, 37.6, 
15.3, 26, 27.7, 16.5, 24.3, 19.7, 11, 15.8, 43.2, 14.6, 29.8, 
35.6, 19, 38.1, 39)), class = "data.frame", row.names = c(NA, 
-33L))

score 0 · Accepted Answer · answered Mar 16 '21 at 02:42

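That is because the value of Cmax is changed when you run Cmax = round(median(Cmax), 2). summarize evaluates its arguments in order, and each new column is visible to the expressions after it. The next expression (Cmax_25 = round(quantile(Cmax, 0.25), 2)) therefore sees the new, single-valued Cmax instead of the original column. The quantile of a single number is that number, so Cmax_25 and Cmax_75 both come out equal to the median.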
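You can reproduce the effect in base R with a minimal sketch of the DoseWt == 0.3 group. The sketch uses the 12 Cmax values for that dose, taken from the data in the question. It replays the three expressions in order inside local(). This is only a simplified model of how summarize lets later columns see earlier ones. It is not plyr's actual code.

```r
# Cmax values for DoseWt == 0.3, copied from NCAtrim in the question
cmax_03 <- c(17.2, 16, 19.8, 30.6, 12.4, 16, 14.5, 15.3, 16.5, 11, 14.6, 19)

# Simplified model of summarize: expressions run in order in one scope,
# so a later expression sees any column an earlier one overwrote
res <- local({
  Cmax    <- cmax_03
  Cmax    <- round(median(Cmax), 2)          # Cmax is now a single number (16)
  Cmax_25 <- round(quantile(Cmax, 0.25), 2)  # quantile of that single number
  Cmax_75 <- round(quantile(Cmax, 0.75), 2)
  c(Cmax = Cmax, Cmax_25 = unname(Cmax_25), Cmax_75 = unname(Cmax_75))
})

# Percentiles of the original vector, for comparison
q_orig <- quantile(cmax_03, c(0.25, 0.75))

print(res)     # all three are 16
print(q_orig)  # about 14.575 and 17.65
```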

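You can move that line to the end so that it does not change the Cmax value used by the other expressions. Also, plyr is retired, so you may want to switch to dplyr. Its summarise() behaves the same way, with later expressions seeing earlier results, so the median is still computed last below.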

library(dplyr)

NCAtrim %>%
  group_by(DoseWt) %>%
  summarise(AUC_inf = round(median(AUCINF_obs),2), 
            AUCinf25 = round(quantile(AUCINF_obs, 0.25),2), 
            AUCinf75 = round(quantile(AUCINF_obs, 0.75),2),
            Cmax_25 = round(quantile(Cmax, 0.25), 2),
            Cmax_75 = round(quantile(Cmax, 0.75), 2),
            # median last: from here on, Cmax would refer to this single value
            Cmax = round(median(Cmax), 2)) -> NCA.by.Dose.25_75tile

NCA.by.Dose.25_75tile
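Another option avoids depending on argument order at all. It is not from the original answer, and it swaps in base R for plyr or dplyr. The idea is to compute all three percentiles with a single quantile() call per dose group, so no column is ever overwritten. The sketch below uses the DoseWt and Cmax columns copied from NCAtrim. pct_table is a hypothetical helper name.

```r
# DoseWt and Cmax columns copied from NCAtrim in the question
dose <- c(0.3, 0.45, 0.6, 0.3, 0.45, 0.3, 0.45, 0.6, 0.3, 0.3, 0.45, 0.6,
          0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6,
          0.3, 0.45, 0.6, 0.3, 0.45, 0.6, 0.3, 0.45, 0.6)
cmax <- c(17.2, 22.7, 54.1, 16, 43.3, 19.8, 35.1, 48, 30.6, 12.4, 18.2, 16.4,
          16, 27.8, 31.3, 14.5, 24.6, 37.6, 15.3, 26, 27.7, 16.5, 24.3, 19.7,
          11, 15.8, 43.2, 14.6, 29.8, 35.6, 19, 38.1, 39)

# Hypothetical helper: 25th, 50th and 75th percentiles of each element of a
# list of numeric vectors, returned as one row per element
pct_table <- function(groups) {
  t(sapply(groups, quantile, probs = c(0.25, 0.5, 0.75)))
}

# split() groups the values by dose level; rows are named "0.3", "0.45", "0.6"
cmax_by_dose <- round(pct_table(split(cmax, dose)), 2)
print(cmax_by_dose)
```

The same helper works for AUCINF_obs, for example round(pct_table(split(NCAtrim$AUCINF_obs, NCAtrim$DoseWt)), 2).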

Thanks for a concise (and kind) correction to my stupid mistake, and for the pointer to dplyr. — PHutson, Mar 17 '21 at 15:27
