
I have a dataset that looks like this (10,000+ rows):

P_ID SNUM RNUM X
ID_233 10 2 40.31
ID_233 10 3 23.21
ID_234 12 5 11.00
ID_234 12 6 0.31
ID_234 13 1 0.00
ID_235 10 2 66.23

From this dataset, I want to fit a Gamma distribution to the X values of each distinct P_ID (I am not testing how well the sampled data fits the distribution).

Using the fitdistrplus package, I can do this by extracting the X values for an individual P_ID into a vector, running it through fw <- fitdist(data, "gamma"), and then extracting the shape and rate estimates, but this is all very manual.
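The manual single-ID workflow described above might look roughly like this sketch. The vector x is simulated, hypothetical data standing in for one P_ID's X values; fitdist() fits by maximum likelihood by default and stores the fitted parameters in the named vector fw$estimate.

```r
library(fitdistrplus)

# Hypothetical X values for a single P_ID (simulated, not the question's data)
set.seed(1)
x <- rgamma(50, shape = 2, rate = 0.5)

# Fit a gamma distribution (maximum likelihood is fitdist's default method)
fw <- fitdist(x, "gamma")

# fw$estimate is a named numeric vector with elements "shape" and "rate"
g_shape <- fw$estimate[["shape"]]
g_rate  <- fw$estimate[["rate"]]
c(shape = g_shape, rate = g_rate)
```

Repeating this for every P_ID is exactly the manual step the tidyverse version is meant to replace.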

I would like a tidyverse method that goes from the data frame above to:

P_ID Distrib G_Shape G_Rate
ID_233 Gamma 1.21557116 0.09206639
ID_234 Gamma 3.23234542 0.34566432
ID_235 Gamma 2.34555553 0.92344521
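For reference, G_Shape and G_Rate correspond to the shape/rate parameterization that R's dgamma (and therefore fitdist with "gamma") uses. Writing the shape as α and the rate as β, the density, mean, and variance are:

$$
f(x;\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \quad x > 0,
\qquad
\mathbb{E}[X] = \frac{\alpha}{\beta},
\qquad
\operatorname{Var}(X) = \frac{\alpha}{\beta^{2}}
$$

The rate is the reciprocal of the scale parameter, so a scale-based fit can be converted with β = 1/scale.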

How would I achieve this with the tidyverse and pipes, without a succession of for loops?

大陸北方網友
Ozmoges

1 Answer


You could apply fitdist to every P_ID using group_by, store each fitted model in a list column, and then extract the shape and rate estimates from each model.

library(dplyr)
library(purrr)
library(fitdistrplus)

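data %>%
  group_by(P_ID) %>%
  # fit one gamma model per P_ID and store it in a list column
  summarise(model = list(fitdist(X, "gamma"))) %>%
  # pull the named "shape" and "rate" elements out of each model's $estimate
  mutate(Distrib = "Gamma",
         G_Shape = map_dbl(model, pluck, 'estimate', 'shape'),
         G_Rate  = map_dbl(model, pluck, 'estimate', 'rate')) %>%
  select(P_ID, Distrib, G_Shape, G_Rate) -> result

result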
Ronak Shah
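With the sample data in the question, fitting every group directly will stop the whole pipeline: fitdist() requires a numeric vector of length greater than 1, and ID_235 has only one row. Values of exactly 0, like the 0.00 in ID_234, may also make the maximum-likelihood gamma fit fail. The sketch below assumes you would rather get NA for such groups than an error; it wraps fitdist() with purrr::possibly(). The data frame df is simulated, hypothetical data, and safe_fit and get_est are illustrative helper names, not part of any package.

```r
library(dplyr)
library(purrr)
library(fitdistrplus)

# Hypothetical data in the question's layout: two well-populated IDs
# plus ID_235 with a single row, which fitdist() rejects
set.seed(42)
df <- tibble(
  P_ID = c(rep("ID_233", 40), rep("ID_234", 40), "ID_235"),
  X    = c(rgamma(40, shape = 2, rate = 0.1),
           rgamma(40, shape = 3, rate = 0.5),
           66.23)
)

# Return NULL instead of raising an error when a group cannot be fitted
safe_fit <- possibly(function(x) fitdist(x, "gamma"), otherwise = NULL)

# Extract one named estimate, or NA when the fit failed (model is NULL)
get_est <- function(model, name) {
  if (is.null(model)) NA_real_ else model$estimate[[name]]
}

result <- df %>%
  group_by(P_ID) %>%
  summarise(model = list(safe_fit(X)), .groups = "drop") %>%
  mutate(Distrib = "Gamma",
         G_Shape = map_dbl(model, ~ get_est(.x, "shape")),
         G_Rate  = map_dbl(model, ~ get_est(.x, "rate"))) %>%
  select(P_ID, Distrib, G_Shape, G_Rate)

result
```

Groups that could not be fitted show up as NA rows, so you can filter them out or inspect them separately instead of losing the results for every other P_ID.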